CN114115304A - Aircraft four-dimensional climbing track planning method and system - Google Patents

Aircraft four-dimensional climbing track planning method and system

Info

Publication number
CN114115304A
Authority
CN
China
Prior art keywords
aircraft
dimensional
flight
climbing track
climbing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111248553.5A
Other languages
Chinese (zh)
Inventor
张洪海
周锦伦
万俊强
吕文颖
钟罡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202111248553.5A
Publication of CN114115304A
Legal status: Pending


Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
    • G05D1/0816Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability
    • G05D1/0833Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability using limited authority control

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an aircraft four-dimensional climbing track planning method and system. The method obtains the selectable action space in each flight state according to the performance of the aircraft, solves for the optimal action of the aircraft during the climb with a reinforcement learning algorithm that updates its evaluation policy by single-step temporal differences, and fits the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track, realizing efficient and refined four-dimensional climbing track planning.

Description

Aircraft four-dimensional climbing track planning method and system
Technical Field
The invention relates to a method and a system for planning a four-dimensional climbing flight path of an aircraft, and belongs to the field of flight path planning.
Background
Since the concept of Trajectory Based Operations (TBO) was introduced, four-dimensional trajectory planning, one of its key technologies, has developed rapidly. Traditional path planning methods such as the A* algorithm, the ant colony algorithm and the genetic algorithm require complex processing to handle the aircraft's temporal dimension, so they offer little advantage in four-dimensional climbing track planning; a new aircraft four-dimensional climbing track planning method is therefore urgently needed.
Disclosure of Invention
The invention provides an aircraft four-dimensional climbing track planning method and system, which solve the problems described in the background art.
To solve the above technical problems, the invention adopts the following technical solution:
an aircraft four-dimensional climbing track planning method comprises the following steps:
acquiring the selectable actions in each flight state during the climb of an aircraft in a simulated flight experiment;
according to the selectable actions in each flight state, with the objectives of minimum fuel consumption and minimum time to reach the cruise waypoint, solving for the optimal action of the aircraft during the climb by a reinforcement learning algorithm that updates its evaluation policy with single-step temporal differences; and
fitting the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track.
The flight state includes horizontal position, altitude, speed and pitch angle, and the selectable actions include acceleration and pitch rate.
The interaction models in the reinforcement learning algorithm comprise an environment model and an agent model;
the environment model: calculates, according to the parameters of the environment in which the aircraft operates, the fuel consumption and position change of the aircraft per unit time of operation in the current flight state, then calculates the reward value given by the environment from the fuel consumption and position change, and uses the reward value as a parameter for evaluating the current flight state of the aircraft;
the agent model: selects the optimal action from the selectable actions.
The reward value calculation formula is as follows:
r(s_t, a_t) = r_t^1 + r_t^2 + r_t^3
wherein r(s_t, a_t) is the reward value given by the environment when the aircraft selects action a_t in flight state s_t at time step t;
r_t^1 is the positive reward value obtained as the aircraft approaches the cruise waypoint;
r_t^2 is the negative reward value incurred by the aircraft after each time step t;
r_t^3 is the negative reward value incurred because action a_t consumes fuel.
Fitting the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track comprises:
determining the transition altitude and the level-flight distance of the aircraft for the stepped climb according to aircraft performance; and
fitting the climbing track obtained from the optimal actions according to the transition altitude and the level-flight distance to obtain the aircraft four-dimensional climbing track.
An aircraft four-dimensional climbing track planning system comprises:
an acquisition module: acquires the selectable actions in each flight state during the climb of an aircraft in a simulated flight experiment;
a reinforcement learning module: according to the selectable actions in each flight state, with the objectives of minimum fuel consumption and minimum time to reach the cruise waypoint, solves for the optimal action of the aircraft during the climb by a reinforcement learning algorithm that updates its evaluation policy with single-step temporal differences; and
a track fitting module: fits the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track.
The flight state includes horizontal position, altitude, speed and pitch angle, and the selectable actions include acceleration and pitch rate.
The interaction models in the reinforcement learning algorithm comprise an environment model and an agent model;
the environment model: calculates, according to the parameters of the environment in which the aircraft operates, the fuel consumption and position change of the aircraft per unit time of operation in the current flight state, then calculates the reward value given by the environment from the fuel consumption and position change, and uses the reward value as a parameter for evaluating the current flight state of the aircraft;
the agent model: selects the optimal action from the selectable actions.
The reward value calculation formula is as follows:
r(s_t, a_t) = r_t^1 + r_t^2 + r_t^3
wherein r(s_t, a_t) is the reward value given by the environment when the aircraft selects action a_t in flight state s_t at time step t;
r_t^1 is the positive reward value obtained as the aircraft approaches the cruise waypoint;
r_t^2 is the negative reward value incurred by the aircraft after each time step t;
r_t^3 is the negative reward value incurred because action a_t consumes fuel.
The track fitting module comprises:
a transition altitude and level-flight distance determination module: determines the transition altitude and the level-flight distance of the aircraft for the stepped climb according to aircraft performance; and
a fitting module: fits the climbing track obtained from the optimal actions according to the transition altitude and the level-flight distance to obtain the aircraft four-dimensional climbing track.
The invention achieves the following beneficial effects: the method obtains the selectable action space in each flight state according to the performance of the aircraft, solves for the optimal action of the aircraft during the climb with a reinforcement learning algorithm that updates its evaluation policy by single-step temporal differences, and fits the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track, realizing efficient and refined four-dimensional climbing track planning.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the one-hot discretization of the aircraft state-action space;
FIG. 3 is a line graph of the agent's average total episode reward per 100 episodes over 500,000 learning episodes;
FIG. 4 is a line graph of the agent's average total episode reward per 1000 episodes over 500,000 learning episodes;
FIG. 5 is a schematic diagram of the track fitting effect.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only intended to illustrate the technical solution of the invention more clearly and do not limit its protection scope.
As shown in FIG. 1, an aircraft four-dimensional climbing track planning method comprises the following steps:
step 1, acquiring the selectable actions in each flight state during the climb of an aircraft in a simulated flight experiment, the simulated flight experiment being developed on the basis of the real performance of the aircraft;
step 2, according to the selectable actions in each flight state, with the objectives of minimum fuel consumption and minimum time to reach the cruise waypoint, solving for the optimal action of the aircraft during the climb by a reinforcement learning algorithm that updates its evaluation policy with single-step temporal differences; and
step 3, fitting the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track.
The method obtains the selectable action space in each flight state according to the performance of the aircraft, solves for the optimal action of the aircraft during the climb with the reinforcement learning algorithm that updates its evaluation policy by single-step temporal differences, and fits the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track, realizing efficient and refined four-dimensional climbing track planning.
The flight state of an aircraft is generally obtained by the Flight Management System (FMS) and transmitted to the pilot's instrument panel and, via secondary surveillance radar or ADS-B equipment, to the controller, so the flight state is not difficult to obtain for track planning. In the simulated flight experiment, pilot operations can be simulated according to the real performance of the aircraft to obtain the corresponding flight states.
Aircraft position, speed, etc. are continuous variables. To derive a finite number of features from the continuous space, one-hot coding may be used to process the continuous flight state of the aircraft into discrete flight states, i.e. s(x_i − δ ≤ x ≤ x_i + δ) = s_i, where δ is the coding granularity.
The flight state includes horizontal position, altitude, speed and pitch angle, and the selectable actions include acceleration and pitch rate, which may be written as s_i = (x_i, y_i, v_i, θ_i), a_i = (acc_i, w_i), s_i ∈ S′, a_i ∈ A, wherein s_i is the flight state of the aircraft at time step i; x_i, y_i, v_i and θ_i denote the horizontal position, altitude, speed and pitch angle respectively; a_i is one selectable action in state s_i; acc_i and w_i denote the acceleration and pitch rate respectively; S′ is the set of all flight states; and A is the set of all selectable actions (see FIG. 2).
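For illustration only, a minimal Python sketch of this grid discretization; the granularities in DELTA and the example state values are hypothetical, not taken from the patent:

    import numpy as np

    # Hypothetical coding granularities (delta) per state component:
    # horizontal position [m], altitude [m], speed [m/s], pitch angle [rad].
    DELTA = np.array([500.0, 100.0, 5.0, np.deg2rad(1.0)])

    def discretize_state(x, y, v, theta):
        """Map a continuous flight state onto the discrete grid S'.

        All continuous values within +/- delta of a grid centre collapse
        onto the same discrete state s_i, as in s(x_i - d <= x <= x_i + d) = s_i.
        """
        continuous = np.array([x, y, v, theta])
        indices = np.round(continuous / DELTA).astype(int)  # grid cell index
        return tuple(indices)  # hashable key, usable in a Q-table

    # Two nearby continuous states collapse to the same discrete state.
    s_a = discretize_state(10120.0, 2980.0, 151.2, np.deg2rad(4.9))
    s_b = discretize_state(10060.0, 3020.0, 149.1, np.deg2rad(5.1))
    print(s_a == s_b)  # True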
Owing to the aerodynamic properties of a fixed-wing aircraft, its horizontal velocity cannot be zero. To simplify algorithm design and accelerate the iterative convergence of the algorithm, the discretized flight state space of the aircraft should satisfy the following conditions: after each unit of time the flight state of the aircraft changes within the discrete space (dynamic) and still belongs to the discrete space (closed).
That is, the cases s_{i+1} = s_i = V(s_i | a_i) or s_{i+1} = V(s_i | a_i) ∉ S′ should be avoided; in other words it is required that s_{i+1} = V(s_i | a_i) with s_{i+1} ≠ s_i and s_{i+1} ∈ S′. This is achieved by setting a minimum speed, i.e. the coding granularity must be such that operating the aircraft for one second at the minimum speed still changes the horizontal position state. Meanwhile, decisions should be made on the one-hot-coded flight state of the aircraft; wherein V(s_i | a_i) is the state transition equation by which the aircraft in flight state s_i, selecting action a_i, transitions to the flight state s_{i+1} of the next time step.
Each flight state generally admits several selectable actions. Taking the flight states and the selectable actions as the input of the reinforcement learning algorithm with single-step temporal-difference updates of the evaluation policy, the optimal action in each flight state can be obtained by iterative computation.
A reinforcement learning algorithm that updates its evaluation policy by single-step temporal differences generally has two interaction models: an environment model and an agent model.
The environment model: according to the parameters of the environment in which the aircraft operates (altitude, atmospheric temperature, etc.), an aircraft performance model and a fuel consumption model are built, and the fuel consumption and position change of the aircraft per unit time of operation in the current flight state are calculated; the reward value given by the environment is then calculated from the fuel consumption and position change and used as a parameter for evaluating the current flight state of the aircraft, i.e. for judging how good the current flight state is.
The reward system and the reward value calculation may be set as follows:
1. Reward and penalty system
The aircraft obtains a certain positive reward value r_t^1 for approaching the cruise waypoint (biasing decisions towards flying to the destination); the aircraft obtains a certain negative reward value r_t^2 after each time step t (reducing flight time); and the aircraft obtains a certain negative reward value r_t^3 for the fuel consumed by action a_t (reducing fuel consumption).
2. Based on the above reward system, the reward value calculation formula may be as follows:
r(s_t, a_t) = r_t^1 + r_t^2 + r_t^3
wherein r(s_t, a_t) is the reward value given by the environment when the aircraft selects action a_t in flight state s_t at time step t.
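For illustration, a minimal Python sketch of such a composite reward; the weighting coefficients k1, k2 and k3 are hypothetical, since the patent gives no numerical values:

    def reward(dist_before, dist_after, fuel_burned_kg,
               k1=1e-3, k2=1.0, k3=0.1):
        """Composite reward r(s_t, a_t) = r1 + r2 + r3."""
        # r1 > 0: reward for closing the distance to the cruise waypoint.
        r1 = k1 * (dist_before - dist_after)
        # r2 < 0: fixed penalty per elapsed time step (shortens flight time).
        r2 = -k2
        # r3 < 0: penalty proportional to fuel burned by the chosen action.
        r3 = -k3 * fuel_burned_kg
        return r1 + r2 + r3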
The intelligent agent model comprises: the intelligent agent is used as a decision maker for selecting the flight action of the aircraft, and the optimal action is selected from the selectable actions and executed.
Based on aircraft performance, the selectable actions of the aircraft in different flight states can be obtained as follows:
1) Determine the speed interval corresponding to the specific aircraft type and flight altitude from the aircraft performance database:
v_min ≤ v_{s_i} + acc_{a_i} ≤ v_max
v_min = C_{Vmin} × V_stall
v_max = V_MO / M_MO
wherein v_{s_i} is the aircraft speed at the current time step i; acc_{a_i} is an acceleration decision available in the current flight state of the aircraft; v_max and v_min are the maximum and minimum operable speeds under aircraft performance constraints; C_{Vmin}, V_MO and M_MO are performance parameters of the specific aircraft type in the BADA model, where V_MO is the maximum operable calibrated airspeed, V_stall is the aircraft stall speed, and M_MO is the maximum operable Mach number.
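As an illustration, a sketch of this speed-interval check with placeholder performance values for an unnamed aircraft type (all constants are assumptions, not BADA data):

    # Hypothetical BADA-style performance values for an example jet type.
    C_VMIN = 1.3      # minimum-speed coefficient
    V_STALL = 72.0    # stall speed in the climb configuration [m/s]
    V_MO = 180.0      # maximum operable speed [m/s], standing in for VMO/MMO

    def admissible_accelerations(v_now, candidate_accs, dt=1.0):
        """Keep only accelerations whose next speed stays within
        [v_min, v_max] = [C_VMIN * V_STALL, V_MO]."""
        v_min, v_max = C_VMIN * V_STALL, V_MO
        return [a for a in candidate_accs
                if v_min <= v_now + a * dt <= v_max]

    print(admissible_accelerations(150.0, [-2.0, -1.0, 0.0, 1.0, 2.0]))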
2) Determine the acceleration interval corresponding to the specific aircraft type and flight altitude from the aircraft performance database:
0 ≤ acc_{a_i} ≤ a_max
Thr_maxclimb − D − m·g₀·sin θ = m·a_max
(Thr_maxclimb)_ISA = C_{Tc,1} × (1 − H_p/C_{Tc,2} + C_{Tc,3} × H_p²)
Thr_maxclimb = (Thr_maxclimb)_ISA × (1 − C_{Tc,5} × (ΔT − C_{Tc,4}))
D = (1/2)·ρ·v²·S·C_D
C_D = C_{D0,CR} + C_{D2,CR} × (C_L)²
C_L = 2·m·g₀/(ρ·v²·S)
wherein a_max is the maximum acceleration of the aircraft; Thr_maxclimb is the maximum available thrust of the aircraft in climb mode; C_{Tc,1}, C_{Tc,2}, C_{Tc,3}, C_{Tc,4} and C_{Tc,5} are hyper-parameters set in the BADA model of the corresponding aircraft, with no specific physical meaning, used only for fitting; ΔT is the International Standard Atmosphere (ISA) temperature deviation, related to the local actual temperature; H_p is the barometric altitude of the aircraft; θ is the climb gradient; D is the drag on the aircraft; ρ is the air density; v is the aircraft speed (true airspeed); S is the aircraft wing area; C_D is the aircraft drag coefficient; C_L is the aircraft lift coefficient; C_{D0,CR} and C_{D2,CR} are parameters in the BADA model of the specific aircraft; m is the aircraft mass; and g₀ is the gravitational acceleration, taken as 9.8 m/s².
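For illustration, a Python sketch of this acceleration bound; the BADA-style coefficients below are placeholders for an unnamed example type, not values from the patent or the BADA dataset:

    import math

    G0 = 9.80665  # gravitational acceleration [m/s^2]

    # Hypothetical BADA-style coefficients for an example jet type.
    C_TC = {"1": 1.4e5, "2": 5.0e4, "3": 6.0e-11, "4": 10.0, "5": 0.01}
    S_WING, MASS = 122.6, 64000.0    # wing area [m^2], mass [kg]
    C_D0_CR, C_D2_CR = 0.024, 0.036  # drag-polar coefficients

    def max_acceleration(v_tas, h_p, delta_t_isa=0.0, theta=0.05, rho=0.8):
        """a_max from m*a_max = Thr_maxclimb - D - m*g0*sin(theta)."""
        thr_isa = C_TC["1"] * (1 - h_p / C_TC["2"] + C_TC["3"] * h_p ** 2)
        thr = thr_isa * (1 - C_TC["5"] * (delta_t_isa - C_TC["4"]))
        c_l = 2 * MASS * G0 / (rho * v_tas ** 2 * S_WING)  # lift coefficient
        c_d = C_D0_CR + C_D2_CR * c_l ** 2                 # drag polar
        drag = 0.5 * rho * v_tas ** 2 * S_WING * c_d
        return (thr - drag - MASS * G0 * math.sin(theta)) / MASS

    print(round(max_acceleration(150.0, 3000.0), 3))  # ~1.2 m/s^2 here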
3) Determine the pitch angle (climb gradient) interval corresponding to the specific aircraft type and flight altitude from the aircraft performance database:
0 ≤ θ_{s_i} + w_i ≤ θ_max
ROC_max = f{M} × (Thr_maxclimb − D) × v/(m·g₀)
θ_max = arcsin(ROC_max/v)
Thr_maxclimb = (Thr_maxclimb)_ISA × (1 − C_{Tc,5} × (ΔT − C_{Tc,4}))
(Thr_maxclimb)_ISA = C_{Tc,1} × (1 − H_p/C_{Tc,2} + C_{Tc,3} × H_p²)
wherein θ_{s_i} is the climb gradient of the aircraft at time step i; w_i is a pitch angle action available to the aircraft at time step i; ROC_max is the maximum rate of climb of the aircraft; θ_max is the maximum climb gradient of the aircraft; M is the aircraft Mach number; T₀ is the ISA atmospheric temperature, taken as 288.15 K; f{M} is a piecewise energy-share function of the aircraft speed that accounts for the derivative of the aircraft's vertical speed with respect to altitude; ΔT is the standard atmosphere temperature deviation; T is the theoretical temperature estimated from the current barometric altitude of the aircraft and the vertical temperature lapse rate; β_T is the vertical temperature lapse rate, taken as −0.0065 K/m; R = 287.05 m²/(K·s²); κ = 1.4; m is the aircraft mass; and g₀ is the gravitational acceleration, taken as 9.8 m/s².
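Similarly, a sketch of the climb-gradient bound, using a constant energy-share value esf as a stand-in for the piecewise f{M} (a deliberate simplification; the thrust and drag figures would come from the previous sketch):

    import math

    def max_climb_gradient(v_tas, thr, drag,
                           mass=64000.0, g0=9.80665, esf=0.7):
        """theta_max = arcsin(ROC_max / v), with
        ROC_max = esf * (thr - drag) * v / (mass * g0)."""
        roc_max = esf * (thr - drag) * v_tas / (mass * g0)
        return math.asin(min(1.0, roc_max / v_tas))

    # Thrust/drag values of the same order as the previous sketch's output.
    print(round(math.degrees(max_climb_gradient(150.0, 144800.0, 39300.0)), 2))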
4) Determine the pitch rate interval corresponding to the specific aircraft type and flight altitude from the aircraft performance database:
w_min ≤ w_i ≤ w_max
wherein w_max is the maximum nose-up pitch rate of the aircraft (rad/s), and w_min is the maximum nose-down pitch rate of the aircraft.
Interaction between the two models: the agent selects and executes the optimal action for the aircraft in the current flight state; the environment evaluates the agent's action according to the set reward function, after which the agent's optimal action selection policy is updated. Through continuous interaction between the agent and the environment, the optimal action selection policy of the aircraft is obtained, and with it the optimal climbing track.
As can be seen from FIG. 3 and FIG. 4, the reward value obtained by the agent in each simulated flight tends to stabilize after continuous learning, which also demonstrates the feasibility of solving the four-dimensional track planning problem with a reinforcement learning algorithm that updates the optimal policy by single-step temporal differences.
The reinforcement learning algorithm may be embodied as follows:
S1) set the initial flight state of the aircraft and the position of the target cruise waypoint;
the initial flight state of the aircraft is generally taken as the altitude, speed and attitude of the aircraft when it leaves the runway at 15 ft above the ground, and the target cruise waypoint depends on the airspace structure and the flight procedure design.
S2) after the current flight state of the aircraft is obtained, generate or update the available action space in the current flight state based on aircraft performance;
S3) look up the value (estimated value) of the state-action pair in the agent, where the value measures the total reward over the future t time steps (globally);
S4) select an aircraft action based on the current flight state of the aircraft;
the agent's decision (action selection) is made as follows: roulette selection determines, with probability P(r ≥ ε), whether the agent takes the action with the highest expected return in the current aircraft state or takes another action, as sketched below.
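A minimal sketch of this roulette (epsilon-greedy style) selection over a tabular value function; the Q-table is a plain dictionary keyed by (state, action), and epsilon is a hypothetical exploration rate:

    import random

    def select_action(q_table, state, actions, epsilon=0.1):
        """With probability 1 - epsilon take the highest-valued action,
        otherwise explore with a uniformly random alternative."""
        if random.random() >= epsilon:
            return max(actions, key=lambda a: q_table.get((state, a), 0.0))
        return random.choice(actions)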
S5) after the selected action has been executed, obtain the value (actual value) corresponding to the actual state-action pair of the aircraft.
S6) update the value of the state-action pair in the agent according to the actual value:
determine the state s_t ∈ S′ at time t, the reward r_t ∈ R at time t and the action a_t ∈ A at time t, so that a Markov decision process model is preliminarily obtained; define the function p: S′ × R × S′ × A → [0, 1] as the state transition probability of the Markov decision process:
p(s′, r | s, a) = Pr[s_{t+1} = s′, r_{t+1} = r | s_t = s, a_t = a], s ∈ S′, a ∈ A, s′ ∈ S′
then the state transition probability is:
p(s′ | s, a) = Σ_{r∈R} p(s′, r | s, a)
the expected reward of a state-action pair, r(s, a), is:
r(s, a) = E[r_{t+1} | s_t = s, a_t = a] = Σ_{r∈R} r · Σ_{s′∈S′} p(s′, r | s, a)
the policy π(a | s) is:
π(a | s) = Pr(a_t = a | s_t = s)
and the expected return (estimated value) G_t is:
G_t = Σ_{τ=0}^{∞} γ^τ · r_{t+τ+1}
wherein γ ∈ [0, 1] is the temporal discount parameter: a reward of one unit obtained τ steps in the future is equivalent to γ^τ obtained now; Pr denotes probability;
the action value function q_π(s, a) of the Bellman expectation equation is expressed as:
q_π(s, a) = E_π[G_t | s_t = s, a_t = a] = Σ_{s′,r} p(s′, r | s, a)·[r + γ·Σ_{a′} π(a′ | s′)·q_π(s′, a′)]
According to the contraction mapping principle and the Banach fixed-point theorem, the Bellman optimality operator for q*(s, a) is a contraction, so it is only necessary to iterate:
q(s, a) ← Σ_{s′,r} p(s′, r | s, a)·[r + γ·max_{a′} q(s′, a′)]
the convergence point of the above iteration is the fixed point of the Bellman optimality operator, i.e. the optimal policy value;
the actual policy value U_{t:t+1}^{(q)} after the single-step temporal difference is:
U_{t:t+1}^{(q)} = r_{t+1} + γ·q(s_{t+1}, a_{t+1}), which generalizes to the n-step form U_{t:t+n}^{(q)} = r_{t+1} + γ·r_{t+2} + … + γ^{n−1}·r_{t+n} + γ^n·q(s_{t+n}, a_{t+n});
wherein r_{t+1} is the reward obtained at time step t + 1, and q(s, a) is the return expected from taking action a in aircraft state s;
so the iterative update (which decreases [U − q(s, a)]²) is:
q(s_t, a_t) ← q(s_t, a_t) + α·[g_t − q(s_t, a_t)]
wherein α is the learning rate and g_t is the actual return fed back by the environment when the aircraft makes its decision, including the fuel cost and the time cost; the estimate is thereby driven towards this target.
S7) judge whether the aircraft has reached the target cruise waypoint; if so, end this episode, otherwise return to S2);
S8) judge whether the set number of learning episodes has been reached; if so, stop learning, otherwise return to S1);
S9) determine the optimal action of the aircraft in each state from the iterated optimal state-action value table.
After the optimal action at each time step has been obtained by the reinforcement learning algorithm, the aircraft is assumed to perform a stepped climb, and the transition altitude and the level-flight distance of the aircraft for the stepped climb are determined according to aircraft performance;
the transition altitude of the aircraft may be:
Figure BDA0003321642660000111
Figure BDA0003321642660000112
Figure BDA0003321642660000113
wherein, a0Taking 340m/s and V for ISA atmospheric sound velocityCASCorrecting airspeed for aircraft, HtransFor optimal transfer of altitude, η, for aircrafttransFor the atmospheric temperature ratio at optimum conversion altitude, deltatransTo the atmospheric pressure ratio at the optimum conversion altitude.
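A sketch of this crossover-altitude computation with standard ISA constants; the CAS/Mach pair in the example call is arbitrary:

    KAPPA, R_AIR, BETA_T = 1.4, 287.05, -0.0065  # ISA constants
    T0, A0, G0 = 288.15, 340.0, 9.80665

    def transition_altitude(v_cas, mach):
        """Crossover altitude where the given CAS and Mach coincide."""
        k = (KAPPA - 1.0) / 2.0
        exp = KAPPA / (KAPPA - 1.0)
        num = (1.0 + k * (v_cas / A0) ** 2) ** exp - 1.0
        den = (1.0 + k * mach ** 2) ** exp - 1.0
        delta_trans = num / den                            # pressure ratio
        eta_trans = delta_trans ** (-BETA_T * R_AIR / G0)  # temperature ratio
        return (T0 / BETA_T) * (eta_trans - 1.0)           # altitude [m]

    print(round(transition_altitude(150.0, 0.78)))  # ~291 kt CAS / M0.78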
As shown in FIG. 5, the climbing track obtained by reinforcement learning suffers from excessively frequent changes in aircraft operation, so the climbing track needs to be fitted according to the climb mode; here the stepped climb is adopted as the climb mode.
The level-flight distance of the aircraft at the transition altitude is determined according to the characteristics of the route or flight procedure and the control requirements.
According to the transition altitude and the level-flight distance, the climbing track obtained from the optimal actions is fitted by the least squares method to obtain the aircraft four-dimensional climbing track;
the track is fitted by linear least squares:
J = Σ_{i=1}^{n} [f(x_i) − y_i]²
f(x_i) = a₁·x_i + a₂
wherein J is the sum of squared errors between the fitted track points and the original track points; f(·) denotes a univariate linear regression function; a₁ and a₂ are the parameters to be fitted; x_i is the horizontal position of a track sample point; and y_i is the altitude of that track sample point.
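A sketch of this per-segment linear fit using NumPy's polyfit; the raw track points are illustrative:

    import numpy as np

    def fit_climb_segment(x, y):
        """Fit y = a1*x + a2 by linear least squares, minimising
        J = sum((f(x_i) - y_i)^2) over the segment's track points."""
        a1, a2 = np.polyfit(x, y, deg=1)
        return a1, a2

    # Illustrative raw track points for one climb segment [m].
    x = np.array([0.0, 2000.0, 4000.0, 6000.0, 8000.0])
    y = np.array([450.0, 655.0, 840.0, 1060.0, 1240.0])
    a1, a2 = fit_climb_segment(x, y)
    print(f"gradient {a1:.4f}, intercept {a2:.1f} m")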
Based on the same technical solution, the invention also discloses a software system for the above method: an aircraft four-dimensional climbing track planning system, comprising:
an acquisition module: acquires the selectable actions in each flight state during the climb of an aircraft in a simulated flight experiment, wherein the flight state includes horizontal position, altitude, speed and pitch angle, and the selectable actions include acceleration and pitch rate.
A reinforcement learning module: according to the selectable actions in each flight state, with the objectives of minimum fuel consumption and minimum time to reach the cruise waypoint, solves for the optimal action of the aircraft during the climb by a reinforcement learning algorithm that updates its evaluation policy with single-step temporal differences.
The interaction models in the reinforcement learning algorithm comprise an environment model and an agent model;
the environment model: calculates, according to the parameters of the environment in which the aircraft operates, the fuel consumption and position change of the aircraft per unit time of operation in the current flight state, then calculates the reward value given by the environment from the fuel consumption and position change, and uses the reward value as a parameter for evaluating the current flight state of the aircraft;
the agent model: selects the optimal action from the selectable actions.
The reward value calculation formula is as follows:
r(s_t, a_t) = r_t^1 + r_t^2 + r_t^3
wherein r(s_t, a_t) is the reward value given by the environment when the aircraft selects action a_t in flight state s_t at time step t;
r_t^1 is the positive reward value obtained as the aircraft approaches the cruise waypoint;
r_t^2 is the negative reward value incurred by the aircraft after each time step t;
r_t^3 is the negative reward value incurred because action a_t consumes fuel.
A track fitting module: fits the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track.
The track fitting module comprises:
a transition altitude and level-flight distance determination module: determines the transition altitude and the level-flight distance of the aircraft for the stepped climb according to aircraft performance; and
a fitting module: fits the climbing track obtained from the optimal actions according to the transition altitude and the level-flight distance to obtain the aircraft four-dimensional climbing track.
Based on the same technical solution, the invention also discloses a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to execute the aircraft four-dimensional climbing track planning method.
Based on the same technical solution, the invention also discloses a computing device comprising one or more processors, a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the aircraft four-dimensional climbing track planning method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention is not limited to the above embodiments; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. An aircraft four-dimensional climbing track planning method, characterized by comprising the following steps:
acquiring the selectable actions in each flight state during the climb of an aircraft in a simulated flight experiment;
according to the selectable actions in each flight state, with the objectives of minimum fuel consumption and minimum time to reach the cruise waypoint, solving for the optimal action of the aircraft during the climb by a reinforcement learning algorithm that updates its evaluation policy with single-step temporal differences; and
fitting the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track.
2. The aircraft four-dimensional climbing track planning method according to claim 1, characterized in that: the flight state includes horizontal position, altitude, speed and pitch angle, and the selectable actions include acceleration and pitch rate.
3. The aircraft four-dimensional climbing track planning method according to claim 1, characterized in that: the interaction models in the reinforcement learning algorithm comprise an environment model and an agent model;
the environment model: calculates, according to the parameters of the environment in which the aircraft operates, the fuel consumption and position change of the aircraft per unit time of operation in the current flight state, then calculates the reward value given by the environment from the fuel consumption and position change, and uses the reward value as a parameter for evaluating the current flight state of the aircraft;
the agent model: selects the optimal action from the selectable actions.
4. The aircraft four-dimensional climbing track planning method according to claim 3, characterized in that the reward value calculation formula is as follows:
r(s_t, a_t) = r_t^1 + r_t^2 + r_t^3
wherein r(s_t, a_t) is the reward value given by the environment when the aircraft selects action a_t in flight state s_t at time step t;
r_t^1 is the positive reward value obtained as the aircraft approaches the cruise waypoint;
r_t^2 is the negative reward value incurred by the aircraft after each time step t;
r_t^3 is the negative reward value incurred because action a_t consumes fuel.
5. The aircraft four-dimensional climbing track planning method according to claim 1, characterized in that fitting the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track comprises:
determining the transition altitude and the level-flight distance of the aircraft for the stepped climb according to aircraft performance; and
fitting the climbing track obtained from the optimal actions according to the transition altitude and the level-flight distance to obtain the aircraft four-dimensional climbing track.
6. An aircraft four-dimensional climbing track planning system, characterized by comprising:
an acquisition module: acquires the selectable actions in each flight state during the climb of an aircraft in a simulated flight experiment;
a reinforcement learning module: according to the selectable actions in each flight state, with the objectives of minimum fuel consumption and minimum time to reach the cruise waypoint, solves for the optimal action of the aircraft during the climb by a reinforcement learning algorithm that updates its evaluation policy with single-step temporal differences; and
a track fitting module: fits the climbing track obtained from the optimal actions to obtain the aircraft four-dimensional climbing track.
7. The aircraft four-dimensional climbing track planning system according to claim 6, characterized in that: the flight state includes horizontal position, altitude, speed and pitch angle, and the selectable actions include acceleration and pitch rate.
8. The aircraft four-dimensional climbing track planning system according to claim 6, characterized in that: the interaction models in the reinforcement learning algorithm comprise an environment model and an agent model;
the environment model: calculates, according to the parameters of the environment in which the aircraft operates, the fuel consumption and position change of the aircraft per unit time of operation in the current flight state, then calculates the reward value given by the environment from the fuel consumption and position change, and uses the reward value as a parameter for evaluating the current flight state of the aircraft;
the agent model: selects the optimal action from the selectable actions.
9. The aircraft four-dimensional climbing track planning system according to claim 8, characterized in that the reward value calculation formula is as follows:
r(s_t, a_t) = r_t^1 + r_t^2 + r_t^3
wherein r(s_t, a_t) is the reward value given by the environment when the aircraft selects action a_t in flight state s_t at time step t;
r_t^1 is the positive reward value obtained as the aircraft approaches the cruise waypoint;
r_t^2 is the negative reward value incurred by the aircraft after each time step t;
r_t^3 is the negative reward value incurred because action a_t consumes fuel.
10. The aircraft four-dimensional climbing track planning system according to claim 6, characterized in that the track fitting module comprises:
a transition altitude and level-flight distance determination module: determines the transition altitude and the level-flight distance of the aircraft for the stepped climb according to aircraft performance; and
a fitting module: fits the climbing track obtained from the optimal actions according to the transition altitude and the level-flight distance to obtain the aircraft four-dimensional climbing track.
CN202111248553.5A 2021-10-26 2021-10-26 Aircraft four-dimensional climbing track planning method and system Pending CN114115304A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111248553.5A CN114115304A (en) 2021-10-26 2021-10-26 Aircraft four-dimensional climbing track planning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111248553.5A CN114115304A (en) 2021-10-26 2021-10-26 Aircraft four-dimensional climbing track planning method and system

Publications (1)

Publication Number Publication Date
CN114115304A true CN114115304A (en) 2022-03-01

Family

ID=80376898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111248553.5A Pending CN114115304A (en) 2021-10-26 2021-10-26 Aircraft four-dimensional climbing track planning method and system

Country Status (1)

Country Link
CN (1) CN114115304A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110673637A (en) * 2019-10-08 2020-01-10 福建工程学院 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN112947541A (en) * 2021-01-15 2021-06-11 南京航空航天大学 Unmanned aerial vehicle intention track prediction method based on deep reinforcement learning
CN112818599A (en) * 2021-01-29 2021-05-18 四川大学 Air control method based on reinforcement learning and four-dimensional track
CN113258989A (en) * 2021-05-17 2021-08-13 东南大学 Method for obtaining relay track of unmanned aerial vehicle by using reinforcement learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115206135A (en) * 2022-06-16 2022-10-18 中国电子科技集团公司第二十八研究所 Aircraft instruction height planning method without determining climbing rate
CN115206135B (en) * 2022-06-16 2024-02-13 中国电子科技集团公司第二十八研究所 Aircraft instruction altitude planning method with uncertain climbing rate


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination