CN116992761A

CN116992761A - Variable cycle engine maximum thrust control optimization method based on computer

Info

Publication number: CN116992761A
Application number: CN202310893202.2A
Authority: CN
Inventors: 缑蕊嘉; 李臻曜; 冯子懿
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2023-07-20
Filing date: 2023-07-20
Publication date: 2023-11-03

Abstract

A computer-based method for optimizing the maximum thrust control of a variable cycle engine, relating to the technical field of variable cycle engine control. A DEORL algorithm is designed according to the characteristics of the variable cycle engine. It is a deep reinforcement learning algorithm that combines the Actor-Critic model with ideas from DQN: experience replay and target networks improve algorithm performance, and an exploration-expansion technique balances deep exploration and exploitation of the environment. The algorithm is implemented on a computer and solves the maximum thrust optimization control problem of the variable cycle engine. Its correctness, reliability, and efficiency are verified and tested in a computer environment. The DEORL algorithm performs the maximum thrust control optimization and outputs the optimal control variables to the variable cycle engine. The method raises the thrust of the variable cycle engine as far as possible while ensuring safe engine operation, improving the maneuverability of the aircraft.

Description

Variable cycle engine maximum thrust control optimization method based on computer

Technical Field

The invention relates to the technical field of variable cycle engine control, in particular to a variable cycle engine maximum thrust control optimization method based on a computer.

Background

Advanced fighters need both long-range subsonic cruise capability and rapid response in combat, and future aero engines are developing toward long cruise range, high thrust-to-weight ratio, and a wide operating range. Studies of the speed characteristics of conventional engines show that a turbojet has higher specific thrust and lower specific fuel consumption in the supersonic regime, whereas a high-bypass-ratio turbofan has lower specific fuel consumption in the subsonic regime. For a fighter propulsion system, the turbofan is therefore better suited to subsonic flight and the turbojet to supersonic flight. The variable cycle engine was developed to combine the two. In different operating states it adjusts the geometry, physical position, or size of characteristic components so that it works in a turbofan-like configuration during subsonic cruise, for better economy, and in a turbojet-like configuration during supersonic combat, for sustained and reliable high specific thrust. The engine thus integrates the performance advantages of the turbofan and the turbojet and performs well across its whole operating range.

The engine is the core equipment of an aircraft, and its performance directly affects flight efficiency and safety. As the aviation industry develops, the requirements on variable cycle engines keep rising. Their development now spans many fields, including mechanical design, materials science, thermodynamics, and fluid dynamics. Research on improved variable cycle engine control systems is therefore important for raising the overall level of national aviation technology.

The operation of a variable cycle engine is complex and changeable, and because it is affected by many control variables, designing its control system is challenging. Conventional control methods often cannot meet the requirements, so emerging techniques such as deep reinforcement learning are needed for optimization. Performance-seeking control of the variable cycle engine is an important task in this respect: optimizing engine performance can reduce fuel consumption, increase thrust, and extend the combat radius of the aircraft. Such work helps raise the overall performance level of domestic variable cycle engines and master world-leading control technology, and contributes to national defense and the civil aviation industry.

Air superiority plays a vital role in modern warfare. With the rapid development of technology, modern air combat places higher demands on fighters, chiefly a wider flight envelope, a larger combat radius, better maneuverability and agility, a higher thrust-to-weight ratio, lower fuel consumption, short takeoff, and better reliability and operability. The maximum thrust control mode of the variable cycle engine aims to raise engine thrust as far as possible, and thereby improve the maneuverability and agility of the aircraft, while ensuring safe engine operation.

Traditional intelligent optimization algorithms optimize the control system by probability-based random search, but they converge slowly, are easily trapped in local optima, and are prone to premature convergence. The complex nonlinear characteristics of the variable cycle engine control system and its many coupled control parameters amplify these drawbacks. Maximum thrust optimization control requires multivariable optimization under many limiting conditions, and the number of local optima rises sharply, so the optimization method must have strong global search capability and fast search speed. Although research at home and abroad on optimal control of the variable cycle engine acceleration process has produced some results, many technical problems remain unresolved or open to improvement.

Disclosure of Invention

The invention aims to solve the above problems in the prior art and provides a computer-based method for optimizing the maximum thrust control of a variable cycle engine. The Deep Exploration and Optimization Reinforcement Learning (DEORL) algorithm is applied to the maximum thrust performance-seeking control mode: a nonlinear mathematical model of the variable cycle engine is first established, and DEORL then performs the maximum thrust optimization control, so that engine thrust is raised as far as possible while safe engine operation is ensured, improving the maneuverability and agility of the aircraft.

The invention comprises the following steps:

1) Establishing a nonlinear mathematical model of the variable cycle engine;

2) Determining an objective function and a constraint function of a maximum thrust control mode;

3) Optimizing calculation by DEORL;

4) Outputting the optimal control variables to the variable cycle engine.

In step 1), the nonlinear mathematical model of the variable cycle engine is:

$S_t = f(a_t)$

where $a_t$ is the control input vector, which is also the output of the policy network, comprising the opening msv of the mode selection valve (MSV), the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl, and the compressor guide vane angle dvgh; $S_t$ is the output vector, representing the state of the variable cycle engine at the current time and comprising the specific fuel consumption sfc and the engine thrust $F$; and $f(\cdot)$ is the nonlinear vector function that generates the system output.

In step 2), the maximum thrust control mode maximizes the thrust of the variable cycle engine while ensuring safe engine operation. Its mathematical description is:

Performance index: $\max F$

Constraint conditions: $g_{i\min} \le g_i(x) \le g_{i\max},\ i = 1, 2, \ldots, N$

where $g_i(x)$ are the constraint functions, for example: the turbine inlet temperature does not exceed its limit, the high-pressure compressor does not surge, the high-pressure rotor does not overspeed, the fan does not overspeed, the combustor does not flame out from an over-rich mixture, the main combustor fuel supply does not exceed the maximum fuel supply, and the nozzle throat area is not less than the minimum area; $g_{i\min}$ and $g_{i\max}$ are the lower and upper limits of the constraints; and $N$ is the number of constraints.

The following nonlinear constraint problem needs to be solved for the maximum thrust control mode:

where the control variable is $a_t = [\mathrm{msv}, W_f, A_9, \mathrm{dvgl}, \mathrm{dvgh}]^T$, and every variable takes its initial value within its corresponding range of variation.
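As a minimal sketch of how the box constraints $g_{i\min} \le g_i(x) \le g_{i\max}$ might be checked in software, the snippet below measures how far a vector of constraint values lies outside its limits. The function names and the numerical limits are hypothetical placeholders chosen for illustration; they are not values from the invention.

```python
import numpy as np

def constraint_violation(g, g_min, g_max):
    """Total amount by which the constraint values g leave the box [g_min, g_max]."""
    g, g_min, g_max = (np.asarray(v, dtype=float) for v in (g, g_min, g_max))
    below = np.maximum(g_min - g, 0.0)   # shortfall under the lower limits
    above = np.maximum(g - g_max, 0.0)   # excess over the upper limits
    return float(np.sum(below + above))

def is_feasible(g, g_min, g_max):
    """True when every constraint value lies within its limits."""
    return constraint_violation(g, g_min, g_max) == 0.0

# Hypothetical limits for N = 3 normalized constraints (illustrative only).
g_min = np.array([0.0, 0.10, 0.0])
g_max = np.array([1.0, 1.00, 1.05])

print(is_feasible([0.5, 0.5, 1.0], g_min, g_max))
print(constraint_violation([1.2, 0.05, 1.0], g_min, g_max))
```

A violation measure of this kind could, for instance, be subtracted from the reward as a penalty, although the text does not state how the constraints enter the reward.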

In step 3), the algorithm flow of the DEORL optimization calculation is as follows:

(1) Randomly initialize the weight parameters $\theta^{\mu}$ of the current Actor network $\mu(s|\theta^{\mu})$ and the weight parameters $\theta^{Q}$ of the current Critic network $Q(s, a|\theta^{Q})$;

(2) Initialize the target Actor network $\mu'$ and the target Critic network $Q'$ with the weight parameters $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$;

(3) Initialize the experience replay pool $R$;

(4) For each round $i = 1, 2, \ldots$, up to the maximum number of rounds, initialize a random process $N$ for action exploration and obtain the initial state $s_0$;

(5) For each time step $t = 1, 2, \ldots, T$: compute the action to execute as $a_t = \mu(s_t|\theta^{\mu}) + N$; the environment executes $a_t$, transitions to state $s_{t+1}$, and returns the reward value $r_t$; store the current sample $(s_t, a_t, s_{t+1}, r_t)$ in the experience replay pool $R$; randomly sample $N_1$ samples from $R$ as training data for the Actor and Critic networks; set $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$; and update the Critic network parameters by minimizing the loss function $L$;

where $y_i$ is the target Q value and $\gamma \in [0, 1]$ is the discount coefficient. Computing $y_i$ with the target value network $Q'(s, a|\theta^{Q'})$ and the target policy network $\mu'(s|\theta^{\mu'})$ keeps the Q network stable during training and makes it easier to converge.

Calculate the gradient of the policy network:

Update the target policy network $\mu'$ and the target value network $Q'$:

$\theta^{\mu'} = \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$

$\theta^{Q'} = \tau\theta^{Q} + (1-\tau)\theta^{Q'}$

where $\theta^{\mu}$ and $\theta^{Q}$ are the parameters of the current policy network and the current value network, and $\theta^{\mu'}$ and $\theta^{Q'}$ are the parameters of the target policy network and the target value network. $\tau \in [0, 1]$ is the update coefficient; it sets the update step size and balances the current network parameters against the target network parameters.
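The expressions for the loss function $L$ and for the policy network gradient are not reproduced in this text. The flow above closely follows the standard deep deterministic policy gradient (DDPG) algorithm, so, purely as an assumption and not as the formulas of the invention, they may take the usual DDPG form sketched below, with $N_1$ the number of sampled transitions:

```latex
% Assumed (standard DDPG) forms; not confirmed by the source text.
L = \frac{1}{N_1} \sum_{i=1}^{N_1} \bigl( y_i - Q(s_i, a_i \mid \theta^{Q}) \bigr)^2

\nabla_{\theta^{\mu}} J \approx \frac{1}{N_1} \sum_{i=1}^{N_1}
  \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_i,\, a = \mu(s_i \mid \theta^{\mu})}
  \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_i}
```

Under this reading, the Critic is regressed toward the target values $y_i$, and the Actor is moved in the direction that increases the Critic's estimate of $Q$ for the actions it proposes.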

In step 4), the control variables are the opening msv of the mode selection valve (MSV), the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl, and the compressor guide vane angle dvgh.

The invention designs the DEORL algorithm, a deep reinforcement learning algorithm that combines the Actor-Critic model with ideas from DQN and uses experience replay and target networks to improve performance. Experience replay raises data utilization and weakens the correlation among samples, which avoids unstable training and poor convergence of the networks. The exploration-expansion technique balances deep exploration and exploitation of the environment by adding noise. The Actor-Critic model combines the advantages of policy gradient and value function methods: it can update network parameters at every step, which improves efficiency, and it avoids the tendency of policy gradient algorithms to converge to a local optimum. The algorithm is implemented with programming languages, data structures, and other computer science techniques, and solves the maximum thrust optimization control problem of the variable cycle engine by executing a series of instructions and operations. It completes the task efficiently and accurately and is well suited to large-scale data processing and complex computation. Its correctness, reliability, and efficiency are verified and tested in a computer environment. Applying the DEORL algorithm to performance-seeking control in the maximum thrust mode raises engine thrust as far as possible while ensuring safe engine operation, improving the maneuverability and agility of the aircraft.

Drawings

FIG. 1 is a schematic illustration of a variable cycle engine configuration of the present invention;

FIG. 2 is a flow chart of the variable cycle engine maximum thrust optimizing control based on the DEORL algorithm of the present invention;

FIG. 3 is a schematic diagram of the maximum thrust control mode of the present invention;

FIG. 4 is a diagram of an Actor-Critic network architecture of the present invention.

Detailed Description

The invention addresses the optimization control of the maximum thrust performance of a variable cycle engine. The performance-seeking problem is to select an optimal control method that finds a set of optimal control quantities (the opening msv of the mode selection valve MSV, the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl, and the compressor guide vane angle dvgh) so as to optimize a combined index of one or more engine performance measures.

Taking the nonlinear mathematical model of a variable cycle engine as the object of study, an objective function for the maximum thrust control mode is established, and the DEORL algorithm performs the optimization calculation to obtain the optimal control variables that satisfy the maximum thrust performance index. The maximum thrust control mode raises engine thrust as far as possible while ensuring safe engine operation, and is commonly used when the aircraft climbs, accelerates, or attacks.

1. Variable cycle engine working principle and nonlinear model design

The main object of study is a double-bypass variable cycle engine with a core driven fan stage (CDFS). Its main structure, shown in FIG. 1, comprises an inlet, a fan, the core driven fan stage, a high-pressure compressor, a combustor, a high-pressure turbine, a low-pressure turbine, a mixing chamber, an afterburner, and a tail nozzle. Compared with an ordinary twin-spool turbofan, its distinctive structural features are that a CDFS is added between the fan and the high-pressure compressor and that a secondary bypass and a main bypass are arranged behind the fan and the CDFS, respectively. In different operating states, changing the CDFS guide vane angle substantially adjusts the air flow through the bypass ducts and the core, and hence cycle parameters such as the core and bypass air flows, the bypass ratio, and the pressure ratio, making the thermodynamic cycle of the engine more flexible to adjust.

Compared with a traditional aero engine, the performance advantage of the variable cycle engine comes mainly from its additional adjustable components. Changing the parameters of these components adjusts the aerothermodynamic cycle during operation, so that specific fuel consumption falls markedly while thrust stays essentially unchanged, which greatly improves the economy of the engine. The additional adjustable components also make control more flexible and greatly improve the stability margins of components such as the fan and the compressor.

The variable cycle engine has two typical operating modes, single-bypass and double-bypass, and switches between them by means of variable-geometry valves such as the mode selection valve (MSV), the forward variable area bypass injector (FVABI), and the rear variable area bypass injector (RVABI). When the MSV is fully open, the air flow leaving the fan splits into two streams. One stream enters the secondary bypass and is finally mixed with the main bypass flow at the main bypass outlet section before entering the total bypass. The other stream enters the CDFS; part of it is led into the total bypass via the RVABI, and the remainder flows into the core. Because of the tail duct and the RVABI, the total bypass flow splits in two at the outlet: one stream passes through the tail duct directly into the tail nozzle, and the other enters the mixing chamber, mixes with the core flow, burns in the afterburner, and then flows into the tail nozzle. Since both the main bypass and the secondary bypass carry air flow during operation, this is called the double-bypass mode.

When the mode selection valve MSV is fully closed, all of the air leaving the fan flows into the CDFS, the fan operates in compressor mode, and no air flows through the secondary bypass; this is called the single-bypass mode.

When the variable cycle engine switches between operating modes, its internal thermodynamic cycle state changes accordingly. For the engine to keep working stably and reliably and to switch smoothly between single-bypass and double-bypass modes, the following basic conditions should be met during mode switching:

(1) The fan inlet flow remains essentially unchanged;

(2) The fan pressure ratio remains essentially unchanged;

(3) The pressure ratio of the core driven fan stage changes smoothly during switching;

(4) The bypass ratio changes smoothly as the MSV displacement changes;

(5) The backflow margin is always greater than 0, that is, no air flows back around the CDFS;

(6) Sustained overtemperature and overspeed are avoided, and surge is avoided.

To meet these conditions, the other adjustable component parameters should be adjusted in coordination when the MSV displacement is adjusted; the opening of the mode selection valve MSV can be used to characterize the operating mode of the variable cycle engine. The mode switching strategy that has so far proven feasible is as follows. When switching from single-bypass to double-bypass mode, the MSV displacement is adjusted to increase the inlet cross-sectional area of the secondary bypass; to avoid a large drop in fan pressure ratio, the CDFS inlet guide vane angle $\alpha_i$ is reduced in coordination, and the adjustable turbine guide vane angle $\alpha_t$ is reduced at the same time. The strategy for switching from double-bypass to single-bypass mode is the reverse. In each operating mode, to obtain the desired bypass ratio and to keep the flow free of surge and other abnormal conditions, the CDFS guide vane angle $\alpha_i$ must be adjusted to change the core air flow so that it matches the engine operating state.

Because maximum thrust optimization control must make control decisions from the current operating state parameters of the engine, a mathematical model usually stands in for the real engine when optimal control methods are studied. Variable cycle engine modeling is a mature technology and is not described further here; the established nonlinear model is given directly:

$S_t = f(a_t)$

2. Maximum thrust optimizing control framework based on DEORL algorithm

Performance-seeking control of the maximum thrust of the variable cycle engine is a key technology for integrated flight/propulsion control. With growing investment in aviation technology, full-authority digital electronic control is widely used in the new generation of variable cycle engines. To maximize engine thrust, maximum thrust performance-seeking control is generally applied in the maximum thrust state. Genetic algorithms are computationally expensive, time-consuming, and prone to premature convergence, and are therefore poorly suited to performance optimization of a complex variable cycle engine. The invention therefore designs maximum thrust performance-seeking control for the variable cycle engine based on the DEORL algorithm; the basic idea is shown in FIG. 2. In FIG. 2, $a_t$ is the output of the policy network; $S_t$ is the state of the variable cycle engine at the current time $t$; $S_{t+1}$ is the state reached at time $t+1$ when the engine, in state $S_t$ at time $t$, receives $a_t$ as input; and $r_t$ is the reward value calculated from the states and actions at these times, which guides further updates of the network.

The DEORL algorithm uses two types of neural network: a policy network and a value network. The policy network represents the control policy $\pi(a|s)$ and has four layers: an input layer, hidden layer 1, hidden layer 2, and an output layer. The input layer has two neuron nodes, which receive the state vector; hidden layers 1 and 2 have 30 and 20 neurons respectively and fit the complex functional relationship; and the output layer has two neuron nodes, which output the action vector. The value network represents the state-action value function $Q(s, a)$ and also has four layers; its input layer has 4 neuron nodes, the sum of the dimensions of the state vector and the action vector.
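The layer sizes described above can be illustrated with a small NumPy forward pass. This is only a sketch: the ReLU hidden activations, the tanh output scaling, the random initialization, the hidden sizes and scalar output of the value network, and the example state values are all assumptions, since the text specifies only the 2-30-20-2 policy layout and the 4-node value network input.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Create (W, b) pairs for consecutive layer sizes with small random weights."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, out_act=None):
    """ReLU on hidden layers (assumed); optional activation on the output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    x = x @ W + b
    return out_act(x) if out_act is not None else x

actor = init_mlp([2, 30, 20, 2])    # layer sizes as stated in the text
critic = init_mlp([4, 30, 20, 1])   # hidden sizes and scalar output are assumptions

s = np.array([0.5, -0.2])                    # hypothetical normalized state vector
a = forward(actor, s, out_act=np.tanh)       # action scaled to [-1, 1] (assumed)
q = forward(critic, np.concatenate([s, a]))  # Q(s, a) from the stacked input
print(a.shape, q.shape)
```

Feeding the value network the concatenation of state and action is what makes its input dimension equal to the sum of the two vector dimensions.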

The residual thrust is the engine thrust minus the flight drag. In operating states such as take-off, landing, and flight, the aircraft needs the largest possible residual thrust in order to shorten the time spent climbing and accelerating and to gain a combat advantage, so the variable cycle engine must produce the largest possible thrust. The maximum residual thrust control mode is therefore also called the maximum thrust control mode. Its control objective is to raise engine thrust as far as possible while ensuring safe engine operation. Safe operation requires that the maximum thrust control mode respect the limits on maximum turbine inlet temperature, maximum corrected air flow, maximum corrected fan speed, and engine surge.

Increasing the engine air flow $W_a$ and the engine pressure ratio $\pi_c$ is the main way to realize the maximum thrust control mode; the relationship between $\pi_c$ and $W_a$ in this mode is shown in FIG. 3. In the maximum thrust control mode, increasing the main fuel flow $W_f$ while reducing the nozzle area $A_9$ raises the pressure ratio $\pi_c$, and increasing the fan guide vane angle dvgl and the compressor guide vane angle dvgh raises the corrected air flow, thereby increasing thrust. Increasing $W_f$ raises the inlet temperatures of the high- and low-pressure turbines and the speeds of the high- and low-pressure rotors. While thrust is increased, therefore, the fan surge margin SMF and the compressor surge margin SMC must remain above the minimum allowed surge margin, the total inlet temperatures of the high- and low-pressure turbines must remain below their maximum limits, and the maximum speed limits of the high- and low-pressure rotors must be respected. FIG. 3 shows the optimization starting from operating point a on the common operating line and reaching the optimal operating point b: after optimization the pressure ratio and the thrust are higher, and the operating point lies on the minimum surge margin limit or on the maximum corrected flow, speed, or temperature limit boundary.

Taking the constraints into account, the mathematical description of the maximum thrust control mode is:

Performance index: $\max F$

Constraint conditions: $g_{i\min} \le g_i(x) \le g_{i\max},\ i = 1, 2, \ldots, N$

where $g_i(x)$ are the constraint functions, $g_{i\min}$ and $g_{i\max}$ are the lower and upper limits of the constraints, and $N$ is the number of constraints.

The DEORL algorithm is applied to the maximum thrust optimizing control of the variable cycle engine.

3. DEORL algorithm principle and design flow

The DEORL algorithm incorporates ideas from the DQN algorithm into the Actor-Critic model structure; it comprises a value network and a policy network. Reinforcement learning is usually based on a Markov decision process (MDP) model, in which successive data are correlated; this correlation easily makes network training unstable and slow to converge. The DEORL algorithm therefore uses an experience replay pool and target networks to improve overall performance.

Experience replay pool:

In a deep reinforcement learning neural network, updating the neuron weights by gradient descent requires a certain amount of sample data. With purely online interactive learning, the current data are discarded after each network update, which greatly lowers data utilization and forces the agent to interact more with the environment before converging. The experience replay technique sets aside a buffer of fixed size to store the state transition information $(s_t, a_t, r_t, s_{t+1})$. Transition samples enter the buffer in order; if the buffer is full when a new sample arrives, the oldest sample is removed.
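The first-in, first-out buffer described here can be sketched with Python's `collections.deque`, whose `maxlen` argument discards the oldest entry automatically once the buffer is full. The class name `ReplayPool`, the capacity, and the placeholder transitions are illustrative assumptions.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-size buffer of (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        # With maxlen set, appending to a full deque drops the oldest sample.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement weakens temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

pool = ReplayPool(capacity=3)
for t in range(4):                       # store 4 transitions into 3 slots
    pool.store(f"s{t}", f"a{t}", float(t), f"s{t + 1}")
batch = pool.sample(2)
print(len(pool), pool.buffer[0][0])
```

After the fourth store, the transition that started in `s0` has been pushed out, so the oldest remaining sample starts in `s1`.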

Exploration expansion:

DEORL is a deterministic policy gradient algorithm: once the initial state is given, the interaction sequence produced by the policy network is fixed, so the agent cannot generate different behaviors to explore the environment in depth, and the policy cannot improve. To turn the decision process of DEORL from deterministic to stochastic, noise $N$ is added to the action output by the policy, which realizes the exploration expansion. The action $a_t$ finally executed by the environment is:

$a_t = \mu(s_t|\theta^{\mu}) + N$

$N$ is usually Gaussian white noise, so the executed action is distributed around a mean equal to the output of the policy network. As training proceeds, the noise variance is reduced continually to balance exploration and exploitation.
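A minimal sketch of this noisy action selection follows. The exponential decay schedule, the clipping range, and the function names `explore` and `decayed_sigma` are assumptions; the text states only that Gaussian noise is added and that its variance shrinks as training proceeds.

```python
import numpy as np

def explore(mu_action, sigma, rng, low=-1.0, high=1.0):
    """Add zero-mean Gaussian noise to the policy output and clip to the action range."""
    mu_action = np.asarray(mu_action, dtype=float)
    noise = rng.normal(0.0, sigma, size=mu_action.shape)
    return np.clip(mu_action + noise, low, high)

def decayed_sigma(sigma0, decay, episode, sigma_min=0.0):
    """Assumed schedule: standard deviation shrinks geometrically per round."""
    return max(sigma0 * decay ** episode, sigma_min)

rng = np.random.default_rng(0)
mu = [0.2, -0.4]                       # hypothetical policy network output
for episode in (0, 10, 100):
    sigma = decayed_sigma(0.2, 0.97, episode)
    print(episode, sigma, explore(mu, sigma, rng))
```

With a standard deviation of zero the function returns the policy output unchanged, which corresponds to pure exploitation at the end of training.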

The Actor-Critic method:

A policy gradient algorithm optimizes along the gradient of the policy function, correcting the parameters by small steps in the gradient direction, so the optimization is smooth with little fluctuation but relatively inefficient. The gradient approach also makes the algorithm prone to converging to a local optimum instead of the desired global optimum. The Actor-Critic model obtains a better algorithm structure by combining the policy gradient method with the value function method, as shown in FIG. 4.

In the Actor-Critic structure, the Actor network is based on the policy gradient algorithm and selects a suitable action from the continuous action space according to the current state; the Critic network is based on a value function method such as DQN, computes the reward or penalty value resulting from the state transition after the action is executed, and evaluates whether the action is reasonable.

The Actor-Critic structure can update network parameters at every step, which avoids the low efficiency caused by the per-episode updates of the policy gradient algorithm. In the interaction process, the Actor network obtains a probability value for each action and selects a behavior based on these probabilities; the Critic network is updated continually to refine the reward and penalty value of each action in each state; finally, the Actor network updates its own parameters according to the Critic network's evaluation of the actions. The new loss function is:

$\mathrm{Loss} = r(s_t, a_t) + \gamma Q(s_{t+1}, \mu(s_{t+1}|\theta^{\mu})|\theta^{Q}) - Q(s_t, a_t|\theta^{Q})$

The policy gradient update formula adopted by the Actor network is as follows:

The Critic network updates its parameters by the DQN algorithm, and the gradient update formula is as follows:

where $\lambda_{\mu}$ and $\lambda_{Q}$ are the learning rates of the Actor network and the Critic network; $\theta^{\mu}$ and $\theta^{Q}$ are the parameters of the Actor network and the Critic network; and $r(s_t, a_t)$ is the reward value obtained when the agent executes action $a_t$ in state $s_t$ of the environment.
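To make the loss expression concrete, the sketch below evaluates the temporal-difference quantity $r + \gamma Q(s_{t+1}, \cdot) - Q(s_t, a_t)$ and applies one semi-gradient step with learning rate $\lambda_Q$. The linear Critic $Q = w \cdot \phi$, the feature vectors, and the numbers are assumptions made for illustration; the update formulas of the invention are not reproduced in this text.

```python
import numpy as np

def td_error(r, q_sa, q_next, gamma):
    """Loss term from the text: r + gamma * Q(s', mu(s')) - Q(s, a)."""
    return r + gamma * q_next - q_sa

def critic_step(w, phi, phi_next, r, gamma, lr):
    """One semi-gradient TD step for an assumed linear Critic Q = w . phi."""
    delta = td_error(r, w @ phi, w @ phi_next, gamma)
    return w + lr * delta * phi, delta   # gradient of Q with respect to w is phi

w = np.zeros(2)                          # hypothetical Critic weights
phi = np.array([1.0, 0.0])               # features of (s_t, a_t), assumed
phi_next = np.array([0.0, 1.0])          # features of (s_{t+1}, mu(s_{t+1})), assumed
w_new, delta = critic_step(w, phi, phi_next, r=1.0, gamma=0.9, lr=0.5)
print(delta, w_new)
```

A positive error means the action turned out better than the Critic predicted, so the estimate for that state-action pair is raised.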

DEORL algorithm flow:

according to the theoretical basis, the Actor and the Critic both comprise two structures of an online network and a target network. And generating sample data through interaction between the online strategy network and the environment, storing the data into an experience playback pool, randomly sampling a certain sample from the experience playback pool by the intelligent agent in the next time step, and updating parameters of the online strategy network and the online value network according to the sample data. The algorithm flow of the DEORL is as follows:

(3) Initialize the experience replay pool R;

(5) For t = 1, 2, …, T: compute the action to be executed according to a_t = μ(s_t | θ^μ) + N, where N is the exploration noise; the environment executes action a_t, transitions to state s_{t+1}, and returns the reward r_t; store the current sample (s_t, a_t, s_{t+1}, r_t) in the experience replay pool R; randomly sample N₁ samples from R as training data for the Actor network and the Critic network; let y_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′}); and update the Critic network parameters by minimizing the loss function L;

where y_i is the target Q value, y_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′}), and γ ∈ [0, 1] is the discount coefficient. Computing y_i with the target value network Q′(s, a | θ^{Q′}) and the target policy network μ′(s | θ^{μ′}) keeps the Q network stable during training and makes it easier to converge.
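As a minimal sketch of how the target values y_i might be computed for a sampled batch, the Python below evaluates the target networks on each next state. The mean squared error used for the loss L is an assumption (the usual choice in DQN-style critics), since the loss formula itself is not reproduced in this text; the function names and the toy lambda networks are likewise hypothetical.

```python
def target_values(batch, target_actor, target_critic, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each sample in the batch.

    Each sample is a tuple (s, a, s_next, r), matching the order stored in the pool.
    """
    return [r + gamma * target_critic(s_next, target_actor(s_next))
            for (_s, _a, s_next, r) in batch]


def critic_loss(batch, targets, critic):
    """Assumed form of L: mean squared error between y_i and Q(s_i, a_i)."""
    errors = [(y - critic(s, a)) ** 2
              for (s, a, _s_next, _r), y in zip(batch, targets)]
    return sum(errors) / len(errors)


# Toy stand-ins for the networks (hypothetical, for illustration only).
target_actor = lambda s: 0.5 * s
target_critic = lambda s, a: s + a
online_critic = lambda s, a: s + a

batch = [(0.0, 0.0, 1.0, 1.0), (1.0, 1.0, 2.0, 0.0)]
y = target_values(batch, target_actor, target_critic, gamma=0.5)
loss = critic_loss(batch, y, online_critic)
```

The targets depend only on the target networks, so they do not shift every time the online critic is updated; this is the stabilizing effect described above.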

Calculate the gradient of the policy network:

Update the target networks μ′ and Q′:

θ^{μ′} = τθ^μ + (1 - τ)θ^{μ′}

θ^{Q′} = τθ^Q + (1 - τ)θ^{Q′}
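The two target-network updates above are the same weighted average applied to different parameter sets. A minimal Python sketch, assuming the parameters are held as flat lists of floats (a simplification; the name `soft_update` and the values are illustrative):

```python
def soft_update(target_params, online_params, tau=0.01):
    """theta' = tau * theta + (1 - tau) * theta', applied element by element."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


theta_online = [1.0, 1.0]
theta_target = [0.0, 1.0]

one_step = soft_update(theta_target, theta_online, tau=0.1)

# Repeating the update moves the target parameters steadily toward the online ones.
tracked = theta_target
for _ in range(200):
    tracked = soft_update(tracked, theta_online, tau=0.1)
```

A small τ makes the target networks change slowly, while τ = 1 would copy the online parameters outright.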

In short, the DEORL algorithm is built on the Actor-Critic network framework, and the iterative training of each network is completed through the interaction among the environment, the Actor network, and the Critic network, looping over episodes and time steps.
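The interaction loop summarized above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure: the class `ReplayPool`, the one-dimensional `toy_env_step` environment, and the Gaussian form of the exploration noise N are all hypothetical, and the network updates are left as a comment.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity experience replay pool R; the oldest samples are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, s_next, r):
        self.buffer.append((s, a, s_next, r))

    def sample(self, n, rng):
        # Uniform sampling without replacement reduces correlation between samples.
        return rng.sample(list(self.buffer), min(n, len(self.buffer)))


def toy_env_step(s, a):
    """Hypothetical environment: the state shifts by the action; reward favors s near 0."""
    s_next = s + a
    return s_next, -abs(s_next)


def run_episode(mu, pool, steps, batch_size, sigma, rng):
    s = 1.0
    batch = []
    for _ in range(steps):
        a = mu(s) + rng.gauss(0.0, sigma)      # a_t = mu(s_t) + N (Gaussian noise assumed)
        s_next, r = toy_env_step(s, a)
        pool.store(s, a, s_next, r)            # sample order (s_t, a_t, s_{t+1}, r_t)
        batch = pool.sample(batch_size, rng)   # training data for Actor and Critic
        # Critic, Actor, and target-network updates would be applied here.
        s = s_next
    return batch


rng = random.Random(0)
pool = ReplayPool(capacity=50)
last_batch = run_episode(lambda s: -0.5 * s, pool,
                         steps=20, batch_size=8, sigma=0.1, rng=rng)
```

Storing transitions first and training on random draws from the pool is what lets each sample be reused many times.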

4. Maximum thrust optimizing control based on DEORL algorithm

The maximum thrust control mode of the variable cycle engine maximizes the thrust of the variable cycle engine on the premise of ensuring its safe operation. The invention selects the opening msv of the mode selection valve (MSV), the main combustion chamber fuel flow W_f, the tail nozzle area A_9, the fan guide vane angle dvgl, and the compressor guide vane angle dvgh as the control variables.

In the maximum thrust control mode, the optimization objective is as follows:

max F

To ensure the optimality, stability, and structural strength of the variable cycle engine in operation, specific limits must be placed on its use. These limits arise from flight conditions, mechanical loads, thermal loads, and aerodynamic loads, and fall into two categories. The first category comprises aerodynamic stability limits on the operation of the power plant components, which concern components such as the compressor and the combustion chamber. The second category comprises strength limits: the necessary strength margin must be maintained under all operating conditions of the variable cycle engine. For steady-state operation, the rotational speed is limited, because it has the greatest influence on the turbine blade strength margin. Within a given flight envelope, the pressures and temperatures of the variable cycle engine must also be limited for structural or aerodynamic reasons. Under normal operating conditions, over-temperature and overspeed are limited.

In summary, the constraints selected by the invention for the variable cycle engine are: no over-temperature at the turbine inlet, no surge in the high-pressure compressor, no overspeed of the high-pressure rotor, no overspeed of the fan, no rich-fuel flameout in the combustion chamber, a main combustion chamber fuel supply no greater than its maximum, a nozzle throat area no smaller than its minimum, and so on.
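One common way to make such limits visible to a reinforcement learning agent is a penalty method: thrust is rewarded and any excursion beyond a limit is subtracted. The Python sketch below illustrates that idea only; it is not stated in the source that DEORL handles constraints this way, and the constraint names, limit values, and penalty weight are placeholders rather than engine data.

```python
import math

# Hypothetical (lower, upper) limits; one-sided limits use +/- infinity.
LIMITS = {
    "turbine_inlet_temperature": (-math.inf, 2000.0),
    "hp_compressor_surge_margin": (0.10, math.inf),
    "hp_rotor_speed": (-math.inf, 1.02),
    "fan_speed": (-math.inf, 1.02),
}


def violations(g, limits):
    """Amount by which each g_i(x) lies outside its interval [lower, upper]."""
    out = {}
    for name, value in g.items():
        lower, upper = limits[name]
        excess = max(lower - value, 0.0, value - upper)
        if excess > 0.0:
            out[name] = excess
    return out


def penalized_reward(thrust, g, limits, penalty=1000.0):
    """Thrust minus a weighted sum of constraint excursions (assumed reward shape)."""
    return thrust - penalty * sum(violations(g, limits).values())


safe = {"turbine_inlet_temperature": 1950.0, "hp_compressor_surge_margin": 0.15,
        "hp_rotor_speed": 1.00, "fan_speed": 1.00}
overspeed = dict(safe, hp_rotor_speed=1.03)

reward_safe = penalized_reward(100.0, safe, LIMITS)
reward_overspeed = penalized_reward(100.0, overspeed, LIMITS)
```

In practice the excursions would need to be normalized before summing, because temperatures, speeds, and margins are measured on very different scales.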

Taking into account the objective function, the constraints, and the control variables, a suitable set of W_f, A_9, dvgl, and dvgh must be found so that the variable cycle engine operates at its maximum thrust point; that is, the following nonlinear constrained problem must be solved:

where the control variable a_t = [msv, W_f, A_9, dvgl, dvgh]^T ∈ R⁵, and all variables take initial values within their corresponding ranges of variation.
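A short Python sketch of the control vector a_t and its ranges follows. The range values are hypothetical placeholders, and drawing the initial values uniformly within each range is an assumption; the source states only that the initial values lie within the ranges.

```python
import random

# Order of the components of a_t = [msv, W_f, A_9, dvgl, dvgh]^T.
ORDER = ("msv", "W_f", "A_9", "dvgl", "dvgh")

# Hypothetical (lower, upper) ranges; placeholders, not engine data.
RANGES = {
    "msv": (0.0, 1.0),
    "W_f": (0.5, 1.5),
    "A_9": (0.2, 0.4),
    "dvgl": (-10.0, 10.0),
    "dvgh": (-10.0, 10.0),
}


def initial_action(ranges, rng):
    """Draw each control variable uniformly within its range (assumed initialization)."""
    return [rng.uniform(*ranges[name]) for name in ORDER]


def clip_action(action, ranges):
    """Project an action back into the allowed ranges, component by component."""
    return [min(max(x, ranges[name][0]), ranges[name][1])
            for x, name in zip(action, ORDER)]


rng = random.Random(0)
a0 = initial_action(RANGES, rng)
clipped = clip_action([2.0, 0.0, 0.3, 20.0, -20.0], RANGES)
```

Clipping is useful after exploration noise is added to the Actor output, since the noisy action may otherwise leave the permitted range.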

In the maximum thrust mode, the thrust of the variable cycle engine is increased as far as possible on the premise of ensuring its safe operation. This objective can be described by the following mathematical expression:

max F

This objective function may be converted into the following form:

where K_f is a positive constant.

The deep exploration and optimization reinforcement learning (DEORL) algorithm is a deep reinforcement learning algorithm that combines the Actor-Critic model with ideas from DQN and uses experience replay and target networks to improve performance. Experience replay effectively improves data utilization and reduces the correlation between samples, which avoids unstable training and poor network convergence. The exploratory expansion technique balances deep exploration and exploitation of the environment by adding noise. The Actor-Critic model combines the advantages of policy gradient and value-function methods: it can update network parameters at every single step, which improves efficiency, and it avoids the tendency of the policy gradient algorithm to converge to a locally optimal solution. The algorithm is implemented on a computer, using a programming language, data structures, and other computer science techniques to execute a series of instructions and operations, and thereby solves the problem of optimizing control of the minimum fuel consumption of the variable cycle engine. It completes its tasks efficiently and accurately and shows advantages in large-scale data processing and complex computation. The correctness, reliability, and efficiency of the algorithm are verified and tested in a computer environment. Finally, DEORL is applied to the maximum thrust optimization control mode of the variable cycle engine, so that thrust is increased as much as possible on the premise of ensuring safe operation, which improves the maneuverability and flexibility of the aircraft.

While embodiments of the present invention have been shown and described above, it will be understood that these embodiments are illustrative and are not to be construed as limiting the invention, and that those skilled in the art may make changes, modifications, substitutions, and variations to them without departing from the spirit and principles of the invention.

Claims

1. The maximum thrust control optimization method of the variable cycle engine based on the computer is characterized by comprising the following steps of:

1) Establishing a nonlinear mathematical model of the variable cycle engine, S_t = f(a_t);

2) Determining an objective function and a constraint function of the maximum thrust control mode:

the maximum thrust control mode is to ensure that the thrust of the variable cycle engine is maximum on the premise of ensuring the safe operation of the variable cycle engine, and the mathematical description is as follows:

performance index: max F

Constraint conditions: g_{i,min} ≤ g_i(x) ≤ g_{i,max}, i = 1, 2, …, N

where g_i(x) are the constraint conditions, which include no over-temperature at the turbine inlet, no surge in the high-pressure compressor, no overspeed of the high-pressure rotor, no overspeed of the fan, no rich-fuel flameout in the combustion chamber, a main combustion chamber fuel supply not exceeding its maximum, and a nozzle throat area not less than its minimum; g_{i,min} and g_{i,max} are the lower and upper limit values of the constraint conditions, respectively; and N is the number of constraint conditions;

the maximum thrust control mode solves the following nonlinear constraint problem:

where the control variable a_t = [msv, W_f, A_9, dvgl, dvgh]^T, and all variables take initial values within their corresponding ranges of variation;

3) Performing the optimization calculation with DEORL, using experience replay and target networks to improve algorithm performance: initializing the weight parameters of the current Actor network and of the current Critic network, initializing the experience replay pool, and initializing a random process for action exploration to obtain an initial state; calculating the action to be executed; updating the Critic network parameters by minimizing a loss function; calculating the gradient of the policy network; and updating the target policy network and the target value network;

4) Outputting the optimal control variables to the variable cycle engine.

2. The method for optimizing maximum thrust control of a variable cycle engine based on a computer of claim 1, wherein in step 1), the nonlinear mathematical model of the variable cycle engine is:

S_t = f(a_t)

3. The optimization method for controlling the maximum thrust of the variable cycle engine based on the computer as claimed in claim 1, wherein in the step 3), the optimization calculation is performed by using DEORL, and the algorithm flow is as follows:

(3) Initializing an experience replay pool R;

where y_i is the target Q value, y_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′}), and γ ∈ [0, 1] is the discount coefficient; computing y_i with the target value network Q′(s, a | θ^{Q′}) and the target policy network μ′(s | θ^{μ′}) keeps the Q network stable during training and makes it easier to converge;

calculating the gradient of the strategy network:

updating the target policy network μ 'and the target value network Q':

θ^{μ′} = τθ^μ + (1 - τ)θ^{μ′}

θ^{Q′} = τθ^Q + (1 - τ)θ^{Q′}

where θ^μ and θ^Q are the parameters of the current policy network and the current value network, respectively; θ^{μ′} and θ^{Q′} are the parameters of the target policy network and the target value network, respectively; and τ ∈ [0, 1] is the update coefficient, which represents the update step size and balances the current network parameters against the target network parameters.

4. The computer-based method for optimizing maximum thrust control of a variable cycle engine as claimed in claim 1, wherein in step 4), the control variables are the opening msv of the mode selection valve (MSV), the main fuel flow W_f, the tail nozzle area A_9, the fan guide vane angle dvgl, and the compressor guide vane angle dvgh.