CN116974194A

CN116974194A - Optimal control method for acceleration process of variable cycle engine based on computer

Info

Publication number: CN116974194A
Application number: CN202310893665.9A
Authority: CN
Inventors: 缑蕊嘉; 李臻曜; 冯子懿
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2023-07-20
Filing date: 2023-07-20
Publication date: 2023-10-31

Abstract

A computer-based optimal control method for the acceleration process of a variable cycle engine, relating to variable cycle engine control. According to the characteristics of the variable cycle engine, a DEORL algorithm is designed. It is a deep reinforcement learning algorithm that combines the Actor-Critic model with the idea of DQN, uses experience replay and target networks to improve performance, and uses an exploratory expansion technique to balance deep exploration and exploitation of the environment. The algorithm is implemented on a computer and solves the minimum-fuel-consumption optimizing control problem of the variable cycle engine. Its correctness, reliability and efficiency are verified and tested in a computer environment. The DEORL algorithm is used to optimize the acceleration process, and the optimal control variables are output to the variable cycle engine. Optimal control of the acceleration process is thereby achieved: the acceleration time is shortened while safe operation is ensured, the acceleration performance of the engine is effectively improved, and the maneuverability of the aircraft is improved.

Description

Optimal control method for acceleration process of variable cycle engine based on computer

Technical Field

The invention relates to the technical field of variable cycle engine control, in particular to an optimal control method for an acceleration process of a variable cycle engine based on a computer.

Background

Modern warfare requires advanced fighters to have both long-range subsonic cruise capability and rapid response capability in combat, and future aero-engines will continue to develop toward long cruise range, high thrust-to-weight ratio and wide operating range. By studying the speed characteristics of conventional engines, researchers found that a turbojet engine has higher specific thrust and lower specific fuel consumption in the supersonic state, whereas a high-bypass-ratio turbofan engine has lower specific fuel consumption in the subsonic state. Considering the performance requirements that modern warfare places on fighter propulsion systems, the turbofan engine is better suited to subsonic flight and the turbojet engine to supersonic flight. The variable cycle engine, with better overall performance, was developed in response. In different operating states, it adjusts the geometry, physical position or size of characteristic components so as to combine the performance advantages of the two engine types: it operates in a turbofan-like configuration in subsonic cruise, obtaining better economy, and in a turbojet-like configuration in supersonic combat, obtaining continuous and reliable high specific thrust. The engine thus achieves excellent performance over its whole operating range.

The variable cycle engine is the core equipment of an aircraft, and its performance directly affects the flight efficiency and safety of the aircraft. With the continuous development of the aviation industry, the requirements placed on variable cycle engines keep rising. Their development now spans many fields, including mechanical design, materials science, thermodynamics and fluid dynamics. Research on improved variable cycle engine control systems is therefore of great significance for raising the overall level of national aviation technology.

Modern fighters place very high demands on maneuverability, and good maneuverability requires an engine with good acceleration performance. Acceleration control is one of the transient-state controls of a variable cycle engine, and it affects engine and aircraft performance more markedly than start-up, afterburner on/off, and deceleration control. The acceleration process directly affects important flight indexes of a fighter (such as acceleration, climb, and emergency go-around), so research on optimal control of the acceleration process is of great significance for improving the acceleration performance of the variable cycle engine.

Traditional intelligent optimization algorithms optimize the control system by probability-based random search, but they suffer from slow convergence, a tendency to become trapped in local optima, and premature convergence. The complex nonlinear characteristics of the variable cycle engine control system and its many coupled control parameters further amplify these shortcomings. Acceleration control of the variable cycle engine requires multivariable optimization under multiple constraints, and the number of local optima increases sharply as a result, so optimal control of the acceleration process demands both excellent global search capability and fast search speed. Although research at home and abroad on optimal control of the variable cycle engine acceleration process has achieved some results, many technical problems remain unresolved or in need of improvement.

Disclosure of Invention

In view of the technical problems in the prior art, the invention aims to provide an efficient and accurate computer-based method for optimizing control of the variable cycle engine acceleration process. Deep Exploration and Optimization Reinforcement Learning (DEORL) is applied to optimizing control of the acceleration process, so that optimal control of the acceleration process is achieved, the acceleration performance of the engine is improved, and the maneuverability of the aircraft is improved. DEORL makes full use of sample data to avoid slow network convergence, and avoids local optima by optimizing the network parameters, thereby effectively overcoming the shortcomings of traditional intelligent optimization algorithms in optimal control of the variable cycle engine acceleration process.

According to the invention, a nonlinear mathematical model of the variable cycle engine is firstly established, and then the DEORL is used for optimizing the acceleration process of the variable cycle engine, so that the optimization of the acceleration process of a certain variable cycle engine is realized.

The invention comprises the following steps:

1) Establishing a nonlinear mathematical model of the variable cycle engine;

2) Determining a corresponding objective function and a constraint function according to a variable cycle engine acceleration process;

3) Optimizing calculation by DEORL;

4) Outputting the optimal control variables to the variable cycle engine.

In step 1), the nonlinear mathematical model of the variable cycle engine is:

$$S_t = f(a_t)$$

wherein $a_t$ is the control input vector, which is also the output of the policy network, and comprises the opening msv of the mode selection valve (MSV), the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl and the compressor guide vane angle dvgh; $S_t$ is the output vector representing the state of the variable cycle engine at the current time, and comprises the specific fuel consumption sfc and the engine thrust $F$; $f(\cdot)$ is the nonlinear vector function that generates the system output.

In step 2), the constraints to be considered in the acceleration process of the variable cycle engine are: no over-temperature at the turbine inlet, no surge in the high-pressure compressor, no overspeed of the high-pressure rotor, no overspeed of the fan, no rich-fuel flameout in the combustion chamber, main combustion chamber fuel supply not exceeding its maximum, and so on. The mathematical description of the optimization problem is as follows:

wherein the control variable is $a_t = [\mathrm{msv}, W_f, A_9, \mathrm{dvgl}, \mathrm{dvgh}]^T$, and every variable takes its initial value within its allowed range; $J_1$ is the first optimization objective and $J_2$ the second; $T_{t4}$ is the total temperature at the combustion chamber outlet; $n_F$ is the fan speed; $n_H$ is the high-pressure rotor speed; $n_{Hd}$ is the desired high-pressure rotor speed; SMF is the fan surge margin; SMC is the compressor surge margin; $f$ is the fuel-air ratio; $p_{t3}$ is the total pressure at the compressor outlet; and $a$ and $b$ are the lower and upper limits of the control variable $a_t$, respectively.

Converting the multi-objective function into a single-objective function by adopting a linear weighting method to determine an optimizing objective function; namely:

The above expression is then discretized and normalized, in order to eliminate the influence on the optimization result of differences in dimension and order of magnitude among the parameters in the objective function. The final optimizing objective function can be written as:

In the above, $\omega_a$ and $\omega_b$ are the weight coefficients of the corresponding objective functions and satisfy $\omega_a \ge 0$, $\omega_b \ge 0$; their magnitudes reflect the importance of the corresponding objective in the multi-objective optimization problem.

Following the form of the objective function, the variable cycle engine constraints are likewise discretized and normalized:

The functions $g_i(x)$ ($i = 1, 2, \ldots, 11$) above constitute the constraint function matrix $g(x)$. Taking the constraints into account, the objective function becomes:

wherein $\omega = [\omega_1, \omega_2, \ldots, \omega_{11}]$ is the weight adjustment coefficient matrix of the constraint functions, each $\omega_i$ being the adjustable weight coefficient of the corresponding constraint, and the term $\omega \cdot g(x)$ is designed to enforce the constraints of the variable cycle engine.
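As a rough illustration of how a weighted-sum objective and a weighted constraint term can be combined, the following Python sketch may help. It is not the patent's formula: the function name `penalized_objective`, the minimization convention, the sign convention "g_i ≤ 0 means satisfied", and the hinge-style penalty are all assumptions made for the example.

```python
import numpy as np

def penalized_objective(j1, j2, g, w_a=0.5, w_b=0.5, w_g=None):
    """Weighted sum of two normalized objectives plus weighted constraint penalties.

    Assumptions (not from the source): the objective is minimized, each
    normalized constraint value g[i] <= 0 means "satisfied", and only the
    violated part max(g[i], 0) is penalized.
    """
    if w_a < 0 or w_b < 0:
        raise ValueError("objective weights must be non-negative")
    g = np.asarray(g, dtype=float)
    w_g = np.ones_like(g) if w_g is None else np.asarray(w_g, dtype=float)
    violation = np.maximum(g, 0.0)          # zero wherever a constraint holds
    return w_a * j1 + w_b * j2 + float(w_g @ violation)

# All 11 constraints satisfied: only the weighted objectives remain.
feasible = penalized_objective(1.0, 2.0, np.zeros(11))

# One constraint violated by 0.2: the penalty term raises the objective.
g = np.zeros(11)
g[3] = 0.2
infeasible = penalized_objective(1.0, 2.0, g)
```

Raising an individual entry of `w_g` makes the optimizer more reluctant to violate the corresponding constraint, which is the role the text assigns to the adjustable coefficients.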

In step 3), the optimization calculation with DEORL has the algorithm flow as follows:

(1) Randomly initialize the weight parameters $\theta^{\mu}$ of the current Actor network $\mu(s|\theta^{\mu})$ and the weight parameters $\theta^{Q}$ of the current Critic network $Q(s,a|\theta^{Q})$;

(2) Initialize the target Actor network $\mu'$ and the target Critic network $Q'$ with the weight parameters $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$;

(3) Initialize the experience replay pool $R$;

(4) For each episode $i = 1, 2, \ldots$, up to the maximum number of episodes: initialize a random process $N$ for action exploration and obtain the initial state $s_0$;

(5) For each time step $t = 1, 2, \ldots, T$: compute the action to be performed as $a_t = \mu(s_t|\theta^{\mu}) + N$; the environment executes $a_t$, transitions to state $s_{t+1}$ and returns the reward $r_t$; store the current sample $(s_t, a_t, s_{t+1}, r_t)$ in the experience replay pool $R$; randomly sample $N_1$ samples from $R$ as training data for the Actor and Critic networks; let $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$ and update the Critic network parameters by minimizing the loss function $L$;

wherein $y_i$ is the target Q value, $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$. In computing $y_i$, $\gamma \in [0, 1]$ is the discount factor; using the target value network $Q'(s,a|\theta^{Q'})$ and the target policy network $\mu'(s|\theta^{\mu'})$ keeps the Q network stable during training and makes it converge more easily;

calculating the gradient of the current strategy network:

updating the target policy network μ 'and the target value network Q':

$$\theta^{\mu'} = \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$$

$$\theta^{Q'} = \tau\theta^{Q} + (1-\tau)\theta^{Q'}$$

wherein $\theta^{\mu}$ and $\theta^{Q}$ are the parameters of the current policy network and the current value network, respectively; $\theta^{\mu'}$ and $\theta^{Q'}$ are the parameters of the target policy network and the target value network, respectively; $\tau \in [0, 1]$ is the update coefficient, which sets the update step size and balances the current network parameters against the target network parameters.
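The flow above follows the deep deterministic policy gradient (DDPG) scheme step by step. Assuming the loss function $L$ of step (5) and the policy gradient take the standard DDPG form (an assumption, since those expressions do not appear in this text), they would read, with $N_1$ the number of sampled transitions:

```latex
L = \frac{1}{N_1} \sum_{i=1}^{N_1} \bigl( y_i - Q(s_i, a_i \mid \theta^{Q}) \bigr)^2

\nabla_{\theta^{\mu}} J \approx \frac{1}{N_1} \sum_{i=1}^{N_1}
  \nabla_{a} Q(s, a \mid \theta^{Q}) \Big|_{s = s_i,\, a = \mu(s_i \mid \theta^{\mu})}
  \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \Big|_{s = s_i}
```

Under that assumption, the Critic is regressed toward the targets $y_i$ computed from the slowly moving target networks, and the Actor parameters are moved in the direction that increases the Critic's estimate of the action value.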

In step 4), the control variables are the opening msv of the mode selection valve (MSV), the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl and the compressor guide vane angle dvgh.

The invention designs the DEORL algorithm, a deep reinforcement learning algorithm that combines the Actor-Critic model with the idea of DQN and uses experience replay and target networks to improve performance. Experience replay effectively improves data utilization and reduces the correlation among samples, avoiding unstable training and poor network convergence. The exploratory expansion technique balances deep exploration and exploitation of the environment by adding noise. The Actor-Critic model combines the advantages of policy gradient and value function methods: it can update the network parameters at every step, which improves efficiency, and it avoids the tendency of policy gradient algorithms to converge to a local optimum. Implemented with programming languages, data structures and other computer science techniques and executed as a series of instructions and operations, the algorithm solves the minimum-fuel-consumption optimizing control problem of the variable cycle engine. It completes the task efficiently and accurately and is advantageous for large-scale data processing and complex computation. Its correctness, reliability and efficiency are verified and tested in a computer environment. Applying DEORL to optimizing control of the acceleration process achieves optimal control of that process: the acceleration time is shortened while safe operation is ensured, the acceleration performance of the engine is effectively improved, and the maneuverability of the aircraft is improved.

Drawings

FIG. 1 is a schematic illustration of a variable cycle engine configuration of the present invention;

FIG. 2 is a flow chart of optimizing control of the variable cycle engine acceleration process based on the DEORL algorithm of the present invention;

FIG. 3 is a diagram of an Actor-Critic network architecture of the present invention.

Detailed Description

The invention solves the problem of optimizing control of the acceleration process of a variable cycle engine. The optimization problem is to make the acceleration process of the variable cycle engine optimal by selecting an optimal control method that finds a set of optimal control quantities (the opening msv of the mode selection valve MSV, the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl and the compressor guide vane angle dvgh).

Taking the nonlinear mathematical model of a variable cycle engine as the research object, the corresponding objective function of the acceleration process is established, and the DEORL algorithm is used to perform the optimization calculation, yielding the optimal control variables that satisfy the optimal performance index of the acceleration process. The acceleration time is thereby shortened and the acceleration performance effectively improved, while the safe operation of the variable cycle engine is ensured.

1. Variable cycle engine working principle and nonlinear model design

The invention takes a double-bypass variable cycle engine with a core driven fan stage (CDFS) as the main research object. Its main structure is shown in FIG. 1, and its main components are the inlet, fan, core driven fan stage, high-pressure compressor, combustion chamber, high-pressure turbine, low-pressure turbine, mixing chamber, afterburner and tail nozzle. Compared with an ordinary twin-spool turbofan engine, its distinguishing structural features are that a CDFS is added between the fan and the high-pressure compressor, and that a secondary bypass duct and a main bypass duct are arranged behind the fan and the CDFS, respectively. In different operating states, changing the CDFS guide vane angle can substantially adjust the air flow through the bypass ducts and the core, and hence cycle parameters such as the core and bypass air flows, the bypass ratio and the pressure ratio, making the thermodynamic cycle of the engine more flexible to regulate.

Compared with a conventional engine, the performance advantage of the variable cycle engine comes mainly from its larger number of adjustable components. By changing the parameters of these components, the aerothermodynamic cycle is regulated during operation so that the specific fuel consumption falls markedly while the thrust remains essentially unchanged, greatly improving the economy of the engine. The additional adjustable components also make the regulation performed by the control system more flexible and greatly improve the stability margins of components such as the fan and the compressor.

The variable cycle engine has two typical operating modes, single-bypass and double-bypass, and switches between them by means of variable-geometry valves such as the mode selection valve (MSV), the front variable area bypass injector (FVABI) and the rear variable area bypass injector (RVABI). When the MSV is fully open, the air flow splits into two streams after passing through the fan. One stream flows into the secondary bypass duct, is mixed with the main bypass flow at the outlet section of the main bypass duct, and flows into the total bypass duct. The other stream flows into the CDFS; part of it is led into the total bypass duct via the RVABI, and the remainder flows into the core. Because of the tail duct and the RVABI, the total bypass flow is divided into two parts at the outlet: one part flows directly into the tail nozzle through the tail duct, and the other enters the mixing chamber, mixes with the core flow, passes through the afterburner, and flows into the tail nozzle. Because both the main and the secondary bypass ducts carry air flow in this process, it is called the double-bypass mode.

When the mode selection valve MSV is fully closed, all the air passing through the fan flows into the CDFS, the fan operates in the compressor mode, and no air flows through the secondary bypass duct. This is called the single-bypass mode.

When the variable cycle engine switches between operating modes, its internal thermodynamic cycle state changes accordingly. To ensure that the engine keeps working stably and reliably and that the single/double-bypass mode transition proceeds smoothly, the following basic conditions should be met during mode switching:

(1) The fan inlet flow remains substantially unchanged;

(2) The fan pressure ratio remains substantially unchanged;

(3) The pressure ratio of the core driven fan stage varies smoothly during switching;

(4) The bypass ratio changes smoothly with the MSV displacement;

(5) The backflow margin always remains greater than 0, that is, no air flows back around the CDFS;

(6) Sustained over-temperature and overspeed are avoided, and surge is avoided.

To meet the above conditions, the other adjustable component parameters should be adjusted in coordination whenever the MSV displacement is adjusted. The opening of the mode selection valve MSV indicates the operating mode of the variable cycle engine. The mode switching strategy that has so far proven feasible is as follows: when switching from single-bypass to double-bypass mode, the MSV displacement is adjusted to increase the inlet cross-sectional area of the secondary bypass duct; to avoid a large drop in fan pressure ratio, the CDFS inlet guide vane angle $\alpha_i$ must be reduced in coordination, and the adjustable turbine guide vane angle $\alpha_t$ reduced at the same time. The strategy for switching from double-bypass to single-bypass mode is the reverse. When the engine operates in different modes, the CDFS guide vane angle $\alpha_i$ must be adjusted to obtain the desired bypass ratio and to avoid surge or other abnormal operating states, changing the core air flow to match the engine operating state.

Because minimum-fuel-consumption optimizing control of the variable cycle engine must make control decisions according to the current operating state parameters of the engine, a mathematical model usually stands in for the real engine when the optimizing control method is studied. Variable cycle engine modeling is a mature technology and is not described again here; the established nonlinear model is given directly:

$$S_t = f(a_t)$$

wherein $a_t$ is the control input vector, which is also the output of the policy network, and comprises the opening msv of the mode selection valve (MSV), the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl and the compressor guide vane angle dvgh; $S_t$ is the output vector representing the state of the variable cycle engine at the current time, and comprises the specific fuel consumption sfc and the engine thrust $F$; $f(\cdot)$ is the nonlinear vector function that generates the system output.

2. Variable cycle engine acceleration process optimizing control framework based on DEORL algorithm

In dynamic performance optimizing control of the variable cycle engine, the shortest-response-time control mode shortens the acceleration time while ensuring safe operation. This mode is generally used for the acceleration process and effectively improves the acceleration performance of the engine. The invention designs an optimizing control scheme for the variable cycle engine acceleration process based on the DEORL algorithm; the basic idea is shown in FIG. 2. In FIG. 2, $a_t$ is the output of the policy network; $S_t$ is the state of the variable cycle engine at the current time $t$; $S_{t+1}$ is the state reached at time $t+1$ when the engine in state $S_t$ receives $a_t$ as input; and $r_t$ is the reward value calculated from the state and action at each time step, which guides further updates of the network.
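The interaction between policy, engine model and reward in FIG. 2 can be sketched as a simple loop. This is a minimal illustration, not the patent's implementation: `rollout`, `toy_policy`, `toy_engine` and `toy_reward` are hypothetical names, and the scalar stand-ins below replace the real policy network and nonlinear engine model purely to keep the example self-contained.

```python
def rollout(policy, engine_step, reward_fn, s0, horizon):
    """Collect (S_t, a_t, r_t, S_{t+1}) tuples from one interaction sequence."""
    s, trajectory = s0, []
    for _ in range(horizon):
        a = policy(s)                  # a_t: output of the policy network
        s_next = engine_step(s, a)     # S_{t+1}: response of the engine model
        r = reward_fn(s, a, s_next)    # r_t: reward computed from state and action
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory

# Hypothetical scalar stand-ins, for illustration only.
toy_policy = lambda s: 1.0 - s                      # push the state toward 1.0
toy_engine = lambda s, a: s + 0.5 * a               # simple first-order response
toy_reward = lambda s, a, s_next: -abs(1.0 - s_next)

traj = rollout(toy_policy, toy_engine, toy_reward, s0=0.0, horizon=3)
```

In this toy run the reward rises toward zero as the state approaches the target, which mirrors how $r_t$ is meant to steer the network updates.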

The DEORL algorithm uses two types of neural network: a policy network and a value network. The policy network represents the specific control policy $\pi(a|s)$ and has a four-layer structure: input layer, hidden layer 1, hidden layer 2 and output layer. The input layer has two neuron nodes, which receive the state vector $S_t$; hidden layers 1 and 2 have 30 and 20 neurons respectively and fit the complex functional relationship; the output layer has two neuron nodes, which output the action vector $a_t$. The value network represents the state-action value function $Q(s, a)$ and also has a four-layer structure; its input layer has 4 neuron nodes, equal to the sum of the dimensions of the state vector and the action vector.
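A minimal NumPy sketch of two networks with the layer sizes stated above is given below. Only the sizes 2-30-20-2 for the policy network and the 4-node input of the value network come from the text; the tanh activations, the weight initialization, the value network's hidden sizes and its single output node are assumptions, and `init_mlp` and `forward` are hypothetical helper names.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Create (weight, bias) pairs for a fully connected network."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, out_act=None):
    """Forward pass with tanh hidden layers (activation choice is an assumption)."""
    h = np.asarray(x, dtype=float)
    for k, (w, b) in enumerate(params):
        h = h @ w + b
        if k < len(params) - 1:        # hidden layers only
            h = np.tanh(h)
    return out_act(h) if out_act is not None else h

rng = np.random.default_rng(0)
actor = init_mlp([2, 30, 20, 2], rng)   # policy network: state -> action
critic = init_mlp([4, 30, 20, 1], rng)  # value network: (state, action) -> Q

s = np.array([0.3, -0.1])                       # example normalized state
a = forward(actor, s, out_act=np.tanh)          # action bounded to [-1, 1]
q = forward(critic, np.concatenate([s, a]))     # scalar value estimate
```

Bounding the policy output with tanh is one common way to keep normalized actions inside their limits; the source does not say which output activation is used.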

3. DEORL algorithm principle and design flow

The DEORL algorithm incorporates the idea of the DQN algorithm into the Actor-Critic model structure, and its structure comprises a value network and a policy network. Reinforcement learning is usually based on a Markov decision process (MDP) model, in which successive data are correlated; this easily makes training unstable and the network hard to converge. The DEORL algorithm therefore uses two mechanisms, an experience replay pool and target networks, to improve overall performance.

Experience replay pool:

For a deep reinforcement learning neural network, a certain amount of sample data is required when the neuron weights are updated by gradient descent. With purely online interactive learning, the current data must be discarded once the current network update is finished, which greatly reduces data utilization and forces the agent to interact with the environment many more times before it finally converges. The experience replay technique allocates a buffer of a given size to store the state transition information $(s_t, a_t, r_t, s_{t+1})$. State transition samples enter the buffer in order; if the buffer is full when a new sample arrives, the oldest sample is removed from the buffer.
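The first-in, first-out behaviour of the replay pool can be sketched with a bounded deque. This is a minimal illustration under the description above; the class name `ReplayBuffer` and its method names are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size pool of (s_t, a_t, r_t, s_{t+1}) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self._buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self._buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self._buf), batch_size)

    def __len__(self):
        return len(self._buf)

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(s=t, a=0.0, r=1.0, s_next=t + 1)

batch = buf.sample(2)
oldest_state = buf._buf[0][0]   # transitions 0 and 1 have been evicted
```

Sampling uniformly from the pool, rather than using the most recent transitions in order, is what reduces the correlation among training samples.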

Exploratory expansion:

DEORL is a deterministic policy gradient algorithm: once the initial state is given, the interaction sequence produced by the policy network is fixed, so the agent cannot generate different behaviors to explore the environment in depth, and the policy cannot improve. To turn the decision process of DEORL from a deterministic one into a stochastic one, noise $N$ is added to the action output by the policy, which realizes the exploratory expansion. The action $a_t$ finally executed by the environment is:

$$a_t = \mu(s_t|\theta^{\mu}) + N$$

$N$ is typically set to Gaussian white noise, so that the mean of the executed action is the output value of the policy network. As training proceeds, the noise variance is continually reduced to balance exploration and exploitation.
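A small sketch of Gaussian exploration noise with a shrinking standard deviation is shown below. The exponential decay schedule, the floor `sigma_min`, the clipping to $[-1, 1]$ and the function names are assumptions; the text only states that the variance is reduced as training proceeds.

```python
import numpy as np

def explore(mu_out, sigma, rng, low=-1.0, high=1.0):
    """Add zero-mean Gaussian noise to the policy output and clip to the action range."""
    mu_out = np.asarray(mu_out, dtype=float)
    noisy = mu_out + rng.normal(0.0, sigma, size=mu_out.shape)
    return np.clip(noisy, low, high)

def decayed_sigma(sigma0, decay, episode, sigma_min=0.01):
    """Exponentially decaying noise scale (the schedule is an assumption)."""
    return max(sigma_min, sigma0 * decay ** episode)

rng = np.random.default_rng(0)
mu_out = np.array([0.2, -0.4])

a_early = explore(mu_out, decayed_sigma(0.2, 0.5, 0), rng)    # wide exploration
a_late = explore(mu_out, decayed_sigma(0.2, 0.5, 100), rng)   # nearly greedy
a_greedy = explore(mu_out, 0.0, rng)                          # no noise at all
```

With the noise scale at zero the executed action equals the policy output, so the decay gradually moves the agent from exploration to exploitation.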

The Actor-Critic method:

The policy gradient algorithm optimizes along the gradient of the policy function, correcting the parameters by small amounts in the gradient direction. The optimization is therefore smooth with little fluctuation, but relatively inefficient. Moreover, the gradient method makes the policy gradient algorithm prone to converging to a local optimum rather than the desired global optimum. The Actor-Critic model obtains a better algorithm structure by combining the policy gradient method with the value function method, as shown in FIG. 3.

In the Actor-Critic model structure, the Actor network is based on the policy gradient algorithm and selects a suitable action from the continuous action space according to the current state; the Critic network is based on value function methods such as DQN, computes the reward or penalty value resulting from the state transition after the action is executed, and evaluates whether the action is reasonable.

The Actor-Critic structure can update the network parameters at every step, which avoids the low efficiency caused by the per-episode updating of the policy gradient algorithm. In a specific interaction, the Actor network obtains a probability value for each action and then selects a behavior based on that probability; the Critic network is continually updated to refine the reward and penalty value of each action selected in each state; finally, the Actor network updates its own parameters according to the reward and penalty values that the Critic network assigns to the actions. The new loss function is:

$$\mathrm{Loss} = r(s_t, a_t) + \gamma Q(s_{t+1}, \mu(s_{t+1}|\theta^{\mu})|\theta^{Q}) - Q(s_t, a_t|\theta^{Q})$$

the policy gradient update formula adopted by the Actor network is as follows:

the Critic network updates its parameters by the DQN algorithm, and the gradient update formula is as follows:

wherein $\lambda_{\mu}$ and $\lambda_{Q}$ are the learning rates of the Actor network and the Critic network, respectively; $\theta^{\mu}$ and $\theta^{Q}$ are the parameters of the Actor network and the Critic network, respectively; and $r(s_t, a_t)$ is the reward value obtained when the agent executes action $a_t$ in state $s_t$ of the environment.
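The loss expression given above is a one-step temporal-difference error. A tiny numeric sketch, with made-up values and a hypothetical helper name `td_error`, shows how its sign is read:

```python
def td_error(r, q_sa, q_next, gamma=0.99):
    """One-step TD error: r + gamma * Q(s', mu(s')) - Q(s, a)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return r + gamma * q_next - q_sa

# Made-up values: reward 1.0, current estimate 2.5, next-state estimate 2.0.
delta = td_error(r=1.0, q_sa=2.5, q_next=2.0, gamma=0.5)
```

Here the result is negative, meaning the Critic's current estimate $Q(s_t, a_t)$ is higher than the reward plus the discounted next-state value, so a Critic update would lower that estimate.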

The DEORL algorithm flow is shown below.

According to the above theoretical basis, the Actor and the Critic each comprise two networks, an online network and a target network. Sample data are generated by interaction between the online policy network and the environment and stored in the experience replay pool; at the next time step, the agent randomly samples a batch from the pool and updates the parameters of the online policy network and the online value network according to the sampled data. The algorithm flow of DEORL is as follows:

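(1) Randomly initialize the weight parameters $\theta^{\mu}$ of the current Actor network $\mu(s|\theta^{\mu})$ and the weight parameters $\theta^{Q}$ of the current Critic network $Q(s,a|\theta^{Q})$;

(2) Initialize the target Actor network $\mu'$ and the target Critic network $Q'$ with the weight parameters $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$;

(3) Initialize the experience replay pool $R$;

(4) For each episode $i = 1, 2, \ldots$, up to the maximum number of episodes: initialize a random process $N$ for action exploration and obtain the initial state $s_0$;

(5) For each time step $t = 1, 2, \ldots, T$: compute the action to be performed as $a_t = \mu(s_t|\theta^{\mu}) + N$; the environment executes $a_t$, transitions to state $s_{t+1}$ and returns the reward $r_t$; store the current sample $(s_t, a_t, s_{t+1}, r_t)$ in the experience replay pool $R$; randomly sample $N_1$ samples from $R$ as training data for the Actor and Critic networks; let $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$ and update the Critic network parameters by minimizing the loss function $L$

wherein $y_i$ is the target Q value, $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$. In computing $y_i$, $\gamma \in [0, 1]$ is the discount factor; using the target value network $Q'(s,a|\theta^{Q'})$ and the target policy network $\mu'(s|\theta^{\mu'})$ keeps the Q network stable during training and makes it converge more easily.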

Calculating the gradient of the policy network:

Updating the target networks $\mu'$ and $Q'$:

$\theta^{\mu'} = \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$

$\theta^{Q'} = \tau\theta^{Q} + (1-\tau)\theta^{Q'}$

where $\theta^{\mu}$ and $\theta^{Q}$ are the parameters of the current policy network and the current value network, respectively, and $\theta^{\mu'}$ and $\theta^{Q'}$ are the parameters of the target policy network and the target value network, respectively. $\tau \in [0, 1]$ is the update coefficient; it represents the update step size and balances the current network parameters against the target network parameters.
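The Critic target, the loss, and the target-network update in step (5) can be sketched as follows. This is a minimal illustration, not the patented implementation: the loss L is assumed to be the mean squared error between $y_i$ and $Q(s_i, a_i \mid \theta^{Q})$, network outputs are replaced by plain arrays, and the function names `td_targets`, `critic_loss`, and `soft_update` are hypothetical.

```python
import numpy as np


def td_targets(rewards, next_q_values, gamma):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})), computed for a whole batch."""
    return rewards + gamma * next_q_values


def critic_loss(targets, q_values):
    """Assumed loss L: mean squared error between y_i and Q(s_i, a_i)."""
    return float(np.mean((targets - q_values) ** 2))


def soft_update(target_params, online_params, tau):
    """theta' = tau * theta + (1 - tau) * theta', applied to each parameter array."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Toy batch of two samples; the arrays stand in for network outputs.
rewards = np.array([1.0, 0.0])
next_q = np.array([2.0, 4.0])   # stand-in for Q'(s_{i+1}, mu'(s_{i+1}))
q = np.array([2.5, 3.0])        # stand-in for Q(s_i, a_i)

y = td_targets(rewards, next_q, gamma=0.5)
loss = critic_loss(y, q)

# One soft update of a single two-element parameter array.
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.01)
```

With a small $\tau$ such as 0.01, the target parameters move only 1% of the way toward the online parameters per update, which is what keeps the target $y_i$ slowly varying.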

In short, the DEORL algorithm is based on the Actor-Critic network framework: looping over episodes and time steps, the environment, the Actor network, and the Critic network interact with one another to complete the iterative training of each network.
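The interaction part of this loop (noisy action selection, storage in the experience playback pool, random sampling) can be sketched as below. Everything concrete here is an assumption for illustration: Gaussian noise for $N$, clipping of the action to normalized bounds, the placeholder policy, and the toy dynamics and reward are hypothetical stand-ins for the policy network and the engine model.

```python
import random
from collections import deque

import numpy as np


class ReplayPool:
    """Fixed-capacity experience playback pool R storing (s, a, s_next, r) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, s_next, r):
        self.buffer.append((s, a, s_next, r))

    def sample(self, n):
        # Uniform random sampling weakens the correlation between consecutive samples.
        return random.sample(list(self.buffer), min(n, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def noisy_action(policy, state, noise_std, low, high, rng):
    """a_t = mu(s_t) + N, with N assumed Gaussian and the result clipped to [low, high]."""
    action = policy(state) + rng.normal(0.0, noise_std, size=low.shape)
    return np.clip(action, low, high)


random.seed(0)
rng = np.random.default_rng(0)
low, high = np.zeros(5), np.ones(5)       # assumed normalized bounds of a 5-element control vector
policy = lambda s: np.full(5, 0.5)        # placeholder for mu(s | theta^mu)
pool = ReplayPool(capacity=1000)

state = np.zeros(2)
for t in range(20):
    action = noisy_action(policy, state, 0.1, low, high, rng)
    next_state = state + 0.1 * action[:2]                 # toy dynamics, not an engine model
    reward = -float(np.sum((1.0 - next_state) ** 2))      # toy reward
    pool.store(state, action, next_state, reward)
    state = next_state

batch = pool.sample(8)   # N_1 = 8 samples for one training step
```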

4. Acceleration process optimizing control based on DEORL algorithm

DEORL is adopted to perform optimizing control of the acceleration process of a certain variable cycle engine. On the premise of ensuring the safe operation of the variable cycle engine, the method can effectively shorten the acceleration time, thereby achieving the optimization goal.

The acceleration time of a variable cycle engine is defined as

where $I$ is the moment of inertia of the rotor; $n_{max}$ is the rotational speed at the end of the acceleration process; $n_{idle}$ is the idle speed; and $\Delta N_{ac}$ is the remaining power of the turbine during acceleration.
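For reference, a form consistent with the symbols defined above follows from the rotor power balance $\Delta N_{ac} = I\,\omega\,\mathrm{d}\omega/\mathrm{d}t$ with $\omega = \pi n/30$. It assumes the speed $n$ is expressed in r/min and the moment of inertia is constant; it is a sketch under those assumptions and may differ from the exact expression of the original.

```latex
t_{ac} = \left(\frac{\pi}{30}\right)^{2} I \int_{n_{idle}}^{n_{max}} \frac{n}{\Delta N_{ac}}\,\mathrm{d}n
```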

From the above equation it can be seen that the acceleration time is determined mainly by the remaining power of the turbine during acceleration, $\Delta N_{ac}$, and that the remaining power of the turbine is in turn determined mainly by the high-pressure rotor speed $n_H$ and the total temperature before the high-pressure turbine $T_{t4}$. To shorten the acceleration time, the remaining power of the turbine must be increased, that is, the high-pressure rotor speed of the variable cycle engine and the temperature after the combustion chamber must both be raised. Therefore, the invention selects the high-pressure rotor speed $n_H$ and the total temperature before the high-pressure turbine $T_{t4}$ as the quantities in the objective function of the acceleration process optimizing control. The mathematical expression of the objective function is as follows:

In the above, $n_{Hd}$ is the target speed of the high-pressure rotor and $n_H$ is its actual speed; $T_{t4d}$ is the target total temperature before the high-pressure turbine and $T_{t4}$ is the actual total temperature before the high-pressure turbine.

To ensure stable operation of the variable cycle engine during acceleration, the constraints considered by the invention are: the temperature before the turbine does not exceed its limit, the high-pressure compressor does not surge, the high-pressure rotor does not overspeed, the fan does not overspeed, the combustion chamber does not suffer rich-fuel flameout, the fuel supply of the main combustion chamber does not exceed its maximum, and the like.

Taking into account the objective function, the constraints, and the control variables, a suitable set of $W_f$, $A_9$, dvgl, and dvgh must be found that minimizes the acceleration time of the variable cycle engine; that is, the following nonlinear constrained problem must be solved:

where the control variables all take initial values within their corresponding ranges of variation.

The invention adopts a multi-objective optimal control method and uses a linear weighting method to convert the multi-objective function into a single-objective function, thereby determining the optimizing objective function. That is:

The above formula is then discretized and normalized. The purpose of this step is to eliminate the effect that differences in dimension and magnitude range among the parameters of the objective function would otherwise have on the optimization result. The final optimizing objective function can be written in the form:

where $\omega = [\omega_1, \omega_2, \dots, \omega_{11}]$ is the weight-adjustment coefficient matrix of the constraint function, in which $\omega_1, \omega_2, \dots, \omega_{11}$ are the adjustable weight coefficients of the corresponding constraints; the term $\omega \cdot g(x)$ is designed to enforce the constraints of the variable cycle engine.
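A weighted objective of this kind can be sketched as below. The concrete form is an assumption, since the formula itself is not available here: each tracking term is taken as a sum over discrete time steps of the squared error normalized by its target, $g(x)$ is taken as the normalized amount by which a quantity exceeds an upper limit, and the names `tracking_cost`, `constraint_violations`, `objective`, `w_a`, and `w_b` are hypothetical. Only two upper-limit constraints are shown instead of eleven; lower-bound constraints such as surge margins would need the sign reversed.

```python
import numpy as np


def tracking_cost(target, actual):
    """Sum over time steps of the squared tracking error, normalized by the target."""
    actual = np.asarray(actual, dtype=float)
    return float(np.sum(((target - actual) / target) ** 2))


def constraint_violations(values, limits):
    """Assumed g_i(x): normalized excess over each upper limit, 0 when satisfied."""
    values = np.asarray(values, dtype=float)
    limits = np.asarray(limits, dtype=float)
    return np.maximum(0.0, (values - limits) / limits)


def objective(n_H, T_t4, n_Hd, T_t4d, w_a, w_b, omega, g):
    """Weighted sum of the two tracking objectives plus the penalty term omega . g(x)."""
    return (w_a * tracking_cost(n_Hd, n_H)
            + w_b * tracking_cost(T_t4d, T_t4)
            + float(np.dot(omega, g)))


# Illustrative numbers only, not engine data.
n_H_traj = [0.8, 0.9, 1.0]            # normalized high-pressure rotor speed
T_t4_traj = [1600.0, 1800.0, 2000.0]  # total temperature before the high-pressure turbine

# Two example constraints: a temperature limit (violated by 5%) and a speed limit (satisfied).
g = constraint_violations(values=[2100.0, 1.0], limits=[2000.0, 1.05])

J = objective(n_H_traj, T_t4_traj, n_Hd=1.0, T_t4d=2000.0,
              w_a=0.5, w_b=0.5, omega=np.array([10.0, 10.0]), g=g)
```

Normalizing each term by its target or limit makes the terms dimensionless, so the weights compare like with like regardless of whether the underlying quantity is a speed or a temperature.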

The Deep Exploration and Optimization Reinforcement Learning (DEORL) algorithm combines the Actor-Critic model with the ideas of the DQN deep reinforcement learning algorithm and uses experience playback and a target network to improve algorithm performance. Experience playback effectively improves data utilization and weakens the correlation among sample data, thereby avoiding unstable training and difficult convergence of the network. The exploratory expansion technique balances deep exploration and exploitation of the environment by adding noise. The Actor-Critic model combines the advantages of the policy gradient and value function methods: it can update network parameters in a single step, which improves algorithm efficiency, and it avoids the problem of the policy gradient algorithm converging to a local optimal solution. The algorithm is implemented on a computer, using a programming language, data structures, and other computer science techniques to execute a series of instructions and operations, and it solves the minimum fuel consumption optimizing control problem of the variable cycle engine. It completes its tasks efficiently and accurately and is well suited to large-scale data processing and complex computation. The correctness, reliability, and efficiency of the algorithm are verified and tested in a computer environment. Finally, DEORL is applied to the optimizing control of the acceleration process of the variable cycle engine, which realizes optimal control of that process, improves the acceleration performance of the variable cycle engine, and improves the maneuverability of the aircraft.

Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that those skilled in the art may make changes, modifications, substitutions, and variations to the above embodiments without departing from the spirit and principles of the invention.

Claims

1. A computer-based optimal control method for the acceleration process of a variable cycle engine, characterized by comprising the following steps:

1) Establishing a nonlinear mathematical model of the variable cycle engine; the nonlinear mathematical model of the variable cycle engine is as follows:

$S_t = f(a_t)$

where $a_t$ is the control input vector and also the output of the policy network, comprising the mode select valve (MSV) opening msv, the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl, and the compressor guide vane angle dvgh; $S_t$ is the output vector, representing the state of the variable cycle engine at the current time and comprising the fuel consumption rate sfc and the variable cycle engine thrust $F$; and $f(\cdot)$ is the nonlinear vector function that generates the system output;

2) Determining the corresponding objective function and constraint function according to the acceleration process of the variable cycle engine; the constraints considered for the acceleration process are: the temperature before the turbine does not exceed its limit, the high-pressure compressor does not surge, the high-pressure rotor does not overspeed, the fan does not overspeed, the combustion chamber does not suffer rich-fuel flameout, and the fuel supply of the main combustion chamber does not exceed its maximum; the mathematical description of the optimization problem is as follows:

where the control variable $a_t = [\mathrm{msv}, W_f, A_9, \mathrm{dvgl}, \mathrm{dvgh}]^T$, with every component taking its initial value within its corresponding range of variation; $J_1$ is the first optimization objective and $J_2$ is the second optimization objective; $T_{t4}$ denotes the total temperature after the combustion chamber of the variable cycle engine; $n_F$ denotes the fan speed; $n_H$ denotes the high-pressure rotor speed; $n_{Hd}$ denotes the desired high-pressure rotor speed; SMF denotes the fan surge margin; SMC denotes the compressor surge margin; $f$ denotes the fuel-air ratio; $p_{t3}$ denotes the total pressure after the compressor; and $a$ and $b$ denote the lower and upper limits of the control variable $a_t$, respectively;

3) Optimizing calculation by DEORL;

4) And outputting the optimal control variable to the variable cycle engine.

2. The computer-based optimal control method for the acceleration process of a variable cycle engine as claimed in claim 1, wherein in step 2), in determining the corresponding objective function and constraint function, a linear weighting method is used to convert the multi-objective function into a single-objective function so as to determine the optimizing objective function; namely:

discretizing and normalizing the above to eliminate the influence of the difference of the dimension and magnitude change range of each parameter in the objective function on the optimization result, and writing the final optimizing objective function into the following form:

in the above, $\omega_a$ and $\omega_b$ are the weight coefficients of the corresponding objective functions, satisfying $\omega_a \ge 0$ and $\omega_b \ge 0$; their magnitudes reflect the importance of the corresponding optimizing objective function in the multi-objective optimization problem;

the above $g_i(x)$ ($i = 1, 2, \dots, 11$) constitute the constraint function matrix $g(x)$; taking the constraints into account, the objective function is:

3. The optimal control method for the acceleration process of the variable cycle engine based on the computer as set forth in claim 1, wherein in the step 3), the optimization calculation is performed by using DEORL, and the algorithm flow is as follows:

(3) Initializing an experience playback pool R;

calculating the gradient of the policy network:

updating the target policy network $\mu'$ and the target value network $Q'$:

$\theta^{\mu'} = \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$

$\theta^{Q'} = \tau\theta^{Q} + (1-\tau)\theta^{Q'}$

4. The computer-based optimal control method for the acceleration process of a variable cycle engine as set forth in claim 1, wherein in step 4), the control variables are the mode select valve (MSV) opening msv, the main fuel flow $W_f$, the tail nozzle area $A_9$, the fan guide vane angle dvgl, and the compressor guide vane angle dvgh.