CN116123124A

CN116123124A - Deep reinforcement learning-based active surge control method and system for gas compressor

Info

Publication number: CN116123124A
Application number: CN202310113139.6A
Authority: CN
Inventors: 张兴龙; 张天宏; 黄向华; 盛汉霖; 庞淑伟
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2023-02-14
Filing date: 2023-02-14
Publication date: 2023-05-16

Abstract

The invention discloses a method and a system for controlling active surge of a gas compressor based on deep reinforcement learning, which relate to the field of control of active stability of an aeroengine and comprise the following steps: (1) establishing a mathematical model of the compressor with an executing mechanism; (2) Establishing a deep reinforcement learning agent simulation training environment facing an active surge control task of the compressor; (3) training the agent using a soft actor-critique algorithm; (4) After training, the weight parameters of the action network are fixed, and the weight parameters are deployed to the electronic controller for online application. The invention takes deep reinforcement learning as a new thought for solving complex nonlinearity in the design of the active surge control system of the compressor, improves the self-adaptability and the robustness of the controller and greatly reduces the design difficulty of the controller.

Description

Deep reinforcement learning-based active surge control method and system for gas compressor

Technical Field

The invention relates to the technical field of aero-engine active stability control, in particular to a method and a system for controlling active surge of a gas compressor based on deep reinforcement learning.

Background

Surge, a typical flow regime for compressor instability, has a great impact on the performance and safety of aircraft engines and, in severe cases, will also cause engine failure, leading to catastrophic consequences. In order to avoid rotating stall and surge of the compressor beyond an unstable boundary, early people often take passive anti-surge measures, namely, enough surge margin is reserved when the compressor is designed. The control method is an open-loop control thought, can reduce the possibility of an unstable state to a certain extent, but also greatly limits the range of the working flow and the pressure ratio of the compressor, and sacrifices the performance and the operation efficiency. With the intensive research of unstable characteristics of the compressor, the idea of active surge control has been developed, namely, the formation and development of pressure or flow disturbance in a flow field at the early stage of surge of the compressor are inhibited by the feedback control of an oscillating blade row, high-pressure jet, a loudspeaker, a piston damping mechanism, a controllable regulating valve, a close-coupled valve, a throttle valve and other actuating mechanisms, so that the compressor stably operates in a high-pressure ratio and high-efficiency area in a surge boundary.

The patent with publication number CN113279997A proposes an aeroengine surge active control system based on controller fuzzy switching, a mode control method of Lyapunov stability theory is used for designing a plurality of basic controllers suitable for different working ranges, and control signals of the basic controllers are weighted and fused according to the fuzzy switching principle so as to determine a final control quantity; the patent of publication No. CN109339954A proposes an active control method based on aerodynamic instability of an aero-engine compressor component, and a surge active controller is designed by using the estimated feedback of a compressor pressure coefficient and a flow coefficient and combining bifurcation theory. The method designs the controller from the aspect of the characteristic analysis of the compressor model, so that the whole system meets the Lyapunov stability. Although these methods are capable of controlling the compressor to operate in a stable state to some extent, they require accurate model parameters and set constraints, in other words, do not take into account external disturbances and model uncertainties, resulting in that they are not robust. In addition, the nonlinear control algorithms involve multiple matrix derivation and inversion calculations, have high computational complexity and high requirements on the on-line computing capacity of the surge control system, and therefore have limitations when deployed to engineering applications.

Disclosure of Invention

In order to solve the problems, the invention provides a method and a system for controlling active surging of a gas compressor based on deep reinforcement learning, so as to realize robust self-adaptive optimal control under the conditions of uncertainty of a gas compressor model and external disturbance.

The technical scheme provided by the invention is as follows:

a method and a system for controlling active surging of a gas compressor based on deep reinforcement learning comprise the following steps:

(1) Establishing a mathematical model of the air compressor with an executing mechanism;

(2) Establishing a deep reinforcement learning agent simulation training environment facing an active surge control task of the compressor;

(3) Training the agent using a soft actor-critique algorithm;

(4) And deploying the action network of the trained intelligent agent to the electronic controller for online application.

Establishing a mathematical model of the compressor with the executing mechanism, and identifying a flow coefficient-pressure ratio coefficient characteristic item of the compressor by using real physical characteristic data of the compressor; and establishing a final compressor surge dynamic model according to the identified compressor characteristic items.

Further, the real compressor physical characteristic data describe the compressor pressure ratio pi when the flow is m under different relative percentage rotating speeds n of the compressor;

further, the identification method of the characteristic items of the air compressor is to carry out dimensionless on the pressure ratio and the flow in the physical characteristic data of the real air compressor to obtain a pressure coefficient and a flow coefficient:

wherein phi is the flowA coefficient of quantity; psi is the pressure coefficient; ρ is the gas density inside the compressor; a is that _c Is the equivalent sectional area of the internal flow passage of the compressor; u is the linear speed of the rim at the middle diameter of the compressor rotor; p is p ₀ Is the ambient pressure;

data fitting was then performed using a least squares method using a cubic surface equation:

wherein, psi (phi, n) is the characteristic item of the compressor; a, a ₀ 、a ₁ 、b ₀ 、b ₁ 、c ₀ 、c ₁ Fitting coefficients;

further, the actuating mechanism is a close-coupled valve, and the final mathematical model of the compressor is as follows:

wherein u is the control quantity input to the close-coupled valve actuating mechanism and is the model input; phi is the flow coefficient output by the model; psi is a pressure coefficient output by the model; b is a characteristic B parameter; l (L) _c Is the equivalent length of the compressor; gamma ray _T Is the throttle opening; d, d _φ And d _ψ The flow coefficient and the pressure coefficient are respectively disturbed and uncertain.

The step (2) of establishing a deep reinforcement learning agent simulation training environment facing the active surge control task of the compressor comprises the following steps:

step (2.1), giving a reference instruction p for the pressure ratio coefficient of the compressor _ref As a target of the active surge control of the compressor, the pressure ratio coefficient reference instruction is used for ensuring that the pressure ratio of the compressor smoothly transits to a low flow area outside the surge boundary, and is given by the following formula:

where τ is the inertia coefficient; c is the pressure coefficient reference command end value.

Step (2.2), selecting the observation variable O as k historical control periods and the pressure ratio coefficient reference instruction p in the current control period _ref Flow coefficient phi of the compressor model, pressure coefficient psi of the compressor model, control error e=p of the pressure coefficient _ref - ψ and its integral eint and derivative edot, then the variable O is observed at time t _t Represented as a multi-element time series matrix of dimensions (k+1, 6).

Step (2.3), designing an action network and an evaluation network of the intelligent agent;

action network pi of the agent _θ The system comprises an input layer, a full connection layer, an LSTM layer and a relu activation function layer, and is respectively output through two branch networks; the branch network 1 sequentially comprises a full connection layer, a relu activation function, a full connection layer and an output layer 1; the branch network 2 sequentially comprises a full connection layer, a relu activation function, a full connection layer, a softplus activation function layer and an output layer 2; the input parameter of the action network is observed quantity O, and the output layer 1 is the average value of the control quantity

Outputting standard deviation sigma of 2-bit control quantity of the layer;

the intelligent agent comprises two evaluation networks with identical structures, which are respectively

And evaluation network->

Each evaluation network is combined by two branch networks, and the branch network 1 sequentially comprises an input layer 1, a full connection layer, a relu activation function layer and a full connection layer; the branch network 2 sequentially comprises an input layer 2 and a full-connection layer; the outputs of the branch network 1 and the branch network 2 are spliced into high-dimensional vectors by a concat layer and then sequentially pass through LSTM circulationA loop neural network layer, a relu activation function, a fully connected neural network layer and an output layer; the input of the branch network 1 of the evaluation network is the observed quantity O, and the input of the branch network 2 is the control quantity u at the previous moment _t-1 The method comprises the steps of carrying out a first treatment on the surface of the The output of the evaluation network is the current observed quantity and the control quantity to obtain the hope of rewards.

Step (2.4), designing a reward function r according to observed quantity, wherein the reward function r is specifically as follows:

r＝r ₁ +r ₂ +r ₃

wherein:

based on the simulation training environment of the deep reinforcement learning agent constructed in the step 2), training the agent by using a soft actor-critique algorithm, wherein the method specifically comprises the following steps:

step (3.1), establishing a target evaluation network

And a target evaluation network->

Its structure is respectively associated with an evaluation network>

And->

The same;

step (3.2), initializing evaluation network by random parameters respectively

Evaluation network->

And action network pi _θ Weight parameter w of (2) ₁ ，w ₂ And θ; reuse->

Weight initialization target evaluation network of +.>

Use->

Weight initialization target evaluation network of +.>

Initializing an experience playback pool R, setting a training round number E, a simulation time T, a simulation sampling step length delta T, a training cycle number lambda, a defined discount factor gamma and an exponential average movement coefficient tau;

step (3.4), the round circulation is started;

step (3.5), simulation starts;

step (3.6), at the time t of the simulation, O is added _t+1 Inputting the action network to obtain the control quantity u _t The method comprises the steps of carrying out a first treatment on the surface of the Execution u _t Calculating a prize r _t At this time, the environmental state becomes O _t+1 ；

Step (3.7), step (O) _t+1 ，u _t ，r _t ，O _t+1 ) Storing the sample as a sample in an experience playback pool R;

step (3.8), training cycle is started;

step (3.9), sampling N samples from R, and updating weight parameters of all networks:

/>

step (3.10), executing step (3.9) until the training cycle is finished;

step (3.11), the simulation of the execution steps (3.5) - (3.10) is finished;

step (3.12), executing the steps (3.4) - (3.11) until the round circulation is finished;

and (4) after training, fixing the weight parameters of the action network and deploying the weight parameters to the electronic controller. In each control period, the electronic controller receives the observation variable from the air compressor in real time, inputs the observation variable into the trained action network, and finally outputs a control signal to the air compressor close-connected valve to ensure the stable operation of the air compressor;

further, the control amount standard deviation sigma=0 of the output of the action network is set in the electronic controller, so that the control amount average value of the output of the action network is set

Directly as the final control signal u.

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

the method and the system for controlling the active surge of the air compressor based on the deep reinforcement learning can ensure that the pressure ratio coefficient instruction is accurately tracked when the air compressor has complex nonlinearity and under external disturbance, effectively enlarge the stable working range of the air compressor and ensure the high-efficiency and reliable work of the aeroengine; meanwhile, the invention solves the problem of complex manual design of the active surge controller of the compressor by means of strong self-learning capability of deep reinforcement learning.

Drawings

FIG. 1 is a schematic diagram of the present invention;

FIG. 2 (a) is a graph showing actual compressor physical characteristic data in an embodiment of the present invention;

FIG. 2 (b) is non-dimensionalized compressor characterization data in accordance with an embodiment of the present invention;

FIG. 2 (c) is a cubic curve fitted by dimensionless compressor characterization data in an embodiment of the present invention;

FIG. 3 (a) is a diagram illustrating the operation network of the deep reinforcement learning agent according to the present invention;

FIG. 3 (b) is a diagram of an evaluation network of a deep reinforcement learning agent according to the present invention;

FIG. 4 is a graph showing the change of the reward function of the training process of the intelligent agent according to the embodiment of the present invention;

fig. 5 is a graph comparing control effects of three compressor active surge control methods.

Detailed Description

The technical scheme of the invention is further explained below by a specific embodiment with reference to the attached drawings.

Referring to fig. 1, a method and a system for controlling active surge of a compressor based on deep reinforcement learning comprise the following steps:

identifying a flow coefficient-pressure ratio coefficient characteristic item of the compressor by using the physical characteristic data of the compressor as shown in fig. 2 (a), wherein the physical characteristic data of the compressor describes the distribution condition of the pressure ratio pi of the compressor when the flow rate is m at the relative percentage rotating speed n of the compressor;

the pressure ratio and the flow rate in the physical characteristic data of the real air compressor are subjected to the following dimensionless treatment to obtain a pressure coefficient and a flow coefficient which are shown in the figure 2 (b);

wherein phi is the flow coefficient; psi is the pressure coefficient; ρ is the density of the gas in the compressor, 1.225kg/m is taken ³ ；A _c Is taken as 0.0291m for the equivalent sectional area of the internal flow passage of the air compressor ² The method comprises the steps of carrying out a first treatment on the surface of the U is the linear speed of the rim at the middle diameter of the compressor rotor, and 927.63m/s is taken; p is p ₀ Taking 100kPa for the environmental pressure;

and finally, adopting the following cubic surface equation to perform data fitting by using a least square method.

Wherein psi (phi, n) is the characteristic item of the compressor after fitting; a, a ₀ 、a ₁ 、b ₀ 、b ₁ 、c ₀ 、c ₁ Fitting coefficients;

fig. 2 (c) shows a cubic surface fitted in this embodiment:

the mathematical model of the compressor finally comprising the close-coupled valve actuating mechanism is as follows:

wherein u is the control quantity input to the close-coupled valve actuating mechanism and is the model input; phi is the flow coefficient output by the model; psi is a pressure coefficient output by the model; b is a dimensionless B parameter, and 1.8 is taken; l (L) _c Is compressed airTaking the equivalent length of the machine; gamma ray _T Taking 0.6 as the opening of the throttle valve; d, d _φ And d _ψ The flow coefficient and pressure coefficient disturbance and uncertainty act respectively, in this embodiment given by:

d _ψ ＝0.02sin(0.1t)+0.02cos(0.4t)

d _φ ＝0.02sin(0.1t)+0.02cos(0.4t)

wherein τ is an inertia coefficient, 0.05; c is the pressure coefficient reference command final value, and 0.6 is taken; when solving the equation, p _ref The initial value is set to 0.6568;

step (2.2), selecting the observation variable O as 3 historical control periods and the pressure ratio coefficient reference instruction p in the current control period _ref Flow coefficient phi of the compressor model, pressure coefficient psi of the compressor model, control error e=p of the pressure coefficient _ref - ψ and integral eint and derivative edot thereof; observing variable O at time t _t A multi-element time series matrix expressed as dimensions (4, 6):

action network pi of the agent _θ As shown in fig. 3 (a), the system comprises an input layer, a full connection layer, an LSTM layer and a relu activation function layer, and then outputs through two branch networks respectively; branching netThe network 1 sequentially comprises a full connection layer, a relu activation function, a full connection layer and an output layer 1; the branch network 2 sequentially comprises a full connection layer, a relu activation function, a full connection layer, a softplus activation function layer and an output layer 2; the number of neurons of the full-connection layer and the LSTM layer of the action network is 128; the input parameter of the action network is observed quantity O, and the output layer 1 is the average value of the control quantity

The standard deviation sigma of the layer 2 bit control quantity is output.

The agent comprises two evaluation networks with identical structures, as shown in FIG. 3 (b), respectively

And evaluation network->

Each evaluation network is combined by two branch networks, and the branch network 1 sequentially comprises an input layer 1, a full connection layer, a relu activation function layer and a full connection layer; the branch network 2 sequentially comprises an input layer 2 and a full-connection layer; after the outputs of the branch network 1 and the branch network 2 are spliced into high-dimensional vectors by the concat layer, the high-dimensional vectors sequentially pass through the LSTM circulating neural network layer, the relu activation function, the fully-connected neural network layer and the output layer; the number of neurons of the full-connection layer and the LSTM layer of the evaluation network is 128; the input of the branch network 1 of the evaluation network is the observed quantity O, and the input of the branch network 2 is the control quantity u at the previous moment _t-1 The method comprises the steps of carrying out a first treatment on the surface of the The output of the evaluation network is the current observed quantity and the control quantity to obtain the hope of rewards.

r＝r ₁ +r ₂ +r ₃

wherein:

(3) Training the agent using a soft actor-critique algorithm;

step (3.1), establishing a target evaluation network

And a target evaluation network->

Its structure is respectively associated with an evaluation network>

And->

The same;

step (3.2), initializing evaluation network by random parameters respectively

Evaluation network->

Weight initialization target evaluation network of +.>

Use->

Weight initialization target evaluation network of +.>

Step (3.3), initializing an experience playback pool R, setting a training round number e=2000, a simulation time t=300 s, a simulation sampling step size Δt=0.02 s, a training round number λ=0.99, a defined discount factor γ=0.99, and an exponential average movement coefficient τ=0.5:

step (3.4), the round circulation is started;

step (3.5), simulation starts;

step (3.8), training cycle is started;

step (3.9), sampling N samples from R, calculating a loss function and updating weight parameters:

step (3.10), executing step (3.9) until the training cycle is finished;

step (3.11), the simulation of the execution steps (3.5) - (3.10) is finished;

fig. 4 shows the change in the bonus function during training of the agent using the soft actor-critique algorithm, which shows that the bonus value quickly converges to a higher level over 2000 rounds of training.

(4) After training, the weight parameters of the action network are fixed and deployed to an electronic controller, the on-line application mode is that in each control period, the electronic controller receives the observation variable from the air compressor in real time, inputs the observation variable to the trained action network, and sets the standard deviation sigma=0 of the control quantity output by the action network, so that the average value of the control quantity output by the action network

Directly output to the compressor close-coupled valve as a final control signal.

In order to verify the effectiveness of the method and system for controlling the active surge of the compressor, the method and the sliding mode control method and the fuzzy backstepping control method are compared based on the mathematical model of the compressor in the embodiment, and the result is shown in fig. 5. It can be seen that when external disturbance and model uncertainty exist, the sliding mode control method and the fuzzy backstepping control method enable the surge to occur in the process of tracking the pressure coefficient of the air compressor to a given reference instruction, so that the pressure coefficient cannot track the given instruction and has obvious fluctuation, and large dynamic and steady-state errors exist in the whole process, so that the air compressor is in an unstable working state; in contrast, the method for controlling the active surge of the gas compressor based on the deep reinforcement learning can ensure that the gas compressor tracks the given pressure coefficient reference instruction in the whole process, and ensures the stable operation of the gas compressor.

The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art, who is within the scope of the present invention, should make equivalent substitutions or modifications according to the technical scheme of the present invention and the inventive concept thereof, and should be covered by the scope of the present invention.

Claims

1. The method and the system for controlling the active surging of the gas compressor based on the deep reinforcement learning are characterized by comprising the following steps:

(3) Training the agent using a soft actor-critique algorithm;

2. The method and system for controlling the active surging of the compressor based on the deep reinforcement learning as claimed in claim 1, wherein the step (1) of establishing the mathematical model of the compressor with the executing mechanism comprises the following steps: identifying a flow coefficient-pressure ratio coefficient characteristic item of the compressor by using the real physical characteristic data of the compressor; and then establishing a final mathematical model of the compressor according to the identified characteristic items of the compressor.

3. The method and system for controlling the active surge of the compressor based on the deep reinforcement learning as defined in claim 1, wherein the establishing the deep reinforcement learning agent simulation training environment facing the active surge control task of the compressor in the step (2) comprises:

where τ is the inertia coefficient; c is the pressure coefficient reference command end value;

step (2.2), selecting the observation variable O as k historical control periods and the pressure ratio coefficient reference instruction p in the current control period _ref Flow coefficient phi of the compressor model, pressure coefficient psi of the compressor model, control error e=p of the pressure coefficient _ref - ψ and integral eint and derivative edot thereof, observing variable O at time t _t A multi-element time series matrix expressed as dimension (k+1, 6):

Outputting standard deviation sigma of 2-bit control quantity of the layer;

And evaluation network->

Each evaluation network is combined by two branch networks, and the branch network 1 sequentially comprises an input layer 1, a full connection layer, a relu activation function layer and a full connection layer; the branch network 2 sequentially comprises an input layer 2 and a full-connection layer; the outputs of the branch network 1 and the branch network 2 are composed of concaAfter the t layers are spliced into high-dimensional vectors, the high-dimensional vectors sequentially pass through an LSTM circulating neural network layer, a relu activation function, a fully-connected neural network layer and an output layer; the input of the branch network 1 of the evaluation network is the observed quantity O, and the input of the branch network 2 is the control quantity u at the previous moment _t-1 The method comprises the steps of carrying out a first treatment on the surface of the The output of the evaluation network is the current observed quantity and the control quantity which can obtain the expectations of rewards;

r＝r ₁ +r ₂ +r ₃

wherein:

4. the method and system for controlling the active surging of the air compressor based on deep reinforcement learning as claimed in claim 1, wherein after the training in the step (4) is completed, the weight parameters of the action network are fixed and deployed to the electronic controller, and in each control period, the electronic controller receives the observation variables from the air compressor in real time and inputs the observation variables to the trained action network; given a standard deviation sigma=0 of the control quantity output by the action network, the average value of the control quantity output by the action network is calculated

Directly as the final control signal u; and outputting a final control signal to the compressor executing mechanism to control the compressor to stably track the pressure coefficient reference instruction.

5. The method for controlling the active surging of the gas compressor based on the deep reinforcement learning as claimed in claim 2, wherein the method for identifying the characteristic items of the gas compressor is characterized in that the pressure ratio pi and the flow m in the physical characteristic data of the real gas compressor are dimensionless to obtain a pressure coefficient and a flow coefficient, and then a cubic surface equation is adopted to perform data fitting by using a least square method:

wherein m is the physical flow of the compressor; pi is the physical pressure ratio of the compressor; phi is the flow coefficient; psi is the pressure coefficient; ρ is the gas density inside the compressor; a is that _c Is the equivalent sectional area of the internal flow passage of the compressor; u is the linear speed of the rim at the middle diameter of the compressor rotor; p is p ₀ Is the ambient pressure; ψ (phi, n) is the compressor characteristic term; a, a ₀ 、a ₁ 、b ₀ 、b ₁ 、c ₀ 、c ₁ Is the fitting coefficient.

6. The method and system for controlling the active surging of the compressor based on deep reinforcement learning as claimed in claim 2, wherein the actuator is a close-coupled valve, and the mathematical model of the final compressor comprising the actuator is as follows:

the control quantity of the close-connected valve actuating mechanism is input as a model; phi is the flow coefficient output by the model; psi is a pressure coefficient output by the model; b is a characteristic B parameter; l (L) _c Is the equivalent length of the compressor; gamma ray _T Is the throttle opening; d, d _φ And d _ψ The flow coefficient and the pressure coefficient are respectively disturbed and uncertain.