CN116123124A - Deep reinforcement learning-based active surge control method and system for gas compressor - Google Patents
- Publication number: CN116123124A (application CN202310113139.6A)
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F04—POSITIVE-DISPLACEMENT MACHINES FOR LIQUIDS; PUMPS FOR LIQUIDS OR ELASTIC FLUIDS
- F04D—NON-POSITIVE-DISPLACEMENT PUMPS
- F04D27/00—Control, e.g. regulation, of pumps, pumping installations or pumping systems specially adapted for elastic fluids
- F04D27/02—Surge control
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
- F05—INDEXING SCHEMES RELATING TO ENGINES OR PUMPS IN VARIOUS SUBCLASSES OF CLASSES F01-F04
- F05D—INDEXING SCHEME FOR ASPECTS RELATING TO NON-POSITIVE-DISPLACEMENT MACHINES OR ENGINES, GAS-TURBINES OR JET-PROPULSION PLANTS
- F05D2270/00—Control
- F05D2270/70—Type of control algorithm
- F05D2270/709—Type of control algorithm with neural networks
Abstract
The invention discloses a deep reinforcement learning-based active surge control method and system for a compressor, relating to the field of aero-engine active stability control, comprising the following steps: (1) establishing a mathematical model of the compressor with its actuating mechanism; (2) establishing a deep reinforcement learning agent simulation training environment for the compressor active surge control task; (3) training the agent using the soft actor-critic algorithm; (4) after training, fixing the weight parameters of the action network and deploying them to the electronic controller for online application. The invention uses deep reinforcement learning as a new approach to the complex nonlinearity in compressor active surge control system design, improving the adaptability and robustness of the controller and greatly reducing the difficulty of controller design.
Description
Technical Field
The invention relates to the technical field of aero-engine active stability control, and in particular to a deep reinforcement learning-based active surge control method and system for a compressor.
Background
Surge, a typical flow instability of the compressor, strongly affects the performance and safety of an aircraft engine and, in severe cases, can cause engine failure with catastrophic consequences. To keep the compressor from crossing the instability boundary into rotating stall and surge, early designs relied on passive anti-surge measures, i.e., reserving a sufficient surge margin when the compressor is designed. This open-loop approach reduces the likelihood of instability to some extent, but it also greatly limits the operating flow and pressure-ratio range of the compressor and sacrifices performance and efficiency. As research into compressor instability deepened, the idea of active surge control emerged: suppressing the formation and growth of pressure or flow disturbances in the flow field at the onset of surge through feedback control of actuators such as oscillating blade rows, high-pressure jets, loudspeakers, piston damping mechanisms, controllable regulating valves, close-coupled valves, and throttle valves, so that the compressor operates stably in the high-pressure-ratio, high-efficiency region near the surge boundary.
The patent published as CN113279997A proposes an aero-engine active surge control system based on fuzzy controller switching: several basic controllers suited to different operating ranges are designed by a mode control method based on Lyapunov stability theory, and their control signals are weighted and fused according to a fuzzy switching principle to determine the final control quantity. The patent published as CN109339954A proposes an active control method for aerodynamic instability of an aero-engine compressor component, in which an active surge controller is designed using estimated feedback of the compressor pressure coefficient and flow coefficient combined with bifurcation theory. These methods design the controller from an analysis of the compressor model characteristics so that the whole system satisfies Lyapunov stability. Although they can keep the compressor operating in a stable state to some extent, they require accurate model parameters and preset constraints; in other words, they do not account for external disturbances and model uncertainty, and are therefore not robust. In addition, such nonlinear control algorithms involve multiple matrix differentiation and inversion operations, with high computational complexity and demanding requirements on the online computing capacity of the surge control system, which limits their deployment in engineering applications.
Disclosure of Invention
In order to solve these problems, the invention provides a deep reinforcement learning-based active surge control method and system for a compressor, to achieve robust, adaptive, optimal control under compressor model uncertainty and external disturbance.
The technical scheme provided by the invention is as follows:
A deep reinforcement learning-based active surge control method and system for a compressor, comprising the following steps:
(1) Establishing a mathematical model of the compressor with its actuating mechanism;
(2) Establishing a deep reinforcement learning agent simulation training environment for the compressor active surge control task;
(3) Training the agent using the soft actor-critic algorithm;
(4) Deploying the action network of the trained agent to the electronic controller for online application.
Further, establishing the mathematical model of the compressor with its actuating mechanism comprises: identifying the flow coefficient-pressure ratio coefficient characteristic term of the compressor using real compressor physical characteristic data; and establishing the final compressor surge dynamic model from the identified characteristic term.
Further, the real compressor physical characteristic data describe the compressor pressure ratio π as a function of flow m at different relative percentage speeds n of the compressor;
Further, the compressor characteristic term is identified by non-dimensionalizing the pressure ratio and flow in the real compressor physical characteristic data to obtain the pressure coefficient and flow coefficient, for example by

φ = m / (ρ·A_c·U),  ψ = (π − 1)·p_0 / (ρ·U²)

wherein φ is the flow coefficient; ψ is the pressure coefficient; ρ is the gas density inside the compressor; A_c is the equivalent cross-sectional area of the internal flow passage of the compressor; U is the rim linear speed at the mean diameter of the compressor rotor; p_0 is the ambient pressure;
Data fitting is then performed by the least squares method using a cubic surface equation of the form

ψ(φ, n) = (a_0 + a_1·n) + (b_0 + b_1·n)·φ + (c_0 + c_1·n)·φ³

wherein ψ(φ, n) is the compressor characteristic term; a_0, a_1, b_0, b_1, c_0, c_1 are fitting coefficients;
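The least-squares identification of the characteristic term can be sketched as follows. The exact fitting basis is not reproduced in the text, so this sketch assumes the six-coefficient cubic-surface form ψ(φ, n) = (a0 + a1·n) + (b0 + b1·n)·φ + (c0 + c1·n)·φ³; the function name `fit_characteristic` is illustrative:

```python
import numpy as np

def fit_characteristic(phi, n, psi):
    """Least-squares fit of the assumed cubic surface
    psi(phi, n) = (a0 + a1*n) + (b0 + b1*n)*phi + (c0 + c1*n)*phi**3.
    phi, n, psi are 1-D arrays of dimensionless flow coefficient,
    relative speed, and pressure coefficient samples."""
    # Design matrix: one column per fitting coefficient a0, a1, b0, b1, c0, c1
    A = np.column_stack([np.ones_like(phi), n, phi, n * phi, phi**3, n * phi**3])
    coeffs, *_ = np.linalg.lstsq(A, psi, rcond=None)
    return coeffs  # [a0, a1, b0, b1, c0, c1]
```

With real characteristic data the fitted surface plays the role of ψ(φ, n) in the surge model.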
Further, the actuating mechanism is a close-coupled valve, and the final mathematical model of the compressor takes a Greitzer-type form consistent with the listed symbols, for example

dφ/dt = (1/L_c)·[ψ_c(φ, n) − ψ − u] + d_φ
dψ/dt = (1/(4B²·L_c))·(φ − γ_T·√ψ) + d_ψ

wherein ψ_c(φ, n) is the fitted characteristic term; u is the control quantity input to the close-coupled valve actuator and is the model input; φ is the flow coefficient output by the model; ψ is the pressure coefficient output by the model; B is the characteristic B parameter; L_c is the equivalent length of the compressor; γ_T is the throttle opening; d_φ and d_ψ are the disturbance and uncertainty terms acting on the flow coefficient and pressure coefficient, respectively.
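A simulation step of such a model can be sketched as below. The patent's own equations are not reproduced in the text, so this assumes the Greitzer-type form above; the value `l_c=3.0` and the function name `compressor_step` are illustrative, while `B=1.8`, `gamma_T=0.6`, and the sinusoidal disturbances come from the embodiment:

```python
import numpy as np

def compressor_step(phi, psi, u, t, dt, psi_c, B=1.8, l_c=3.0, gamma_T=0.6):
    """One explicit-Euler step of an assumed Greitzer-type surge model
    with a close-coupled valve (CCV). psi_c is the fitted characteristic
    term as a callable of phi."""
    d = 0.02 * np.sin(0.1 * t) + 0.02 * np.cos(0.4 * t)  # disturbance from the embodiment
    dphi = (psi_c(phi) - psi - u) / l_c + d              # flow dynamics incl. CCV pressure drop u
    dpsi = (phi - gamma_T * np.sqrt(max(psi, 0.0))) / (4 * B**2 * l_c) + d  # plenum/throttle dynamics
    return phi + dt * dphi, psi + dt * dpsi
```

Iterating this step with the agent's control quantity u closes the simulation loop used for training.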
The step (2) of establishing a deep reinforcement learning agent simulation training environment facing the active surge control task of the compressor comprises the following steps:
Step (2.1), a reference command p_ref for the compressor pressure ratio coefficient is given as the target of compressor active surge control; the pressure ratio coefficient reference command ensures that the compressor pressure ratio transitions smoothly to the low-flow region outside the surge boundary, and is given by a first-order equation of the form

τ·(dp_ref/dt) = c − p_ref

where τ is the inertia coefficient; c is the pressure coefficient reference command end value.
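The reference command can be generated numerically as below, a minimal sketch assuming the first-order form τ·dp/dt = c − p (the original formula is shown only as an image); the function name `reference_trajectory` is illustrative:

```python
import numpy as np

def reference_trajectory(p0, c, tau, dt, steps):
    """Euler integration of the assumed first-order reference equation
    tau * dp/dt = c - p, giving a smooth transition of the pressure
    coefficient command from its initial value p0 to the end value c."""
    p = np.empty(steps)
    p[0] = p0
    for i in range(1, steps):
        p[i] = p[i - 1] + dt * (c - p[i - 1]) / tau
    return p
```

A small τ gives a fast but still smooth transition, which is what keeps the commanded pressure ratio from stepping abruptly toward the surge boundary.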
Step (2.2), the observation variable O is selected as, over k historical control periods and the current control period, the pressure ratio coefficient reference command p_ref, the flow coefficient φ of the compressor model, the pressure coefficient ψ of the compressor model, the control error of the pressure coefficient e = p_ref − ψ, and its integral e_int and derivative e_dot; the observation variable O_t at time t is then represented as a multivariate time-series matrix of dimension (k+1, 6).
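Assembling the (k+1, 6) observation matrix can be sketched as follows; the class name `ObservationBuffer` and the oldest-row padding at start-up are illustrative choices not specified in the text:

```python
import numpy as np
from collections import deque

class ObservationBuffer:
    """Stacks the current and k historical control periods of the six
    scalars (p_ref, phi, psi, e, e_int, e_dot) into the (k+1, 6)
    observation matrix O_t described above."""
    def __init__(self, k=3):
        self.k = k
        self.buf = deque(maxlen=k + 1)

    def push(self, p_ref, phi, psi, e, e_int, e_dot):
        self.buf.append([p_ref, phi, psi, e, e_int, e_dot])

    def matrix(self):
        rows = list(self.buf)
        # Pad with the oldest available row until k+1 rows exist (start-up)
        while len(rows) < self.k + 1:
            rows.insert(0, rows[0])
        return np.array(rows)
```

Each control period the six scalars are pushed and the stacked matrix is fed to the action network.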
Step (2.3), designing the action network and evaluation networks of the agent;
The action network π_θ of the agent comprises an input layer, a fully connected layer, an LSTM layer, and a relu activation function layer, and then outputs through two branch networks; branch network 1 comprises, in order, a fully connected layer, a relu activation function, a fully connected layer, and output layer 1; branch network 2 comprises, in order, a fully connected layer, a relu activation function, a fully connected layer, a softplus activation function layer, and output layer 2; the input of the action network is the observation O; output layer 1 gives the mean of the control quantity and output layer 2 gives the standard deviation σ of the control quantity.
The agent comprises two evaluation networks of identical structure, evaluation network Q_w1 and evaluation network Q_w2. Each evaluation network is the combination of two branch networks: branch network 1 comprises, in order, input layer 1, a fully connected layer, a relu activation function layer, and a fully connected layer; branch network 2 comprises, in order, input layer 2 and a fully connected layer. The outputs of branch network 1 and branch network 2 are concatenated into a high-dimensional vector by a concat layer and then pass, in order, through an LSTM recurrent neural network layer, a relu activation function, a fully connected neural network layer, and an output layer. The input of branch network 1 of the evaluation network is the observation O, and the input of branch network 2 is the control quantity u_{t−1} at the previous moment; the output of the evaluation network is the expected reward obtainable for the current observation and control quantity.
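The action network structure above can be sketched in PyTorch as follows. This is a sketch only: the 128-unit layer widths follow the embodiment, while the class name, the single-dimensional control output, and taking the last LSTM time step are assumptions:

```python
import torch
import torch.nn as nn

class ActorNetwork(nn.Module):
    """Sketch of the action network: shared FC + LSTM + ReLU trunk,
    then two branches producing the control mean (output layer 1)
    and, via softplus, a positive standard deviation (output layer 2)."""
    def __init__(self, obs_dim=6, hidden=128):
        super().__init__()
        self.fc_in = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.mean_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.std_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1),
            nn.Softplus())  # softplus keeps sigma strictly positive

    def forward(self, obs):          # obs: (batch, k+1, 6)
        x = torch.relu(self.fc_in(obs))
        x, _ = self.lstm(x)
        x = torch.relu(x[:, -1])     # features from the last time step
        return self.mean_head(x), self.std_head(x)
```

The softplus branch is what guarantees a valid (positive) standard deviation for the stochastic policy during training.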
Step (2.4), designing the reward function r from the observations, specifically:
r=r 1 +r 2 +r 3
wherein:
Based on the deep reinforcement learning agent simulation training environment constructed in step (2), the agent is trained using the soft actor-critic algorithm, specifically comprising the following steps:
Step (3.1), establishing target evaluation networks Q̄_w̄1 and Q̄_w̄2, whose structures are respectively the same as those of evaluation networks Q_w1 and Q_w2;
Step (3.2), initializing with random parameters the weight parameters w_1, w_2, and θ of evaluation network Q_w1, evaluation network Q_w2, and action network π_θ; then initializing the weights of target evaluation network Q̄_w̄1 with those of Q_w1, and the weights of target evaluation network Q̄_w̄2 with those of Q_w2;
Step (3.3), initializing the experience replay pool R; setting the number of training episodes E, the simulation time T, the simulation sampling step Δt, the number of training cycles λ, the discount factor γ, and the exponential moving average coefficient τ;
Step (3.4), the episode loop begins;
Step (3.5), the simulation begins;
Step (3.6), at simulation time t, the observation O_t is input to the action network to obtain the control quantity u_t; u_t is executed and the reward r_t is calculated, at which point the environment state becomes O_{t+1};
Step (3.7), (O_t, u_t, r_t, O_{t+1}) is stored as a sample in the experience replay pool R;
Step (3.8), the training cycle begins;
Step (3.9), N samples are drawn from R and the weight parameters of all networks are updated;
Step (3.10), step (3.9) is executed until the training cycle ends;
Step (3.11), steps (3.5)-(3.10) are executed until the simulation ends;
Step (3.12), steps (3.4)-(3.11) are executed until the episode loop ends;
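The target-network update inside step (3.9) can be sketched as below: after each gradient step the target evaluation network weights are blended with the online weights through the exponential moving average coefficient τ. The function name and the list-of-floats representation are illustrative:

```python
def soft_update(target_weights, online_weights, tau=0.5):
    """Exponential moving-average update of the target evaluation
    networks: w_target <- tau * w_online + (1 - tau) * w_target.
    Weights are plain floats here for illustration; in practice they
    are the network parameter tensors."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_weights, target_weights)]
```

Slowly moving targets keep the bootstrapped Q-value estimates stable during training.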
Step (4), after training, the weight parameters of the action network are fixed and deployed to the electronic controller. In each control period, the electronic controller receives the observation variables from the compressor in real time, inputs them into the trained action network, and finally outputs a control signal to the compressor close-coupled valve to ensure stable operation of the compressor;
Further, the standard deviation σ of the control quantity output by the action network is set to zero in the electronic controller, so that the mean control quantity output by the action network is used directly as the final control signal u.
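The deterministic deployment rule (σ set to zero, mean used directly) reduces to the sketch below; the function name and the callable `actor` interface returning (mean, std) are illustrative:

```python
def deploy_control(actor, observation):
    """Online application sketch: with the output standard deviation
    fixed at zero, the mean of the action network's output is used
    directly as the control signal u for the close-coupled valve.
    `actor` is any callable returning (mean, std)."""
    mean, _std = actor(observation)
    return mean  # final control signal u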
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
the method and the system for controlling the active surge of the air compressor based on the deep reinforcement learning can ensure that the pressure ratio coefficient instruction is accurately tracked when the air compressor has complex nonlinearity and under external disturbance, effectively enlarge the stable working range of the air compressor and ensure the high-efficiency and reliable work of the aeroengine; meanwhile, the invention solves the problem of complex manual design of the active surge controller of the compressor by means of strong self-learning capability of deep reinforcement learning.
Drawings
FIG. 1 is a schematic diagram of the present invention;
FIG. 2 (a) is a graph showing actual compressor physical characteristic data in an embodiment of the present invention;
FIG. 2 (b) shows the non-dimensionalized compressor characteristic data in an embodiment of the present invention;
FIG. 2 (c) shows the cubic surface fitted from the dimensionless compressor characteristic data in an embodiment of the present invention;
FIG. 3 (a) is a diagram illustrating the operation network of the deep reinforcement learning agent according to the present invention;
FIG. 3 (b) is a diagram of an evaluation network of a deep reinforcement learning agent according to the present invention;
FIG. 4 is a graph showing the change of the reward function during training of the agent in an embodiment of the present invention;
fig. 5 is a graph comparing control effects of three compressor active surge control methods.
Detailed Description
The technical scheme of the invention is further explained below by a specific embodiment with reference to the attached drawings.
Referring to fig. 1, the deep reinforcement learning-based active surge control method and system for a compressor comprises the following steps:
(1) Establishing a mathematical model of the compressor with its actuating mechanism;
The flow coefficient-pressure ratio coefficient characteristic term of the compressor is identified using the compressor physical characteristic data shown in fig. 2 (a), which describe the distribution of the compressor pressure ratio π over flow m at each relative percentage speed n of the compressor;
The pressure ratio and flow in the real compressor physical characteristic data are non-dimensionalized to obtain the pressure coefficient and flow coefficient shown in fig. 2 (b);
wherein φ is the flow coefficient; ψ is the pressure coefficient; ρ is the gas density in the compressor, taken as 1.225 kg/m³; A_c is the equivalent cross-sectional area of the internal flow passage of the compressor, taken as 0.0291 m²; U is the rim linear speed at the mean diameter of the compressor rotor, taken as 927.63 m/s; p_0 is the ambient pressure, taken as 100 kPa;
Finally, data fitting is performed by the least squares method using the cubic surface equation,
wherein ψ(φ, n) is the fitted compressor characteristic term; a_0, a_1, b_0, b_1, c_0, c_1 are the fitting coefficients;
fig. 2 (c) shows a cubic surface fitted in this embodiment:
The final mathematical model of the compressor including the close-coupled valve actuator is as given above, wherein u is the control quantity input to the close-coupled valve actuator and is the model input; φ is the flow coefficient output by the model; ψ is the pressure coefficient output by the model; B is the dimensionless B parameter, taken as 1.8; L_c is the equivalent length of the compressor; γ_T is the throttle opening, taken as 0.6; d_φ and d_ψ are the flow coefficient and pressure coefficient disturbance and uncertainty terms, respectively, given in this embodiment by:
d_ψ = 0.02 sin(0.1t) + 0.02 cos(0.4t)
d_φ = 0.02 sin(0.1t) + 0.02 cos(0.4t)
(2) Establishing a deep reinforcement learning agent simulation training environment for the compressor active surge control task;
Step (2.1), a reference command p_ref for the compressor pressure ratio coefficient is given as the target of compressor active surge control; the pressure ratio coefficient reference command ensures that the compressor pressure ratio transitions smoothly to the low-flow region outside the surge boundary;
wherein τ is the inertia coefficient, taken as 0.05; c is the pressure coefficient reference command final value, taken as 0.6; when solving the equation, the initial value of p_ref is set to 0.6568;
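With the embodiment's values, and assuming the first-order form τ·dp/dt = c − p (the formula itself is shown only as an image), the reference command has a simple closed form; the function name is illustrative:

```python
import math

def p_ref(t, p0=0.6568, c=0.6, tau=0.05):
    """Closed-form solution of the assumed first-order reference
    equation tau * dp/dt = c - p with the embodiment's values:
    p(0) = 0.6568, end value c = 0.6, inertia coefficient tau = 0.05."""
    return c + (p0 - c) * math.exp(-t / tau)
```

After a few multiples of τ (a few tenths of a second here), the command has effectively settled at c = 0.6.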
Step (2.2), the observation variable O is selected as, over 3 historical control periods and the current control period, the pressure ratio coefficient reference command p_ref, the flow coefficient φ of the compressor model, the pressure coefficient ψ of the compressor model, the control error of the pressure coefficient e = p_ref − ψ, and its integral e_int and derivative e_dot; the observation variable O_t at time t is represented as a multivariate time-series matrix of dimension (4, 6):
Step (2.3), designing the action network and evaluation networks of the agent;
The action network π_θ of the agent, as shown in fig. 3 (a), comprises an input layer, a fully connected layer, an LSTM layer, and a relu activation function layer, and then outputs through two branch networks; branch network 1 comprises, in order, a fully connected layer, a relu activation function, a fully connected layer, and output layer 1; branch network 2 comprises, in order, a fully connected layer, a relu activation function, a fully connected layer, a softplus activation function layer, and output layer 2; the fully connected layers and LSTM layer of the action network each have 128 neurons; the input of the action network is the observation O; output layer 1 gives the mean of the control quantity and output layer 2 gives the standard deviation σ of the control quantity.
The agent comprises two evaluation networks of identical structure, shown in fig. 3 (b): evaluation network Q_w1 and evaluation network Q_w2. Each evaluation network is the combination of two branch networks: branch network 1 comprises, in order, input layer 1, a fully connected layer, a relu activation function layer, and a fully connected layer; branch network 2 comprises, in order, input layer 2 and a fully connected layer. The outputs of branch network 1 and branch network 2 are concatenated into a high-dimensional vector by a concat layer and then pass, in order, through an LSTM recurrent neural network layer, a relu activation function, a fully connected neural network layer, and an output layer; the fully connected layers and LSTM layer of the evaluation network each have 128 neurons. The input of branch network 1 of the evaluation network is the observation O, and the input of branch network 2 is the control quantity u_{t−1} at the previous moment; the output of the evaluation network is the expected reward obtainable for the current observation and control quantity.
Step (2.4), designing the reward function r from the observations, specifically:
r=r 1 +r 2 +r 3
wherein:
(3) Training the agent using the soft actor-critic algorithm;
Step (3.1), target evaluation networks Q̄_w̄1 and Q̄_w̄2 are established, with structures respectively the same as those of evaluation networks Q_w1 and Q_w2;
Step (3.2), the weight parameters w_1, w_2, and θ of evaluation network Q_w1, evaluation network Q_w2, and action network π_θ are initialized with random parameters; the weights of target evaluation network Q̄_w̄1 are then initialized with those of Q_w1, and the weights of target evaluation network Q̄_w̄2 with those of Q_w2;
Step (3.3), the experience replay pool R is initialized, with the number of training episodes E = 2000, simulation time T = 300 s, simulation sampling step Δt = 0.02 s, training cycle number λ = 0.99, discount factor γ = 0.99, and exponential moving average coefficient τ = 0.5;
Step (3.4), the episode loop begins;
Step (3.5), the simulation begins;
Step (3.6), at simulation time t, the observation O_t is input to the action network to obtain the control quantity u_t; u_t is executed and the reward r_t is calculated, at which point the environment state becomes O_{t+1};
Step (3.7), (O_t, u_t, r_t, O_{t+1}) is stored as a sample in the experience replay pool R;
Step (3.8), the training cycle begins;
Step (3.9), N samples are drawn from R, the loss functions are calculated, and the weight parameters are updated;
Step (3.10), step (3.9) is executed until the training cycle ends;
Step (3.11), steps (3.5)-(3.10) are executed until the simulation ends;
Step (3.12), steps (3.4)-(3.11) are executed until the episode loop ends;
Fig. 4 shows the change in the reward function during training of the agent with the soft actor-critic algorithm; the reward value converges quickly to a high level over 2000 episodes of training.
(4) After training, the weight parameters of the action network are fixed and deployed to the electronic controller. In online application, in each control period the electronic controller receives the observation variables from the compressor in real time and inputs them into the trained action network; the standard deviation σ of the control quantity output by the action network is set to zero, so that the mean control quantity output by the action network is output directly to the compressor close-coupled valve as the final control signal.
To verify the effectiveness of the method and system for compressor active surge control, it is compared with a sliding mode control method and a fuzzy backstepping control method on the compressor mathematical model of this embodiment; the results are shown in fig. 5. When external disturbance and model uncertainty are present, the sliding mode and fuzzy backstepping methods allow surge to occur while tracking the compressor pressure coefficient to the given reference command, so the pressure coefficient cannot track the command, fluctuates markedly, and exhibits large dynamic and steady-state errors throughout, leaving the compressor in an unstable operating state; in contrast, the deep reinforcement learning-based active surge control method tracks the given pressure coefficient reference command throughout and ensures stable operation of the compressor.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical scheme and inventive concept of the present invention within the scope disclosed herein shall be covered by the scope of the present invention.
Claims (6)
1. A deep reinforcement learning-based active surge control method and system for a compressor, characterized by comprising the following steps:
(1) Establishing a mathematical model of the compressor with its actuating mechanism;
(2) Establishing a deep reinforcement learning agent simulation training environment for the compressor active surge control task;
(3) Training the agent using the soft actor-critic algorithm;
(4) Deploying the action network of the trained agent to the electronic controller for online application.
2. The deep reinforcement learning-based active surge control method and system for a compressor according to claim 1, wherein establishing the mathematical model of the compressor with its actuating mechanism in step (1) comprises: identifying the flow coefficient-pressure ratio coefficient characteristic term of the compressor using real compressor physical characteristic data; and then establishing the final mathematical model of the compressor from the identified characteristic term.
3. The method and system for controlling the active surge of the compressor based on the deep reinforcement learning as defined in claim 1, wherein the establishing the deep reinforcement learning agent simulation training environment facing the active surge control task of the compressor in the step (2) comprises:
step (2.1), giving a reference instruction p for the pressure ratio coefficient of the compressor ref As a target of the active surge control of the compressor, the pressure ratio coefficient reference instruction is used for ensuring that the pressure ratio of the compressor smoothly transits to a low flow area outside the surge boundary, and is given by the following formula:
where τ is the inertia coefficient; c is the pressure coefficient reference command end value;
step (2.2), selecting as the observation variable O, over k historical control periods and the current control period: the pressure-coefficient reference command p_ref, the flow coefficient φ of the compressor model, the pressure coefficient ψ of the compressor model, the control error of the pressure coefficient e = p_ref − ψ, and its integral e_int and derivative e_dot; the observation variable O_t at time t is expressed as a multivariate time-series matrix of dimension (k+1, 6):
step (2.3), designing an action network and an evaluation network of the intelligent agent;
The action network π_θ of the agent comprises an input layer, a fully connected layer, an LSTM layer, and a ReLU activation layer, followed by two output branch networks. Branch network 1 consists, in order, of a fully connected layer, a ReLU activation, a fully connected layer, and output layer 1; branch network 2 consists, in order, of a fully connected layer, a ReLU activation, a fully connected layer, a softplus activation layer, and output layer 2. The input of the action network is the observation O; output layer 1 gives the mean of the control quantity, and output layer 2 gives the standard deviation σ of the control quantity;
the intelligent agent comprises two evaluation networks with identical structures, which are respectivelyAnd evaluation network->Each evaluation network is combined by two branch networks, and the branch network 1 sequentially comprises an input layer 1, a full connection layer, a relu activation function layer and a full connection layer; the branch network 2 sequentially comprises an input layer 2 and a full-connection layer; the outputs of the branch network 1 and the branch network 2 are composed of concaAfter the t layers are spliced into high-dimensional vectors, the high-dimensional vectors sequentially pass through an LSTM circulating neural network layer, a relu activation function, a fully-connected neural network layer and an output layer; the input of the branch network 1 of the evaluation network is the observed quantity O, and the input of the branch network 2 is the control quantity u at the previous moment t-1 The method comprises the steps of carrying out a first treatment on the surface of the The output of the evaluation network is the current observed quantity and the control quantity which can obtain the expectations of rewards;
step (2.4), designing a reward function r from the observations, specifically:
r = r_1 + r_2 + r_3
wherein:
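The definitions of the three reward terms are not reproduced in this text. The decomposition below is therefore purely a hypothetical illustration of how a surge-control reward of the form r = r_1 + r_2 + r_3 might be composed (tracking-error penalty, control-smoothness penalty, surge penalty); it is not the patented formula.

```python
def reward(e, du, surge, w1=1.0, w2=0.1, w3=10.0):
    """Hypothetical r = r1 + r2 + r3; all three terms are assumptions."""
    r1 = -w1 * e * e               # penalize pressure-coefficient tracking error
    r2 = -w2 * du * du             # penalize abrupt control changes
    r3 = -w3 if surge else 0.0     # heavy penalty for entering surge
    return r1 + r2 + r3
```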
4. The deep-reinforcement-learning-based active surge control method and system for a compressor according to claim 1, wherein, in step (4), after training is completed, the weight parameters of the action network are fixed and deployed to the electronic controller; in each control period, the electronic controller receives the observation variables from the compressor in real time and inputs them to the trained action network; the standard deviation of the control quantity output by the action network is set to σ = 0, so that the mean output of the action network is used directly as the final control signal u; the final control signal is output to the compressor actuator to make the compressor stably track the pressure-coefficient reference command.
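The deterministic deployment rule of claim 4 reduces to one line: with σ fixed to 0, sampling from N(μ, σ²) collapses to the mean. A minimal sketch (function name and values are illustrative):

```python
def deploy_control(mu, sigma=0.0, noise=0.0):
    # With sigma fixed to 0 the stochastic policy collapses to its mean,
    # so the deployed control signal u is exactly the network's mean output.
    return mu + sigma * noise

u = deploy_control(0.42)
```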
5. The deep-reinforcement-learning-based active surge control method and system for a compressor according to claim 2, wherein identifying the characteristic term of the compressor comprises: nondimensionalizing the pressure ratio π and the flow m in the real compressor physical characteristic data to obtain the pressure coefficient and the flow coefficient, and then fitting the data by the least-squares method with a cubic surface equation:
where m is the physical flow of the compressor; π is the physical pressure ratio of the compressor; φ is the flow coefficient; ψ is the pressure coefficient; ρ is the gas density inside the compressor; A_c is the equivalent cross-sectional area of the internal flow passage of the compressor; U is the rim linear speed at the mean diameter of the compressor rotor; p_0 is the ambient pressure; ψ(φ, n) is the compressor characteristic term; and a_0, a_1, b_0, b_1, c_0, c_1 are the fitting coefficients.
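The least-squares identification of claim 5 can be sketched with a linear design matrix. The cubic surface equation itself is not reproduced in this text, so the basis ψ(φ, n) = (a_0 + a_1·n) + (b_0 + b_1·n)·φ + (c_0 + c_1·n)·φ³ used below is one assumed reading of a six-coefficient cubic surface; the synthetic data are illustrative.

```python
import numpy as np

def fit_characteristic(phi, n, psi):
    """Least-squares fit of psi(phi, n) with six coefficients
    [a0, a1, b0, b1, c0, c1] over an assumed cubic-surface basis."""
    A = np.column_stack([np.ones_like(phi), n, phi, n * phi,
                         phi**3, n * phi**3])
    coeffs, *_ = np.linalg.lstsq(A, psi, rcond=None)
    return coeffs

# Synthetic check: generate noiseless data from known coefficients and
# recover them with the fit.
rng = np.random.default_rng(1)
phi = rng.uniform(0.1, 0.9, 200)
n = rng.uniform(0.5, 1.0, 200)
true = np.array([0.3, 0.2, 0.5, -0.1, -0.4, 0.05])
psi = (true[0] + true[1] * n) + (true[2] + true[3] * n) * phi \
      + (true[4] + true[5] * n) * phi**3
coeffs = fit_characteristic(phi, n, psi)
```

With noiseless data the fit recovers the generating coefficients to numerical precision; with real compressor characteristic data it returns the least-squares-optimal coefficients.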
6. The deep-reinforcement-learning-based active surge control method and system for a compressor according to claim 2, wherein the actuator is a close-coupled valve, and the final mathematical model of the compressor including the actuator is as follows:
the control quantity of the close-connected valve actuating mechanism is input as a model; phi is the flow coefficient output by the model; psi is a pressure coefficient output by the model; b is a characteristic B parameter; l (L) c Is the equivalent length of the compressor; gamma ray T Is the throttle opening; d, d φ And d ψ The flow coefficient and the pressure coefficient are respectively disturbed and uncertain.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310113139.6A CN116123124A (en) | 2023-02-14 | 2023-02-14 | Deep reinforcement learning-based active surge control method and system for gas compressor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116123124A true CN116123124A (en) | 2023-05-16 |
Family
ID=86311511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310113139.6A Pending CN116123124A (en) | 2023-02-14 | 2023-02-14 | Deep reinforcement learning-based active surge control method and system for gas compressor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116123124A (en) |
2023-02-14: CN application CN202310113139.6A filed; patent CN116123124A/en, status active, Pending
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116517867A (en) * | 2023-06-28 | 2023-08-01 | 国网江苏省电力有限公司常州供电分公司 | Method and device for diagnosing and suppressing surge of compressor |
CN116517867B (en) * | 2023-06-28 | 2023-10-03 | 国网江苏省电力有限公司常州供电分公司 | Method and device for diagnosing and suppressing surge of compressor |
CN116566200A (en) * | 2023-07-10 | 2023-08-08 | 南京信息工程大学 | Direct-current buck converter control method, device and system and storage medium |
CN116566200B (en) * | 2023-07-10 | 2023-09-22 | 南京信息工程大学 | Direct-current buck converter control method, device and system and storage medium |
CN117724337A (en) * | 2023-12-18 | 2024-03-19 | 大连理工大学 | Aeroengine surge active control system based on second-order sliding mode control |
CN117648827A (en) * | 2024-01-29 | 2024-03-05 | 中国航发四川燃气涡轮研究院 | Method for evaluating precision of performance simulation program of air compressor based on test database |
CN117648827B (en) * | 2024-01-29 | 2024-04-16 | 中国航发四川燃气涡轮研究院 | Method for evaluating precision of performance simulation program of air compressor based on test database |
CN117709027A (en) * | 2024-02-05 | 2024-03-15 | 山东大学 | Kinetic model parameter identification method and system for mechatronic-hydraulic coupling linear driving system |
CN117709027B (en) * | 2024-02-05 | 2024-05-28 | 山东大学 | Kinetic model parameter identification method and system for mechatronic-hydraulic coupling linear driving system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||