CN116123124A - Deep reinforcement learning-based active surge control method and system for gas compressor - Google Patents


Info

Publication number
CN116123124A
CN116123124A (application CN202310113139.6A)
Authority
CN
China
Prior art keywords
compressor
network
coefficient
layer
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310113139.6A
Other languages
Chinese (zh)
Inventor
张兴龙
张天宏
黄向华
盛汉霖
庞淑伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202310113139.6A priority Critical patent/CN116123124A/en
Publication of CN116123124A publication Critical patent/CN116123124A/en
Pending legal-status Critical Current

Classifications

    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F04POSITIVE - DISPLACEMENT MACHINES FOR LIQUIDS; PUMPS FOR LIQUIDS OR ELASTIC FLUIDS
    • F04DNON-POSITIVE-DISPLACEMENT PUMPS
    • F04D27/00Control, e.g. regulation, of pumps, pumping installations or pumping systems specially adapted for elastic fluids
    • F04D27/02Surge control
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F05INDEXING SCHEMES RELATING TO ENGINES OR PUMPS IN VARIOUS SUBCLASSES OF CLASSES F01-F04
    • F05DINDEXING SCHEME FOR ASPECTS RELATING TO NON-POSITIVE-DISPLACEMENT MACHINES OR ENGINES, GAS-TURBINES OR JET-PROPULSION PLANTS
    • F05D2270/00Control
    • F05D2270/70Type of control algorithm
    • F05D2270/709Type of control algorithm with neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Positive-Displacement Air Blowers (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a deep reinforcement learning-based active surge control method and system for a gas compressor, relating to the field of aero-engine active stability control, comprising the following steps: (1) establishing a mathematical model of the compressor together with its actuator; (2) establishing a deep reinforcement learning agent simulation training environment for the compressor active surge control task; (3) training the agent with the soft actor-critic algorithm; (4) after training, fixing the weight parameters of the action network and deploying them to an electronic controller for online application. The invention takes deep reinforcement learning as a new approach to the complex nonlinearity in compressor active surge control system design, improves the adaptability and robustness of the controller, and greatly reduces the difficulty of controller design.

Description

Deep reinforcement learning-based active surge control method and system for gas compressor
Technical Field
The invention relates to the technical field of aero-engine active stability control, and in particular to a deep reinforcement learning-based active surge control method and system for a gas compressor.
Background
Surge, a typical unstable flow regime of the compressor, severely affects the performance and safety of aircraft engines and, in severe cases, causes engine failure with catastrophic consequences. To keep the compressor from crossing the instability boundary into rotating stall and surge, early designs relied on passive anti-surge measures, i.e. reserving a sufficient surge margin when the compressor is designed. This open-loop approach reduces the likelihood of instability to some extent, but it also greatly limits the operating flow and pressure-ratio range of the compressor, sacrificing performance and operating efficiency. With deeper research into compressor instability, the idea of active surge control emerged: feedback control of actuators such as oscillating blade rows, high-pressure jets, loudspeakers, piston damping mechanisms, controllable regulating valves, close-coupled valves, and throttle valves suppresses the formation and growth of pressure or flow disturbances in the flow field at the early stage of surge, so that the compressor operates stably in the high-pressure-ratio, high-efficiency region inside the surge boundary.
The patent with publication number CN113279997A proposes an aero-engine active surge control system based on fuzzy switching of controllers: several basic controllers suited to different operating ranges are designed by a mode control method based on Lyapunov stability theory, and their control signals are weighted and fused according to a fuzzy switching principle to determine the final control quantity. The patent with publication number CN109339954A proposes an active control method for aerodynamic instability of an aero-engine compressor component, in which an active surge controller is designed using estimated feedback of the compressor pressure coefficient and flow coefficient combined with bifurcation theory. These methods design the controller from an analysis of the compressor model characteristics so that the whole system satisfies Lyapunov stability. Although they can, to some extent, keep the compressor operating in a stable state, they require accurate model parameters and preset constraints; in other words, they do not account for external disturbances and model uncertainty and are therefore not robust. In addition, these nonlinear control algorithms involve repeated matrix differentiation and inversion, which makes their computational complexity high and places heavy demands on the online computing capacity of the surge control system, limiting their deployment in engineering applications.
Disclosure of Invention
To solve these problems, the invention provides a deep reinforcement learning-based active surge control method and system for a gas compressor, realizing robust, adaptive, optimal control under compressor model uncertainty and external disturbance.
The technical scheme provided by the invention is as follows:
A deep reinforcement learning-based active surge control method and system for a gas compressor, comprising the following steps:
(1) Establishing a mathematical model of the air compressor with an executing mechanism;
(2) Establishing a deep reinforcement learning agent simulation training environment facing an active surge control task of the compressor;
(3) Training the agent using the soft actor-critic algorithm;
(4) And deploying the action network of the trained intelligent agent to the electronic controller for online application.
Establishing the mathematical model of the compressor with its actuator comprises: identifying the flow coefficient-pressure coefficient characteristic term of the compressor from real compressor physical characteristic data, and then establishing the final compressor surge dynamic model from the identified characteristic term.
Further, the real compressor physical characteristic data describe the compressor pressure ratio π at flow m for different relative percentage speeds n of the compressor.
Further, the compressor characteristic term is identified by non-dimensionalizing the pressure ratio and flow in the real compressor physical characteristic data to obtain the pressure coefficient and flow coefficient:
ψ = (π − 1)·p_0 / (ρ·U²)

φ = m / (ρ·A_c·U)

wherein φ is the flow coefficient; ψ is the pressure coefficient; ρ is the gas density inside the compressor; A_c is the equivalent cross-sectional area of the compressor internal flow passage; U is the rim linear speed at the mean diameter of the compressor rotor; p_0 is the ambient pressure;
data fitting was then performed using a least squares method using a cubic surface equation:
ψ(φ, n) = (a_0 + a_1·n) + (b_0 + b_1·n)·φ + (c_0 + c_1·n)·φ³

wherein ψ(φ, n) is the compressor characteristic term; a_0, a_1, b_0, b_1, c_0, c_1 are the fitting coefficients;
Further, the actuator is a close-coupled valve, and the final mathematical model of the compressor is:
dφ/dt = (1/L_c)·[ψ_c(φ, n) − ψ − u] + d_φ
dψ/dt = (1/(4B²L_c))·[φ − γ_T·√ψ] + d_ψ

wherein u is the control quantity input to the close-coupled valve actuator and is the model input; φ is the flow coefficient output by the model; ψ is the pressure coefficient output by the model; ψ_c(φ, n) is the identified compressor characteristic term; B is the characteristic B parameter; L_c is the equivalent length of the compressor; γ_T is the throttle opening; d_φ and d_ψ are the flow-coefficient and pressure-coefficient disturbance and uncertainty terms, respectively.
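For illustration, compression-system dynamics of this kind can be integrated numerically. The sketch below assumes a Moore-Greitzer-type form with an assumed equivalent length L_c and a toy cubic characteristic ψ_c; the patent's exact equations and parameter values appear only as images, so this is a sketch under stated assumptions, not the patented model:

```python
import math

def compressor_step(phi, psi, u, dt, psi_c,
                    B=1.8, L_c=3.0, gamma_T=0.6,
                    d_phi=0.0, d_psi=0.0):
    """One explicit-Euler step of a Greitzer-type compression-system model
    with close-coupled valve control input u (illustrative form only).

    phi: flow coefficient, psi: pressure coefficient,
    psi_c: compressor characteristic, callable phi -> psi.
    L_c is an assumed value; the patent does not state it in this text."""
    dphi = (psi_c(phi) - psi - u) / L_c + d_phi
    dpsi = (phi - gamma_T * math.sqrt(max(psi, 0.0))) / (4.0 * B**2 * L_c) + d_psi
    return phi + dt * dphi, psi + dt * dpsi

# Example: a toy cubic characteristic, a few uncontrolled steps.
toy_psi_c = lambda p: 0.3 + 1.5 * p - 1.0 * p**3
phi, psi = 0.5, 0.55
for _ in range(100):
    phi, psi = compressor_step(phi, psi, u=0.0, dt=0.02, psi_c=toy_psi_c)
```

In a training environment, the agent's action u would enter this step each sampling period, with d_φ and d_ψ injecting the disturbance terms.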
The step (2) of establishing a deep reinforcement learning agent simulation training environment facing the active surge control task of the compressor comprises the following steps:
Step (2.1), giving a reference command p_ref for the compressor pressure coefficient as the target of compressor active surge control; the pressure coefficient reference command ensures that the compressor pressure ratio transitions smoothly to the low-flow region beyond the surge boundary, and is given by:
dp_ref/dt = (c − p_ref)/τ

where τ is the inertia coefficient; c is the pressure coefficient reference command end value.
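The reference command can be generated numerically. The sketch below assumes the first-order lag form dp_ref/dt = (c − p_ref)/τ inferred from the inertia coefficient τ and end value c named above (the patent's formula itself is shown only as an image):

```python
def p_ref_trajectory(p_init, c, tau, dt, n_steps):
    """Integrate dp_ref/dt = (c - p_ref)/tau with explicit Euler,
    producing a smooth first-order transition from p_init toward c."""
    p, traj = p_init, [p_init]
    for _ in range(n_steps):
        p += dt * (c - p) / tau
        traj.append(p)
    return traj

# Smooth transition of the pressure-coefficient command toward c.
traj = p_ref_trajectory(p_init=0.6568, c=0.6, tau=0.05, dt=0.002, n_steps=500)
```

With a small inertia coefficient the command settles quickly; the trajectory decays monotonically toward the end value c.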
Step (2.2), selecting the observation variable O as the pressure coefficient reference command p_ref, the flow coefficient φ of the compressor model, the pressure coefficient ψ of the compressor model, the pressure coefficient control error e = p_ref − ψ, and its integral e_int and derivative e_dot, over k historical control periods and the current control period; the observation variable O_t at time t is then represented as a multivariate time-series matrix of dimensions (k+1, 6):

O_t = [ p_ref(t−j)  φ(t−j)  ψ(t−j)  e(t−j)  e_int(t−j)  e_dot(t−j) ],  j = k, k−1, …, 0 (one row per control period)
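A minimal sketch of maintaining the (k+1, 6) observation matrix with a sliding window; the rectangular-rule integral and backward-difference derivative discretizations are assumptions, since the patent does not specify them:

```python
from collections import deque

class ObservationWindow:
    """Keeps the last k+1 control periods of
    [p_ref, phi, psi, e, e_int, e_dot] as a (k+1, 6) nested list."""

    def __init__(self, k=3, dt=0.02):
        self.dt = dt
        self.rows = deque([[0.0] * 6 for _ in range(k + 1)], maxlen=k + 1)
        self.e_int = 0.0
        self.e_prev = 0.0

    def update(self, p_ref, phi, psi):
        e = p_ref - psi                      # pressure-coefficient error
        self.e_int += e * self.dt            # running integral of e
        e_dot = (e - self.e_prev) / self.dt  # backward-difference derivative
        self.e_prev = e
        self.rows.append([p_ref, phi, psi, e, self.e_int, e_dot])
        return [row[:] for row in self.rows]  # snapshot, shape (k+1, 6)

win = ObservationWindow(k=3)
O_t = win.update(p_ref=0.6, phi=0.5, psi=0.55)
```

Each control period, one call to update appends the newest row and discards the oldest, so the matrix always covers periods t−k through t.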
Step (2.3), designing an action network and an evaluation network of the intelligent agent;
Action network π_θ of the agent comprises, in sequence, an input layer, a fully connected layer, an LSTM layer, and a relu activation function layer, followed by two output branch networks; branch network 1 comprises, in sequence, a fully connected layer, a relu activation function, a fully connected layer, and output layer 1; branch network 2 comprises, in sequence, a fully connected layer, a relu activation function, a fully connected layer, a softplus activation function layer, and output layer 2; the input of the action network is the observation O; output layer 1 gives the mean of the control quantity, and output layer 2 gives the standard deviation σ of the control quantity.
The agent comprises two evaluation networks of identical structure, Q_{w1} and Q_{w2}. Each evaluation network combines two branch networks: branch network 1 comprises, in sequence, input layer 1, a fully connected layer, a relu activation function layer, and a fully connected layer; branch network 2 comprises, in sequence, input layer 2 and a fully connected layer; the outputs of branch network 1 and branch network 2 are concatenated into a high-dimensional vector by a concat layer and then passed, in sequence, through an LSTM recurrent neural network layer, a relu activation function, a fully connected neural network layer, and an output layer; the input of branch network 1 of the evaluation network is the observation O, and the input of branch network 2 is the control quantity u_{t−1} at the previous moment; the output of the evaluation network is the expected reward obtainable given the current observation and control quantity.
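A compact numpy sketch can clarify how the two-branch action network produces a mean head and a softplus-terminated standard-deviation head (softplus guarantees σ > 0). The LSTM layer of the described architecture is replaced here by a plain dense layer, and all layer sizes and weights are placeholders, not the patented network:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softplus(x):
    return np.log1p(np.exp(x))

class ActionNetworkSketch:
    """Two-headed stochastic policy sketch: shared trunk, one head for the
    control-quantity mean, one softplus-ended head for sigma."""

    def __init__(self, obs_shape=(4, 6), hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        d = obs_shape[0] * obs_shape[1]          # flattened observation size
        self.W1 = rng.normal(0.0, 0.1, (d, hidden))
        self.W_mean = rng.normal(0.0, 0.1, (hidden, 1))
        self.W_std = rng.normal(0.0, 0.1, (hidden, 1))

    def forward(self, O):
        h = relu(np.asarray(O, dtype=float).reshape(-1) @ self.W1)
        mean = float(h @ self.W_mean)            # output layer 1: mean
        sigma = float(softplus(h @ self.W_std))  # output layer 2: sigma > 0
        return mean, sigma

net = ActionNetworkSketch()
mean, sigma = net.forward([[0.1] * 6] * 4)
```

During training a control action would be sampled from N(mean, σ²); the evaluation networks would score (O, u) pairs in an analogous two-branch fashion.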
Step (2.4), designing a reward function r according to observed quantity, wherein the reward function r is specifically as follows:
r=r 1 +r 2 +r 3
wherein:
[The definitions of r_1, r_2, and r_3 are given by equation images in the original and are not reproduced here.]
Based on the deep reinforcement learning agent simulation training environment constructed in step (2), the agent is trained using the soft actor-critic algorithm, specifically comprising:
Step (3.1), establishing target evaluation networks Q̄_{w̄1} and Q̄_{w̄2}, whose structures are respectively the same as those of the evaluation networks Q_{w1} and Q_{w2};
Step (3.2), initializing the weight parameters w_1, w_2, and θ of the evaluation network Q_{w1}, the evaluation network Q_{w2}, and the action network π_θ with random values; then initializing the target evaluation network Q̄_{w̄1} with the weights of Q_{w1}, and the target evaluation network Q̄_{w̄2} with the weights of Q_{w2};
Step (3.3), initializing the experience playback pool R, and setting the number of training episodes E, the simulation time T, the simulation sampling step Δt, the number of training cycles λ, the discount factor γ, and the exponential moving average coefficient τ;
Step (3.4), the episode loop begins;
Step (3.5), the simulation begins;
Step (3.6), at simulation time t, inputting O_t into the action network to obtain the control quantity u_t; executing u_t and calculating the reward r_t; the environment state then becomes O_{t+1};
Step (3.7), storing (O_t, u_t, r_t, O_{t+1}) as a sample in the experience playback pool R;
Step (3.8), the training loop begins;
step (3.9), sampling N samples from R, and updating weight parameters of all networks:
y = r_t + γ·[ min_{i=1,2} Q̄_{w̄i}(O_{t+1}, u_{t+1}) − α·log π_θ(u_{t+1} | O_{t+1}) ],  u_{t+1} ~ π_θ(· | O_{t+1})

L(w_i) = (1/N)·Σ [ Q_{wi}(O_t, u_t) − y ]²,  i = 1, 2

L(θ) = (1/N)·Σ [ α·log π_θ(u_t | O_t) − min_{i=1,2} Q_{wi}(O_t, u_t) ]

w̄_i ← τ·w_i + (1 − τ)·w̄_i,  i = 1, 2

wherein α is the entropy temperature coefficient of the soft actor-critic algorithm.
Step (3.10), executing step (3.9) until the training loop ends;
Step (3.11), executing steps (3.5)-(3.10) until the simulation ends;
Step (3.12), executing steps (3.4)-(3.11) until the episode loop ends;
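Two pieces of the training loop above, the experience playback pool of step (3.7) and the exponential-moving-average target update of step (3.9), can be sketched as follows (capacity and eviction policy are assumptions not stated in the patent):

```python
import random

class ReplayPool:
    """Experience playback pool R of transitions (O_t, u_t, r_t, O_{t+1})
    with uniform random sampling; a ring buffer evicts the oldest sample
    once the assumed capacity is reached."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data = []
        self.pos = 0

    def store(self, transition):
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:                                   # overwrite oldest when full
            self.data[self.pos] = transition
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, n):
        return random.sample(self.data, min(n, len(self.data)))

def soft_update(target, online, tau):
    """Exponential-moving-average target update:
    w_bar <- tau * w + (1 - tau) * w_bar, applied per parameter."""
    return [tau * w + (1.0 - tau) * wb for w, wb in zip(online, target)]
```

Each training cycle would draw N samples from the pool, compute the losses, step the optimizers, and then soft-update the target evaluation networks.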
Step (4), after training, the weight parameters of the action network are fixed and deployed to the electronic controller. In each control period, the electronic controller receives the observation variable from the compressor in real time, inputs it into the trained action network, and finally outputs a control signal to the compressor close-coupled valve to ensure stable operation of the compressor;
Further, the standard deviation σ of the control quantity output by the action network is set to 0 in the electronic controller, so that the mean of the control quantity output by the action network is used directly as the final control signal u.
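The deployment rule, forcing σ = 0 so that the mean head alone drives the close-coupled valve, can be sketched as follows (the controller interface shown is hypothetical):

```python
def online_control(action_net_forward, O):
    """Deployment-time use of the trained action network: the stochastic
    head is ignored and the mean output is emitted directly as the control
    signal u for the close-coupled valve."""
    mean, _sigma = action_net_forward(O)  # sigma discarded online
    return mean

# Works with any trained forward function returning (mean, sigma),
# e.g. a stub here for illustration:
u = online_control(lambda O: (0.3, 0.1), O=None)
```

This makes the online policy deterministic, which is the usual choice when a stochastic policy trained by soft actor-critic is put into closed-loop service.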
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
the method and the system for controlling the active surge of the air compressor based on the deep reinforcement learning can ensure that the pressure ratio coefficient instruction is accurately tracked when the air compressor has complex nonlinearity and under external disturbance, effectively enlarge the stable working range of the air compressor and ensure the high-efficiency and reliable work of the aeroengine; meanwhile, the invention solves the problem of complex manual design of the active surge controller of the compressor by means of strong self-learning capability of deep reinforcement learning.
Drawings
FIG. 1 is a schematic diagram of the present invention;
FIG. 2 (a) is a graph showing actual compressor physical characteristic data in an embodiment of the present invention;
FIG. 2 (b) is non-dimensionalized compressor characterization data in accordance with an embodiment of the present invention;
FIG. 2 (c) is a cubic curve fitted by dimensionless compressor characterization data in an embodiment of the present invention;
FIG. 3 (a) is a diagram illustrating the operation network of the deep reinforcement learning agent according to the present invention;
FIG. 3 (b) is a diagram of an evaluation network of a deep reinforcement learning agent according to the present invention;
FIG. 4 is a graph showing the change of the reward function of the training process of the intelligent agent according to the embodiment of the present invention;
FIG. 5 is a graph comparing the control effects of three compressor active surge control methods.
Detailed Description
The technical scheme of the invention is further explained below through a specific embodiment with reference to the accompanying drawings.
Referring to FIG. 1, the deep reinforcement learning-based active surge control method and system for a gas compressor comprise the following steps:
(1) Establishing a mathematical model of the air compressor with an executing mechanism;
The flow coefficient-pressure coefficient characteristic term of the compressor is identified using the compressor physical characteristic data shown in FIG. 2 (a), which describe the distribution of the compressor pressure ratio π at flow m for each relative percentage speed n of the compressor;
The pressure ratio and flow in the real compressor physical characteristic data are non-dimensionalized as follows to obtain the pressure coefficient and flow coefficient shown in FIG. 2 (b):
ψ = (π − 1)·p_0 / (ρ·U²)

φ = m / (ρ·A_c·U)

wherein φ is the flow coefficient; ψ is the pressure coefficient; ρ is the gas density in the compressor, taken as 1.225 kg/m³; A_c is the equivalent cross-sectional area of the compressor internal flow passage, taken as 0.0291 m²; U is the rim linear speed at the mean diameter of the compressor rotor, taken as 927.63 m/s; p_0 is the ambient pressure, taken as 100 kPa;
Finally, data fitting is performed by the least squares method using the following cubic surface equation:

ψ(φ, n) = (a_0 + a_1·n) + (b_0 + b_1·n)·φ + (c_0 + c_1·n)·φ³

wherein ψ(φ, n) is the fitted compressor characteristic term; a_0, a_1, b_0, b_1, c_0, c_1 are the fitting coefficients;
FIG. 2 (c) shows the cubic surface fitted in this embodiment. [The fitted surface equation with its numerical coefficients appears as an equation image in the original.]
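The least-squares surface fit of this embodiment can be reproduced with numpy. The basis below, linear in n and cubic in φ, is an inference from the six coefficients a_0 … c_1 named above, since the patent's exact equation appears only as an image:

```python
import numpy as np

def fit_characteristic(phi, n, psi):
    """Least-squares fit of the compressor characteristic as a cubic
    surface of the assumed form
        psi(phi, n) = (a0 + a1*n) + (b0 + b1*n)*phi + (c0 + c1*n)*phi**3.
    Returns [a0, a1, b0, b1, c0, c1]."""
    phi = np.asarray(phi, dtype=float)
    n = np.asarray(n, dtype=float)
    psi = np.asarray(psi, dtype=float)
    # Design matrix: one column per fitting coefficient.
    A = np.column_stack([np.ones_like(phi), n,
                         phi, n * phi,
                         phi**3, n * phi**3])
    coeffs, *_ = np.linalg.lstsq(A, psi, rcond=None)
    return coeffs
```

Given the dimensionless (φ, n, ψ) samples of FIG. 2 (b), this would recover the six coefficients of the fitted surface in FIG. 2 (c) under the assumed basis.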
The final mathematical model of the compressor, including the close-coupled valve actuator, is:

dφ/dt = (1/L_c)·[ψ_c(φ, n) − ψ − u] + d_φ
dψ/dt = (1/(4B²L_c))·[φ − γ_T·√ψ] + d_ψ

wherein u is the control quantity input to the close-coupled valve actuator and is the model input; φ is the flow coefficient output by the model; ψ is the pressure coefficient output by the model; ψ_c(φ, n) is the fitted characteristic term; B is the dimensionless B parameter, taken as 1.8; L_c is the equivalent length of the compressor; γ_T is the throttle opening, taken as 0.6; d_φ and d_ψ are the flow-coefficient and pressure-coefficient disturbance and uncertainty terms, respectively, given in this embodiment by:
d_ψ = 0.02·sin(0.1t) + 0.02·cos(0.4t)
d_φ = 0.02·sin(0.1t) + 0.02·cos(0.4t)
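The embodiment's disturbance signals can be written directly as a function of time:

```python
import math

def disturbances(t):
    """Disturbance terms of the embodiment, applied identically to the
    flow and pressure coefficients:
        d = 0.02*sin(0.1*t) + 0.02*cos(0.4*t)."""
    d = 0.02 * math.sin(0.1 * t) + 0.02 * math.cos(0.4 * t)
    return d, d  # (d_phi, d_psi)

d_phi, d_psi = disturbances(0.0)  # at t = 0 the sine term vanishes
```

The combined signal is bounded by ±0.04, a small persistent excitation against which the trained controller must remain robust.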
(2) Establishing a deep reinforcement learning agent simulation training environment facing an active surge control task of the compressor;
Step (2.1), giving a reference command p_ref for the compressor pressure coefficient as the target of compressor active surge control; the pressure coefficient reference command ensures that the compressor pressure ratio transitions smoothly to the low-flow region beyond the surge boundary, and is given by:
dp_ref/dt = (c − p_ref)/τ

wherein τ is the inertia coefficient, taken as 0.05; c is the pressure coefficient reference command final value, taken as 0.6; when solving the equation, the initial value of p_ref is set to 0.6568;
Step (2.2), selecting the observation variable O as the pressure coefficient reference command p_ref, the flow coefficient φ of the compressor model, the pressure coefficient ψ of the compressor model, the pressure coefficient control error e = p_ref − ψ, and its integral e_int and derivative e_dot, over 3 historical control periods and the current control period; the observation variable O_t at time t is represented as a multivariate time-series matrix of dimensions (4, 6):

O_t = [ p_ref(t−j)  φ(t−j)  ψ(t−j)  e(t−j)  e_int(t−j)  e_dot(t−j) ],  j = 3, 2, 1, 0 (one row per control period)
step (2.3), designing an action network and an evaluation network of the intelligent agent;
Action network π_θ of the agent, as shown in FIG. 3 (a), comprises, in sequence, an input layer, a fully connected layer, an LSTM layer, and a relu activation function layer, followed by two output branch networks; branch network 1 comprises, in sequence, a fully connected layer, a relu activation function, a fully connected layer, and output layer 1; branch network 2 comprises, in sequence, a fully connected layer, a relu activation function, a fully connected layer, a softplus activation function layer, and output layer 2; the fully connected layers and the LSTM layer of the action network each have 128 neurons; the input of the action network is the observation O; output layer 1 gives the mean of the control quantity, and output layer 2 gives the standard deviation σ of the control quantity.
The agent comprises two evaluation networks of identical structure, Q_{w1} and Q_{w2}, as shown in FIG. 3 (b). Each evaluation network combines two branch networks: branch network 1 comprises, in sequence, input layer 1, a fully connected layer, a relu activation function layer, and a fully connected layer; branch network 2 comprises, in sequence, input layer 2 and a fully connected layer; the outputs of branch network 1 and branch network 2 are concatenated into a high-dimensional vector by a concat layer and then passed, in sequence, through an LSTM recurrent neural network layer, a relu activation function, a fully connected neural network layer, and an output layer; the fully connected layers and the LSTM layer of the evaluation network each have 128 neurons; the input of branch network 1 of the evaluation network is the observation O, and the input of branch network 2 is the control quantity u_{t−1} at the previous moment; the output of the evaluation network is the expected reward obtainable given the current observation and control quantity.
Step (2.4), designing a reward function r according to observed quantity, wherein the reward function r is specifically as follows:
r=r 1 +r 2 +r 3
wherein:
[The definitions of r_1, r_2, and r_3 are given by equation images in the original and are not reproduced here.]
(3) Training the agent using the soft actor-critic algorithm;
Step (3.1), establishing target evaluation networks Q̄_{w̄1} and Q̄_{w̄2}, whose structures are respectively the same as those of the evaluation networks Q_{w1} and Q_{w2};
Step (3.2), initializing the weight parameters w_1, w_2, and θ of the evaluation network Q_{w1}, the evaluation network Q_{w2}, and the action network π_θ with random values; then initializing the target evaluation network Q̄_{w̄1} with the weights of Q_{w1}, and the target evaluation network Q̄_{w̄2} with the weights of Q_{w2};
Step (3.3), initializing the experience playback pool R, and setting the number of training episodes E = 2000, the simulation time T = 300 s, the simulation sampling step Δt = 0.02 s, the number of training cycles λ = 0.99, the discount factor γ = 0.99, and the exponential moving average coefficient τ = 0.5;
Step (3.4), the episode loop begins;
Step (3.5), the simulation begins;
Step (3.6), at simulation time t, inputting O_t into the action network to obtain the control quantity u_t; executing u_t and calculating the reward r_t; the environment state then becomes O_{t+1};
Step (3.7), storing (O_t, u_t, r_t, O_{t+1}) as a sample in the experience playback pool R;
Step (3.8), the training loop begins;
step (3.9), sampling N samples from R, calculating a loss function and updating weight parameters:
y = r_t + γ·[ min_{i=1,2} Q̄_{w̄i}(O_{t+1}, u_{t+1}) − α·log π_θ(u_{t+1} | O_{t+1}) ],  u_{t+1} ~ π_θ(· | O_{t+1})

L(w_i) = (1/N)·Σ [ Q_{wi}(O_t, u_t) − y ]²,  i = 1, 2

L(θ) = (1/N)·Σ [ α·log π_θ(u_t | O_t) − min_{i=1,2} Q_{wi}(O_t, u_t) ]

w̄_i ← τ·w_i + (1 − τ)·w̄_i,  i = 1, 2

wherein α is the entropy temperature coefficient of the soft actor-critic algorithm.
Step (3.10), executing step (3.9) until the training loop ends;
Step (3.11), executing steps (3.5)-(3.10) until the simulation ends;
Step (3.12), executing steps (3.4)-(3.11) until the episode loop ends;
FIG. 4 shows the change of the reward function during training of the agent with the soft actor-critic algorithm; the reward value converges quickly to a high level over 2000 training episodes.
(4) After training, the weight parameters of the action network are fixed and deployed to the electronic controller. The online application mode is as follows: in each control period, the electronic controller receives the observation variable from the compressor in real time and inputs it into the trained action network; the standard deviation σ of the control quantity output by the action network is set to 0, so that the mean of the control quantity output by the action network is sent directly to the compressor close-coupled valve as the final control signal.
To verify the effectiveness of the proposed active surge control method and system, the method is compared with a sliding-mode control method and a fuzzy backstepping control method on the compressor mathematical model of this embodiment; the results are shown in FIG. 5. Under external disturbance and model uncertainty, the sliding-mode and fuzzy backstepping methods allow surge to occur while tracking the compressor pressure coefficient toward the given reference command: the pressure coefficient fails to track the command, fluctuates markedly, and exhibits large dynamic and steady-state errors throughout, leaving the compressor in an unstable operating state. In contrast, the deep reinforcement learning-based active surge control method keeps the compressor tracking the given pressure coefficient reference command over the whole process and ensures stable operation of the compressor.
The foregoing is only a preferred embodiment of the present invention, and the scope of the invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical scheme and inventive concept of the present invention, within the scope disclosed herein, shall fall within the scope of the present invention.

Claims (6)

1. A deep reinforcement learning-based active surge control method and system for a gas compressor, characterized by comprising the following steps:
(1) Establishing a mathematical model of the air compressor with an executing mechanism;
(2) Establishing a deep reinforcement learning agent simulation training environment facing an active surge control task of the compressor;
(3) Training the agent using the soft actor-critic algorithm;
(4) And deploying the action network of the trained intelligent agent to the electronic controller for online application.
2. The deep reinforcement learning-based active surge control method and system for a gas compressor according to claim 1, characterized in that establishing the mathematical model of the compressor with the actuator in step (1) comprises: identifying the flow coefficient-pressure coefficient characteristic term of the compressor using real compressor physical characteristic data; and then establishing the final mathematical model of the compressor from the identified characteristic term.
3. The method and system for deep-reinforcement-learning-based active surge control of a compressor as claimed in claim 1, wherein establishing the deep reinforcement learning agent simulation training environment for the compressor active surge control task in step (2) comprises:
step (2.1), giving a reference command p_ref for the compressor pressure ratio coefficient as the target of active surge control; the reference command ensures that the compressor pressure ratio transitions smoothly to the low-flow region beyond the surge boundary, and is given by
p_ref(t) = c(1 - e^(-t/τ))
where τ is the inertia coefficient and c is the end value of the pressure coefficient reference command;
step (2.2), selecting as the observation variable O, over k historical control periods and the current control period, the pressure ratio coefficient reference command p_ref, the flow coefficient φ of the compressor model, the pressure coefficient ψ of the compressor model, the pressure-coefficient control error e = p_ref - ψ, and its integral e_int and derivative e_dot; the observation variable O_t at time t is a multivariate time-series matrix of dimension (k+1, 6) whose row for period i (i = t-k, ..., t) is [p_ref(i), φ(i), ψ(i), e(i), e_int(i), e_dot(i)];
step (2.3), designing an action network and an evaluation network of the intelligent agent;
the action network π_θ of the agent comprises an input layer, a fully connected layer, an LSTM layer and a relu activation function layer, followed by two branch networks; branch network 1 comprises, in order, a fully connected layer, a relu activation function, a fully connected layer and output layer 1; branch network 2 comprises, in order, a fully connected layer, a relu activation function, a fully connected layer, a softplus activation function layer and output layer 2; the input of the action network is the observation O, output layer 1 gives the mean μ of the control quantity, and output layer 2 gives the standard deviation σ of the control quantity;
the agent comprises two evaluation networks of identical structure, Q_φ1 and Q_φ2. Each evaluation network is composed of two branch networks: branch network 1 comprises, in order, input layer 1, a fully connected layer, a relu activation function layer and a fully connected layer; branch network 2 comprises, in order, input layer 2 and a fully connected layer; the outputs of branch network 1 and branch network 2 are spliced into a high-dimensional vector by a concat layer, and then pass, in order, through an LSTM recurrent neural network layer, a relu activation function, a fully connected neural network layer and an output layer. The input of branch network 1 of each evaluation network is the observation O, and the input of branch network 2 is the control quantity u_{t-1} at the previous moment; the output of the evaluation network is the expected return obtainable under the current observation and control quantity;
step (2.4), designing a reward function r from the observed quantities, specifically:
r = r1 + r2 + r3
wherein the expressions for r1, r2 and r3 are given as equation images.
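Steps (2.1)-(2.2) can be sketched as follows. The patent's exact reference-command formula is an equation image; the first-order form p_ref(t) = c(1 - e^(-t/τ)) used here is an assumption consistent with the stated parameters τ (inertia coefficient) and c (end value).

```python
import math

def p_ref(t, c, tau):
    # Assumed first-order transition to end value c with inertia coefficient tau
    return c * (1.0 - math.exp(-t / tau))

def build_observation(history, k):
    """Stack the last k+1 control periods into the (k+1, 6) observation matrix
    O_t; each row is (p_ref, phi, psi, e, e_int, e_dot) for one period."""
    assert len(history) >= k + 1
    return [list(row) for row in history[-(k + 1):]]
```

Each row's error entries satisfy e = p_ref - ψ, with e_int and e_dot accumulated by the environment per control period.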
4. The method and system for deep-reinforcement-learning-based active surge control of a compressor as claimed in claim 1, wherein after training is completed in step (4), the weight parameters of the action network are fixed and deployed to the electronic controller; in each control period, the electronic controller receives the observation variables from the compressor in real time and inputs them to the trained action network; the standard deviation of the control quantity output by the action network is set to σ = 0, so that the mean μ of the control quantity output by the action network is taken directly as the final control signal u; the final control signal is output to the compressor actuator to control the compressor to stably track the pressure coefficient reference command.
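The deployment step can be sketched as below. The tiny network is a hypothetical stand-in for the trained action network (the real one is the LSTM-based two-branch network of claim 3); it illustrates only the claimed mechanism: with σ forced to 0, the stochastic Gaussian policy collapses to its mean, so u = μ deterministically.

```python
import math, random

class ActorStub:
    """Hypothetical stand-in for the trained action network: maps a flattened
    observation to (mean, std) of a Gaussian policy, with placeholder weights."""
    def __init__(self, n_in, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(n_in)]

    def forward(self, obs_flat):
        z = sum(w * x for w, x in zip(self.w, obs_flat))
        mean = math.tanh(z)            # bounded control mean (output layer 1)
        std = math.log1p(math.exp(z))  # softplus std (output layer 2)
        return mean, std

def control_signal(actor, obs_flat):
    # Online application: the std output is fixed to 0, so the final control
    # signal u is simply the mean of the control quantity.
    mean, _std = actor.forward(obs_flat)
    return mean
```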
5. The method as claimed in claim 2, wherein the compressor characteristic term is identified by nondimensionalizing the pressure ratio π and the flow m in the real compressor physical characteristic data to obtain the pressure coefficient and the flow coefficient, and then fitting the data with a cubic surface equation by the least squares method:
φ = m / (ρ A_c U)
ψ = (π - 1) p0 / (ρ U²)
ψ(φ, n) = cubic surface in φ with speed-dependent fitting coefficients a0, a1, b0, b1, c0, c1 (given as an equation image)
wherein m is the compressor physical flow; π is the compressor physical pressure ratio; φ is the flow coefficient; ψ is the pressure coefficient; ρ is the gas density inside the compressor; A_c is the equivalent cross-sectional area of the compressor internal flow passage; U is the rim linear speed at the mean diameter of the compressor rotor; p0 is the ambient pressure; ψ(φ, n) is the compressor characteristic term; a0, a1, b0, b1, c0, c1 are the fitting coefficients.
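The least-squares identification of claim 5 can be sketched as follows. The exact cubic surface is given only as an equation image, so the basis used here, ψ ≈ (a0 + a1·n) + (b0 + b1·n)·φ + (c0 + c1·n)·φ³, is a hypothetical arrangement of the six stated coefficients; with real data one would swap in the patent's actual basis functions.

```python
def lstsq(X, y):
    """Solve min ||X a - y||^2 via the normal equations (X^T X) a = X^T y
    with Gaussian elimination; adequate for a small, well-conditioned basis."""
    m = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    a = [0.0] * m
    for r in range(m - 1, -1, -1):
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, m))) / A[r][r]
    return a

def basis(phi, n):
    # Hypothetical cubic-surface basis for coefficients a0, a1, b0, b1, c0, c1
    return [1.0, n, phi, n * phi, phi ** 3, n * phi ** 3]
```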
6. The method and system for deep-reinforcement-learning-based active surge control of a compressor as claimed in claim 2, wherein the actuator is a close-coupled valve, and the final compressor mathematical model including the actuator is:
dφ/dt = (ψ(φ, n) - ψ - u) / L_c + d_φ
dψ/dt = (φ - γ_T √ψ) / (4 B² L_c) + d_ψ
the control quantity of the close-connected valve actuating mechanism is input as a model; phi is the flow coefficient output by the model; psi is a pressure coefficient output by the model; b is a characteristic B parameter; l (L) c Is the equivalent length of the compressor; gamma ray T Is the throttle opening; d, d φ And d ψ The flow coefficient and the pressure coefficient are respectively disturbed and uncertain.
CN202310113139.6A 2023-02-14 2023-02-14 Deep reinforcement learning-based active surge control method and system for gas compressor Pending CN116123124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310113139.6A CN116123124A (en) 2023-02-14 2023-02-14 Deep reinforcement learning-based active surge control method and system for gas compressor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310113139.6A CN116123124A (en) 2023-02-14 2023-02-14 Deep reinforcement learning-based active surge control method and system for gas compressor

Publications (1)

Publication Number Publication Date
CN116123124A true CN116123124A (en) 2023-05-16

Family

ID=86311511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310113139.6A Pending CN116123124A (en) 2023-02-14 2023-02-14 Deep reinforcement learning-based active surge control method and system for gas compressor

Country Status (1)

Country Link
CN (1) CN116123124A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116517867A (en) * 2023-06-28 2023-08-01 国网江苏省电力有限公司常州供电分公司 Method and device for diagnosing and suppressing surge of compressor
CN116517867B (en) * 2023-06-28 2023-10-03 国网江苏省电力有限公司常州供电分公司 Method and device for diagnosing and suppressing surge of compressor
CN116566200A (en) * 2023-07-10 2023-08-08 南京信息工程大学 Direct-current buck converter control method, device and system and storage medium
CN116566200B (en) * 2023-07-10 2023-09-22 南京信息工程大学 Direct-current buck converter control method, device and system and storage medium
CN117724337A (en) * 2023-12-18 2024-03-19 大连理工大学 Aeroengine surge active control system based on second-order sliding mode control
CN117648827A (en) * 2024-01-29 2024-03-05 中国航发四川燃气涡轮研究院 Method for evaluating precision of performance simulation program of air compressor based on test database
CN117648827B (en) * 2024-01-29 2024-04-16 中国航发四川燃气涡轮研究院 Method for evaluating precision of performance simulation program of air compressor based on test database
CN117709027A (en) * 2024-02-05 2024-03-15 山东大学 Kinetic model parameter identification method and system for mechatronic-hydraulic coupling linear driving system
CN117709027B (en) * 2024-02-05 2024-05-28 山东大学 Kinetic model parameter identification method and system for mechatronic-hydraulic coupling linear driving system

Similar Documents

Publication Publication Date Title
CN116123124A (en) Deep reinforcement learning-based active surge control method and system for gas compressor
Shipman et al. Reinforcement learning and deep neural networks for PI controller tuning
Kamalasadan et al. A neural network parallel adaptive controller for fighter aircraft pitch-rate tracking
CN113093526B (en) Overshoot-free PID controller parameter setting method based on reinforcement learning
Jordanou et al. Online learning control with echo state networks of an oil production platform
Mousavi et al. Applying q (λ)-learning in deep reinforcement learning to play atari games
CN112729024B (en) Intelligent adjusting method and system for control parameters of missile boosting section
Sathyan et al. Collaborative control of multiple robots using genetic fuzzy systems approach
Witczak Toward the training of feed-forward neural networks with the D-optimum input sequence
Ikemoto et al. Continuous deep Q-learning with a simulator for stabilization of uncertain discrete-time systems
CN115618497A (en) Aerofoil optimization design method based on deep reinforcement learning
US11738454B2 (en) Method and device for operating a robot
CN108319146A (en) A kind of method that radial base neural net is trained based on discrete particle cluster
Fernandez et al. Deep reinforcement learning with linear quadratic regulator regions
Wang et al. Learning Classifier System on a humanoid NAO robot in dynamic environments
Lin et al. TSK-type quantum neural fuzzy network for temperature control
Kohler et al. PID tuning using cross-entropy deep learning: A Lyapunov stability analysis
Li et al. Research and Application of Process Object Intelligent Learning Modeling
Lu et al. On-line outliers detection by neural network with quantum evolutionary algorithm
Panfilov et al. Soft computing optimizer for intelligent control systems design: the structure and applications
XU et al. Adjustment strategy for a dual-fuzzy-neuro controller using genetic algorithms–application to gas-fired water heater
CN117057225A (en) Self-adaptive learning gas valve high-speed high-frequency high-precision servo and performance reconstruction method
Gholizadeh et al. An Improved Real-Time Implementation of Adaptive Neuro-Fuzzy Controller
CN117270392A (en) Multi-loop pre-estimation compensation control method and device for servo system
CN117930650A (en) Aeroengine bleed air temperature fault-tolerant control method based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination