CN115544899B - Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning


Info

Publication number
CN115544899B
CN115544899B
Authority
CN
China
Prior art keywords
water
agent
network
time slot
pump station
Prior art date
Legal status
Active
Application number
CN202211475230.4A
Other languages
Chinese (zh)
Other versions
CN115544899A (en)
Inventor
余亮
檀洋阳
李澳
王冬生
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202211475230.4A
Publication of CN115544899A
Application granted
Publication of CN115544899B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a multi-agent deep reinforcement learning-based energy-saving scheduling method for the water intake pump station of a water plant, comprising the following steps: (1) on the premise of maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within a safety range, model the total energy consumption minimization problem of the water intake pump station as a Markov game, and design the corresponding environment state, behavior, and reward function; (2) construct a water intake pump station scheduling environment model using historical operating data and long short-term memory networks; (3) train the deep reinforcement learning agents based on the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm; (4) deploy the trained agent policies in the actual system. Compared with existing methods, the proposed method has a stronger ability to maintain system safety, greater energy-saving potential (up to 12.8%), and better universality.

Description

Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning
Technical Field
The invention relates to an energy-saving scheduling method for the water intake pump station of a water plant based on multi-agent deep reinforcement learning, and belongs to the interdisciplinary field of water intake pump station scheduling and artificial intelligence.
Background
Water is the basis of sustainable economic and social development, but water treatment plants consume a large amount of electricity because of outdated processes, equipment, or water treatment systems, and water intake pump stations account for the major part of this consumption. Traditional pump station scheduling mainly relies on engineering experience to choose the pump combination and adjust the frequency of variable-frequency pumps. Such regulation is qualitative; its labor cost is high, its energy-saving performance is unstable, and it may even increase energy consumption. In addition, frequently opening and closing pumps imposes a great impact on the pipeline, and rapid pressure changes easily cause the water hammer phenomenon. Therefore, reducing the energy consumption of the water intake pump station while guaranteeing water supply safety and meeting water supply demand is of great significance for reducing the operating cost of a water plant, saving urban electricity consumption, and cutting carbon dioxide emissions.
For the energy-saving optimal scheduling of water intake pump stations, existing research has proposed many methods such as nonlinear programming, dynamic programming, and genetic algorithms. Although these methods have certain advantages, they require an explicit scheduling model of the water intake pump station (e.g., an explicit relational expression between the total energy consumption and the pump station state and scheduling decisions). Since pump station performance depends on many factors (internal parameters such as head, shaft power, motor speed, motor frequency, and motor efficiency; external conditions such as water intake and water supply; and hydraulic losses inside the pump caused by the finite number of blades, friction, impact, and leakage), it is very difficult to establish a scheduling model that is both accurate and tractable for control. Furthermore, these works do not consider the wear imposed on the pumps by frequent switching of the pump group.
With the development of internet-of-things and artificial intelligence technology, a large amount of historical operating data of water plant intake pump stations is easy to obtain and can be used effectively. For example, some work has proposed data-driven scheduling methods for intake pump stations, such as one combining a particle swarm algorithm with support vector regression. However, such a method requires predicting the water supply amount for the next 24 hours and generating a pump group schedule recommendation table in a rolling fashion, so it is prone to introducing prediction errors and incurs a heavy computational burden. Other works have proposed control methods for water distribution systems based on reinforcement learning and deep reinforcement learning, using algorithms including Q-learning, dueling deep Q-networks, and knowledge-assisted proximal policy optimization. Although these methods do not require an explicit scheduling model of the intake pump station, they do not address the energy-saving problem of intake pump stations and adopt a single agent to control the pumps. When the energy-saving scheduling of fixed-frequency and variable-frequency pumps is considered jointly, directly controlling the pumps with a single agent makes the action space grow sharply and the learning efficiency low, so it is difficult to save energy effectively while keeping the pumps working safely and meeting water supply demand.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides an energy-saving scheduling method for the water intake pump station of a water plant based on multi-agent deep reinforcement learning, aiming to reduce the energy consumption of the pump station while maintaining safe system operation. The method discretizes the continuous action space of the variable-frequency pumps into a low-dimensional set and controls the intake pump group with multiple agents. To train the agents efficiently, a water intake pump station scheduling environment model (a black-box model that, unlike existing research, requires no white-box model) is constructed from historical operating data, and the multi-agent actor-attention-critic reinforcement learning algorithm is adopted for training, finally yielding a highly scalable and efficient energy-saving scheduling method. The method needs neither to predict any uncertain parameter nor to know an explicit scheduling model of the intake pump station, and has advantages such as low computational complexity and a significant energy-saving effect.
The invention discloses an energy-saving scheduling method for a water intake pump station based on multi-agent deep reinforcement learning, characterized by comprising the following steps:
(1) On the premise of maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within a safety range, model the total energy consumption minimization problem of the water intake pump station as a Markov game, design the corresponding environment state, behavior, and reward function of the Markov game, and construct the multiple agents for the water intake pump station system.
(2) Construct a water intake pump station scheduling environment model using historical operating data and long short-term memory networks.
(3) Train the multiple agents by deep reinforcement learning based on the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm.
(4) Deploy the trained multi-agent policies in the actual water intake pump station system.
Further, the total energy consumption minimization problem of the water intake pump station is expressed as follows:

$$\min_{a_t}\ \mathbb{E}\left[\sum_{t=1}^{T} E_t\right]\quad \text{s.t.}\quad L_{\min} \le L_t \le L_{\max},\quad |P_t - P_{t-1}| \le \Delta P_{\max},\quad N_t \le N_{\max},$$

where $E_t$ is the total energy consumption of the water intake pump station in time slot $t$ ($t = 1, 2, \dots, T$, with $T$ representing the total number of optimized time slots); $\mathbb{E}$ is the expectation operator, taken mainly over the uncertain parameters (such as the water supply amount); $a_t$ is the working frequency (for the variable-frequency pumps) or the on/off state (for the fixed-frequency pumps) of the pump station in time slot $t$; $L_t$ is the reservoir level in time slot $t$, and $L_{\min}$ and $L_{\max}$ are the minimum and maximum levels of the reservoir safety range; $P_t$ and $P_{t-1}$ are the manifold pressures of the pump station in time slots $t$ and $t-1$; $\Delta P_{\max}$ is the highest manifold pressure difference within the safety range; $N_t$ denotes the number of pump switching operations of the pump station within the day up to time slot $t$; and $N_{\max}$ is the maximum number of switching operations per day within the safety range.
Further, the environment state $s_t$ in the Markov game is expressed as follows:

$$s_t = (o_{1,t}, o_{2,t}, \dots, o_{N,t}),\qquad o_{i,t} = (h_t, L_t, P_t, B_t, D_t, N_t, u_{i,t}),$$

where $i$ takes $1, 2, \dots, N$; $N$ represents the number of pumps that need to be controlled and is also the total number of agents in the Markov game (each agent is responsible for controlling one pump). Here $s_t$ is the environment state of the multi-agent system in time slot $t$; $o_{i,t}$ denotes the local observed state of the $i$-th fixed-frequency or variable-frequency pump agent; $h_t$ is the relative time index within a day of the current absolute time of slot $t$; $L_t$ is the reservoir level in slot $t$; $P_t$ is the manifold pressure of the pump station in slot $t$; $B_t$ is the borrowed water amount of the reservoir in slot $t$ (i.e., the amount of water transferred into the reservoir from other waterworks); $D_t$ is the water supply amount of the reservoir in slot $t$ (i.e., the amount of water drawn out of the reservoir); $N_t$ is the number of pump switching operations of the pump station within the day up to slot $t$; and $u_{i,t}$ is the on/off state in slot $t$ of the pump controlled by agent $i$.
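To make the state design concrete, the sketch below assembles one agent's local observation as a feature vector; the Python representation, field names, and layout are illustrative assumptions rather than part of the disclosed method.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PumpStationSlot:
    """One time slot of pump station measurements (names are illustrative)."""
    time_index: float        # h_t: relative time of day, e.g. slot / slots_per_day
    level: float             # L_t: reservoir level
    pressure: float          # P_t: manifold pressure
    borrowed: float          # B_t: water transferred in from other waterworks
    supplied: float          # D_t: water drawn out of the reservoir
    switch_count: int        # N_t: pump switches so far today
    pump_states: np.ndarray  # u_{i,t}: on/off state of each pump (0/1)

def local_observation(slot: PumpStationSlot, agent_id: int) -> np.ndarray:
    """Build o_{i,t} = (h_t, L_t, P_t, B_t, D_t, N_t, u_{i,t}) for agent i."""
    return np.array([
        slot.time_index, slot.level, slot.pressure,
        slot.borrowed, slot.supplied, slot.switch_count,
        slot.pump_states[agent_id],
    ], dtype=np.float32)

# The global state s_t is the tuple of all agents' local observations:
# s_t = [local_observation(slot, i) for i in range(num_pumps)]
```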
Further, the behavior $a_t$ in the Markov game is expressed as follows:

$$a_t = (a_{1,t}, a_{2,t}, \dots, a_{N,t}),$$

where $N$ indicates the number of pumps that need to be controlled, $N$ is an integer, and $i$ takes $1, 2, \dots, N$. When $i \le K$ ($K$ is an integer smaller than $N$), agent $i$ controls a fixed-frequency pump, and $a_{i,t}$ is the on/off decision of the fixed-frequency pump in slot $t$: when $a_{i,t} = 0$ the fixed-frequency pump agent closes its pump, and when $a_{i,t} = 1$ it starts its pump. When $i > K$, agent $i$ controls a variable-frequency pump, and $a_{i,t}$ takes one of four discrete values encoding the frequency adjustment of the variable-frequency pump in slot $t$: switching the pump off, decreasing its frequency by a step $\Delta f$, increasing its frequency by $\Delta f$, or leaving its frequency unchanged.
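The low-dimensional discretization can be illustrated with a small decoder from a discrete action index to a pump command; the integer coding (0: off, 1: decrease, 2: increase, 3: hold), the clipping to a frequency range, and the function below are assumptions made for illustration.

```python
def apply_action(action: int, is_variable_frequency: bool,
                 current_freq: float, freq_step: float,
                 f_min: float, f_max: float) -> float:
    """Map a discrete action index to the pump's next frequency.

    Fixed-frequency pumps use actions {0: off, 1: on}; variable-frequency
    pumps use {0: off, 1: decrease by freq_step, 2: increase by freq_step,
    3: hold}. A returned frequency of 0.0 means the pump is off. The
    coding and the clipping to [f_min, f_max] are illustrative assumptions.
    """
    if not is_variable_frequency:
        return f_max if action == 1 else 0.0  # rated frequency or off
    if action == 0:
        return 0.0
    if action == 1:
        return max(current_freq - freq_step, f_min)
    if action == 2:
        return min(current_freq + freq_step, f_max)
    return current_freq  # action == 3: frequency unchanged
```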
Further, the reward function of the Markov game is expressed as follows:

$$r_t = -\left(C^{E}_t + \beta_1 C^{L}_t + \beta_2 C^{P}_t + \beta_3 C^{S}_t + \beta_4 C^{V}_t\right),$$

where $r_t$ is the reward received at the end of slot $t$ by the agent controlling each pump, and: $C^{E}_t$ is the penalty cost in slot $t$ associated with the energy consumption of the pump station; $C^{L}_t$ is the penalty cost in slot $t$ associated with the reservoir level violating the safety range; $C^{P}_t$ is the penalty cost in slot $t$ associated with the manifold pressure difference of the pump station violating its safety range; $C^{S}_t$ is the penalty cost in slot $t$ related to the joint switching cost of the pump group; and $C^{V}_t$ is the penalty in slot $t$ caused by the joint number of switching operations violating the safety range. $\beta_1$ is the importance coefficient of the penalty caused by the reservoir level violating the safety range relative to the energy-related penalty cost, $\beta_2$ is that of the penalty caused by the manifold pressure difference violating the safety range, $\beta_3$ is that of the pump switching penalty, and $\beta_4$ is that of the penalty caused by the number of switching operations violating the safety range.
Further, the water intake pump station scheduling environment model is constructed as follows:

$$E_t = f_E(s_t, a_t),\qquad L_{t+1} = f_L(s_t, a_t),\qquad P_{t+1} = f_P(s_t, a_t),$$

where $L_{t+1}$ is the reservoir level in slot $t+1$; $P_{t+1}$ is the manifold pressure of the pump station in slot $t+1$; $E_t$ is the energy consumption of the pump station in slot $t$; the model inputs contain, among others, the borrowed water amount $B_t$ and the water supply amount $D_t$ of the reservoir in slot $t$; $f_E$ is the energy-consumption-prediction long short-term memory (LSTM) network trained on real historical operating data; $f_L$ is the level-prediction LSTM network trained on real historical operating data; and $f_P$ is the manifold-pressure-prediction LSTM network trained on real historical operating data.
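As a sketch of what such a data-driven surrogate might look like, the PyTorch module below predicts one of the three target quantities from a short history of concatenated states and actions; the layer sizes, input layout, and training loss are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

class LSTMPredictor(nn.Module):
    """Surrogate model f(s, a) -> scalar (energy E_t, level L_{t+1}, or
    manifold pressure P_{t+1}); one instance is trained per target on
    historical operating data. Sizes and layout are illustrative."""

    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, history_len, input_dim) of concatenated state-action
        out, _ = self.lstm(seq)
        return self.head(out[:, -1, :]).squeeze(-1)  # predict from last step

# Training on historical (state, action) -> target pairs, e.g. for energy:
# model = LSTMPredictor(input_dim=state_dim + action_dim)
# loss = nn.functional.mse_loss(model(batch_seq), batch_energy)
```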
Further, the multiple agents for the water intake pump station system are arranged as follows: the number of agents equals the number of pumps, and each pump is controlled by one agent. Each agent contains one actor network, one target actor network, one critic network, one target critic network, and one attention network. The actor network and target actor network of each agent have the same structure, and the critic network and target critic network of each agent have the same structure.
Specifically, the actor network of agent $i$ (i.e., the agent corresponding to pump $i$) is a multi-layer deep neural network whose input is the local observation $o_{i,t}$ and whose output is the action $a_{i,t}$; the hidden layers of the deep neural network use the leaky ReLU activation function, and the output layer uses the normalized exponential (softmax) function. The critic network inside each agent contains three perceptron modules, namely a first, a second, and a third perceptron module. The input of the first perceptron module is the local observed state $o_i$, and its output is the observation encoding $e^{o}_i$. The input of the second perceptron module is the local observed state $o_i$ together with the behavior $a_i$, and its output is the joint encoding of observation and behavior $e_i$. The outputs of the second perceptron modules of all agents' critic networks serve as the input of the attention network; the attention network returns the contribution $x_i$ of the other agents to the current agent $i$. The contribution $x_i$ and the output $e^{o}_i$ of the first perceptron module serve as the input of the third perceptron module, whose output is the action-value function $Q^{\psi}_i(o, a)$ of all agents' current behaviors, where $\psi$ represents the weight parameters shared by all agents' critic networks and $g_i$ represents the multilayer perceptron of agent $i$.
The attention network internally contains $N$ sub-networks with the same structure, corresponding to the $N$ agents. Taking sub-network $i$ as an example, its input comprises the outputs $e_1, \dots, e_N$ of the second perceptron modules of all agents' critic networks, and it outputs the contribution $x_i$ of all other agents to agent $i$. The contribution is the weighted sum of the other agents' second-perceptron outputs, each linearly transformed and passed through a single-layer perceptron, namely:

$$x_i = \sum_{j \ne i} \alpha_j\, \mathrm{LeakyReLU}(V e_j),$$

where the weighting coefficient $\alpha_j$ reflects the similarity between the output $e_i$ of the second perceptron module in agent $i$'s critic network and the output $e_j$ of the second perceptron module in agent $j$'s critic network; $V$ is a shared matrix, and $\mathrm{LeakyReLU}$ is the leaky ReLU activation function. $W_q$ and $W_k$ are shared matrices that linearly transform $e_i$ and $e_j$ respectively, and

$$\alpha_j \propto \exp\left(e_j^{\top} W_k^{\top} W_q\, e_i\right).$$
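The attention computation for a single agent can be sketched directly from the two formulas above; the single-head form, tensor shapes, and variable names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_contribution(i: int, e: torch.Tensor,
                           W_q: torch.Tensor, W_k: torch.Tensor,
                           V: torch.Tensor) -> torch.Tensor:
    """Compute x_i = sum_{j != i} alpha_j * LeakyReLU(V e_j), with
    alpha obtained by a softmax over e_j^T W_k^T W_q e_i.

    e: (num_agents, dim) stacked outputs of the second perceptron modules;
    W_q, W_k, V: (dim, dim) shared matrices. Shapes are illustrative.
    """
    query = W_q @ e[i]                # transform agent i's encoding
    keys = e @ W_k.T                  # transform all agents' encodings
    scores = keys @ query             # similarity e_j^T W_k^T W_q e_i
    scores[i] = float("-inf")         # exclude agent i itself
    alpha = F.softmax(scores, dim=0)  # weighting coefficients alpha_j
    values = F.leaky_relu(e @ V.T)    # LeakyReLU(V e_j) for every agent
    return alpha @ values             # weighted sum -> contribution x_i
```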
further, the deep reinforcement learning training process of the multi-agent comprises the following steps:
(1) And obtaining the current environment state according to the historical operating data of the water taking pump station.
(2) And the actor network of each agent outputs the current behavior of each water taking pump according to the current environment state.
(3) And according to the current environment state and the current behavior, utilizing a water taking pump station scheduling environment model to obtain the energy consumption, the liquid level of the next time slot and the total pipe pressure of the next time slot under the state and the behavior, and utilizing the information to reconstruct the environment state and the reward of the next time slot.
(4) And sending the current environment state, the current behavior, the next time slot environment state and the next time slot reward to an experience pool.
(5) If the weight parameters of the deep neural network in the intelligent body need to be updated, extracting a small batch of training samples from the experience pool, and updating the weight of the critic network by utilizing a multi-intelligent-body actor-attention-critic reinforcement learning algorithm and then updating the actor network.
Specifically, the critic network parameters are updated by minimizing the following joint loss function:

$$L_Q(\psi) = \sum_{i=1}^{N} \mathbb{E}_{(s, a, r, s') \sim D}\left[\left(Q^{\psi}_i(s, a) - y_i\right)^2\right],\qquad y_i = r_i + \gamma\, \mathbb{E}_{a' \sim \pi_{\bar\theta}(s')}\left[Q^{\bar\psi}_i(s', a') - \alpha \log \pi_{\bar\theta_i}(a'_i \mid o'_i)\right],$$

where $L_Q(\psi)$ is the joint loss function; $D$ is the experience pool used to store the tuples $(s, a, r, s')$; $\mathbb{E}$ represents the expectation operation; $\gamma$ represents the discount coefficient; $\bar\theta = (\bar\theta_1, \dots, \bar\theta_N)$ is the vector of parameters of all target actor networks, with $\bar\theta_i$ representing the target actor network parameters of agent $i$; $\psi$ and $\bar\psi$ represent the weight parameters shared by all agents' critic networks and target critic networks, respectively; $\alpha$ is the temperature parameter that balances maximum entropy against maximum reward; and the inner expectation is the expected value when the chosen next action complies with the target actor network policy $\pi_{\bar\theta}$.
The actor networks are updated by gradient ascent; the specific gradient formula is:

$$\nabla_{\theta_i} J(\pi_{\theta}) = \mathbb{E}_{s \sim D,\, a \sim \pi_{\theta}}\left[\nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid o_i)\left(-\alpha \log \pi_{\theta_i}(a_i \mid o_i) + Q^{\psi}_i(s, a) - b(s, a_{\setminus i})\right)\right],$$

where $\pi_{\theta_i}$ represents the policy function of agent $i$'s actor network (i.e., the probability-distribution mapping from the observed state $o_{i,t}$ to the action $a_{i,t}$); $a_{\setminus i}$ denotes the behaviors of the agents other than $i$, and $b(s, a_{\setminus i})$ is the multi-agent baseline obtained by averaging the action value over agent $i$'s own behaviors while the other agents' behaviors are held fixed; $\nabla_{\theta_i}$ denotes the gradient with respect to the parameters of agent $i$'s actor network; the expectation is taken over actions that comply with the actor network policies; and the logarithmic term enters through its partial derivative with respect to $\theta_i$.
(6) After the weight parameters of the agents' deep neural networks are updated, judge whether the training process is finished; if not, jump to step (1); otherwise terminate training and use each trained actor network as the optimal policy of the corresponding agent (i.e., the function mapping from local observed state to pump control action) for control deployment at the actual water intake pump station.
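A simplified single-step sketch of the two updates follows. It assumes a shared critic module whose call conventions (`critic(obs, acts)` returning per-agent Q-values and a hypothetical `critic.per_action` helper returning Q-values over one agent's discrete actions) are invented for illustration; only the overall structure mirrors the update rules above.

```python
import torch

def maac_update(batch, agents, critic, target_agents, target_critic,
                critic_opt, actor_opts, gamma=0.99, alpha=0.01):
    """One simplified multi-agent actor-attention-critic update.
    All module interfaces are illustrative assumptions."""
    obs, acts, rews, next_obs = batch  # each a list indexed by agent

    # Critic update: minimize the joint TD loss against the soft target.
    with torch.no_grad():
        next_acts, next_logp = [], []
        for i, agent in enumerate(target_agents):
            dist = torch.distributions.Categorical(logits=agent(next_obs[i]))
            a = dist.sample()
            next_acts.append(a)
            next_logp.append(dist.log_prob(a))
        q_next = target_critic(next_obs, next_acts)
        targets = [rews[i] + gamma * (q_next[i] - alpha * next_logp[i])
                   for i in range(len(agents))]
    q = critic(obs, acts)
    critic_loss = sum(((q[i] - targets[i]) ** 2).mean()
                      for i in range(len(agents)))
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: soft policy gradient with a counterfactual baseline.
    for i, agent in enumerate(agents):
        dist = torch.distributions.Categorical(logits=agent(obs[i]))
        a_i = dist.sample()
        logp = dist.log_prob(a_i)
        new_acts = [a_i if j == i else acts[j] for j in range(len(agents))]
        q_all = critic.per_action(obs, new_acts, agent=i)  # Q over i's actions
        q_i = q_all.gather(-1, a_i.unsqueeze(-1)).squeeze(-1)
        baseline = (dist.probs * q_all).sum(-1)   # E_{a_i ~ pi_i}[Q_i]
        advantage = (q_i - baseline).detach()
        actor_loss = (logp * (alpha * logp.detach() - advantage)).mean()
        actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()
```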
Beneficial effects: compared with the prior art, the energy-saving scheduling method for water intake pump stations based on multi-agent deep reinforcement learning has the following advantages:
(1) Compared with scheduling methods based on nonlinear programming, dynamic programming, and the like, the method does not require an explicit dynamic model of the intake pump station. Unlike prediction-based scheduling methods, the trained multi-agent policy outputs pump station control decisions from the observed state of the current time slot alone, so no uncertain parameter needs to be predicted. Moreover, executing the policy only involves forward passes through multi-layer deep neural networks and takes milliseconds, so the computational complexity is extremely low. The method therefore has strong universality.
(2) Compared with single-agent pump scheduling methods based on reinforcement learning, the method uses the attention mechanism among multiple agents to achieve efficient coordinated scheduling of multiple intake pumps, and can significantly reduce energy consumption while maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within the safety range. The method is therefore highly efficient.
Drawings
Fig. 1 is a flowchart of the water intake pump station scheduling control method provided by the invention.
Fig. 2 shows the convergence of the training curve for an embodiment of the method of the invention.
Fig. 3 compares the average energy consumption of an embodiment of the method with other schemes.
Fig. 4 compares the average reservoir level limit violations of an embodiment of the method with other schemes.
Fig. 5 compares the average manifold pressure difference violations of an embodiment of the method with other schemes.
Fig. 6 compares the average number of pump switching operations of an embodiment of the method with other schemes.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely for illustrating the technical solutions of the present invention more clearly, and the scope of the present invention should not be limited thereby.
As shown in fig. 1, the energy-saving scheduling control method for water intake pump stations based on multi-agent deep reinforcement learning provided by the invention comprises the following steps:
step 1: on the premise of maintaining the liquid level of the water storage tank, the pressure difference of the main pipe and the switching times of the water pump in a safety range, the problem of minimizing the total energy consumption of the water taking pump station is converted into a Markov game, and a corresponding environment state, behavior and reward function are designed.
And 2, step: and constructing a water intake pump station dispatching environment model by using historical operation data and a long-term and short-term memory network.
And step 3: deep reinforcement learning agents are trained based on a dispatch environment model and a multi-agent actor-attention-critic reinforcement learning algorithm.
And 4, step 4: and deploying the intelligent agent strategy obtained by training into an actual system.
In step 1, the behaviors of the Markov game are the frequency decisions of the variable-frequency pumps and the on/off decisions of the fixed-frequency pumps; the constraints to be considered concern the reservoir level, the manifold pressure difference, and the number of pump switching operations, as follows:

(1) The reservoir level $L_t$ stays within the safety range, i.e.,

$$L_{\min} \le L_t \le L_{\max},$$

where $L_{\max}$ and $L_{\min}$ represent the upper and lower limits of the safe reservoir level, respectively.

(2) The manifold pressure difference is smaller than its upper limit, i.e.,

$$|P_t - P_{t-1}| \le \Delta P_{\max},$$

where $\Delta P_{\max}$ indicates the highest manifold pressure difference the pumps can accept, $P_t$ represents the manifold pressure in time slot $t$, and $P_{t-1}$ represents the manifold pressure in time slot $t-1$.

(3) The number of pump switching operations within a day, $N_t$, stays within the safe switching range, i.e.,

$$N_t \le N_{\max},$$

where $N_{\max}$ indicates the highest number of switching operations acceptable within a day.
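A small helper that evaluates these three safety conditions for one transition might look as follows; the function and argument names mirror the symbols above and are illustrative.

```python
def safety_violations(level, pressure, prev_pressure, switch_count,
                      level_min, level_max, dp_max, switches_max):
    """Return the three constraint violations as non-negative magnitudes:
    reservoir level outside [level_min, level_max], manifold pressure
    difference above dp_max, and daily switch count above switches_max.
    A zero value means the corresponding constraint is satisfied."""
    level_violation = max(level_min - level, 0.0) + max(level - level_max, 0.0)
    pressure_violation = max(abs(pressure - prev_pressure) - dp_max, 0.0)
    switch_violation = max(switch_count - switches_max, 0)
    return level_violation, pressure_violation, switch_violation
```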
In step 1 above, the Markov game is defined by states, behaviors, state transition functions, and reward functions. In the Markov game, each agent maximizes its expected return (i.e., the expected value of the cumulative reward) based on the current state and the selected behavior. The environment state, behavior, and reward function of the Markov game are designed as follows:

(1) Environment state. The environment state of the multi-agent system in time slot $t$ is designed as

$$s_t = (o_{1,t}, o_{2,t}, \dots, o_{N,t}).$$

In slot $t$, the agent $i$ associated with the frequency decision of pump $i$ uses the local observed state $o_{i,t} = (h_t, L_t, P_t, B_t, D_t, N_t, u_{i,t})$, whose components respectively represent: $h_t$ is the relative time index within a day of the current absolute time of slot $t$; $L_t$ is the reservoir level in slot $t$; $P_t$ is the manifold pressure of the pump station in slot $t$; $B_t$ is the borrowed water amount of the reservoir in slot $t$; $D_t$ is the water supply amount of the reservoir in slot $t$; $N_t$ is the number of pump switching operations of the pump station within the day up to slot $t$; and $u_{i,t}$ is the on/off state in slot $t$ of the pump controlled by agent $i$.
(2) Behavior. The behavior in time slot $t$ is denoted by

$$a_t = (a_{1,t}, a_{2,t}, \dots, a_{N,t}),$$

with $a_{i,t}$ defined as above for the fixed-frequency and variable-frequency pump agents.
(3) Reward function. The reward in time slot $t$ of the agent associated with the $i$-th pump of the water intake pump station comprises five components:

1. The penalty related to the energy consumption of the pump station in slot $t$:

$$C^{E}_t = E_t.$$

2. The penalty caused by the reservoir level crossing its limits in slot $t$:

$$C^{L}_t = \max(L_{\min} - L_{t+1},\, 0) + \max(L_{t+1} - L_{\max},\, 0).$$

3. The penalty caused by violating the safe manifold pressure difference range in slot $t$:

$$C^{P}_t = \max\left(\left|P_{t+1} - P_t\right| - \Delta P_{\max},\, 0\right).$$

4. The penalty caused by switching pumps on or off in slot $t$:

$$C^{S}_t = \sum_{i=1}^{N} \left|u_{i,t+1} - u_{i,t}\right|,$$

where $u_{i,t}$ indicates the on/off state of pump $i$ in slot $t$ ($u_{i,t} = 0$ means pump $i$ is off in slot $t$, and $u_{i,t} = 1$ means it is on), and $u_{i,t+1}$ indicates the on/off state of pump $i$ in slot $t+1$ ($u_{i,t+1} = 0$ off, $u_{i,t+1} = 1$ on).

5. The penalty caused by the pump station violating its safe switching range in slot $t$:

$$C^{V}_t = \max(N_{t+1} - N_{\max},\, 0).$$

The reward is then $r_t = -\left(C^{E}_t + \beta_1 C^{L}_t + \beta_2 C^{P}_t + \beta_3 C^{S}_t + \beta_4 C^{V}_t\right)$, where $\beta_1$ is the importance coefficient of the penalty caused by the reservoir level violating the safety range relative to the energy-related penalty cost, $\beta_2$ is that of the penalty caused by the manifold pressure difference violating the safety range, $\beta_3$ is that of the pump switching penalty, and $\beta_4$ is that of the penalty caused by the number of switching operations violating the safety range.
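Putting the five components together, a per-slot reward computation could be sketched as below; the default coefficients and the max()-style penalty forms follow the natural reading of the descriptions above and should be treated as assumptions.

```python
import numpy as np

def slot_reward(energy, level_next, pressure, pressure_next,
                pump_states, pump_states_next, switch_count_next,
                level_min, level_max, dp_max, switches_max,
                beta=(1.0, 1.0, 1.0, 1.0)):
    """Reward r_t = -(C_E + b1*C_L + b2*C_P + b3*C_S + b4*C_V).
    The beta defaults and exact penalty forms are assumptions."""
    c_energy = energy
    c_level = max(level_min - level_next, 0.0) + max(level_next - level_max, 0.0)
    c_pressure = max(abs(pressure_next - pressure) - dp_max, 0.0)
    c_switch = int(np.sum(np.abs(pump_states_next - pump_states)))
    c_violate = max(switch_count_next - switches_max, 0)
    b1, b2, b3, b4 = beta
    return -(c_energy + b1 * c_level + b2 * c_pressure
             + b3 * c_switch + b4 * c_violate)
```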
In step 2, the goal of the water intake pump station scheduling system is to minimize the energy consumption of the pump station while maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within the safety range. To establish the scheduling environment model, historical data and long short-term memory (LSTM) networks are employed. Specifically, given the state and action of the pump station as inputs, the LSTM networks output the scheduled energy consumption $E_t$, the reservoir level $L_{t+1}$, and the manifold pressure $P_{t+1}$.
In step 3, the multi-agent actor-attention-critic reinforcement learning algorithm is used to train the scheduling system toward optimal decisions that keep the reservoir level, the manifold pressure difference, and the number of pump switching operations within the safety range. The specific steps for training the deep reinforcement learning agents are:
(1) Obtain the current environment state from the historical operating data of the pump station;
(2) The actor network of each agent outputs the current behavior of the pump station according to the current environment state;
(3) According to the current environment state and the current behavior, use the scheduling environment model to obtain the energy consumption, the next-slot reservoir level, and the next-slot manifold pressure under that state and behavior, and use this information to construct the next-slot environment state and the reward;
(4) Send the current environment state, the current behavior, the next-slot environment state, and the reward to the experience pool;
(5) If the weight parameters of the deep neural networks inside the agents need updating, draw a mini-batch of training samples from the experience pool, update the network weights with the multi-agent actor-attention-critic reinforcement learning algorithm, and then judge whether the training process is finished; if not, jump to step (1), otherwise terminate training. The trained actor networks are used for actual deployment as the optimal policy of each agent (i.e., the function mapping from local observed state to pump control action).
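The loop tying these five steps together can be sketched as follows; `env_model`, the agents' `act` method, and the injected `update_fn` stand for the components described above, and their interfaces are assumptions.

```python
import random
from collections import deque

def train(agents, env_model, update_fn, episodes, steps_per_episode,
          batch_size=256, update_every=4):
    """Skeleton of the training procedure in steps (1)-(5); all component
    interfaces are illustrative assumptions."""
    replay = deque(maxlen=100_000)  # experience pool
    for ep in range(episodes):
        state = env_model.reset_from_history()  # step (1): state from data
        for step in range(steps_per_episode):
            # step (2): each actor picks its pump's behavior
            actions = [agent.act(obs) for agent, obs in zip(agents, state)]
            # step (3): the LSTM surrogate yields energy and next-slot
            # level/pressure, from which next state and reward are built
            next_state, reward = env_model.step(state, actions)
            replay.append((state, actions, reward, next_state))  # step (4)
            state = next_state
            if len(replay) >= batch_size and step % update_every == 0:
                # step (5): critic update first, then the actors
                update_fn(random.sample(replay, batch_size))
```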
Compared with the prior art, embodiments of the invention provide the following beneficial effects:
1) The method is universal. Because the obtained agent policies derive the control decision of each intake pump from the observed state of the current time slot alone, the method needs neither prior information about, nor prediction of, any uncertain system parameter (such as the water supply amount), nor an explicit model of the pump station scheduling mechanism;
2) The method is efficient. Compared with the existing scheduling method, it reduces energy consumption by 12.8% while maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within the safety range.
Fig. 2 shows the convergence of the training curve in an embodiment of the invention. The training reward shows an overall growing trend and gradually stabilizes.
Fig. 3 compares the energy consumption of the embodiment with other schemes, where scheme 1 is the real scheduling scheme of the intake pump station. The water intake, water supply, and pump station parameter data used by the invention all come from actual water plant data from November 1, 2020 to April 30, 2021. Compared with scheme 1, the method saves 12.8% of average energy consumption.
Fig. 4 compares the average reservoir level limit violations of the embodiment with other schemes. Compared with scheme 1, the average level limit violation under the method is reduced by 66.2%.
Fig. 5 compares the average manifold pressure difference of the embodiment with other schemes. The method yields a smaller manifold pressure difference than scheme 1, and its manifold pressure difference always stays within the safety range.
Fig. 6 compares the average number of pump switching operations of the embodiment with other schemes. Compared with scheme 1, the method reduces the number of pump switches by 50%.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (4)

1. A multi-agent deep reinforcement learning-based energy-saving scheduling method for the water intake pump station of a water plant, characterized by comprising the following steps:
Step 1: on the premise of maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within a safety range, model the total energy consumption minimization problem of the water intake pump station as a Markov game, design the corresponding environment state, behavior, and reward function of the Markov game, and construct the multiple agents for the water intake pump station system;
Step 2: construct a water intake pump station scheduling environment model using historical operating data and long short-term memory networks;
Step 3: train the multiple agents by deep reinforcement learning based on the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm;
Step 4: deploy the trained multi-agent policies in the actual water intake pump station system;
the total energy consumption minimization problem of the water intake pump station is expressed as:

$$\min_{a_t}\ \mathbb{E}\left[\sum_{t=1}^{T} E_t\right]\quad \text{s.t.}\quad L_{\min} \le L_t \le L_{\max},\quad |P_t - P_{t-1}| \le \Delta P_{\max},\quad N_t \le N_{\max},$$

where $E_t$ is the total energy consumption of the pump station in time slot $t$, $t = 1, 2, \dots, T$, and $T$ represents the total number of optimized time slots; $\mathbb{E}$ is the expectation operator; $a_t$ is the working frequency of the variable-frequency pumps or the state of the fixed-frequency pumps of the pump station in slot $t$; $L_t$ is the reservoir level in slot $t$, and $L_{\min}$ and $L_{\max}$ are the lowest and highest levels of the reservoir safety range; $P_t$ is the manifold pressure of the pump station in slot $t$, and $P_{t-1}$ is the manifold pressure of the pump station in slot $t-1$; $\Delta P_{\max}$ is the highest manifold pressure difference within the safety range; $N_t$ denotes the number of pump switching operations within the day up to slot $t$; and $N_{\max}$ is the maximum number of switching operations of the pump station within a day in the safety range;
the environment state $s_t$ in the Markov game is expressed as:

$$s_t = (o_{1,t}, o_{2,t}, \dots, o_{N,t}),\qquad o_{i,t} = (h_t, L_t, P_t, B_t, D_t, N_t, u_{i,t}),$$

where $i$ takes $1, 2, \dots, N$; $N$ indicates the number of pumps that need to be controlled and is also the total number of agents in the Markov game, each agent being responsible for controlling one intake pump; $s_t$ is the environment state of the multiple agents in slot $t$; $o_{i,t}$ denotes the local observed state of the $i$-th fixed-frequency or variable-frequency pump agent; $h_t$ is the relative time index within a day of the current absolute time of slot $t$; $L_t$ is the reservoir level in slot $t$; $P_t$ is the manifold pressure of the pump station in slot $t$; $B_t$ is the borrowed water amount of the reservoir in slot $t$; $D_t$ is the water supply amount of the reservoir in slot $t$; $N_t$ is the number of pump switching operations within the day up to slot $t$; and $u_{i,t}$ is the on/off state in slot $t$ of the pump controlled by agent $i$;
the behavior $a_t$ in the Markov game is expressed as:

$$a_t = (a_{1,t}, a_{2,t}, \dots, a_{N,t}),$$

where $N$ indicates the number of pumps that need to be controlled, $N$ is an integer, and $i$ takes $1, 2, \dots, N$; when $i \le K$, with $K$ an integer smaller than $N$, agent $i$ is a fixed-frequency pump agent and $a_{i,t}$ is the on/off decision of the fixed-frequency pump in slot $t$: when $a_{i,t} = 0$ the fixed-frequency pump agent closes its pump, and when $a_{i,t} = 1$ it starts its pump; when $i > K$, agent $i$ is a variable-frequency pump agent and $a_{i,t}$ takes one of four discrete values encoding the frequency adjustment of the variable-frequency pump in slot $t$, indicating respectively that the variable-frequency pump is switched off, that its frequency is decreased by $\Delta f$, that its frequency is increased by $\Delta f$, and that its frequency is unchanged;
the reward function in the Markov game is expressed as:

$$r_t = -\left(C^{E}_t + \beta_1 C^{L}_t + \beta_2 C^{P}_t + \beta_3 C^{S}_t + \beta_4 C^{V}_t\right),$$

where $r_t$ is the reward received at the end of slot $t$ by the agent controlling each pump; $C^{E}_t$ is the penalty cost in slot $t$ associated with the energy consumption of the pump station; $C^{L}_t$ is the penalty cost in slot $t$ associated with the reservoir level violating the safety range; $C^{P}_t$ is the penalty cost in slot $t$ associated with the manifold pressure difference of the pump station violating its safety range; $C^{S}_t$ is the penalty cost in slot $t$ related to the joint switching cost of the pump group; $C^{V}_t$ is the penalty in slot $t$ caused by the joint number of switching operations violating the safety range; $\beta_1$ is the importance coefficient of the penalty caused by the reservoir level violating the safety range relative to the energy-related penalty cost, $\beta_2$ is that of the penalty caused by the manifold pressure difference violating the safety range, $\beta_3$ is that of the pump switching penalty, and $\beta_4$ is that of the penalty caused by the number of switching operations violating the safety range;
the multiple agents for the water intake pump station system are arranged as follows: the number of agents equals the number of pumps, and each pump is controlled by one agent; each agent internally contains one actor network, one target actor network, one critic network, one target critic network, and one attention network; the actor network and target actor network of each agent have the same structure, and the critic network and target critic network of each agent have the same structure;
the actor network of agent $i$ takes the local observation $o_{i,t}$ as input and outputs the action $a_{i,t}$; the critic network inside each agent contains three perceptron modules, namely a first, a second, and a third perceptron module, wherein: the input of the first perceptron module is the local observed state $o_i$ and its output is the observation encoding $e^{o}_i$; the input of the second perceptron module is the local observed state $o_i$ together with the behavior $a_i$, and its output is the joint encoding of observation and behavior $e_i$; the outputs of the second perceptron modules of all agents' critic networks serve as the input of the attention network; the attention network returns the contribution $x_i$ of the other agents to the current agent $i$; the contribution $x_i$ and the output $e^{o}_i$ of the first perceptron module serve as the input of the third perceptron module, whose output is the action-value function $Q^{\psi}_i(o, a)$ of all agents' current state-behavior pairs, where $\psi$ represents the weight parameters shared by all agents' critic networks and $g_i$ represents the multilayer perceptron of agent $i$; the attention network internally contains $N$ sub-networks with the same structure, corresponding to the $N$ agents; the input of sub-network $i$ comprises the outputs $e_1, \dots, e_N$ of the second perceptron modules of all agents' critic networks, and sub-network $i$ outputs the contribution $x_i$ of all other agents to agent $i$; the contribution $x_i$ is the weighted sum of the other agents' second-perceptron outputs after each is linearly transformed and passed through a single-layer perceptron, namely:

$$x_i = \sum_{j \ne i} \alpha_j\, \mathrm{LeakyReLU}(V e_j),$$

where the weighting coefficient $\alpha_j \propto \exp\!\left(e_j^{\top} W_k^{\top} W_q\, e_i\right)$ reflects the similarity between the output $e_i$ of the second perceptron module in agent $i$'s critic network and the output $e_j$ of the second perceptron module in agent $j$'s critic network, $W_q$ and $W_k$ being shared matrices that linearly transform $e_i$ and $e_j$ respectively; $V$ is a shared matrix, and $\mathrm{LeakyReLU}$ is the leaky ReLU activation function.
2. The multi-agent deep reinforcement learning-based energy-saving scheduling method for a water plant water intake pump station according to claim 1, wherein the water intake pump station scheduling environment model is constructed as:

$$E_t = f_E(s_t, a_t), \qquad L_{t+1} = f_L(s_t, a_t, B_t, S_t), \qquad P_{t+1} = f_P(s_t, a_t)$$

where $s_t$ and $a_t$ denote the environment state and the joint pump behaviors at time slot $t$; $L_t$ is the liquid level of the water storage tank at time slot $t$; $P_t$ is the main pipe pressure of the water intake pump station at time slot $t$; $E_t$ is the energy consumption of the water intake pump station at time slot $t$; $B_t$ is the water borrowing amount of the water storage tank at time slot $t$; $S_t$ is the water supply amount of the water storage tank at time slot $t$; $f_E$ is the energy-consumption-prediction long short-term memory network trained with real historical operation data; $f_L$ is the liquid-level-prediction long short-term memory network trained with real historical operation data; and $f_P$ is the main-pipe-pressure-prediction long short-term memory network trained with real historical operation data.
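As a minimal sketch of one of these predictors (here $f_L$), assuming an LSTM that maps a window of recent operating records to the next-slot liquid level; the feature layout, window length, and layer sizes are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class LevelPredictor(nn.Module):
    """Sketch of f_L: predicts the next-slot tank liquid level from a
    window of past operating records (trained on historical data)."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # scalar: liquid level at t+1

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, T, n_features) — e.g. pump behaviors, level,
        # borrowing amount, supply amount over the last T time slots
        out, _ = self.lstm(window)
        return self.head(out[:, -1])  # regress from the last hidden state

# Hypothetical usage: batch of 8 windows, 12 slots, 6 features each
model = LevelPredictor(n_features=6)
pred = model(torch.randn(8, 12, 6))  # -> (8, 1) predicted next-slot levels
```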
3. The multi-agent deep reinforcement learning-based energy-saving scheduling method for a water plant water intake pump station according to claim 1, wherein the deep reinforcement learning training process of the multiple agents comprises the following steps (sketched in code after this claim):

Step 4.1: obtain the current environment state from the historical operating data of the water intake pump station;

Step 4.2: the actor network of each agent outputs the current behavior of its water intake pump according to the current environment state;

Step 4.3: given the current environment state and the current behaviors, use the water intake pump station scheduling environment model to obtain the energy consumption, the next-slot liquid level, and the next-slot main pipe pressure, and use this information to construct the environment state and the reward of the next time slot;

Step 4.4: send the current environment state, the current behaviors, the next-slot environment state, and the next-slot reward to the experience pool;

Step 4.5: if the weight parameters of the deep neural networks inside the agents need to be updated, extract a mini-batch of training samples from the experience pool, first update the critic network weights using the multi-agent actor-attention-critic reinforcement learning algorithm, and then update the actor networks;

Step 4.6: after the weight parameters of the agents' deep neural networks are updated, judge whether the training process is finished; if not, jump to Step 4.1; otherwise terminate training and use each trained actor network as the optimal strategy of the corresponding agent for deployment in the control of the actual water intake pump station.
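A schematic Python skeleton of steps 4.1 through 4.6 follows. The interfaces of `env_model`, `agents`, and the experience pool are invented here for illustration; only the control flow mirrors the claim.

```python
import random
from collections import deque

replay = deque(maxlen=100_000)  # experience pool (step 4.4)

def train(env_model, agents, episodes, steps_per_episode,
          batch_size=256, update_every=4):
    for _ in range(episodes):
        state = env_model.initial_state()  # step 4.1: from historical data
        for t in range(steps_per_episode):
            # step 4.2: each actor picks a behavior for its pump
            actions = [ag.act(state) for ag in agents]
            # step 4.3: learned model yields energy, next level, next pressure,
            # from which the next state and the reward are constructed
            next_state, reward = env_model.step(state, actions)
            # step 4.4: store the transition
            replay.append((state, actions, reward, next_state))
            # step 4.5: periodic MAAC update — critics first, then actors
            if len(replay) >= batch_size and t % update_every == 0:
                batch = random.sample(list(replay), batch_size)
                for ag in agents:
                    ag.update_critic(batch)
                for ag in agents:
                    ag.update_actor(batch)
            state = next_state
    # step 4.6: the trained actor networks are the deployable policies
    return [ag.actor for ag in agents]
```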
4. The multi-agent deep reinforcement learning-based energy-saving scheduling method for a water plant water intake pump station according to claim 3, wherein the critic network weights are updated by minimizing a joint loss function, calculated as:

$$L_Q(\psi) = \sum_{i=1}^{N} \mathbb{E}_{(o,a,r,o') \sim D}\left[ \left( Q_i^{\psi}(o,a) - y_i \right)^2 \right],$$
$$y_i = r_i + \gamma \, \mathbb{E}_{a' \sim \pi_{\bar{\theta}}(o')}\left[ Q_i^{\bar{\psi}}(o',a') - \alpha \log \pi_{\bar{\theta}_i}(a_i' \mid o_i') \right]$$

where $L_Q(\psi)$ is the joint loss function; $D$ is the experience pool storing the transitions $(o, a, r, o')$; $\mathbb{E}$ denotes expectation; $\gamma$ denotes the discount coefficient; $\bar{\theta} = (\bar{\theta}_1, \dots, \bar{\theta}_N)$ denotes the vector of parameters of all target actor networks, where $\bar{\theta}_i$ denotes the target actor network parameters of agent $i$; $\psi$ and $\bar{\psi}$ denote the shared weight parameters of all agents' critic networks and target critic networks, respectively; $\alpha$ is a temperature parameter that balances maximum entropy against maximum reward; $\mathbb{E}_{a' \sim \pi_{\bar{\theta}}(o')}$ denotes the expected value over behaviors selected according to the target actor network policy $\pi_{\bar{\theta}}$; $N$ denotes the total number of agents; $Q_i^{\psi}$ denotes the state-behavior value function of the critic network; $Q_i^{\bar{\psi}}$ denotes the next-slot state-behavior value function of the target critic network; and $r_i$ is the current reward.
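A hedged numerical sketch of this joint loss for one mini-batch follows, assuming the expectation over next-slot behaviors is approximated with sampled behaviors; the tensors `q`, `target_q`, and `target_log_pi` stand in for pre-computed critic, target-critic, and target-actor outputs:

```python
import torch

def joint_critic_loss(q, target_q, target_log_pi, reward, gamma=0.99, alpha=0.2):
    """Sketch of the joint loss L_Q(psi) summed over N agents.

    q:             (N, B)  Q_i(o, a) for the sampled behaviors
    target_q:      (N, B)  target-critic value Q_i(o', a') at the next slot
    target_log_pi: (N, B)  log pi(a'_i | o'_i) under the target actor
    reward:        (N, B)  per-agent reward r_i
    """
    with torch.no_grad():
        # soft target: reward + discounted (value minus temperature-weighted log-prob)
        y = reward + gamma * (target_q - alpha * target_log_pi)
    return ((q - y) ** 2).mean(dim=1).sum()  # sum of per-agent MSE losses

# Hypothetical usage with N = 3 agents and a batch of 256 transitions
N, B = 3, 256
loss = joint_critic_loss(torch.randn(N, B), torch.randn(N, B),
                         -torch.rand(N, B), torch.randn(N, B))
```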
The actor network weights are updated by gradient ascent, with the gradient computed as:

$$\nabla_{\theta_i} J(\pi_{\theta}) = \mathbb{E}_{o \sim D,\, a \sim \pi_{\theta}}\left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid o_i) \left( -\alpha \log \pi_{\theta_i}(a_i \mid o_i) + Q_i^{\psi}(o,a) - b(o, a_{\setminus i}) \right) \right]$$

where $\pi_{\theta_i}$ denotes the policy function of agent $i$'s actor network; $b(o, a_{\setminus i})$ is a baseline that averages the state-behavior value over agent $i$'s behaviors given the behaviors $a_{\setminus i}$ of the agents other than $i$; $\nabla_{\theta_i} J(\pi_{\theta})$ denotes the gradient of the actor network; $\mathbb{E}_{o \sim D,\, a \sim \pi_{\theta}}$ denotes the expected value over behaviors taken according to the actor network policy; and $\nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid o_i)$ denotes the partial derivative of the logarithm of the policy function.
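Illustrative only: a minimal policy-gradient step matching the formula above for discrete behaviors, taking the baseline $b(o, a_{\setminus i})$ as the policy-weighted average of the critic values over agent $i$'s candidate behaviors (one common choice; the patent text does not fix this detail):

```python
import torch

def actor_loss_agent_i(log_pi_all, a_i, q_all, alpha=0.2):
    """Sketch of agent i's actor objective (negated, so minimizing it ascends J).

    log_pi_all: (B, A) log-probabilities of each candidate behavior of agent i
    a_i:        (B,)   sampled behavior indices of agent i
    q_all:      (B, A) critic values Q_i(o, (a_i', a_-i)) for each candidate a_i'
    """
    pi_all = log_pi_all.exp()
    baseline = (pi_all * q_all).sum(dim=1)                      # b(o, a_-i)
    log_pi = log_pi_all.gather(1, a_i.unsqueeze(1)).squeeze(1)  # log pi(a_i | o_i)
    q = q_all.gather(1, a_i.unsqueeze(1)).squeeze(1)            # Q_i(o, a)
    advantage = (-alpha * log_pi + q - baseline).detach()       # treated as constant
    return -(log_pi * advantage).mean()                         # ascend => minimize negative

# Hypothetical usage: batch of 256 states, 4 candidate behaviors per pump
B, A = 256, 4
logits = torch.randn(B, A, requires_grad=True)
log_pi_all = torch.log_softmax(logits, dim=1)
loss = actor_loss_agent_i(log_pi_all, torch.randint(0, A, (B,)), torch.randn(B, A))
loss.backward()  # gradients for one ascent step on agent i's actor
```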

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211475230.4A CN115544899B (en) 2022-11-23 2022-11-23 Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning


Publications (2)

Publication Number Publication Date
CN115544899A (en) 2022-12-30
CN115544899B (en) 2023-04-07

Family

ID=84721315


Country Status (1)

Country Link
CN (1) CN115544899B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116187208B (en) * 2023-04-27 2023-08-01 深圳市广汇源环境水务有限公司 Drainage basin water quantity and quality joint scheduling method based on constraint reinforcement learning
CN116738874B (en) * 2023-05-12 2024-01-23 珠江水利委员会珠江水利科学研究院 Gate pump group joint optimization scheduling method based on Multi-Agent PPO reinforcement learning
CN117588394B (en) * 2024-01-18 2024-04-05 华土木(厦门)科技有限公司 AIoT-based intelligent linkage control method and system for vacuum pump

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114279042A (en) * 2021-12-27 2022-04-05 苏州科技大学 Central air conditioner control method based on multi-agent deep reinforcement learning
CN115289619A (en) * 2022-07-28 2022-11-04 安徽大学 Subway platform HVAC control method based on multi-agent deep reinforcement learning

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11657266B2 (en) * 2018-11-16 2023-05-23 Honda Motor Co., Ltd. Cooperative multi-goal, multi-agent, multi-stage reinforcement learning
US11586974B2 (en) * 2018-09-14 2023-02-21 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning in a multi-agent environment
US11295174B2 (en) * 2018-11-05 2022-04-05 Royal Bank Of Canada Opponent modeling with asynchronous methods in deep RL
CN109729528B (en) * 2018-12-21 2020-08-18 北京邮电大学 D2D resource allocation method based on multi-agent deep reinforcement learning
CN110458443B (en) * 2019-08-07 2022-08-16 南京邮电大学 Smart home energy management method and system based on deep reinforcement learning
CN111144793B (en) * 2020-01-03 2022-06-14 南京邮电大学 Commercial building HVAC control method based on multi-agent deep reinforcement learning
CN112491818B (en) * 2020-11-12 2023-02-03 南京邮电大学 Power grid transmission line defense method based on multi-agent deep reinforcement learning
CN112540535B (en) * 2020-11-13 2022-08-30 南京邮电大学 Office building thermal comfort control system and method based on deep reinforcement learning
US20220230080A1 (en) * 2021-01-20 2022-07-21 Honda Motor Co., Ltd. System and method for utilizing a recursive reasoning graph in multi-agent reinforcement learning
CN114362187B (en) * 2021-11-25 2022-12-09 南京邮电大学 Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
CN114357569A (en) * 2021-12-13 2022-04-15 南京邮电大学 Commercial building HVAC control method and system based on evolution deep reinforcement learning
CN114971819A (en) * 2022-03-28 2022-08-30 东北大学 User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning
CN115291625A (en) * 2022-07-15 2022-11-04 同济大学 Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning



Similar Documents

Publication Publication Date Title
CN115544899B (en) Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning
Huang Enhancement of hydroelectric generation scheduling using ant colony system based optimization approaches
CN111242443B (en) Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet
CN103729695A (en) Short-term power load forecasting method based on particle swarm and BP neural network
WO2023070293A1 (en) Long-term scheduling method for industrial byproduct gas system
CN116187601B (en) Comprehensive energy system operation optimization method based on load prediction
CN112012875B (en) Optimization method of PID control parameters of water turbine regulating system
Yang et al. Optimal energy operation strategy for we-energy of energy internet based on hybrid reinforcement learning with human-in-the-loop
CN112460741A (en) Control method of building heating, ventilation and air conditioning system
CN117057553A (en) Deep reinforcement learning-based household energy demand response optimization method and system
CN115345380A (en) New energy consumption electric power scheduling method based on artificial intelligence
CN113869795B (en) Long-term scheduling method for industrial byproduct gas system
CN115986839A (en) Intelligent scheduling method and system for wind-water-fire comprehensive energy system
CN114322208B (en) Intelligent park air conditioner load regulation and control method and system based on deep reinforcement learning
CN116436033A (en) Temperature control load frequency response control method based on user satisfaction and reinforcement learning
Yang et al. Data-driven optimal dynamic dispatch for hydro-PV-PHS integrated power system using deep reinforcement learning approach
CN115411776B (en) Thermoelectric collaborative scheduling method and device for residence comprehensive energy system
CN115526504A (en) Energy-saving scheduling method and system for water supply system of pump station, electronic equipment and storage medium
CN114372645A (en) Energy supply system optimization method and system based on multi-agent reinforcement learning
Salvador et al. Historian data based predictive control of a water distribution network
CN115239133A (en) Multi-heat-source heat supply system collaborative optimization scheduling method based on layered reinforcement learning
CN111275572B (en) Unit scheduling system and method based on particle swarm and deep reinforcement learning
Cheng et al. A cyber physical system model using genetic algorithm for actuators control
Xin et al. Genetic based fuzzy Q-learning energy management for smart grid
Silva et al. Framework for the development of a digital twin for solar water heating systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant