CN115544899B - Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning - Google Patents
- Publication number
- CN115544899B (application CN202211475230.4A)
- Authority
- CN
- China
- Prior art keywords
- water
- agent
- network
- time slot
- pump station
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an energy-saving scheduling method for the water intake pump station of a water plant based on multi-agent deep reinforcement learning, comprising the following steps: (1) subject to keeping the reservoir level, the manifold pressure difference and the number of pump switches within safe ranges, modeling the problem of minimizing the total energy consumption of the water intake pump station as a Markov game, and designing the corresponding environment state, action and reward function; (2) constructing a scheduling environment model of the water intake pump station from historical operating data and a long short-term memory (LSTM) network; (3) training deep reinforcement learning agents using the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm; (4) deploying the trained agent policies in the actual system. Compared with existing methods, the proposed method offers stronger system safety maintenance, greater energy-saving potential (up to 12.8%) and better generality.
Description
Technical Field
The invention relates to an energy-saving scheduling method for the water intake pump station of a water plant based on multi-agent deep reinforcement learning, and belongs to the intersection of water intake pump station scheduling and artificial intelligence.
Background
Water underpins sustainable economic and social development, yet water treatment plants consume a large amount of electricity because of outdated processes, equipment or treatment systems, and water intake pump stations account for the major part of that consumption. Traditional pump station scheduling relies mainly on engineering experience to adjust the combination of pumps in the pump set and the frequency of the variable-frequency pumps. Such adjustment is qualitative: labor costs are high, the energy-saving level is unstable, and energy consumption may even increase. In addition, frequently starting and stopping the pump set subjects the pipeline to large shocks, and rapid pressure changes readily cause water hammer. Reducing the energy consumption of the water intake pump station while guaranteeing water supply safety and meeting water demand is therefore important for lowering the operating cost of water plants, saving urban electricity and reducing carbon dioxide emissions.
For energy-saving optimal scheduling of water intake pump stations, existing research has proposed methods such as nonlinear programming, dynamic programming and genetic algorithms. Although these methods have certain advantages, they require an explicit scheduling model of the water intake pump station (e.g., an explicit expression relating total energy consumption to the pump station state and the scheduling decisions). The performance of a water intake pump station depends on many factors, including internal parameters (head, shaft power, motor speed, motor frequency, motor efficiency), the external environment (such as water intake and water supply), and hydraulic losses inside the pump caused by the finite number of blades, friction, impact and leakage, so it is very difficult to establish a scheduling model that is both accurate and easy to control. Furthermore, studies using the above methods do not account for the wear on the pumps caused by frequent switching of the pump set.
With the development of Internet of Things and artificial intelligence technologies, large amounts of historical operating data from the water intake pump stations of water plants are easy to obtain and can be put to effective use. For example, some work proposes data-driven scheduling methods for water intake pump stations, such as a method combining particle swarm optimization with support vector regression. However, this method must predict quantities such as the water supply over the next 24 hours and generate a pump set schedule recommendation table on a rolling basis, so it tends to introduce prediction errors and incurs a heavy computational load. Other work proposes control methods for water distribution systems based on reinforcement learning and deep reinforcement learning, using algorithms including Q-learning, the dueling deep Q-network, and knowledge-assisted proximal policy optimization. Although these methods do not require an explicit scheduling model of the water intake pump station, they do not address the energy-saving problem of the water intake pump station, and they control the pumps with a single agent. When the energy-saving scheduling of fixed-frequency and variable-frequency pumps is considered jointly, controlling the pumps directly with a single agent makes the agent's action space grow sharply and learning inefficient, so it is difficult to save energy effectively while keeping the pumps operating safely and meeting water demand.
Disclosure of Invention
To address the above shortcomings of the prior art, the invention provides an energy-saving scheduling method for the water intake pump station of a water plant based on multi-agent deep reinforcement learning, which aims to reduce the energy consumption of the pump station while maintaining safe system operation. The method discretizes the continuous action space of the variable-frequency pumps into a low-dimensional set and controls the water intake pump set with multiple agents. To train the agents efficiently, a scheduling environment model of the water intake pump station is constructed from historical operating data (it is a black-box model, so the white-box model required by existing research is not needed), and the multi-agent actor-attention-critic reinforcement learning algorithm is adopted as the training algorithm, yielding a scalable and efficient energy-saving scheduling method. The method does not need to predict any uncertain parameter or know an explicit scheduling model of the pump station, and has the advantages of low computational complexity and a significant energy-saving effect.
The energy-saving scheduling method for a water intake pump station based on multi-agent deep reinforcement learning disclosed by the invention comprises the following steps:
(1) Subject to keeping the reservoir level, the manifold pressure difference and the number of pump switches within safe ranges, model the problem of minimizing the total energy consumption of the water intake pump station as a Markov game, design the corresponding environment state, action and reward function of the Markov game, and construct the multiple agents for the water intake pump station system.
(2) Construct a scheduling environment model of the water intake pump station from historical operating data and a long short-term memory (LSTM) network.
(3) Train the multiple agents by deep reinforcement learning, using the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm.
(4) Deploy the trained multi-agent policies in the actual water intake pump station system.
Further, the expression of the problem of minimizing the total energy consumption of the water intake pump station is as follows:
$$\min_{\{a_t\}} \; \sum_{t=1}^{T} \mathbb{E}\left[E_t\right] \quad \text{s.t.} \quad H^{\min} \le H_t \le H^{\max}, \quad |P_t - P_{t-1}| \le \Delta P^{\max}, \quad N_t \le N^{\max},$$
where $E_t$ is the total energy consumption of the water intake pump station in time slot $t$ ($1 \le t \le T$, with $T$ the total number of time slots in the optimization horizon); $\mathbb{E}$ is the expectation operator, taken mainly over uncertain parameters (such as the water supply); $a_t$ is the operating frequency (for variable-frequency pumps) or on/off state (for fixed-frequency pumps) of the pump station in time slot $t$; $H_t$ is the reservoir level in time slot $t$, and $H^{\min}$ and $H^{\max}$ are the minimum and maximum levels of the reservoir's safe range; $P_t$ and $P_{t-1}$ are the manifold pressures of the pump station in time slots $t$ and $t-1$, and $\Delta P^{\max}$ is the highest manifold pressure difference within the safe range; $N_t$ is the number of pump switches accumulated during the day up to time slot $t$, and $N^{\max}$ is the maximum number of switches per day within the safe range.
The environment state of the Markov game is expressed as:
$$s_t = (o_{1,t}, o_{2,t}, \dots, o_{n,t}), \qquad o_{i,t} = (\tau_t, H_t, P_t, B_t, W_t, N_t, x_{i,t}),$$
where $i = 1, 2, \dots, n$, and $n$ is the number of pumps to be controlled and also the total number of agents in the Markov game (each agent controls one pump). Here $s_t$ is the multi-agent environment state in time slot $t$; $o_{i,t}$ is the local observation of the $i$-th fixed-frequency or variable-frequency pump agent; $\tau_t$ is the relative time index within the day of the current absolute time of time slot $t$; $H_t$ is the reservoir level in time slot $t$; $P_t$ is the manifold pressure of the pump station in time slot $t$; $B_t$ is the amount of water borrowed by the reservoir in time slot $t$ (i.e., water transferred into the reservoir from other water plants); $W_t$ is the reservoir water supply in time slot $t$ (i.e., water drawn out of the reservoir); $N_t$ is the number of pump switches accumulated during the day up to time slot $t$; and $x_{i,t}$ is the on/off state in time slot $t$ of the pump controlled by agent $i$.
The action of the Markov game is expressed as:
$$a_t = (a_{1,t}, a_{2,t}, \dots, a_{n,t}),$$
where $n$ is the number of pumps to be controlled, $m$ is an integer less than $n$, and $i = 1, 2, \dots, n$. When $1 \le i \le m$, agent $i$ controls a fixed-frequency pump and $a_{i,t}$ is the on/off decision for that pump in time slot $t$: one value closes the pump and the other starts it. When $m < i \le n$, agent $i$ controls a variable-frequency pump and $a_{i,t}$ is the increase or decrease of the pump frequency in time slot $t$, taken from a discrete set whose elements respectively turn the pump off, lower the frequency by a preset step, raise the frequency by a preset step, or leave the frequency unchanged.
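The discretized action sets described above can be sketched as follows. This is a minimal illustration, not the patent's encoding: the action labels, the 1 Hz step, the 30 to 50 Hz range, the restart-at-minimum rule and the names `PumpState`, `apply_variable` and `joint_action_count` are all assumptions made for the example. The last function shows why a single agent controlling every pump faces a rapidly growing action space.

```python
from dataclasses import dataclass

# Hypothetical action labels; the patent's own numeric encoding is not reproduced.
FIXED_ACTIONS = ("off", "on")
VARIABLE_ACTIONS = ("off", "decrease", "hold", "increase")


@dataclass
class PumpState:
    running: bool
    frequency_hz: float  # 0.0 when the pump is off


def apply_variable(state, action_index, step_hz=1.0, f_min=30.0, f_max=50.0):
    """Apply one discrete action to a variable-frequency pump (assumed limits)."""
    action = VARIABLE_ACTIONS[action_index]
    if action == "off":
        return PumpState(False, 0.0)
    # Assumption: a pump that was off restarts at the minimum frequency.
    base = state.frequency_hz if state.running else f_min
    if action == "decrease":
        base -= step_hz
    elif action == "increase":
        base += step_hz
    return PumpState(True, min(f_max, max(f_min, base)))


def joint_action_count(num_fixed, num_variable):
    """Size of the joint action space a single agent would have to search."""
    return len(FIXED_ACTIONS) ** num_fixed * len(VARIABLE_ACTIONS) ** num_variable
```

With two fixed-frequency and three variable-frequency pumps, a single agent would choose among `joint_action_count(2, 3)` joint actions, whereas each agent in the multi-agent setting chooses among only two or four.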
Further, the reward function of the Markov game is expressed as:
$$r_{i,t} = -\left(C_{1,t} + \beta_1 C_{2,t} + \beta_2 C_{3,t} + \beta_3 C_{4,t} + \beta_4 C_{5,t}\right),$$
where $r_{i,t}$ is the reward received at the end of time slot $t$ by the agent controlling each pump; $C_{1,t}$ is the penalty cost in time slot $t$ related to the energy consumption of the water intake pump station; $C_{2,t}$ is the penalty cost in time slot $t$ for the reservoir level violating its safe range; $C_{3,t}$ is the penalty cost in time slot $t$ for the manifold pressure difference of the pump station violating its safe range; $C_{4,t}$ is the penalty cost in time slot $t$ related to the switching cost of the pump combination; and $C_{5,t}$ is the penalty in time slot $t$ for the number of pump combination switches violating its safe range. $\beta_1$ is the importance coefficient of the reservoir level violation penalty relative to the energy-related penalty cost; $\beta_2$ is the importance coefficient of the manifold pressure difference violation penalty relative to the energy-related penalty cost; $\beta_3$ is the importance coefficient of the pump switching penalty relative to the energy-related penalty cost; and $\beta_4$ is the importance coefficient of the switching count violation penalty relative to the energy-related penalty cost.
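As a rough illustration of how the importance coefficients trade the safety penalties off against energy, the weighted combination can be written as a few lines of Python. The function name `reward_value`, the parameter names, and the convention that each cost is non-negative and the reward is the negative weighted sum are assumptions for this sketch; the individual cost formulas are not reproduced here.

```python
def reward_value(energy_cost, level_cost, pressure_cost, switch_cost, count_cost,
                 b1=1.0, b2=1.0, b3=1.0, b4=1.0):
    """Negative weighted sum of the five penalty costs (assumed sign convention).

    b1..b4 weight the level, pressure-difference, switching and switch-count
    penalties relative to the energy-related penalty, which has weight 1.
    """
    weighted = (energy_cost
                + b1 * level_cost
                + b2 * pressure_cost
                + b3 * switch_cost
                + b4 * count_cost)
    return -weighted
```

Raising a coefficient makes the corresponding violation more expensive relative to energy, so the trained policy is pushed toward safer but possibly less economical schedules.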
Further, the scheduling environment model of the water intake pump station is constructed as follows:
$$E_t = \mathcal{F}_E(H_t, P_t, B_t, W_t, a_t), \qquad H_{t+1} = \mathcal{F}_H(H_t, P_t, B_t, W_t, a_t), \qquad P_{t+1} = \mathcal{F}_P(H_t, P_t, B_t, W_t, a_t),$$
where $H_t$ is the reservoir level in time slot $t$; $P_t$ is the manifold pressure of the pump station in time slot $t$; $E_t$ is the energy consumption of the pump station in time slot $t$; $B_t$ is the amount of water borrowed by the reservoir in time slot $t$; $W_t$ is the reservoir water supply in time slot $t$; $a_t$ is the pump action in time slot $t$; $\mathcal{F}_E$ is the energy consumption prediction long short-term memory (LSTM) network trained on real historical operating data; $\mathcal{F}_H$ is the reservoir level prediction LSTM network trained on real historical operating data; and $\mathcal{F}_P$ is the manifold pressure prediction LSTM network trained on real historical operating data.
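One way to wire three learned predictors into a simulation step is sketched below. The class name `SurrogateEnv`, the feature ordering and the sliding `window` of past steps are assumptions for illustration; the predictors are passed in as plain callables so that trained LSTM networks, or simple stand-ins for testing, can be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

Features = Tuple[float, ...]
Predictor = Callable[[Sequence[Features]], float]


class SurrogateEnv:
    """Black-box scheduling environment built from three sequence predictors."""

    def __init__(self, energy_model: Predictor, level_model: Predictor,
                 pressure_model: Predictor, window: int = 4):
        self.energy_model = energy_model
        self.level_model = level_model
        self.pressure_model = pressure_model
        self.window = window  # number of past steps fed to each predictor
        self.history: List[Features] = []

    def step(self, level, pressure, borrowed, supplied, actions):
        """Return (energy, next_level, next_pressure) for the current slot."""
        features = (level, pressure, borrowed, supplied, *actions)
        self.history.append(features)
        seq = self.history[-self.window:]
        return (self.energy_model(seq),
                self.level_model(seq),
                self.pressure_model(seq))
```

Keeping the predictors behind a callable interface means the reinforcement learning loop never depends on how the models were trained, only on their inputs and outputs.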
Further, the multiple agents for the water intake pump station system are configured as follows: the number of agents equals the number of pumps, and each pump is controlled by one agent. Each agent contains one actor network, one target actor network, one critic network, one target critic network and one attention network. The actor network and target actor network of each agent have the same structure, and the critic network and target critic network of each agent have the same structure.
Specifically, the actor network of agent $i$ (i.e., the agent corresponding to pump $i$) is a multi-layer deep neural network whose input is the local observation $o_{i,t}$ and whose output determines the action $a_{i,t}$. The hidden layers of the deep neural network use the leaky rectified linear unit (leaky ReLU) activation function, and the output layer uses the normalized exponential (softmax) function. The critic network of each agent contains three perceptron modules: a first, a second and a third perceptron module. The first perceptron module takes the local observation $o_{i,t}$ as input and outputs the observation encoding $g_i(o_{i,t})$. The second perceptron module takes the local observation $o_{i,t}$ and the action $a_{i,t}$ as input and outputs the joint observation-action encoding $e_i = f_i(o_{i,t}, a_{i,t})$. The outputs of the second perceptron modules in the critic networks of all agents serve as input to the attention network, which returns the contribution $x_i$ of the other agents to the current agent $i$. The contribution $x_i$ and the output $g_i(o_{i,t})$ of the first perceptron module are the input of the third perceptron module, whose output is the action-value function $Q_i^{\psi}(o, a) = h_i(g_i(o_{i,t}), x_i)$ over the current observations $o$ and actions $a$ of all agents, where $\psi$ denotes the shared weight parameters of all agents' critic networks and $h_i$ denotes the multilayer perceptron of agent $i$.
The attention network contains $n$ sub-networks of identical structure, corresponding to the $n$ agents. Taking sub-network $i$ as an example, its input comprises the outputs $e_1, \dots, e_n$ of the second perceptron modules in the critic networks of all agents, and its output is the contribution $x_i$ of all other agents to agent $i$. The contribution is a weighted sum of the outputs obtained by linearly transforming the second perceptron module outputs of all other agents and passing them through a single-layer perceptron, namely:
$$x_i = \sum_{j \ne i} w_{ij}\, \sigma(V e_j),$$
where the weight $w_{ij}$ reflects the similarity between the second perceptron module output $e_i$ in the critic network of agent $i$ and the second perceptron module output $e_j$ in the critic network of another agent $j$, $V$ is a shared matrix, and $\sigma$ is the leaky ReLU activation function. The weights satisfy
$$w_{ij} \propto \exp\left(e_j^{\top} W_k^{\top} W_q e_i\right),$$
where $W_k$ and $W_q$ are shared matrices that linearly transform $e_j$ and $e_i$, respectively.
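The attention computation can be illustrated with a small pure-Python sketch. It assumes the similarity scores are normalized with a softmax over the other agents, as in the published actor-attention-critic algorithm; the function names and the tiny list-based matrix helpers are made up for the example, and a real implementation would use a tensor library.

```python
import math


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


def attention_contribution(i, encodings, w_q, w_k, v):
    """Contribution of all other agents to agent i, plus the attention weights.

    encodings[j] is the observation-action encoding of agent j; w_q, w_k and v
    are the shared query, key and value matrices.
    """
    query = matvec(w_q, encodings[i])
    others = [j for j in range(len(encodings)) if j != i]
    scores = [dot(matvec(w_k, encodings[j]), query) for j in others]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    values = [[leaky_relu(x) for x in matvec(v, encodings[j])] for j in others]
    contribution = [sum(w * val[d] for w, val in zip(weights, values))
                    for d in range(len(values[0]))]
    return contribution, dict(zip(others, weights))
```

An agent whose encoding is more similar to that of agent `i` receives a larger weight, so its transformed encoding dominates the contribution passed to the third perceptron module.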
Further, the deep reinforcement learning training process of the multiple agents comprises the following steps:
(1) Obtain the current environment state from the historical operating data of the water intake pump station.
(2) The actor network of each agent outputs the current action of its pump according to the current environment state.
(3) Given the current environment state and the current actions, use the scheduling environment model of the water intake pump station to obtain the energy consumption, the reservoir level of the next time slot and the manifold pressure of the next time slot, and use this information to construct the environment state and the reward of the next time slot.
(4) Store the current environment state, the current actions, the next-slot environment state and the next-slot reward in the experience pool.
(5) If the weight parameters of the agents' deep neural networks need to be updated, draw a mini-batch of training samples from the experience pool and use the multi-agent actor-attention-critic reinforcement learning algorithm to update first the critic network weights and then the actor networks.
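Specifically, the critic network parameters are updated by minimizing the following joint loss function:
$$L_Q(\psi) = \sum_{i=1}^{n} \mathbb{E}_{(o,a,r,o') \sim D}\left[\left(Q_i^{\psi}(o,a) - y_i\right)^2\right], \qquad y_i = r_i + \gamma\, \mathbb{E}_{a' \sim \pi_{\bar\theta}(o')}\left[Q_i^{\bar\psi}(o',a') - \alpha \log \pi_{\bar\theta_i}(a_i' \mid o_i')\right],$$
where $L_Q(\psi)$ is the joint loss function; $D$ is the experience pool used for storage; $\mathbb{E}$ denotes the expectation operation; $\gamma$ denotes the discount factor; $\bar\theta$ denotes the parameter vector of all target actor networks, i.e., $\bar\theta = (\bar\theta_1, \dots, \bar\theta_n)$, where $\bar\theta_i$ denotes the target actor network parameters of agent $i$; $\psi$ and $\bar\psi$ denote the shared weight parameters of all agents' critic networks and target critic networks, respectively; and $\alpha$ is the temperature parameter that balances maximum entropy against maximum reward. $\mathbb{E}_{a' \sim \pi_{\bar\theta}(o')}$ denotes the expected value when the chosen actions follow the target actor network policies, and $\pi_{\bar\theta}$ denotes the policies of the target actor networks.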
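For a single sampled next action, the entropy-regularized target and the squared-error loss reduce to simple arithmetic. The sketch below assumes one sample per transition in place of the expectation over target-policy actions and a mini-batch mean in place of the expectation over the experience pool; the function names and default values of `gamma` and `alpha` are illustrative.

```python
import math


def soft_td_target(reward, next_q, next_log_prob, gamma=0.99, alpha=0.01):
    """Target value: reward plus discounted (target Q minus alpha * log-probability)."""
    return reward + gamma * (next_q - alpha * next_log_prob)


def critic_loss(q_values, targets):
    """Mean squared error between predicted Q-values and their targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)
```

Because the log-probability of an action is never positive, the entropy term raises the target for actions that the target policy considers unlikely, which encourages exploration.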
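The actor networks are updated by gradient ascent, with the following gradient update formula:
$$\nabla_{\theta_i} J(\pi_\theta) = \mathbb{E}_{o \sim D,\, a \sim \pi}\left[\nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid o_i)\left(-\alpha \log \pi_{\theta_i}(a_i \mid o_i) + Q_i^{\psi}(o,a) - b(o, a_{\setminus i})\right)\right],$$
where $\pi_{\theta_i}$ denotes the policy function of the actor network of agent $i$ (i.e., a mapping from the observation $o_i$ to a probability distribution over the action $a_i$); $a_{\setminus i}$ denotes the actions of all agents other than $i$, and $b(o, a_{\setminus i})$ is the corresponding baseline (average value); $\nabla_{\theta_i} J(\pi_\theta)$ denotes the gradient of the actor network; $\mathbb{E}_{o \sim D,\, a \sim \pi}$ denotes the expected value when the actions follow the actor network policies; and $\nabla_{\theta_i} \log \pi_{\theta_i}$ denotes the partial derivative of the logarithm of the policy function.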
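With discrete actions, the factor that multiplies the log-probability gradient can be computed exactly for one agent. The sketch below assumes the baseline used in the published actor-attention-critic algorithm, namely the policy-weighted average of the agent's own Q-values with the other agents' actions held fixed; `q_row[k]` is a hypothetical name for the critic output when agent $i$ takes action `k`.

```python
import math


def policy_gradient_weight(probs, q_row, chosen, alpha=0.01):
    """Scalar multiplying grad log pi(a_i | o_i) in the actor update.

    probs[k]  : probability the actor assigns to action k
    q_row[k]  : critic value if agent i took action k, other agents unchanged
    chosen    : index of the action actually sampled
    """
    baseline = sum(p * q for p, q in zip(probs, q_row))
    return -alpha * math.log(probs[chosen]) + q_row[chosen] - baseline
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, since an action is reinforced only to the extent that it beats the agent's own average.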
(6) After the weight parameters of the agents' deep neural networks are updated, determine whether training is complete. If not, return to step (1); otherwise terminate training, and use each trained actor network as the optimal policy of the corresponding agent (i.e., a mapping from the local observation to the pump control action) for control deployment in the actual water intake pump station.
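The experience pool and the update trigger in the steps above can be sketched with the standard library. The class name `ExperiencePool`, the fixed capacity with oldest-first eviction, and the `update_every` schedule are assumptions for illustration rather than details taken from the patent.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity store of (state, actions, next_state, rewards) tuples."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted first
        self.rng = random.Random(seed)

    def add(self, state, actions, next_state, rewards):
        self.buffer.append((state, actions, next_state, rewards))

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random without replacement."""
        size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)


def should_update(step, pool, batch_size, update_every):
    """Update only when enough samples exist and the schedule says so."""
    return len(pool) >= batch_size and step % update_every == 0
```

In a training loop, each simulated time slot would call `add`, and whenever `should_update` returns true a mini-batch from `sample` would feed the critic update followed by the actor update.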
Beneficial effects: compared with the prior art, the energy-saving scheduling method for a water intake pump station based on multi-agent deep reinforcement learning provided by the invention has the following beneficial effects:
(1) Compared with scheduling methods based on nonlinear programming, dynamic programming and the like, the method does not need an explicit dynamic model of the water intake pump station. Unlike prediction-based scheduling methods, the multi-agent policies obtained by the method output the control decisions of the pump station from the observation of the current time slot only, so no uncertain parameter needs to be predicted. In addition, producing the output of the multi-agent policies involves only a forward pass through multi-layer deep neural networks, with an execution time on the order of milliseconds, so the computational complexity is extremely low. The method therefore has strong generality.
(2) Compared with a single-agent water pump scheduling method based on reinforcement learning, the method can utilize an attention mechanism among multiple agents to realize efficient coordinated scheduling among a plurality of water taking pumps; the energy consumption can be obviously reduced on the premise of maintaining the liquid level of the water storage tank, the pressure difference of the main pipe and the switching times of the water pump within a safety range. Therefore, the method of the invention has high efficiency.
Drawings
Fig. 1 is a flowchart of a water intake pumping station scheduling control method provided by the present invention.
FIG. 2 is a graph of the convergence of a training curve for an embodiment of the method of the present invention.
FIG. 3 is a graph comparing the average energy consumption of an embodiment of the method of the present invention with that of other schemes.
FIG. 4 is a graph comparing the average liquid level limit violation of an embodiment of the method of the present invention with that of other schemes.
FIG. 5 is a graph comparing the average manifold pressure limit violation of an embodiment of the method of the present invention with that of other schemes.
Fig. 6 is a graph comparing the average water pump switching times of the embodiment of the method of the present invention with those of other schemes.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely for illustrating the technical solutions of the present invention more clearly, and the scope of the present invention should not be limited thereby.
As shown in Fig. 1, the design flow of the energy-saving scheduling control method for a water intake pump station based on multi-agent deep reinforcement learning provided by the invention comprises the following steps:
Step 1: Subject to keeping the reservoir level, the manifold pressure difference and the number of pump switches within safe ranges, convert the problem of minimizing the total energy consumption of the water intake pump station into a Markov game, and design the corresponding environment state, action and reward function.
Step 2: Construct a scheduling environment model of the water intake pump station from historical operating data and a long short-term memory (LSTM) network.
Step 3: Train deep reinforcement learning agents using the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm.
Step 4: Deploy the trained agent policies in the actual system.
In step 1, the decision variables of the Markov game are the frequency decisions of the variable-frequency pumps and the on/off decisions of the fixed-frequency pumps, which must keep the reservoir level, the manifold pressure difference and the number of pump switches within safe ranges. The constraints to be considered, relating to the reservoir level, the manifold pressure difference and the number of pump switches, are as follows:
(1) The reservoir level $H_t$ stays within the safe range, i.e., $H^{\min} \le H_t \le H^{\max}$, where $H^{\max}$ and $H^{\min}$ denote the upper and lower limits of the safe reservoir level, respectively.
(2) The manifold pressure difference does not exceed its upper limit, i.e., $|P_t - P_{t-1}| \le \Delta P^{\max}$, where $\Delta P^{\max}$ denotes the highest acceptable manifold pressure difference, $P_t$ denotes the manifold pressure at time $t$, and $P_{t-1}$ denotes the manifold pressure at time $t-1$.
(3) The number of pump switches $N_t$ stays within the safe switching range, i.e., $N_t \le N^{\max}$, where $N^{\max}$ denotes the highest acceptable number of switches within one day.
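The three constraints can be checked, and the size of any violation measured, with a short helper. The function and key names below, and the choice to report each violation as the distance past its limit, are assumptions for this sketch; the limits passed in are placeholders, not values from the patent.

```python
def range_violation(value, low, high):
    """Distance by which value lies outside [low, high]; 0.0 when inside."""
    return max(low - value, 0.0) + max(value - high, 0.0)


def constraint_violations(level, pressure, prev_pressure, switches,
                          level_min, level_max, dp_max, n_max):
    """Violation magnitudes for the level, pressure-difference and switch-count limits."""
    return {
        "level": range_violation(level, level_min, level_max),
        "pressure_diff": max(abs(pressure - prev_pressure) - dp_max, 0.0),
        "switch_count": max(switches - n_max, 0),
    }
```

A schedule is safe in a given time slot when all three entries are zero, and non-zero entries can feed the corresponding penalty terms of a reward function.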
In step 1 above, the Markov game is defined by a set of states, actions, state transition functions and reward functions. In the Markov game, each agent selects actions based on the current state so as to maximize its expected return (i.e., the expected value of the cumulative reward). The environment state, action and reward function of the Markov game are designed as follows:
(1) Environment state. $s_t$ is the multi-agent environment state in time slot $t$, designed as $s_t = (o_{1,t}, \dots, o_{n,t})$. The local observation of agent $i$, which makes the pump frequency decision in time slot $t$, is denoted $o_{i,t} = (\tau_t, H_t, P_t, B_t, W_t, N_t, x_{i,t})$, where: $\tau_t$ is the relative time index within the day of the current absolute time of time slot $t$; $H_t$ is the reservoir level in time slot $t$; $P_t$ is the manifold pressure of the pump station in time slot $t$; $B_t$ is the amount of water borrowed by the reservoir in time slot $t$; $W_t$ is the reservoir water supply in time slot $t$; $N_t$ is the number of pump switches accumulated during the day up to time slot $t$; and $x_{i,t}$ is the on/off state in time slot $t$ of the pump controlled by agent $i$.
(3) Reward function. The reward in time slot $t$ of agent $i$, which controls the $i$-th pump of the water intake pump station, is $r_{i,t}$. It comprises 5 components:
4. Penalty $C_{4,t}$ caused by pump switching in time slot $t$. Here $x_{i,t}$ denotes the on/off state of pump $i$ in time slot $t$: $x_{i,t} = 0$ means that pump $i$ is off in time slot $t$, and $x_{i,t} = 1$ means that it is on. The on/off state of pump $i$ in the adjacent time slot is defined in the same way.
In the reward function, $\beta_1$ is the importance coefficient of the penalty for the reservoir level violating its safe range relative to the energy-related penalty cost; $\beta_2$ is the importance coefficient of the penalty for the manifold pressure difference violating its safe range relative to the energy-related penalty cost; $\beta_3$ is the importance coefficient of the pump switching penalty relative to the energy-related penalty cost; and $\beta_4$ is the importance coefficient of the penalty for the number of pump switches violating its safe range relative to the energy-related penalty cost.
In step 2, the scheduling system of the water intake pump station aims to minimize the energy consumption of the pump station while keeping the reservoir level, the manifold pressure difference and the number of pump switches within safe ranges. The scheduling environment model is built from historical data and long short-term memory (LSTM) networks. Specifically, the LSTM networks take the state and action of the water intake pump station as input and output the energy consumption $E_t$, the reservoir level $H_{t+1}$ and the manifold pressure $P_{t+1}$.
In step 3, the multi-agent actor-attention-critic reinforcement learning algorithm is used to train the optimal decisions of the pump station scheduling system so that the reservoir level, the manifold pressure difference and the number of pump switches remain within safe ranges. The specific steps for training the deep reinforcement learning agents are as follows:
(1) Obtain the current environment state from the historical operating data of the water intake pump station;
(2) The actor network of each agent outputs the current action of the pump station according to the current environment state;
(3) Given the current environment state and the current actions, use the scheduling environment model to obtain the energy consumption, the reservoir level of the next time slot and the manifold pressure of the next time slot, and use this information to construct the environment state and the reward of the next time slot;
(4) Store the current environment state, the current actions, the next-slot environment state and the next-slot reward in the experience pool;
(5) If the weight parameters of the agents' deep neural networks need to be updated, draw a mini-batch of training samples from the experience pool and update the network weights with the multi-agent actor-attention-critic reinforcement learning algorithm. After the update, determine whether training is complete: if not, return to step (1); otherwise terminate training. The actor networks obtained after training are used in actual deployment as the optimal policy of each agent (i.e., a mapping from the local observation to the pump control action).
Compared with the prior art, the embodiment of the invention achieves the following beneficial effects:
1) The proposed method is general. It is an energy-saving scheduling method for water intake pump stations based on the multi-agent actor-attention-critic reinforcement learning algorithm. Because the resulting agent policies derive the control decision of each water intake pump from the observation of the current time slot only, the method requires neither prior information about nor prediction of any uncertain system parameter (such as the water supply), and it does not need an explicit mechanistic scheduling model of the water intake pump station;
2) The proposed method is efficient. Compared with the existing scheduling method, it can reduce energy consumption by 12.8% while keeping the reservoir level, the manifold pressure difference and the number of pump switches within safe ranges.
Fig. 2 shows the convergence of the training curve in the embodiment of the present invention. As the curve shows, the training reward generally increases and gradually stabilizes.
Fig. 3 compares the energy consumption of the embodiment of the method of the present invention with that of other schemes. Scheme 1 is the real scheduling scheme of the water intake pump station. The water intake, water supply and pump station parameter data used by the invention all come from actual water plant data from November 1, 2020 to April 30, 2021. Compared with Scheme 1, the method reduces the average energy consumption by 12.8%.
Fig. 4 compares the average liquid level of the embodiment of the method of the present invention with that of other schemes. Compared with Scheme 1, the average liquid level threshold under the method is reduced by 66.2%.
Fig. 5 compares the average main-pipe pressure difference of the embodiment of the method of the present invention with that of other schemes. As the figure shows, the method yields a smaller main-pipe pressure difference than Scheme 1, and its main-pipe pressure difference always stays within the safe range.
Fig. 6 compares the average number of water pump switches of the embodiment of the method of the present invention with that of other schemes. Compared with Scheme 1, the method reduces the number of pump switches by 50%.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the embodiments without departing from the spirit and scope of the invention, and that all such modifications are covered by the claims.
Claims (4)
1. An energy-saving scheduling method for a water intake pump station of a water plant based on multi-agent deep reinforcement learning, characterized by comprising the following steps:
step 1: on the premise of maintaining the liquid level of the reservoir, the pressure difference of the main pipe and the number of water pump switches within safe ranges, modeling the problem of minimizing the total energy consumption of the water intake pump station as a Markov game, designing the corresponding environment state, behavior and reward function of the Markov game, and constructing the multi-agent system for the water intake pump station;
step 2: constructing a water intake pump station scheduling environment model by using historical operation data and a long short-term memory network;
step 3: carrying out deep reinforcement learning training on the multi-agent system based on the water intake pump station scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm;
step 4: deploying the multi-agent strategy obtained by training into the actual water intake pump station system;
the expression of the problem of minimizing the total energy consumption of the water intake pump station is as follows:
$$\min_{f_t}\ \mathbb{E}\left[\sum_{t=1}^{T} E_t\right]$$
$$\text{s.t.}\quad L^{\min} \le L_t \le L^{\max},\qquad \left|P_t - P_{t-1}\right| \le \Delta P^{\max},\qquad N_t \le N^{\max},$$
in the formula, $E_t$ is the total energy consumption of the water intake pump station in time slot $t$, $1 \le t \le T$, and $T$ represents the total number of optimized time slots; $\mathbb{E}$ is the expectation operator; $f_t$ is the working frequency of a variable-frequency pump or the state of a fixed-frequency pump of the water intake pump station in time slot $t$; $L_t$ is the liquid level of the reservoir in time slot $t$, and $L^{\min}$ and $L^{\max}$ are the lowest and highest liquid levels within the safe range of the reservoir; $P_t$ is the main-pipe pressure of the water intake pump station in time slot $t$, $P_{t-1}$ is the main-pipe pressure of the water intake pump station in time slot $t-1$, and $\Delta P^{\max}$ is the highest main-pipe pressure difference within the safe range; $N_t$ is the number of switches of the water intake pump station within one day up to time slot $t$, and $N^{\max}$ is the maximum number of switches of the water intake pump station within one day in the safe range;
the environment state in the Markov game is expressed as follows:
$$s_t = \left(o_{1,t}, o_{2,t}, \ldots, o_{n,t}\right),\qquad o_{i,t} = \left(\tau_t, L_t, P_t, Q^{\mathrm{in}}_t, Q^{\mathrm{out}}_t, N_t, u_{i,t}\right),$$
in the formula, $i$ takes 1, 2, …, $n$; $n$ indicates the number of water pumps that need to be controlled and is also the total number of agents in the Markov game, each agent being responsible for controlling 1 water intake pump; wherein: $s_t$ is the environment state of the multi-agent system in time slot $t$, $o_{i,t}$ is the local observed state of the $i$-th fixed-frequency pump agent or variable-frequency pump agent, $\tau_t$ is the relative time index within a day of the current absolute time of time slot $t$, $L_t$ is the liquid level of the reservoir in time slot $t$, $P_t$ is the main-pipe pressure of the water intake pump station in time slot $t$, $Q^{\mathrm{in}}_t$ is the water intake amount of the reservoir in time slot $t$, $Q^{\mathrm{out}}_t$ is the water supply amount of the reservoir in time slot $t$, $N_t$ is the number of switches of the water intake pump station within one day up to time slot $t$, and $u_{i,t}$ is the on-off state in time slot $t$ of the water pump controlled by agent $i$;
the behavior in the Markov game is expressed as follows:
$$a_t = \left(a_{1,t}, a_{2,t}, \ldots, a_{n,t}\right),$$
in the formula, $n$ indicates the number of water pumps that need to be controlled, $m$ is an integer less than $n$, and $i$ takes 1, 2, …, $n$; wherein, when $1 \le i \le m$, agent $i$ is a fixed-frequency pump and $a_{i,t}$ is the on-off state of the fixed-frequency pump in time slot $t$, one value of $a_{i,t}$ indicating that the fixed-frequency pump agent is switched off and the other that it is switched on; when $m < i \le n$, agent $i$ is a variable-frequency pump and $a_{i,t}$ is the increase or decrease of the frequency of the variable-frequency pump in time slot $t$, its possible values respectively indicating that the variable-frequency pump is switched off, that its frequency is reduced by a set step, that its frequency is increased by a set step, and that its frequency is unchanged;
the rewarding function expression in the Markov game is as follows:
in the formula, $r_{i,t}$ is the reward received at the end of time slot $t$ by the agent used to control each pump, wherein: $C_{1,t}$ is the penalty cost in time slot $t$ related to the energy consumption of the water intake pump station, $C_{2,t}$ is the penalty cost in time slot $t$ related to the reservoir liquid level violating the safe range, $C_{3,t}$ is the penalty cost in time slot $t$ related to the main-pipe pressure difference of the water intake pump station violating the safe range, $C_{4,t}$ is the penalty cost in time slot $t$ related to the joint switching cost of the water intake pump station, and $C_{5,t}$ is the penalty in time slot $t$ caused by the joint number of switches of the water intake pump station violating the safe range; $\lambda_1$ is the importance coefficient of the penalty caused by the reservoir liquid level violating the safe range relative to the penalty cost related to energy consumption, $\lambda_2$ is the importance coefficient of the penalty caused by the main-pipe pressure difference violating the safe range relative to the penalty cost related to energy consumption, $\lambda_3$ is the importance coefficient of the penalty caused by switching of the water intake pump station relative to the penalty cost related to energy consumption, and $\lambda_4$ is the importance coefficient of the penalty caused by the number of switches of the water intake pump station violating the safe range relative to the penalty cost related to energy consumption;
the multi-agent system for the water intake pump station is configured as follows: the number of agents is equal to the number of water pumps, and each water pump is controlled by 1 agent; each agent internally comprises 1 actor network, 1 target actor network, 1 critic network, 1 target critic network and 1 attention network; the actor network and the target actor network of each agent have the same structure, and the critic network and the target critic network of each agent have the same structure;
the input of the actor network of agent $i$ is the local observed state $o_i$, and the output of the actor network is the behavior $a_i$; the critic network in each agent comprises 3 perceptron modules, namely a first perceptron module, a second perceptron module and a third perceptron module; wherein: the input of the first perceptron module is the local observed state $o_i$, and the observation state code value $g_i(o_i)$ is output after passing through the first perceptron module; the input of the second perceptron module is the local observed state $o_i$ and the behavior $a_i$, and its output is the joint coded value $e_i$ of observed state and behavior; the outputs of the second perceptron modules in the critic networks of all agents are used as the input of the attention network; the attention network returns the contribution $x_i$ of the other agents to the current agent $i$; the contribution $x_i$ and the output $g_i(o_i)$ of the first perceptron module are used as the input of the third perceptron module, and the output of the third perceptron module is the state-behavior value function $Q_i^{\psi}(o,a) = f_i\left(g_i(o_i), x_i\right)$ given the observed states and behaviors of all agents, where $\psi$ represents the shared weight parameter of the critic networks of all agents and $f_i$ represents the multilayer perceptron of agent $i$; the attention network internally contains $n$ sub-networks of the same structure, corresponding to the $n$ agents; the input of sub-network $i$ comprises the outputs $e_1, \ldots, e_n$ of the second perceptron modules in the critic networks of all agents, and the output of sub-network $i$ is the contribution value $x_i$ of all other agents to agent $i$; the contribution value $x_i$ is the weighted sum of the output values of the second perceptron modules in the critic networks of all other agents after each is linearly transformed and sent into a single-layer perceptron, namely:
$$x_i = \sum_{j \ne i} w_{ij}\, h\left(V e_j\right),$$
wherein: the weighting coefficient $w_{ij}$ reflects the similarity between the output value $e_i$ of the second perceptron module in the critic network of agent $i$ and the output value $e_j$ of the second perceptron module in the critic network of another agent $j$, $V$ is a shared matrix, and $h$ is the leaky ReLU activation function.
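The contribution value described in claim 1 can be illustrated with a short sketch. It writes e_j for the joint coded value of agent j, V for the shared matrix and h for the leaky ReLU, which are notational assumptions. The claim only states that the weights reflect the similarity between coded values; the softmax over a query-key dot product used below (with hypothetical matrices `W_query` and `W_key`) is one common way to obtain such weights and is an assumption, not a form confirmed by the text.

```python
import math


def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)                              # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_contribution(i, encodings, V, W_query, W_key):
    """Weighted sum over all other agents j of h(V e_j), as seen by agent i."""
    query = matvec(W_query, encodings[i])
    others = [j for j in range(len(encodings)) if j != i]
    # similarity score between agent i and each other agent j (assumed form)
    scores = [
        sum(q * k for q, k in zip(query, matvec(W_key, encodings[j])))
        for j in others
    ]
    weights = softmax(scores)
    contribution = [0.0] * len(V)
    for weight, j in zip(weights, others):
        value = [leaky_relu(v) for v in matvec(V, encodings[j])]
        contribution = [acc + weight * val for acc, val in zip(contribution, value)]
    return contribution, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
encodings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # toy joint coded values
x_0, w_0 = attention_contribution(0, encodings, identity, identity, identity)
```

In this toy case agent 1 has the same coded value as agent 0, so it receives the larger weight and dominates the contribution returned to agent 0.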
2. The energy-saving scheduling method for a water intake pump station of a water plant based on multi-agent deep reinforcement learning of claim 1, wherein the scheduling environment model of the water intake pump station is constructed as follows:
in the formula, $L_t$ is the liquid level of the reservoir in time slot $t$, $P_t$ is the main-pipe pressure of the water intake pump station in time slot $t$, $E_t$ is the energy consumption of the water intake pump station in time slot $t$, $Q^{\mathrm{in}}_t$ is the water intake amount of the reservoir in time slot $t$, $Q^{\mathrm{out}}_t$ is the water supply amount of the reservoir in time slot $t$, $F_E$ is the energy consumption prediction long short-term memory network obtained by training with real historical operation data, $F_L$ is the liquid level prediction long short-term memory network obtained by training with real historical operation data, and $F_P$ is the main-pipe pressure prediction long short-term memory network obtained by training with real historical operation data.
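A scheduling environment model of this kind can be pictured as three predictors that share a sliding window of recent operating data. The sketch below is illustrative only: the class name `SchedulingEnvModel`, the window length, the feature layout and the three lambda predictors are hypothetical, and in practice each predictor would be a trained long short-term memory network rather than a simple function.

```python
from collections import deque


class SchedulingEnvModel:
    """Combines three predictors (stand-ins for the trained LSTM networks)."""

    def __init__(self, energy_model, level_model, pressure_model, window=4):
        self.energy_model = energy_model
        self.level_model = level_model
        self.pressure_model = pressure_model
        self.history = deque(maxlen=window)        # most recent time slots only

    def step(self, level, pressure, intake, supply, actions):
        """Append the current slot and predict energy, next level, next pressure."""
        self.history.append((level, pressure, intake, supply, tuple(actions)))
        sequence = list(self.history)              # input sequence for each predictor
        energy = self.energy_model(sequence)
        next_level = self.level_model(sequence)
        next_pressure = self.pressure_model(sequence)
        return energy, next_level, next_pressure


# Toy predictors with made-up dynamics, used only to exercise the interface.
model = SchedulingEnvModel(
    energy_model=lambda seq: 10.0 * sum(seq[-1][4]),
    level_model=lambda seq: seq[-1][0] + 0.01 * (seq[-1][2] - seq[-1][3]),
    pressure_model=lambda seq: sum(s[1] for s in seq) / len(seq),
    window=2,
)
first = model.step(2.0, 0.30, 100.0, 80.0, [1, 0, 1])
second = model.step(first[1], 0.40, 90.0, 95.0, [1, 1, 1])
third = model.step(second[1], 0.50, 90.0, 90.0, [0, 0, 1])
```

Keeping the window inside the wrapper lets the training loop call `step` once per time slot without managing the history itself.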
3. The energy-saving scheduling method for a water intake pump station of a water plant based on multi-agent deep reinforcement learning as claimed in claim 1, wherein the deep reinforcement learning training process of the multi-agent system comprises the following steps:
step 4.1: obtaining the current environment state according to historical operation data of the water intake pump station;
step 4.2: the actor network of each agent outputs the current behavior of each water intake pump according to the current environment state;
step 4.3: according to the current environment state and the current behavior, the water intake pump station scheduling environment model is used to obtain the energy consumption, the liquid level of the next time slot and the main-pipe pressure of the next time slot under that state and behavior, and this information is used to reconstruct the environment state and the reward of the next time slot;
step 4.4: sending the current environment state, the current behavior, the next-time-slot environment state and the next-time-slot reward to an experience pool;
step 4.5: if the weight parameters of the deep neural networks inside the agents need to be updated, extracting a mini-batch of training samples from the experience pool, first updating the weights of the critic network by using the multi-agent actor-attention-critic reinforcement learning algorithm, and then updating the actor network;
step 4.6: judging whether the training process is finished after the weight parameters of the deep neural networks of the agents are updated; if not, jumping to step 4.1; otherwise, terminating the training process and using each actor network obtained by training as the optimal strategy of the corresponding agent for control deployment in the actual water intake pump station.
4. The energy-saving scheduling method for a water intake pump station of a water plant based on multi-agent deep reinforcement learning as claimed in claim 3, wherein the weights of the critic network are updated by minimizing a joint loss function, and the calculation formula of the joint loss function is as follows:
$$L_Q(\psi) = \sum_{i=1}^{n} \mathbb{E}_{(o,a,r,o') \sim D}\left[\left(Q_i^{\psi}(o,a) - y_i\right)^2\right],$$
$$y_i = r_i + \gamma\, \mathbb{E}_{a' \sim \pi_{\bar{\theta}}(o')}\left[Q_i^{\bar{\psi}}(o',a') - \alpha \log \pi_{\bar{\theta}_i}\left(a_i' \mid o_i'\right)\right],$$
wherein: $L_Q(\psi)$ is the joint loss function, and $D$ is the experience pool used for storage; $\mathbb{E}$ represents the expectation operation, and $\gamma$ represents the discount coefficient; $\bar{\theta}$ represents the vector of parameters of all target actor networks, namely $\bar{\theta} = (\bar{\theta}_1, \ldots, \bar{\theta}_n)$, wherein $\bar{\theta}_i$ represents the target actor network parameters of agent $i$; $\psi$ and $\bar{\psi}$ represent the shared weight parameters of the critic networks and the target critic networks of all agents, and $\alpha$ is a temperature parameter that balances the maximum entropy and the maximum reward; $\mathbb{E}_{a' \sim \pi_{\bar{\theta}}(o')}$ represents the expected value over behaviors selected according to the target actor network policy, $\pi_{\bar{\theta}}$ represents the policy of the target actor networks, and $n$ represents the total number of agents; $Q_i^{\psi}(o,a)$ represents the state-behavior value function of the critic network; $Q_i^{\bar{\psi}}(o',a')$ represents the state-behavior value function of the target critic network for the next-time-slot state; $r_i$ is the current reward;
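the weights of the actor network are updated by a gradient ascent method, and the gradient update calculation formula is as follows:
$$\nabla_{\theta_i} J(\pi_{\theta}) = \mathbb{E}_{o \sim D,\, a \sim \pi}\left[\nabla_{\theta_i} \log \pi_{\theta_i}\left(a_i \mid o_i\right)\left(-\alpha \log \pi_{\theta_i}\left(a_i \mid o_i\right) + Q_i^{\psi}(o,a) - b\left(o, a_{\setminus i}\right)\right)\right],$$
in the formula: $\pi_{\theta_i}$ represents the policy function of the actor network of agent $i$, and $b(o, a_{\setminus i})$ is the average value given the behaviors of the agents other than $i$; $\nabla_{\theta_i}$ represents the gradient with respect to the actor network, $\mathbb{E}_{o \sim D,\, a \sim \pi}$ represents the expected value over behaviors taken according to the actor network policy, and $\nabla_{\theta_i} \log \pi_{\theta_i}$ represents the partial derivative of the logarithmic function.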
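The quantities entering the critic and actor updates of claim 4 can be illustrated numerically for one agent with a small discrete behavior set. The sketch assumes the usual entropy-regularized form: a target equal to the current reward plus the discounted expectation of the target critic value minus the temperature times the log-probability under the target actor policy. The function names are hypothetical, and the expectation is simplified to the agent's own behaviors, with the other agents' behaviors treated as already folded into the value estimates.

```python
import math


def soft_td_target(reward, gamma, alpha, next_probs, next_q_values):
    """Reward plus discounted expected (target value - alpha * log-probability)."""
    expectation = sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(next_probs, next_q_values)
        if p > 0.0                                  # skip zero-probability behaviors
    )
    return reward + gamma * expectation


def critic_loss(q_values, targets):
    """Mean squared error between critic outputs and targets over a mini-batch."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(targets)


def actor_weight(log_prob, q_value, baseline, alpha):
    """Scalar that multiplies the gradient of log pi in the actor update."""
    return -alpha * log_prob + q_value - baseline


# Toy numbers: two behaviors with equal probability under the target policy.
target_no_entropy = soft_td_target(1.0, 0.9, 0.0, [0.5, 0.5], [2.0, 4.0])
target_with_entropy = soft_td_target(1.0, 0.9, 0.1, [0.5, 0.5], [2.0, 4.0])
loss = critic_loss([1.0, 2.0], [1.0, 4.0])
weight = actor_weight(math.log(0.5), 2.0, 3.0, 0.1)
```

With a positive temperature the entropy term raises the target, which is how the update trades off exploration against reward; the baseline subtracted in `actor_weight` only reduces variance and does not change the expected gradient.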
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211475230.4A CN115544899B (en) | 2022-11-23 | 2022-11-23 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115544899A CN115544899A (en) | 2022-12-30 |
CN115544899B CN115544899B (en) | 2023-04-07 |
Family
ID=84721315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211475230.4A Active CN115544899B (en) | 2022-11-23 | 2022-11-23 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115544899B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116187208B (en) * | 2023-04-27 | 2023-08-01 | 深圳市广汇源环境水务有限公司 | Drainage basin water quantity and quality joint scheduling method based on constraint reinforcement learning |
CN116738874B (en) * | 2023-05-12 | 2024-01-23 | 珠江水利委员会珠江水利科学研究院 | Gate pump group joint optimization scheduling method based on Multi-Agent PPO reinforcement learning |
CN117588394B (en) * | 2024-01-18 | 2024-04-05 | 华土木(厦门)科技有限公司 | AIoT-based intelligent linkage control method and system for vacuum pump |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114279042A (en) * | 2021-12-27 | 2022-04-05 | 苏州科技大学 | Central air conditioner control method based on multi-agent deep reinforcement learning |
CN115289619A (en) * | 2022-07-28 | 2022-11-04 | 安徽大学 | Subway platform HVAC control method based on multi-agent deep reinforcement learning |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11657266B2 (en) * | 2018-11-16 | 2023-05-23 | Honda Motor Co., Ltd. | Cooperative multi-goal, multi-agent, multi-stage reinforcement learning |
US11586974B2 (en) * | 2018-09-14 | 2023-02-21 | Honda Motor Co., Ltd. | System and method for multi-agent reinforcement learning in a multi-agent environment |
US11295174B2 (en) * | 2018-11-05 | 2022-04-05 | Royal Bank Of Canada | Opponent modeling with asynchronous methods in deep RL |
CN109729528B (en) * | 2018-12-21 | 2020-08-18 | 北京邮电大学 | D2D resource allocation method based on multi-agent deep reinforcement learning |
CN110458443B (en) * | 2019-08-07 | 2022-08-16 | 南京邮电大学 | Smart home energy management method and system based on deep reinforcement learning |
CN111144793B (en) * | 2020-01-03 | 2022-06-14 | 南京邮电大学 | Commercial building HVAC control method based on multi-agent deep reinforcement learning |
CN112491818B (en) * | 2020-11-12 | 2023-02-03 | 南京邮电大学 | Power grid transmission line defense method based on multi-agent deep reinforcement learning |
CN112540535B (en) * | 2020-11-13 | 2022-08-30 | 南京邮电大学 | Office building thermal comfort control system and method based on deep reinforcement learning |
US20220230080A1 (en) * | 2021-01-20 | 2022-07-21 | Honda Motor Co., Ltd. | System and method for utilizing a recursive reasoning graph in multi-agent reinforcement learning |
CN114362187B (en) * | 2021-11-25 | 2022-12-09 | 南京邮电大学 | Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning |
CN114357569A (en) * | 2021-12-13 | 2022-04-15 | 南京邮电大学 | Commercial building HVAC control method and system based on evolution deep reinforcement learning |
CN114971819A (en) * | 2022-03-28 | 2022-08-30 | 东北大学 | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning |
CN115291625A (en) * | 2022-07-15 | 2022-11-04 | 同济大学 | Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||