CN115544899B - Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning


Info

Publication number
CN115544899B
CN115544899B
Authority
CN
China
Prior art keywords
water
agent
network
time slot
pump station
Prior art date
Legal status
Active
Application number
CN202211475230.4A
Other languages
Chinese (zh)
Other versions
CN115544899A (en)
Inventor
余亮
檀洋阳
李澳
王冬生
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202211475230.4A
Publication of CN115544899A
Application granted
Publication of CN115544899B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a multi-agent deep reinforcement learning-based energy-saving scheduling method for the water intake pump station of a water plant, comprising the following steps: (1) on the premise of maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within a safety range, model the total energy consumption minimization problem of the water intake pump station as a Markov game, and design the corresponding environment state, behavior, and reward function; (2) construct a water intake pump station scheduling environment model using historical operating data and long short-term memory networks; (3) train the deep reinforcement learning agents based on the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm; (4) deploy the trained agent policies in the actual system. Compared with existing methods, the proposed method has a stronger ability to maintain system safety, greater energy-saving potential (up to 12.8%), and better universality.

Description

Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning
Technical Field
The invention relates to an energy-saving scheduling method for the water intake pump station of a water plant based on multi-agent deep reinforcement learning, and belongs to the interdisciplinary field of water intake pump station scheduling and artificial intelligence.
Background
Water is the basis of sustainable economic and social development, but water treatment plants consume a large amount of electricity because of outdated processes, equipment, or water treatment systems, and water intake pump stations account for the major part of this consumption. Traditional pump station scheduling mainly relies on engineering experience to choose the pump combination and adjust the frequency of variable-frequency pumps. Such regulation is qualitative; its labor cost is high, its energy-saving performance is unstable, and it may even increase energy consumption. In addition, frequently opening and closing pumps imposes a great impact on the pipeline, and rapid pressure changes easily cause the water hammer phenomenon. Therefore, reducing the energy consumption of the water intake pump station while guaranteeing water supply safety and meeting water supply demand is of great significance for reducing the operating cost of a water plant, saving urban electricity consumption, and cutting carbon dioxide emissions.
For the energy-saving optimal scheduling of water intake pump stations, existing research has proposed many methods such as nonlinear programming, dynamic programming, and genetic algorithms. Although these methods have certain advantages, they require an explicit scheduling model of the water intake pump station (e.g., an explicit relational expression between the total energy consumption and the pump station state and scheduling decisions). Since pump station performance depends on many factors (internal parameters such as head, shaft power, motor speed, motor frequency, and motor efficiency; external conditions such as water intake and water supply; and hydraulic losses inside the pump caused by the finite number of blades, friction, impact, and leakage), it is very difficult to establish a scheduling model that is both accurate and tractable for control. Furthermore, these works do not consider the wear imposed on the pumps by frequent switching of the pump group.
With the development of internet-of-things and artificial intelligence technology, a large amount of historical operating data of water plant intake pump stations is easy to obtain and can be used effectively. For example, some work has proposed data-driven scheduling methods for intake pump stations, such as one combining a particle swarm algorithm with support vector regression. However, such a method requires predicting the water supply amount for the next 24 hours and generating a pump group schedule recommendation table in a rolling fashion, so it is prone to introducing prediction errors and incurs a heavy computational burden. Other works have proposed control methods for water distribution systems based on reinforcement learning and deep reinforcement learning, using algorithms including Q-learning, dueling deep Q-networks, and knowledge-assisted proximal policy optimization. Although these methods do not require an explicit scheduling model of the intake pump station, they do not address the energy-saving problem of intake pump stations and adopt a single agent to control the pumps. When the energy-saving scheduling of fixed-frequency and variable-frequency pumps is considered jointly, directly controlling the pumps with a single agent makes the action space grow sharply and the learning efficiency low, so it is difficult to save energy effectively while keeping the pumps working safely and meeting water supply demand.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides an energy-saving scheduling method for the water intake pump station of a water plant based on multi-agent deep reinforcement learning, aiming to reduce the energy consumption of the pump station while maintaining safe system operation. The method discretizes the continuous action space of the variable-frequency pumps into a low-dimensional set and controls the intake pump group with multiple agents. To train the agents efficiently, a water intake pump station scheduling environment model (a black-box model that, unlike existing research, requires no white-box model) is constructed from historical operating data, and the multi-agent actor-attention-critic reinforcement learning algorithm is adopted for training, finally yielding a highly scalable and efficient energy-saving scheduling method. The method needs neither to predict any uncertain parameter nor to know an explicit scheduling model of the intake pump station, and has advantages such as low computational complexity and a significant energy-saving effect.
The invention discloses an energy-saving scheduling method for a water intake pump station based on multi-agent deep reinforcement learning, characterized by comprising the following steps:
(1) On the premise of maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within a safety range, model the total energy consumption minimization problem of the water intake pump station as a Markov game, design the corresponding environment state, behavior, and reward function of the Markov game, and construct the multiple agents for the water intake pump station system.
(2) Construct a water intake pump station scheduling environment model using historical operating data and long short-term memory networks.
(3) Train the multiple agents by deep reinforcement learning based on the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm.
(4) Deploy the trained multi-agent policies in the actual water intake pump station system.
Further, the total energy consumption minimization problem of the water intake pump station is expressed as follows:

$$\min_{a_t}\ \mathbb{E}\left[\sum_{t=1}^{T} E_t\right]\quad \text{s.t.}\quad L_{\min} \le L_t \le L_{\max},\quad |P_t - P_{t-1}| \le \Delta P_{\max},\quad N_t \le N_{\max},$$

where $E_t$ is the total energy consumption of the water intake pump station in time slot $t$ ($t = 1, 2, \dots, T$, with $T$ representing the total number of optimized time slots); $\mathbb{E}$ is the expectation operator, taken mainly over the uncertain parameters (such as the water supply amount); $a_t$ is the working frequency (for the variable-frequency pumps) or the on/off state (for the fixed-frequency pumps) of the pump station in time slot $t$; $L_t$ is the reservoir level in time slot $t$, and $L_{\min}$ and $L_{\max}$ are the minimum and maximum levels of the reservoir safety range; $P_t$ and $P_{t-1}$ are the manifold pressures of the pump station in time slots $t$ and $t-1$; $\Delta P_{\max}$ is the highest manifold pressure difference within the safety range; $N_t$ denotes the number of pump switching operations of the pump station within the day up to time slot $t$; and $N_{\max}$ is the maximum number of switching operations per day within the safety range.
Further, the environment state $s_t$ in the Markov game is expressed as follows:

$$s_t = (o_{1,t}, o_{2,t}, \dots, o_{N,t}),\qquad o_{i,t} = (h_t, L_t, P_t, B_t, D_t, N_t, u_{i,t}),$$

where $i$ takes $1, 2, \dots, N$; $N$ represents the number of pumps that need to be controlled and is also the total number of agents in the Markov game (each agent is responsible for controlling one pump). Here $s_t$ is the environment state of the multi-agent system in time slot $t$; $o_{i,t}$ denotes the local observed state of the $i$-th fixed-frequency or variable-frequency pump agent; $h_t$ is the relative time index within a day of the current absolute time of slot $t$; $L_t$ is the reservoir level in slot $t$; $P_t$ is the manifold pressure of the pump station in slot $t$; $B_t$ is the borrowed water amount of the reservoir in slot $t$ (i.e., the amount of water transferred into the reservoir from other waterworks); $D_t$ is the water supply amount of the reservoir in slot $t$ (i.e., the amount of water drawn out of the reservoir); $N_t$ is the number of pump switching operations of the pump station within the day up to slot $t$; and $u_{i,t}$ is the on/off state in slot $t$ of the pump controlled by agent $i$.
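To make the state design concrete, the sketch below assembles one agent's local observation as a feature vector; the Python representation, field names, and layout are illustrative assumptions rather than part of the disclosed method.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PumpStationSlot:
    """One time slot of pump station measurements (names are illustrative)."""
    time_index: float        # h_t: relative time of day, e.g. slot / slots_per_day
    level: float             # L_t: reservoir level
    pressure: float          # P_t: manifold pressure
    borrowed: float          # B_t: water transferred in from other waterworks
    supplied: float          # D_t: water drawn out of the reservoir
    switch_count: int        # N_t: pump switches so far today
    pump_states: np.ndarray  # u_{i,t}: on/off state of each pump (0/1)

def local_observation(slot: PumpStationSlot, agent_id: int) -> np.ndarray:
    """Build o_{i,t} = (h_t, L_t, P_t, B_t, D_t, N_t, u_{i,t}) for agent i."""
    return np.array([
        slot.time_index, slot.level, slot.pressure,
        slot.borrowed, slot.supplied, slot.switch_count,
        slot.pump_states[agent_id],
    ], dtype=np.float32)

# The global state s_t is the tuple of all agents' local observations:
# s_t = [local_observation(slot, i) for i in range(num_pumps)]
```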
Further, the behavior $a_t$ in the Markov game is expressed as follows:

$$a_t = (a_{1,t}, a_{2,t}, \dots, a_{N,t}),$$

where $N$ indicates the number of pumps that need to be controlled, $N$ is an integer, and $i$ takes $1, 2, \dots, N$. When $i \le K$ ($K$ is an integer smaller than $N$), agent $i$ controls a fixed-frequency pump, and $a_{i,t}$ is the on/off decision of the fixed-frequency pump in slot $t$: when $a_{i,t} = 0$ the fixed-frequency pump agent closes its pump, and when $a_{i,t} = 1$ it starts its pump. When $i > K$, agent $i$ controls a variable-frequency pump, and $a_{i,t}$ takes one of four discrete values encoding the frequency adjustment of the variable-frequency pump in slot $t$: switching the pump off, decreasing its frequency by a step $\Delta f$, increasing its frequency by $\Delta f$, or leaving its frequency unchanged.
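The low-dimensional discretization can be illustrated with a small decoder from a discrete action index to a pump command; the integer coding (0: off, 1: decrease, 2: increase, 3: hold), the clipping to a frequency range, and the function below are assumptions made for illustration.

```python
def apply_action(action: int, is_variable_frequency: bool,
                 current_freq: float, freq_step: float,
                 f_min: float, f_max: float) -> float:
    """Map a discrete action index to the pump's next frequency.

    Fixed-frequency pumps use actions {0: off, 1: on}; variable-frequency
    pumps use {0: off, 1: decrease by freq_step, 2: increase by freq_step,
    3: hold}. A returned frequency of 0.0 means the pump is off. The
    coding and the clipping to [f_min, f_max] are illustrative assumptions.
    """
    if not is_variable_frequency:
        return f_max if action == 1 else 0.0  # rated frequency or off
    if action == 0:
        return 0.0
    if action == 1:
        return max(current_freq - freq_step, f_min)
    if action == 2:
        return min(current_freq + freq_step, f_max)
    return current_freq  # action == 3: frequency unchanged
```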
Further, the reward function of the Markov game is expressed as follows:

$$r_t = -\left(C^{E}_t + \beta_1 C^{L}_t + \beta_2 C^{P}_t + \beta_3 C^{S}_t + \beta_4 C^{V}_t\right),$$

where $r_t$ is the reward received at the end of slot $t$ by the agent controlling each pump, and: $C^{E}_t$ is the penalty cost in slot $t$ associated with the energy consumption of the pump station; $C^{L}_t$ is the penalty cost in slot $t$ associated with the reservoir level violating the safety range; $C^{P}_t$ is the penalty cost in slot $t$ associated with the manifold pressure difference of the pump station violating its safety range; $C^{S}_t$ is the penalty cost in slot $t$ related to the joint switching cost of the pump group; and $C^{V}_t$ is the penalty in slot $t$ caused by the joint number of switching operations violating the safety range. $\beta_1$ is the importance coefficient of the penalty caused by the reservoir level violating the safety range relative to the energy-related penalty cost, $\beta_2$ is that of the penalty caused by the manifold pressure difference violating the safety range, $\beta_3$ is that of the pump switching penalty, and $\beta_4$ is that of the penalty caused by the number of switching operations violating the safety range.
Further, the water intake pump station scheduling environment model is constructed as follows:

$$E_t = f_E(s_t, a_t),\qquad L_{t+1} = f_L(s_t, a_t),\qquad P_{t+1} = f_P(s_t, a_t),$$

where $L_{t+1}$ is the reservoir level in slot $t+1$; $P_{t+1}$ is the manifold pressure of the pump station in slot $t+1$; $E_t$ is the energy consumption of the pump station in slot $t$; the model inputs contain, among others, the borrowed water amount $B_t$ and the water supply amount $D_t$ of the reservoir in slot $t$; $f_E$ is the energy-consumption-prediction long short-term memory (LSTM) network trained on real historical operating data; $f_L$ is the level-prediction LSTM network trained on real historical operating data; and $f_P$ is the manifold-pressure-prediction LSTM network trained on real historical operating data.
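As a sketch of what such a data-driven surrogate might look like, the PyTorch module below predicts one of the three target quantities from a short history of concatenated states and actions; the layer sizes, input layout, and training loss are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

class LSTMPredictor(nn.Module):
    """Surrogate model f(s, a) -> scalar (energy E_t, level L_{t+1}, or
    manifold pressure P_{t+1}); one instance is trained per target on
    historical operating data. Sizes and layout are illustrative."""

    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, history_len, input_dim) of concatenated state-action
        out, _ = self.lstm(seq)
        return self.head(out[:, -1, :]).squeeze(-1)  # predict from last step

# Training on historical (state, action) -> target pairs, e.g. for energy:
# model = LSTMPredictor(input_dim=state_dim + action_dim)
# loss = nn.functional.mse_loss(model(batch_seq), batch_energy)
```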
Further, the multiple agents for the water intake pump station system are arranged as follows: the number of agents equals the number of pumps, and each pump is controlled by one agent. Each agent contains one actor network, one target actor network, one critic network, one target critic network, and one attention network. The actor network and target actor network of each agent have the same structure, and the critic network and target critic network of each agent have the same structure.
Specifically, the actor network of agent $i$ (i.e., the agent corresponding to pump $i$) is a multi-layer deep neural network whose input is the local observation $o_{i,t}$ and whose output is the action $a_{i,t}$; the hidden layers of the deep neural network use the leaky ReLU activation function, and the output layer uses the normalized exponential (softmax) function. The critic network inside each agent contains three perceptron modules, namely a first, a second, and a third perceptron module. The input of the first perceptron module is the local observed state $o_i$, and its output is the observation encoding $e^{o}_i$. The input of the second perceptron module is the local observed state $o_i$ together with the behavior $a_i$, and its output is the joint encoding of observation and behavior $e_i$. The outputs of the second perceptron modules of all agents' critic networks serve as the input of the attention network; the attention network returns the contribution $x_i$ of the other agents to the current agent $i$. The contribution $x_i$ and the output $e^{o}_i$ of the first perceptron module serve as the input of the third perceptron module, whose output is the action-value function $Q^{\psi}_i(o, a)$ of all agents' current behaviors, where $\psi$ represents the weight parameters shared by all agents' critic networks and $g_i$ represents the multilayer perceptron of agent $i$.
The attention network internally contains $N$ sub-networks with the same structure, corresponding to the $N$ agents. Taking sub-network $i$ as an example, its input comprises the outputs $e_1, \dots, e_N$ of the second perceptron modules of all agents' critic networks, and it outputs the contribution $x_i$ of all other agents to agent $i$. The contribution is the weighted sum of the other agents' second-perceptron outputs, each linearly transformed and passed through a single-layer perceptron, namely:

$$x_i = \sum_{j \ne i} \alpha_j\, \mathrm{LeakyReLU}(V e_j),$$

where the weighting coefficient $\alpha_j$ reflects the similarity between the output $e_i$ of the second perceptron module in agent $i$'s critic network and the output $e_j$ of the second perceptron module in agent $j$'s critic network; $V$ is a shared matrix, and $\mathrm{LeakyReLU}$ is the leaky ReLU activation function. $W_q$ and $W_k$ are shared matrices that linearly transform $e_i$ and $e_j$ respectively, and

$$\alpha_j \propto \exp\left(e_j^{\top} W_k^{\top} W_q\, e_i\right).$$
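The attention computation for a single agent can be sketched directly from the two formulas above; the single-head form, tensor shapes, and variable names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_contribution(i: int, e: torch.Tensor,
                           W_q: torch.Tensor, W_k: torch.Tensor,
                           V: torch.Tensor) -> torch.Tensor:
    """Compute x_i = sum_{j != i} alpha_j * LeakyReLU(V e_j), with
    alpha obtained by a softmax over e_j^T W_k^T W_q e_i.

    e: (num_agents, dim) stacked outputs of the second perceptron modules;
    W_q, W_k, V: (dim, dim) shared matrices. Shapes are illustrative.
    """
    query = W_q @ e[i]                # transform agent i's encoding
    keys = e @ W_k.T                  # transform all agents' encodings
    scores = keys @ query             # similarity e_j^T W_k^T W_q e_i
    scores[i] = float("-inf")         # exclude agent i itself
    alpha = F.softmax(scores, dim=0)  # weighting coefficients alpha_j
    values = F.leaky_relu(e @ V.T)    # LeakyReLU(V e_j) for every agent
    return alpha @ values             # weighted sum -> contribution x_i
```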
further, the deep reinforcement learning training process of the multi-agent comprises the following steps:
(1) And obtaining the current environment state according to the historical operating data of the water taking pump station.
(2) And the actor network of each agent outputs the current behavior of each water taking pump according to the current environment state.
(3) And according to the current environment state and the current behavior, utilizing a water taking pump station scheduling environment model to obtain the energy consumption, the liquid level of the next time slot and the total pipe pressure of the next time slot under the state and the behavior, and utilizing the information to reconstruct the environment state and the reward of the next time slot.
(4) And sending the current environment state, the current behavior, the next time slot environment state and the next time slot reward to an experience pool.
(5) If the weight parameters of the deep neural network in the intelligent body need to be updated, extracting a small batch of training samples from the experience pool, and updating the weight of the critic network by utilizing a multi-intelligent-body actor-attention-critic reinforcement learning algorithm and then updating the actor network.
Specifically, the critic network parameters are updated by minimizing the following joint loss function:

$$L_Q(\psi) = \sum_{i=1}^{N} \mathbb{E}_{(s, a, r, s') \sim D}\left[\left(Q^{\psi}_i(s, a) - y_i\right)^2\right],\qquad y_i = r_i + \gamma\, \mathbb{E}_{a' \sim \pi_{\bar\theta}(s')}\left[Q^{\bar\psi}_i(s', a') - \alpha \log \pi_{\bar\theta_i}(a'_i \mid o'_i)\right],$$

where $L_Q(\psi)$ is the joint loss function; $D$ is the experience pool used to store the tuples $(s, a, r, s')$; $\mathbb{E}$ represents the expectation operation; $\gamma$ represents the discount coefficient; $\bar\theta = (\bar\theta_1, \dots, \bar\theta_N)$ is the vector of parameters of all target actor networks, with $\bar\theta_i$ representing the target actor network parameters of agent $i$; $\psi$ and $\bar\psi$ represent the weight parameters shared by all agents' critic networks and target critic networks, respectively; $\alpha$ is the temperature parameter that balances maximum entropy against maximum reward; and the inner expectation is the expected value when the chosen next action complies with the target actor network policy $\pi_{\bar\theta}$.
The actor networks are updated by gradient ascent; the specific gradient formula is:

$$\nabla_{\theta_i} J(\pi_{\theta}) = \mathbb{E}_{s \sim D,\, a \sim \pi_{\theta}}\left[\nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid o_i)\left(-\alpha \log \pi_{\theta_i}(a_i \mid o_i) + Q^{\psi}_i(s, a) - b(s, a_{\setminus i})\right)\right],$$

where $\pi_{\theta_i}$ represents the policy function of agent $i$'s actor network (i.e., the probability-distribution mapping from the observed state $o_{i,t}$ to the action $a_{i,t}$); $a_{\setminus i}$ denotes the behaviors of the agents other than $i$, and $b(s, a_{\setminus i})$ is the multi-agent baseline obtained by averaging the action value over agent $i$'s own behaviors while the other agents' behaviors are held fixed; $\nabla_{\theta_i}$ denotes the gradient with respect to the parameters of agent $i$'s actor network; the expectation is taken over actions that comply with the actor network policies; and the logarithmic term enters through its partial derivative with respect to $\theta_i$.
(6) After the weight parameters of the agents' deep neural networks are updated, judge whether the training process is finished; if not, jump to step (1); otherwise terminate training and use each trained actor network as the optimal policy of the corresponding agent (i.e., the function mapping from local observed state to pump control action) for control deployment at the actual water intake pump station.
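A simplified single-step sketch of the two updates follows. It assumes a shared critic module whose call conventions (`critic(obs, acts)` returning per-agent Q-values and a hypothetical `critic.per_action` helper returning Q-values over one agent's discrete actions) are invented for illustration; only the overall structure mirrors the update rules above.

```python
import torch

def maac_update(batch, agents, critic, target_agents, target_critic,
                critic_opt, actor_opts, gamma=0.99, alpha=0.01):
    """One simplified multi-agent actor-attention-critic update.
    All module interfaces are illustrative assumptions."""
    obs, acts, rews, next_obs = batch  # each a list indexed by agent

    # Critic update: minimize the joint TD loss against the soft target.
    with torch.no_grad():
        next_acts, next_logp = [], []
        for i, agent in enumerate(target_agents):
            dist = torch.distributions.Categorical(logits=agent(next_obs[i]))
            a = dist.sample()
            next_acts.append(a)
            next_logp.append(dist.log_prob(a))
        q_next = target_critic(next_obs, next_acts)
        targets = [rews[i] + gamma * (q_next[i] - alpha * next_logp[i])
                   for i in range(len(agents))]
    q = critic(obs, acts)
    critic_loss = sum(((q[i] - targets[i]) ** 2).mean()
                      for i in range(len(agents)))
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: soft policy gradient with a counterfactual baseline.
    for i, agent in enumerate(agents):
        dist = torch.distributions.Categorical(logits=agent(obs[i]))
        a_i = dist.sample()
        logp = dist.log_prob(a_i)
        new_acts = [a_i if j == i else acts[j] for j in range(len(agents))]
        q_all = critic.per_action(obs, new_acts, agent=i)  # Q over i's actions
        q_i = q_all.gather(-1, a_i.unsqueeze(-1)).squeeze(-1)
        baseline = (dist.probs * q_all).sum(-1)   # E_{a_i ~ pi_i}[Q_i]
        advantage = (q_i - baseline).detach()
        actor_loss = (logp * (alpha * logp.detach() - advantage)).mean()
        actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()
```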
Beneficial effects: compared with the prior art, the energy-saving scheduling method for water intake pump stations based on multi-agent deep reinforcement learning has the following advantages:
(1) Compared with scheduling methods based on nonlinear programming, dynamic programming, and the like, the method does not require an explicit dynamic model of the intake pump station. Unlike prediction-based scheduling methods, the trained multi-agent policy outputs pump station control decisions from the observed state of the current time slot alone, so no uncertain parameter needs to be predicted. Moreover, executing the policy only involves forward passes through multi-layer deep neural networks and takes milliseconds, so the computational complexity is extremely low. The method therefore has strong universality.
(2) Compared with single-agent pump scheduling methods based on reinforcement learning, the method uses the attention mechanism among multiple agents to achieve efficient coordinated scheduling of multiple intake pumps, and can significantly reduce energy consumption while maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within the safety range. The method is therefore highly efficient.
Drawings
Fig. 1 is a flowchart of the water intake pump station scheduling control method provided by the invention.
Fig. 2 shows the convergence of the training curve for an embodiment of the method of the invention.
Fig. 3 compares the average energy consumption of an embodiment of the method with other schemes.
Fig. 4 compares the average reservoir level limit violations of an embodiment of the method with other schemes.
Fig. 5 compares the average manifold pressure difference violations of an embodiment of the method with other schemes.
Fig. 6 compares the average number of pump switching operations of an embodiment of the method with other schemes.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely for illustrating the technical solutions of the present invention more clearly, and the scope of the present invention should not be limited thereby.
As shown in fig. 1, the energy-saving scheduling control method for water intake pump stations based on multi-agent deep reinforcement learning provided by the invention comprises the following steps:
step 1: on the premise of maintaining the liquid level of the water storage tank, the pressure difference of the main pipe and the switching times of the water pump in a safety range, the problem of minimizing the total energy consumption of the water taking pump station is converted into a Markov game, and a corresponding environment state, behavior and reward function are designed.
And 2, step: and constructing a water intake pump station dispatching environment model by using historical operation data and a long-term and short-term memory network.
And step 3: deep reinforcement learning agents are trained based on a dispatch environment model and a multi-agent actor-attention-critic reinforcement learning algorithm.
And 4, step 4: and deploying the intelligent agent strategy obtained by training into an actual system.
In step 1, the behaviors of the Markov game are the frequency decisions of the variable-frequency pumps and the on/off decisions of the fixed-frequency pumps; the constraints to be considered concern the reservoir level, the manifold pressure difference, and the number of pump switching operations, as follows:

(1) The reservoir level $L_t$ stays within the safety range, i.e.,

$$L_{\min} \le L_t \le L_{\max},$$

where $L_{\max}$ and $L_{\min}$ represent the upper and lower limits of the safe reservoir level, respectively.

(2) The manifold pressure difference is smaller than its upper limit, i.e.,

$$|P_t - P_{t-1}| \le \Delta P_{\max},$$

where $\Delta P_{\max}$ indicates the highest manifold pressure difference the pumps can accept, $P_t$ represents the manifold pressure in time slot $t$, and $P_{t-1}$ represents the manifold pressure in time slot $t-1$.

(3) The number of pump switching operations within a day, $N_t$, stays within the safe switching range, i.e.,

$$N_t \le N_{\max},$$

where $N_{\max}$ indicates the highest number of switching operations acceptable within a day.
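A small helper that evaluates these three safety conditions for one transition might look as follows; the function and argument names mirror the symbols above and are illustrative.

```python
def safety_violations(level, pressure, prev_pressure, switch_count,
                      level_min, level_max, dp_max, switches_max):
    """Return the three constraint violations as non-negative magnitudes:
    reservoir level outside [level_min, level_max], manifold pressure
    difference above dp_max, and daily switch count above switches_max.
    A zero value means the corresponding constraint is satisfied."""
    level_violation = max(level_min - level, 0.0) + max(level - level_max, 0.0)
    pressure_violation = max(abs(pressure - prev_pressure) - dp_max, 0.0)
    switch_violation = max(switch_count - switches_max, 0)
    return level_violation, pressure_violation, switch_violation
```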
In step 1 above, the Markov game is defined by states, behaviors, state transition functions, and reward functions. In the Markov game, each agent maximizes its expected return (i.e., the expected value of the cumulative reward) based on the current state and the selected behavior. The environment state, behavior, and reward function of the Markov game are designed as follows:

(1) Environment state. The environment state of the multi-agent system in time slot $t$ is designed as

$$s_t = (o_{1,t}, o_{2,t}, \dots, o_{N,t}).$$

In slot $t$, the agent $i$ associated with the frequency decision of pump $i$ uses the local observed state $o_{i,t} = (h_t, L_t, P_t, B_t, D_t, N_t, u_{i,t})$, whose components respectively represent: $h_t$ is the relative time index within a day of the current absolute time of slot $t$; $L_t$ is the reservoir level in slot $t$; $P_t$ is the manifold pressure of the pump station in slot $t$; $B_t$ is the borrowed water amount of the reservoir in slot $t$; $D_t$ is the water supply amount of the reservoir in slot $t$; $N_t$ is the number of pump switching operations of the pump station within the day up to slot $t$; and $u_{i,t}$ is the on/off state in slot $t$ of the pump controlled by agent $i$.
(2) Behavior. The behavior in time slot $t$ is denoted by

$$a_t = (a_{1,t}, a_{2,t}, \dots, a_{N,t}),$$

with $a_{i,t}$ defined as above for the fixed-frequency and variable-frequency pump agents.
(3) Reward function. The reward in time slot $t$ of the agent associated with the $i$-th pump of the water intake pump station comprises five components:

1. The penalty related to the energy consumption of the pump station in slot $t$:

$$C^{E}_t = E_t.$$

2. The penalty caused by the reservoir level crossing its limits in slot $t$:

$$C^{L}_t = \max(L_{\min} - L_{t+1},\, 0) + \max(L_{t+1} - L_{\max},\, 0).$$

3. The penalty caused by violating the safe manifold pressure difference range in slot $t$:

$$C^{P}_t = \max\left(\left|P_{t+1} - P_t\right| - \Delta P_{\max},\, 0\right).$$

4. The penalty caused by switching pumps on or off in slot $t$:

$$C^{S}_t = \sum_{i=1}^{N} \left|u_{i,t+1} - u_{i,t}\right|,$$

where $u_{i,t}$ indicates the on/off state of pump $i$ in slot $t$ ($u_{i,t} = 0$ means pump $i$ is off in slot $t$, and $u_{i,t} = 1$ means it is on), and $u_{i,t+1}$ indicates the on/off state of pump $i$ in slot $t+1$ ($u_{i,t+1} = 0$ off, $u_{i,t+1} = 1$ on).

5. The penalty caused by the pump station violating its safe switching range in slot $t$:

$$C^{V}_t = \max(N_{t+1} - N_{\max},\, 0).$$

The reward is then $r_t = -\left(C^{E}_t + \beta_1 C^{L}_t + \beta_2 C^{P}_t + \beta_3 C^{S}_t + \beta_4 C^{V}_t\right)$, where $\beta_1$ is the importance coefficient of the penalty caused by the reservoir level violating the safety range relative to the energy-related penalty cost, $\beta_2$ is that of the penalty caused by the manifold pressure difference violating the safety range, $\beta_3$ is that of the pump switching penalty, and $\beta_4$ is that of the penalty caused by the number of switching operations violating the safety range.
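Putting the five components together, a per-slot reward computation could be sketched as below; the default coefficients and the max()-style penalty forms follow the natural reading of the descriptions above and should be treated as assumptions.

```python
import numpy as np

def slot_reward(energy, level_next, pressure, pressure_next,
                pump_states, pump_states_next, switch_count_next,
                level_min, level_max, dp_max, switches_max,
                beta=(1.0, 1.0, 1.0, 1.0)):
    """Reward r_t = -(C_E + b1*C_L + b2*C_P + b3*C_S + b4*C_V).
    The beta defaults and exact penalty forms are assumptions."""
    c_energy = energy
    c_level = max(level_min - level_next, 0.0) + max(level_next - level_max, 0.0)
    c_pressure = max(abs(pressure_next - pressure) - dp_max, 0.0)
    c_switch = int(np.sum(np.abs(pump_states_next - pump_states)))
    c_violate = max(switch_count_next - switches_max, 0)
    b1, b2, b3, b4 = beta
    return -(c_energy + b1 * c_level + b2 * c_pressure
             + b3 * c_switch + b4 * c_violate)
```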
In step 2, the goal of the water intake pump station scheduling system is to minimize the energy consumption of the pump station while maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within the safety range. To establish the scheduling environment model, historical data and long short-term memory (LSTM) networks are employed. Specifically, given the state and action of the pump station as inputs, the LSTM networks output the scheduled energy consumption $E_t$, the reservoir level $L_{t+1}$, and the manifold pressure $P_{t+1}$.
In step 3, the multi-agent actor-attention-critic reinforcement learning algorithm is used to train the scheduling system toward optimal decisions that keep the reservoir level, the manifold pressure difference, and the number of pump switching operations within the safety range. The specific steps for training the deep reinforcement learning agents are:
(1) Obtain the current environment state from the historical operating data of the pump station;
(2) The actor network of each agent outputs the current behavior of the pump station according to the current environment state;
(3) According to the current environment state and the current behavior, use the scheduling environment model to obtain the energy consumption, the next-slot reservoir level, and the next-slot manifold pressure under that state and behavior, and use this information to construct the next-slot environment state and the reward;
(4) Send the current environment state, the current behavior, the next-slot environment state, and the reward to the experience pool;
(5) If the weight parameters of the deep neural networks inside the agents need updating, draw a mini-batch of training samples from the experience pool, update the network weights with the multi-agent actor-attention-critic reinforcement learning algorithm, and then judge whether the training process is finished; if not, jump to step (1), otherwise terminate training. The trained actor networks are used for actual deployment as the optimal policy of each agent (i.e., the function mapping from local observed state to pump control action).
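The loop tying these five steps together can be sketched as follows; `env_model`, the agents' `act` method, and the injected `update_fn` stand for the components described above, and their interfaces are assumptions.

```python
import random
from collections import deque

def train(agents, env_model, update_fn, episodes, steps_per_episode,
          batch_size=256, update_every=4):
    """Skeleton of the training procedure in steps (1)-(5); all component
    interfaces are illustrative assumptions."""
    replay = deque(maxlen=100_000)  # experience pool
    for ep in range(episodes):
        state = env_model.reset_from_history()  # step (1): state from data
        for step in range(steps_per_episode):
            # step (2): each actor picks its pump's behavior
            actions = [agent.act(obs) for agent, obs in zip(agents, state)]
            # step (3): the LSTM surrogate yields energy and next-slot
            # level/pressure, from which next state and reward are built
            next_state, reward = env_model.step(state, actions)
            replay.append((state, actions, reward, next_state))  # step (4)
            state = next_state
            if len(replay) >= batch_size and step % update_every == 0:
                # step (5): critic update first, then the actors
                update_fn(random.sample(replay, batch_size))
```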
Compared with the prior art, embodiments of the invention provide the following beneficial effects:
1) The method is universal. Because the obtained agent policies derive the control decision of each intake pump from the observed state of the current time slot alone, the method needs neither prior information about, nor prediction of, any uncertain system parameter (such as the water supply amount), nor an explicit model of the pump station scheduling mechanism;
2) The method is efficient. Compared with the existing scheduling method, it reduces energy consumption by 12.8% while maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within the safety range.
Fig. 2 shows the convergence of the training curve in an embodiment of the invention. The training reward shows an overall growing trend and gradually stabilizes.
Fig. 3 compares the energy consumption of the embodiment with other schemes, where scheme 1 is the real scheduling scheme of the intake pump station. The water intake, water supply, and pump station parameter data used by the invention all come from actual water plant data from November 1, 2020 to April 30, 2021. Compared with scheme 1, the method saves 12.8% of average energy consumption.
Fig. 4 compares the average reservoir level limit violations of the embodiment with other schemes. Compared with scheme 1, the average level limit violation under the method is reduced by 66.2%.
Fig. 5 compares the average manifold pressure difference of the embodiment with other schemes. The method yields a smaller manifold pressure difference than scheme 1, and its manifold pressure difference always stays within the safety range.
Fig. 6 compares the average number of pump switching operations of the embodiment with other schemes. Compared with scheme 1, the method reduces the number of pump switches by 50%.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (4)

1. A multi-agent deep reinforcement learning-based energy-saving scheduling method for the water intake pump station of a water plant, characterized by comprising the following steps:
Step 1: on the premise of maintaining the reservoir level, the manifold pressure difference, and the number of pump switching operations within a safety range, model the total energy consumption minimization problem of the water intake pump station as a Markov game, design the corresponding environment state, behavior, and reward function of the Markov game, and construct the multiple agents for the water intake pump station system;
Step 2: construct a water intake pump station scheduling environment model using historical operating data and long short-term memory networks;
Step 3: train the multiple agents by deep reinforcement learning based on the scheduling environment model and the multi-agent actor-attention-critic reinforcement learning algorithm;
Step 4: deploy the trained multi-agent policies in the actual water intake pump station system;
the total energy consumption minimization problem of the water intake pump station is expressed as:

$$\min_{a_t}\ \mathbb{E}\left[\sum_{t=1}^{T} E_t\right]\quad \text{s.t.}\quad L_{\min} \le L_t \le L_{\max},\quad |P_t - P_{t-1}| \le \Delta P_{\max},\quad N_t \le N_{\max},$$

where $E_t$ is the total energy consumption of the pump station in time slot $t$, $t = 1, 2, \dots, T$, and $T$ represents the total number of optimized time slots; $\mathbb{E}$ is the expectation operator; $a_t$ is the working frequency of the variable-frequency pumps or the state of the fixed-frequency pumps of the pump station in slot $t$; $L_t$ is the reservoir level in slot $t$, and $L_{\min}$ and $L_{\max}$ are the lowest and highest levels of the reservoir safety range; $P_t$ is the manifold pressure of the pump station in slot $t$, and $P_{t-1}$ is the manifold pressure of the pump station in slot $t-1$; $\Delta P_{\max}$ is the highest manifold pressure difference within the safety range; $N_t$ denotes the number of pump switching operations within the day up to slot $t$; and $N_{\max}$ is the maximum number of switching operations of the pump station within a day in the safety range;
the environment state $s_t$ in the Markov game is expressed as:

$$s_t = (o_{1,t}, o_{2,t}, \dots, o_{N,t}),\qquad o_{i,t} = (h_t, L_t, P_t, B_t, D_t, N_t, u_{i,t}),$$

where $i$ takes $1, 2, \dots, N$; $N$ indicates the number of pumps that need to be controlled and is also the total number of agents in the Markov game, each agent being responsible for controlling one intake pump; $s_t$ is the environment state of the multiple agents in slot $t$; $o_{i,t}$ denotes the local observed state of the $i$-th fixed-frequency or variable-frequency pump agent; $h_t$ is the relative time index within a day of the current absolute time of slot $t$; $L_t$ is the reservoir level in slot $t$; $P_t$ is the manifold pressure of the pump station in slot $t$; $B_t$ is the borrowed water amount of the reservoir in slot $t$; $D_t$ is the water supply amount of the reservoir in slot $t$; $N_t$ is the number of pump switching operations within the day up to slot $t$; and $u_{i,t}$ is the on/off state in slot $t$ of the pump controlled by agent $i$;
the behavior $a_t$ in the Markov game is expressed as:

$$a_t = (a_{1,t}, a_{2,t}, \dots, a_{N,t}),$$

where $N$ indicates the number of pumps that need to be controlled, $N$ is an integer, and $i$ takes $1, 2, \dots, N$; when $i \le K$, with $K$ an integer smaller than $N$, agent $i$ is a fixed-frequency pump agent and $a_{i,t}$ is the on/off decision of the fixed-frequency pump in slot $t$: when $a_{i,t} = 0$ the fixed-frequency pump agent closes its pump, and when $a_{i,t} = 1$ it starts its pump; when $i > K$, agent $i$ is a variable-frequency pump agent and $a_{i,t}$ takes one of four discrete values encoding the frequency adjustment of the variable-frequency pump in slot $t$, indicating respectively that the variable-frequency pump is switched off, that its frequency is decreased by $\Delta f$, that its frequency is increased by $\Delta f$, and that its frequency is unchanged;
the reward function in the Markov game is expressed as:

$$r_t = -\left(C^{E}_t + \beta_1 C^{L}_t + \beta_2 C^{P}_t + \beta_3 C^{S}_t + \beta_4 C^{V}_t\right),$$

where $r_t$ is the reward received at the end of slot $t$ by the agent controlling each pump; $C^{E}_t$ is the penalty cost in slot $t$ associated with the energy consumption of the pump station; $C^{L}_t$ is the penalty cost in slot $t$ associated with the reservoir level violating the safety range; $C^{P}_t$ is the penalty cost in slot $t$ associated with the manifold pressure difference of the pump station violating its safety range; $C^{S}_t$ is the penalty cost in slot $t$ related to the joint switching cost of the pump group; $C^{V}_t$ is the penalty in slot $t$ caused by the joint number of switching operations violating the safety range; $\beta_1$ is the importance coefficient of the penalty caused by the reservoir level violating the safety range relative to the energy-related penalty cost, $\beta_2$ is that of the penalty caused by the manifold pressure difference violating the safety range, $\beta_3$ is that of the pump switching penalty, and $\beta_4$ is that of the penalty caused by the number of switching operations violating the safety range;
the multiple agents for the water intake pump station system are arranged as follows: the number of agents equals the number of pumps, and each pump is controlled by one agent; each agent internally contains one actor network, one target actor network, one critic network, one target critic network, and one attention network; the actor network and target actor network of each agent have the same structure, and the critic network and target critic network of each agent have the same structure;
the actor network of agent $i$ takes the local observation $o_{i,t}$ as input and outputs the action $a_{i,t}$; the critic network inside each agent contains three perceptron modules, namely a first, a second, and a third perceptron module, wherein: the input of the first perceptron module is the local observed state $o_i$ and its output is the observation encoding $e^{o}_i$; the input of the second perceptron module is the local observed state $o_i$ together with the behavior $a_i$, and its output is the joint encoding of observation and behavior $e_i$; the outputs of the second perceptron modules of all agents' critic networks serve as the input of the attention network; the attention network returns the contribution $x_i$ of the other agents to the current agent $i$; the contribution $x_i$ and the output $e^{o}_i$ of the first perceptron module serve as the input of the third perceptron module, whose output is the action-value function $Q^{\psi}_i(o, a)$ of all agents' current state-behavior pairs, where $\psi$ represents the weight parameters shared by all agents' critic networks and $g_i$ represents the multilayer perceptron of agent $i$; the attention network internally contains $N$ sub-networks with the same structure, corresponding to the $N$ agents; the input of sub-network $i$ comprises the outputs $e_1, \dots, e_N$ of the second perceptron modules of all agents' critic networks, and sub-network $i$ outputs the contribution $x_i$ of all other agents to agent $i$; the contribution $x_i$ is the weighted sum of the other agents' second-perceptron outputs after each is linearly transformed and passed through a single-layer perceptron, namely:

$$x_i = \sum_{j \ne i} \alpha_j\, \mathrm{LeakyReLU}(V e_j),$$

where the weighting coefficient $\alpha_j \propto \exp\!\left(e_j^{\top} W_k^{\top} W_q\, e_i\right)$ reflects the similarity between the output $e_i$ of the second perceptron module in agent $i$'s critic network and the output $e_j$ of the second perceptron module in agent $j$'s critic network, $W_q$ and $W_k$ being shared matrices that linearly transform $e_i$ and $e_j$ respectively; $V$ is a shared matrix, and $\mathrm{LeakyReLU}$ is the leaky ReLU activation function.
2. The multi-agent deep reinforcement learning-based energy-saving scheduling method for a water plant water intake pump station according to claim 1, wherein the water intake pump station scheduling environment model is constructed as:

$$E_t = f_E(s_t, a_t), \qquad L_{t+1} = f_L(s_t, a_t, B_t, S_t), \qquad P_{t+1} = f_P(s_t, a_t)$$

where $s_t$ and $a_t$ denote the environment state and the joint pump behaviors at time slot $t$; $L_t$ is the liquid level of the water storage tank at time slot $t$; $P_t$ is the main pipe pressure of the water intake pump station at time slot $t$; $E_t$ is the energy consumption of the water intake pump station at time slot $t$; $B_t$ is the water borrowing amount of the water storage tank at time slot $t$; $S_t$ is the water supply amount of the water storage tank at time slot $t$; $f_E$ is the energy-consumption-prediction long short-term memory network trained with real historical operation data; $f_L$ is the liquid-level-prediction long short-term memory network trained with real historical operation data; and $f_P$ is the main-pipe-pressure-prediction long short-term memory network trained with real historical operation data.
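As a minimal sketch of one of these predictors (here $f_L$), assuming an LSTM that maps a window of recent operating records to the next-slot liquid level; the feature layout, window length, and layer sizes are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class LevelPredictor(nn.Module):
    """Sketch of f_L: predicts the next-slot tank liquid level from a
    window of past operating records (trained on historical data)."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # scalar: liquid level at t+1

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, T, n_features) — e.g. pump behaviors, level,
        # borrowing amount, supply amount over the last T time slots
        out, _ = self.lstm(window)
        return self.head(out[:, -1])  # regress from the last hidden state

# Hypothetical usage: batch of 8 windows, 12 slots, 6 features each
model = LevelPredictor(n_features=6)
pred = model(torch.randn(8, 12, 6))  # -> (8, 1) predicted next-slot levels
```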
3. The multi-agent deep reinforcement learning-based energy-saving scheduling method for a water plant water intake pump station according to claim 1, wherein the deep reinforcement learning training process of the multiple agents comprises the following steps (sketched in code after this claim):

Step 4.1: obtain the current environment state from the historical operating data of the water intake pump station;

Step 4.2: the actor network of each agent outputs the current behavior of its water intake pump according to the current environment state;

Step 4.3: given the current environment state and the current behaviors, use the water intake pump station scheduling environment model to obtain the energy consumption, the next-slot liquid level, and the next-slot main pipe pressure, and use this information to construct the environment state and the reward of the next time slot;

Step 4.4: send the current environment state, the current behaviors, the next-slot environment state, and the next-slot reward to the experience pool;

Step 4.5: if the weight parameters of the deep neural networks inside the agents need to be updated, extract a mini-batch of training samples from the experience pool, first update the critic network weights using the multi-agent actor-attention-critic reinforcement learning algorithm, and then update the actor networks;

Step 4.6: after the weight parameters of the agents' deep neural networks are updated, judge whether the training process is finished; if not, jump to Step 4.1; otherwise terminate training and use each trained actor network as the optimal strategy of the corresponding agent for deployment in the control of the actual water intake pump station.
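A schematic Python skeleton of steps 4.1 through 4.6 follows. The interfaces of `env_model`, `agents`, and the experience pool are invented here for illustration; only the control flow mirrors the claim.

```python
import random
from collections import deque

replay = deque(maxlen=100_000)  # experience pool (step 4.4)

def train(env_model, agents, episodes, steps_per_episode,
          batch_size=256, update_every=4):
    for _ in range(episodes):
        state = env_model.initial_state()  # step 4.1: from historical data
        for t in range(steps_per_episode):
            # step 4.2: each actor picks a behavior for its pump
            actions = [ag.act(state) for ag in agents]
            # step 4.3: learned model yields energy, next level, next pressure,
            # from which the next state and the reward are constructed
            next_state, reward = env_model.step(state, actions)
            # step 4.4: store the transition
            replay.append((state, actions, reward, next_state))
            # step 4.5: periodic MAAC update — critics first, then actors
            if len(replay) >= batch_size and t % update_every == 0:
                batch = random.sample(list(replay), batch_size)
                for ag in agents:
                    ag.update_critic(batch)
                for ag in agents:
                    ag.update_actor(batch)
            state = next_state
    # step 4.6: the trained actor networks are the deployable policies
    return [ag.actor for ag in agents]
```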
4. The multi-agent deep reinforcement learning-based energy-saving scheduling method for a water plant water intake pump station according to claim 3, wherein the critic network weights are updated by minimizing a joint loss function, calculated as:

$$L_Q(\psi) = \sum_{i=1}^{N} \mathbb{E}_{(o,a,r,o') \sim D}\left[ \left( Q_i^{\psi}(o,a) - y_i \right)^2 \right],$$
$$y_i = r_i + \gamma \, \mathbb{E}_{a' \sim \pi_{\bar{\theta}}(o')}\left[ Q_i^{\bar{\psi}}(o',a') - \alpha \log \pi_{\bar{\theta}_i}(a_i' \mid o_i') \right]$$

where $L_Q(\psi)$ is the joint loss function; $D$ is the experience pool storing the transitions $(o, a, r, o')$; $\mathbb{E}$ denotes expectation; $\gamma$ denotes the discount coefficient; $\bar{\theta} = (\bar{\theta}_1, \dots, \bar{\theta}_N)$ denotes the vector of parameters of all target actor networks, where $\bar{\theta}_i$ denotes the target actor network parameters of agent $i$; $\psi$ and $\bar{\psi}$ denote the shared weight parameters of all agents' critic networks and target critic networks, respectively; $\alpha$ is a temperature parameter that balances maximum entropy against maximum reward; $\mathbb{E}_{a' \sim \pi_{\bar{\theta}}(o')}$ denotes the expected value over behaviors selected according to the target actor network policy $\pi_{\bar{\theta}}$; $N$ denotes the total number of agents; $Q_i^{\psi}$ denotes the state-behavior value function of the critic network; $Q_i^{\bar{\psi}}$ denotes the next-slot state-behavior value function of the target critic network; and $r_i$ is the current reward.
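A hedged numerical sketch of this joint loss for one mini-batch follows, assuming the expectation over next-slot behaviors is approximated with sampled behaviors; the tensors `q`, `target_q`, and `target_log_pi` stand in for pre-computed critic, target-critic, and target-actor outputs:

```python
import torch

def joint_critic_loss(q, target_q, target_log_pi, reward, gamma=0.99, alpha=0.2):
    """Sketch of the joint loss L_Q(psi) summed over N agents.

    q:             (N, B)  Q_i(o, a) for the sampled behaviors
    target_q:      (N, B)  target-critic value Q_i(o', a') at the next slot
    target_log_pi: (N, B)  log pi(a'_i | o'_i) under the target actor
    reward:        (N, B)  per-agent reward r_i
    """
    with torch.no_grad():
        # soft target: reward + discounted (value minus temperature-weighted log-prob)
        y = reward + gamma * (target_q - alpha * target_log_pi)
    return ((q - y) ** 2).mean(dim=1).sum()  # sum of per-agent MSE losses

# Hypothetical usage with N = 3 agents and a batch of 256 transitions
N, B = 3, 256
loss = joint_critic_loss(torch.randn(N, B), torch.randn(N, B),
                         -torch.rand(N, B), torch.randn(N, B))
```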
The actor network weights are updated by gradient ascent, with the gradient computed as:

$$\nabla_{\theta_i} J(\pi_{\theta}) = \mathbb{E}_{o \sim D,\, a \sim \pi_{\theta}}\left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid o_i) \left( -\alpha \log \pi_{\theta_i}(a_i \mid o_i) + Q_i^{\psi}(o,a) - b(o, a_{\setminus i}) \right) \right]$$

where $\pi_{\theta_i}$ denotes the policy function of agent $i$'s actor network; $b(o, a_{\setminus i})$ is a baseline that averages the state-behavior value over agent $i$'s behaviors given the behaviors $a_{\setminus i}$ of the agents other than $i$; $\nabla_{\theta_i} J(\pi_{\theta})$ denotes the gradient of the actor network; $\mathbb{E}_{o \sim D,\, a \sim \pi_{\theta}}$ denotes the expected value over behaviors taken according to the actor network policy; and $\nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid o_i)$ denotes the partial derivative of the logarithm of the policy function.
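Illustrative only: a minimal policy-gradient step matching the formula above for discrete behaviors, taking the baseline $b(o, a_{\setminus i})$ as the policy-weighted average of the critic values over agent $i$'s candidate behaviors (one common choice; the patent text does not fix this detail):

```python
import torch

def actor_loss_agent_i(log_pi_all, a_i, q_all, alpha=0.2):
    """Sketch of agent i's actor objective (negated, so minimizing it ascends J).

    log_pi_all: (B, A) log-probabilities of each candidate behavior of agent i
    a_i:        (B,)   sampled behavior indices of agent i
    q_all:      (B, A) critic values Q_i(o, (a_i', a_-i)) for each candidate a_i'
    """
    pi_all = log_pi_all.exp()
    baseline = (pi_all * q_all).sum(dim=1)                      # b(o, a_-i)
    log_pi = log_pi_all.gather(1, a_i.unsqueeze(1)).squeeze(1)  # log pi(a_i | o_i)
    q = q_all.gather(1, a_i.unsqueeze(1)).squeeze(1)            # Q_i(o, a)
    advantage = (-alpha * log_pi + q - baseline).detach()       # treated as constant
    return -(log_pi * advantage).mean()                         # ascend => minimize negative

# Hypothetical usage: batch of 256 states, 4 candidate behaviors per pump
B, A = 256, 4
logits = torch.randn(B, A, requires_grad=True)
log_pi_all = torch.log_softmax(logits, dim=1)
loss = actor_loss_agent_i(log_pi_all, torch.randint(0, A, (B,)), torch.randn(B, A))
loss.backward()  # gradients for one ascent step on agent i's actor
```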

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211475230.4A CN115544899B (en) 2022-11-23 2022-11-23 Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning


Publications (2)

Publication Number Publication Date
CN115544899A (en) 2022-12-30
CN115544899B (en) 2023-04-07

Family

ID=84721315


Country Status (1)

Country Link
CN (1) CN115544899B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116187208B (en) * 2023-04-27 2023-08-01 深圳市广汇源环境水务有限公司 Drainage basin water quantity and quality joint scheduling method based on constraint reinforcement learning
CN116738874B (en) * 2023-05-12 2024-01-23 珠江水利委员会珠江水利科学研究院 Gate pump group joint optimization scheduling method based on Multi-Agent PPO reinforcement learning
CN117588394B (en) * 2024-01-18 2024-04-05 华土木(厦门)科技有限公司 AIoT-based intelligent linkage control method and system for vacuum pump

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114279042A (en) * 2021-12-27 2022-04-05 苏州科技大学 Central air conditioner control method based on multi-agent deep reinforcement learning
CN115289619A (en) * 2022-07-28 2022-11-04 安徽大学 Subway platform HVAC control method based on multi-agent deep reinforcement learning

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11657266B2 (en) * 2018-11-16 2023-05-23 Honda Motor Co., Ltd. Cooperative multi-goal, multi-agent, multi-stage reinforcement learning
US11586974B2 (en) * 2018-09-14 2023-02-21 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning in a multi-agent environment
US11295174B2 (en) * 2018-11-05 2022-04-05 Royal Bank Of Canada Opponent modeling with asynchronous methods in deep RL
CN109729528B (en) * 2018-12-21 2020-08-18 北京邮电大学 D2D resource allocation method based on multi-agent deep reinforcement learning
CN110458443B (en) * 2019-08-07 2022-08-16 南京邮电大学 Smart home energy management method and system based on deep reinforcement learning
CN111144793B (en) * 2020-01-03 2022-06-14 南京邮电大学 Commercial building HVAC control method based on multi-agent deep reinforcement learning
CN112491818B (en) * 2020-11-12 2023-02-03 南京邮电大学 Power grid transmission line defense method based on multi-agent deep reinforcement learning
CN112540535B (en) * 2020-11-13 2022-08-30 南京邮电大学 Office building thermal comfort control system and method based on deep reinforcement learning
US20220230080A1 (en) * 2021-01-20 2022-07-21 Honda Motor Co., Ltd. System and method for utilizing a recursive reasoning graph in multi-agent reinforcement learning
CN114362187B (en) * 2021-11-25 2022-12-09 南京邮电大学 Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
CN114357569A (en) * 2021-12-13 2022-04-15 南京邮电大学 Commercial building HVAC control method and system based on evolution deep reinforcement learning
CN114971819A (en) * 2022-03-28 2022-08-30 东北大学 User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning
CN115291625A (en) * 2022-07-15 2022-11-04 同济大学 Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning



Similar Documents

Publication Publication Date Title
CN115544899B (en) Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning
Huang Enhancement of hydroelectric generation scheduling using ant colony system based optimization approaches
CN111242443B (en) Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet
CN103729695A (en) Short-term power load forecasting method based on particle swarm and BP neural network
WO2023070293A1 (en) Long-term scheduling method for industrial byproduct gas system
CN116187601B (en) Comprehensive energy system operation optimization method based on load prediction
CN112012875B (en) Optimization method of PID control parameters of water turbine regulating system
Yang et al. Optimal energy operation strategy for we-energy of energy internet based on hybrid reinforcement learning with human-in-the-loop
CN112460741A (en) Control method of building heating, ventilation and air conditioning system
CN117057553A (en) Deep reinforcement learning-based household energy demand response optimization method and system
CN115345380A (en) New energy consumption electric power scheduling method based on artificial intelligence
CN113869795B (en) Long-term scheduling method for industrial byproduct gas system
CN115986839A (en) Intelligent scheduling method and system for wind-water-fire comprehensive energy system
CN114322208B (en) Intelligent park air conditioner load regulation and control method and system based on deep reinforcement learning
CN116436033A (en) Temperature control load frequency response control method based on user satisfaction and reinforcement learning
Yang et al. Data-driven optimal dynamic dispatch for hydro-PV-PHS integrated power system using deep reinforcement learning approach
CN115411776B (en) Thermoelectric collaborative scheduling method and device for residence comprehensive energy system
CN115526504A (en) Energy-saving scheduling method and system for water supply system of pump station, electronic equipment and storage medium
CN114372645A (en) Energy supply system optimization method and system based on multi-agent reinforcement learning
Salvador et al. Historian data based predictive control of a water distribution network
CN115239133A (en) Multi-heat-source heat supply system collaborative optimization scheduling method based on layered reinforcement learning
CN111275572B (en) Unit scheduling system and method based on particle swarm and deep reinforcement learning
Cheng et al. A cyber physical system model using genetic algorithm for actuators control
Xin et al. Genetic based fuzzy Q-learning energy management for smart grid
Silva et al. Framework for the development of a digital twin for solar water heating systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant