CN115185190A - Urban drainage system control method and device based on multi-agent reinforcement learning - Google Patents


Publication number
CN115185190A
Authority
CN
China
Prior art keywords
drainage system
control
urban drainage
agent
pump station
Prior art date
Legal status
Granted
Application number
CN202211106987.6A
Other languages
Chinese (zh)
Other versions
CN115185190B
Inventor
董欣
王一茗
徐智伟
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202211106987.6A
Publication of CN115185190A
Application granted
Publication of CN115185190B
Status: Active

Classifications

    • G: Physics
    • G05: Controlling; Regulating
    • G05B: Control or regulating systems in general; functional elements of such systems; monitoring or testing arrangements for such systems or elements
    • G05B 13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02: Adaptive control systems as above, electric
    • G05B 13/04: Adaptive control systems as above, electric, involving the use of models or simulators
    • G05B 13/042: Adaptive control systems as above, in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06N: Computing arrangements based on specific computational models
    • G06N 20/00: Machine learning


Abstract

The invention provides an urban drainage system control method and device based on multi-agent reinforcement learning. The method comprises the following steps: generalizing the urban drainage system and creating a virtual model of the urban drainage system as a virtual digital object; constructing a neural-network-based agent model of the urban drainage system, taking the water quantity and water quality change processes and the flow and liquid level of key nodes as control targets; constructing a single agent consisting of a current network and a target network, identifying the linkage relations among agents by a value decomposition method, and thereby constructing multiple agents; and, after the multiple agents are trained, evaluating and verifying the control strategy of the urban drainage system. The urban drainage system control method and device based on multi-agent reinforcement learning improve the control efficiency and the control effect of an actual drainage system simultaneously.

Description

Urban drainage system control method and device based on multi-agent reinforcement learning
Technical Field
The invention relates to the technical field of the internet, and in particular to an urban drainage system control method and device based on multi-agent reinforcement learning.
Background
An urban drainage system (hereinafter, the system) comprises drainage pipelines, pump stations, storage tanks, gates and other ancillary facilities. Its function is to collect and discharge the rainwater, sewage and other flows generated within an urban area, and it is an important piece of infrastructure for safeguarding the quality of the urban water environment. The actual performance of the system is directly influenced by the management level: high-quality management can effectively avoid adverse events (such as urban waterlogging and overflow). However, the current management mode relies mainly on manual experience, which makes it difficult to cope with the operational pressure brought by increasingly frequent extreme rainfall and an accelerating urbanization process.
Against this background, real-time control can fully exploit the control potential of existing facilities and maximize the operating efficiency of the system without constructing additional facilities. Real-time control refers to an operation technique that dynamically regulates the controllable facilities in the system (such as pump stations and gates) according to the system state (such as liquid level and flow) and the system input (such as current rainfall intensity and forecast future rainfall intensity) acquired in real time, so as to achieve the control targets.
Reinforcement learning is an important class of algorithms closely related to automatic control theory. By training an agent to interact continuously with its environment, an optimal policy is learned from the interaction data samples, so that the agent minimizes the global loss (equivalently, maximizes the cumulative reward) over the interaction process. Reinforcement learning can achieve good control performance without accurate system modelling and adapts well to environmental change, and it has therefore been widely applied in many technical fields.
In the field of real-time control of urban drainage systems, existing patents and literature focus on real-time control methods based on model predictive control. Prior patent 1, application number CN202110858448.7, entitled "An intelligent drainage grading real-time control method based on model predictive control", discloses such a method and relates to the technical field of drainage grading control. The method combines a physical drainage model with a deep learning model for data analysis, and realizes the automatic generation of control strategies and the issuing of control instructions through an optimization algorithm.
Prior patent 2, application number CN 2020105632919.7, entitled "A comprehensive urban drainage system combined dispatching real-time simulation control method and a system used by the same", discloses a combined dispatching real-time control method in the field of urban drainage system real-time control. The method constructs an online mechanistic model of the drainage system, simplifies that model, constructs an offline simulation generalized model, couples the three models together, and then uses an optimization algorithm to compute the optimal control values of the controlled locations in real time.
Prior patent 3, application number CN202110335721.8, entitled "Drainage system control method based on robust reinforcement learning", discloses such a control method. The patent constructs a reinforcement learning environment to realize interaction between the control method and the drainage system, and introduces the conditional value-at-risk function CVaR to improve the robustness of the algorithmic framework against the large stochastic disturbances characteristic of drainage systems.
In prior patents 1 and 2, the real-time control strategy of the drainage system is generated by optimization, and this process runs online. The drainage system model must be called repeatedly during optimization, each simulation requires solving a large number of equations, and the complexity of the model must therefore be limited to ensure that a strategy can be generated in real time. In these patents there is thus a contradiction between the simulation accuracy of the drainage system model and the strategy generation time: the closer the model is to reality, the longer the simulation takes, and the harder it becomes to complete strategy generation in real time. The linear simplifications of prior patents 1 and 2 cannot fully approximate the complex nonlinear physical relationships in a real drainage system.
Prior patent 3 trains an agent through offline interaction between the agent and a drainage system model. The trained agent directly outputs a control strategy during a rainfall event according to the system state and input signals, without additional optimization, and can thus meet the strategy-generation-time requirement of real-time control. However, complex drainage systems typically contain multiple controllable facilities with linkage relationships among them, and the control action space grows exponentially as the number of controllable facilities increases. In prior patent 3, all controllable facilities are regulated by a single agent; the resulting huge action space reduces the learning efficiency of the agent and greatly increases the difficulty of training convergence, which hinders the popularization and application of the method.
Disclosure of Invention
In order to solve the existing technical problems, embodiments of the present invention provide a method and an apparatus for controlling an urban drainage system based on multi-agent reinforcement learning, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present invention provides a method for controlling an urban drainage system based on multi-agent reinforcement learning, including:
carrying out generalization processing on the urban drainage system, and creating an urban drainage system virtual model of a virtual digital object;
constructing a neural network-based urban drainage system agent model, taking the water quantity and water quality change process and the flow and liquid level of key nodes of the urban drainage system as control targets;
constructing a single intelligent agent consisting of a current network and a target network, identifying linkage relation among the intelligent agents by adopting a value decomposition method, and constructing multiple intelligent agents;
and after the multi-agent is trained, evaluating and verifying the control strategy of the urban drainage system.
In a second aspect, an embodiment of the present invention provides a city drainage system control device based on multi-agent reinforcement learning, including:
the virtual model creating module is used for carrying out generalized processing on the urban drainage system and creating a virtual model of the urban drainage system of a virtual digital object;
the agent model creation module is used for constructing an urban drainage system agent model based on a neural network according to the water quantity and water quality change process, the flow of key nodes and the liquid level of the urban drainage system as control targets;
the intelligent agent constructing module is used for constructing a single intelligent agent consisting of a current network and a target network, identifying the linkage relation among the intelligent agents by adopting a value decomposition method and constructing a plurality of intelligent agents;
and the evaluation and verification module is used for evaluating and verifying the control strategy of the urban drainage system after the multi-agent is trained.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a bus, a transceiver, a memory, a processor, and a computer program stored in the memory and executable on the processor, where the transceiver, the memory, and the processor are connected via the bus, and when the computer program is executed by the processor, the steps in the city drainage system control method based on multi-agent reinforcement learning as described above are implemented.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps in the multi-agent reinforcement learning-based urban drainage system control method as described above.
The method, device, electronic equipment and computer-readable storage medium provided by the embodiments of the invention solve the problem that real-time control of an actual drainage system can rarely achieve both control efficiency and control effect, and improve both simultaneously. By interacting offline with the agent model of the urban drainage system and fusing expert experience from the expert experience pool, the embodiments of the invention train multiple agents that can regulate a plurality of facilities according to the system state, and adopt memory-sharing and value-decomposition algorithms to realize the joint regulation and control of multiple controllable facilities.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present invention, the drawings required to be used in the embodiments or the background art of the present invention will be described below.
FIG. 1 is a flow chart of a city drainage system control method based on multi-agent reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a flow chart showing the construction of a municipal drainage system in step S101 according to an embodiment of the invention;
FIG. 3 is a schematic diagram illustrating the structure of the agent model in step S103 according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a training process of the agent in step S107 according to an embodiment of the present invention;
FIG. 5 is a schematic diagram showing the structure of a city drainage system model in the Fuxing district according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a city drainage system control device based on multi-agent reinforcement learning according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device for controlling a municipal drainage system based on multi-agent reinforcement learning according to an embodiment of the present invention.
Detailed Description
To overcome the defects of the prior art, the embodiments of the invention provide an urban drainage system control method based on multi-agent reinforcement learning, which integrates learning-based and experience-based approaches in the real-time control of urban drainage systems and solves the problem of low efficiency of online strategy generation caused by the high complexity of urban drainage system models.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as methods, apparatus, electronic devices, and computer-readable storage media. Thus, embodiments of the invention may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), a combination of hardware and software. Furthermore, in some embodiments, embodiments of the invention may also be implemented in the form of a computer program product in one or more computer-readable storage media having computer program code embodied in the storage medium.
The computer-readable storage media described above may take any combination of one or more computer-readable storage media. A computer-readable storage medium includes: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any combination thereof. In the embodiments of the invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer program code embodied on the computer readable storage medium may be transmitted using any appropriate medium, including: wireless, wire, fiber optic cable, radio Frequency (RF), or any suitable combination thereof.
Computer program code for carrying out the operations of the embodiments of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as C or similar languages. The computer program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer.
Embodiments of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices and computer-readable storage media according to embodiments of the invention.
It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions. These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner. Thus, the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The embodiments of the present invention will be described below with reference to the drawings.
Fig. 1 shows a flowchart of a city drainage system control method based on multi-agent reinforcement learning according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step S101: carrying out generalization processing on the urban drainage system, and creating an urban drainage system virtual model of a virtual digital object;
as shown in fig. 2, the virtual model (hereinafter referred to as a model) of the urban drainage system is to create a virtual digital object by using a hydrodynamics equation or a data driving method after highly generalizing an actual urban drainage system (hereinafter referred to as a system) for predicting a state (flow, liquid level, etc.) change process of a real system. The model of the embodiment of the invention adopts the SWMM software of a rainstorm flood management model developed by the United states environmental protection agency. When the model is constructed, basic information such as structure and attribute information of the urban drainage system, system input information, control facility information, control target information, online monitoring data and the like needs to be collected, objects such as nodes, pipelines, control facilities and the like are determined on the basis, the model is constructed, and parameters are calibrated by using the monitored liquid level and flow data, so that the model can reflect the operation condition of the system, as shown in fig. 2.
Step S103: constructing a neural network-based urban drainage system agent model according to the water quantity and water quality change process, the flow of key nodes and the liquid level of the urban drainage system as control targets;
for a complex system, the model constructed in step S101 has a problem of long calculation time, and to solve this problem, a neural network-based proxy model for an urban drainage system is constructed in the embodiment of the present invention, as shown in fig. 3. The urban drainage system agent model needs to quickly predict the water quantity and water quality change process of the system and can predict the flow, liquid level and other calculation control targets of key nodes (such as a pump station, a regulation and storage tank and a water inlet of a sewage plant). As the urban drainage system has obvious external influence factors and presents stronger seasonal and periodic rules, the embodiment of the invention adopts a Long Short-Term Memory neural network (LSTM) model to construct an urban drainage system agent model. As shown in FIG. 3, the input of the agent model is rainfall event characteristics, sewage generation characteristics, controllable facility (such as a water pump) state, and the output is the predicted value of the system state (such as the water collecting well liquid level) of the next control time step.
Step S105: in order to improve the learning effect of the agents, an expert experience pool is constructed to store the regulation and control experience accumulated during the operation of the system;
Each experience is stored as a quadruple

(s_t, a_t, r_t, s_{t+1}),

i.e. the system state, the action, the reward, and the system state at the next time step. Actions here are adjustments to the state of the controllable facilities (e.g. changing the number of water pumps started at a pump station, or adjusting the opening of a gate). The reward is the reinforcement learning term used to evaluate the benefit of an action; the goal of an agent is to maximize the sum of the rewards obtained over the whole control process, and in system control the reward is usually a weighted function of environmental, economic and safety objectives.
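The expert experience pool of quadruples described above might be organized as follows; a minimal Python sketch, with the capacity and the example field values chosen purely for illustration.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-capacity pool of (state, action, reward, next_state) quadruples;
    expert regulation trajectories can be pre-loaded through the same add()."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop off first

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniform random minibatch, as used later for experience replay."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

pool = ExperiencePool(capacity=100)
# e.g. state = wet-well levels, action = number of pumps on, reward = -cost
pool.add(state=[1.2, 0.8], action=2, reward=-0.5, next_state=[1.0, 0.9])
pool.add(state=[1.0, 0.9], action=1, reward=-0.2, next_state=[0.9, 0.9])
batch = pool.sample(2)
```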
Step S107: constructing a single intelligent agent consisting of a current network and a target network, identifying the linkage relation among the intelligent agents by adopting a value decomposition method, and constructing a plurality of intelligent agents;
the construction of the multi-agent comprises two steps of construction of a single agent and identification of linkage relation among the agents.
Step S1071: constructing a single intelligent agent: the single intelligent agent consists of 2 sets of neural networks such as a current network and a target network; current network
Figure 684318DEST_PATH_IMAGE002
For selecting an action and updating the model parameters theta in real time, with inputs of the system state s and the action a taken, and outputs of the model parameters theta being approximate estimates of the state-action cost function; target network
Figure 932897DEST_PATH_IMAGE003
The method is used for calculating the value Q of the value function, the network delays and updates the model parameters theta', namely, the model parameters are copied from the current network to the target network at intervals of fixed time step length, the calculation stability of the value function value is ensured, and the output and the input of the value function value are consistent with those of the current network.
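The two-network arrangement with delayed parameter copying can be illustrated as follows. The toy linear "networks" and the sync interval are stand-in assumptions; only the periodic hard copy from current to target parameters is the point being shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Q-networks as plain weight matrices (state dim 3, 4 actions).
current_params = {"W": rng.normal(size=(4, 3)), "b": np.zeros(4)}
target_params = {k: v.copy() for k, v in current_params.items()}

def q_values(params, state):
    """Approximate state-action values for all actions in one state."""
    return params["W"] @ state + params["b"]

def sync_target(current, target):
    """Hard update: copy current-network parameters into the target network,
    done every fixed number of steps to stabilise the value estimates."""
    for k in current:
        target[k] = current[k].copy()

SYNC_EVERY = 10
for step in range(1, 31):
    # ... a gradient update of current_params would happen here ...
    current_params["b"] += 0.01      # stand-in for a training update
    if step % SYNC_EVERY == 0:       # delayed update of theta'
        sync_target(current_params, target_params)
```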
Step S1073: identifying the linkage relations among agents: a value decomposition method is adopted to identify the linkage relations among the agents. Besides the control target of its own pump station, the reward function of each agent must also take into account the control targets of the upstream, peer and downstream pump stations. The control target of each pump station comprises the overflow risk, the operating energy consumption, and the start-stop frequency of the pump station. The reward function of agent i is calculated as follows:

R_i = f_i + w_up · f_up + w_peer · f_peer + w_down · f_down    (1)

f_i = α_i · F_of,i + β_i · F_en,i + δ_i · F_ss,i    (2)

In formula (1), f_i is the control objective function of pump station i in the drainage pipe network; f_up, f_peer and f_down are the control objective functions of the upstream, peer and downstream pump stations, each of which can be calculated by formula (2); and w_up, w_peer and w_down are the weight coefficients of the influence of the upstream, peer and downstream pump stations on the control behaviour of pump station i, which can be determined according to the degree of mutual influence between the different pump stations during regulation. In formula (2), α_i, β_i and δ_i are the weight coefficients of the corresponding control targets (overflow risk, operating energy consumption and start-stop frequency); they can be set differently for different system-state stages to represent different control emphases.
The overflow risk, operating energy consumption and pump station start-stop frequency indices appearing in formula (2) are calculated as follows:

F_of,i(t) = μ_i · max( h_i(t) − h_i^thr , 0 )    (3)

where F_of,i(t) is the overflow risk index of pump station i in the drainage pipe network at time t; μ_i is the overflow risk coefficient of the zone of pump station i, representing the sensitivity of different control units to overflow risk; h_i(t) is the real-time liquid level of the wet well of pump station i at time t; and h_i^thr is the overflow risk threshold level of pump station i, which must be determined from the simulation results of the urban drainage system agent model or from the experience of the urban drainage system managers.

F_en,i(t) = e_i · n_i(t)    (4)

where F_en,i(t) is the operating energy consumption index of pump station i at time t; e_i is the operating energy consumption coefficient of pump station i, representing the unit operating energy consumption of the water pumps in the pump stations of different control units; and n_i(t) is the number of water pumps running at pump station i at time t.

F_ss,i(t) = k_i · | n_i(t) − n_i(t−1) |    (5)

where F_ss,i(t) is the start-stop frequency index of pump station i at time t; k_i is the safety coefficient of pump station i, representing the importance of the water pumps in the pump stations of different control units; and n_i(t) is as defined above.
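A sketch of how the per-station indices and the decomposed reward described above might be computed, assuming simple functional forms consistent with the surrounding definitions (a linear penalty on the level excess above the overflow threshold, energy proportional to the number of running pumps, and a penalty on pump-count changes between steps); all coefficient values are illustrative assumptions.

```python
def overflow_risk(level, threshold, mu=1.0):
    """Eq.-(3)-style index: penalise wet-well level above the overflow
    threshold; mu is the zone's overflow-risk sensitivity coefficient."""
    return mu * max(level - threshold, 0.0)

def energy_cost(pumps_on, e=1.0):
    """Eq.-(4)-style index: energy proportional to pumps running."""
    return e * pumps_on

def switching_cost(pumps_on, pumps_on_prev, k=1.0):
    """Eq.-(5)-style index: penalise start/stop changes between steps."""
    return k * abs(pumps_on - pumps_on_prev)

def station_objective(level, threshold, pumps_on, pumps_on_prev,
                      alpha=1.0, beta=0.1, delta=0.05):
    """Weighted control-objective function f_i for one pump station."""
    return (alpha * overflow_risk(level, threshold)
            + beta * energy_cost(pumps_on)
            + delta * switching_cost(pumps_on, pumps_on_prev))

def agent_reward(own, neighbours, weights):
    """Reward for agent i: its own objective plus weighted objectives of
    upstream / peer / downstream stations (value-decomposition coupling).
    Negated so that a smaller objective gives a larger reward."""
    return -(own + sum(w * f for w, f in zip(weights, neighbours)))

f_own = station_objective(level=2.4, threshold=2.0,
                          pumps_on=2, pumps_on_prev=1)
r = agent_reward(f_own, neighbours=[0.3, 0.1, 0.2],
                 weights=[0.5, 0.2, 0.3])
```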
Step S109: the agents are trained with the DDQN algorithm;
as shown in FIG. 4, the agents are trained using the Double Deep Q-Network (DDQN) algorithm, whose update rule is as follows:

Q(s, a; θ) ← Q(s, a; θ) + α [ r + γ · Q( s′, argmax_{a′} Q(s′, a′; θ); θ′ ) − Q(s, a; θ) ]    (6)

where α is the learning rate and γ is the discount factor used to discount future rewards to their current value.
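The DDQN target computation (the current network selects the next action, the target network evaluates it, which reduces the overestimation bias of plain DQN) can be sketched as follows; the toy linear Q-functions and the sample batch are illustrative assumptions.

```python
import numpy as np

def ddqn_targets(batch, q_current, q_target, gamma=0.99):
    """Double-DQN targets for a minibatch of (s, a, r, s') quadruples:
    the CURRENT network selects argmax a' in s', the TARGET network
    evaluates that action."""
    targets = []
    for s, a, r, s_next in batch:
        a_star = int(np.argmax(q_current(s_next)))            # selection
        targets.append(r + gamma * q_target(s_next)[a_star])  # evaluation
    return np.array(targets)

# Toy linear Q-functions over a 2-d state with 3 actions.
W_cur = np.array([[0.5, 0.1], [0.2, 0.4], [0.3, 0.3]])
W_tgt = np.array([[0.4, 0.2], [0.2, 0.4], [0.3, 0.2]])
q_cur = lambda s: W_cur @ np.asarray(s)
q_tgt = lambda s: W_tgt @ np.asarray(s)

batch = [([1.0, 0.0], 0, -0.5, [0.5, 0.5]),
         ([0.5, 0.5], 1, -0.2, [0.0, 1.0])]
y = ddqn_targets(batch, q_cur, q_tgt, gamma=0.9)
```

The targets y would then be regressed against the current network's Q(s, a; θ) for the stored actions, which is the squared-error form of update (6).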
In addition, the DDQN algorithm adopts an experience replay mechanism: the quadruple (s, a, r, s′) of the system state, the action, the reward and the system state at the next time step is stored in the experience pool, and once the number of stored samples is large enough, a batch of samples is drawn at random for each update of the neural network parameters, so that all historical data are used uniformly and sufficiently.
In order to reflect the mutual influence between upstream and downstream parts of the system, the embodiment of the invention adopts a multi-agent reinforcement learning algorithm based on experience sharing. A globally shared experience pool is constructed with a shared-memory algorithm for the training of all agents. The global shared experience pool differs from a local experience pool in that the system state s in the quadruple $(s_t, a_t, r_t, s_{t+1})$ contains the states of all control units and controllers, and the global pool is used for the parameter updates of all agents. Value decomposition is realized mainly by constructing the incidence relation among the reward functions of the multiple agents.
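A minimal sketch of such a globally shared pool follows; the class and method names are hypothetical. Each agent pushes quadruples whose state spans all control units, and every agent samples from the same pool for its updates.

```python
import random

class GlobalSharedExperiencePool:
    """Sketch of a globally shared experience pool: agents push quadruples
    whose state covers ALL control units, and all agents sample from the
    same pool when updating their network parameters."""
    def __init__(self, capacity=10000, seed=0):
        self.capacity = capacity
        self.pool = []
        self.rng = random.Random(seed)

    def push(self, global_state, action, reward, next_global_state):
        if len(self.pool) >= self.capacity:
            self.pool.pop(0)  # drop the oldest quadruple when full
        self.pool.append((global_state, action, reward, next_global_state))

    def sample(self, batch_size):
        # draw at most batch_size quadruples uniformly at random
        return self.rng.sample(self.pool, min(batch_size, len(self.pool)))
```

Because all agents read from one pool, each agent's update sees transitions generated by the others, which is how the upstream-downstream coupling enters training.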
Step S111: evaluating and verifying the control strategy of the urban drainage system.

For the evaluation and verification of the control strategy of the urban drainage system, a specific index system is adopted for quantitative comparative analysis, so that the real-time control effect of the urban drainage system is evaluated and verified objectively and accurately. The control targets of real-time control of the urban drainage system mainly comprise the evaluation of environmental performance, economic performance and safety performance.

For the evaluation index of environmental performance, the overflow risk is mainly considered, and the overflow risk of the area is evaluated through the liquid level of the water-collecting well of the water outlet pump station in each control unit of the urban drainage system. Theoretically, the higher the overall liquid level of a control unit, the greater the potential for overflow in its area.
According to formula (3), the overflow risk at a single moment can be calculated, and on this basis, indexes such as the overflow Risk threshold standard-reaching rate (QR) and the Average overflow Risk (AR) of a rainfall event can be calculated as follows:

$$QR_i = \frac{1}{T}\sum_{t=1}^{T}\mathbb{1}\left[Risk_i(t) \le Risk_{thr}\right] \qquad (7)$$

$$AR_i = \frac{1}{T}\sum_{t=1}^{T}Risk_i(t) \qquad (8)$$

where $QR_i$ and $AR_i$ are respectively the overflow risk threshold standard-reaching rate of pump station i and the average overflow risk of the rainfall event, $T$ is the total time of the rainfall event, and the meanings of the other variables can be found in formula (3).
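Assuming QR counts the fraction of time steps whose risk stays at or below the threshold (an assumption about formula (7)), the two indexes can be sketched from a risk time series as follows; the function name is hypothetical.

```python
def overflow_indices(risks, risk_threshold):
    """Sketch of formulas (7)-(8): QR is the fraction of time steps whose
    overflow risk is at or below the threshold, AR is the mean risk over
    the rainfall event of T = len(risks) time steps."""
    T = len(risks)
    qr = sum(1 for r in risks if r <= risk_threshold) / T
    ar = sum(risks) / T
    return qr, ar
```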
For the evaluation index of economic performance, the embodiment of the invention mainly evaluates the energy consumption of the area according to the running state of the water outlet pump station in each control unit of the urban drainage system. In an urban drainage system regulated mainly by water pumps, the energy consumption of each control unit is, in theory, concentrated in the power consumption of the water pumps. That power consumption is highly related to the running state of the pumps: the more pumps a station has running, the higher its energy consumption. Therefore, the energy consumption is mainly evaluated through the running state of the water outlet pump station of each control unit.
According to formula (4), the pump station energy consumption at a single moment can be calculated, and on this basis, indexes such as the total pump-Opening times (SO) and the Average Energy consumption (AE) of a rainfall event can be calculated as follows:

$$SO_i = \sum_{t=1}^{T}\max\left(N_i(t)-N_i(t-1),\,0\right) \qquad (9)$$

$$AE_i = \frac{1}{T}\sum_{t=1}^{T}E_i(t) \qquad (10)$$

where $SO_i$ and $AE_i$ are respectively the total pump-opening times of pump station i and the average energy consumption of the rainfall event, $T$ is the total time of the rainfall event, $E_i(t)$ is the energy consumption of pump station i at time t, and the meanings of the other variables can be found in formula (4).
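Under the assumption that a pump-opening event is a positive increment in the number of open pumps (and that the station starts with all pumps off), formulas (9) and (10) can be sketched as follows; the function name is hypothetical.

```python
def energy_indices(open_pumps, energy):
    """Sketch of formulas (9)-(10): SO counts pump-start events as the
    positive increments of the open-pump count; AE averages the per-step
    energy consumption over the rainfall event."""
    so, previous = 0, 0
    for n in open_pumps:
        so += max(n - previous, 0)
        previous = n
    ae = sum(energy) / len(energy)
    return so, ae
```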
For the evaluation index of safety performance, the embodiment of the invention evaluates the safety of the area according to the start-stop frequency of the water pumps of the water outlet pump station in each control unit of the urban drainage system. In an urban drainage system regulated mainly by water pumps, safety depends mainly on the safety of the pump station equipment, which is highly related to the start-stop frequency of the pumps: in theory, the more frequently a pump is started and stopped, the shorter its service life and the higher its failure rate.
The evaluation of the water pump start-stop frequency is mainly realized by evaluating the start-stop frequency of the water outlet pump station of each control unit. The pump station water pump start-stop frequency at a single moment can be calculated according to formula (5), and on this basis, indexes such as the Variance of the number of Open pumps (VO) and the Average start-stop Frequency (AF) of a rainfall event can be calculated as follows:

$$VO_i = \frac{1}{T}\sum_{t=1}^{T}\left(N_i(t)-\bar{N}_i\right)^2 \qquad (11)$$

$$AF_i = \frac{1}{T}\sum_{t=1}^{T}F_i(t) \qquad (12)$$

where $VO_i$ and $AF_i$ are respectively the variance of the number of open pumps of pump station i and the average start-stop frequency of the rainfall event, $T$ is the total time of the rainfall event, $\bar{N}_i$ is the average number of open pumps of pump station i during the rainfall event, and the meanings of the other variables can be found in formula (5).
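A sketch of formulas (11) and (12), assuming the per-step start-stop index of formula (5) is proportional to the change in the number of open pumps (an assumption); the function name is hypothetical.

```python
def frequency_indices(open_pumps, beta=1.0):
    """Sketch of formulas (11)-(12): VO is the variance of the open-pump
    count over the event; AF averages a per-step start-stop index taken
    here as beta * |N(t) - N(t-1)|."""
    T = len(open_pumps)
    mean_n = sum(open_pumps) / T          # average number of open pumps
    vo = sum((n - mean_n) ** 2 for n in open_pumps) / T
    switches = [beta * abs(b - a) for a, b in zip(open_pumps, open_pumps[1:])]
    af = sum(switches) / T
    return vo, af
```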
The evaluation includes on-site monitoring evaluation and mathematical experiment evaluation.
As shown in fig. 5, the control object of the embodiment is a parcel in the central urban district of Suzhou City. The total area of the case area is about 46.1 km², and the total length of the sewage pipelines is about 600 km. Sewage and extraneous water are discharged into the sewage pipelines from the sewage-producing parcels and converging areas, enter the corresponding pump station districts, and after being lifted by multistage pump stations flow into the Fuxing sewage treatment plant. The whole area can be divided into 18 pump-station-district control units.
Firstly, a mechanism model of the control object area is constructed with the SWMM software. A computation-time test showed that simulating a 1-hour rainfall event with the drainage system mechanism model takes about 0.5 hour even on a high-performance server, so the simulation is too slow for real-time use. Therefore, an LSTM neural network is selected to construct the urban drainage system agent model of the control object area, which reduces the model simulation time and improves the efficiency of constructing the control object area.

When the agent model is constructed, a neural network model is built for each of the 18 control units. Each model takes the system state of the whole system as input and predicts the liquid level in the water-collecting well of the outlet pump station of its control unit one time step ahead. The 18 networks share the same structure, and the optimal structure and parameters are selected through multiple trials according to the simulation test results. Each network has 5 layers in total: 2 LSTM layers and 3 fully connected layers. The loss function is the mean square error (MSE), the batch size is 64, and the maximum number of training epochs is 500. The model is judged to have converged when the loss on the validation set does not decrease for 20 consecutive epochs.

Due to the characteristics of the LSTM neural network, the input data contains historical data of several time steps, and the output is the data of 1 time step. To ensure the generalization capability of the networks, an L1 regularization coefficient of 0.01 is applied to the final output layer. A training set, a validation set and a test set are constructed from historical monitoring data (including pump station flow and rainfall data): 20736 groups of data for training, 2592 for validation and 2592 for testing. On the test set, the Nash-Sutcliffe efficiency coefficient of the neural network models of all control units reaches 0.59-0.99, and the average relative error of the liquid level prediction is 1.13-3.71%, which meets the precision requirement of model simulation.
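The Nash-Sutcliffe efficiency coefficient used above to assess the surrogate models can be computed as follows (a generic formula, not the patent's code); the function name is hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 means a perfect fit, 0 means the model
    predicts no better than the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst
```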
The experience summarized in field operation is put into the expert experience pool, and the multi-agent is constructed according to the method above. The system state of each agent comprises the water-collecting well liquid levels of all control unit pump stations at the current moment, the sewage production of all control units, the action value of the control unit pump station at the previous time step, and the rainfall prediction for the next 6 time steps; the output is the action value of the control unit at the next time step. In the objective function, the three objective weights take the values 0.8, 0.1 and 0.1 in the rainy season and 0.1, 0.7 and 0.2 in the dry season, and the three remaining coefficients take the values 0.2, 0.1 and 0.3 respectively.
After the training of the agents is completed, they are applied to field monitoring and verification. The verification condition is a 6-hour rainfall event from 6:00 to 12:00 on May 23, 2021, with 13.2 mm of rainfall. Compared with the currently adopted control method, the urban drainage system control method based on multi-agent reinforcement learning provided by the embodiment of the invention greatly reduces the average overflow risk, with a reduction of 94.9%, and raises the threshold liquid level standard-reaching rate from 86.50% to 94.83%. The average energy consumption increases by about 15.8%, and the total pump-opening times of all pump stations increase by 410, mainly because the reinforcement learning control method starts pumps in advance to reserve storage space before rainfall occurs or rainfall intensity increases. The average start-stop frequency of the rainfall event increases by about 18.3%, and the variance of the number of open pumps decreases from 0.35 to 0.32. This shows that the comprehensive performance of the urban drainage system control method based on multi-agent reinforcement learning is superior to that of the currently adopted control method, and demonstrates the effectiveness of the embodiment of the invention.
The urban drainage system control method based on multi-agent reinforcement learning of the embodiment of the invention can greatly reduce the average overflow risk, with a reduction of 94.9%; the threshold liquid level standard-reaching rate is greatly improved from 86.50% to 94.83%.
The urban drainage system control method based on multi-agent reinforcement learning solves the problem that real-time control of an actual drainage system is difficult to take control efficiency and control effect into consideration, and achieves simultaneous improvement of control efficiency and control effect of the actual drainage system.
According to the urban drainage system control method based on multi-agent reinforcement learning, through offline interaction with the urban drainage system agent model and fusion of expert experiences in the expert experience pool, the multi-agent capable of regulating and controlling multiple facilities according to the system state is trained, and joint regulation and joint control of multiple controllable facilities are achieved by adopting a memory sharing and value decomposition algorithm.
The method for controlling the urban drainage system based on multi-agent reinforcement learning according to the embodiment of the invention is described in detail with reference to fig. 1 to 5, and the device for controlling the urban drainage system based on multi-agent reinforcement learning according to the embodiment of the invention is described in detail with reference to fig. 6.
Fig. 6 shows a schematic structural diagram of an urban drainage system control device based on multi-agent reinforcement learning according to an embodiment of the present invention. As shown in fig. 6, the urban drainage system control device based on multi-agent reinforcement learning comprises:
the virtual model creating module 10 is used for carrying out generalization treatment on the urban drainage system and creating a virtual model of the urban drainage system of a virtual digital object;
the agent model creation module 20 is used for constructing a neural-network-based agent model of the urban drainage system, taking the water quantity and water quality change process, the flow of key nodes and the liquid level of the urban drainage system as control targets;
an agent construction module 30, configured to construct a single agent composed of a current network and a target network, identify a linkage relationship between the agents by using a value decomposition method, and construct a plurality of agents;
and the evaluation and verification module 40 is used for evaluating and verifying the control strategy of the urban drainage system after the multi-agent is trained.
In the embodiment of the present invention, optionally, the method further includes:
and the experience pool building module 50 is used for building an expert experience pool for storing the control experience accumulated in the control process of the system, wherein the experience format of the expert experience pool comprises quadruples of the current system state, the action, the reward and the system state at the next moment.
In the embodiment of the present invention, optionally, as shown in fig. 6, the agent building module 30 specifically includes:
the function determination submodule 31 is used for determining the reward function of the intelligent agent according to the control targets of the upstream pump station, the peer pump station and the downstream pump station;
the control targets comprise overflow risks, operation energy consumption and the frequency of starting and stopping the pump station.
In the embodiment of the present invention, optionally, the control targets of the urban drainage system include evaluation of environmental performance, evaluation of economic performance, and evaluation of safety performance:
evaluating the environmental performance, and evaluating the overflow risk of the region through the liquid level of a water outlet pump station water collecting well in each control unit in the urban drainage system;
evaluating economic performance, namely evaluating the energy consumption of the region according to the running state of a water outlet pump station in each control unit in the urban drainage system;
and evaluating the safety performance of the region according to the start-stop frequency of the water pumps of the water outlet pump stations in each control unit in the urban drainage system.
The urban drainage system control device based on multi-agent reinforcement learning of the embodiment of the invention can greatly reduce the average overflow risk, with a reduction of 94.9%; the threshold liquid level standard-reaching rate is greatly improved from 86.50% to 94.83%.
The urban drainage system control device based on multi-agent reinforcement learning solves the problem that real-time control of an actual drainage system is difficult to take control efficiency and control effect into consideration, and achieves simultaneous improvement of control efficiency and control effect of the actual drainage system.
The urban drainage system control device based on multi-agent reinforcement learning provided by the embodiment of the invention is used for training multi-agents capable of regulating and controlling multiple facilities according to the system state through offline interaction with the urban drainage system agent model and fusion of expert experiences in the expert experience pool, and the joint regulation and joint control of multiple controllable facilities is realized by adopting a memory sharing and value decomposition algorithm.
In addition, an embodiment of the present invention further provides an electronic device, which includes a bus, a transceiver, a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the transceiver, the memory, and the processor are connected via the bus, and when the computer program is executed by the processor, the processes of the embodiment of the urban drainage system control method based on multi-agent reinforcement learning are implemented, and the same technical effects can be achieved, and are not described herein again to avoid repetition.
Specifically, referring to fig. 7, an embodiment of the present invention further provides an electronic device, which includes a bus 71, a processor 72, a transceiver 73, a bus interface 74, a memory 75, and a user interface 76.
In an embodiment of the present invention, the electronic device further includes: a computer program stored on the memory 75 and executable on the processor 72, the computer program when executed by the processor 72 performing the steps of:
carrying out generalization processing on the urban drainage system, and creating an urban drainage system virtual model of a virtual digital object;
constructing a neural-network-based agent model of the urban drainage system, taking the water quantity and water quality change process, the flow of key nodes and the liquid level of the urban drainage system as control targets;
constructing a single intelligent agent consisting of a current network and a target network, identifying linkage relation among the intelligent agents by adopting a value decomposition method, and constructing multiple intelligent agents;
after the multi-agent is trained, the control strategy of the urban drainage system is evaluated and verified.
Optionally, the computer program when executed by the processor 72 may further implement the steps of:
and constructing an expert experience pool for storing the control experience accumulated in the control process of the system, wherein the experience format of the expert experience pool comprises four-tuple of the current system state, the action, the reward and the system state at the next moment.
Optionally, the computer program when executed by the processor 72 may further implement the steps of:
determining a reward function of the intelligent agent according to control targets of an upstream pump station, a peer pump station and a downstream pump station;
the control targets comprise overflow risks, operation energy consumption and the frequency of starting and stopping the pump station.
Optionally, the computer program when executed by the processor 72 may further implement the steps of:
the control targets of the urban drainage system comprise the evaluation of environmental performance, the evaluation of economic performance and the evaluation of safety performance:
the evaluation of the environmental performance evaluates the overflow risk of the area through the liquid level of a water collecting well of a water outlet pump station in each control unit in the urban drainage system;
evaluating economic performance and evaluating the energy consumption of the region according to the running state of a water outlet pump station in each control unit in the urban drainage system;
and evaluating the safety performance of the region according to the frequency of starting and stopping of the water pumps of the water outlet pump stations in each control unit in the urban drainage system.
A transceiver 73 for receiving and transmitting data under the control of the processor 72.
In FIG. 7, a bus architecture is represented by bus 71. Bus 71 may include any number of interconnected buses and bridges, and connects together various circuits, including one or more processors, represented by processor 72, and memory, represented by memory 75.

Bus 71 represents one or more of any of several types of bus structures, including a memory bus and memory controller, a peripheral bus, an Accelerated Graphics Port (AGP), a processor bus, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include: an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
The processor 72 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits in hardware or instructions in software in a processor. The processor described above includes: general purpose processors, central Processing Units (CPUs), network Processors (NPs), digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), complex Programmable Logic Devices (CPLDs), programmable Logic Arrays (PLAs), micro Control Units (MCUs) or other Programmable Logic devices, discrete gates, transistor Logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in embodiments of the present invention may be implemented or performed. For example, the processor may be a single core processor or a multi-core processor, which may be integrated on a single chip or located on multiple different chips.
The processor 72 may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly performed by a hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor. The software modules may be located in a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), a register, and other readable storage media known in the art. The readable storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
The bus 71 may also connect various other circuits such as peripherals, voltage regulators, or power management circuits together, and a bus interface 74 provides an interface between the bus 71 and the transceiver 73, which are well known in the art. Therefore, the embodiments of the present invention will not be further described.
The transceiver 73 may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other devices over a transmission medium. For example: the transceiver 73 receives external data from other devices, and the transceiver 73 is used to transmit data processed by the processor 72 to other devices. Depending on the nature of the computer system, a user interface 76 may also be provided, such as: touch screen, physical keyboard, display, mouse, speaker, microphone, trackball, joystick, stylus.
It should be appreciated that in embodiments of the present invention, the memory 75 may further include memory remotely located from the processor 72, which may be connected to a server over a network. One or more portions of the aforementioned networks may be an ad hoc network, an intranet, an extranet, a Virtual Private Network (VPN), a Local Area Network (LAN), a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), a Wireless Wide Area Network (WWAN), a Metropolitan Area Network (MAN), the Internet, a Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless fidelity (Wi-Fi) network, or a combination of two or more of the aforementioned networks. For example, the cellular telephone network and the wireless network may be a Global System for Mobile Communications (GSM) system, a Code Division Multiple Access (CDMA) system, a Worldwide Interoperability for Microwave Access (WiMAX) system, a General Packet Radio Service (GPRS) system, a Wideband Code Division Multiple Access (WCDMA) system, a Long Term Evolution (LTE) system, an LTE Frequency Division Duplex (FDD) system, an LTE Time Division Duplex (TDD) system, a Long Term Evolution-Advanced (LTE-A) system, a Universal Mobile Telecommunications System (UMTS), an enhanced Mobile Broadband (eMBB) system, a massive Machine Type Communication (mMTC) system, an Ultra-Reliable Low-Latency Communication (URLLC) system, or the like.
It will be appreciated that memory 75 in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. Wherein the nonvolatile memory includes: read-Only Memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically Erasable PROM (EEPROM), or Flash Memory.
The volatile memory includes: random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as: static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), double Data Rate Synchronous Dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced Synchronous DRAM (ESDRAM), synchronous Link DRAM (SLDRAM), and Direct memory bus RAM (DRRAM). The memory 75 of the electronic device described in the embodiments of the present invention includes, but is not limited to, the above and any other suitable types of memory.
In an embodiment of the present invention, memory 75 stores the following elements of operating system 751 and application programs 752: an executable module, a data structure, or a subset thereof, or an expanded set thereof.
Specifically, the operating system 751 comprises various system programs, such as: a framework layer, a core library layer, a driver layer, etc. for implementing various basic services and processing hardware-based tasks. Applications 752 include various applications such as: media Player (Media Player), browser (Browser), for implementing various application services. A program implementing the method of an embodiment of the present invention may be included in the application 752. The application programs 752 include: applets, objects, components, logic, data structures, and other computer system executable instructions that perform particular tasks or implement particular abstract data types.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements each process of the above-mentioned urban drainage system control method based on multi-agent reinforcement learning, and can achieve the same technical effect, and is not described herein again to avoid repetition.
In particular, the computer program may, when executed by a processor, implement the steps of:
carrying out generalization processing on the urban drainage system, and creating an urban drainage system virtual model of a virtual digital object;
constructing a neural network-based urban drainage system agent model according to the water quantity and water quality change process, the flow of key nodes and the liquid level of the urban drainage system as control targets;
constructing a single intelligent agent consisting of a current network and a target network, identifying linkage relation among the intelligent agents by adopting a value decomposition method, and constructing multiple intelligent agents;
after the multi-agent is trained, the control strategy of the urban drainage system is evaluated and verified.
Optionally, the computer program when executed by the processor may further implement the steps of:
and constructing an expert experience pool for storing the control experience accumulated in the control process of the system, wherein the experience format of the expert experience pool comprises quadruples of the current system state, the action, the reward and the system state at the next moment.
Optionally, the computer program when executed by the processor may further implement the steps of:
determining a reward function of the intelligent agent according to control targets of an upstream pump station, a peer pump station and a downstream pump station;
the control targets comprise overflow risks, running energy consumption and the frequency of starting and stopping the pump station.
Optionally, the computer program when executed by the processor may further implement the steps of:
the control targets of the urban drainage system comprise the evaluation of environmental performance, the evaluation of economic performance and the evaluation of safety performance:
the evaluation of the environmental performance evaluates the overflow risk of the area through the liquid level of a water collecting well of a water outlet pump station in each control unit in the urban drainage system;
evaluating economic performance and evaluating the energy consumption of the region according to the running state of a water outlet pump station in each control unit in the urban drainage system;
and evaluating the safety performance of the region according to the frequency of starting and stopping of the water pumps of the water outlet pump stations in each control unit in the urban drainage system.
The computer-readable storage medium includes: permanent and non-permanent, removable and non-removable media may be tangible devices that retain and store instructions for use by an instruction execution apparatus. The computer-readable storage medium includes: electronic memory devices, magnetic memory devices, optical memory devices, electromagnetic memory devices, semiconductor memory devices, and any suitable combination of the foregoing. The computer-readable storage medium includes: phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), non-volatile random access memory (NVRAM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic tape cartridge storage, magnetic tape disk storage or other magnetic storage devices, memory sticks, mechanically encoded devices (e.g., punched cards or raised structures in a groove having instructions recorded thereon), or any other non-transmission medium useful for storing information that may be accessed by a computing device. As defined in embodiments of the present invention, the computer-readable storage medium does not include transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses traveling through a fiber optic cable), or electrical signals transmitted through a wire.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of both; to illustrate this interchangeability of hardware and software, the foregoing description has described the units and steps of the examples generally in terms of their functions. When implemented in software, the embodiments may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer program instructions. The computer program instructions include assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, integrated circuit configuration data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as C or similar programming languages.
When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced in whole or in part; the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, twisted pair, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk or magnetic tape), an optical medium (e.g., an optical disc), or a semiconductor medium (e.g., a solid-state drive (SSD)), among others. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing embodiments of the method of the present invention, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (including a personal computer, a server, a data center, or other network device) to execute all or part of the steps of the methods of the embodiments of the present invention. The storage medium includes the various media capable of storing program code listed above.
The above description is only a specific implementation of the embodiments of the present invention, but the scope of the embodiments is not limited thereto; any change or substitution that a person skilled in the art could readily conceive of within the technical scope of the embodiments of the present invention shall be covered by their scope. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. An urban drainage system control method based on multi-agent reinforcement learning, characterized by comprising:
performing generalization processing on the urban drainage system, and creating a virtual model of the urban drainage system as a virtual digital object;
constructing a neural-network-based agent model of the urban drainage system, taking the water quantity and water quality variation process, the flows at key nodes, and the liquid levels of the urban drainage system as control targets;
constructing single agents each consisting of a current network and a target network, identifying the linkage relationships among the agents by a value decomposition method, and constructing the multiple agents;
and after the multiple agents are trained, evaluating and verifying the control strategy of the urban drainage system.
2. The method of claim 1, further comprising:
constructing an expert experience pool for storing control experience accumulated during control of the system, wherein the experience format of the expert experience pool comprises a quadruple of the current system state, the action, the reward, and the system state at the next moment.
3. The method of claim 1 or 2, wherein identifying the linkage relationships among the agents by the value decomposition method comprises:
determining the reward function of each agent according to the control targets of the upstream pump station, the peer pump stations, and the downstream pump station;
wherein the control targets comprise the overflow risk, the operating energy consumption, and the start-stop frequency of the pump station.
4. The method according to claim 1 or 2, wherein the control targets of the urban drainage system comprise an evaluation of environmental performance, an evaluation of economic performance, and an evaluation of safety performance:
the evaluation of environmental performance assesses the overflow risk of the region through the wet-well liquid level of the outlet pump station in each control unit of the urban drainage system;
the evaluation of economic performance assesses the energy consumption of the region according to the running state of the outlet pump station in each control unit of the urban drainage system;
and the evaluation of safety performance assesses the region according to the start-stop frequency of the water pumps of the outlet pump stations in each control unit of the urban drainage system.
5. An urban drainage system control device based on multi-agent reinforcement learning, characterized by comprising:
a virtual model creation module for performing generalization processing on the urban drainage system and creating a virtual model of the urban drainage system as a virtual digital object;
an agent model creation module for constructing a neural-network-based agent model of the urban drainage system, taking the water quantity and water quality variation process, the flows at key nodes, and the liquid levels of the urban drainage system as control targets;
an agent construction module for constructing single agents each consisting of a current network and a target network, identifying the linkage relationships among the agents by a value decomposition method, and constructing the multiple agents;
and an evaluation and verification module for evaluating and verifying the control strategy of the urban drainage system after the multiple agents are trained.
6. The apparatus of claim 5, further comprising:
the experience pool building module is used for building an expert experience pool for storing the control experience accumulated in the control process of the system, and the experience format of the expert experience pool comprises quadruples of the current system state, the action, the reward and the system state at the next moment.
7. The apparatus of claim 5 or 6, wherein the agent construction module comprises:
a function determination submodule for determining the reward function of each agent according to the control targets of the upstream pump station, the peer pump stations, and the downstream pump station;
wherein the control targets comprise the overflow risk, the operating energy consumption, and the start-stop frequency of the pump station.
8. The apparatus according to claim 5 or 6, wherein the control targets of the urban drainage system comprise an evaluation of environmental performance, an evaluation of economic performance, and an evaluation of safety performance:
the evaluation of environmental performance assesses the overflow risk of the region through the wet-well liquid level of the outlet pump station in each control unit of the urban drainage system;
the evaluation of economic performance assesses the energy consumption of the region according to the running state of the outlet pump station in each control unit of the urban drainage system;
and the evaluation of safety performance assesses the region according to the start-stop frequency of the water pumps of the outlet pump stations in each control unit of the urban drainage system.
9. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, implements the steps in the multi-agent reinforcement learning-based urban drainage system control method according to any one of claims 1 to 4.
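Outside the claims proper, the scheme of claims 1 to 3 — per-agent current and target networks, an experience pool of (state, action, reward, next state) quadruples, and an additive value decomposition across pump-station agents — can be sketched in tabular form. This is a minimal toy illustration in the style of VDN-type value decomposition; the state/action encoding, the reward, and all constants are our assumptions, not the patent's neural-network model:

```python
import random
from collections import deque

import numpy as np

# Toy sketch: each pump-station agent keeps a "current" and a "target"
# Q-table, experience is stored as (state, actions, reward, next_state)
# quadruples, and the joint value is the sum of per-agent values
# (additive value decomposition). All numbers are illustrative.

N_AGENTS, N_STATES, N_ACTIONS = 2, 4, 2   # 2 pumps, 4 level bins, on/off
GAMMA, LR = 0.9, 0.5

current_q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]
target_q = [q.copy() for q in current_q]
replay = deque(maxlen=1000)               # expert experience pool of quadruples

def joint_q(tables, state, actions):
    """Value decomposition: the team value is the sum of per-agent values."""
    return sum(tables[i][state, actions[i]] for i in range(N_AGENTS))

def train_step(batch_size=8):
    batch = random.sample(list(replay), min(batch_size, len(replay)))
    for state, actions, reward, next_state in batch:
        # Greedy joint action under the target networks; for an additive
        # decomposition, independent per-agent argmax is exact.
        next_best = [int(np.argmax(target_q[i][next_state])) for i in range(N_AGENTS)]
        td_target = reward + GAMMA * joint_q(target_q, next_state, next_best)
        td_error = td_target - joint_q(current_q, state, actions)
        for i in range(N_AGENTS):         # the shared TD error updates every agent
            current_q[i][state, actions[i]] += LR * td_error

# Store one synthetic transition, train, then sync the target networks.
replay.append((0, [1, 0], 1.0, 1))
train_step()
for i in range(N_AGENTS):
    target_q[i] = current_q[i].copy()     # periodic target-network update
```

With all tables starting at zero, the single transition produces a TD error of 1.0, so each agent's Q-value for its taken action at state 0 moves to 0.5; in the patent's setting the tables would be replaced by the claimed current/target neural networks and the reward by the pump-station control targets.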
CN202211106987.6A 2022-09-13 2022-09-13 Urban drainage system control method and device based on multi-agent reinforcement learning Active CN115185190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211106987.6A CN115185190B (en) 2022-09-13 2022-09-13 Urban drainage system control method and device based on multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211106987.6A CN115185190B (en) 2022-09-13 2022-09-13 Urban drainage system control method and device based on multi-agent reinforcement learning

Publications (2)

Publication Number Publication Date
CN115185190A true CN115185190A (en) 2022-10-14
CN115185190B CN115185190B (en) 2023-06-20

Family

ID=83524556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211106987.6A Active CN115185190B (en) 2022-09-13 2022-09-13 Urban drainage system control method and device based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN115185190B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618769A (en) * 2022-12-06 2023-01-17 成都市市政工程设计研究院有限公司 Drainage system evaluation method and system based on hydraulic model
CN116700039A (en) * 2023-08-04 2023-09-05 山东科能电气设备有限公司 Control method, system, equipment and medium for urban emergency drainage system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215350A (en) * 2020-09-17 2021-01-12 天津(滨海)人工智能军民融合创新中心 Smart agent control method and device based on reinforcement learning
WO2021082864A1 (en) * 2019-10-30 2021-05-06 武汉理工大学 Deep reinforcement learning-based intelligent collision-avoidance method for swarm of unmanned surface vehicles
CN113050430A (en) * 2021-03-29 2021-06-29 浙江大学 Drainage system control method based on robust reinforcement learning
CN114449482A (en) * 2022-03-11 2022-05-06 南京理工大学 Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning
CN114942596A (en) * 2022-07-26 2022-08-26 山脉科技股份有限公司 Intelligent control system for urban flood control and drainage


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHONG Dingsheng et al., "Application of Artificial Neural Networks in the Comprehensive Evaluation of Industrial Pollution Sources", China Water & Wastewater *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618769A (en) * 2022-12-06 2023-01-17 成都市市政工程设计研究院有限公司 Drainage system evaluation method and system based on hydraulic model
CN115618769B (en) * 2022-12-06 2023-10-24 成都市市政工程设计研究院有限公司 Drainage system evaluation method and system based on hydraulic model
CN116700039A (en) * 2023-08-04 2023-09-05 山东科能电气设备有限公司 Control method, system, equipment and medium for urban emergency drainage system

Also Published As

Publication number Publication date
CN115185190B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN115185190B (en) Urban drainage system control method and device based on multi-agent reinforcement learning
Bae et al. Monthly dam inflow forecasts using weather forecasting information and neuro-fuzzy technique
EP3822880A1 (en) Load prediction method and apparatus based on neural network
CN112986827B (en) Fuel cell residual life prediction method based on deep learning
JP4807565B2 (en) Flow prediction device
CN103164742B (en) A kind of server performance Forecasting Methodology based on particle group optimizing neural network
CN106097043B (en) The processing method and server of a kind of credit data
CN102469103B (en) Trojan event prediction method based on BP (Back Propagation) neural network
CN112068420A (en) Real-time control method and device for drainage system
CN110889085A (en) Intelligent wastewater monitoring method and system based on complex network multiple online regression
CN110059867B (en) Wind speed prediction method combining SWLSTM and GPR
Osman et al. Adaptive Fast Orthogonal Search (FOS) algorithm for forecasting streamflow
CN110083065B (en) Self-adaptive soft measurement method based on flow type variational Bayesian supervised factor analysis
CN110929958A (en) Short-term traffic flow prediction method based on deep learning parameter optimization
Gao et al. Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems
CN112596386A (en) Matlab-based urban drainage system simulation control mixed model with mechanism model, concept model and data model
CN114792071A (en) Optimal scheduling method for drainage pump station based on machine learning technology
CN113869795A (en) Long-term scheduling method for industrial byproduct gas system
CN107563096A (en) A kind of waterlogging modeling and analysis methods based on FCM
CN111237181A (en) On-line identification and optimal regulation and control method and regulation and control system for operating characteristics of water pump system
CN117412365A (en) Cell energy saving method, system, equipment and medium
KR20200057836A (en) Apparatus and Method for Controlling Multiple windows in a greenhouse
Adnan et al. Modeling of flood water level prediction using improved RBFNN structure
CN116709409A (en) Knowledge distillation-based lightweight spectrum prediction method
Li et al. Data driven hybrid fuzzy model for short-term traffic flow prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20221014

Assignee: Qingsu Smart Water Technology (Suzhou) Co.,Ltd.

Assignor: TSINGHUA University

Contract record no.: X2024980000783

Denomination of invention: Control method and device for urban drainage system based on multi-agent reinforcement learning

Granted publication date: 20230620

License type: Common License

Record date: 20240117