CN116819974B

CN116819974B - Intelligent drainage method and system for tail end of drainage pipe network based on deep reinforcement learning

Info

Publication number: CN116819974B
Application number: CN202311102920.XA
Authority: CN
Inventors: 袁冬海; 李雷; 王旻昊; 王辉; 申宇洋; 王家卓; 寇莹莹
Original assignee: Beijing University of Civil Engineering and Architecture
Current assignee: Beijing University of Civil Engineering and Architecture
Priority date: 2023-08-30
Filing date: 2023-08-30
Publication date: 2023-11-03
Anticipated expiration: 2043-08-30
Also published as: CN116819974A

Abstract

The invention relates to the technical field of overflow monitoring for urban drainage pipe networks, and in particular to an intelligent drainage method and system for the tail end of a drainage pipe network based on deep reinforcement learning. The method comprises the following steps: S1, a water quality acquisition terminal is installed in advance at a discharge port at the tail end of a drainage pipe network, and real-time water quality data are acquired; S2, the collected real-time water quality data are analyzed with a pre-trained DQN model, and a gate at the discharge port is controlled to open or close according to the analysis result; and S3, the real-time water quality data and the gate state are displayed visually. The invention automatically adjusts the opening and closing of the sewage interception gate in real time according to the sewage state, which reduces manual labor and gives managers the real-time state of the discharge port.

Description

Intelligent drainage method and system for tail end of drainage pipe network based on deep reinforcement learning

Technical Field

The invention relates to the technical field of urban pipe network drainage overflow monitoring, in particular to an intelligent drainage method and system for the tail end of a drainage pipe network based on deep reinforcement learning.

Background

At present, drainage treatment at the tail end of urban drainage pipe networks relies on interception. However, the interception ratio of a traditional intercepting combined sewer network is only 1, that is, the network is designed for 2 times the dry-weather flow rate. Heavily polluted combined sewage during rainfall therefore cannot be intercepted, and the design flow of the intercepting sewer is far below the peak flow during rainfall. Traditional interception can only control the total annual runoff pollution load and struggles to control the pollutants in each individual runoff event; pollutant overflow is especially severe when the rainfall intensity is high but the rainfall amount is small. Interception also requires manual operation: untimely operation easily causes pollutant overflow, the work is labor-intensive, and working at the river during the rainy season is dangerous.

Disclosure of Invention

In view of the above, the invention provides a deep reinforcement learning-based intelligent drainage method and system for the tail end of a drainage pipe network, which can automatically adjust the opening and closing of a sewage interception gate in real time according to the sewage state, can reduce the manpower consumption and provide a real-time drainage state for management staff.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

In a first aspect, the present invention provides a deep reinforcement learning-based intelligent drainage method for the tail end of a drainage pipe network, including the following steps:

S1, a water quality acquisition terminal is installed in advance at a discharge port at the tail end of a drainage pipe network, and real-time water quality data are acquired;

S2, the collected real-time water quality data are analyzed with a pre-trained DQN model, and a gate at the discharge port is controlled to open or close according to the analysis result;

and S3, the real-time water quality data and the gate state are displayed visually.

Further, the water quality data includes: ammonia nitrogen content, total phosphorus content, COD data and TDS data.

Further, in S2, the training process for the DQN model includes:

S201: initializing the weights θ of the Q neural network of the DQN model, the iteration count threshold, the experience pool and the iteration count;

S202: reading the current water quality data, including ammonia nitrogen, total phosphorus, COD and TDS data, and establishing the initial state value $s_t$ of the tail-end discharge port of the current drainage pipe network;

S203: judging the current state value $s_t$ as follows: if any index is greater than or equal to 90% of the discharge standard value for that index, continuing to step S204; otherwise, repeating steps S202-S203;

S204: calculating a comprehensive pollution index; if the comprehensive pollution index is higher than or equal to the discharge standard, taking the Q value under that index as the optimal threshold for controlling the gate opening, setting the gate action $a_t$ to open, and continuing to step S205; if the comprehensive pollution index is lower than the discharge standard, setting the gate action $a_t$ to closed and continuing with steps S202-S204;

S205: collecting the water quality data of the tail-end discharge port of the drainage pipe network at the next time step to obtain the state $s_{t+1}$ and the feedback value $r_t$, and storing $(s_t, a_t, r_t, s_{t+1})$ in the experience pool;

S206: judging whether the experience pool is full; if not, repeating S202-S205; if it is full, training the Q neural network, repeatedly executing S202-S206, and updating the weights θ of the Q neural network until a preset training target is met.
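The experience pool used in S205-S206 can be sketched as a fixed-capacity buffer of $(s_t, a_t, r_t, s_{t+1})$ tuples. This is a minimal illustration, not the patented implementation: the class name `ExperiencePool`, the integer action encoding (0 = close, 1 = open) and the random minibatch sampling are assumptions borrowed from common DQN practice; the text itself only states that training starts once the pool is full.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# State layout is an assumption: (ammonia nitrogen, total phosphorus, COD, TDS).
State = Tuple[float, float, float, float]
# One stored experience: (s_t, a_t, r_t, s_{t+1}); a_t is 0 (close) or 1 (open).
Transition = Tuple[State, int, float, State]


class ExperiencePool:
    """Hypothetical fixed-capacity experience pool for steps S205-S206."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, s_t: State, a_t: int, r_t: float, s_next: State) -> None:
        """S205: store one transition."""
        self._buffer.append((s_t, a_t, r_t, s_next))

    def is_full(self) -> bool:
        """S206: training only starts once this returns True."""
        return len(self._buffer) == self.capacity

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a random minibatch (an assumption, not stated in the text)."""
        return random.sample(list(self._buffer), batch_size)
```

A collection loop would call `store` after every time step and switch from pure sampling to training as soon as `is_full()` returns true.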

Further, in S206, the weights θ are updated by applying gradient descent to the loss function, according to the Q value calculated by the DQN algorithm;

the loss function is: $L_i(\theta_i) = \mathbb{E}_{s,a\sim\rho(s,a)}\left[\left(y_i - Q(s,a;\theta_i)\right)^2\right]$;

wherein $Q(s,a;\theta_i)$ represents the estimated value of $Q(s,a)$; $\mathbb{E}$ represents the expectation; $y_i$ represents $Q(s,a)$; $i$ is the iteration number; $\theta_i$ represents the weights of the Q neural network at the $i$-th iteration; $s$ represents the current water quality state; $a$ represents the current instruction for controlling the gate opening.
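As a small worked illustration of the loss above, the expectation can be approximated by averaging the squared difference between targets $y_i$ and network estimates $Q(s,a;\theta_i)$ over a batch. The function name `dqn_loss` and the batch-mean approximation are assumptions made for this sketch.

```python
from typing import Sequence


def dqn_loss(targets: Sequence[float], estimates: Sequence[float]) -> float:
    """Batch mean of (y_i - Q(s, a; theta_i))^2, approximating the expectation."""
    if len(targets) != len(estimates) or not targets:
        raise ValueError("targets and estimates must be non-empty and equally long")
    return sum((y - q) ** 2 for y, q in zip(targets, estimates)) / len(targets)
```

For targets `[1.0, 2.0]` and estimates `[0.0, 4.0]` the squared errors are 1 and 4, so the loss is 2.5. Gradient descent on θ would then reduce this quantity.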

Further, in S206, $Q(s,a)$ obtained from the Bellman equation is denoted as $y_i$, and $y_i$ is calculated as: $y_i = \mathbb{E}_{s'}\left[r + \gamma \max_{a'} Q_i(s',a';\theta_{i-1}) \mid s,a\right]$;

wherein $\mathbb{E}_{s'}$ denotes the expectation over the water quality state; $r$ represents the feedback value obtained after performing action $a$; $\max_{a'}$ denotes taking the maximum Q value over all actions $a'$; $Q_i(s',a';\theta_{i-1})$ represents the Q value of each action $a'$ in the next state $s'$ after action $a$ is executed; $\mid s,a$ denotes conditioning on $(s,a)$; $\gamma$ represents the attenuation coefficient; $s'$ represents the water quality state after the gate is opened; $a'$ represents the action threshold for the next gate opening control.
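For a single sampled transition, the target $y_i$ reduces to the feedback value plus the discounted best Q value in the next state. The sketch below assumes the Q values of the next state have already been produced by the network with weights $\theta_{i-1}$; the function name `td_target` is hypothetical.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float], gamma: float) -> float:
    """y = r + gamma * max_a' Q(s', a'; theta_{i-1}) for one sampled transition."""
    if not next_q_values:
        raise ValueError("next_q_values must contain one Q value per action")
    return reward + gamma * max(next_q_values)
```

With `reward=1.0`, next-state Q values `[0.5, 2.0]` (close, open) and `gamma=0.5`, the target is 1.0 + 0.5 × 2.0 = 2.0.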

Further, in S204, an optimal threshold selection function is defined as $Q(s,a)$, and $Q(s,a)$ is calculated as: $Q(s,a) = \max_{\pi} \mathbb{E}\left[r_t \mid s_t = s,\ a_t = a,\ \pi\right]$;

wherein $\mathbb{E}$ represents the expectation; $s$ represents the water quality state; $a$ is the instruction for controlling the gate opening in that state; $\pi$ represents the mapping between states and actions; $s_t$ is the water quality state at time step $t$; $a_t$ is the gate opening instruction transmitted in state $s_t$; $r_t$ is the feedback value obtained by transmitting gate opening instruction $a$ in water quality state $s_t$.

Further, in S204, the gate opening instruction is selected according to the ε-greedy rule: the action with the largest Q value is selected with probability 1-ε, and a random action is selected with probability ε in order to explore unknown regions of the state space.
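The ε-greedy rule can be sketched in a few lines. The action indices (0 = close, 1 = open) and the optional `rng` parameter, included only to make the choice reproducible, are assumptions of this example.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Return an action index: random with probability epsilon, else the argmax of Q."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the largest Q value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the function always returns the greedy action; larger values trade exploitation for exploration.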

Further, in S205, when the water quality state is $s$, a gate opening instruction $a$ is transmitted to obtain a feedback value $r_t$, and the feedback value $r_t$ is calculated as:

[Formula image not reproduced in the source text.]

wherein $t$ represents the current time step; $t'$ represents the time step at which the gate is opened; $\mu^{t'-t}$ represents the discount factor from the time step at which the gate is opened to the current time step; $r_{t'}$ represents the reward value at the time step at which the gate is opened; $r_t$ is the feedback value at time step $t$; $\mu$ represents the discount factor.
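Only the symbol definitions for the feedback value are available here. One reading consistent with those symbols is a discounted sum of per-step rewards, $r_t = \sum_{t'} \mu^{t'-t} r_{t'}$ taken over the time steps from $t$ onward. That summation form and its limits are an assumption drawn from standard reinforcement learning practice, not a formula confirmed by this text; the sketch below only shows how such a quantity would be computed.

```python
from typing import Sequence


def discounted_feedback(step_rewards: Sequence[float], mu: float) -> float:
    """Assumed form: sum over k of mu**k * r_{t+k}, where k = t' - t."""
    return sum((mu ** k) * r for k, r in enumerate(step_rewards))
```

For per-step rewards `[1.0, 1.0, 1.0]` and `mu=0.5`, this gives 1 + 0.5 + 0.25 = 1.75.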

In a second aspect, the present invention provides a deep reinforcement learning-based intelligent drainage system for the tail end of a drainage pipe network, including: a water quality acquisition terminal, an intelligent drainage device, a remote control terminal and a visual platform;

the water quality acquisition terminal is used for acquiring real-time water quality data;

the remote control terminal is used for analyzing the collected real-time water quality data based on a pre-trained DQN model and controlling a gate of the intelligent drainage device to execute opening or closing actions according to analysis results;

the visual platform is used for visually displaying real-time water quality data and gate states.

Compared with the prior art, the invention has the following beneficial effects:

According to the invention, an intelligent discharge port is arranged at the tail end of the urban drainage pipe network, and a water quality acquisition terminal acquires the water quality state at the discharge port in real time. The real-time water quality state is analyzed with the pre-trained DQN (Deep Q-Network) deep reinforcement learning model to judge whether the current water quality meets the discharge standard, the gate is remotely switched to the closed or open state accordingly, and the real-time water quality data and the gate state are displayed visually. The whole process requires no on-site human involvement, and the opening and closing of the sewage interception gate are adjusted automatically in real time according to the sewage state, which reduces both manual labor and danger.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of an intelligent drainage method at the tail end of a drainage pipe network based on deep reinforcement learning;

FIG. 2 is a schematic structural diagram of an intelligent drainage system at the tail end of a drainage pipe network based on deep reinforcement learning.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As shown in fig. 1, the embodiment of the invention discloses an intelligent drainage method for the tail end of a drainage pipe network based on deep reinforcement learning, which comprises the following steps:

S1, a water quality acquisition terminal is installed in advance at a discharge port at the tail end of a drainage pipe network, and real-time water quality data are acquired; the water quality data include: ammonia nitrogen content, total phosphorus content, COD data and TDS data;

S2, the collected real-time water quality data are analyzed with a pre-trained DQN model, and a gate at the discharge port is controlled to open or close according to the analysis result. Each discharge port has a corresponding representative index: if the representative index is greater than or equal to 90% of the discharge standard for that index, the gate is opened and the water flows through the sewage pipeline to the sewage treatment plant; if the representative index is less than 90% of the discharge standard, the water is judged to meet the standard, the gate is closed, and the water flows into the river channel.
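The 90% rule in S2 amounts to a single threshold comparison. The sketch below assumes the representative index and its discharge standard are given in the same unit; the function name `gate_action` and the string return values are hypothetical.

```python
def gate_action(representative_value: float, discharge_standard: float,
                ratio: float = 0.9) -> str:
    """Return "open" (divert to the sewage pipeline) or "close" (discharge to the river)."""
    if discharge_standard <= 0:
        raise ValueError("discharge_standard must be positive")
    # Open the gate once the index reaches 90% of its discharge standard.
    if representative_value >= ratio * discharge_standard:
        return "open"
    return "close"
```

For example, with a hypothetical COD standard of 50 mg/L, a reading of 46 mg/L exceeds the 45 mg/L trigger and opens the gate, while a reading of 40 mg/L keeps it closed.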

In a specific embodiment, in S2, the training process for the DQN model includes:

S201: initializing the weights θ of the Q neural network of the DQN model, the iteration count threshold, the experience pool and the iteration count.

S202: reading the current water quality data, including ammonia nitrogen, total phosphorus, COD and TDS data, and establishing the initial state value $s_t$ of the tail-end discharge port of the current drainage pipe network.

S203: judging the current state value $s_t$ as follows: if any index is greater than or equal to 90% of the discharge standard value for that index, continuing to step S204; otherwise, repeating steps S202-S203.

S204: calculating a comprehensive pollution index; if the comprehensive pollution index is higher than or equal to the discharge standard, taking the Q value under that index as the optimal threshold for controlling the gate opening, setting the gate action $a_t$ to open, and continuing to step S205; if the comprehensive pollution index is lower than the discharge standard, setting the gate action $a_t$ to closed and continuing with steps S202-S204.

Because the sewage quality differs from one discharge port to another, so do the typical pollutants. From the four water quality indexes, the index with the highest linear correlation with the comprehensive pollution index is found iteratively according to the DQN algorithm and used as the representative index of the corresponding discharge port; the representative index may differ between discharge ports.
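To make "highest linear correlation with the comprehensive pollution index" concrete, the sketch below ranks the four indexes by the absolute Pearson correlation between each index series and the comprehensive index series. Several things here are assumptions, because the text does not spell them out: the comprehensive index is taken as the mean of concentration-to-standard ratios, the standard values in `STANDARDS` are hypothetical, and a plain Pearson coefficient is swapped in for the DQN-based iteration the text describes.

```python
from math import sqrt
from typing import Dict, List, Sequence

# Hypothetical discharge standards (mg/L); real limits depend on the applicable standard.
STANDARDS: Dict[str, float] = {"NH3N": 5.0, "TP": 0.5, "COD": 50.0, "TDS": 1000.0}


def composite_index(sample: Dict[str, float], standards: Dict[str, float]) -> float:
    """Assumed definition: mean of concentration / standard over all indexes."""
    return sum(sample[k] / standards[k] for k in standards) / len(standards)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def representative_index(samples: List[Dict[str, float]],
                         standards: Dict[str, float]) -> str:
    """Pick the index whose series correlates most strongly with the composite index."""
    composite = [composite_index(s, standards) for s in samples]
    return max(standards,
               key=lambda k: abs(pearson([s[k] for s in samples], composite)))
```

At a discharge port where only COD fluctuates between samples, this selects `"COD"` as the representative index; a port dominated by nutrient pollution would instead select ammonia nitrogen or total phosphorus.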

In this step, an optimal threshold selection function is defined as $Q(s,a)$, from which the Q value is calculated. The formula of $Q(s,a)$ is: $Q(s,a) = \max_{\pi} \mathbb{E}\left[r_t \mid s_t = s,\ a_t = a,\ \pi\right]$;

The equation gives the largest expected cumulative feedback value over all policies $\pi$ for taking action $a$ in state $s$. $\mathbb{E}$ represents the expectation; $s$ represents the water quality state; $a$ is the instruction for controlling the gate opening in that state; $\pi$ represents the mapping between states and actions; $s_t$ is the water quality state at time step $t$; $a_t$ is the gate opening instruction transmitted in state $s_t$; $r_t$ is the feedback value obtained by transmitting gate opening instruction $a$ in water quality state $s_t$.

The gate opening instruction is selected according to the ε-greedy rule: the action with the largest Q value is selected with probability 1-ε, and a random action is selected with probability ε in order to explore unknown regions of the state space.

S205: collecting the water quality data of the tail-end discharge port of the drainage pipe network at the next time step to obtain the state $s_{t+1}$ and the feedback value $r_t$, and storing $(s_t, a_t, r_t, s_{t+1})$ in the experience pool.

In this step, when the water quality state is $s$, a gate opening instruction $a$ is transmitted to obtain a feedback value $r_t$, and the feedback value $r_t$ is calculated as:

[Formula image not reproduced in the source text.]

S206: judging whether the experience pool is full; if not, repeating S202-S205 and continuing to collect samples; if it is full, training the Q neural network, repeatedly executing S202-S206, and updating the weights θ of the Q neural network until the preset training target is met.

In this step, the weights θ are updated by applying gradient descent to the loss function, according to the Q value calculated by the DQN algorithm.

The loss function is: $L_i(\theta_i) = \mathbb{E}_{s,a\sim\rho(s,a)}\left[\left(y_i - Q(s,a;\theta_i)\right)^2\right]$;

wherein $Q(s,a;\theta_i)$ represents the estimate of $Q(s,a)$ produced by the neural network; the subscript $s,a\sim\rho(s,a)$ of $\mathbb{E}_{s,a\sim\rho(s,a)}$ denotes the probability distribution over the water quality state $s$ and the gate action $a$; $y_i$ represents the Q value obtained at the $i$-th iteration; $i$ is the iteration number; $\theta_i$ represents the weights of the Q neural network at the $i$-th iteration; $s$ represents the current water quality state; $a$ represents the current instruction for controlling the gate opening.

$Q(s,a)$ derived from the Bellman equation is denoted as $y_i$;

The Bellman equation is: $V(s) = \max_{a}\left(R(s,a) + \gamma V(s')\right)$;

wherein $R$ is the reward function; $s$ is the water quality state at a specific time point; $a$ is the action taken after the current state is evaluated; $V(s')$ is the discounted value function of the subsequent state; $\gamma$ is the attenuation coefficient; $s'$ represents the subsequent water quality state; $V(s)$ represents the value function in state $s$ at a specific time point.

$y_i$ is calculated as: $y_i = \mathbb{E}_{s'}\left[r + \gamma \max_{a'} Q_i(s',a';\theta_{i-1}) \mid s,a\right]$;

wherein $\mathbb{E}_{s'}$ denotes the expectation over the water quality state; $r$ represents the feedback value obtained after performing action $a$; $\max_{a'}$ denotes taking the maximum Q value over all actions $a'$; $Q_i(s',a';\theta_{i-1})$ represents the Q value of each action $a'$ in the next state $s'$ after action $a$ is executed; $\mid s,a$ denotes conditioning on $(s,a)$; $\gamma$ represents the attenuation coefficient; $s'$ represents the water quality state after the gate is opened; $a'$ represents the action threshold for the next gate opening control.

In other embodiments, as shown in FIG. 2, the present invention further provides an intelligent drainage system at the tail end of a drainage pipe network based on deep reinforcement learning, including: a water quality acquisition terminal, an intelligent drainage device, a remote control terminal and a visual platform;

the remote control terminal is used for analyzing the collected real-time water quality data based on a pre-trained DQN model and controlling a gate of the intelligent drainage device to execute opening or closing actions according to an analysis result;

Specifically, the remote control terminal comprises a platform communication unit, a data processing unit, a data experience pool, a model server, a PLC controller and a gate starter. The water quality acquisition terminal communicates with the platform communication unit through an RTU remote terminal. Water quality data are passed to the data processing unit for operations such as deduplication and screening; the model server then trains the DQN model and, according to the model's judgment, sends a gate opening or closing instruction to the PLC controller, which controls the open or closed state of the gate.
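The deduplication and screening performed by the data processing unit might look like the following sketch. The reading format (timestamp plus a dictionary of indexes), the key names and the screening criteria (all four indexes present and non-negative) are assumptions chosen for illustration; the text does not specify them.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical reading format: (timestamp in seconds, {index name: value}).
Reading = Tuple[int, Dict[str, float]]


def clean_readings(readings: Iterable[Reading],
                   required: Sequence[str] = ("NH3N", "TP", "COD", "TDS")) -> List[Reading]:
    """Drop readings with a repeated timestamp or with missing/negative index values."""
    seen = set()
    cleaned: List[Reading] = []
    for timestamp, values in readings:
        if timestamp in seen:
            continue  # deduplication: this time step was already accepted
        if any(k not in values or values[k] < 0 for k in required):
            continue  # screening: incomplete or physically impossible reading
        seen.add(timestamp)
        cleaned.append((timestamp, values))
    return cleaned
```

Only the cleaned readings would then be turned into states and stored in the experience pool.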

In addition, the water quality detection terminal is also integrated with a liquid level sensor for realizing liquid level acquisition at the tail end discharge port of the drainage pipe network, and simultaneously, the real-time power consumption of the water quality detection terminal can be monitored.

In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts can be cross-referenced between embodiments. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An intelligent drainage method for the tail end of a drainage pipe network based on deep reinforcement learning, characterized by comprising the following steps:

S2, analyzing the collected real-time water quality data based on a pre-trained DQN model, and controlling a gate at the discharge port to execute opening or closing actions according to an analysis result; each water outlet is provided with a corresponding representative index, if the representative index is more than or equal to 90% of the index emission standard, the gate is opened, and if the representative index is less than 90% of the index emission standard, the water is judged to reach the standard, and the gate is closed;

S3, visually displaying the real-time water quality data and the gate state;

in S2, the training process for the DQN model includes:

S202: reading the current water quality data, including ammonia nitrogen, total phosphorus, COD and TDS data, and establishing the initial state value $s_t$ of the tail-end discharge port of the current drainage pipe network;

S203: judging the current state value $s_t$ as follows: if any index is greater than or equal to 90% of the discharge standard value for that index, continuing to step S204; otherwise, repeating steps S202-S203;

iterating an index with the highest linear correlation with the comprehensive pollution index from the four water quality data indexes according to the DQN algorithm to serve as a representative index of the corresponding water outlet;

2. The deep reinforcement learning-based intelligent drainage method at the tail end of a drainage pipe network according to claim 1, wherein the water quality data comprises: ammonia nitrogen content, total phosphorus content, COD data and TDS data.

3. The deep reinforcement learning-based intelligent drainage method at the end of a drainage pipe network according to claim 1, wherein in S206, the weights θ are updated by applying gradient descent to the loss function according to the Q value calculated by the DQN algorithm;

the loss function is: $L_i(\theta_i) = \mathbb{E}_{s,a\sim\rho(s,a)}\left[\left(y_i - Q(s,a;\theta_i)\right)^2\right]$;

wherein $Q(s,a;\theta_i)$ represents the estimated value of $Q(s,a)$; the subscript $s,a\sim\rho(s,a)$ of $\mathbb{E}_{s,a\sim\rho(s,a)}$ denotes the probability distribution over the water quality state $s$ and the gate action $a$; $y_i$ represents $Q(s,a)$; $i$ is the iteration number; $\theta_i$ represents the weights of the Q neural network at the $i$-th iteration; $s$ represents the current water quality state; $a$ represents the current instruction for controlling the gate opening.

4. The intelligent drainage method based on deep reinforcement learning of the drainage network end of claim 3, wherein in S206, $Q(s,a)$ obtained according to the Bellman equation is denoted as $y_i$, and $y_i$ is calculated as: $y_i = \mathbb{E}_{s'}\left[r + \gamma \max_{a'} Q_i(s',a';\theta_{i-1}) \mid s,a\right]$;

wherein $\mathbb{E}_{s'}$ denotes the expectation over the water quality state; $r$ represents the value obtained after performing action $a$; $\max_{a'}$ denotes taking the maximum Q value over all actions $a'$; $Q_i(s',a';\theta_{i-1})$ represents the Q value of each action $a'$ in the next state $s'$ after action $a$ is executed; $\mid s,a$ denotes conditioning on $(s,a)$; $\gamma$ represents the attenuation coefficient; $s'$ represents the water quality state after the gate is opened; $a'$ represents the action threshold for the next gate opening control.

5. The intelligent drainage method of a drainage network terminal based on deep reinforcement learning according to claim 1, wherein in S204, an optimal threshold selection function is defined as $Q(s,a)$, and the formula of $Q(s,a)$ is as follows: $Q(s,a) = \max_{\pi} \mathbb{E}\left[r_t \mid s_t = s,\ a_t = a,\ \pi\right]$;

wherein $\mathbb{E}$ represents the expectation; $s$ represents the water quality state; $a$ is the instruction for controlling the gate opening in that state; $\pi$ represents the mapping between states and actions; $s_t$ is the water quality state at time step $t$; $a_t$ is the gate opening instruction transmitted in state $s_t$; $r_t$ is the feedback value obtained by transmitting gate opening instruction $a$ in water quality state $s_t$.

6. The intelligent drainage method of the drainage pipe network end based on the deep reinforcement learning according to claim 1, wherein in S204, the gate opening instruction is selected according to the ε-greedy rule; under the ε-greedy rule, the action with the largest Q value is selected with probability 1-ε, and a random action is selected with probability ε to explore unknown regions of the state space.

7. The intelligent drainage method of the drainage pipe network end based on deep reinforcement learning according to claim 1, wherein in S205, when the water quality state is $s$, a gate opening instruction $a$ is transmitted to obtain a feedback value $r_t$, and the feedback value $r_t$ is calculated as:

[Formula image not reproduced in the source text.]

8. An intelligent drainage system for the tail end of a drainage pipe network based on deep reinforcement learning, characterized by comprising: a water quality acquisition terminal, an intelligent drainage device, a remote control terminal and a visual platform;

the remote control terminal is used for analyzing the collected real-time water quality data based on a pre-trained DQN model and controlling a gate of the intelligent drainage device to execute opening or closing actions according to analysis results; each water outlet is provided with a corresponding representative index, if the representative index is more than or equal to 90% of the index emission standard, the gate is opened, and if the representative index is less than 90% of the index emission standard, the water is judged to reach the standard, and the gate is closed;

the training process for the DQN model comprises:

S206: judging whether the experience pool is full; if not, repeating S202-S205; if it is full, training the Q neural network, repeatedly executing S202-S206, and updating the weights θ of the Q neural network until the preset training target is met;