CN117195443A - Rainwater pipe network optimization method based on deep deterministic policy gradient algorithm - Google Patents

Rainwater pipe network optimization method based on deep deterministic policy gradient algorithm

Info

Publication number
CN117195443A
CN117195443A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310913925.4A
Other languages
Chinese (zh)
Inventor
杨祺琪
陈以恒
陈泰生
程璐
冯驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University of Science and Technology
Original Assignee
Suzhou University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University of Science and Technology filed Critical Suzhou University of Science and Technology
Priority to CN202310913925.4A priority Critical patent/CN117195443A/en
Publication of CN117195443A publication Critical patent/CN117195443A/en
Pending legal-status Critical Current


Abstract

The invention relates to a rainwater pipe network optimization method based on a deep deterministic policy gradient (DDPG) algorithm, which comprises the following steps: step S1: defining the environment and model parameters, wherein the state comprises the water depth at each node of the pipe network, the flow velocity in each pipeline and the current rainfall, and the actions include changing pipeline diameters, changing pipeline burial depths, and the like; step S2: constructing and initializing the Actor and Critic deep neural networks; step S3: defining physical and economic constraints and handling them through the reward function; step S4: designing the training process using the urban hydrological model SWMM, and training and testing with experience replay and target network techniques; step S5: evaluating and iteratively optimizing the model until it meets a preset performance standard. The invention provides an efficient, adaptive optimization tool for the design of urban flood control and drainage facilities, and is of significant value for improving urban flood control capacity, reducing investment and operation-and-maintenance costs, and safeguarding the quality of life of urban residents.

Description

Rainwater pipe network optimization method based on deep deterministic policy gradient algorithm
Technical Field
The invention relates to the field of information technology, in particular to a rainwater pipe network optimization method based on the deep deterministic policy gradient (Deep Deterministic Policy Gradient, DDPG) algorithm.
Background
The urban rainwater pipe network system is a key facility for urban flood control and drainage, and the design optimization of the urban rainwater pipe network system relates to the problems of multiple targets and multiple constraints, such as flood control capacity, manufacturing cost, running cost and the like. Therefore, the effective optimization model is important to improving the flood control capacity of the city, reducing the investment cost and the operation and maintenance cost, and is an important means for guaranteeing the life quality of urban residents.
Traditional rainwater pipe network design optimization methods mainly rely on experience and engineering specifications, such as the weighting method and the constraint method. These approaches are somewhat effective for single-objective optimization problems, but tend to fall short when handling multi-objective, multi-constraint problems. They are also limited in how they handle the constraint conditions of the pipe network system, often require manual adjustment, and their optimization process lacks dynamism and adaptivity.
To address these problems, researchers have introduced evolutionary algorithms, such as genetic algorithms and particle swarm optimization, achieving breakthroughs in multi-objective optimization and constraint handling. However, these methods have difficulty with continuous state and action spaces, offer limited optimization for large-scale problems, and may require significant computational time and resources.
In particular, the problem of optimal design of a rainwater network is a typical multi-objective, multi-constraint continuous state and action space problem. The state space mainly consists of factors such as water depth of each node of the pipe network, flow velocity of each pipeline, current rainfall and the like, while the action space mainly relates to the adjustment of the diameter and the burial depth of a certain pipeline, and even comprises the measures of adding new drainage nodes and the like. The traditional evolution algorithms are mainly suitable for discrete states and action spaces, and for continuous states and action spaces, the algorithms often need to carry out complex encoding and decoding processes, and meanwhile, the problem of low efficiency can also occur in the searching process.
On this basis, the invention proposes a rainwater pipe network optimization model based on the deep deterministic policy gradient (DDPG) algorithm. DDPG is an advanced deep reinforcement learning algorithm that can handle continuous state and action spaces, making it highly applicable to the rainwater pipe network optimization problem. DDPG learns the optimal policy directly in the continuous state and action space, without complex encoding and decoding processes, which greatly improves optimization efficiency and precision.
The new model provides an effective tool for the optimization design of the urban rainwater pipe network. The method can effectively process the multi-objective optimization problem and the multi-constraint condition, and can better adapt and process the problems of continuous states and action spaces, thereby showing advantages in the aspect of processing the optimization problem of a large-scale urban rainwater pipe network. The proposal of the model has important theoretical significance and practical value at present of the continuous acceleration of the urban process.
Disclosure of Invention
The invention provides a rainwater pipe network optimization method based on the deep deterministic policy gradient algorithm. It combines deep learning and reinforcement learning, can effectively handle multi-objective optimization problems and multiple constraint conditions, and shows good optimization performance on continuous state and action spaces. It provides an efficient, adaptive optimization tool for the design of urban flood control and drainage facilities, and is of significant value for improving urban flood control capacity, reducing investment and operation-and-maintenance costs, and safeguarding the quality of life of urban residents.
The invention provides a rainwater pipe network optimization method based on the deep deterministic policy gradient algorithm, which comprises the following steps:
step S1: defining the environment and model parameters: the water depth at each node of the rainwater pipe network, the flow velocity in each pipeline and the current rainfall are defined as the state; changing a pipeline's diameter, changing a pipeline's burial depth or adding a new drainage node are defined as the actions; finally, a reward function is set considering flood control capacity, cost and design constraints;
step S2: constructing an Actor network and a Critic network, where the Actor network receives the current state as input and outputs an action, and the Critic network receives the current state and the action as input and outputs an action value; network weights are also initialized;
step S3: defining physical and economic constraints, including pipe diameter and burial depth limits, cost limits, and the ability to cope with storms of a certain intensity, and incorporating these constraints into the reward function, which gives a negative reward if they are violated;
step S4: simulating rainwater flow and pipe network performance with the urban hydrological model SWMM to generate training data, updating the Actor and Critic networks through training, and stabilizing training with experience replay and target network techniques; meanwhile, testing under a variety of possible environments to ensure the generalization capability of the model;
step S5: evaluating the performance and generalization capability of the model by testing it under different storm intensities; if the model performs poorly, adjusting the model parameters and returning to step S4 for iterative training until the model reaches a preset performance standard or the maximum number of iterations is reached.
Further, the step S1 includes the steps of:
step S11: collecting and organizing pipe network information, including the pipe network layout, pipe sizes and drainage node positions, and meanwhile preparing and processing rainfall data;
step S12: defining a state space based on the data collected in the step S11, wherein the state space comprises node water depth, pipeline flow rate and current rainfall intensity;
step S13: defining the action space of the model, including changing a pipeline's diameter, changing its burial depth or adding a new drainage node, where these actions must lie within an actually operable range;
step S14: constructing a reward function according to flood control capacity, cost and basic design constraint factors;
step S15: establishing a simulation environment using the urban hydrological model SWMM; this environment can output a simulated new state according to the model's action and return the corresponding reward.
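The environment interface that steps S11-S15 describe can be sketched as a small class. This is a hypothetical illustration: the class name, the toy drainage dynamics and all coefficients are placeholders standing in for a real SWMM-backed simulation, and the reward mirrors the penalty structure described later in step S34.

```python
import numpy as np

class RainNetworkEnv:
    """Minimal sketch of the S1 environment interface (hypothetical names).

    A real implementation would run SWMM inside step(); here a crude
    mass-balance stub stands in for the hydraulic simulator.
    """

    def __init__(self, n_nodes=40, n_pipes=39, budget=1e6):
        self.n_nodes, self.n_pipes, self.budget = n_nodes, n_pipes, budget
        self.reset()

    def reset(self):
        self.h = np.zeros(self.n_nodes)   # node water depths
        self.v = np.zeros(self.n_pipes)   # pipe flow velocities
        self.r = 0.0                      # current rainfall intensity
        return self._state()

    def _state(self):
        # state s = {h, v, r}, flattened into one vector
        return np.concatenate([self.h, self.v, [self.r]])

    def step(self, action, rainfall):
        # action a = {d, l, n}; only diameters d are used in this stub
        d = action[: self.n_pipes]
        self.r = rainfall
        # stub dynamics: wider pipes drain faster (placeholder for SWMM)
        self.v = rainfall / np.maximum(d, 0.1)
        self.h = np.maximum(self.h + rainfall - d.mean(), 0.0)
        overflow = float(np.clip(self.h - 2.0, 0.0, None).sum())
        cost = float((d ** 2).sum() * 100.0)
        over_budget = max(cost - self.budget, 0.0)
        reward = 1.0 - 0.01 * over_budget - 0.5 * overflow  # eq.(3)-style
        return self._state(), reward
```

The state vector here concatenates h, v and r exactly as step S12 defines them, so its length is n_nodes + n_pipes + 1.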
Further, the step S2 includes the steps of:
step S21: constructing an Actor network whose input is the current state and which determines the optimal action to execute for that state. The state space s consists of the water depth h at each node of the pipe network, the flow velocity v in each pipeline and the current rainfall r, denoted s = {h, v, r}; the action space a consists of the pipeline diameter d, the pipeline burial depth l and the newly added drainage node n, denoted a = {d, l, n}. The Actor network is regarded as a mapping function from the state space to the action space, realized by a deep neural network with N layers, where the weight and bias of the i-th layer are W_i and b_i respectively and the ReLU activation function is written f(x). With the state space s as the network input and the pipe network configuration decision a as the output, the Actor network is expressed as:
a = f_N(W_N f_{N-1}(W_{N-1} ... f_2(W_2 f_1(W_1 s + b_1) + b_2) ... + b_{N-1}) + b_N) (1)
wherein f_1, f_2, ..., f_{N-1}, f_N denote the activation functions of layers 1, 2, ..., N-1, N of the Actor network; W_1, W_2, ..., W_{N-1}, W_N denote the weights of layers 1, 2, ..., N-1, N of the Actor network; and b_1, b_2, ..., b_{N-1}, b_N denote the biases of layers 1, 2, ..., N-1, N of the Actor network;
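As a small numerical illustration, the N-layer mapping of equation (1) can be sketched in numpy. The layer sizes and random weights below are arbitrary placeholders; a practical Actor would also bound its last layer (e.g. with tanh) so actions stay in the feasible range, which the patent does not specify.

```python
import numpy as np

def actor_forward(s, weights, biases):
    """Equation (1): a = f_N(W_N f_{N-1}(... f_1(W_1 s + b_1) ...) + b_N),
    with f = ReLU on every layer for simplicity."""
    x = s
    for W, b in zip(weights, biases):
        x = np.maximum(W @ x + b, 0.0)  # f_i(W_i x + b_i)
    return x

# toy 3-layer network: state dim 5 -> hidden 8 -> 8 -> action dim 3 (d, l, n)
rng = np.random.default_rng(0)
dims = [5, 8, 8, 3]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) * 0.1 for i in range(3)]
bs = [np.zeros(dims[i + 1]) for i in range(3)]
a = actor_forward(rng.standard_normal(5), Ws, bs)
```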
step S22: constructing a Critic network, which receives the current state and the action output by the Actor network as input and outputs the corresponding action value, i.e. an assessment of the expected system performance. Specifically, the Critic network is regarded as a mapping function from the state-action space to the action value; its purpose is to evaluate the value of taking a certain action in the current state, i.e. the effectiveness of the chosen rainwater pipe network configuration. Assuming the Critic network has M layers, where the weight and bias of the j-th layer are W'_j and b'_j respectively and the ReLU activation function is written g(x), with the network input {s, a} (the current state and action) and the network output named Q (the action value), the Critic network is expressed as:
Q = g_M(W'_M g_{M-1}(W'_{M-1} ... g_2(W'_2 g_1(W'_1 {s, a} + b'_1) + b'_2) ... + b'_{M-1}) + b'_M) (2)
wherein g_1, g_2, ..., g_{M-1}, g_M denote the activation functions of layers 1, 2, ..., M-1, M of the Critic network; W'_1, W'_2, ..., W'_{M-1}, W'_M denote the weights of layers 1, 2, ..., M-1, M of the Critic network; and b'_1, b'_2, ..., b'_{M-1}, b'_M denote the biases of layers 1, 2, ..., M-1, M of the Critic network;
step S23: before training starts, initializing weights of an Actor network and a Critic network by adopting normal distribution, and setting an initial state for subsequent network optimization;
step S24: defining the network's optimizer and loss function; the model uses the Adam optimizer, and the loss function is set to the mean square error between the Critic network's predicted action value and the actual reward, so the networks are optimized according to this loss feedback.
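The loss of step S24 can be sketched directly. Note that full DDPG usually regresses Q toward a bootstrapped target r + γ·Q'(s', A'(s')); step S24 states the actual reward, so that simpler form is shown here as described.

```python
import numpy as np

def critic_mse_loss(q_pred, rewards):
    """S24 loss sketch: mean squared error between the Critic's predicted
    action values Q(s, a) and the observed rewards."""
    q_pred = np.asarray(q_pred, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    return float(np.mean((q_pred - rewards) ** 2))
```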
Further, the step S3 includes the steps of:
step S31: defining physical constraints, including restrictions on pipe diameter and burial depth, and connectivity of pipe flow;
step S32: defining economic constraints according to economic considerations in practical applications;
step S33: defining flood control capacity constraint, and ensuring that a model-optimized rainwater pipe network can cope with storm with preset intensity;
step S34: the model implements the processing of the above-mentioned various constraints by including constraints in the reward function and giving negative rewards to actions that violate the constraints, in particular, the reward function R is defined as:
R = r1 − λ1·C − λ2·V − λ3·D (3)
where r1 is a base reward, typically set to a positive value; C is the part of the actual cost that exceeds the budget (C = 0 if the budget is not exceeded); V is the part of the flood overflow volume that exceeds the tolerance value (V = 0 if it is not exceeded); D is the degree of violation of the physical constraints; and λ1, λ2, λ3 are the corresponding penalty coefficients;
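The reward of equation (3) translates almost line-for-line into code. The coefficient values below are illustrative placeholders, not values from the patent.

```python
def reward(cost, budget, overflow, tolerance, violation,
           r1=1.0, lam1=0.01, lam2=0.5, lam3=10.0):
    """Equation (3): R = r1 - λ1·C - λ2·V - λ3·D, with C and V clipped
    to zero when the budget / tolerance is not exceeded."""
    C = max(cost - budget, 0.0)          # cost over budget
    V = max(overflow - tolerance, 0.0)   # overflow beyond tolerance
    D = violation                        # degree of physical-constraint violation
    return r1 - lam1 * C - lam2 * V - lam3 * D
```

A design within budget and tolerance and with no violations thus earns exactly the base reward r1, and each violated constraint subtracts a penalty proportional to its severity.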
step S35: during training, these constraints are fused into the model, and at each step the new state is checked against them, ensuring that the optimized rainwater pipe network satisfies the physical, economic and flood control capacity constraints.
Further, the step S4 includes the steps of:
step S41: constructing an urban waterlogging model, using an urban hydrological model SWMM, simulating the flowing condition of rainwater in a pipe network, and generating corresponding new states and rewards according to given actions;
step S42: training a model, namely training an Actor network and a Critic network by using the new state and rewards generated in the step S41, wherein the aim is to maximize the flood control capacity of the pipe network and minimize the cost on the premise of meeting various constraints;
step S43: adopting experience replay and target network techniques. Experience replay randomly draws past experiences (states, actions and rewards) during training to break the correlation between data and increase training stability; the target network technique addresses the instability of Q-learning training by creating networks whose parameters update more slowly, providing a stable target Q value. Specifically, for experience replay the rainwater pipe network optimization model stores a series of past experiences e = (s, a, r, s'), where s is the current state, a the action performed, r the obtained reward, and s' the new state after performing action a; these experiences are stored in an experience replay buffer D, and at each training step a small batch of experiences is randomly drawn from D to update the network parameters. For the target network technique, the model maintains two pairs of networks with identical structure: an Actor network A with its target network A', and a Critic network C with its target network C'. The parameters of networks A and C are updated by gradient descent, while the parameters of A' and C' are updated by the soft update expressed as:
θ'_A = τ·θ_A + (1 − τ)·θ'_A (4)
θ'_C = τ·θ_C + (1 − τ)·θ'_C (5)
where θ_A and θ_C are the parameters of networks A and C, θ'_A and θ'_C are the parameters of target networks A' and C', and τ is the soft-update coefficient;
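The two mechanisms of step S43 can be sketched in a few lines. The capacity and τ values below are illustrative; parameters are plain lists of floats rather than full network weights.

```python
import random
from collections import deque

class ReplayBuffer:
    """S43 experience replay sketch: stores e = (s, a, r, s') and samples
    random minibatches to break temporal correlation between samples."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def soft_update(target_params, online_params, tau=0.005):
    """Equations (4)-(5): θ' ← τ·θ + (1 − τ)·θ', applied element-wise."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]
```

With a small τ the target parameters drift slowly toward the online parameters, which is what keeps the target Q value stable during training.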
step S44: testing the model under different storm conditions to ensure its generalization capability, while removing the noise added in step S42 during the testing phase so that the model's performance can be evaluated accurately and predictions can be made from it.
Further, in step S42, in order to process the continuous state and the action space, noise is added to encourage the model to explore during the training process; in the prediction stage, removing noise to enable the model to output optimal actions; the method specifically comprises the following steps:
step S421: setting parameters of an Ornstein-Uhlenbeck process, which is used for generating random noise in a continuous motion space, wherein the Ornstein-Uhlenbeck process is a random process, and an update formula of the Ornstein-Uhlenbeck process under discrete time is as follows:
X_{t+1} = X_t + θ(μ − X_t)·Δt + σ·sqrt(Δt)·W_t (6)
where X_t is the noise value at time t and X_{t+1} the noise value at time t+1; θ is a rate parameter that determines how fast the noise value reverts to its long-term mean μ, which is typically set to 0; σ is the standard deviation of the noise, which determines its amplitude; Δt is the time step; W_t is a random variable sampled from a standard normal distribution; and sqrt(Δt) is the square root of the time step, which scales the random term so that the noise variance grows linearly with Δt;
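The discrete-time update of equation (6) can be sketched as a small generator, one noise component per action parameter (d, l, n). The default θ, σ and Δt values are common illustrative choices, not values from the patent.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck noise per equation (6):
    X_{t+1} = X_t + θ(μ − X_t)·Δt + σ·sqrt(Δt)·W_t."""
    def __init__(self, size=3, theta=0.15, mu=0.0, sigma=0.2, dt=1.0, seed=0):
        self.theta, self.mu, self.sigma, self.dt = theta, mu, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.x = np.zeros(size)

    def sample(self):
        w = self.rng.standard_normal(self.x.shape)  # W_t ~ N(0, 1)
        self.x = (self.x + self.theta * (self.mu - self.x) * self.dt
                  + self.sigma * np.sqrt(self.dt) * w)
        return self.x
```

With σ = 0 the process decays deterministically toward μ, which makes the mean-reverting term easy to verify in isolation.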
step S422: in the model training stage, the random noise generated in step S421 is added to the Actor network's output action to encourage the model to explore more possible solutions. Specifically, with the action space a = {d, l, n}, a noise value is generated for each action parameter according to the Ornstein-Uhlenbeck process, i.e. X_{d,t+1}, X_{l,t+1} and X_{n,t+1} for d, l and n respectively; the action a' after adding noise is then expressed as:
a' = a + X_{t+1} = {d + X_{d,t+1}, l + X_{l,t+1}, n + X_{n,t+1}} (7)
in this way, the noise generated by the Ornstein-Uhlenbeck process directly affects each decision variable of the rainwater pipe network, including the pipeline diameter d, the burial depth l and the newly added drainage node n, giving the model a better way to explore the solution space; furthermore, the σ of the noise added to each decision variable should be adjusted according to that variable's characteristics and scale, so that the effect of the noise remains balanced across all decision variables;
step S423: adjusting the random noise according to the model's training behavior: if the model explores too little or predicts too conservatively, increase the noise standard deviation; conversely, if the model's behavior is too random or its predictions too aggressive, decrease it. This ensures the model performs well under various rainfall conditions.
Further, the step S5 includes the steps of:
step S51: making predictions with the model under simulated storms of various intensities to evaluate the rainwater pipe network's performance;
step S52: according to the prediction result, evaluating the performance of the model in terms of flood control capacity, economic cost, meeting basic design constraint and the like;
step S53: if the model's performance does not reach the preset standard or its generalization capability is insufficient, adjusting the model's hyperparameters or network structure and then restarting training;
step S54: and if the model performance reaches a preset standard or reaches the maximum iteration number, finishing optimization and recording the current model parameters for subsequent application.
According to the technical scheme, the rainwater pipe network optimization model based on the deep deterministic policy gradient algorithm has the following technical effects:
1. the model realizes automated optimization design of the rainwater pipe network, effectively improving the efficiency and accuracy of pipe network design. The DDPG-based algorithm can learn and predict the optimal rainwater pipe network design scheme from historical data, faster and more accurately than traditional design methods based on human experience. At the same time, the model's performance improves over time through continuous learning and iteration.
2. the model considers various constraint conditions from actual engineering, making the optimization results more practical. These include physical constraints (e.g., pipe diameter and burial depth limits, connectivity of the water flow), economic constraints (e.g., cost budget), and flood control constraints, so that the model's optimization results remain sound while meeting these basic conditions.
3. The model adopts a strategy optimization method of continuous action space, so that an optimization result is finer, and more complex practical problems can be solved. By introducing the Ornstein-Uhlenbeck noise process, the model can better explore the solution space during the training process, thereby finding a better solution. By continuously evaluating and optimizing the performance of the model, the model can exhibit good performance in a variety of different environments (e.g., stormwater conditions of different intensity). Through repeated iterative training, the generalization capability of the model can be remarkably improved.
Drawings
FIG. 1 shows a flow chart of the rainwater pipe network optimization model based on the deep deterministic policy gradient algorithm provided by an embodiment of the invention;
FIG. 2 shows a block diagram of a specific application of the rainwater pipe network optimization model based on the deep deterministic policy gradient algorithm provided by an embodiment of the invention;
FIG. 3 is a flow chart of an algorithm of the present invention defining various constraints and processing the constraints;
FIG. 4 shows a schematic diagram of a training process of a rainwater pipe network optimization model provided by an embodiment of the invention;
fig. 5 shows the optimization results of the rainwater pipe network provided by the embodiment of the invention under different storm return periods.
Detailed Description
The following describes and illustrates embodiments of the technical scheme of the present invention in detail with reference to the accompanying drawings:
according to the illustration in fig. 1, a rainwater pipe network optimization model based on the deep deterministic policy gradient algorithm specifically comprises the following steps:
step S1: define the environment and model parameters, including states, actions, and rewards. The state comprises the water depth at each node of the pipe network, the flow velocity in each pipeline, the current rainfall, etc. Actions include changing a pipeline's diameter, changing a pipeline's burial depth, adding new drainage nodes, etc. The reward function needs to comprehensively consider multiple factors such as flood control capacity, cost, and meeting basic design constraints.
Step S2: establish the model. Construct an Actor network whose input is the current state and whose output is an action. Construct a Critic network whose input is the current state and action and whose output is the action value. Initialize the network weights.
Step S3: define and process constraints. Physical and economic constraints are defined on the actions and states, including pipe diameter and burial depth limits, cost limits, the ability to cope with storms of a certain intensity, etc. Constraints are handled by including them in the reward function, giving a negative reward if a constraint is violated.
Step S4: design the training process. An urban waterlogging model is constructed with the SWMM model (Storm Water Management Model) to simulate rainwater flow and pipe network performance, continuously updating states and rewards to obtain multiple groups of data for training the Actor and Critic networks. Experience replay (Experience Replay) and target network (Target Network) techniques are adopted to stabilize the training process, and the model is tested under a variety of possible environments (such as different storm conditions) to ensure its generalization capability.
Step S5: processing continuous states and action spaces. During training, noise is added to encourage the model to explore. In the prediction stage, noise is removed, and an optimal action is output.
Step S6: the optimization model is evaluated and iterated continuously. The model was tested at various storm intensities and the performance and generalization ability of the model were evaluated. If the model does not perform well, the model parameters are adjusted and the process returns to step S4. And continuing to perform the steps S4 to S6 until the model reaches a preset performance standard or the maximum iteration number is reached.
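Steps S1-S6 can be wired together into a skeleton training loop. Everything below (the StubEnv class, the constant actor, the zero noise) is an illustrative stand-in, not the patent's implementation; the Actor/Critic updates, replay buffer and evaluation of step S6 are omitted to keep the control flow visible.

```python
import numpy as np

class StubEnv:
    """Placeholder for the SWMM-backed simulation environment (hypothetical)."""
    def reset(self):
        self.h = np.zeros(4)
        return self.h

    def step(self, a):
        # toy dynamics: rainfall adds 0.5 depth, pipe capacity a drains it
        self.h = np.maximum(self.h + 0.5 - a[:4], 0.0)
        return self.h, float(-self.h.sum())  # reward: less ponding is better

def train(env, actor, noise_fn, episodes=2, steps=5):
    """Skeleton of the S1-S6 loop: act with exploration noise, step the
    environment, accumulate reward (network updates omitted)."""
    history = []
    for _ in range(episodes):
        s = env.reset()
        total = 0.0
        for _ in range(steps):
            a = actor(s) + noise_fn()  # S5: noisy action during training
            s, r = env.step(a)         # S4: simulated environment step
            total += r
        history.append(total)
    return history

hist = train(StubEnv(), actor=lambda s: np.full(4, 0.6),
             noise_fn=lambda: np.zeros(4))
```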
In this embodiment, rainwater pipe network optimization is performed on a frequently flooded area in a district of Nanjing. The area contains 39 rainwater pipes and 40 rainwater nodes, and other data for the area include the DEM and basic geographic data on buildings, roads, etc. Using Nanjing's storm intensity formula, four design rains are used in this embodiment, each lasting two hours, with storm return periods of one, five, ten and twenty years. The implementation steps are as follows, as shown in fig. 2.
The rainwater pipe network optimization problem in this embodiment involves many complicating factors, including environment, climate, geography and economics. The invention therefore aims to optimize the pipe network design in an intelligent way, improving flood control capacity and reducing cost while meeting basic design constraints. Achieving this goal requires constructing an environment that closely mimics the real world and defining a reward function that covers these objectives. The invention therefore collects and organizes the rainwater pipe network geographic information and rainfall data of the urban catchment to be optimized, defines state and action spaces that reflect the real-world situation, and constructs a reward function that reflects multiple objectives such as flood control capability and cost as well as the design constraints. Finally, a simulation environment is built with tools such as the urban hydrological model; this environment can simulate a corresponding new state according to the model's action output and return the corresponding reward. Through accurate simulation of the real world and reasonable reward design, the model can continuously improve itself during learning, thereby achieving the goals of optimizing the rainwater pipe network, improving flood control capacity and reducing cost. Thus, step S1 comprises the following steps:
Step S11: collect and consolidate data. Collect the geographic information of the rainwater pipe network of the urban catchment to be optimized, including the pipe network layout and the pipes' lengths, diameters, burial depths and drainage node positions. At the same time, obtain historical and forecast rainfall data, including storm data of various intensities and frequencies.
Step S12: a state space is defined. Based on the collected data, a state space is defined, which may include the water depth of each node of the pipe network, the flow rate of each pipe, and the current rainfall intensity (including storms of various intensities and frequencies), etc.
Step S13: an action space is defined. Determining actions that the model can perform may include changing the diameter of a pipe, changing the burial depth of a pipe, adding new drainage nodes, etc. These actions should be within realistic operable ranges.
Step S14: construct the reward function. The reward function should comprehensively consider multiple factors such as flood control capability, cost and meeting basic design constraints. For example, a positive reward is given when no flooding overflows; a negative reward is given when the total cost of the network exceeds the budget; and a negative reward is given when the design constraints are not met (for example, when an upstream pipe diameter exceeds the downstream pipe diameter). In addition, corresponding reward functions are set for storms of different intensities and frequencies, so that the optimized pipe network can cope with rainfall conditions of different intensities.
Step S15: build the simulation environment. Using the urban hydrological model SWMM, construct a simulation environment for the rainwater pipe network according to the defined states, actions and corresponding reward function. The environment should be able to simulate the corresponding new state based on the model's action output and return the corresponding reward.
In this embodiment, the design of step S2 of the rainwater pipe network optimization model based on the deep deterministic policy gradient algorithm mainly establishes two deep neural networks, an Actor network and a Critic network, to form a policy and to evaluate that policy's value. The two networks cooperate to achieve effective optimization of the rainwater network. First, the Actor network acts as the policy generator: its basic principle is to learn and determine the optimal actions to take in a particular environmental state. In the rainwater pipe network optimization problem, the environmental state includes the current rainfall conditions, the pipe network state, the urban terrain and so on, while the actions include measures such as adjusting pipeline diameters and changing the pipe network's connection pattern. Through repeated learning and experiment, the Actor network gradually understands the relationship between environmental state and action, learning to formulate the optimal pipe network adjustment strategy under specific conditions. The Critic network then acts as the policy evaluator: its basic principle is to assign a value to each state-action pair, thereby judging the quality of the Actor network's policy. In the rainwater pipe network optimization problem, the Critic network must comprehensively weigh various influencing factors, such as flood control capacity and construction and maintenance costs, when evaluating the policy. The design of the Critic network is especially important in multi-objective optimization problems, because it must be able to accurately evaluate the trade-offs between different objectives. Finally, the weight initialization of the Actor and Critic networks gives the networks an initial learning state at the start of training.
These weights will be updated continuously during the training process to gradually approach the optimal solution. Therefore, the step S2 includes the steps of:
Step S21: and constructing an Actor network. The input is the current state, i.e., information including the water depth of each node of the pipe network, the flow rate of each pipeline, the current rainfall, and any other relevant stormwater conditions. This information is combined to form a state description of the stormwater network. The Actor network needs to learn from these conditions and determine the most appropriate actions, including changing the diameter of a pipe, changing the burial depth of a pipe, adding new drainage nodes, etc.
Specifically, the state space is defined by the water depth h of each node of the pipe network, the flow velocity v of each pipeline, the current rainfall r and other factors, and is denoted s = {h, v, r}; the action space is formed by decisions such as the pipeline diameter d, the pipeline burial depth l and the newly added drainage node n, and is denoted a = {d, l, n}. An Actor network may be viewed as a mapping function from the state space to the action space. This mapping function may be implemented with a deep neural network, defined as having N layers, where the weight and bias of the i-th layer are W_i and b_i respectively, and the activation function is the ReLU, expressed as f(x). Assuming that the network input is named s, i.e., the pipe network state, and the network output is named a, i.e., the pipe network configuration decision, then the Actor network may be represented as:
a = f_N(W_N f_{N-1}(W_{N-1} ... f_2(W_2 f_1(W_1 s + b_1) + b_2) ... + b_{N-1}) + b_N)    (1)
Wherein f_1, f_2, ..., f_{N-1}, f_N respectively represent the activation functions of layers 1, 2, ..., N-1 and N of the Actor network; W_1, W_2, ..., W_{N-1}, W_N respectively represent the weights of layers 1, 2, ..., N-1 and N of the Actor network; and b_1, b_2, ..., b_{N-1}, b_N respectively represent the biases of layers 1, 2, ..., N-1 and N of the Actor network;
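Equation (1) can be sketched directly in NumPy; the layer count and layer sizes below are illustrative assumptions. Because the ReLU f(x) is applied at every layer, including the output layer, the raw action values are non-negative and would be rescaled to the feasible ranges of d, l and n in practice.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def actor_forward(s, weights, biases):
    """Equation (1): a = f_N(W_N f_{N-1}(... f_1(W_1 s + b_1) ...) + b_N)."""
    x = s
    for W, b in zip(weights, biases):
        x = relu(W @ x + b)
    return x

rng = np.random.default_rng(0)
state_dim, hidden, action_dim = 3, 8, 3          # s = {h, v, r}, a = {d, l, n}
sizes = [state_dim, hidden, hidden, action_dim]  # N = 3 layers (assumed)
# Normal-distribution weight initialization, matching step S23.
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

s = np.array([0.4, 1.2, 25.0])                   # depth h, velocity v, rain r
a = actor_forward(s, weights, biases)            # decision vector {d, l, n}
```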
step S22: and constructing a Critic network. The Critic network is also a deep neural network, whose input is the current state together with the action output by the Actor network, and whose output is the corresponding action value. The action value represents the expected performance of the rainwater pipe network system after taking a certain action in a given state. This performance includes a comprehensive assessment of flood protection capability, cost, and whether basic design constraints are met.
Specifically, the goal of the Critic network is to evaluate the value of taking some action in the current state, i.e., to evaluate the effectiveness of a selected stormwater pipe network configuration. Given the state space s = {h, v, r} (the water depth h of each node, the flow velocity v of each pipeline, the current rainfall r and other factors) and the action space a = {d, l, n} (decisions to change the pipeline diameter d, change the pipeline burial depth l, add a new drainage node n, etc.), the Critic network can be regarded as a mapping function from the state-action space to the action value. Assuming that the neural network has M layers, where the weight and bias of the j-th layer are W′_j and b′_j respectively, the ReLU activation function is expressed as g(x), the network input is {s, a}, i.e., the current state and action, and the network output is named Q, i.e., the value of the action, the Critic network can be expressed as:
Q = g_M(W′_M g_{M-1}(W′_{M-1} ... g_2(W′_2 g_1(W′_1 {s, a} + b′_1) + b′_2) ... + b′_{M-1}) + b′_M)    (2)
wherein g_1, g_2, ..., g_{M-1}, g_M respectively represent the activation functions of layers 1, 2, ..., M-1 and M of the Critic network; W′_1, W′_2, ..., W′_{M-1}, W′_M respectively represent the weights of layers 1, 2, ..., M-1 and M of the Critic network; and b′_1, b′_2, ..., b′_{M-1}, b′_M respectively represent the biases of layers 1, 2, ..., M-1 and M of the Critic network;
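A matching sketch of equation (2) in NumPy follows; the layer sizes are illustrative assumptions. One deviation from the literal formula is flagged in the comment: the last layer is kept linear so that Q can also take negative values, a common choice in DDPG implementations.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def critic_forward(s, a, weights, biases):
    """Equation (2): Q = g_M(W'_M g_{M-1}(... g_1(W'_1 {s, a} + b'_1) ...) + b'_M).

    Hidden layers use the ReLU g(x); the final layer is left linear here
    (an assumption, common in DDPG) so that Q can take negative values."""
    x = np.concatenate([s, a])                     # network input {s, a}
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)
    return (weights[-1] @ x + biases[-1]).item()   # scalar action value Q

rng = np.random.default_rng(1)
state_dim, action_dim, hidden = 3, 3, 8
sizes = [state_dim + action_dim, hidden, 1]        # M = 2 layers (assumed)
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

s = np.array([0.4, 1.2, 25.0])                     # state {h, v, r}
a = np.array([0.6, 2.5, 1.0])                      # action {d, l, n}
q = critic_forward(s, a, weights, biases)
```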
step S23: network weights are initialized. Before training begins, the weights of the Actor and Critic networks need to be initialized. It determines the state of the network at the beginning of training. The initial value of the weight may be selected according to the nature of the problem, and the present invention uses a normal distribution.
Step S24: an optimizer and a loss function of the network are defined. In this model, an Adam optimizer is used, and the loss function is typically defined as the mean square error between the Critic network's predicted action value and the actual reward. The network can therefore be continuously optimized according to the feedback of the loss function, so that it better adapts to and solves the rainwater pipe network optimization problem.
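The Critic loss of step S24 can be sketched as follows. The patent states the loss as the mean square error between the predicted action value and the actual reward; the bootstrapped target y = r + γ·Q′(s′, a′), standard in DDPG, is assumed here as its concrete realisation.

```python
def critic_loss(q_pred, rewards, q_next, gamma=0.99):
    """Mean squared error between the Critic's predicted action values and
    the bootstrapped targets y = r + gamma * Q'(s', a') over a mini-batch.
    The standard DDPG target form is an assumption; gamma = 0.99 is an
    assumed typical discount factor."""
    targets = [r + gamma * qn for r, qn in zip(rewards, q_next)]
    return sum((qp - y) ** 2 for qp, y in zip(q_pred, targets)) / len(q_pred)
```

This scalar loss is what the Adam optimizer would minimise with respect to the Critic's weights.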
Constraints are carefully defined and handled in this embodiment to ensure that the optimization process meets the actual situation and requirements. These constraints mainly include physical constraints, economic constraints and flood protection constraints. Physical constraints, including the limitations of pipe diameter and burial depth and connectivity of the pipe network water flow, are set according to the actual physical characteristics and engineering requirements of the rainwater pipe network. In the practical rainwater pipe network design, the basic principles that the embedded depth elevation of an upstream pipe section needs to be larger than the elevation of a downstream pipe section, the diameter of the upstream pipe section needs to be larger than or equal to the diameter of the downstream pipe section and the like are required to be met. The implementation of these constraints in a deep learning model is achieved by setting certain penalty terms, and when the generated actions violate these physical constraints, the model gets a negative reward, so that such actions are gradually avoided in the learning process. Economic constraints mainly refer to the construction and maintenance costs of a rainwater pipe network, because in practical engineering applications, economic factors are often critical constraint factors. This constraint is also achieved by setting a penalty term, and if the construction and maintenance costs incurred by the optimization result exceed a predetermined budget, the model will be negatively rewarded. The flood control capacity constraint is set according to the main function of the rainwater pipe network, namely water drainage and flood control. A rainwater pipe network needs to be able to cope with a certain intensity of heavy rain, which is a basic requirement of flood control capability. Also, when the optimization results cannot meet the flood protection capability requirements, the model will also get negative rewards.
After defining the above constraints, the present invention incorporates these constraints into the model's reward function by introducing penalty terms so that the model always follows these constraints during the learning process. In a specific implementation manner, in each training step, whether the new state meets the constraint is checked, and if not, a negative reward is given, so that the learning direction of the model is guided. The design aims at enabling the model to learn an optimization strategy meeting the actual application requirements by tightly combining the actual engineering constraint conditions with the training process of the deep learning model. In the learning process of the deep learning model, the constraint conditions play a definite guiding role, so that the model can find out strategies for solving the problems, and the strategies conform to the actual engineering constraint, thereby improving the practicability and the application value of the model. Thus, as shown in fig. 3, the step S3 includes the steps of:
step S31: physical constraints are defined. These include pipe diameter constraints (the diameter of an upstream pipe section needs to be greater than or equal to that of the downstream pipe section), burial depth constraints (the burial depth elevation of an upstream pipe section or node needs to be greater than that of the downstream pipe section or node), and connectivity constraints.
Step S32: economic constraints are defined. In view of the fact that in practical application, a certain economic cost is required for the construction and maintenance of a rainwater pipe network, so that a cost limit is required to be set in a model, and the pipe network cost is required to be kept within a budget as much as possible.
Step S33: flood control capacity constraints are defined. The rainwater pipe network needs to be capable of coping with a certain intensity of heavy rain, which is a basic flood control capability requirement, and therefore corresponding constraints are also needed to be set in the model. The rain pipe network in this embodiment needs to be able to cope with four storms, including one year, five years, ten years and twenty years. When these storms occur, the rainwater pipe network does not overflow.
Step S34: processing constraints. After defining the various constraints described above, the model designs a mechanism to handle the constraints so that the results of the optimization meet the constraints, i.e., include the constraints in the reward function. When an action violates a constraint, a penalty may be paid by giving a negative reward. If the new state does not meet the flood protection capability constraint, i.e., the model predicts a flood overflow exceeding a specified tolerance value, then a negative reward may be awarded.
Specifically, the reward function may be defined as:
R=r1-λ1*C-λ2*V-λ3*D (3)
Where r1 is a base reward, typically set to a positive value; C is the part of the actual cost that exceeds the budget (if the budget is not exceeded, C = 0); V is the part of the flood overflow volume exceeding the tolerance value (if the tolerance is not exceeded, V = 0); D is the degree of violation of the physical constraints; and λ1, λ2 and λ3 are the corresponding penalty coefficients, which can be adjusted according to actual conditions.
Step S35: and realizing constraint processing. And merging the defined constraint into training of the rainwater pipe network optimization deep learning model. A particular operation may require checking whether the new state satisfies the constraint at each step of model training or whether the constraint is violated after each action is performed.
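The reward function of equation (3), together with the constraint handling of steps S31-S35, can be sketched as follows; the default values of r1 and the penalty coefficients are illustrative assumptions, since the patent leaves them to be adjusted to the actual conditions.

```python
def reward(cost, budget, overflow, tolerance, violation_degree,
           r1=10.0, lam1=1.0, lam2=1.0, lam3=1.0):
    """Equation (3): R = r1 - λ1·C - λ2·V - λ3·D.

    C: cost overrun beyond the budget (economic constraint, step S32);
    V: flood overflow beyond the tolerance value (flood control, step S33);
    D: degree of physical-constraint violation (step S31).
    The numeric defaults are illustrative assumptions."""
    C = max(0.0, cost - budget)
    V = max(0.0, overflow - tolerance)
    D = violation_degree
    return r1 - lam1 * C - lam2 * V - lam3 * D
```

An action that satisfies every constraint simply earns the base reward r1; each violated constraint subtracts its penalty term, which is how the negative rewards of step S34 arise.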
In this embodiment, the problems in the aspects of model construction, training, testing, evaluation, optimization and the like need to be solved. The model builds the urban waterlogging model through the urban hydrologic model SWMM, and provides a vivid simulation environment, so that the model can learn an optimal pipe network design scheme through training under the condition of diversified rainwater. Meanwhile, the model introduces experience playback and a target network technology, breaks the relevance between data, improves training stability, provides a stable target Q value, and avoids unstable phenomenon possibly occurring in the training process. After model training is finished, the model is tested by setting different rainfall intensities and frequencies, so that the model is ensured to have good generalization capability and practicality. Finally, through the evaluation of the test results, if the model performance is not expected, tuning can be further performed, including adjusting the network structure, the learning rate and the like. The optimization method processes the optimization process step by step, considers the global and local balance, and ensures the stability and practicability of the model while meeting the optimization targets of flood control capacity and economic cost. Specifically, as shown in fig. 4, the step S4 includes the steps of:
Step S41: and (5) constructing a model. Urban hydrologic model SWMM was used to build urban inland inundation models. This model can simulate the flow of stormwater in a network and generate new states and rewards based on given actions (i.e. decisions).
Step S42: and (5) model training. The generated data is used to train an Actor network (decision network) and a Critic network (value evaluation network). The training aims to maximize the cumulative rewards, namely, the flood control capacity of the pipe network is as high as possible and the cost is as low as possible on the premise of ensuring that the constraint is met.
Step S43: empirical playback and target network techniques are employed. To make the training process more stable, the present invention employs empirical playback (Experience Replay) and Target Network (Target Network) techniques. Experience playback is used by preserving past experiences (state, action, rewards, etc.) and randomly extracting a portion of the training, which breaks the correlation between data and improves the stability of the training. The target network is to solve the problem of instability caused by using the same parameters as the target Q value and the actual Q value in the updating process in the Q-learning, and provides a stable target Q value by creating a network with the same structure as the original network but slower parameter updating.
Specifically, in the experience playback, the stormwater pipe network optimization model preserves a past series of experiences e= (s, a, r, s '), where s is the current state, a is the action performed, r is the reward earned, and s' is the new state after performing action a. These experiences are stored in the experience playback buffer D of the stormwater pipe network optimization model from which a small batch of experiences are randomly extracted for each training to update the network parameters. In the target network technology, the model has two networks of the same structure: an Actor network a and its corresponding target network a ', and a Critic network C and its corresponding target network C'. The parameters of networks a and C are updated by gradient descent, while the parameters of networks a 'and C' are updated by soft update, which can be expressed as:
θ′_A = τθ_A + (1-τ)θ′_A    (4)
θ′_C = τθ_C + (1-τ)θ′_C    (5)
wherein θ_A and θ_C are the parameters of networks A and C, θ′_A and θ′_C are the parameters of networks A′ and C′, and τ is the soft-update coefficient, which takes a small value.
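The experience replay buffer D and the soft updates of equations (4) and (5) can be sketched as follows; the buffer capacity and the value of τ are illustrative assumptions (the patent only requires τ to be small).

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay buffer D of step S43: stores experiences
    e = (s, a, r, s') and returns a random mini-batch, breaking the
    temporal correlation between consecutive samples."""
    def __init__(self, capacity=10000):        # capacity is an assumption
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def soft_update(target_params, main_params, tau=0.005):
    """Equations (4)/(5): θ' ← τ·θ + (1-τ)·θ'. τ = 0.005 is an assumed
    typical small value. Returns the new target parameters."""
    return [tau * th + (1 - tau) * th_t
            for th, th_t in zip(main_params, target_params)]
```

The same soft_update is applied to both the target Actor A′ and the target Critic C′, while A and C themselves are updated by gradient descent.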
Step S44: and (5) model testing. To ensure generalization of the model, the model needs to be tested in various possible environments (e.g., under different stormwater conditions). This is done by setting different intensities and frequencies of rainfall.
In the embodiment, in the multi-objective optimization problem of the rainwater pipe network, the application of an Ornstein-Uhlenbeck noise process is involved. Ornstein-Uhlenbeck noise process parameters need to be defined in order to accommodate the continuous action space involved in the optimization of a stormwater pipe network, such as diameter of pipes, burial depth and location of drainage nodes. The introduction of such a noise process may increase the exploratory ability of the model during the training phase, and therefore, may add this noise to the actions of the model output during the training phase to encourage the model to explore more possible solutions. However, when the model makes decisions or predictions in practical applications, it is necessary to remove this noise so that the model can output the optimal solution it believes without being disturbed by random noise. In addition, parameters of the noise process also need to be dynamically adjusted. If the model is not explored enough in the training process or the prediction result is too conservative, the standard deviation of noise can be properly improved; conversely, if the model is behaving too randomly or the prediction results are too aggressive, the standard deviation of the noise should be properly reduced. Through the adjustment, the model can have better performance under different rainfall conditions, and the rainwater pipe network is further optimized, so that the requirements of flood control and economy are met. Specifically, the step S5 includes the steps of:
Step S51: ornstein-Uhlenbeck noise process parameters are defined. In the optimization problem of the rainwater pipe network, since the action space is continuous, including the diameter of the pipe, the burial depth of the pipe, the position of the drainage node, etc., the Ornstein-Uhlenbeck process can be used to generate continuous random noise. Parameters of this process (such as standard deviation and theta values of noise) affect the amplitude and frequency of the noise, and need to be set according to practical optimization problems. The Ornstein-Uhlenbeck process is a random process whose updated formula at discrete times is:
X_{t+1} = X_t + θ(μ - X_t)Δt + σ·sqrt(Δt)·W_t    (6)
wherein X_t is the noise value at time t; θ is the rate parameter, which determines how quickly the noise value returns to its long-term mean μ; μ is the long-term mean, typically set to 0; σ is the standard deviation of the noise, which determines its amplitude; Δt is the time step; and W_t is a random variable sampled from a standard normal distribution.
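A single step of equation (6) can be sketched as follows; the default values θ = 0.15 and σ = 0.2 are common choices assumed here, not values fixed by the patent.

```python
import math
import random

def ou_step(x_t, theta=0.15, mu=0.0, sigma=0.2, dt=1.0, rng=random):
    """Equation (6): X_{t+1} = X_t + θ(μ - X_t)Δt + σ·sqrt(Δt)·W_t,
    with W_t sampled from a standard normal distribution. The defaults
    θ = 0.15 and σ = 0.2 are assumed typical values."""
    w_t = rng.gauss(0.0, 1.0)
    return x_t + theta * (mu - x_t) * dt + sigma * math.sqrt(dt) * w_t
```

With σ = 0 the mean-reverting drift is visible directly: each step pulls X a fraction θ·Δt of the way back toward μ, which is why the process produces temporally correlated, bounded exploration noise rather than a random walk.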
Step S52: noise is added during the training phase. In the model training stage, ornstein-Uhlenbeck noise generated in step S51 is added to the action output by the Actor network. This means that some random variation is added on the basis of the model advice when deciding on the diameter of the pipe, the depth of burial and the location of the drainage nodes, encouraging the model to explore more possible solutions, which can help to find a better stormwater pipe network configuration.
Specifically, the action space of the invention comprises the pipeline diameter d, the pipeline burial depth l and the newly added drainage node n, giving the action vector a = {d, l, n}. Following the Ornstein-Uhlenbeck process, noise can be generated for each action parameter, i.e., X_{d,t+1}, X_{l,t+1} and X_{n,t+1} for d, l and n respectively. The action a′ after adding noise can then be expressed as:
a′ = a + X_{t+1} = {d + X_{d,t+1}, l + X_{l,t+1}, n + X_{n,t+1}}    (7)
thus, the noise generated by the Ornstein-Uhlenbeck process directly affects each decision variable of the rainwater pipe network, including the pipeline diameter d, the burial depth l and the newly added drainage node n, providing the model with a means of better exploring the solution space. Furthermore, the standard deviation σ of the noise added to each decision variable needs to be adjusted according to the characteristics and scale of that variable, so that the effect of the noise remains balanced across all decision variables.
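The per-variable scaling of σ described above can be sketched as follows; the action values and per-variable scale factors are illustrative assumptions chosen to reflect the differing magnitudes of d, l and n.

```python
def add_exploration_noise(action, noise, scales):
    """Equation (7): a' = a + X_{t+1}, with the noise for each decision
    variable (d, l, n) multiplied by a per-variable scale factor so its
    effect stays balanced across variables of different magnitude."""
    return {k: action[k] + scales[k] * noise[k] for k in action}

a = {"d": 0.6, "l": 2.5, "n": 1.0}          # diameter, burial depth, new node
x = {"d": 0.1, "l": 0.1, "n": 0.1}          # raw OU noise per variable
scales = {"d": 0.05, "l": 0.2, "n": 0.5}    # assumed per-variable σ scaling
a_noisy = add_exploration_noise(a, x, scales)
```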
Step S53: noise is removed during the prediction phase. When the model training is completed, the noise added in step S52 should be removed when making predictions or decisions using the model. In this way, in practical application, the rainwater pipe network configuration which is considered to be optimal by the model can be obtained without being interfered by noise.
Step S54: and adjusting noise parameters according to the training effect. If the model is found to be insufficiently explored in the training process or the predicted result is too conservative, the standard deviation of noise can be properly increased; conversely, if the behavior of the model is too random, or the predicted outcome is too aggressive, the standard deviation of the noise may be reduced appropriately. This ensures that the model will perform well in a variety of rainfall situations.
In this embodiment, the model needs to be evaluated and optimized. The effect of the model can be evaluated by simulating storms of different intensities, such as ten years, twenty years and the like, and predicting the performances of the rainwater pipe network under various storms by using the current model. And evaluating various performances such as flood control capacity, economy and whether basic design constraint is met or not according to the prediction result. Such evaluation may be performed by comparing the predicted result with the actual situation, or according to a preset evaluation index. Furthermore, the model is subject to iterative optimization decisions. If the performance of the model fails to meet the preset standard or the capability of the model to handle heavy rain with different intensities is unbalanced, further optimization of the model is required. The optimization mode can comprise adjusting super parameters of a depth deterministic strategy gradient algorithm, such as learning rate, discount factors and the like, and can also adjust the structure of a network, such as the number of layers, the number of nodes and the like. Specifically, the step S6 includes the steps of:
step S61: and (5) evaluating a model. SWMM simulates conditions of different storm intensities (such as ten-year and twenty-year return-period storms), and the current model is used for prediction, yielding the model's predictions of rainwater pipe network performance under each storm intensity.
Step S62: performance evaluation. And evaluating the performance of the model according to the prediction result, wherein the performance comprises flood control capability, economic cost, satisfaction of basic design constraint and the like. This can be done by comparison with the actual situation or according to a preset evaluation index.
Step S63: and (5) iterative optimization judgment. If the performance of the model does not meet the preset criteria or if the generalization of the model is insufficient, for example, the handling capacity of the model is not balanced for storms of different intensities, then further optimization of the model is required. The super parameters of DDPG, including learning rate, discount factor, etc., may be adjusted, or the network structure, including layer number, node number, etc., may be adjusted, and then step S4 is returned to perform a new training round.
Step S64: updating the model. If the performance of the model reaches a preset standard or a maximum number of iterations is reached, this optimization ends. The model parameters at this time are recorded for later use.
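The evaluate-then-iterate logic of steps S61-S64 can be sketched as follows. The pass criterion (no overflow at any tested return period) and the list of design storms follow the embodiment, while predict_overflow and retune are stand-ins for the SWMM-based evaluation and the hyperparameter adjustment respectively.

```python
RETURN_PERIODS = [1, 5, 10, 20]   # design storms: 1-, 5-, 10-, 20-year

def meets_standard(predict_overflow, max_iters=10, retune=None):
    """Evaluate the model against each design storm (steps S61/S62); if any
    storm produces overflow, retune and retry (step S63), up to max_iters
    rounds before giving up (step S64). predict_overflow(period) stands in
    for an SWMM simulation driven by the trained model."""
    for _ in range(max_iters):
        overflows = {p: predict_overflow(p) for p in RETURN_PERIODS}
        if all(v <= 0.0 for v in overflows.values()):
            return True                      # performance standard reached
        if retune is not None:
            retune(overflows)                # adjust hyperparameters, retrain
    return False                             # max iterations exhausted
```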
The optimized result of the rainwater pipe network in this embodiment can cope with four kinds of rainfall in the period of one year, five years, ten years and twenty years, respectively, and the spatial distribution diagram of the pipe section and the pipe point changed after optimization is shown in fig. 5, where the dashed line part is the pipe network with optimized variation, and the solid line part is the pipe network which is originally not optimized.
In summary, a rainwater pipe network optimization model based on a depth deterministic strategy gradient algorithm (DDPG) is an advanced model combining a deep learning and reinforcement learning method. The model can effectively deal with multi-objective optimization problems and multi-constraint conditions by defining clear environment and model parameters, including states, actions and rewards. In this model, an Actor network is used to determine the optimal actions and a Critic network is used to evaluate the actions of the Actor network. Meanwhile, the model sets physical and economic constraints on states and actions, and can realize the processing of continuous states and action spaces. The training process of the model adopts a SWMM model, and states and rewards are updated by simulating rainwater flow and pipe network performance so as to train an Actor and a Critic network. The training process also adopts experience playback and target network technology to ensure the training stability. To increase the exploratory nature of the model, some noise is added to the output action of the Actor network during training. Through continuous iterative optimization, the model can be tested under various storm intensities, so that the performance and generalization capability of the model are evaluated. The model has excellent performance in processing multi-objective optimization problems and multi-constraint conditions and in continuous states and action spaces, thereby providing an efficient and self-adaptive optimization tool for urban flood control and drainage facility design.
The foregoing is merely illustrative of general procedures and is not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements, etc. which fall within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (7)

1. The rainwater pipe network optimization method based on the depth deterministic strategy gradient algorithm is characterized by comprising the following steps of:
step S1: defining the environment and parameters of a model, defining the water depth of each node of a rainwater pipe network, the flow rate of each pipeline and the current rainfall as states, changing the diameter of the pipeline, changing the burial depth of the pipeline or adding new drainage nodes as actions, and finally setting a reward function by considering flood control capacity, cost and design constraint;
step S2: constructing an Actor network and a Critic network, wherein the Actor network receives the current state as an input and outputs an action, and the Critic network receives the current state and the action as an input and outputs an action value and simultaneously initializes network weights;
step S3: defining physical and economic constraints, including pipe diameter and burial depth limitations, cost limitations, and the ability to cope with a certain intensity of heavy rain, and incorporating these constraints into a reward function, which gives a negative reward if violated;
Step S4: simulating rainwater flow and pipe network performance by using an urban hydrological model SWMM model, generating training data, updating an Actor and a Critic network through training, and performing stable training by using experience playback and a target network technology; testing under various possible environments simultaneously to ensure the generalization capability of the model;
step S5: and (3) evaluating the performance and generalization capability of the model by testing the model under different storm intensities, if the model is not well represented, adjusting the model parameters and returning to the step (S4) for iterative training until the model reaches a preset performance standard or the maximum iteration number is reached.
2. The rainwater network optimization method based on the depth deterministic strategy gradient algorithm according to claim 1, wherein the step S1 comprises the following steps:
step S11: collecting and arranging pipe network information, wherein the pipe network information comprises pipe network layout, pipe size and drainage node position, and meanwhile, rainfall data is prepared and processed;
step S12: defining a state space based on the data collected in the step S11, wherein the state space comprises node water depth, pipeline flow rate and current rainfall intensity;
step S13: defining an action space of the model, including changing the diameter of a pipeline, changing the burial depth or adding a new drainage node, wherein the actions are in an actual operable range;
Step S14: constructing a reward function according to flood control capacity, cost and basic design constraint factors;
step S15: a simulation environment is established by using the city hydrologic model SWMM, and the simulation environment can output a simulation new state according to the model action and return corresponding rewards.
3. The rainwater network optimization method based on the depth deterministic strategy gradient algorithm according to claim 1, wherein the step S2 comprises the following steps:
step S21: constructing an Actor network whose input is the current state, used to determine the optimal action to be executed for that state, wherein the state space s consists of the water depth h of each node of the pipe network, the flow velocity v of each pipeline and the current rainfall r, denoted s = {h, v, r}; the action space a consists of the pipeline diameter d, the pipeline burial depth l and the newly added drainage node n, denoted a = {d, l, n}; the Actor network is regarded as a mapping function from the state space to the action space, realized by a deep neural network with N layers, where the weight and bias of the i-th layer are W_i and b_i respectively, and the activation function ReLU is expressed as f(x); assuming the Actor network input is the state s and the network output is a, i.e., the pipe network configuration decision, the Actor network is expressed as:
a = f_N(W_N f_{N-1}(W_{N-1} ... f_2(W_2 f_1(W_1 s + b_1) + b_2) ... + b_{N-1}) + b_N)    (1)
wherein f_1, f_2, ..., f_{N-1}, f_N respectively represent the activation functions of layers 1, 2, ..., N-1 and N of the Actor network; W_1, W_2, ..., W_{N-1}, W_N respectively represent the weights of layers 1, 2, ..., N-1 and N of the Actor network; and b_1, b_2, ..., b_{N-1}, b_N respectively represent the biases of layers 1, 2, ..., N-1 and N of the Actor network;
step S22: constructing a Critic network which takes as input the current state and the action output by the Actor network, and outputs the value of the corresponding action, i.e., the expected system performance evaluation; specifically, the Critic network aims to evaluate the value of taking a certain action in the current state, i.e., the effectiveness of the selected rainwater pipe network configuration; the Critic network is regarded as a mapping function from the state-action space to the action value; assuming the Critic network has M layers, where the weight and bias of the j-th layer are W′_j and b′_j respectively, the activation function ReLU is expressed as g(x), the network input is {s, a}, i.e., the current state and action, and the network output is named Q, i.e., the value of the action, the Critic network is expressed as:
Q = g_M(W′_M g_{M-1}(W′_{M-1} ... g_2(W′_2 g_1(W′_1 {s, a} + b′_1) + b′_2) ... + b′_{M-1}) + b′_M)    (2)
wherein g_1, g_2, ..., g_{M-1}, g_M respectively represent the activation functions of layers 1, 2, ..., M-1 and M of the Critic network; W′_1, W′_2, ..., W′_{M-1}, W′_M respectively represent the weights of layers 1, 2, ..., M-1 and M of the Critic network; and b′_1, b′_2, ..., b′_{M-1}, b′_M respectively represent the biases of layers 1, 2, ..., M-1 and M of the Critic network;
step S23: before training starts, initializing weights of an Actor network and a Critic network by adopting normal distribution, and setting an initial state for subsequent network optimization;
step S24: an optimizer and a loss function of the network are defined, the model uses an Adam optimizer, and the loss function is set to be the mean square error between the predicted action value and the actual rewards of the Critic network, so that the network is optimized according to the loss function feedback.
4. The rainwater network optimization method based on the depth deterministic strategy gradient algorithm according to claim 1, wherein the step S3 comprises the following steps:
step S31: defining physical constraints, including restrictions on pipe diameter and burial depth, and connectivity of pipe flow;
step S32: defining economic constraints according to economic considerations in practical applications;
step S33: defining flood control capacity constraint, and ensuring that a model-optimized rainwater pipe network can cope with storm with preset intensity;
step S34: the model implements the processing of the above-mentioned various constraints by including constraints in the reward function and giving negative rewards to actions that violate the constraints, in particular, the reward function R is defined as:
R = r1 − λ1·C − λ2·V − λ3·D  (3)
where r1 is a base reward, typically set to a positive value; C is the part of the actual cost that exceeds the budget (C = 0 if the budget is not exceeded); V is the part of the flood overflow volume that exceeds the tolerance value (V = 0 if the tolerance is not exceeded); D is the degree of violation of the physical constraints; and λ1, λ2, λ3 are the corresponding penalty coefficients;
step S35: in the training process, the constraints are fused into the model, and at each step the new state is checked against the constraints to ensure that the optimized rainwater pipe network satisfies the physical, economic and flood control capacity constraints.
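The reward function of equation (3) can be sketched directly; the penalty coefficients and budget/tolerance values below are placeholders, not values from the patent:

```python
def reward(r1, cost, budget, overflow, tolerance, violation,
           lam1=1.0, lam2=1.0, lam3=1.0):
    """Reward R = r1 - lam1*C - lam2*V - lam3*D from equation (3)."""
    C = max(0.0, cost - budget)          # cost above budget, else 0
    V = max(0.0, overflow - tolerance)   # flood overflow above tolerance, else 0
    D = violation                        # degree of physical-constraint violation
    return r1 - lam1 * C - lam2 * V - lam3 * D

r_over = reward(10.0, 120.0, 100.0, 5.0, 5.0, 0.0)  # cost exceeds budget by 20
r_ok = reward(10.0, 80.0, 100.0, 3.0, 5.0, 0.0)     # all constraints satisfied
```

Actions that violate a constraint thus receive a lower (possibly negative) reward, steering the policy toward feasible pipe network configurations.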
5. The rainwater network optimization method based on the depth deterministic strategy gradient algorithm according to claim 1, wherein the step S4 comprises the following steps:
step S41: constructing an urban waterlogging model, using an urban hydrological model SWMM, simulating the flowing condition of rainwater in a pipe network, and generating corresponding new states and rewards according to given actions;
step S42: training a model, namely training an Actor network and a Critic network by using the new state and rewards generated in the step S41, wherein the aim is to maximize the flood control capacity of the pipe network and minimize the cost on the premise of meeting various constraints;
Step S43: experience playback and a target network technology are adopted, and the experience playback technology is used for randomly extracting past experiences including states, actions and rewards in training to break the correlation between data and increase the training stability; the target network technology solves the problem of instability in the Q-learning training by creating a network with slower parameter updating to provide a stable target Q value, specifically, in the experience playback, the rainwater pipe network optimization model stores a series of past experiences e= (s, a, r, s '), wherein s is the current state, a is the action performed, r is the obtained reward, s' is the new state after the action a is performed, the experiences are stored in an experience playback buffer zone D of the rainwater pipe network optimization model, and a small batch of experiences are randomly extracted from the experience playback buffer zone D for updating the network parameters during each training; in the target network technology, the model has two networks of the same structure: an Actor network a and its corresponding target network a ', and a Critic network C and its corresponding target network C'; the parameters of networks a and C are updated by gradient descent and the parameters of networks a 'and C' are updated by soft update expressed as:
θ′A = τθA + (1 − τ)θ′A  (4)
θ′C = τθC + (1 − τ)θ′C  (5)
wherein θA and θC are the parameters of networks A and C, θ′A and θ′C are the parameters of networks A′ and C′, and τ is the soft-update coefficient;
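A minimal sketch of the two mechanisms of step S43: a replay buffer storing experiences e = (s, a, r, s′) and the soft update of equations (4)–(5). The capacity and τ values are illustrative defaults, not values from the patent:

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Stores experiences e = (s, a, r, s') and samples random mini-batches."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Random draw breaks the temporal correlation between consecutive experiences
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

def soft_update(target_params, online_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta'  (equations (4) and (5))."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

buf = ReplayBuffer()
buf.store([0.1], [0.2], 1.0, [0.3])
batch = buf.sample(4)                                    # returns at most len(buffer) items
updated = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
```

Because τ is small, the target parameters drift slowly toward the online parameters, which is what keeps the target Q values stable during training.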
step S44: performing model testing to ensure the generalization capability of the model, i.e. testing the model under different storm conditions; the noise added in step S42 is removed during the testing stage so that the performance of the model can be accurately evaluated and predictions can be made with the model.
6. The method for optimizing a rainwater pipeline network based on a depth deterministic strategy gradient algorithm according to claim 1, wherein in step S42, in order to deal with continuous states and action spaces, noise is added to encourage the model to explore during training; in the prediction stage, removing noise to enable the model to output optimal actions; the method specifically comprises the following steps:
step S421: setting the parameters of an Ornstein-Uhlenbeck process, which is used to generate random noise in the continuous action space; the Ornstein-Uhlenbeck process is a stochastic process whose discrete-time update formula is:
Xt+1 = Xt + θ(μ − Xt)Δt + σ·sqrt(Δt)·Wt  (6)
wherein Xt is the noise value at time t, Xt+1 is the noise value at time t+1, θ is the speed parameter determining how quickly the noise value reverts to its long-term mean μ, μ is the long-term mean, typically set to 0, σ is the standard deviation of the noise, determining its amplitude, Δt is the time step, Wt is a random variable sampled from a standard normal distribution, and sqrt(Δt) is the square root of the time step, which scales the random increment;
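Equation (6) can be implemented directly; the parameter values θ = 0.15 and σ = 0.2 below are common illustrative defaults, not values specified in the patent:

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Discrete-time OU process: X_{t+1} = X_t + theta*(mu - X_t)*dt + sigma*sqrt(dt)*W_t."""
    def __init__(self, size, theta=0.15, mu=0.0, sigma=0.2, dt=1.0, seed=0):
        self.theta, self.mu, self.sigma, self.dt = theta, mu, sigma, dt
        self.x = np.full(size, mu)               # start at the long-term mean mu
        self.rng = np.random.default_rng(seed)

    def sample(self):
        w = self.rng.standard_normal(self.x.shape)   # W_t ~ N(0, 1), one draw per dimension
        self.x = (self.x
                  + self.theta * (self.mu - self.x) * self.dt
                  + self.sigma * np.sqrt(self.dt) * w)
        return self.x

ou = OrnsteinUhlenbeckNoise(size=3)              # one noise dimension per action variable
x = ou.sample()
```

The mean-reverting drift θ(μ − Xt) pulls the noise back toward μ, so successive samples are temporally correlated rather than independent, which suits exploration in continuous action spaces.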
step S422: in the model training stage, adding the random noise generated in step S421 to the output action of the Actor network to encourage the model to explore more possible solutions; specifically, the action space is a = {d, l, n}, and according to the Ornstein-Uhlenbeck process, noise is generated for each action parameter, i.e. noises Xd,t+1, Xl,t+1 and Xn,t+1 are generated for d, l and n respectively; the action a′ after adding noise is then expressed as:
a′ = a + Xt+1 = {d + Xd,t+1, l + Xl,t+1, n + Xn,t+1}  (7)
in this way, the noise generated by the Ornstein-Uhlenbeck process directly affects each decision variable of the rainwater pipe network, including the pipe diameter d, the burial depth l and the newly added drainage nodes n, providing the model with a better means of exploring the solution space; furthermore, the σ of the noise added to each decision variable must be adjusted according to the characteristics and scale of that variable, so that the influence of the noise remains balanced across all decision variables;
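The perturbation of equation (7) can be sketched as below; the per-variable value ranges and noise scales are hypothetical examples, and clipping the perturbed action to each variable's physical range is an added assumption not stated in the claim:

```python
import numpy as np

def perturb_action(a, noise, low, high):
    """a' = a + X_{t+1} (equation (7)), clipped to each variable's assumed physical range."""
    return np.clip(a + noise, low, high)

# Hypothetical action {d, l, n}: pipe diameter (m), burial depth (m), added drainage nodes
a = np.array([0.6, 2.0, 1.0])
# Per-dimension noise with sigma scaled to each variable, per the balance requirement above
noise = np.array([0.02, 0.1, 0.5])
a_prime = perturb_action(a, noise,
                         low=np.array([0.2, 1.0, 0.0]),
                         high=np.array([2.0, 5.0, 10.0]))
```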
step S423: adjusting the random noise according to the training performance of the model: if the model explores insufficiently or predicts too conservatively, the noise standard deviation is increased; conversely, if the model behaves too randomly or predicts too aggressively, the noise standard deviation is decreased, ensuring that the model performs well under various rainfall conditions.
7. The rainwater network optimization method based on the depth deterministic strategy gradient algorithm according to claim 1, wherein the step S5 comprises the following steps:
step S51: running model predictions under simulated storms of various intensities to evaluate the performance of the rainwater pipe network;
step S52: according to the prediction results, evaluating the performance of the model in terms of flood control capacity, economic cost, satisfaction of basic design constraints, and the like;
step S53: if the model performance does not reach the preset standard or the generalization capability is insufficient, adjusting the model hyperparameters or the network structure and restarting training;
step S54: and if the model performance reaches a preset standard or reaches the maximum iteration number, finishing optimization and recording the current model parameters for subsequent application.
CN202310913925.4A 2023-07-25 2023-07-25 Rainwater pipe network optimization method based on depth deterministic strategy gradient algorithm Pending CN117195443A (en)


Publications (1)

Publication Number Publication Date
CN117195443A true CN117195443A (en) 2023-12-08


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118153783A (en) * 2024-05-09 2024-06-07 北京优能创节能科技有限公司 Intelligent water-saving pipeline cleaning optimization system based on artificial intelligence



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination