CN110533244B - Optimal scheduling method and system for cascade dam and computer readable storage medium - Google Patents

Optimal scheduling method and system for cascade dam and computer readable storage medium

Info

Publication number
CN110533244B
CN110533244B (application CN201910803576.4A)
Authority
CN
China
Prior art keywords
dam
scheduling
learning
scheduling scheme
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910803576.4A
Other languages
Chinese (zh)
Other versions
CN110533244A (en)
Inventor
钟将
杨昱睿
吕昱峰
常婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201910803576.4A priority Critical patent/CN110533244B/en
Publication of CN110533244A publication Critical patent/CN110533244A/en
Application granted granted Critical
Publication of CN110533244B publication Critical patent/CN110533244B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an optimal scheduling method and system for a cascade dam and a computer readable storage medium, belonging to the technical field of dam optimal scheduling and comprising the following steps: S1, acquiring the basic characteristics of each dam, the static characteristic values of the dams and historical data of river hydrology; S2, acquiring in real time the environment state related to the scheduling benefit of the cascade dam in the drainage basin; S3, receiving scheduling parameters and constraint conditions set by a user; S4, setting the learning rate, greedy degree and reward decrement value parameters of the deep reinforcement learning, expressed by eta, epsilon and gamma respectively; S5, calculating an optimized scheduling scheme of the cascade dam by deep reinforcement learning; and S6, generating and outputting the optimized scheduling scheme of the cascade dam. The method and system solve the technical problem that traditional optimization methods face a huge search space and have difficulty obtaining an optimized scheduling scheme.

Description

Optimal scheduling method and system for cascade dam and computer readable storage medium
Technical Field
The invention relates to the technical field of dam optimization scheduling, in particular to a method and a system for cascade dam optimization scheduling and a computer readable storage medium.
Background
The optimal utilization of water resources in a drainage basin has always been a difficult problem in dam optimal scheduling technology, and the cooperative optimal scheduling of multiple dams in a basin is the basis for realizing it. A multi-dam optimization scheduling scheme is very complex and must take many factors into account: not only hydrology, environmental constraints and industrial pollution discharge in the basin, but also agricultural irrigation, power generation demand, water supply demand, flood control and waterway transport requirements. Cooperative scheduling of multiple dams in a basin therefore has to reconcile several mutually contradictory scheduling objectives in order to achieve global optimization of the water resources.
In the prior art, deep reinforcement learning has become a powerful tool for optimized scheduling: an agent explores the benefits obtained under different environments with a certain learning mechanism and gradually forms an optimized scheduling model. An agent is a structure with a decision function, which can be a simple algorithm or a function represented by a neural network. However, there has been little research on control methods that schedule multiple dams in a basin by changing the running state of each dam according to the same scheduling period. Scheduling multiple dams in a basin usually involves a huge search space, from which existing optimization methods have difficulty obtaining a globally optimal scheduling scheme; a technical solution is therefore needed that solves this multi-dam scheduling problem and obtains a globally optimal scheduling scheme.
Disclosure of Invention
The invention aims to overcome the defects in the prior art. The optimal scheduling method and system for the cascade dam provided by the invention use deep reinforcement learning to search for a globally optimal scheduling scheme according to the running state of the dams and environmental factors. The method, the system and the computer readable storage medium provided by the invention can be used to obtain a globally optimal scheduling scheme.
In order to achieve the above purpose, the invention provides the following technical scheme:
In one aspect, the invention provides an optimal scheduling method for a cascade dam, which specifically comprises the following steps: S1, acquiring the basic characteristics of each dam, the static characteristic values of the dams and historical data of river hydrology; S2, acquiring in real time the environment state related to the scheduling benefit of the cascade dam in the drainage basin; S3, receiving scheduling parameters and constraint conditions set by a user; S4, setting the learning rate, greedy degree and reward decrement value parameters of the deep reinforcement learning, expressed by eta, epsilon and gamma respectively; S5, calculating an optimized scheduling scheme of the cascade dam by deep reinforcement learning; and S6, generating and outputting the optimized scheduling scheme of the cascade dam.
Further, step S5 specifically includes: S51, constructing and initializing a playback memory, a prediction network and a learning network, and setting their network parameters to the same values, wherein the prediction functions corresponding to the prediction network and the learning network are Q and Q' respectively; S52, initializing t = 0 and initializing the state of the dam group to the all-zero vector s_0; S53, exploring a scheduling scheme, obtaining the scheduling-scheme tuple data of the dam group according to the environment state obtained in S2 and the constraint conditions obtained in S3, and storing the obtained tuple in the playback memory; S54, randomly selecting a batch of tuples from the playback memory to generate training samples, and updating the parameters of the learning network through a loss function; and S55, when the number of explorations of the scheduling scheme reaches a specific number, updating the parameters of the learning network into the prediction network.
Further, step S53 specifically includes: S531, with probability epsilon, randomly selecting a feasible dam-group scheduling scheme as a_t, or, with probability 1-epsilon, selecting the scheme a_t for which the learning network predicts the largest expected benefit; S532, calculating the benefit r_t in the current scheduling period from the environment state and the constraint conditions as the immediate reward, updating the current running state of the dam group to the new state s_{t+1}, and storing the resulting tuple <s_t, a_t, r_t, s_{t+1}> in the playback memory.
Further, step S54 specifically includes: S541, randomly selecting a batch of tuples from the playback memory to generate samples; S542, selecting training samples with probability proportional to the absolute value of the difference between each sample's target value and the prediction network's output; S543, updating the parameters of the learning network according to the loss function.
Further, step S55 specifically includes: S551, if the value of the variable t is less than or equal to the length L of the scheduling scheme, incrementing the variable and returning to step S53 to continue execution; S552, if the benefit value of the scheduling scheme is greater than the best benefit value, updating the best benefit value to the benefit value of this scheduling scheme and updating the best scheduling scheme; S553, setting a loop function, and if the loop variable of the loop function satisfies a first condition, updating the parameters of the learning network into the parameters of the prediction network; S554, if the loop variable satisfies a second condition, jumping to step S52 to continue execution; if the loop variable does not satisfy the second condition, exiting the loop and carrying out the next operation.
Further, the basic characteristics of the dams comprise the storage capacity, dead water level, highest water level and silt deposit amount of each dam; the static characteristic values of the dams comprise the upstream-downstream relations between the dams, the highest water level line and the silt characteristics; the historical data of river hydrology comprise the annual average runoff, the normal dam water consumption and the average sediment transport.
Further, the environment state comprises the downstream agricultural planting type of each dam, the cultivated area of each type of crop, the irrigation demand, the futures prices of various crops, the market price of chemical fertilizers, the on-grid price of generated electricity, the discharge amount of industrial pollutants, meteorological statistics during the scheduling period, and the running water level and inflow data of each dam.
Further, the scheduling parameters include a time span and a minimum scheduling period of the scheduling scheme; the constraints include the maximum water level at each dam, maximum discharge flow, maximum silt deposit, environmental requirements at each dam, and maximum content limits for various pollutants.
In another aspect, the present invention further provides an optimized dispatching system for a step dam, including the following units:
the historical data acquisition unit is used for acquiring the basic characteristics of each dam, the static characteristic value of each dam and the historical data of river hydrology;
the real-time data acquisition unit is used for acquiring the environmental state of the cascade dam in the drainage basin related to the scheduling income in real time;
a scheduling state receiving unit, configured to receive a scheduling parameter and a constraint condition set by a user;
a deep learning setting unit for setting a learning rate, a greedy degree and a reward decrement value parameter of the deep reinforcement learning, wherein the learning rate, the greedy degree and the reward decrement value of the deep reinforcement learning are respectively expressed by eta, epsilon and gamma;
the scheduling scheme calculating unit is used for calculating the optimized scheduling scheme of the cascade dam by deep reinforcement learning;
and the scheduling scheme output unit is used for generating and outputting the optimized scheduling scheme of the cascade dam.
Meanwhile, the present invention also provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the steps in the method as described above.
Compared with the prior art, the invention has the beneficial effects that:
the method, the system and the computer readable storage medium provided by the invention are simple to operate, and introduce machine learning into the cascade dam scheduling model, thereby avoiding the technical problem that the traditional optimization method is difficult to obtain an optimized scheduling scheme due to huge search space. As the reinforcement learning better adopts heuristic strategies for finding different states, the search of the optimal scheduling scheme is accelerated, and the global optimal scheduling scheme or the approximate optimal scheduling scheme can be quickly converged on a huge solution space.
Drawings
FIG. 1 is a schematic flow chart of an optimized scheduling method for a cascade dam according to the present invention;
FIG. 2 is a schematic diagram illustrating the detailed flow of step S5 in the optimal scheduling method for a cascade dam according to the present invention;
FIG. 3 is a schematic structural diagram of an optimized dispatching system for a cascade dam according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.
In recent years, deep reinforcement learning has become a powerful tool for optimized scheduling: the benefits under different environments are explored with a certain learning mechanism, and an optimized scheduling model is gradually formed. Multi-dam scheduling in a drainage basin can adopt a synchronous scheduling control mode with a fixed period, i.e. each dam changes its running state according to the same scheduling period. The state of the next scheduling period in basin scheduling depends only on the initial state of the previous period and the scheduling action performed in that period, i.e. the state of the basin dams is a Markov process. Multi-dam scheduling in a basin usually involves a very large search space, so the problem is well suited to deep reinforcement learning methods for finding near-optimal solutions.
The invention discloses an optimal scheduling method and system for a cascade dam and a computer readable storage medium; the specific embodiments are as follows:
the existence of m dams in a flow domain requires coordinated scheduling, the length of a scheduling scheme is L periods, and the length of each period is any specified time length, and can be 1 day, several hours or even several minutes. If 4 dams exist in the preset flow domain and need to be cooperatively scheduled, the number of the dams needs to be cooperatively scheduled is m =4; if the length of the scheduling scheme is 15 cycles, the length cycle of the scheduling scheme is L =15; the length of each cycle is 1 day, i.e. a scheduling scheme is generated for the future 15 days.
It should be noted that the current running state of the dam group is recorded as s_t; the scheduling action of the dam group in the t-th period is recorded as a_t; the scheduling revenue function r_t = f(s_t, a_t) represents the benefit obtained by the dam group from performing scheduling action a_t in the t-th scheduling period, with benefit values in [-∞, +∞]; the specific scheduling revenue function is related to factors such as agricultural production, industrial activities, market conditions and meteorological conditions associated with the dam group. The playback memory (replay memory) of the learning process is recorded as D; the prediction network for predicting the benefit of scheduling behaviors is recorded as the Target_Q network, and the learning network that performs online learning of the scheduling scheme is recorded as the Main_Net network.
For convenience of description, each dam has only four types of basic scheduling actions: water storage, flood discharge, sand flushing and power generation, with action numbers 0, 1, 2 and 3 respectively. Since a power-generation scheduling action may involve starting several generator sets, the various possible generation modes of the generator sets can be discretized into multiple scheduling actions with different action numbers. Likewise, flood-discharge scheduling may correspond to different flow rates and can also be discretized into different scheduling actions with different numbers. The scheduling actions that each dam can perform can thus be numbered with integers. For convenience, the water-storage scheduling action number of every dam may be set to 0. Suppose A_i is the scheduling action set of the i-th dam; the union of the scheduling action sets of all dams is A, with A = A_1 ∪ A_2 ∪ ... ∪ A_4 = {0, 1, 2, 3}. The scheduling action of the dam group in the (t+1)-th scheduling period can then be expressed as a vector a_t = [act_1, act_2, ..., act_4], where act_i ∈ A.
If the dam group has n types of scheduling actions, the number of scheduling schemes selectable for a dam group formed by m dams in each period is m × n. Since the scheduling actions of the dams are the four actions of water storage, flood discharge, sand flushing and power generation, the number of scheduling action types of the dam group is n = 4, and the number of dams to be scheduled is m = 4, so the number of per-period scheduling schemes of the dam group consisting of 4 dams is 16. For combinations that violate the scheduling constraints, the benefit of that period and the final expected benefit are set to -∞.
The state vector of the dam group at period i is represented as s_i = [a_0, ..., a_k, ..., a_15], where element a_k represents the scheduling behavior of the dam group in the k-th scheduling period; the scheduling scheme of the dam group is therefore adjusted from the initial scheduling state until the action of the last scheduling period is selected, in order to find the optimal cooperative scheduling scheme.
The scheduling of the dam group in the basin adopts a synchronous scheduling mode, i.e. the states are adjusted with the same interval period; each scheduling behavior of a dam in the basin is represented by a unique number; the scheduling behavior of the dam group in a given period can be represented by a vector in which each element represents the scheduling behavior of the corresponding dam in that period; and a scheduling scheme represents the scheduling behavior over a certain number of periods as a sequentially concatenated vector, as sketched below.
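By way of illustration only, the numbering and vector representation described above can be sketched in Python; the dam count m = 4 and the scheme length L = 15 follow the example values in this description, while the constant and variable names are assumptions of the sketch:

    # Sketch of the scheduling-action encoding (illustrative names; action numbers
    # 0-3 for water storage, flood discharge, sand flushing and power generation).
    STORE, FLOOD_DISCHARGE, SAND_FLUSH, GENERATE = 0, 1, 2, 3
    M_DAMS = 4        # number of dams scheduled cooperatively (m)
    L_PERIODS = 15    # length of the scheduling scheme in periods (L)

    # Scheduling behaviour of the dam group in one period: one action per dam.
    a_t = [GENERATE, STORE, FLOOD_DISCHARGE, SAND_FLUSH]

    # A scheduling scheme over L periods, with undecided periods left as the
    # all-zero (water storage) vector, concatenated sequentially into one vector.
    scheme = [[STORE] * M_DAMS for _ in range(L_PERIODS)]
    scheme[0] = a_t
    flat_scheme = [act for period in scheme for act in period]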
FIG. 1 is a flowchart illustrating an optimal scheduling method for a cascade dam according to an exemplary embodiment. Referring to FIG. 1, the optimal scheduling method for a cascade dam of this embodiment includes the following steps:
s1, acquiring basic characteristics of each dam, static characteristic values of the dams and historical data of river hydrology.
The basic characteristics of the cascade dams in the basin comprise data such as the storage capacity, dead water level, highest water level and silt deposit amount of each dam; static characteristic values such as the upstream-downstream relations between dams, the highest water level line and the silt characteristics; and historical river hydrological data such as the annual average runoff, the normal dam water consumption and the average sediment transport.
And S2, acquiring the environmental state of the cascade dam in the drainage basin related to the scheduling income in real time.
The environment state comprises the downstream agricultural planting type of each dam, the cultivated area of each type of crop, the irrigation demand, the futures prices of various crops, the market price of chemical fertilizers, the on-grid price of generated electricity, the discharge amount of industrial pollutants, meteorological statistics during the scheduling period, the running water level of each dam, inflow data and the like. Data such as crop futures prices and meteorological statistics can be acquired in real time from relevant websites with web crawlers, the running water level and inflow data of each dam can be acquired from the dam's monitoring site, and the discharge data of various industrial pollutants can be acquired from the data exchange platform between environmental protection bureaus.
And S3, setting scheduling parameters and constraint conditions.
The scheduling parameters include a time span and a minimum scheduling period of the scheduling scheme, such as time span L =15 days, and the minimum scheduling period is 1 day; the constraint conditions comprise the highest water level, the maximum discharge flow, the maximum sediment deposition and the like of each dam; the constraint conditions also comprise the environmental requirements of each dam and the maximum content limit values of various pollutants, such as the environmental protection limit values of pollutants such as sulfate, chloride, total ammonia nitrogen, total phosphorus and the like in river water.
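Purely as a sketch of how the inputs of steps S1-S3 might be grouped in code, the records below use illustrative field names and types; none of these identifiers are defined by this description:

    # Illustrative grouping of the data collected in S1-S3 (field names are
    # assumptions for the sketch, not definitions from the patent).
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class EnvironmentState:            # acquired in real time (step S2)
        crop_areas: Dict[str, float]   # planting type -> cultivated area
        irrigation_demand: float
        crop_futures_prices: Dict[str, float]
        fertilizer_price: float
        on_grid_power_price: float
        pollutant_discharge: Dict[str, float]
        weather: Dict[str, float]
        dam_water_levels: List[float]
        inflow_rates: List[float]

    @dataclass
    class ScheduleConfig:              # set by the user (step S3)
        span_days: int = 15            # time span L of the scheduling scheme
        min_period_days: int = 1       # minimum scheduling period
        max_water_levels: List[float] = field(default_factory=list)
        max_discharge_flow: List[float] = field(default_factory=list)
        max_silt_deposit: List[float] = field(default_factory=list)
        pollutant_limits: Dict[str, float] = field(default_factory=dict)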
And S4, setting the learning rate, the greedy degree and the reward decrement value parameters of the deep reinforcement learning, and respectively expressing the learning rate, the greedy degree and the reward decrement value of the deep reinforcement learning by eta, epsilon and gamma.
The learning rate, greedy degree, reward decrement value and other parameters of the deep reinforcement learning are set and expressed by eta, epsilon and gamma respectively; all three parameters are no greater than 1.0. The optimal parameters obtained from experiments are eta = 0.3, epsilon = 0.5 and gamma = 0.8, which are set as the defaults; the user can modify the relevant parameters according to the actual situation. The parameters set here are the usual initial parameters, and the Q function used later relies on them.
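The default values stated above can be captured directly in code; this is a trivial sketch and the constant names are assumptions:

    # Deep-reinforcement-learning parameters (defaults from the description);
    # eta = learning rate, epsilon = greedy degree, gamma = reward decrement value.
    ETA, EPSILON, GAMMA = 0.3, 0.5, 0.8
    assert all(p <= 1.0 for p in (ETA, EPSILON, GAMMA))  # all three are at most 1.0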
And S5, calculating the optimized scheduling scheme of the cascade dam by deep reinforcement learning.
FIG. 2 is a flowchart illustrating step S5 of the optimal scheduling method for a cascade dam according to an exemplary embodiment. Referring to FIG. 2, step S5 specifically includes the following steps:
step S51, a playback memory, a prediction network and a learning network are constructed and initialized, network parameters are set to be the same, and functions Q and Q' represent prediction functions corresponding to two neural networks.
And constructing and initializing a playback memory D, wherein the size of the playback memory D can be set to 1000, and the playback memory D is used for recording the latest 1000 suboptimal scheduling results. The scheduling algorithm comprises two deep learning neural networks and a playback memory, wherein one is a prediction network Target _ Q for predicting the profit of the scheduling behavior and is used for representing the expected profit of the scheduling behavior in a scheduling state, and the other is a learning network Main _ Net for performing online learning of the scheduling scheme, online training is performed in the playback memory D according to scheduling historical data, and parameters are periodically copied into the prediction network Target _ Q. The Target _ Q network and the Main _ Net network are initialized and the network structure and the grid parameters are set to be the same, for example, the Target _ Q network and the Main _ Net network can be set to be a full-connection BP neural network with 4 layers. Setting maximum number of search rounds M =100000, setting reinforcement learning network updateThe period C =100 times, and the optimal scheduling scheme s is set best Empty, bestP = - ∞, initial T =1. And performing online training in a playback memory according to scheduling historical data, and periodically copying the parameters into a prediction network Target _ Q, wherein prediction functions corresponding to the two neural networks are Q and Q' respectively. Playback of data stored in memory D as<s j ,a j ,r j ,s j+1 >Respectively representing the current running state of the dam group, the scheduling behavior of the dam group in the jth period, and the scheduling behavior a of the dam group in the jth period j The current gain obtained and the next operating state of the dam bank.
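A minimal sketch of step S51 follows, assuming PyTorch, a hidden width of 64 and a concatenated state-plus-action code as the network input; the layer widths, input sizes and helper names are assumptions, while the 4-layer fully connected structure, the Main_Net/Target_Q pairing and the playback memory size of 1000 follow the description:

    # Minimal sketch of step S51 (assumptions: PyTorch, hidden width 64,
    # state and action codes concatenated into one input vector).
    import random
    from collections import deque
    import torch
    import torch.nn as nn

    def build_q_network(input_dim: int, hidden: int = 64) -> nn.Module:
        # 4-layer fully connected BP network, as suggested in the description.
        return nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                 # expected benefit of (state, action)
        )

    input_dim = 16 + 4                            # size of phi(s_t) plus action code (assumed)
    main_net = build_q_network(input_dim)         # learning network Main_Net (Q')
    target_q = build_q_network(input_dim)         # prediction network Target_Q (Q)
    target_q.load_state_dict(main_net.state_dict())  # same initial parameters

    replay_memory = deque(maxlen=1000)            # playback memory D, latest 1000 tuples

    def store(transition):                        # transition = (s_t, a_t, r_t, s_{t+1})
        replay_memory.append(transition)

    def sample(batch_size=64):
        return random.sample(replay_memory, min(batch_size, len(replay_memory)))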
Step S52, initializing t = 0 and initializing the state of the dam group to the all-zero vector s_0.
At the start of each round of searching for the optimal scheduling scheme, t = 0 and the initial vector of the dam group is s_0 = [0, 0, ..., 0], i.e. the scheduling vector of every period is set to an m-dimensional all-zero vector. The scheduling scheme of the dam group is set to the all-zero vector s_0, and the current state is s_t.
And S53, exploring the scheduling scheme, acquiring scheduling scheme tuple related data of the dam group according to the environment state acquired in S2 and the constraint condition acquired in S3, and storing the acquired tuple in a playback memory.
Specifically, step S53 specifically includes:
step S531, the exploration scheduling scheme is that a feasible scheduling scheme of a dam group is randomly selected as a according to the probability epsilon t Or with a probability of 1-epsilon, the scheme is chosen such that the desired Target _ Q network selection is the largest, i.e. the desired maximum
Figure BDA0002182999950000101
The function phi (),
Figure BDA0002182999950000102
coding function of dam bank status and scheduling behavior, respectively, then phi(s) t ) Represents a state s t Corresponding code->
Figure BDA0002182999950000103
And coding corresponding to the scheduling behavior of the t-th period. Therefore, the corresponding input of the Target _ Q network and the Main _ Net network is the code phi(s) of the dam group state t ) And &>
Figure BDA0002182999950000104
The output adopts the scheduling scheme a in the state t The maximum expected profit value obtained.
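A sketch of the epsilon-greedy selection in step S531 follows; encode_state, encode_action and feasible_actions are hypothetical stand-ins for the coding functions phi and psi and for the constraint check, and target_q is the prediction network from the previous sketch:

    # Sketch of step S531 (assumptions: encode_state/encode_action play the role
    # of the coding functions, feasible_actions applies the constraint conditions).
    import random
    import torch

    def select_action(state, target_q, epsilon,
                      encode_state, encode_action, feasible_actions):
        actions = feasible_actions(state)         # dam-group schemes allowed by constraints
        if random.random() < epsilon:
            return random.choice(actions)         # explore: random feasible scheme
        # Exploit: scheme with the largest expected benefit under the Target_Q network.
        with torch.no_grad():
            scores = [
                target_q(torch.cat([encode_state(state), encode_action(a)])).item()
                for a in actions
            ]
        return actions[max(range(len(actions)), key=scores.__getitem__)]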
Step S532, according to the environment state and the constraint conditions, the benefit r_t in the current scheduling period is calculated as the immediate reward, the current running state of the dam group is updated to the new state s_{t+1}, and the resulting tuple <s_t, a_t, r_t, s_{t+1}> is stored in the playback memory D.
The tuples in the playback memory D are quadruples <s_t, a_t, r_t, s_{t+1}>, where s_t and s_{t+1} are respectively the initial states of the t-th and (t+1)-th scheduling periods of the dam group, a_t is the scheduling action performed in that period, and r_t is the benefit value of that scheduling action over that scheduling period.
And S54, randomly selecting a batch of tuples from the playback memory to generate training samples, and updating parameters in the learning network through a loss function.
Specifically, step S54 specifically includes:
in step S541, a batch of tuples are randomly selected from the playback memory to generate samples.
Step S542, training samples are selected with probability proportional to the absolute value of the difference between each sample and the prediction network's estimate.
A batch of tuples is randomly selected in the playback memory D to generate training samples. For a tuple <s_j, a_j, r_j, s_{j+1}>, the expected benefit y_j of state s_j is estimated with the function Q corresponding to the Target_Q network:

y_j = r_j + gamma * max_a Q(phi(s_{j+1}), psi(a); theta)
In step S543, the parameters of the learning network are updated according to the loss function.
The parameters theta' of the Main_Net network are then updated by a gradient step with learning rate eta on the loss function

L(theta') = (y_j - Q'(phi(s_j), psi(a_j); theta'))^2

where Q' is the benefit prediction function corresponding to the Main_Net network, and phi and psi are the coding functions of the scheduling state and the scheduling behavior.
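The target y_j and the loss-driven update of steps S541-S543 can be sketched as a standard DQN-style training step; the batch layout (already-encoded states and actions), the candidate_action_codes argument and the use of an Adam optimizer with learning rate eta are assumptions of the sketch:

    # Sketch of steps S541-S543: y_j = r_j + gamma * max_a Q(phi(s_{j+1}), psi(a); theta),
    # then a squared-error loss on the Main_Net prediction Q'(phi(s_j), psi(a_j); theta').
    import torch
    import torch.nn.functional as F

    def train_step(batch, main_net, target_q, optimizer, gamma, candidate_action_codes):
        """batch: list of (phi(s_j), psi(a_j), r_j, phi(s_{j+1})) entries."""
        losses = []
        for s_code, a_code, r, s_next_code in batch:
            with torch.no_grad():
                next_q = max(
                    target_q(torch.cat([s_next_code, ac])).item()
                    for ac in candidate_action_codes   # feasible schemes in the next state
                )
                y_j = r + gamma * next_q               # target from the Target_Q network
            q_pred = main_net(torch.cat([s_code, a_code]))
            losses.append(F.mse_loss(q_pred, torch.tensor([y_j])))
        loss = torch.stack(losses).mean()
        optimizer.zero_grad()
        loss.backward()                                # gradient step on theta' of Main_Net
        optimizer.step()
        return loss.item()

    # Example wiring (assumed): optimizer = torch.optim.Adam(main_net.parameters(), lr=ETA)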
And step S55, updating the parameters in the learning network into the prediction network when the times of exploring the scheduling scheme reach a specific number.
In step S551, if the value of the variable t is less than or equal to the length period L of the scheduling scheme, the variable t is incremented, i.e. t = t +1, and the process returns to step S53 to continue.
In step S552, if the profit value of the current scheduling scheme is greater than the best profit value bestP, the best profit value bestP is updated to the profit value of the scheduling scheme, and the best scheduling scheme is updated.
In step S553, a loop function is set, and if the loop variable of the loop function satisfies the first condition, the parameters of the learning network are updated into the parameters of the prediction network. The loop counter is set to T = T + 1, and if the first condition (T mod C equals zero) is satisfied, the parameters theta' of the learning network Main_Net are updated into the network parameters theta of the prediction network Target_Q, where C is the update period of the reinforcement learning network.
Step S554, if the loop variable satisfies the second condition, jump to step S52 and continue execution; if the loop variable does not satisfy the second condition, exit the loop and carry out the next operation. That is, if the second condition is satisfied, i.e. T is less than the maximum number of exploration rounds M, jump to step S52 and continue; if T is not less than the maximum number of exploration rounds M, exit the loop and carry out the next operation.
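Taken together, steps S52-S55 form the search loop sketched below; M = 100000, C = 100 and L = 15 follow the values given above, main_net and target_q refer to the earlier sketch, and initial_all_zero_state, explore_period, train_from_replay and scheme_benefit are hypothetical helpers standing in for steps S52, S53, S54 and the benefit evaluation:

    # Sketch of the outer search loop of step S5 (all helper names are assumptions).
    def search_optimal_scheme(main_net, target_q, initial_all_zero_state, explore_period,
                              train_from_replay, scheme_benefit,
                              m_rounds=100_000, c_update=100, l_periods=15):
        best_scheme, best_profit = None, float("-inf")
        T = 1
        while T < m_rounds:                      # second condition: T less than M keeps looping
            state = initial_all_zero_state()     # step S52: t = 0, all-zero dam-group state
            scheme = []
            for t in range(l_periods):           # step S551: advance t through the L periods
                action, reward, state = explore_period(state)   # steps S531-S532
                scheme.append(action)
                train_from_replay()              # step S54: sample a batch, update Main_Net
            profit = scheme_benefit(scheme)
            if profit > best_profit:             # step S552: remember the best scheme found
                best_profit, best_scheme = profit, scheme
            T += 1                               # step S553 advances the round counter T
            if T % c_update == 0:                # first condition: T mod C == 0 -> sync networks
                target_q.load_state_dict(main_net.state_dict())
        return best_scheme, best_profit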
And S6, generating and outputting the optimized scheduling scheme of the cascade dam.
Through the above method, the parameters theta' of the learning network Main_Net are updated into the parameters theta of the prediction network Target_Q; using the prediction network Target_Q at that moment, the optimized scheduling scheme of the cascade dam is generated and output.
The invention uses a special neural network training procedure: instead of training on all previously updated Q(s, a) values at every training step, a fixed-size training data pool (a queue) is used; during each exploration the updated Q values are randomly inserted into the queue, and after each exploration a random batch of batch_size (generally 64) entries is taken as training data to update the neural network. In addition, a double-network structure is used, i.e. the training network is separated from the evaluation network, so that the network used for evaluation is less affected by correlated data.
The invention abstracts the scheduling optimization problem of the dam group as finding, from the initial scheduling scheme s_0 (the state without any scheduling action), a path to the optimal scheduling scheme (optimal scheduling state) s_best. The current running state of the dam group is denoted s_t, because the current state of the basin is determined and its state (water level, silt accumulation, pollutant content, etc.) is related only to the scheduling behavior over the first t-1 periods; the scheduling action of the dam group in the t-th period is denoted a_t; the scheduling revenue function r_t = f(s_t, a_t) is the benefit obtained by the dam group from scheduling action a_t in period t, with benefit values in [-∞, +∞]; the specific revenue function is related to factors such as agricultural production, industrial activities, market conditions and weather conditions associated with the dam group during that period.
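The concrete revenue function is application-specific; the following is only an illustration of its signature and of the constraint handling stated above (a scheme that violates the constraints receives -∞), with every helper name being a hypothetical placeholder:

    # Illustrative-only sketch of the scheduling revenue function r_t = f(s_t, a_t);
    # the split into generation, irrigation and pollution terms is an assumption.
    import math

    def schedule_revenue(state, action, env, config, helpers) -> float:
        """helpers: object bundling hypothetical constraint and benefit functions."""
        if helpers.violates_constraints(state, action, config):
            return -math.inf                     # violating schemes get -inf, as stated above
        return (helpers.generation_income(state, action, env)    # power at the on-grid price
                + helpers.irrigation_income(state, action, env)  # crop value of releases
                - helpers.pollution_penalty(state, action, env)) # cost of pollutant build-up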
Besides the optimal scheduling method for the cascade dam, the invention also provides an optimal scheduling system for the cascade dam. As shown in fig. 3, the present system includes a history data acquisition unit 10, a real-time data acquisition unit 20, a schedule status receiving unit 30, a deep learning setting unit 40, a schedule scheme calculation unit 50, and a schedule scheme output unit 60.
And the historical data acquisition unit 10 is used for acquiring the historical data of each dam basic characteristic, the static characteristic value of the dam and the river hydrology.
And the real-time data acquisition unit 20 is used for acquiring in real time the environment state related to the scheduling benefit of the cascade dam in the drainage basin.
A scheduling status receiving unit 30, configured to receive the scheduling parameter and the constraint condition set by the user.
The deep learning setting unit 40 is configured to set a learning rate, a greedy degree, and a reward decrement value parameter of the deep reinforcement learning, and represent the learning rate, the greedy degree, and the reward decrement value of the deep reinforcement learning by η, epsilon, and gamma, respectively.
And the scheduling scheme calculating unit 50 is used for calculating the optimized scheduling scheme of the cascade dam by deep reinforcement learning.
And the scheduling scheme output unit 60 generates and outputs an optimized scheduling scheme of the step dam.
The relevant units provided in the system are used for executing the relevant instructions in the above described step dam optimization scheduling method, and are not described herein again because they have been described in detail above.
Meanwhile, the invention also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the steps in the method are realized when the computer program is executed by a processor. Since the steps of the method have been described in detail, they are not described in detail herein.
In summary, the above description is only a detailed description of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. In practical applications, a person skilled in the art can make several modifications to the technical solution. Any modification, equivalent replacement, partial application, etc. made on the basis of the principles set forth in the present invention shall be included in the scope of protection of the present invention.

Claims (5)

1. An optimal scheduling method for a cascade dam, characterized by comprising the following steps:
s1, acquiring basic characteristics of each dam, static characteristic values of the dams and historical data of river hydrology;
the basic characteristics of the dams comprise the storage capacity, dead water level, highest water level and silt deposit amount of each dam; the static characteristic values of the dams comprise the upstream-downstream relations between the dams, the highest water level line and the silt characteristics; the historical data of river hydrology comprise the annual average runoff, the normal dam water consumption and the average sediment transport;
s2, acquiring an environment state related to the scheduling benefits of the cascade dam in the drainage basin in real time;
the environment state comprises the downstream agricultural planting type of each dam, the cultivation area of each type of crops, the irrigation demand, the future price of various crops, the price of chemical fertilizers on the market, the price of power generation and on-line electricity, the discharge amount of industrial pollutants, meteorological statistical information during dispatching, the running water level of each dam and warehousing flow data;
s3, receiving scheduling parameters and constraint conditions set by a user;
the scheduling parameters comprise a time span and a minimum scheduling period of a scheduling scheme; the constraint conditions comprise the highest water level of each dam, the maximum discharge flow, the maximum silt deposition amount, the environmental requirements of each dam and the highest content limit value of various pollutants;
s4, setting a learning rate, a greedy degree and an incentive decrement value parameter of the deep reinforcement learning, and respectively expressing the learning rate, the greedy degree and the incentive decrement value of the deep reinforcement learning by eta, epsilon and gamma;
s5, calculating an optimized scheduling scheme of the cascade dam by deep reinforcement learning;
s51, constructing and initializing a playback memory, initializing a prediction network and a learning network, and setting network parameters thereof as the same parameters, wherein prediction functions corresponding to the prediction network and the learning network are Q and Q' respectively;
s52, initializing t = 0, and initializing the state of the dam group to the all-zero vector s_0;

s53, exploring a scheduling scheme, obtaining the scheduling-scheme tuple data of the dam group according to the environment state obtained in S2 and the constraint conditions obtained in S3, and storing the obtained tuple in the playback memory;

s531, with probability epsilon, randomly selecting a feasible dam-group scheduling scheme as a_t, or, with probability 1-epsilon, selecting the scheme a_t for which the learning network predicts the largest expected benefit;

s532, calculating the benefit r_t in the current scheduling period according to the environment state and the constraint conditions as the immediate reward, updating the current running state of the dam group to the new state s_{t+1}, and storing the resulting tuple <s_t, a_t, r_t, s_{t+1}> in the playback memory;
s54, randomly selecting a batch of tuples from the playback memory to generate training samples, and updating parameters in the learning network through a loss function;
s55, when the times of exploring the scheduling scheme reach specific times, updating the parameters in the learning network into the prediction network;
and S6, generating and outputting an optimized scheduling scheme of the cascade dam.
2. The optimized scheduling method of a cascade dam as claimed in claim 1, wherein the step S54 specifically comprises:
s541, randomly selecting a batch of tuples in the playback memory to generate samples;
s542, selecting a training sample according to the probability of the absolute value of the difference between the sample and the prediction network;
and S543, updating the parameters in the learning network according to the loss function.
3. The optimized dispatching method for cascade dams of claim 1, wherein the step of S55 comprises:
s551, if the value of the variable is less than or equal to the length period of the scheduling scheme, the variable is automatically increased, and the step S53 is returned to continue to be executed;
s552, if the profit value of the scheduling scheme is greater than the optimal profit value, updating the optimal profit value to the profit value of the scheduling scheme, and updating the optimal scheduling scheme;
s553, setting a circulation function, and if the circulation variable of the circulation function meets a first condition, updating the parameters of the learning network into the parameters of the prediction network;
s554, if the loop variable satisfies the second condition, jumping to step S52 to continue executing; and if the circulation variable does not meet the second condition, jumping out of circulation and carrying out the next step of operation.
4. A cascade dam optimal dispatch system based on the method of any one of claims 1-3, characterized in that the system comprises the following units:
the historical data acquisition unit is used for acquiring the basic characteristics of each dam, the static characteristic value of each dam and the historical data of river hydrology;
the real-time data acquisition unit is used for acquiring the environmental state of the cascade dam in the drainage basin related to the scheduling income in real time;
a scheduling state receiving unit, configured to receive a scheduling parameter and a constraint condition set by a user;
a deep learning setting unit for setting a learning rate, a greedy degree and a reward decrement value parameter of the deep reinforcement learning, wherein the learning rate, the greedy degree and the reward decrement value of the deep reinforcement learning are respectively expressed by eta, epsilon and gamma;
the scheduling scheme calculating unit is used for calculating the optimized scheduling scheme of the cascade dam by deep reinforcement learning;

and the scheduling scheme output unit is used for generating and outputting the optimized scheduling scheme of the cascade dam.
5. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
CN201910803576.4A 2019-08-28 2019-08-28 Optimal scheduling method and system for cascade dam and computer readable storage medium Active CN110533244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910803576.4A CN110533244B (en) 2019-08-28 2019-08-28 Optimal scheduling method and system for cascade dam and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910803576.4A CN110533244B (en) 2019-08-28 2019-08-28 Optimal scheduling method and system for cascade dam and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110533244A CN110533244A (en) 2019-12-03
CN110533244B true CN110533244B (en) 2023-04-18

Family

ID=68664901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910803576.4A Active CN110533244B (en) 2019-08-28 2019-08-28 Optimal scheduling method and system for cascade dam and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110533244B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219236B (en) * 2021-11-29 2022-10-04 长江三峡通航管理局 Cascaded hub navigation joint scheduling method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3515038A1 (en) * 2018-01-19 2019-07-24 General Electric Company Autonomous reconfigurable virtual sensing system for cyber-attack neutralization

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102258990B (en) * 2011-05-17 2012-10-10 陕西科技大学 Method for preparing light sewage treatment material
US20130317957A1 (en) * 2012-05-25 2013-11-28 Teknon Corporation DBA Symmetry Software, Inc. Location Based Determination of Payroll Tax Withholding
CN105225017B (en) * 2015-10-30 2019-02-26 南京南瑞集团公司 A kind of GROUP OF HYDROPOWER STATIONS Short-term Optimal Operation method of multi-Agent
CN105869070B (en) * 2016-04-06 2020-09-11 大连理工大学 Cooperative optimization scheduling method for balance of benefits of cross-basin cascade hydropower station group
US10929743B2 (en) * 2016-09-27 2021-02-23 Disney Enterprises, Inc. Learning to schedule control fragments for physics-based character simulation and robots using deep Q-learning
CN106485366A (en) * 2016-10-31 2017-03-08 武汉大学 A kind of complexity Cascade Reservoirs retaining phase Optimization Scheduling
CN106951985B (en) * 2017-03-06 2021-06-25 河海大学 Multi-objective optimal scheduling method for cascade reservoir based on improved artificial bee colony algorithm
WO2019118460A1 (en) * 2017-12-11 2019-06-20 The Texas A&M University System Irrigation system control with predictive water balance capabilities
CN108647829A (en) * 2018-05-16 2018-10-12 河海大学 A kind of Hydropower Stations combined dispatching Rules extraction method based on random forest
CN108966352B (en) * 2018-07-06 2019-09-27 北京邮电大学 Dynamic beam dispatching method based on depth enhancing study
CN109347149B (en) * 2018-09-20 2022-04-22 国网河南省电力公司电力科学研究院 Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3515038A1 (en) * 2018-01-19 2019-07-24 General Electric Company Autonomous reconfigurable virtual sensing system for cyber-attack neutralization

Also Published As

Publication number Publication date
CN110533244A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
Baillieul et al. Encyclopedia of systems and control
Hınçal et al. Optimization of multireservoir systems by genetic algorithm
Wang et al. An integrated power load point-interval forecasting system based on information entropy and multi-objective optimization
Nedić Distributed optimization
CN106529732A (en) Carbon emission efficiency prediction method based on neural network and random frontier analysis
Pannocchia Distributed model predictive control
CN110533244B (en) Optimal scheduling method and system for cascade dam and computer readable storage medium
CN117744501B (en) Water network system regulation node optimal scheduling and decision-making method considering ecological flow
Choi et al. Developing optimal reservoir rule curve for hydropower reservoir with an add-on water supply function using improved grey wolf optimizer
Kumar et al. Environmentally sound short-term hydrothermal generation scheduling using intensified water cycle approach
Sørensen Dynamic positioning control systems for ships and underwater vehicles
Lafortune Diagnosis of discrete event systems
CN117494861A (en) Water resource optimal allocation method for coordinating city and county two-stage water resource utilization targets
Tadokoro Disaster response robot
Jothiprakash et al. Comparison of policies derived from stochastic dynamic programming and genetic algorithm models
Castañón Dynamic noncooperative games
Bar-Shalom et al. Data association
Kawan Data rate of nonlinear control systems and feedback entropy
Shim Disturbance observers
Jeong et al. Implementation of simplified sequential stochastic model predictive control for operation of hydropower system under uncertainty
Liu Machine learning for wind power prediction
Ravazzi et al. Dynamical social networks
Sela et al. Distributed sensing for monitoring water distribution systems
Gokayaz et al. From Probabilistic Seasonal Streamflow Forecasts to Optimal Reservoir Operations: A Stochastic Programming Approach
Cantoni et al. Demand-driven automatic control of irrigation channels

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant