CN110533244B - Optimal scheduling method and system for cascade dam and computer readable storage medium - Google Patents

Optimal scheduling method and system for cascade dam and computer readable storage medium

Info

Publication number
CN110533244B
CN110533244B (application CN201910803576.4A)
Authority
CN
China
Prior art keywords
dam
scheduling
learning
scheduling scheme
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910803576.4A
Other languages
Chinese (zh)
Other versions
CN110533244A (en)
Inventor
钟将
杨昱睿
吕昱峰
常婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201910803576.4A priority Critical patent/CN110533244B/en
Publication of CN110533244A publication Critical patent/CN110533244A/en
Application granted granted Critical
Publication of CN110533244B publication Critical patent/CN110533244B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an optimal scheduling method and system for a cascade dam and a computer readable storage medium, belonging to the technical field of dam optimal scheduling and comprising the following steps: S1, acquiring the basic characteristics of each dam, the static characteristic values of the dams and historical data of river hydrology; S2, acquiring in real time the environment state related to the scheduling benefit of the cascade dam in the drainage basin; S3, receiving scheduling parameters and constraint conditions set by a user; S4, setting the learning rate, greedy degree and reward decrement value parameters of the deep reinforcement learning, expressed by eta, epsilon and gamma respectively; S5, calculating an optimized scheduling scheme of the cascade dam by deep reinforcement learning; and S6, generating and outputting the optimized scheduling scheme of the cascade dam. The method and system solve the technical problem that traditional optimization methods face a huge search space and have difficulty obtaining an optimized scheduling scheme.

Description

Optimal scheduling method and system for cascade dam and computer readable storage medium
Technical Field
The invention relates to the technical field of dam optimization scheduling, in particular to a method and a system for cascade dam optimization scheduling and a computer readable storage medium.
Background
The optimal utilization of water resources in a drainage basin has always been a difficult problem in dam optimal scheduling technology, and the cooperative optimal scheduling of multiple dams in a basin is the basis for realizing it. A multi-dam optimization scheduling scheme is very complex and must take many factors into account: not only hydrology, environmental constraints and industrial pollution discharge in the basin, but also agricultural irrigation, power generation demand, water supply demand, flood control and waterway transport requirements. Cooperative scheduling of multiple dams in a basin therefore has to reconcile several mutually contradictory scheduling objectives in order to achieve global optimization of the water resources.
In the prior art, deep reinforcement learning has become a powerful tool for optimized scheduling: an agent explores the benefits obtained under different environments with a certain learning mechanism and gradually forms an optimized scheduling model. An agent is a structure with a decision function, which can be a simple algorithm or a function represented by a neural network. However, there has been little research on control methods that schedule multiple dams in a basin by changing the running state of each dam according to the same scheduling period. Scheduling multiple dams in a basin usually involves a huge search space, from which existing optimization methods have difficulty obtaining a globally optimal scheduling scheme; a technical solution is therefore needed that solves this multi-dam scheduling problem and obtains a globally optimal scheduling scheme.
Disclosure of Invention
The invention aims to overcome the defects in the prior art. The optimal scheduling method and system for the cascade dam provided by the invention use deep reinforcement learning to search for a globally optimal scheduling scheme according to the running state of the dams and environmental factors. The method, the system and the computer readable storage medium provided by the invention can be used to obtain a globally optimal scheduling scheme.
In order to achieve the above purpose, the invention provides the following technical scheme:
In one aspect, the invention provides an optimal scheduling method for a cascade dam, which specifically comprises the following steps: S1, acquiring the basic characteristics of each dam, the static characteristic values of the dams and historical data of river hydrology; S2, acquiring in real time the environment state related to the scheduling benefit of the cascade dam in the drainage basin; S3, receiving scheduling parameters and constraint conditions set by a user; S4, setting the learning rate, greedy degree and reward decrement value parameters of the deep reinforcement learning, expressed by eta, epsilon and gamma respectively; S5, calculating an optimized scheduling scheme of the cascade dam by deep reinforcement learning; and S6, generating and outputting the optimized scheduling scheme of the cascade dam.
Further, step S5 specifically includes: S51, constructing and initializing a playback memory, a prediction network and a learning network, and setting their network parameters to the same values, wherein the prediction functions corresponding to the prediction network and the learning network are Q and Q' respectively; S52, initializing t = 0 and initializing the state of the dam group to the all-zero vector s_0; S53, exploring a scheduling scheme, obtaining the scheduling-scheme tuple data of the dam group according to the environment state obtained in S2 and the constraint conditions obtained in S3, and storing the obtained tuple in the playback memory; S54, randomly selecting a batch of tuples from the playback memory to generate training samples, and updating the parameters of the learning network through a loss function; and S55, when the number of explorations of the scheduling scheme reaches a specific number, updating the parameters of the learning network into the prediction network.
Further, step S53 specifically includes: S531, with probability epsilon, randomly selecting a feasible dam-group scheduling scheme as a_t, or, with probability 1-epsilon, selecting the scheme a_t for which the learning network predicts the largest expected benefit; S532, calculating the benefit r_t in the current scheduling period from the environment state and the constraint conditions as the immediate reward, updating the current running state of the dam group to the new state s_{t+1}, and storing the resulting tuple <s_t, a_t, r_t, s_{t+1}> in the playback memory.
Further, step S54 specifically includes: S541, randomly selecting a batch of tuples from the playback memory to generate samples; S542, selecting training samples with probability proportional to the absolute value of the difference between each sample's target value and the prediction network's output; S543, updating the parameters of the learning network according to the loss function.
Further, step S55 specifically includes: S551, if the value of the variable t is less than or equal to the length L of the scheduling scheme, incrementing the variable and returning to step S53 to continue execution; S552, if the benefit value of the scheduling scheme is greater than the best benefit value, updating the best benefit value to the benefit value of this scheduling scheme and updating the best scheduling scheme; S553, setting a loop function, and if the loop variable of the loop function satisfies a first condition, updating the parameters of the learning network into the parameters of the prediction network; S554, if the loop variable satisfies a second condition, jumping to step S52 to continue execution; if the loop variable does not satisfy the second condition, exiting the loop and carrying out the next operation.
Further, the basic characteristics of the dams comprise the storage capacity, dead water level, highest water level and silt deposit amount of each dam; the static characteristic values of the dams comprise the upstream-downstream relations between the dams, the highest water level line and the silt characteristics; the historical data of river hydrology comprise the annual average runoff, the normal dam water consumption and the average sediment transport.
Further, the environment state comprises the downstream agricultural planting type of each dam, the cultivated area of each type of crop, the irrigation demand, the futures prices of various crops, the market price of chemical fertilizers, the on-grid price of generated electricity, the discharge amount of industrial pollutants, meteorological statistics during the scheduling period, and the running water level and inflow data of each dam.
Further, the scheduling parameters include a time span and a minimum scheduling period of the scheduling scheme; the constraints include the maximum water level at each dam, maximum discharge flow, maximum silt deposit, environmental requirements at each dam, and maximum content limits for various pollutants.
In another aspect, the present invention further provides an optimized dispatching system for a step dam, including the following units:
the historical data acquisition unit is used for acquiring the basic characteristics of each dam, the static characteristic value of each dam and the historical data of river hydrology;
the real-time data acquisition unit is used for acquiring the environmental state of the cascade dam in the drainage basin related to the scheduling income in real time;
a scheduling state receiving unit, configured to receive a scheduling parameter and a constraint condition set by a user;
a deep learning setting unit for setting a learning rate, a greedy degree and a reward decrement value parameter of the deep reinforcement learning, wherein the learning rate, the greedy degree and the reward decrement value of the deep reinforcement learning are respectively expressed by eta, epsilon and gamma;
the scheduling scheme calculating unit is used for calculating the optimized scheduling scheme of the cascade dam by deep reinforcement learning;
and the scheduling scheme output unit is used for generating and outputting the optimized scheduling scheme of the cascade dam.
Meanwhile, the present invention also provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the steps in the method as described above.
Compared with the prior art, the invention has the beneficial effects that:
the method, the system and the computer readable storage medium provided by the invention are simple to operate, and introduce machine learning into the cascade dam scheduling model, thereby avoiding the technical problem that the traditional optimization method is difficult to obtain an optimized scheduling scheme due to huge search space. As the reinforcement learning better adopts heuristic strategies for finding different states, the search of the optimal scheduling scheme is accelerated, and the global optimal scheduling scheme or the approximate optimal scheduling scheme can be quickly converged on a huge solution space.
Drawings
FIG. 1 is a schematic flow chart of an optimized scheduling method for a cascade dam according to the present invention;
FIG. 2 is a schematic diagram illustrating the detailed flow of step S5 in the optimal scheduling method for a cascade dam according to the present invention;
FIG. 3 is a schematic structural diagram of an optimized dispatching system for a cascade dam according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.
In recent years, deep reinforcement learning has become a powerful tool for optimized scheduling: the benefits under different environments are explored with a certain learning mechanism, and an optimized scheduling model is gradually formed. Multi-dam scheduling in a drainage basin can adopt a synchronous scheduling control mode with a fixed period, i.e. each dam changes its running state according to the same scheduling period. The state of the next scheduling period in basin scheduling depends only on the initial state of the previous period and the scheduling action performed in that period, i.e. the state of the basin dams is a Markov process. Multi-dam scheduling in a basin usually involves a very large search space, so the problem is well suited to deep reinforcement learning methods for finding near-optimal solutions.
The invention discloses an optimal scheduling method and system for a cascade dam and a computer readable storage medium; the specific embodiments are as follows:
the existence of m dams in a flow domain requires coordinated scheduling, the length of a scheduling scheme is L periods, and the length of each period is any specified time length, and can be 1 day, several hours or even several minutes. If 4 dams exist in the preset flow domain and need to be cooperatively scheduled, the number of the dams needs to be cooperatively scheduled is m =4; if the length of the scheduling scheme is 15 cycles, the length cycle of the scheduling scheme is L =15; the length of each cycle is 1 day, i.e. a scheduling scheme is generated for the future 15 days.
It should be noted that the current running state of the dam group is recorded as s_t; the scheduling action of the dam group in the t-th period is recorded as a_t; the scheduling revenue function r_t = f(s_t, a_t) represents the benefit obtained by the dam group from performing scheduling action a_t in the t-th scheduling period, with benefit values in [-∞, +∞]; the specific scheduling revenue function is related to factors such as agricultural production, industrial activities, market conditions and meteorological conditions associated with the dam group. The playback memory (replay memory) of the learning process is recorded as D; the prediction network for predicting the benefit of scheduling behaviors is recorded as the Target_Q network, and the learning network that performs online learning of the scheduling scheme is recorded as the Main_Net network.
For convenience of description, each dam has only four types of basic scheduling actions: water storage, flood discharge, sand flushing and power generation, with action numbers 0, 1, 2 and 3 respectively. Since a power-generation scheduling action may involve starting several generator sets, the various possible generation modes of the generator sets can be discretized into multiple scheduling actions with different action numbers. Likewise, flood-discharge scheduling may correspond to different flow rates and can also be discretized into different scheduling actions with different numbers. The scheduling actions that each dam can perform can thus be numbered with integers. For convenience, the water-storage scheduling action number of every dam may be set to 0. Suppose A_i is the scheduling action set of the i-th dam; the union of the scheduling action sets of all dams is A, with A = A_1 ∪ A_2 ∪ ... ∪ A_4 = {0, 1, 2, 3}. The scheduling action of the dam group in the (t+1)-th scheduling period can then be expressed as a vector a_t = [act_1, act_2, ..., act_4], where act_i ∈ A.
If the dam group has n types of scheduling actions, the number of scheduling schemes selectable for a dam group formed by m dams in each period is m × n. Since the scheduling actions of the dams are the four actions of water storage, flood discharge, sand flushing and power generation, the number of scheduling action types of the dam group is n = 4, and the number of dams to be scheduled is m = 4, so the number of per-period scheduling schemes of the dam group consisting of 4 dams is 16. For combinations that violate the scheduling constraints, the benefit of that period and the final expected benefit are set to -∞.
The state vector of the dam group at period i is represented as s_i = [a_0, ..., a_k, ..., a_15], where element a_k represents the scheduling behavior of the dam group in the k-th scheduling period; the scheduling scheme of the dam group is therefore adjusted from the initial scheduling state until the action of the last scheduling period is selected, in order to find the optimal cooperative scheduling scheme.
The scheduling of the dam group in the basin adopts a synchronous scheduling mode, i.e. the states are adjusted with the same interval period; each scheduling behavior of a dam in the basin is represented by a unique number; the scheduling behavior of the dam group in a given period can be represented by a vector in which each element represents the scheduling behavior of the corresponding dam in that period; and a scheduling scheme represents the scheduling behavior over a certain number of periods as a sequentially concatenated vector, as sketched below.
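By way of illustration only, the numbering and vector representation described above can be sketched in Python; the dam count m = 4 and the scheme length L = 15 follow the example values in this description, while the constant and variable names are assumptions of the sketch:

    # Sketch of the scheduling-action encoding (illustrative names; action numbers
    # 0-3 for water storage, flood discharge, sand flushing and power generation).
    STORE, FLOOD_DISCHARGE, SAND_FLUSH, GENERATE = 0, 1, 2, 3
    M_DAMS = 4        # number of dams scheduled cooperatively (m)
    L_PERIODS = 15    # length of the scheduling scheme in periods (L)

    # Scheduling behaviour of the dam group in one period: one action per dam.
    a_t = [GENERATE, STORE, FLOOD_DISCHARGE, SAND_FLUSH]

    # A scheduling scheme over L periods, with undecided periods left as the
    # all-zero (water storage) vector, concatenated sequentially into one vector.
    scheme = [[STORE] * M_DAMS for _ in range(L_PERIODS)]
    scheme[0] = a_t
    flat_scheme = [act for period in scheme for act in period]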
FIG. 1 is a flowchart illustrating an optimal scheduling method for a cascade dam according to an exemplary embodiment. Referring to FIG. 1, the optimal scheduling method for a cascade dam of this embodiment includes the following steps:
s1, acquiring basic characteristics of each dam, static characteristic values of the dams and historical data of river hydrology.
The basic characteristics of the cascade dams in the basin comprise data such as the storage capacity, dead water level, highest water level and silt deposit amount of each dam; static characteristic values such as the upstream-downstream relations between dams, the highest water level line and the silt characteristics; and historical river hydrological data such as the annual average runoff, the normal dam water consumption and the average sediment transport.
And S2, acquiring the environmental state of the cascade dam in the drainage basin related to the scheduling income in real time.
The environment state comprises the downstream agricultural planting type of each dam, the cultivated area of each type of crop, the irrigation demand, the futures prices of various crops, the market price of chemical fertilizers, the on-grid price of generated electricity, the discharge amount of industrial pollutants, meteorological statistics during the scheduling period, the running water level of each dam, inflow data and the like. Data such as crop futures prices and meteorological statistics can be acquired in real time from relevant websites with web crawlers, the running water level and inflow data of each dam can be acquired from the dam's monitoring site, and the discharge data of various industrial pollutants can be acquired from the data exchange platform between environmental protection bureaus.
And S3, setting scheduling parameters and constraint conditions.
The scheduling parameters include a time span and a minimum scheduling period of the scheduling scheme, such as time span L =15 days, and the minimum scheduling period is 1 day; the constraint conditions comprise the highest water level, the maximum discharge flow, the maximum sediment deposition and the like of each dam; the constraint conditions also comprise the environmental requirements of each dam and the maximum content limit values of various pollutants, such as the environmental protection limit values of pollutants such as sulfate, chloride, total ammonia nitrogen, total phosphorus and the like in river water.
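Purely as a sketch of how the inputs of steps S1-S3 might be grouped in code, the records below use illustrative field names and types; none of these identifiers are defined by this description:

    # Illustrative grouping of the data collected in S1-S3 (field names are
    # assumptions for the sketch, not definitions from the patent).
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class EnvironmentState:            # acquired in real time (step S2)
        crop_areas: Dict[str, float]   # planting type -> cultivated area
        irrigation_demand: float
        crop_futures_prices: Dict[str, float]
        fertilizer_price: float
        on_grid_power_price: float
        pollutant_discharge: Dict[str, float]
        weather: Dict[str, float]
        dam_water_levels: List[float]
        inflow_rates: List[float]

    @dataclass
    class ScheduleConfig:              # set by the user (step S3)
        span_days: int = 15            # time span L of the scheduling scheme
        min_period_days: int = 1       # minimum scheduling period
        max_water_levels: List[float] = field(default_factory=list)
        max_discharge_flow: List[float] = field(default_factory=list)
        max_silt_deposit: List[float] = field(default_factory=list)
        pollutant_limits: Dict[str, float] = field(default_factory=dict)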
And S4, setting the learning rate, the greedy degree and the reward decrement value parameters of the deep reinforcement learning, and respectively expressing the learning rate, the greedy degree and the reward decrement value of the deep reinforcement learning by eta, epsilon and gamma.
The learning rate, greedy degree, reward decrement value and other parameters of the deep reinforcement learning are set and expressed by eta, epsilon and gamma respectively; all three parameters are no greater than 1.0. The optimal parameters obtained from experiments are eta = 0.3, epsilon = 0.5 and gamma = 0.8, which are set as the defaults; the user can modify the relevant parameters according to the actual situation. The parameters set here are the usual initial parameters, and the Q function used later relies on them.
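The default values stated above can be captured directly in code; this is a trivial sketch and the constant names are assumptions:

    # Deep-reinforcement-learning parameters (defaults from the description);
    # eta = learning rate, epsilon = greedy degree, gamma = reward decrement value.
    ETA, EPSILON, GAMMA = 0.3, 0.5, 0.8
    assert all(p <= 1.0 for p in (ETA, EPSILON, GAMMA))  # all three are at most 1.0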
And S5, calculating the optimized scheduling scheme of the cascade dam by deep reinforcement learning.
FIG. 2 is a flowchart illustrating step S5 of the optimal scheduling method for a cascade dam according to an exemplary embodiment. Referring to FIG. 2, step S5 specifically includes the following steps:
step S51, a playback memory, a prediction network and a learning network are constructed and initialized, network parameters are set to be the same, and functions Q and Q' represent prediction functions corresponding to two neural networks.
And constructing and initializing a playback memory D, wherein the size of the playback memory D can be set to 1000, and the playback memory D is used for recording the latest 1000 suboptimal scheduling results. The scheduling algorithm comprises two deep learning neural networks and a playback memory, wherein one is a prediction network Target _ Q for predicting the profit of the scheduling behavior and is used for representing the expected profit of the scheduling behavior in a scheduling state, and the other is a learning network Main _ Net for performing online learning of the scheduling scheme, online training is performed in the playback memory D according to scheduling historical data, and parameters are periodically copied into the prediction network Target _ Q. The Target _ Q network and the Main _ Net network are initialized and the network structure and the grid parameters are set to be the same, for example, the Target _ Q network and the Main _ Net network can be set to be a full-connection BP neural network with 4 layers. Setting maximum number of search rounds M =100000, setting reinforcement learning network updateThe period C =100 times, and the optimal scheduling scheme s is set best Empty, bestP = - ∞, initial T =1. And performing online training in a playback memory according to scheduling historical data, and periodically copying the parameters into a prediction network Target _ Q, wherein prediction functions corresponding to the two neural networks are Q and Q' respectively. Playback of data stored in memory D as<s j ,a j ,r j ,s j+1 >Respectively representing the current running state of the dam group, the scheduling behavior of the dam group in the jth period, and the scheduling behavior a of the dam group in the jth period j The current gain obtained and the next operating state of the dam bank.
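A minimal sketch of step S51 follows, assuming PyTorch, a hidden width of 64 and a concatenated state-plus-action code as the network input; the layer widths, input sizes and helper names are assumptions, while the 4-layer fully connected structure, the Main_Net/Target_Q pairing and the playback memory size of 1000 follow the description:

    # Minimal sketch of step S51 (assumptions: PyTorch, hidden width 64,
    # state and action codes concatenated into one input vector).
    import random
    from collections import deque
    import torch
    import torch.nn as nn

    def build_q_network(input_dim: int, hidden: int = 64) -> nn.Module:
        # 4-layer fully connected BP network, as suggested in the description.
        return nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                 # expected benefit of (state, action)
        )

    input_dim = 16 + 4                            # size of phi(s_t) plus action code (assumed)
    main_net = build_q_network(input_dim)         # learning network Main_Net (Q')
    target_q = build_q_network(input_dim)         # prediction network Target_Q (Q)
    target_q.load_state_dict(main_net.state_dict())  # same initial parameters

    replay_memory = deque(maxlen=1000)            # playback memory D, latest 1000 tuples

    def store(transition):                        # transition = (s_t, a_t, r_t, s_{t+1})
        replay_memory.append(transition)

    def sample(batch_size=64):
        return random.sample(replay_memory, min(batch_size, len(replay_memory)))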
Step S52, initializing t = 0 and initializing the state of the dam group to the all-zero vector s_0.
At the start of each round of searching for the optimal scheduling scheme, t = 0 and the initial vector of the dam group is s_0 = [0, 0, ..., 0], i.e. the scheduling vector of every period is set to an m-dimensional all-zero vector. The scheduling scheme of the dam group is set to the all-zero vector s_0, and the current state is s_t.
And S53, exploring the scheduling scheme, acquiring scheduling scheme tuple related data of the dam group according to the environment state acquired in S2 and the constraint condition acquired in S3, and storing the acquired tuple in a playback memory.
Specifically, step S53 specifically includes:
step S531, the exploration scheduling scheme is that a feasible scheduling scheme of a dam group is randomly selected as a according to the probability epsilon t Or with a probability of 1-epsilon, the scheme is chosen such that the desired Target _ Q network selection is the largest, i.e. the desired maximum
Figure BDA0002182999950000101
The function phi (),
Figure BDA0002182999950000102
coding function of dam bank status and scheduling behavior, respectively, then phi(s) t ) Represents a state s t Corresponding code->
Figure BDA0002182999950000103
And coding corresponding to the scheduling behavior of the t-th period. Therefore, the corresponding input of the Target _ Q network and the Main _ Net network is the code phi(s) of the dam group state t ) And &>
Figure BDA0002182999950000104
The output adopts the scheduling scheme a in the state t The maximum expected profit value obtained.
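A sketch of the epsilon-greedy selection in step S531 follows; encode_state, encode_action and feasible_actions are hypothetical stand-ins for the coding functions phi and psi and for the constraint check, and target_q is the prediction network from the previous sketch:

    # Sketch of step S531 (assumptions: encode_state/encode_action play the role
    # of the coding functions, feasible_actions applies the constraint conditions).
    import random
    import torch

    def select_action(state, target_q, epsilon,
                      encode_state, encode_action, feasible_actions):
        actions = feasible_actions(state)         # dam-group schemes allowed by constraints
        if random.random() < epsilon:
            return random.choice(actions)         # explore: random feasible scheme
        # Exploit: scheme with the largest expected benefit under the Target_Q network.
        with torch.no_grad():
            scores = [
                target_q(torch.cat([encode_state(state), encode_action(a)])).item()
                for a in actions
            ]
        return actions[max(range(len(actions)), key=scores.__getitem__)]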
Step S532, according to the environment state and the constraint conditions, the benefit r_t in the current scheduling period is calculated as the immediate reward, the current running state of the dam group is updated to the new state s_{t+1}, and the resulting tuple <s_t, a_t, r_t, s_{t+1}> is stored in the playback memory D.
The tuples in the playback memory D are quadruples <s_t, a_t, r_t, s_{t+1}>, where s_t and s_{t+1} are respectively the initial states of the t-th and (t+1)-th scheduling periods of the dam group, a_t is the scheduling action performed in that period, and r_t is the benefit value of that scheduling action over that scheduling period.
And S54, randomly selecting a batch of tuples from the playback memory to generate training samples, and updating parameters in the learning network through a loss function.
Specifically, step S54 specifically includes:
in step S541, a batch of tuples are randomly selected from the playback memory to generate samples.
Step S542, training samples are selected with probability proportional to the absolute value of the difference between each sample and the prediction network's estimate.
A batch of tuples is randomly selected in the playback memory D to generate training samples. For a tuple <s_j, a_j, r_j, s_{j+1}>, the expected benefit y_j of state s_j is estimated with the function Q corresponding to the Target_Q network:

y_j = r_j + gamma * max_a Q(phi(s_{j+1}), psi(a); theta)
In step S543, the parameters of the learning network are updated according to the loss function.
The parameters theta' of the Main_Net network are then updated by a gradient step with learning rate eta on the loss function

L(theta') = (y_j - Q'(phi(s_j), psi(a_j); theta'))^2

where Q' is the benefit prediction function corresponding to the Main_Net network, and phi and psi are the coding functions of the scheduling state and the scheduling behavior.
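The target y_j and the loss-driven update of steps S541-S543 can be sketched as a standard DQN-style training step; the batch layout (already-encoded states and actions), the candidate_action_codes argument and the use of an Adam optimizer with learning rate eta are assumptions of the sketch:

    # Sketch of steps S541-S543: y_j = r_j + gamma * max_a Q(phi(s_{j+1}), psi(a); theta),
    # then a squared-error loss on the Main_Net prediction Q'(phi(s_j), psi(a_j); theta').
    import torch
    import torch.nn.functional as F

    def train_step(batch, main_net, target_q, optimizer, gamma, candidate_action_codes):
        """batch: list of (phi(s_j), psi(a_j), r_j, phi(s_{j+1})) entries."""
        losses = []
        for s_code, a_code, r, s_next_code in batch:
            with torch.no_grad():
                next_q = max(
                    target_q(torch.cat([s_next_code, ac])).item()
                    for ac in candidate_action_codes   # feasible schemes in the next state
                )
                y_j = r + gamma * next_q               # target from the Target_Q network
            q_pred = main_net(torch.cat([s_code, a_code]))
            losses.append(F.mse_loss(q_pred, torch.tensor([y_j])))
        loss = torch.stack(losses).mean()
        optimizer.zero_grad()
        loss.backward()                                # gradient step on theta' of Main_Net
        optimizer.step()
        return loss.item()

    # Example wiring (assumed): optimizer = torch.optim.Adam(main_net.parameters(), lr=ETA)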
And step S55, updating the parameters in the learning network into the prediction network when the times of exploring the scheduling scheme reach a specific number.
In step S551, if the value of the variable t is less than or equal to the length period L of the scheduling scheme, the variable t is incremented, i.e. t = t +1, and the process returns to step S53 to continue.
In step S552, if the profit value of the current scheduling scheme is greater than the best profit value bestP, the best profit value bestP is updated to the profit value of the scheduling scheme, and the best scheduling scheme is updated.
In step S553, a loop function is set, and if the loop variable of the loop function satisfies the first condition, the parameters of the learning network are updated into the parameters of the prediction network. The loop counter is set to T = T + 1, and if the first condition (T mod C equals zero) is satisfied, the parameters theta' of the learning network Main_Net are updated into the network parameters theta of the prediction network Target_Q, where C is the update period of the reinforcement learning network.
Step S554, if the loop variable satisfies the second condition, jump to step S52 and continue execution; if the loop variable does not satisfy the second condition, exit the loop and carry out the next operation. That is, if the second condition is satisfied, i.e. T is less than the maximum number of exploration rounds M, jump to step S52 and continue; if T is not less than the maximum number of exploration rounds M, exit the loop and carry out the next operation.
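Taken together, steps S52-S55 form the search loop sketched below; M = 100000, C = 100 and L = 15 follow the values given above, main_net and target_q refer to the earlier sketch, and initial_all_zero_state, explore_period, train_from_replay and scheme_benefit are hypothetical helpers standing in for steps S52, S53, S54 and the benefit evaluation:

    # Sketch of the outer search loop of step S5 (all helper names are assumptions).
    def search_optimal_scheme(main_net, target_q, initial_all_zero_state, explore_period,
                              train_from_replay, scheme_benefit,
                              m_rounds=100_000, c_update=100, l_periods=15):
        best_scheme, best_profit = None, float("-inf")
        T = 1
        while T < m_rounds:                      # second condition: T less than M keeps looping
            state = initial_all_zero_state()     # step S52: t = 0, all-zero dam-group state
            scheme = []
            for t in range(l_periods):           # step S551: advance t through the L periods
                action, reward, state = explore_period(state)   # steps S531-S532
                scheme.append(action)
                train_from_replay()              # step S54: sample a batch, update Main_Net
            profit = scheme_benefit(scheme)
            if profit > best_profit:             # step S552: remember the best scheme found
                best_profit, best_scheme = profit, scheme
            T += 1                               # step S553 advances the round counter T
            if T % c_update == 0:                # first condition: T mod C == 0 -> sync networks
                target_q.load_state_dict(main_net.state_dict())
        return best_scheme, best_profit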
And S6, generating and outputting the optimized scheduling scheme of the cascade dam.
Through the above method, the parameters theta' of the learning network Main_Net are updated into the parameters theta of the prediction network Target_Q; using the prediction network Target_Q at that moment, the optimized scheduling scheme of the cascade dam is generated and output.
The invention uses a special neural network training procedure: instead of training on all previously updated Q(s, a) values at every training step, a fixed-size training data pool (a queue) is used; during each exploration the updated Q values are randomly inserted into the queue, and after each exploration a random batch of batch_size (generally 64) entries is taken as training data to update the neural network. In addition, a double-network structure is used, i.e. the training network is separated from the evaluation network, so that the network used for evaluation is less affected by correlated data.
The invention abstracts the scheduling optimization problem of the dam group as finding, from the initial scheduling scheme s_0 (the state without any scheduling action), a path to the optimal scheduling scheme (optimal scheduling state) s_best. The current running state of the dam group is denoted s_t, because the current state of the basin is determined and its state (water level, silt accumulation, pollutant content, etc.) is related only to the scheduling behavior over the first t-1 periods; the scheduling action of the dam group in the t-th period is denoted a_t; the scheduling revenue function r_t = f(s_t, a_t) is the benefit obtained by the dam group from scheduling action a_t in period t, with benefit values in [-∞, +∞]; the specific revenue function is related to factors such as agricultural production, industrial activities, market conditions and weather conditions associated with the dam group during that period.
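The concrete revenue function is application-specific; the following is only an illustration of its signature and of the constraint handling stated above (a scheme that violates the constraints receives -∞), with every helper name being a hypothetical placeholder:

    # Illustrative-only sketch of the scheduling revenue function r_t = f(s_t, a_t);
    # the split into generation, irrigation and pollution terms is an assumption.
    import math

    def schedule_revenue(state, action, env, config, helpers) -> float:
        """helpers: object bundling hypothetical constraint and benefit functions."""
        if helpers.violates_constraints(state, action, config):
            return -math.inf                     # violating schemes get -inf, as stated above
        return (helpers.generation_income(state, action, env)    # power at the on-grid price
                + helpers.irrigation_income(state, action, env)  # crop value of releases
                - helpers.pollution_penalty(state, action, env)) # cost of pollutant build-up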
Besides the optimal scheduling method for the cascade dam, the invention also provides an optimal scheduling system for the cascade dam. As shown in fig. 3, the present system includes a history data acquisition unit 10, a real-time data acquisition unit 20, a schedule status receiving unit 30, a deep learning setting unit 40, a schedule scheme calculation unit 50, and a schedule scheme output unit 60.
And the historical data acquisition unit 10 is used for acquiring the historical data of each dam basic characteristic, the static characteristic value of the dam and the river hydrology.
And the real-time data acquisition unit 20 is used for acquiring in real time the environment state related to the scheduling benefit of the cascade dam in the drainage basin.
A scheduling status receiving unit 30, configured to receive the scheduling parameter and the constraint condition set by the user.
The deep learning setting unit 40 is configured to set a learning rate, a greedy degree, and a reward decrement value parameter of the deep reinforcement learning, and represent the learning rate, the greedy degree, and the reward decrement value of the deep reinforcement learning by η, epsilon, and gamma, respectively.
And the scheduling scheme calculating unit 50 is used for calculating the optimized scheduling scheme of the cascade dam by deep reinforcement learning.
And the scheduling scheme output unit 60 generates and outputs an optimized scheduling scheme of the step dam.
The relevant units provided in the system are used for executing the relevant instructions in the above described step dam optimization scheduling method, and are not described herein again because they have been described in detail above.
Meanwhile, the invention also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the steps in the method are realized when the computer program is executed by a processor. Since the steps of the method have been described in detail, they are not described in detail herein.
In summary, the above description is only a detailed description of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. In practical applications, a person skilled in the art can make several modifications to the technical solution. Any modification, equivalent replacement, partial application, etc. made on the basis of the principles set forth in the present invention shall be included in the scope of protection of the present invention.

Claims (5)

1. An optimal scheduling method for a cascade dam, characterized by comprising the following steps:
s1, acquiring basic characteristics of each dam, static characteristic values of the dams and historical data of river hydrology;
the basic characteristics of the dams comprise the storage capacity, dead water level, highest water level and silt deposit amount of each dam; the static characteristic values of the dams comprise the upstream-downstream relations between the dams, the highest water level line and the silt characteristics; the historical data of river hydrology comprise the annual average runoff, the normal dam water consumption and the average sediment transport;
s2, acquiring an environment state related to the scheduling benefits of the cascade dam in the drainage basin in real time;
the environment state comprises the downstream agricultural planting type of each dam, the cultivation area of each type of crops, the irrigation demand, the future price of various crops, the price of chemical fertilizers on the market, the price of power generation and on-line electricity, the discharge amount of industrial pollutants, meteorological statistical information during dispatching, the running water level of each dam and warehousing flow data;
s3, receiving scheduling parameters and constraint conditions set by a user;
the scheduling parameters comprise a time span and a minimum scheduling period of a scheduling scheme; the constraint conditions comprise the highest water level of each dam, the maximum discharge flow, the maximum silt deposition amount, the environmental requirements of each dam and the highest content limit value of various pollutants;
s4, setting a learning rate, a greedy degree and an incentive decrement value parameter of the deep reinforcement learning, and respectively expressing the learning rate, the greedy degree and the incentive decrement value of the deep reinforcement learning by eta, epsilon and gamma;
s5, calculating an optimized scheduling scheme of the cascade dam by deep reinforcement learning;
s51, constructing and initializing a playback memory, initializing a prediction network and a learning network, and setting network parameters thereof as the same parameters, wherein prediction functions corresponding to the prediction network and the learning network are Q and Q' respectively;
s52, initializing t = 0, and initializing the state of the dam group to the all-zero vector s_0;

s53, exploring a scheduling scheme, obtaining the scheduling-scheme tuple data of the dam group according to the environment state obtained in S2 and the constraint conditions obtained in S3, and storing the obtained tuple in the playback memory;

s531, with probability epsilon, randomly selecting a feasible dam-group scheduling scheme as a_t, or, with probability 1-epsilon, selecting the scheme a_t for which the learning network predicts the largest expected benefit;

s532, calculating the benefit r_t in the current scheduling period according to the environment state and the constraint conditions as the immediate reward, updating the current running state of the dam group to the new state s_{t+1}, and storing the resulting tuple <s_t, a_t, r_t, s_{t+1}> in the playback memory;
s54, randomly selecting a batch of tuples from the playback memory to generate training samples, and updating parameters in the learning network through a loss function;
s55, when the times of exploring the scheduling scheme reach specific times, updating the parameters in the learning network into the prediction network;
and S6, generating and outputting an optimized scheduling scheme of the cascade dam.
2. The optimized scheduling method of a cascade dam as claimed in claim 1, wherein the step S54 specifically comprises:
s541, randomly selecting a batch of tuples in the playback memory to generate samples;
s542, selecting a training sample according to the probability of the absolute value of the difference between the sample and the prediction network;
and S543, updating the parameters in the learning network according to the loss function.
3. The optimized dispatching method for cascade dams of claim 1, wherein the step of S55 comprises:
s551, if the value of the variable is less than or equal to the length period of the scheduling scheme, the variable is automatically increased, and the step S53 is returned to continue to be executed;
s552, if the profit value of the scheduling scheme is greater than the optimal profit value, updating the optimal profit value to the profit value of the scheduling scheme, and updating the optimal scheduling scheme;
s553, setting a circulation function, and if the circulation variable of the circulation function meets a first condition, updating the parameters of the learning network into the parameters of the prediction network;
s554, if the loop variable satisfies the second condition, jumping to step S52 to continue executing; and if the circulation variable does not meet the second condition, jumping out of circulation and carrying out the next step of operation.
4. A cascade dam optimal dispatch system based on the method of any one of claims 1-3, characterized in that the system comprises the following units:
the historical data acquisition unit is used for acquiring the basic characteristics of each dam, the static characteristic value of each dam and the historical data of river hydrology;
the real-time data acquisition unit is used for acquiring the environmental state of the cascade dam in the drainage basin related to the scheduling income in real time;
a scheduling state receiving unit, configured to receive a scheduling parameter and a constraint condition set by a user;
a deep learning setting unit for setting a learning rate, a greedy degree and a reward decrement value parameter of the deep reinforcement learning, wherein the learning rate, the greedy degree and the reward decrement value of the deep reinforcement learning are respectively expressed by eta, epsilon and gamma;
the scheduling scheme calculating unit is used for calculating the optimized scheduling scheme of the cascade dam by deep reinforcement learning;

and the scheduling scheme output unit is used for generating and outputting the optimized scheduling scheme of the cascade dam.
5. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
CN201910803576.4A 2019-08-28 2019-08-28 Optimal scheduling method and system for cascade dam and computer readable storage medium Active CN110533244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910803576.4A CN110533244B (en) 2019-08-28 2019-08-28 Optimal scheduling method and system for cascade dam and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910803576.4A CN110533244B (en) 2019-08-28 2019-08-28 Optimal scheduling method and system for cascade dam and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110533244A CN110533244A (en) 2019-12-03
CN110533244B true CN110533244B (en) 2023-04-18

Family

ID=68664901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910803576.4A Active CN110533244B (en) 2019-08-28 2019-08-28 Optimal scheduling method and system for cascade dam and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110533244B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219236B (en) * 2021-11-29 2022-10-04 长江三峡通航管理局 Cascaded hub navigation joint scheduling method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3515038A1 (en) * 2018-01-19 2019-07-24 General Electric Company Autonomous reconfigurable virtual sensing system for cyber-attack neutralization

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102258990B (en) * 2011-05-17 2012-10-10 陕西科技大学 Method for preparing light sewage treatment material
US20130317957A1 (en) * 2012-05-25 2013-11-28 Teknon Corporation DBA Symmetry Software, Inc. Location Based Determination of Payroll Tax Withholding
CN105225017B (en) * 2015-10-30 2019-02-26 南京南瑞集团公司 A kind of GROUP OF HYDROPOWER STATIONS Short-term Optimal Operation method of multi-Agent
CN105869070B (en) * 2016-04-06 2020-09-11 大连理工大学 Cooperative optimization scheduling method for balance of benefits of cross-basin cascade hydropower station group
US10929743B2 (en) * 2016-09-27 2021-02-23 Disney Enterprises, Inc. Learning to schedule control fragments for physics-based character simulation and robots using deep Q-learning
CN106485366A (en) * 2016-10-31 2017-03-08 武汉大学 A kind of complexity Cascade Reservoirs retaining phase Optimization Scheduling
CN106951985B (en) * 2017-03-06 2021-06-25 河海大学 Multi-objective optimal scheduling method for cascade reservoir based on improved artificial bee colony algorithm
WO2019118460A1 (en) * 2017-12-11 2019-06-20 The Texas A&M University System Irrigation system control with predictive water balance capabilities
CN108647829A (en) * 2018-05-16 2018-10-12 河海大学 A kind of Hydropower Stations combined dispatching Rules extraction method based on random forest
CN108966352B (en) * 2018-07-06 2019-09-27 北京邮电大学 Dynamic beam dispatching method based on depth enhancing study
CN109347149B (en) * 2018-09-20 2022-04-22 国网河南省电力公司电力科学研究院 Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3515038A1 (en) * 2018-01-19 2019-07-24 General Electric Company Autonomous reconfigurable virtual sensing system for cyber-attack neutralization

Also Published As

Publication number Publication date
CN110533244A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
Baillieul et al. Encyclopedia of systems and control
Hınçal et al. Optimization of multireservoir systems by genetic algorithm
Wang et al. An integrated power load point-interval forecasting system based on information entropy and multi-objective optimization
Nedić Distributed optimization
CN106529732A (en) Carbon emission efficiency prediction method based on neural network and random frontier analysis
Pannocchia Distributed model predictive control
CN110533244B (en) Optimal scheduling method and system for cascade dam and computer readable storage medium
CN117744501B (en) Water network system regulation node optimal scheduling and decision-making method considering ecological flow
Choi et al. Developing optimal reservoir rule curve for hydropower reservoir with an add-on water supply function using improved grey wolf optimizer
Kumar et al. Environmentally sound short-term hydrothermal generation scheduling using intensified water cycle approach
Sørensen Dynamic positioning control systems for ships and underwater vehicles
Lafortune Diagnosis of discrete event systems
CN117494861A (en) Water resource optimal allocation method for coordinating city and county two-stage water resource utilization targets
Tadokoro Disaster response robot
Jothiprakash et al. Comparison of policies derived from stochastic dynamic programming and genetic algorithm models
Castañón Dynamic noncooperative games
Bar-Shalom et al. Data association
Kawan Data rate of nonlinear control systems and feedback entropy
Shim Disturbance observers
Jeong et al. Implementation of simplified sequential stochastic model predictive control for operation of hydropower system under uncertainty
Liu Machine learning for wind power prediction
Ravazzi et al. Dynamical social networks
Sela et al. Distributed sensing for monitoring water distribution systems
Gokayaz et al. From Probabilistic Seasonal Streamflow Forecasts to Optimal Reservoir Operations: A Stochastic Programming Approach
Cantoni et al. Demand-driven automatic control of irrigation channels

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant