CN110533244B - Optimal scheduling method and system for cascade dam and computer readable storage medium - Google Patents
- Publication number
- CN110533244B (application CN201910803576.4A)
- Authority
- CN
- China
- Prior art keywords
- dam
- scheduling
- learning
- scheduling scheme
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a method and a system for optimal scheduling of a cascade dam and a computer readable storage medium, belonging to the technical field of dam optimal scheduling. The method comprises the following steps: S1, acquiring the basic characteristics of each dam, the static characteristic values of the dams and historical river hydrology data; S2, acquiring in real time the environmental state related to the scheduling benefits of the cascade dams in the drainage basin; S3, receiving scheduling parameters and constraint conditions set by a user; S4, setting the learning rate, greedy degree and reward decrement value parameters of the deep reinforcement learning, denoted η, ε and γ respectively; S5, calculating an optimized scheduling scheme for the cascade dam by using deep reinforcement learning; and S6, generating and outputting the optimized scheduling scheme of the cascade dam. The invention addresses the technical problem that traditional optimization methods face a huge search space and have difficulty obtaining an optimized scheduling scheme.
Description
Technical Field
The invention relates to the technical field of dam optimization scheduling, in particular to a method and a system for cascade dam optimization scheduling and a computer readable storage medium.
Background
Optimal utilization of the water resources in a drainage basin has long been a difficult problem in dam scheduling, and the coordinated optimal scheduling of multiple dams in the basin is the basis for achieving it. A multi-dam optimal scheduling scheme is highly complex because many factors must be considered together: not only hydrology, environmental constraints and industrial pollutant discharge in the basin, but also agricultural irrigation, power generation demand, water supply demand, and flood control and water transportation requirements. Multi-dam cooperative scheduling in a basin therefore has to reconcile several mutually conflicting scheduling targets in order to achieve global optimization of water resources.
In the prior art, deep reinforcement learning has become a powerful tool for optimal scheduling: an agent explores the benefits obtained in different environments through a learning mechanism and gradually forms an optimized scheduling model. An agent is a structure with a decision function, which can be a simple algorithm or a function represented by a neural network. However, among control methods for scheduling multiple dams in a basin, there has been little research on changing the running state of each dam according to the same scheduling period. Scheduling multiple dams in a basin usually involves a huge search space, from which existing optimization methods have difficulty obtaining a globally optimal scheduling scheme. A technical scheme is therefore needed that solves the multi-dam scheduling problem and obtains a globally optimal scheduling scheme.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an optimal scheduling method and system for a cascade dam that use deep reinforcement learning to search for a globally optimal scheduling scheme according to the running state of the dams and environmental factors. The method, the system and the computer readable storage medium provided by the invention can be used to obtain the globally optimal scheduling scheme.
In order to achieve the above purpose, the invention provides the following technical scheme:
in one aspect, the invention provides a method for optimal scheduling of a cascade dam, which specifically comprises the following steps: S1, acquiring the basic characteristics of each dam, the static characteristic values of the dams and historical river hydrology data; S2, acquiring in real time the environmental state related to the scheduling benefits of the cascade dams in the drainage basin; S3, receiving scheduling parameters and constraint conditions set by a user; S4, setting the learning rate, greedy degree and reward decrement value parameters of the deep reinforcement learning, denoted η, ε and γ respectively; S5, calculating an optimized scheduling scheme for the cascade dam by using deep reinforcement learning; and S6, generating and outputting the optimized scheduling scheme of the cascade dam.
Further, step S5 specifically includes: S51, constructing and initializing a playback memory, a prediction network and a learning network, and setting the parameters of the two networks to the same values, wherein the prediction functions corresponding to the prediction network and the learning network are Q and Q' respectively; S52, initializing t = 0, wherein the state of the dam group is initialized to $s_0$; S53, exploring a scheduling scheme, obtaining the tuple data of the dam-group scheduling scheme according to the environment state obtained in S2 and the constraint conditions obtained in S3, and storing the obtained tuple in the playback memory; S54, randomly selecting a batch of tuples from the playback memory to generate training samples, and updating the parameters of the learning network through a loss function; and S55, when the number of explorations of the scheduling scheme reaches a specified number, updating the parameters of the learning network into the prediction network.
Further, step S53 specifically includes: S531, with probability ε, randomly selecting a feasible dam-group scheduling scheme as $a_t$, or, with probability 1 − ε, selecting as $a_t$ the scheme for which the expectation given by the learning network is largest; S532, calculating the profit $r_t$ in the current scheduling period according to the environment state and the constraint conditions as the immediate profit, updating the current running state of the dam group to the new state $s_{t+1}$, and storing the resulting tuple $\langle s_t, a_t, r_t, s_{t+1} \rangle$ in the playback memory.
Further, step S54 specifically includes: S541, randomly selecting a batch of tuples in the playback memory to generate samples; S542, selecting training samples with a probability determined by the absolute value of the difference between each sample and the prediction network; S543, updating the parameters of the learning network according to the loss function.
Further, step S55 specifically includes: S551, if the value of the variable is less than or equal to the length (in periods) of the scheduling scheme, incrementing the variable and returning to step S53 to continue execution; S552, if the profit value of the scheduling scheme is greater than the optimal profit value, updating the optimal profit value to the profit value of the scheduling scheme and updating the optimal scheduling scheme; S553, setting a loop, and if the loop variable satisfies a first condition, updating the parameters of the learning network into the parameters of the prediction network; S554, if the loop variable satisfies a second condition, jumping to step S52 to continue execution; and if the loop variable does not satisfy the second condition, exiting the loop and carrying out the next step.
Further, the basic characteristics of the dams comprise the storage capacity, the dead water level, the highest water level and the siltation amount of each dam; the static characteristic values of the dams comprise the upstream and downstream relations of the dams, the highest water level line and silt characteristics; the historical river hydrology data comprise the annual average runoff, the normal dam water consumption and the average sand transportation.
Further, the environment state comprises the types of crops planted downstream of each dam, the cultivated area of each crop type, irrigation demand, the futures prices of the various crops, market fertilizer prices, the on-grid electricity price for generated power, the discharge amount of industrial pollutants, meteorological statistics for the scheduling period, and the running water level and reservoir inflow data of each dam.
Further, the scheduling parameters include a time span and a minimum scheduling period of the scheduling scheme; the constraints include the maximum water level at each dam, maximum discharge flow, maximum silt deposit, environmental requirements at each dam, and maximum content limits for various pollutants.
In another aspect, the present invention further provides an optimized scheduling system for a cascade dam, including the following units:
the historical data acquisition unit is used for acquiring the basic characteristics of each dam, the static characteristic values of the dams and historical river hydrology data;
the real-time data acquisition unit is used for acquiring in real time the environmental state related to the scheduling benefits of the cascade dams in the drainage basin;
a scheduling state receiving unit, configured to receive the scheduling parameters and constraint conditions set by a user;
a deep learning setting unit for setting the learning rate, greedy degree and reward decrement value parameters of the deep reinforcement learning, which are denoted η, ε and γ respectively;
the scheduling scheme calculating unit is used for calculating the optimized scheduling scheme of the cascade dam by using deep reinforcement learning;
and the scheduling scheme output unit generates and outputs the optimized scheduling scheme of the cascade dam.
Meanwhile, the present invention also provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the steps in the method as described above.
Compared with the prior art, the invention has the beneficial effects that:
the method, the system and the computer readable storage medium provided by the invention are simple to operate and introduce machine learning into the cascade dam scheduling model, thereby avoiding the technical problem that traditional optimization methods have difficulty obtaining an optimized scheduling scheme because of the huge search space. Because reinforcement learning uses heuristic strategies to explore different states, the search for the optimal scheduling scheme is accelerated, and the method can converge quickly to a globally optimal or near-optimal scheduling scheme in a huge solution space.
Drawings
FIG. 1 is a schematic flow chart of an optimized scheduling method for a cascade dam according to the present invention;
FIG. 2 is a schematic diagram illustrating the detailed flow of step S5 in the optimal scheduling method for a cascade dam according to the present invention;
FIG. 3 is a schematic structural diagram of an optimized scheduling system for a cascade dam according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.
In recent years, deep reinforcement learning has become a powerful tool for optimal scheduling: the benefits obtained in different environments are explored through a learning mechanism, and an optimized scheduling model is gradually formed. Multi-dam scheduling in a basin can adopt synchronous scheduling control with a fixed period, i.e., every dam changes its running state according to the same scheduling period. The state at the next scheduling period depends only on the initial state of the previous period and the scheduling action performed in that period, i.e., the state of the dams in the basin is a Markov process. Multi-dam scheduling in a basin usually involves a very large search space, so the problem is well suited to finding near-optimal solutions with deep reinforcement learning.
The invention discloses a method, a system and a computer readable storage medium for optimal scheduling of a cascade dam; the specific implementation is as follows:
Suppose m dams in a basin require coordinated scheduling and the scheduling scheme is L periods long, where each period has an arbitrary specified duration such as 1 day, several hours or even several minutes. For example, if 4 dams in the basin need to be cooperatively scheduled, then m = 4; if the scheduling scheme spans 15 periods, then L = 15; and if each period is 1 day, a scheduling scheme is generated for the next 15 days.
It should be noted that the current running state of the dam group is denoted $s_t$, and the scheduling behavior of the dam group in the t-th period is denoted $a_t$. The scheduling profit function $r_t = f(s_t, a_t)$ gives the profit obtained by the dam group from scheduling behavior $a_t$ in the t-th scheduling period, with values in $[-\infty, +\infty]$; the specific profit function depends on factors such as the agricultural production, industrial activities, market conditions and meteorological conditions associated with the dam group. The playback memory (Replay Memory) of the learning process is denoted D; the prediction network that predicts the profit of a scheduling behavior is denoted the Target_Q network, and the learning network that learns the scheduling scheme online is denoted the Main_Net network.
For convenience of description, each dam is assumed to have only four basic scheduling actions: water storage, flood discharge, sand flushing and power generation, numbered 0, 1, 2 and 3 respectively. Since a power generation action may involve starting several generator sets, the possible generation modes of the generator sets can be discretized into several scheduling actions with different numbers. Likewise, flood discharge may correspond to different flow rates and can also be discretized into different scheduling actions with different numbers. The scheduling actions that each dam can perform can thus be numbered with integers, and for convenience the water storage action of every dam may be numbered 0. Let $A_i$ be the scheduling action set of the i-th dam and A the union of the action sets of all dams, so that $A = A_1 \cup A_2 \cup \dots \cup A_4 = \{0, 1, 2, 3\}$. The scheduling action of the dam group in the (t+1)-th scheduling period can then be expressed as a vector $a_t = [act_1, act_2, \dots, act_4]$, where each $act_i \in A$.
Let n be the number of scheduling action types of the dam group; the number of scheduling schemes selectable in each period for a dam group of m dams is then m × n. Since each dam has the four scheduling actions of water storage, flood discharge, sand flushing and power generation, n = 4, and with m = 4 dams to be scheduled, the number of scheduling schemes per period for the dam group of 4 dams is 16. For any combination that violates a scheduling constraint, the profit of the period and the final expected profit are set to −∞.
The state vector of the dam group on day i is represented as $s_i = [a_0, \dots, a_k, \dots, a_{15}]$, where element $a_k$ represents the scheduling behavior of the dam group in the k-th scheduling period. Starting from the initial scheduling state, the scheduling scheme of the dam group is adjusted period by period until the action of the last scheduling period has been selected, so as to find the optimal cooperative scheduling scheme.
The dam group in the basin is scheduled synchronously, i.e., all dams adjust their state at the same interval. Each scheduling behavior of a dam in the basin is represented by a unique number; the scheduling behavior of the dam group in a given period is represented by a vector in which each element is the scheduling behavior of the corresponding dam in that period; and a scheduling scheme represents the scheduling behaviors over a certain number of periods as a sequentially concatenated vector.
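The vector representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`initial_state`, `apply_action`, `flatten`) and the list-of-lists layout are hypothetical and do not come from the patent; the action numbers and the values m = 4 and L = 15 are taken from the example in the text.

```python
from typing import List

# Action numbers from the text: 0 = water storage, 1 = flood discharge,
# 2 = sand flushing, 3 = power generation.
ACTIONS = {0: "water storage", 1: "flood discharge",
           2: "sand flushing", 3: "power generation"}

M_DAMS = 4      # m, number of dams (example value from the text)
L_PERIODS = 15  # L, number of scheduling periods (example value from the text)


def initial_state(m: int = M_DAMS, periods: int = L_PERIODS) -> List[List[int]]:
    """All-zero schedule: every dam stores water in every period."""
    return [[0] * m for _ in range(periods)]


def apply_action(state: List[List[int]], t: int, action: List[int]) -> List[List[int]]:
    """Return a copy of the schedule with the period-t action vector replaced."""
    if len(action) != len(state[t]) or any(a not in ACTIONS for a in action):
        raise ValueError("action must hold one valid action number per dam")
    new_state = [row[:] for row in state]  # copy so the old state is untouched
    new_state[t] = list(action)
    return new_state


def flatten(state: List[List[int]]) -> List[int]:
    """Sequentially concatenate the per-period vectors into one vector."""
    return [a for row in state for a in row]
```

For example, `apply_action(initial_state(), 0, [3, 0, 1, 2])` would schedule power generation at the first dam and sand flushing at the fourth dam in the first period, leaving all later periods at water storage.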
Fig. 1 is a flowchart illustrating a method for optimized scheduling of a cascade dam according to an exemplary embodiment. Referring to fig. 1, the optimized scheduling method for a cascade dam of this embodiment includes the following steps:
S1, acquiring the basic characteristics of each dam, the static characteristic values of the dams and historical river hydrology data.
The basic characteristics of the cascade dams in the basin include data such as the storage capacity, dead water level, highest water level and siltation amount of each dam; the static characteristic values include the upstream and downstream relations between dams, the highest water level line, silt characteristics and the like; the historical river hydrology data include the annual average runoff, normal dam water consumption, average sand transportation and the like.
S2, acquiring in real time the environmental state related to the scheduling benefits of the cascade dams in the drainage basin.
The environment state includes the types of crops planted downstream of each dam, the cultivated area of each crop type, irrigation demand, the futures prices of the various crops, market fertilizer prices, the on-grid electricity price for generated power, the discharge amount of industrial pollutants, meteorological statistics for the scheduling period, the running water level and reservoir inflow data of each dam, and the like. Data such as crop futures prices and meteorological statistics can be collected from relevant websites in real time by web crawlers, the running water level and inflow data of each dam can be obtained from the dam's monitoring site, and the discharge data of the various industrial pollutants can be obtained from the data exchange platform of the environmental protection bureaus.
S3, setting scheduling parameters and constraint conditions.
The scheduling parameters include the time span and the minimum scheduling period of the scheduling scheme, for example a time span of L = 15 days and a minimum scheduling period of 1 day. The constraint conditions include the highest water level, the maximum discharge flow, the maximum sediment deposition and the like of each dam, as well as the environmental requirements of each dam and the maximum content limits of various pollutants, such as the environmental protection limits for sulfate, chloride, total ammonia nitrogen and total phosphorus in river water.
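A minimal sketch of how such constraints could feed the −∞ penalty that the text assigns to constraint-violating combinations. The class `DamLimits`, the function names, the choice of three limits and the units in the comments are all assumptions made for illustration; the patent does not specify a data layout or how the raw benefit is computed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DamLimits:
    """Hypothetical per-dam limits; the units are assumed, not from the text."""
    max_water_level: float   # highest permitted water level (m)
    max_discharge: float     # maximum discharge flow (m^3/s)
    max_silt_deposit: float  # maximum sediment deposition (m^3)


def violates(limits: DamLimits, water_level: float,
             discharge: float, silt_deposit: float) -> bool:
    """True if any monitored quantity exceeds its limit."""
    return (water_level > limits.max_water_level
            or discharge > limits.max_discharge
            or silt_deposit > limits.max_silt_deposit)


def period_profit(raw_benefit: float, limits: DamLimits, water_level: float,
                  discharge: float, silt_deposit: float) -> float:
    """Profit for one period: -inf when a constraint is violated."""
    if violates(limits, water_level, discharge, silt_deposit):
        return -math.inf
    return raw_benefit
```

Returning `-math.inf` keeps infeasible combinations from ever being preferred by a maximization step, at the cost of needing care if such values are later averaged or used as training targets.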
S4, setting the learning rate, greedy degree and reward decrement value parameters of the deep reinforcement learning, denoted η, ε and γ respectively.
The learning rate, greedy degree and reward decrement value of the deep reinforcement learning are denoted η, ε and γ respectively, and none of the three exceeds 1.0. The optimal values obtained from experiments are η = 0.3, ε = 0.5 and γ = 0.8; these are used as the defaults, and a user can modify them according to actual conditions. The parameters set here are initial values, and they are used by the Q function later.
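The default values above could be held in a small configuration object. The sketch below is an assumption about packaging only: the class name `DrlParams` is hypothetical, and the lower bound of 0 in the check is an added assumption (the text only states that the parameters do not exceed 1.0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrlParams:
    """Defaults are the experimentally chosen values quoted in the text."""
    eta: float = 0.3      # learning rate
    epsilon: float = 0.5  # greedy degree (exploration probability)
    gamma: float = 0.8    # reward decrement value

    def __post_init__(self) -> None:
        for name in ("eta", "epsilon", "gamma"):
            value = getattr(self, name)
            # Upper bound from the text; lower bound 0 is an assumption.
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
```

A user override would then look like `DrlParams(epsilon=0.1)`, leaving the other two parameters at their defaults.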
S5, calculating the optimized scheduling scheme of the cascade dam by using deep reinforcement learning.
Fig. 2 is a flowchart illustrating step S5 of a method for optimized scheduling of a cascade dam according to an exemplary embodiment. Referring to fig. 2, step S5 specifically includes the following steps:
Step S51, constructing and initializing a playback memory, a prediction network and a learning network, with the network parameters set to be the same; the functions Q and Q' denote the prediction functions corresponding to the two neural networks.
A playback memory D is constructed and initialized; its size can be set to 1000, so that it records the latest 1000 suboptimal scheduling results. The scheduling algorithm comprises two deep learning neural networks and the playback memory. One network is the prediction network Target_Q, which predicts the profit of a scheduling behavior and represents the expected profit of that behavior in a given scheduling state. The other is the learning network Main_Net, which learns the scheduling scheme online: it is trained online on the scheduling history data in the playback memory D, and its parameters are periodically copied into the prediction network Target_Q. The prediction functions corresponding to the two networks are Q and Q' respectively. The Target_Q and Main_Net networks are initialized with the same network structure and network parameters; for example, both can be set to a 4-layer fully connected BP neural network. The maximum number of search rounds is set to M = 100000, the update period of the reinforcement learning network to C = 100, the optimal scheduling scheme $s_{best}$ to empty, bestP = −∞, and initially T = 1. The data stored in the playback memory D are tuples $\langle s_j, a_j, r_j, s_{j+1} \rangle$, whose elements respectively represent the current running state of the dam group, the scheduling behavior of the dam group in the j-th period, the immediate profit obtained from scheduling behavior $a_j$ in the j-th period, and the next running state of the dam group.
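A fixed-size playback memory of this kind is commonly built on a bounded queue. The sketch below assumes a `collections.deque` with `maxlen` equal to the capacity of 1000 mentioned in the text; the class name `ReplayMemory` and its method names are hypothetical, and uniform random sampling is used here as the simplest option.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# One stored record <s_j, a_j, r_j, s_{j+1}>.
Transition = Tuple[tuple, tuple, float, tuple]


class ReplayMemory:
    """Keeps only the most recent `capacity` transitions."""

    def __init__(self, capacity: int = 1000) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, s: tuple, a: tuple, r: float, s_next: tuple) -> None:
        self._buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniformly sample up to batch_size distinct transitions."""
        k = min(batch_size, len(self._buffer))
        return random.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)
```

Because the deque discards the oldest entry on overflow, no explicit eviction logic is needed to keep only the latest 1000 results.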
Step S52, at the start of each round of searching for the optimal scheduling scheme, t = 0 is initialized, and in the initial vector of the dam group the scheduling vector of every period is set to an m-dimensional all-zero vector. That is, the scheduling scheme of the dam group is set to the all-zero vector $s_0$, and the current state is $s_t$.
Step S53, exploring the scheduling scheme: the tuple data of the dam-group scheduling scheme are obtained according to the environment state acquired in S2 and the constraint conditions acquired in S3, and the obtained tuple is stored in the playback memory.
Specifically, step S53 includes:
Step S531, exploring the scheduling scheme: with probability ε, a feasible scheduling scheme of the dam group is randomly selected as $a_t$; otherwise, with probability 1 − ε, the scheme for which the Target_Q network gives the largest expected profit is selected as $a_t$.
The dam-group state and the scheduling behavior each have a coding function: $\phi(s_t)$ denotes the code corresponding to state $s_t$, and the scheduling behavior of the t-th period is coded by the corresponding behavior coding function. The inputs of the Target_Q network and the Main_Net network are therefore the code $\phi(s_t)$ of the dam-group state and the code of the scheduling behavior, and the output is the maximum expected profit obtained by adopting scheduling scheme $a_t$ in that state.
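The selection rule of step S531 is an ε-greedy policy, which can be sketched as follows. The function name `epsilon_greedy`, the `q_value` callable standing in for the network's profit prediction, and the tuple encoding of actions are assumptions for illustration; the state and action coding functions are left abstract.

```python
import random
from typing import Callable, Sequence, Tuple

Action = Tuple[int, ...]  # one action number per dam


def epsilon_greedy(state: object,
                   feasible_actions: Sequence[Action],
                   q_value: Callable[[object, Action], float],
                   epsilon: float,
                   rng: random.Random) -> Action:
    """Pick a random feasible action with probability epsilon,
    otherwise the action with the largest predicted profit."""
    if not feasible_actions:
        raise ValueError("no feasible scheduling action available")
    if rng.random() < epsilon:          # explore
        return rng.choice(list(feasible_actions))
    # exploit: q_value is a placeholder for the network's prediction
    return max(feasible_actions, key=lambda a: q_value(state, a))
```

Passing the random generator explicitly makes the exploration reproducible when a seed is fixed, which helps when comparing scheduling runs.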
Step S532, according to the environment state and the constraint condition, the profit r in the current scheduling period is calculated t As the instant income, and the current running state of the dam group is updated to a new state s t+1 And the resulting tuples<s t ,a t ,r t ,s t+1 >And storing the data into a playback memory D.
The tuple in the playback memory D is a quadruple<s t ,a t ,r t ,s t+1 >Wherein s is t And s t+1 Respectively represents the initial state of the scheduling period of the t-th scheduling period t +1 of the dam group, a t For scheduling actions performed on the cycle, r t A benefit value for the scheduling action over the scheduling period.
Step S54, randomly select a batch of tuples from the playback memory to generate training samples, and update the parameters of the learning network through a loss function.
Specifically, step S54 includes:
Step S541, randomly select a batch of tuples from the playback memory to generate samples.
Step S542, select training samples with a probability determined by the absolute value of the difference between the sample and the output of the prediction network.
To select training samples in this way, a group of tuples is randomly drawn from the playback memory D. For a given tuple <s_j, a_j, r_j, s_{j+1}>, the expected profit of state s_j can then be estimated with the function Q corresponding to the Target_Q network.
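For orientation, a form of this estimate that is commonly used in deep Q-learning is given below. It is an assumption about the intended formula, not a quotation of the patent: the symbol y_j for the estimated expected profit and the terminal-state case are introduced here for illustration, while γ is the reward decrement value set in step S4 and Q is the prediction function of the Target_Q network.

$$
y_j =
\begin{cases}
r_j, & \text{if } s_{j+1} \text{ ends the scheduling scheme} \\
r_j + \gamma \max_{a} Q(s_{j+1}, a), & \text{otherwise}
\end{cases}
$$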
Step S543, update the parameters of the learning network according to the loss function.
The parameters of the Main_Net network are updated according to the loss function, where Q' is the profit prediction function corresponding to that network, and φ and its companion function are the coding functions of the scheduling scheme and the scheduling behavior.
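One possible reading of steps S542 and S543 is sketched below in plain Python. It assumes a squared-error loss between a target computed with the prediction network and the output of the learning network, and sampling weights proportional to the absolute difference; both choices are common in deep Q-learning but are assumptions here. The callables `q_target` and `q_main` are hypothetical stand-ins for Q and Q', terminal states are not handled, and the gradient step itself is left to whichever neural network library is used.

```python
def td_targets(batch, q_target, actions, gamma):
    """Target r_j + gamma * max_a Q(s_{j+1}, a), using the prediction network."""
    return [r + gamma * max(q_target(s_next, a) for a in actions)
            for (_s, _a, r, s_next) in batch]


def mse_loss(batch, targets, q_main):
    """Mean squared error between the targets and the learning network output."""
    errors = [y - q_main(s, a)
              for (s, a, _r, _s_next), y in zip(batch, targets)]
    return sum(e * e for e in errors) / len(errors)


def sampling_probabilities(batch, targets, q_main, eps=1e-6):
    """Probabilities proportional to the absolute difference |y_j - Q'(s_j, a_j)|.

    eps keeps every tuple selectable even when its difference is zero.
    """
    weights = [abs(y - q_main(s, a)) + eps
               for (s, a, _r, _s_next), y in zip(batch, targets)]
    total = sum(weights)
    return [w / total for w in weights]
```

Computing the targets with the prediction network while fitting the learning network keeps the regression target fixed between parameter copies.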
Step S55, when the number of explorations of the scheduling scheme reaches a specific number, update the parameters of the learning network into the prediction network.
Step S551, if the value of the variable t is less than or equal to the period length L of the scheduling scheme, increment t (t = t + 1) and return to step S53.
Step S552, if the profit value of the current scheduling scheme is greater than the best profit value bestP, update bestP to the profit value of the current scheduling scheme and update the best scheduling scheme accordingly.
Step S553, set a loop function; if its loop variable satisfies the first condition, update the parameters of the learning network into the prediction network. Specifically, set T = T + 1; if the first condition, T mod C = 0, is met, the parameters of the learning network Main_Net are copied to the parameters θ of the prediction network Target_Q, where C is the update period of the reinforcement learning network.
Step S554, if the loop variable satisfies the second condition, jump to step S52 and continue; otherwise, exit the loop and proceed to the next step. The second condition is that T is less than the maximum number of exploration rounds M.
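The control flow of steps S551 to S554 can be summarized in a short Python sketch. All four callables are hypothetical hooks introduced for illustration, and the handling of the inner period index (exactly L periods per round) is a simplifying assumption rather than the patent's exact boundary condition.

```python
def run_search(M, C, L, explore_step, train_step, sync_target, finish_round):
    """Control-flow sketch of the exploration rounds.

    Hypothetical hooks (not defined by the patent):
      explore_step(t) -> performs step S53 for period t
      train_step()    -> performs step S54
      sync_target()   -> copies Main_Net parameters into Target_Q
      finish_round()  -> returns (profit, scheme) for the finished round
    """
    best_p, best_scheme = float("-inf"), None
    T = 1  # round counter, initialized to 1 as in the description
    while True:
        for t in range(L):          # S551: walk through the scheduling periods
            explore_step(t)
            train_step()
        profit, scheme = finish_round()
        if profit > best_p:         # S552: keep the best scheme seen so far
            best_p, best_scheme = profit, scheme
        T += 1                      # S553: advance the round counter
        if T % C == 0:
            sync_target()
        if T >= M:                  # S554: stop once T is no longer below M
            break
    return best_p, best_scheme
```

With the values given in the description (M = 100000, C = 100), the parameters would be copied once every 100 rounds.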
Step S6, generate and output the optimized scheduling scheme of the step dam.
Through the above method, the parameters of the learning network Main_Net are learned and updated into the parameters θ of the prediction network Target_Q. Based on the Target_Q parameters θ at that point, the optimized scheduling scheme of the step dam is generated and output.
The invention uses a particular neural network training arrangement. Rather than training on all previously updated Q(s, a) values at every training step, it maintains a fixed-size training data pool, i.e. a queue. During each exploration the updated Q value is inserted into the queue, and after each exploration a random batch of batch_size entries (generally 64) is taken as training data to update the neural network. In addition, a double-network structure is used: the training network is separated from the evaluation network, so that the network used for evaluation is less affected by correlated data.
The invention abstracts the scheduling optimization problem of the dam group as follows: starting from an initial scheduling scheme s_0 (the state without any scheduling action), find a path that reaches the optimal scheduling scheme (optimal scheduling state) s_best. The current running state of the dam group is denoted s_t; because the current state of the basin is determined, this state (water level, silt accumulation, pollutant content, etc.) depends only on the scheduling behaviors of the first t-1 periods. The scheduling behavior of the dam group in the t-th period is denoted a_t. The scheduling revenue function r_t = f(s_t, a_t) gives the profit obtained by the scheduling behavior a_t of the dam group in period t; its value ranges over (-∞, +∞), and the specific revenue function depends on factors such as agricultural production, industrial activities, market conditions, and weather conditions associated with the dam group during the period.
Besides the optimal scheduling method for the cascade dam, the invention also provides an optimal scheduling system for the cascade dam. As shown in fig. 3, the present system includes a history data acquisition unit 10, a real-time data acquisition unit 20, a schedule status receiving unit 30, a deep learning setting unit 40, a schedule scheme calculation unit 50, and a schedule scheme output unit 60.
The historical data acquisition unit 10 is used for acquiring the basic characteristics of each dam, the static characteristic values of the dams, and the historical data of river hydrology.
The real-time data acquisition unit 20 is used for acquiring, in real time, the environmental state related to the scheduling profit of the step dam in the drainage basin.
The scheduling status receiving unit 30 is configured to receive the scheduling parameters and the constraint conditions set by the user.
The deep learning setting unit 40 is configured to set the learning rate, greedy degree, and reward decrement value parameters of the deep reinforcement learning, which are denoted by η, ε, and γ, respectively.
The scheduling scheme calculating unit 50 is used for calculating the optimized scheduling scheme of the step dam by using deep reinforcement learning.
The scheduling scheme output unit 60 is used for generating and outputting the optimized scheduling scheme of the step dam.
The relevant units provided in the system are used for executing the relevant instructions in the above described step dam optimization scheduling method, and are not described herein again because they have been described in detail above.
Meanwhile, the invention also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the steps in the method are realized when the computer program is executed by a processor. Since the steps of the method have been described in detail, they are not described in detail herein.
In summary, the above description is only a detailed description of the preferred embodiments of the present invention, and not intended to limit the scope of the present invention. In practical applications, a person skilled in the art can make several modifications according to the technical solution. Any modification, equivalent replacement, partial application, etc. made on the basis of the principle set forth in the present invention shall be included in the scope of protection of the present invention.
Claims (5)
1. The optimal scheduling method for the step dam is characterized by comprising the following steps of:
s1, acquiring basic characteristics of each dam, static characteristic values of the dams and historical data of river hydrology;
the basic characteristics of the dam comprise the storage capacity, the dead water level, the highest water level and the pre-metering of each dam; the static characteristic values of the dam comprise upstream and downstream relations of the dam, the highest water level line and silt characteristics; historical data of river hydrology comprise annual average runoff, normal dam water consumption and average sand transportation;
s2, acquiring an environmental state related to scheduling benefits of the step dam in the drainage basin in real time;
the environment state comprises the downstream agricultural planting type of each dam, the cultivation area of each type of crops, the irrigation demand, the future price of various crops, the price of chemical fertilizers on the market, the price of power generation and on-line electricity, the discharge amount of industrial pollutants, meteorological statistical information during dispatching, the running water level of each dam and warehousing flow data;
s3, receiving scheduling parameters and constraint conditions set by a user;
the scheduling parameters comprise a time span and a minimum scheduling period of a scheduling scheme; the constraint conditions comprise the highest water level of each dam, the maximum discharge flow, the maximum silt deposition amount, the environmental requirements of each dam and the highest content limit value of various pollutants;
s4, setting a learning rate, a greedy degree and a reward decrement value parameter of the deep reinforcement learning, and respectively expressing the learning rate, the greedy degree and the reward decrement value of the deep reinforcement learning by η, ε and γ;
s5, calculating an optimized scheduling scheme of the step dam by utilizing deep reinforcement learning;
s51, constructing and initializing a playback memory, initializing a prediction network and a learning network, and setting network parameters thereof as the same parameters, wherein prediction functions corresponding to the prediction network and the learning network are Q and Q' respectively;
S53, exploring a scheduling scheme, obtaining scheduling scheme tuple related data of the dam group according to the environment state obtained in the S2 and the constraint condition obtained in the S3, and storing the obtained tuple in a playback memory;
s531, randomly selecting a feasible dam group scheduling scheme as α_t according to the probability ε, or, with a probability of 1-ε, choosing as α_t the scheme for which the expected profit given by the learning network is the largest;
S532, calculating the profit r_t in the current scheduling period according to the environment state and the constraint conditions as the instant profit, updating the current running state of the dam group to a new state s_{t+1}, and storing the resulting tuple <s_t, α_t, r_t, s_{t+1}> into the playback memory;
s54, randomly selecting a batch of tuples from the playback memory to generate training samples, and updating parameters in the learning network through a loss function;
s55, when the times of exploring the scheduling scheme reach specific times, updating the parameters in the learning network into the prediction network;
and S6, generating and outputting an optimized scheduling scheme of the step dam.
2. The optimized scheduling method of a cascade dam as claimed in claim 1, wherein the step S54 specifically comprises:
s541, randomly selecting a batch of tuples in the playback memory to generate samples;
s542, selecting a training sample according to the probability of the absolute value of the difference between the sample and the prediction network;
and S543, updating the parameters in the learning network according to the loss function.
3. The optimized dispatching method for cascade dams of claim 1, wherein the step of S55 comprises:
s551, if the value of the variable is less than or equal to the length period of the scheduling scheme, the variable is automatically increased, and the step S53 is returned to continue to be executed;
s552, if the profit value of the scheduling scheme is greater than the optimal profit value, updating the optimal profit value to the profit value of the scheduling scheme, and updating the optimal scheduling scheme;
s553, setting a circulation function, and if the circulation variable of the circulation function meets a first condition, updating the parameters of the learning network into the parameters of the prediction network;
s554, if the loop variable satisfies the second condition, jumping to step S52 to continue executing; and if the circulation variable does not meet the second condition, jumping out of circulation and carrying out the next step of operation.
4. A cascade dam optimal dispatch system based on the method of any one of claims 1-3, characterized in that the system comprises the following units:
the historical data acquisition unit is used for acquiring the basic characteristics of each dam, the static characteristic value of each dam and the historical data of river hydrology;
the real-time data acquisition unit is used for acquiring the environmental state of the cascade dam in the drainage basin related to the scheduling income in real time;
a scheduling state receiving unit, configured to receive a scheduling parameter and a constraint condition set by a user;
a deep learning setting unit for setting a learning rate, a greedy degree and a reward decrement value parameter of the deep reinforcement learning, wherein the learning rate, the greedy degree and the reward decrement value of the deep reinforcement learning are respectively expressed by eta, epsilon and gamma;
the scheduling scheme calculating unit is used for calculating the optimized scheduling scheme of the step dam by utilizing deep reinforcement learning;
and the scheduling scheme output unit generates and outputs an optimized scheduling scheme of the step dam.
5. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910803576.4A CN110533244B (en) | 2019-08-28 | 2019-08-28 | Optimal scheduling method and system for cascade dam and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110533244A CN110533244A (en) | 2019-12-03 |
CN110533244B true CN110533244B (en) | 2023-04-18 |
Family
ID=68664901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910803576.4A Active CN110533244B (en) | 2019-08-28 | 2019-08-28 | Optimal scheduling method and system for cascade dam and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110533244B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114219236B (en) * | 2021-11-29 | 2022-10-04 | 长江三峡通航管理局 | Cascaded hub navigation joint scheduling method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3515038A1 (en) * | 2018-01-19 | 2019-07-24 | General Electric Company | Autonomous reconfigurable virtual sensing system for cyber-attack neutralization |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102258990B (en) * | 2011-05-17 | 2012-10-10 | 陕西科技大学 | Method for preparing light sewage treatment material |
US20130317957A1 (en) * | 2012-05-25 | 2013-11-28 | Teknon Corporation DBA Symmetry Software, Inc. | Location Based Determination of Payroll Tax Withholding |
CN105225017B (en) * | 2015-10-30 | 2019-02-26 | 南京南瑞集团公司 | A kind of GROUP OF HYDROPOWER STATIONS Short-term Optimal Operation method of multi-Agent |
CN105869070B (en) * | 2016-04-06 | 2020-09-11 | 大连理工大学 | Cooperative optimization scheduling method for balance of benefits of cross-basin cascade hydropower station group |
US10929743B2 (en) * | 2016-09-27 | 2021-02-23 | Disney Enterprises, Inc. | Learning to schedule control fragments for physics-based character simulation and robots using deep Q-learning |
CN106485366A (en) * | 2016-10-31 | 2017-03-08 | 武汉大学 | A kind of complexity Cascade Reservoirs retaining phase Optimization Scheduling |
CN106951985B (en) * | 2017-03-06 | 2021-06-25 | 河海大学 | Multi-objective optimal scheduling method for cascade reservoir based on improved artificial bee colony algorithm |
WO2019118460A1 (en) * | 2017-12-11 | 2019-06-20 | The Texas A&M University System | Irrigation system control with predictive water balance capabilities |
CN108647829A (en) * | 2018-05-16 | 2018-10-12 | 河海大学 | A kind of Hydropower Stations combined dispatching Rules extraction method based on random forest |
CN108966352B (en) * | 2018-07-06 | 2019-09-27 | 北京邮电大学 | Dynamic beam dispatching method based on depth enhancing study |
CN109347149B (en) * | 2018-09-20 | 2022-04-22 | 国网河南省电力公司电力科学研究院 | Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN110533244A (en) | 2019-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Baillieul et al. | Encyclopedia of systems and control | |
Hınçal et al. | Optimization of multireservoir systems by genetic algorithm | |
Wang et al. | An integrated power load point-interval forecasting system based on information entropy and multi-objective optimization | |
Nedić | Distributed optimization | |
CN106529732A (en) | Carbon emission efficiency prediction method based on neural network and random frontier analysis | |
Pannocchia | Distributed model predictive control | |
CN110533244B (en) | Optimal scheduling method and system for cascade dam and computer readable storage medium | |
CN117744501B (en) | Water network system regulation node optimal scheduling and decision-making method considering ecological flow | |
Choi et al. | Developing optimal reservoir rule curve for hydropower reservoir with an add-on water supply function using improved grey wolf optimizer | |
Kumar et al. | Environmentally sound short-term hydrothermal generation scheduling using intensified water cycle approach | |
Sørensen | Dynamic positioning control systems for ships and underwater vehicles | |
Lafortune | Diagnosis of discrete event systems | |
CN117494861A (en) | Water resource optimal allocation method for coordinating city and county two-stage water resource utilization targets | |
Tadokoro | Disaster response robot | |
Jothiprakash et al. | Comparison of policies derived from stochastic dynamic programming and genetic algorithm models | |
Castañón | Dynamic noncooperative games | |
Bar-Shalom et al. | Data association | |
Kawan | Data rate of nonlinear control systems and feedback entropy | |
Shim | Disturbance observers | |
Jeong et al. | Implementation of simplified sequential stochastic model predictive control for operation of hydropower system under uncertainty | |
Liu | Machine learning for wind power prediction | |
Ravazzi et al. | Dynamical social networks | |
Sela et al. | Distributed sensing for monitoring water distribution systems | |
Gokayaz et al. | From Probabilistic Seasonal Streamflow Forecasts to Optimal Reservoir Operations: A Stochastic Programming Approach | |
Cantoni et al. | Demand-driven automatic control of irrigation channels |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||