CN116703074A - Feedback enhancement-based intelligent power scheduling method, device, equipment and medium - Google Patents


Info

Publication number
CN116703074A
Authority
CN
China
Prior art keywords
scheduling
power
grid system
power grid
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310622150.5A
Other languages
Chinese (zh)
Inventor
梁寿愚
何宇斌
李映辰
张坤
吴小刚
李文朝
胡荣
周华锋
江伟
顾慧杰
符秋稼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Co Ltd
Original Assignee
China Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Southern Power Grid Co Ltd filed Critical China Southern Power Grid Co Ltd
Priority to CN202310622150.5A
Publication of CN116703074A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06316Sequencing of tasks or work
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315Needs-based resource requirements planning or analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention provides a feedback-enhancement-based intelligent power scheduling method, device, equipment and medium, relating to the technical field of reinforcement learning models. The method comprises the following steps: acquiring a first state parameter of a power grid system, wherein the first state parameter is the state parameter of the power grid system at a first time step; inputting the first state parameter into a trained reinforcement learning model and obtaining a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and the priority corresponding to each scheduling behavior; and performing power scheduling on the power grid system at a second time step based on the target scheduling behavior set, wherein the interval between the first time step and the second time step is a preset duration. The reward value used during training of the reinforcement learning model is obtained from grid operation feedback data on the scheduling behaviors output during training. The invention enables intelligent, automatic scheduling of the power grid system and reduces the labor cost of power grid scheduling.

Description

Feedback enhancement-based intelligent power scheduling method, device, equipment and medium
Technical Field
The invention relates to the technical field of reinforcement learning, and in particular to an intelligent power scheduling method, device, equipment and medium based on feedback enhancement.
Background
A power grid system comprises a large amount of equipment, including power generation equipment, power-consuming equipment, power transformation equipment and the like, and the electrical load fluctuates frequently, so scheduling is needed to achieve emission reduction and energy saving, economic benefit, and safe and stable operation of the power grid. In the prior art, scheduling is usually performed manually based on experience; however, because of fluctuations in power load and power output, the demand for power scheduling is very large, and uninterrupted manual duty is required to guarantee timely scheduling, so labor costs are very high.
Disclosure of Invention
The invention provides an intelligent power scheduling method, device, equipment and medium based on feedback enhancement, which are used for solving the defect of high labor cost of power scheduling in the prior art, realizing intelligent automatic power scheduling and reducing labor cost.
The invention provides an intelligent power scheduling method based on feedback enhancement, which comprises the following steps:
acquiring a first state parameter of a power grid system, wherein the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system comprises a power consumption demand parameter and a device operation parameter of the power grid system;
inputting the first state parameters into a trained reinforcement learning model, and obtaining a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
performing power dispatching on the power grid system in a second time step based on the target dispatching behavior set, wherein the interval between the first time step and the second time step is a preset duration;
the reward value adopted in the reinforcement learning model training process is obtained based on grid operation feedback data of scheduling behaviors output in the reinforcement learning model training process, and the grid operation feedback data at least comprises energy consumption data, economic benefit data and operation stability data of the grid system.
According to the feedback enhancement-based intelligent power scheduling method provided by the invention, the power scheduling of the power grid system is performed in a second time step based on the target scheduling behavior set, and the method comprises the following steps:
when a selection instruction is not received, carrying out power dispatching on the power grid system in the second time step based on the dispatching behavior with the highest priority in the target dispatching behavior set;
and when a selection instruction is received, carrying out power dispatching on the power grid system in the second time step based on the dispatching behaviors corresponding to the selection instruction in the target dispatching behavior set.
According to the intelligent power scheduling method based on feedback enhancement provided by the invention, the training process of the reinforcement learning model comprises the following steps:
acquiring a first sample state parameter of the power grid system, wherein the first sample state parameter is a state parameter of the power grid system at a first sample time step;
inputting the first sample state parameter into the reinforcement learning model, and obtaining a sample training scheduling behavior set output by the reinforcement learning model;
performing power dispatching on the power grid system in a second time step based on the sample training dispatching behavior set to acquire power grid operation feedback data;
calculating a training reward value based on the grid operation feedback data;
updating the reinforcement learning model based on the training reward value to complete one training iteration of the reinforcement learning model.
According to the feedback enhancement-based intelligent power scheduling method provided by the invention, the power scheduling of the power grid system is performed in a second time step based on the sample training scheduling behavior, and the method comprises the following steps:
acquiring a historical state parameter set of the power grid system, and determining a target historical state parameter in the historical state parameter set, wherein the similarity between the target historical state parameter and the first state parameter of the sample is larger than a first preset threshold;
acquiring a target historical scheduling behavior corresponding to the target historical state parameter;
and when the similarity between the scheduling behavior with the highest priority in the sample training scheduling behavior set and the target historical scheduling behavior is larger than a second preset threshold, performing power scheduling on the power grid system in a second time step of the sample based on the scheduling behavior with the highest priority in the sample training scheduling behavior set.
According to the feedback-enhanced intelligent power scheduling method provided by the invention, after the target historical scheduling behavior corresponding to the target historical state parameter is obtained, the method further comprises the following steps:
and when the similarity between the scheduling behavior with the highest priority in the sample training scheduling behavior set and the target historical scheduling behavior is not greater than the second preset threshold, not performing power scheduling on the power grid system in the second time step, or performing power scheduling on the power grid system in the second time step based on expert scheduling behaviors.
According to the intelligent power scheduling method based on feedback enhancement provided by the invention, the calculation of the training rewards value based on the power grid operation feedback data comprises the following steps:
acquiring an initial rewarding value according to the power grid operation feedback data and a preset rewarding function;
and when the initial reward value is lower than a third preset threshold value, performing reduction processing on the initial reward value to obtain the training reward value.
According to the feedback enhancement-based intelligent power scheduling method provided by the invention, after power scheduling is performed on the power grid system in the second time step based on the target scheduling behavior, the method further comprises the following steps:
acquiring each scheduling behavior set output by the reinforcement learning model in a preset time period;
obtaining scoring results for each scheduling behavior set;
optimizing the reinforcement learning model based on the scoring result.
The invention also provides an intelligent power scheduling device based on feedback enhancement, which comprises:
the system comprises a parameter acquisition module, a power supply module and a power supply module, wherein the parameter acquisition module is used for acquiring a first state parameter of a power grid system, the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system comprises an electricity demand parameter and an equipment operation parameter of the power grid system;
the model processing module is used for inputting the first state parameters into a trained reinforcement learning model, and obtaining a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
the scheduling module is used for performing power scheduling on the power grid system in a second time step based on the target scheduling behavior set, and the interval between the first time step and the second time step is a preset duration;
the reward value adopted in the reinforcement learning model training process is obtained based on grid operation feedback data of scheduling behaviors output in the reinforcement learning model training process, and the grid operation feedback data at least comprises energy consumption data, economic benefit data and operation stability data of the grid system.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the intelligent power scheduling method based on feedback enhancement as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the feedback-enhancement-based intelligent power scheduling method as described in any of the above.
According to the feedback-enhanced intelligent power scheduling method, device, equipment and medium provided by the invention, the state parameter of the power grid system at the first time step is input into the trained reinforcement learning model, and the power grid system is scheduled at the second time step based on the target scheduling behavior set output by the reinforcement learning model. Because the reward value used during training is calculated from grid operation feedback data comprising energy consumption data, economic benefit data and operation stability data, the trained reinforcement learning model can output scheduling behaviors that meet the needs of power grid operation.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an intelligent power scheduling method based on feedback enhancement provided by the invention;
FIG. 2 is a schematic diagram of the feedback-enhanced intelligent power scheduling device provided by the invention;
fig. 3 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The inventor has found that, because the power load in a power grid system often fluctuates considerably, in the prior art dispatchers must remain on duty for long periods to ensure the safe and stable operation of the power grid system and to achieve energy conservation, emission reduction, economic benefit and the like. This requires a power company to employ many people with considerable scheduling experience, which incurs very high labor costs. To overcome the defect of high labor cost of power scheduling in the prior art, the invention realizes intelligent automatic power scheduling and reduces labor cost.
The feedback-enhancement-based intelligent power scheduling method provided by the invention is described below with reference to fig. 1, and as shown in fig. 1, the feedback-enhancement-based intelligent power scheduling method provided by the invention comprises the following steps:
s110, acquiring a first state parameter of a power grid system, wherein the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system comprises an electricity demand parameter and an equipment operation parameter of the power grid system;
s120, inputting the first state parameters into a trained reinforcement learning model, and acquiring a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
and S130, carrying out power dispatching on the power grid system in a second time step based on the target dispatching behavior set, wherein the interval between the first time step and the second time step is a preset duration.
The equipment operation parameters of the power grid system are operation parameters of power equipment in the power grid system, such as voltage, current, frequency and the like of electric equipment and power generation equipment, and the power consumption demand parameters of the power grid system reflect power consumption demands of a power supply area corresponding to the power grid system, such as population data, economic data, industrial structure data and the like of the power supply area corresponding to the power grid system. According to the invention, the reinforcement learning model is adopted to realize automatic determination of power grid dispatching behavior, so that intelligent automatic dispatching of the power system is realized. Specifically, the first state parameter is input into a trained reinforcement learning model, the reinforcement learning model outputs the target scheduling behavior set, and the target scheduling behavior set includes at least one scheduling behavior.
When a selection instruction is not received, carrying out power dispatching on the power grid system in the second time step based on the dispatching behavior with the highest priority in the target dispatching behavior set;
and when the selection instruction is received, carrying out power dispatching on the power grid system in the second time step based on the dispatching behaviors corresponding to the selection instruction in the target dispatching behavior set.
Specifically, as described above, in the method provided by the invention, scheduling personnel do not need to participate in scheduling of the power grid system at all times, but may participate during part of the time. When a dispatcher does participate, he or she does not need to generate scheduling behaviors from the parameters of the power grid system; the dispatcher only needs to confirm and select one of the at least one scheduling behavior output by the reinforcement learning model, which reduces the dispatcher's workload.
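The two dispatch branches above can be expressed as a minimal sketch in Python; the `SchedulingBehavior` type, the action identifiers, and `select_behavior` are illustrative assumptions, not interfaces defined by the invention.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SchedulingBehavior:
    action_id: str
    priority: int  # higher value = higher priority in the target set

def select_behavior(behavior_set: list[SchedulingBehavior],
                    selected_id: Optional[str] = None) -> SchedulingBehavior:
    """Return the behavior named by a dispatcher's selection instruction,
    or fall back to the highest-priority behavior when none was received."""
    if selected_id is not None:
        for b in behavior_set:
            if b.action_id == selected_id:
                return b
    return max(behavior_set, key=lambda b: b.priority)

# Hypothetical behavior set output by the model for one time step.
behaviors = [SchedulingBehavior("raise_gen_unit_3", 2),
             SchedulingBehavior("shed_load_feeder_7", 5)]
assert select_behavior(behaviors).action_id == "shed_load_feeder_7"
assert select_behavior(behaviors, "raise_gen_unit_3").action_id == "raise_gen_unit_3"
```

When no selection instruction arrives before the second time step, the `max` fallback implements the highest-priority default described above.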
Further, although the reinforcement learning model may be trained to output scheduling behaviors suitable for the power grid system, the method provided by the invention also optimizes the reinforcement learning model in real time according to the actual operation of the power grid system: the scoring results for each scheduling behavior set output by the reinforcement learning model within a preset period are collected, and the reinforcement learning model is optimized based on those scoring results. That is, after power scheduling is performed on the power grid system at the second time step based on the target scheduling behavior, the feedback-enhanced intelligent power scheduling method provided by the invention further comprises:
acquiring each scheduling behavior set output by the reinforcement learning model in a preset time period;
obtaining scoring results for each scheduling behavior set;
and optimizing the reinforcement learning model based on the scoring result.
Specifically, the scoring result may include a plurality of scoring sets, each scoring set containing one dispatcher's scores for the scheduling behavior sets output by the reinforcement learning model within the preset period. Optimizing the reinforcement learning model based on the scoring result may specifically include:
acquiring a source weight for each scoring set in the scoring result, wherein the source weight reflects the expertise of the dispatcher who produced the scoring set;
weighting each scoring set in the scoring result based on its source weight to obtain a total score for the reinforcement learning model over the preset period;
optimizing the reinforcement learning model based on the total score.
Specifically, when the total score is lower than a preset score threshold, the reinforcement learning model is optimized; the optimization may be retraining the reinforcement learning model.
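The weighted-scoring trigger just described might be sketched as follows; the per-dispatcher averaging scheme, the threshold value, and all identifiers are assumptions for illustration, since the patent does not fix a specific aggregation formula.

```python
def total_score(score_sets: list[list[float]], source_weights: list[float]) -> float:
    """Weighted mean of each dispatcher's average score, where the weight
    reflects that dispatcher's expertise (the 'source weight')."""
    weighted = sum(w * (sum(s) / len(s))
                   for s, w in zip(score_sets, source_weights))
    return weighted / sum(source_weights)

def needs_retraining(score_sets: list[list[float]],
                     source_weights: list[float],
                     threshold: float = 60.0) -> bool:
    """Trigger optimization when the total score falls below the preset
    score threshold."""
    return total_score(score_sets, source_weights) < threshold

# Two dispatchers each scored three behavior sets in the preset period;
# the more senior dispatcher (weight 2.0) scored lower.
scores = [[80.0, 75.0, 70.0], [50.0, 45.0, 55.0]]
weights = [1.0, 2.0]
```

With these illustrative numbers, the junior dispatcher's average of 75 is outweighed by the senior dispatcher's average of 50, pulling the total below the threshold and triggering retraining.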
In order to enable the reinforcement learning model to output accurate scheduling behavior applicable to the power grid system, a training process of the reinforcement learning model in the present invention is described below.
The training process of the reinforcement learning model comprises the following steps:
acquiring a first sample state parameter of the power grid system, wherein the first sample state parameter is a state parameter of the power grid system at a first sample time step;
inputting the first sample state parameter into the reinforcement learning model, and obtaining a sample training scheduling behavior set output by the reinforcement learning model;
performing power dispatching on the power grid system in a second time step based on the sample training dispatching behavior set to acquire power grid operation feedback data;
calculating a training reward value based on the grid operation feedback data;
updating the reinforcement learning model based on the training reward value to complete one training iteration of the reinforcement learning model.
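The five training steps above can be sketched as a single iteration loop. The environment and agent stubs, the additive reward composition, and every identifier below are hypothetical stand-ins, not interfaces from the patent; real training would replace them with the actual grid simulator and model.

```python
import random

class GridEnvStub:
    """Stand-in for the power grid system, returning synthetic data."""
    def sample_state(self):
        # First sample state parameter: demand and device operation figures.
        return {"load_demand": random.uniform(0.5, 1.5), "line_freq": 50.0}
    def dispatch(self, behavior_set):
        # Grid operation feedback data after executing the behavior set:
        # energy consumption, economic benefit, and operation stability.
        return {"energy": 1.0, "benefit": 0.8, "stability": 0.9}

class AgentStub:
    """Stand-in for the reinforcement learning model."""
    def __init__(self):
        self.updates = 0
    def act(self, state):
        # Sample training scheduling behavior set: (action, priority) pairs.
        return [("adjust_unit_1", 1)]
    def update(self, state, behaviors, reward):
        self.updates += 1  # one gradient/update step per iteration

def reward_fn(feedback):
    # Illustrative reward: benefit plus stability minus energy use.
    return feedback["benefit"] + feedback["stability"] - feedback["energy"]

def train_one_iteration(agent, env):
    state = env.sample_state()              # step 1: first sample state parameter
    behaviors = agent.act(state)            # step 2: scheduling behavior set
    feedback = env.dispatch(behaviors)      # step 3: dispatch, collect feedback
    reward = reward_fn(feedback)            # step 4: training reward value
    agent.update(state, behaviors, reward)  # step 5: update the model
    return reward
```

Running `train_one_iteration` repeatedly corresponds to the repeated training iterations the patent describes, with the feasibility gate of the next section inserted before step 3 in practice.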
The first sample state parameter is the state parameter of the power grid system at the first sample time step, which is a time step in the training process of the reinforcement learning model. The first sample state parameter is input into the reinforcement learning model, and the reinforcement learning model outputs a sample training scheduling behavior set comprising a plurality of scheduling behaviors and the priority corresponding to each scheduling behavior. Performing power scheduling on the power grid system at the second time step based on the sample training scheduling behaviors comprises the following steps:
acquiring a historical state parameter set of the power grid system, and determining a target historical state parameter in the historical state parameter set, wherein the similarity between the target historical state parameter and the first state parameter of the sample is larger than a first preset threshold;
acquiring a target historical scheduling behavior corresponding to the target historical state parameter;
and when the similarity between the scheduling behavior with the highest priority in the sample training scheduling behavior set and the target historical scheduling behavior is larger than a second preset threshold, performing power scheduling on the power grid system in a second time step of the sample based on the scheduling behavior with the highest priority in the sample training scheduling behavior set.
In conventional reinforcement learning, actual actions are performed based on the model's output, reward values are calculated from the effects of those actions, and the model is updated accordingly; reinforcement learning therefore does not require large amounts of labeled data as deep neural network learning does, and is suitable for scenarios with little labeled data. However, in the early stage of reinforcement learning the model has not yet learned good actions and may output poor ones. In power scheduling, adopting a poor scheduling behavior may reduce economic benefit, or even cause serious consequences such as insufficient regional power supply. To avoid such consequences, in the method provided by the invention, scheduling is not performed directly based on the sample training scheduling behavior set output during training; instead, feasibility is judged first, and the highest-priority scheduling behavior in the sample training scheduling behavior set is executed only when it is judged feasible, thereby preventing major losses to the operation of the power grid system.
Specifically, after the reinforcement learning model outputs the sample training scheduling behavior set, a historical state parameter set of the power grid system is first obtained, and a target historical state parameter whose similarity to the first sample state parameter is greater than a first preset threshold is determined; that is, the operation history of the power grid system is searched for a state similar to that of the first sample time step. When such a state exists, the target historical scheduling behavior corresponding to the target historical state parameter is obtained. The target historical scheduling behavior is the behavior that was adopted for the target historical state parameter during manual scheduling, so it is known to be feasible. Therefore, when the similarity between the highest-priority scheduling behavior in the sample training scheduling behavior set and the target historical scheduling behavior is greater than a second preset threshold, power scheduling can be performed on the power grid system at the second sample time step based on that highest-priority scheduling behavior.
When the similarity between the highest-priority scheduling behavior in the sample training scheduling behavior set and the target historical scheduling behavior is not greater than the second preset threshold, power scheduling is either not performed on the power grid system at the second time step, or is performed based on an expert scheduling behavior, i.e., a scheduling behavior made by a professional dispatcher for the first sample state parameter. Training of the reinforcement learning model is thus achieved while ensuring that the power grid system suffers no major loss during training and continues to operate safely.
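The two-threshold feasibility gate described above might look like the following sketch. The patent does not specify a similarity measure, so cosine similarity over numeric feature vectors, along with both threshold values, is an assumption for illustration.

```python
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def feasible(sample_state: list[float],
             candidate_action: list[float],
             history: list[tuple[list[float], list[float]]],
             state_thresh: float = 0.95,    # first preset threshold
             action_thresh: float = 0.9) -> bool:
    """Execute the highest-priority behavior only when (1) a similar
    historical state exists and (2) the candidate resembles the behavior
    a human dispatcher adopted in that state."""
    for hist_state, hist_action in history:
        if cosine_sim(sample_state, hist_state) > state_thresh:
            if cosine_sim(candidate_action, hist_action) > action_thresh:
                return True   # dispatch with the highest-priority behavior
    return False              # withhold, or fall back to an expert behavior
```

Returning `False` corresponds to the branch above: either no scheduling is performed at the second time step, or an expert scheduling behavior is used instead.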
The calculating training reward value based on the grid operation feedback data includes:
acquiring an initial rewarding value according to the power grid operation feedback data and a preset rewarding function;
and when the initial reward value is lower than a third preset threshold value, performing reduction processing on the initial reward value to obtain the training reward value.
Specifically, obtaining the initial reward value according to the power grid operation feedback data and a preset reward function includes:
when the similarity between the scheduling behavior with the highest priority in the sample training scheduling behavior set and the target historical scheduling behavior is larger than the second preset threshold, determining the power grid operation feedback data according to the actual operation condition of the power grid, and inputting the power grid operation feedback data into a preset rewarding function to obtain the initial rewarding value.
The grid operation feedback data includes at least energy consumption data, economic benefit data, and operation stability data of the power grid system, all of which can be obtained through actual collection; that is, after power scheduling is performed on the power grid system at the second time step based on the highest-priority scheduling behavior in the sample training scheduling behavior set, the grid operation feedback data is determined from the actually collected data.
When the similarity between the highest-priority scheduling behavior in the sample training scheduling behavior set and the target historical scheduling behavior is not greater than the second preset threshold, that is, when no power scheduling is performed on the power grid system at the second time step or power scheduling is performed at the second time step based on an expert scheduling behavior, the grid operation feedback data is directly set to null and the initial reward value is determined to be a first preset reward value, where the first preset reward value is 0 or a negative number. In other words, when this similarity is not greater than the second preset threshold, the result output by the reinforcement learning model is inaccurate; a small reward value is set at this point so that the reinforcement learning model adjusts its optimization direction and optimizes toward outputting accurate scheduling behaviors.
Further, when the initial reward value is lower than a third preset threshold, indicating that the scheduling behavior output by the reinforcement learning model has little effect, the initial reward value is subjected to reduction processing, for example by subtracting a value or multiplying by a factor less than one, to obtain the training reward value; that is, the reward for a poorly performing scheduling behavior is reduced so that the reinforcement learning model avoids optimizing in that direction. It should be noted that the reward values calculated by the reward function provided by the present invention are all positive numbers, and "the initial reward value being lower than the third preset threshold" refers to the case where the initial reward value is not the first preset reward value and is lower than the third preset threshold. In other words, although the first preset reward value is also lower than the third preset threshold, when the initial reward value equals the first preset reward value it is not subjected to reduction processing.
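The reward computation described above can be sketched in a few lines. This is an illustrative sketch only: the function names, the choice of a multiplicative shrink factor, and the default thresholds are assumptions, not values specified by the patent.

```python
# Hypothetical sketch of the training-reward computation: null feedback
# (rejected model output) yields the first preset reward; low positive
# rewards are shrunk so the model avoids optimizing in that direction.
def training_reward(feedback, reward_fn, third_threshold=1.0,
                    first_preset_reward=0.0, shrink=0.5):
    """Compute the training reward value from grid operation feedback.

    feedback: dict of energy-consumption / economic-benefit / stability
    data, or None when the model's action was rejected and no scheduling
    (or only expert scheduling) was performed at the second time step.
    reward_fn: the preset reward function; per the text it always yields
    a positive number for non-null feedback.
    """
    if feedback is None:
        # Model output judged inaccurate: fixed small reward (0 or a
        # negative number) steers optimization back toward accuracy.
        return first_preset_reward
    initial = reward_fn(feedback)
    if initial < third_threshold:
        # Low-effect scheduling behavior: reduction processing, here by
        # multiplying by a factor less than one.
        return initial * shrink
    return initial
```

Note that the `feedback is None` branch is checked first, which matches the text's caveat that the first preset reward value is never itself subjected to reduction processing.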
Further, when no target historical state parameter exists in the historical state parameter set, that is, when no state parameter in the set has a similarity to the sample first state parameter greater than the first preset threshold, the training stage of the reinforcement learning model is first distinguished: in the initial stage of training, the reliability of the actions output by the reinforcement learning model is low, while in a later stage their reliability is high. If the training reward values corresponding to the previous n output scheduling behavior sets are all higher than a fourth preset threshold, and the variance of those training reward values is lower than a fifth preset threshold, the power grid system is scheduled at the second time step based on the highest-priority scheduling behavior in the sample training scheduling behavior set. If the training reward values corresponding to the previous n output scheduling behavior sets are not all higher than the fourth preset threshold, or their variance is not lower than the fifth preset threshold, no power scheduling is performed on the power grid system at the second time step, or power scheduling is performed at the second time step based on an expert scheduling behavior.
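The later-stage reliability check described above, based on the mean level and variance of recent training rewards, can be sketched as follows. The function name and default thresholds are illustrative assumptions.

```python
# Hypothetical sketch of the training-stage check used when no similar
# historical state exists: the model's action is trusted only if the last
# n training rewards are all high (above the fourth threshold) and stable
# (variance below the fifth threshold).
from statistics import pvariance

def model_output_reliable(recent_rewards, n=10,
                          fourth_threshold=0.8, fifth_threshold=0.05):
    """Return True when the model has left the initial training stage."""
    window = recent_rewards[-n:]
    if len(window) < n:
        return False  # still in the initial stage: too few samples
    return (all(r > fourth_threshold for r in window)
            and pvariance(window) < fifth_threshold)
```

When this check fails, the system would skip scheduling at the second time step or fall back to the expert scheduling behavior, as in the similarity-gated branch.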
The feedback-enhancement-based intelligent power scheduling apparatus provided by the present invention is described below; the apparatus described below and the feedback-enhancement-based intelligent power scheduling method described above may be referred to in correspondence with each other. As shown in fig. 2, the feedback-enhancement-based intelligent power scheduling apparatus provided by the present invention includes:
the parameter obtaining module 210 is configured to obtain a first state parameter of a power grid system, where the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system includes an electricity demand parameter and an equipment operation parameter of the power grid system;
the model processing module 220 is configured to input the first state parameter into a trained reinforcement learning model, and obtain a target scheduling behavior set output by the reinforcement learning model, where the target scheduling behavior set includes at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
a scheduling module 230, configured to schedule power of the power grid system at a second time step based on the target scheduling behavior set, where an interval between the first time step and the second time step is a preset duration;
the reward value adopted in the reinforcement learning model training process is obtained based on grid operation feedback data of scheduling behaviors output in the reinforcement learning model training process, and the grid operation feedback data at least comprises energy consumption data, economic benefit data and operation stability data of the grid system.
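One scheduling cycle of the three modules above can be sketched as a simple driver function. This is a minimal sketch under stated assumptions: the callable interfaces, the `(action, priority)` pair representation, and the optional `selection` index (standing in for the selection instruction described in the method) are all hypothetical.

```python
# Hypothetical sketch of one inference-time scheduling cycle combining the
# parameter acquisition, model processing, and scheduling modules.
def schedule_step(get_state, model, apply_action, selection=None):
    """Run one scheduling cycle and return the applied action.

    get_state(): returns the first state parameter (electricity demand and
    equipment operation parameters) at the first time step.
    model(state): returns the target scheduling behavior set as a list of
    (action, priority) pairs.
    apply_action(action): performs power scheduling at the second time step.
    selection: optional index from a received selection instruction.
    """
    state = get_state()                       # parameter acquisition module
    behavior_set = model(state)               # model processing module
    if selection is not None:
        action = behavior_set[selection][0]   # operator-selected behavior
    else:
        # Default: the scheduling behavior with the highest priority.
        action = max(behavior_set, key=lambda ap: ap[1])[0]
    apply_action(action)                      # scheduling module
    return action
```

The preset duration between the first and second time steps would govern how often this cycle is invoked.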
Fig. 3 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 3, the electronic device may include: a processor 310, a communication interface (Communications Interface) 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with each other through the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a feedback-enhancement-based intelligent power scheduling method comprising: acquiring a first state parameter of a power grid system, wherein the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system comprises a power consumption demand parameter and a device operation parameter of the power grid system;
inputting the first state parameters into a trained reinforcement learning model, and obtaining a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
performing power dispatching on the power grid system in a second time step based on the target dispatching behavior set, wherein the interval between the first time step and the second time step is a preset duration;
the reward value adopted in the reinforcement learning model training process is obtained based on grid operation feedback data of scheduling behaviors output in the reinforcement learning model training process, and the grid operation feedback data at least comprises energy consumption data, economic benefit data and operation stability data of the grid system.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing the feedback-enhanced intelligent power scheduling method provided by the above methods, the method comprising: acquiring a first state parameter of a power grid system, wherein the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system comprises a power consumption demand parameter and a device operation parameter of the power grid system;
inputting the first state parameters into a trained reinforcement learning model, and obtaining a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
performing power dispatching on the power grid system in a second time step based on the target dispatching behavior set, wherein the interval between the first time step and the second time step is a preset duration;
the reward value adopted in the reinforcement learning model training process is obtained based on grid operation feedback data of scheduling behaviors output in the reinforcement learning model training process, and the grid operation feedback data at least comprises energy consumption data, economic benefit data and operation stability data of the grid system.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the feedback-enhanced intelligent power scheduling method provided by the above methods, the method comprising: acquiring a first state parameter of a power grid system, wherein the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system comprises a power consumption demand parameter and a device operation parameter of the power grid system;
inputting the first state parameters into a trained reinforcement learning model, and obtaining a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
performing power dispatching on the power grid system in a second time step based on the target dispatching behavior set, wherein the interval between the first time step and the second time step is a preset duration;
the reward value adopted in the reinforcement learning model training process is obtained based on grid operation feedback data of scheduling behaviors output in the reinforcement learning model training process, and the grid operation feedback data at least comprises energy consumption data, economic benefit data and operation stability data of the grid system.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An intelligent power scheduling method based on feedback enhancement is characterized by comprising the following steps:
acquiring a first state parameter of a power grid system, wherein the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system comprises a power consumption demand parameter and a device operation parameter of the power grid system;
inputting the first state parameters into a trained reinforcement learning model, and obtaining a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
performing power dispatching on the power grid system in a second time step based on the target dispatching behavior set, wherein the interval between the first time step and the second time step is a preset duration;
the reward value adopted in the reinforcement learning model training process is obtained based on grid operation feedback data of scheduling behaviors output in the reinforcement learning model training process, and the grid operation feedback data at least comprises energy consumption data, economic benefit data and operation stability data of the grid system.
2. The feedback-enhancement-based intelligent power scheduling method of claim 1, wherein the power scheduling the grid system at a second time step based on the target set of scheduling actions comprises:
when a selection instruction is not received, carrying out power dispatching on the power grid system in the second time step based on the dispatching behavior with the highest priority in the target dispatching behavior set;
and when a selection instruction is received, carrying out power dispatching on the power grid system in the second time step based on the dispatching behaviors corresponding to the selection instruction in the target dispatching behavior set.
3. The feedback-enhancement-based intelligent power scheduling method of claim 1, wherein the training process of the reinforcement learning model comprises:
acquiring a sample first state parameter of the power grid system, wherein the sample first state parameter is a state parameter of the power grid system at a sample first time step;
inputting the sample first state parameter into the reinforcement learning model, and obtaining a sample training scheduling behavior set output by the reinforcement learning model;
performing power scheduling on the power grid system at a sample second time step based on the sample training scheduling behavior set to acquire grid operation feedback data;
calculating a training reward value based on the grid operation feedback data;
updating the reinforcement learning model based on the training reward value to enable one training of the reinforcement learning model.
4. The feedback-enhancement-based intelligent power scheduling method of claim 3, wherein the performing power scheduling on the power grid system at the sample second time step based on the sample training scheduling behavior set comprises:
acquiring a historical state parameter set of the power grid system, and determining a target historical state parameter in the historical state parameter set, wherein the similarity between the target historical state parameter and the sample first state parameter is greater than a first preset threshold;
acquiring a target historical scheduling behavior corresponding to the target historical state parameter;
and when the similarity between the scheduling behavior with the highest priority in the sample training scheduling behavior set and the target historical scheduling behavior is larger than a second preset threshold, performing power scheduling on the power grid system in a second time step of the sample based on the scheduling behavior with the highest priority in the sample training scheduling behavior set.
5. The feedback-enhancement-based intelligent power scheduling method according to claim 4, wherein after the target historical scheduling behavior corresponding to the target historical state parameter is acquired, the method further comprises:
and when the similarity between the scheduling behavior with the highest priority in the sample training scheduling behavior set and the target historical scheduling behavior is not greater than the second preset threshold, not performing power scheduling on the power grid system in the second time step, or performing power scheduling on the power grid system in the second time step based on expert scheduling behaviors.
6. The feedback-enhancement-based intelligent power scheduling method of claim 4, wherein the calculating a training reward value based on the grid operation feedback data comprises:
acquiring an initial reward value according to the grid operation feedback data and a preset reward function;
and when the initial reward value is lower than a third preset threshold value, performing reduction processing on the initial reward value to obtain the training reward value.
7. The feedback-enhancement-based intelligent power scheduling method of claim 1, wherein after the power scheduling is performed on the power grid system at the second time step based on the target scheduling behavior set, the method further comprises:
acquiring each scheduling behavior set output by the reinforcement learning model in a preset time period;
obtaining scoring results for each scheduling behavior set;
optimizing the reinforcement learning model based on the scoring result.
8. An intelligent power scheduling apparatus based on feedback enhancement, comprising:
the system comprises a parameter acquisition module, a power supply module and a power supply module, wherein the parameter acquisition module is used for acquiring a first state parameter of a power grid system, the first state parameter is a state parameter of the power grid system at a first time step, and the state parameter of the power grid system comprises an electricity demand parameter and an equipment operation parameter of the power grid system;
the model processing module is used for inputting the first state parameters into a trained reinforcement learning model, and obtaining a target scheduling behavior set output by the reinforcement learning model, wherein the target scheduling behavior set comprises at least one scheduling behavior and priorities corresponding to the scheduling behaviors respectively;
the scheduling module is used for performing power scheduling on the power grid system in a second time step based on the target scheduling behavior set, and the interval between the first time step and the second time step is a preset duration;
the reward value adopted in the reinforcement learning model training process is obtained based on grid operation feedback data of scheduling behaviors output in the reinforcement learning model training process, and the grid operation feedback data at least comprises energy consumption data, economic benefit data and operation stability data of the grid system.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the feedback-enhanced intelligent power scheduling method of any one of claims 1 to 7 when the program is executed.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the feedback-enhanced intelligent power scheduling method of any one of claims 1 to 7.
CN202310622150.5A 2023-05-29 2023-05-29 Feedback enhancement-based intelligent power scheduling method, device, equipment and medium Pending CN116703074A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310622150.5A CN116703074A (en) 2023-05-29 2023-05-29 Feedback enhancement-based intelligent power scheduling method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310622150.5A CN116703074A (en) 2023-05-29 2023-05-29 Feedback enhancement-based intelligent power scheduling method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116703074A true CN116703074A (en) 2023-09-05

Family

ID=87824979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310622150.5A Pending CN116703074A (en) 2023-05-29 2023-05-29 Feedback enhancement-based intelligent power scheduling method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116703074A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910490B (en) * 2023-09-08 2023-12-15 山东建筑大学 Method, device, equipment and medium for adjusting environment of agricultural greenhouse


Similar Documents

Publication Publication Date Title
US11888316B2 (en) Method and system of predicting electric system load based on wavelet noise reduction and EMD-ARIMA
CN112329948B (en) Multi-agent strategy prediction method and device
CN110781969B (en) Air conditioner air volume control method, device and medium based on deep reinforcement learning
CN111061564A (en) Server capacity adjusting method and device and electronic equipment
CN116703009A (en) Operation reference information generation method of photovoltaic power generation energy storage system
CN107871157B (en) Data prediction method, system and related device based on BP and PSO
CN113627533B (en) Power equipment overhaul decision generation method based on reinforcement learning
CN113868953A (en) Multi-unit operation optimization method, device and system in industrial system and storage medium
CN116703074A (en) Feedback enhancement-based intelligent power scheduling method, device, equipment and medium
CN110826196B (en) Industrial equipment operation data processing method and device
CN112488404A (en) Multithreading efficient prediction method and system for large-scale power load of power distribution network
CN116113038A (en) Channel access and energy scheduling method and device based on deep reinforcement learning
CN114759579A (en) Power grid active power optimization control system, method and medium based on data driving
CN114219274A (en) Workshop scheduling method adapting to machine state based on deep reinforcement learning
CN113555876A (en) Line power flow regulation and control method and system based on artificial intelligence
CN117220286B (en) Risk assessment method, device and medium for water-wind-solar multi-energy complementary system
Saraiva et al. New advances in integrating fuzzy data in Monte Carlo simulation to evaluate reliability indices of composite power systems
CN117200201A (en) Method, system, equipment and storage medium for daily optimal scheduling of power grid
CN115878284A (en) Method, device, equipment and storage medium for predicting completion time of simulation task
CN116885694A (en) Emergency load decision method and decision model construction method and construction equipment thereof
CN116933453A (en) Base station energy-saving model training method and device, and base station energy-saving method and device
CN116388165A (en) Short-term electricity load prediction method based on LSTM-elastic Net model
CN114139791A (en) Wind generating set power prediction method, system, terminal and storage medium
CN117745419A (en) Automatic operation method, device, equipment and computer readable storage medium
CN116341727A (en) Wind power determination method and device based on probability prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination