WO2023134759A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Publication number
WO2023134759A1
Authority
WO
WIPO (PCT)
Prior art keywords
policy
scheduling
strategy
processed
initial
Prior art date
Application number
PCT/CN2023/072238
Other languages
French (fr)
Chinese (zh)
Inventor
杨超
钮孟洋
韩佳澦
杨程
辛焱
Original Assignee
阿里巴巴达摩院(杭州)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴达摩院(杭州)科技有限公司
Publication of WO2023134759A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Definitions

  • The embodiments of this specification relate to the technical field of energy regulation, and in particular to a data processing method.
  • The embodiments of this specification provide a data processing method.
  • One or more embodiments of this specification also relate to another data processing method, a data processing device, another data processing device, a computing device, a computer-readable storage medium, and a computer program, so as to solve technical defects existing in the prior art.
  • a data processing method including:
  • the current operating state data of the target energy system is adjusted based on the target dispatch strategy.
  • a data processing device including:
  • a data acquisition module configured to acquire current operating state data of the target energy system
  • the first strategy acquisition module is configured to input the current operating status data into an initial decision object to obtain an initial dispatch strategy of the target energy system
  • the second strategy acquisition module is configured to input the initial dispatch strategy into the target decision-making object, and obtain the target dispatch strategy of the target energy system;
  • An adjustment module configured to adjust the current operating state data of the target energy system based on the target dispatch strategy.
  • another data processing method including:
  • the current operating state data of the target power system is adjusted based on the target dispatch strategy.
  • another data processing device including:
  • a data acquisition module configured to acquire current operating state data of the target power system
  • the first strategy acquisition module is configured to input the current operating state data into an initial decision object to obtain an initial dispatch strategy of the target power system
  • the second strategy acquisition module is configured to input the initial dispatch strategy into the target decision object, and obtain the target dispatch strategy of the target power system;
  • An adjustment module configured to adjust the current operating state data of the target power system based on the target dispatch strategy.
  • a computing device including a memory and a processor:
  • the memory is used to store computer-executable instructions;
  • the processor is used to execute the computer-executable instructions, and the steps of the data processing method are implemented when the instructions are executed by the processor.
  • a computer-readable storage medium which stores computer-executable instructions, and implements the steps of the data processing method when the computer-executable instructions are executed by a processor.
  • a computer program is provided, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method.
  • The data processing method provided in this specification includes: obtaining the current operating state data of the target energy system; inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target energy system; inputting the initial scheduling strategy into the target decision object to obtain the target dispatch strategy of the target energy system; and adjusting the current operating state data of the target energy system based on the target dispatch strategy.
  • The method processes the current operating state data of the target energy system through the initial decision object and the target decision object to obtain the target scheduling strategy of the target energy system, and then quickly adjusts the current operating state data based on that strategy. This gives the target energy system large-scale processing and rapid response capabilities and alleviates the difficulties faced by manual scheduling schemes.
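  • The four steps summarized above can be sketched as a simple chain of calls. This is only a minimal illustration: the type aliases, the function name `run_dispatch`, and the three toy stand-ins are assumptions made for the example and are not taken from the specification.

```python
from typing import Callable, Dict

State = Dict[str, float]      # hypothetical: named operating-state readings
Strategy = Dict[str, float]   # hypothetical: named setpoints, e.g. generator MW


def run_dispatch(
    state: State,
    initial_decision: Callable[[State], Strategy],
    target_decision: Callable[[Strategy], Strategy],
    apply_strategy: Callable[[State, Strategy], State],
) -> State:
    """Chain the steps: propose a strategy, refine it, then adjust the state."""
    initial_strategy = initial_decision(state)           # initial decision object
    target_strategy = target_decision(initial_strategy)  # target decision object
    return apply_strategy(state, target_strategy)        # adjust operating state


# Toy stand-ins for illustration only, not the models of the specification.
def toy_initial(state: State) -> Strategy:
    return {"gen_B_mw": state["load_mw"]}  # propose covering the load with B


def toy_target(strategy: Strategy) -> Strategy:
    return {k: min(v, 100.0) for k, v in strategy.items()}  # cap at 100 MW


def toy_apply(state: State, strategy: Strategy) -> State:
    return {**state, **strategy}


new_state = run_dispatch({"load_mw": 120.0}, toy_initial, toy_target, toy_apply)
```

  • Passing the two decision objects in as callables keeps the sketch neutral about whether each one is a model, a device, or an application program.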
  • Fig. 1 is a flow chart of a data processing method provided by an embodiment of this specification
  • Fig. 2 is a processing flow chart of a data processing method provided by an embodiment of this specification
  • Fig. 3 is a flowchart of another data processing method provided by an embodiment of this specification.
  • Fig. 4 is a schematic structural diagram of a data processing device provided by an embodiment of this specification.
  • Fig. 5 is a schematic structural diagram of another data processing device provided by an embodiment of this specification.
  • Fig. 6 is a structural block diagram of a computing device provided by an embodiment of this specification.
  • Although the terms first, second, etc. may be used to describe various information in one or more embodiments of the present specification, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another.
  • For example, the first may also be referred to as the second, and similarly, the second may also be referred to as the first, without departing from the scope of one or more embodiments of the present specification.
  • Depending on the context, the word "if" as used herein may be interpreted as "at", "when", or "in response to a determination".
  • Mathematical modeling: establishing a model of the problem to be solved by mathematical means, and solving it with a solver.
  • Reinforcement learning: a machine learning method in artificial intelligence that can learn action strategies solely from the rewards and punishments given by the environment.
  • Power flow calculation (PF, Power Flow): given the power grid structure, parameters, and the operating conditions of components such as generators and loads, the calculation that determines the steady-state operating state parameters of each part of the power system.
  • Optimal power flow calculation (OPF, Optimal Power Flow): while satisfying the steady-state power flow and the operating constraints of other equipment components, determining the generator power and the node voltage amplitude so that a chosen performance index of the system (such as power generation cost or network loss) is optimized.
  • ACOPF: an alternating-current (AC) optimal power flow calculation model.
  • DCOPF: a direct-current (DC) optimal power flow calculation model.
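  • As a reading aid, the optimal power flow definition above can be written as a generic constrained optimization problem. The symbols are assumptions introduced here, not notation from the specification: x collects the decision quantities (such as generator power and node voltage amplitude), f is the chosen performance index (such as power generation cost or network loss), g stands for the steady-state power flow equations, and h for the operating limits of equipment components.

```latex
\begin{aligned}
\min_{x}\quad & f(x) \\
\text{s.t.}\quad & g(x) = 0, \\
& h(x) \le 0 .
\end{aligned}
```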
  • The three schemes discussed in this specification are: the power flow calculation scheme, the unit commitment scheduling scheme, and the reinforcement-learning-based scheduling scheme.
  • The first scheme, optimal power flow calculation (OPF), uses mathematical modeling to determine the generator power and the node voltage amplitude, subject to the steady-state power flow and the constraints of other equipment components, so that a chosen performance index of the system (such as power generation cost or network loss) is optimized. It considers only single-step decisions, and the scheme can usually be divided into ACOPF and DCOPF.
  • Because ACOPF contains nonlinear functions in the optimization problem, it is usually slow to solve. On a power system with about 10,000 nodes, the solution time of ACOPF is about 10 minutes.
  • DCOPF simplifies and approximates the operation of the grid system on the basis of ACOPF, turning the mathematical model into a linear model. This speeds up the solution by more than a factor of two, but if multi-step DCOPF is considered, it is still too slow for time-sensitive use.
  • Both the ACOPF model and the DCOPF model use mathematical modeling to calculate the single-step power flow.
  • The advantage is that the decision-making results are safe and interpretable.
  • The disadvantage is that, in a power grid system with about 10,000 nodes, the estimated time consumption of ACOPF is about 10 minutes and that of DCOPF is about 5 minutes.
  • In addition, only the current decision is optimized, and discrete decisions such as switching units on and off cannot be made without making the solution even more time-consuming.
  • The second scheme, unit commitment scheduling (SCUC, Security Constrained Unit Commitment), determines the start-stop decision and output decision of each unit at each time step so as to optimize an overall objective over multiple time steps, and is usually used as a day-ahead scheduling plan. Although decisions can be made one day in advance, the volatility of new energy makes it difficult for day-level decision-making to cope with real-time changes.
  • Although the SCUC scheme has sufficient time for switching units on and off and for multi-step decision optimization, the volatility of new energy can only be modeled and handled more accurately during real-time or quasi-real-time scheduling.
  • Reinforcement learning can learn to optimize long-term goals in an offline, data-driven manner through a simulator. Its advantage is that, with a reasonably designed model, decisions on issues such as unit commitment, ramping, and the N-1 power grid safety criterion can be made from the perspective of long-term profit maximization, and online scheduling decisions can be made in seconds. Its disadvantage is that the resulting decisions may carry potential safety hazards.
  • The third scheme is scheduling based on reinforcement learning. Reinforcement learning is inherently suited to sequential decision-making scenarios, so it is often used to optimize long-term goals.
  • The decision-making response time of this scheme is very fast, usually at the second level.
  • However, because reinforcement learning is data-driven, it is not good at dealing with problems with hard constraints.
  • The decision result of reinforcement learning may not satisfy the security constraints, so power grid dispatching schemes based on reinforcement learning carry security risks.
  • a data processing method is provided.
  • This specification also relates to a data processing device, another data processing method, another data processing device, a computing device, a computer-readable storage medium, and a computer program, which are described in detail in the following embodiments.
  • Fig. 1 shows a flow chart of a data processing method provided according to an embodiment of this specification, which specifically includes the following steps.
  • Step 102 Obtain current operating state data of the target energy system.
  • The target energy system can be understood as a system capable of processing energy; when the data processing method provided in this specification is applied in different scenarios, the target energy system also differs.
  • For example, the target energy system can be understood as a new energy power system, such as a solar power system, a wind power system, a hydropower system, and the like.
  • Alternatively, the target energy system can be understood as an oil extraction system.
  • For a power system, the current operating state data can be understood as the grid structure, the operating state parameters of elements such as generators in the power system, and the load of the power system, including but not limited to the on/off state of each generator, the power generation efficiency of each generator, and the load of the power system.
  • For a hydroelectric power generation system, the current operating state data can be the power grid structure, the operating state parameters of components such as hydroelectric generating units, and the power supply load of the hydroelectric power generation system.
  • For a wind power generation system, the current operating state data may be the power grid structure, the operating state parameters of components such as wind power generating sets, and the power supply load of the wind power generating system.
  • For an oil extraction system, the current operating state data can be understood as the operating state parameters of components such as pump valves and oil pipelines, including but not limited to the on/off state of pump valves and the pressure in oil pipelines.
  • The following takes an electric power system as the target energy system to illustrate the data processing method provided in this specification, where the electric power system can be any new energy power system, such as a solar power generation system, a wind power generation system, a hydroelectric power generation system, or a tidal power generation system.
  • First, the current operating state data of the target energy system can be obtained.
  • For example, when the target energy system is an electric power system, the current operating state data are the power grid structure and the operating state parameters of components such as generators in the electric power system.
  • When the data processing method provided in this specification is applied to adjusting the current operating state of a power system, it obtains the current operating state parameters of the power system.
  • For example, the current operating state parameters can indicate that the current power load of the power system is relatively high, that generator A is in an off state, and that generator B and generator C are in an on state.
  • the current operating state data of the electric power system can be obtained through any method of obtaining the current operating state data of the electric power system, and this specification does not specifically limit this. For example, through various sensors configured in the power system, the current operating status data of the power system can be determined.
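  • One possible way to hold such operating state data in memory is sketched below. The class names, field names, and the numeric values (a 240 MW load and the current outputs of generators B and C) are hypothetical; the specification only states that the load is relatively high, that generator A is off, and that generators B and C are on.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GeneratorState:
    is_on: bool        # on/off state of the generator
    output_mw: float   # current power generation in MW


@dataclass(frozen=True)
class OperatingState:
    load_mw: float                        # current power load of the system
    generators: Dict[str, GeneratorState]

    def online_units(self) -> List[str]:
        """Names of the generators that are currently switched on."""
        return sorted(name for name, g in self.generators.items() if g.is_on)


state = OperatingState(
    load_mw=240.0,  # hypothetical value standing in for "relatively high"
    generators={
        "A": GeneratorState(is_on=False, output_mw=0.0),
        "B": GeneratorState(is_on=True, output_mw=100.0),   # hypothetical output
        "C": GeneratorState(is_on=True, output_mw=90.0),    # hypothetical output
    },
)
```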
  • the data processing method provided in this specification can be applied to a decision-making scheduling module that can adjust the current operating state data of the target energy system.
  • The decision-making scheduling module can obtain the target scheduling strategy through the initial decision object and the target decision object, and adjust the current operating state data of the target energy system based on the target scheduling strategy.
  • The decision scheduling module can be deployed in the target energy system, or can be independent of the target energy system.
  • For example, the decision-making scheduling module can be a decision-making scheduling platform or a decision-making scheduling server independent of the power system, or it may be a decision-making scheduling device, a decision-making scheduling server, etc. deployed in the power system. This specification does not specifically limit this.
  • Step 104 Input the current operating status data into an initial decision object to obtain an initial scheduling strategy for the target energy system.
  • the initial decision object can be understood as an object that can obtain the initial scheduling strategy of the target energy system based on the current operating state data.
  • the initial decision object can be a deep learning model, electronic equipment, or an application program.
  • The data processing method provided in this specification is described below by taking a deep reinforcement learning model as an example of the initial decision object.
  • the deep learning model can be understood as any model that can obtain the initial scheduling strategy of the target energy system based on the current operating state data.
  • the initial scheduling strategy can be understood as a strategy for adjusting the current running state data of the target object.
  • the initial scheduling strategy may be a strategy for adjusting current operating state data of the electric power system.
  • Suppose the initial decision object is a deep reinforcement learning model. After the current operating state data of the power system is obtained, it is input into the deep reinforcement learning model, which judges the current state of the power system based on that data and generates a scheduling policy. For example, when the deep reinforcement learning model judges from the current operating state data that the current power load of the power system is high, it can determine a dispatch strategy that can cope with the high power load.
  • The scheduling strategy output by the deep reinforcement learning model can be as follows: turn on generator A in the power system and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 90 MW.
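  • The example strategy above mixes a discrete decision (turning generator A on) with continuous ones (power generation in MW). A minimal sketch of a data structure that keeps both kinds of decision together is shown below; the names `UnitSetpoint` and `total_output_mw` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class UnitSetpoint:
    is_on: bool        # discrete decision: start or stop the unit
    output_mw: float   # continuous decision: power generation in MW


# The example strategy from the text: turn on A at its 50 MW minimum quota,
# set B to 100 MW and C to 90 MW.
strategy: Dict[str, UnitSetpoint] = {
    "A": UnitSetpoint(is_on=True, output_mw=50.0),
    "B": UnitSetpoint(is_on=True, output_mw=100.0),
    "C": UnitSetpoint(is_on=True, output_mw=90.0),
}


def total_output_mw(s: Dict[str, UnitSetpoint]) -> float:
    """Total scheduled generation, counting only units that are switched on."""
    return sum(u.output_mw for u in s.values() if u.is_on)
```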
  • The initial decision object may also include two sub-modules, namely a policy determination module and a policy evaluation module.
  • Through these sub-modules, the current operating state data of the target energy system is processed to obtain the initial scheduling strategy of the target energy system, which improves the processing efficiency of the subsequent target decision object and further gives the target energy system large-scale processing and rapid response capabilities.
  • The specific implementation is as follows.
  • Inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target energy system includes steps 1042 to 1046:
  • Step 1042 Input the current running state data into the policy determination module of the initial decision object to obtain the scheduling policy to be processed.
  • the policy determination module can be understood as a module in the initial decision object that can process the current running state data and obtain the scheduling policy to be processed.
  • the policy determination module may be a policy determination sub-model in the deep reinforcement learning model.
  • the current operating state data of the target energy system is input into the policy determination module of the initial decision object, and the current operating state data is processed by the policy determination module to obtain the pending scheduling strategy of the target energy system.
  • Suppose the policy determination module is a policy determination sub-model in the deep reinforcement learning model. After the current operating state data of the power system is obtained, it is input into the policy determination sub-model; when the sub-model determines that the current power load of the power system is relatively high, it determines a dispatching strategy that can cope with that load.
  • The scheduling strategy output by the policy determination sub-model can be as follows: turn on generator A in the power system and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 100 MW.
  • Step 1044 Process the scheduling policy to be processed based on the policy evaluation module of the initial decision object, and obtain a policy evaluation result corresponding to the scheduling policy to be processed.
  • The policy evaluation module can be understood as a module in the initial decision object that can evaluate the scheduling strategy.
  • For example, the policy evaluation module can be a policy evaluation sub-model in the deep reinforcement learning model.
  • The policy evaluation result can be understood as the result obtained when the policy evaluation module evaluates the scheduling policy to be processed; in practical applications, the policy evaluation result may be an evaluation score, for example 1 point or 0.9 points.
  • When the policy evaluation module is a policy evaluation sub-model, the scheduling policy to be processed can be input into the policy evaluation sub-model to obtain an evaluation result corresponding to the scheduling policy to be processed.
  • In this way, the pending scheduling policy can be evaluated by the policy evaluation module in the initial decision object to obtain the policy evaluation result corresponding to the pending scheduling policy.
  • The data processing method provided by the embodiments of this specification fixes specific variables in the scheduling strategy to be processed, that is, sets those variables to be non-adjustable.
  • the subsequent target decision-making object only needs to process the variables that can be adjusted in the initial scheduling strategy, thereby improving the processing efficiency of the target decision-making object.
  • the specific variable can be understood as a variable that has a relatively large negative impact on the effect of the scheduling strategy.
  • The variables in the scheduling policy to be processed can also be adjusted, so as to obtain the pending scheduling policy after variable adjustment.
  • The scheduling strategy to be processed and the scheduling strategy to be processed after variable adjustment can each be evaluated by the policy evaluation sub-model. Based on the evaluation result of the scheduling strategy to be processed and the evaluation result of the strategy after variable adjustment, the variable that has a greater negative impact on the effect of the scheduling strategy is determined from the scheduling strategy to be processed. Setting that variable as non-adjustable improves the processing efficiency of the subsequent target decision object and avoids the scheduling strategy being degraded because the target decision object modifies that variable.
  • the policy evaluation module based on the initial decision object processes the scheduling policy to be processed, and obtains a policy evaluation result corresponding to the scheduling policy to be processed, including:
  • the decision parameter can be understood as a variable that can be adjusted in the dispatch strategy to be processed, for example, the decision variable can be the power generation (50 MW) set for generator A in the power system in the dispatch strategy.
  • the preset parameter modification rule can be understood as a rule for modifying the decision parameters in the scheduling strategy to be processed.
  • For example, when the decision variable is the power generation (50 MW) set for generator A, the preset parameter modification rule can be to reduce the power generation of generator A in the dispatch strategy by 10 MW.
  • The decision parameters in the scheduling strategy to be processed can be determined and then adjusted based on the preset parameter modification rule, so as to obtain the adjusted pending scheduling policy.
  • The pending scheduling strategy can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 100 MW. After the policy determination sub-model obtains the dispatch strategy to be processed based on the current operating state data of the power system, it can determine the variable that can be adjusted in the dispatch strategy to be processed; this variable can be the power generation of a generator.
  • The power generation of the generators in the dispatching strategy to be processed is then adjusted to obtain the dispatching strategy to be processed after variable adjustment, which can be: generator A in the power system is set to generate at its minimum quota (50 MW), generator B is set to generate 100 MW, and generator C is set to generate 80 MW.
  • The scheduling strategy to be processed and the scheduling strategy to be processed after variable adjustment are respectively input into the policy evaluation sub-model for evaluation, so as to obtain the policy evaluation result for the pending scheduling strategy.
  • The adjusted scheduling strategy to be processed is obtained by adjusting the decision parameters in the scheduling strategy to be processed, and both strategies are input into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the scheduling strategy to be processed. This makes it convenient to subsequently generate an initial scheduling strategy based on the policy evaluation result, further improves the processing efficiency of the target decision object, and gives the target energy system large-scale processing and rapid response capabilities.
  • the inputting the pending scheduling policy and the adjusted pending scheduling policy into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the pending scheduling policy includes:
  • the first evaluation result can be understood as an evaluation result of the scheduling policy to be processed by the policy evaluation module.
  • the second evaluation result can be understood as the evaluation result of the adjusted scheduling strategy to be processed by the policy evaluation module.
  • the evaluation result can be an evaluation score, such as 1 point, 0.9 point and so on.
  • The scheduling strategy to be processed is input into the policy evaluation module of the initial decision object and evaluated by that module to obtain the first evaluation result of the scheduling strategy to be processed.
  • The scheduling strategy A to be processed after variable adjustment can be as follows: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 80 MW.
  • The scheduling strategy B to be processed after variable adjustment can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 70 MW, and set the power generation of generator C to 100 MW.
  • The scheduling strategy to be processed, the scheduling strategy A to be processed after variable adjustment, and the scheduling strategy B to be processed after variable adjustment are respectively input into the policy evaluation sub-model for evaluation to obtain the evaluation results. The evaluation result of the pending scheduling strategy can be 1 point, that of pending scheduling strategy A after variable adjustment can be 0.3 points, and that of pending scheduling strategy B after variable adjustment can be 0.9 points.
  • The three evaluation results are used as the strategy evaluation results corresponding to the scheduling strategy to be processed.
  • The policy evaluation result of the scheduling policy to be processed is determined based on the first evaluation result of the scheduling policy to be processed and the second evaluation result of the adjusted scheduling policy to be processed. This makes it convenient to generate an initial scheduling strategy based on the policy evaluation result, further improves the processing efficiency of the target decision object, and gives the target energy system large-scale processing and rapid response capabilities.
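  • The evaluate, adjust, and re-evaluate procedure described in this step can be sketched as follows. The function names and the scoring function `toy_critic` are hypothetical; in the specification the scores come from the policy evaluation sub-model of the deep reinforcement learning model, so the numbers produced here are not the 1 / 0.3 / 0.9 scores of the example.

```python
from typing import Callable, Dict

Strategy = Dict[str, float]  # generator name -> power generation in MW


def perturb(strategy: Strategy, name: str, delta_mw: float) -> Strategy:
    """Return a copy with one decision parameter changed by delta_mw."""
    adjusted = dict(strategy)
    adjusted[name] = adjusted[name] + delta_mw
    return adjusted


def evaluate_with_perturbations(
    strategy: Strategy,
    critic: Callable[[Strategy], float],
    deltas: Dict[str, float],
) -> Dict[str, float]:
    """Score the pending strategy ("base") and one adjusted copy per parameter."""
    results = {"base": critic(strategy)}
    for name, delta in deltas.items():
        results[name] = critic(perturb(strategy, name, delta))
    return results


# Hypothetical critic: the score falls as total output drifts from a 250 MW load.
def toy_critic(s: Strategy) -> float:
    return max(0.0, 1.0 - abs(sum(s.values()) - 250.0) / 100.0)


pending = {"A": 50.0, "B": 100.0, "C": 100.0}
# Modification rule assumed here: lower B by 30 MW and C by 20 MW, one at a time.
scores = evaluate_with_perturbations(pending, toy_critic, {"B": -30.0, "C": -20.0})
```

  • Each entry of `scores` other than `"base"` plays the role of a second evaluation result, and the `"base"` entry plays the role of the first evaluation result.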
  • Step 1046 Determine an initial scheduling strategy based on the scheduling strategy to be processed and the corresponding strategy evaluation result.
  • Determining an initial scheduling strategy based on the scheduling strategy to be processed and the corresponding strategy evaluation result includes:
  • The first parameter can be understood as a decision variable that affects long-term effects.
  • The preset determination condition can be set according to the actual application scenario; for example, the preset determination condition can be to select discrete decision variables. The second parameter can be understood as a discrete decision variable determined from the scheduling strategy to be processed based on the preset determination condition.
  • Fixed parameters can be understood as parameters that cannot be modified or adjusted.
  • For example, the strategy evaluation results can be: 1 point for the scheduling policy to be processed, 0.3 points for the scheduling policy A to be processed after variable adjustment, and 0.9 points for the scheduling policy B to be processed after variable adjustment.
  • Based on the strategy evaluation results, the power generation of generator B (100 MW) in the dispatching strategy to be processed is determined as the decision variable affecting the long-term effect.
  • the power generation capacity 50 megawatts
  • the decision variables and discrete decision variables that affect the long-term effect in the pending scheduling strategy are set as fixed parameters, that is, decision variables that cannot be modified and adjusted, so as to obtain the initial scheduling strategy.
  • the initial scheduling strategy is obtained by setting, as fixed parameters, the first parameter determined based on the strategy evaluation result and the second parameter determined based on the preset determination condition in the scheduling strategy to be processed, thereby reducing the workload of the subsequent target decision object and increasing its processing speed.
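  • As an illustration of the fixing step described above, the following minimal Python sketch marks the first parameter and the second parameter of a scheduling strategy to be processed as fixed. The function name `build_initial_strategy`, the dictionary layout, and the choice of which generators are passed as first and second parameters are assumptions made for illustration; the specification does not prescribe a data format.

```python
def build_initial_strategy(pending, first_params, second_params):
    """Mark the selected decision variables of a pending strategy as fixed.

    pending       -- dict mapping a decision-variable name to its value (MW)
    first_params  -- names of variables judged to affect the long-term effect
    second_params -- names of variables selected by the preset condition
                     (for example, discrete decision variables)
    """
    fixed = set(first_params) | set(second_params)
    unknown = fixed.difference(pending)
    if unknown:
        raise KeyError(f"not in the pending strategy: {sorted(unknown)}")
    # Fixed variables keep their value and may not be changed downstream.
    return {
        name: {"value": value, "fixed": name in fixed}
        for name, value in pending.items()
    }


# Hypothetical usage with a three-generator example (values in MW).
pending = {"A": 50, "B": 100, "C": 100}
initial = build_initial_strategy(pending, first_params={"C"}, second_params={"A"})
free = [name for name, var in initial.items() if not var["fixed"]]
```

  • In this hypothetical usage, generators A and C are held fixed and only generator B remains adjustable by the target decision object.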
  • Step 106 Input the initial dispatch strategy into the target decision object to obtain the target dispatch strategy of the target energy system.
  • the initial decision object can very quickly generate a scheduling strategy based on the current operating state data of the target energy system to adjust the current operating state data of the target energy system.
  • the initial decision object may not be good at handling problems with hard constraints. The hard constraints can be designed according to the actual application scenario; for example, when the target energy system is a power system, a hard constraint can be that the lines in the power system must not exceed their limits, or that the voltage in the grid must not exceed a preset voltage threshold.
  • in the data processing method provided in this specification, after the initial decision object generates an initial scheduling strategy based on the current operating state data, the initial scheduling strategy is input into the target decision object and adjusted by the target decision object, so as to avoid the target energy system violating the hard constraints because of the scheduling strategy.
  • the target decision-making object can be understood as an object that can obtain the target energy system's target scheduling strategy based on the initial scheduling strategy.
  • the target decision object can be a mathematical model, an electronic device, an application program, or the like.
  • the data processing method provided in this specification is described below by taking a mathematical model as an example of the target decision object.
  • the specific variables (discrete decision variables and decision variables affecting long-term effects) in the initial scheduling strategy have been fixed.
  • the initial scheduling strategy is input into the mathematical model, and the unfixed decision variables in the initial scheduling strategy are adjusted and modified through the mathematical model, which performs a further single-step optimization and guarantees the physical safety constraints, so as to obtain the target scheduling strategy.
  • the initial scheduling strategy can be: generator A is turned on and set to generate at its minimum quota (50 MW), the power generation of generator B is set to 100 MW, and the power generation of generator C is set to 100 MW.
  • the power generation capacity of the generator A (50 megawatts) and the power generation capacity of the generator C (100 megawatts) have been fixed.
  • the initial dispatch strategy is input into the mathematical model, and the power generation of generator B (100 MW) in the initial dispatch strategy is adjusted through the mathematical model, so as to obtain the target dispatch strategy.
  • the target scheduling strategy can be: generator A is turned on, and the generator A is set to generate a minimum quota of power generation (50 megawatts), the power generation of generator B is set to 80 megawatts, and the power generation of generator C is set to 100 megawatts.
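  • The adjustment of unfixed variables can be sketched as follows. This is only a toy stand-in for the mathematical model: it assumes a single hypothetical hard constraint (a cap on total generation, `total_limit_mw`) and lowers the unfixed variables by the smallest equal amount that satisfies the cap. The function name and the 230 MW cap are assumptions chosen so that the result matches the 100 MW to 80 MW adjustment of generator B in the example; neither comes from the specification.

```python
def adjust_unfixed(strategy, fixed, total_limit_mw):
    """Reduce only the unfixed variables until total generation meets the cap.

    This is a toy stand-in for the mathematical model: the excess over the
    cap is split equally among the unfixed variables, and fixed variables
    are never changed. Lower bounds on generation are ignored here.
    """
    free = [name for name in strategy if name not in fixed]
    excess = sum(strategy.values()) - total_limit_mw
    if excess <= 0:
        return dict(strategy)  # hard constraint already satisfied
    if not free:
        raise ValueError("constraint violated but every variable is fixed")
    share = excess / len(free)
    return {
        name: value - share if name in free else value
        for name, value in strategy.items()
    }


# Hypothetical usage: A and C are fixed, the assumed cap is 230 MW.
initial = {"A": 50, "B": 100, "C": 100}
target = adjust_unfixed(initial, fixed={"A", "C"}, total_limit_mw=230)
```

  • Because the fixed variables are excluded from the search, the model only has to solve for the remaining variables, which is the source of the speed-up described for the target decision object.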
  • Step 108 Adjust the current operating state data of the target energy system based on the target dispatch strategy.
  • the target scheduling strategy is: generator A is turned on and set to generate at its minimum quota (50 MW), the power generation of generator B is set to 80 MW, and the power generation of generator C is set to 100 MW.
  • based on the target scheduling strategy, generator A in the power system is turned on and set to generate at its minimum quota (50 MW), the power generation of generator B is set to 80 MW, and the power generation of generator C is set to 100 MW.
  • the current operating state data of the target energy system is processed through the initial decision object and the target decision object to obtain the target scheduling strategy of the target energy system, and the current operating state data of the target energy system is quickly adjusted through the target scheduling strategy. This gives the target energy system large-scale processing and rapid response capabilities and alleviates the difficulties faced by manual scheduling schemes.
  • the current target dispatch strategy and the current operating state data of the power system can be saved.
  • the saved target scheduling strategy is used as an initial value, so as to quickly generate a new scheduling strategy based on the initial value, thereby improving the efficiency of generating the scheduling strategy.
  • the deep reinforcement learning model can refer to the historically saved dispatch strategy, and obtain the dispatch strategy for the power system from the historically saved dispatch strategy and the current operating state data of the power system.
  • the dispatching strategy can also be sent to the policy update object, which modifies the dispatching strategy based on the current demand conditions of the power system; the modified dispatching strategy is then input into the mathematical model to generate the target dispatching strategy, so as to improve how well the dispatching strategy fits the power system.
  • Inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target energy system includes:
  • the historical scheduling policy is a historical target scheduling policy obtained based on a target decision object
  • An initial scheduling policy sent by the policy update object is received, wherein the initial scheduling policy is obtained by updating the scheduling policy to be updated based on preset update conditions by the policy update object.
  • the historical operating state data can be understood as operating state data of the target energy system that was previously acquired and saved.
  • the historical scheduling strategy can be understood as a previously saved target scheduling strategy.
  • the similarity can be understood as a numerical value indicating how similar the historical operating state data is to the current operating state data, for example, any value in the interval [0, 1] or [0, 100].
  • Similar operating state data can be understood as historical operating state data that is most similar to current operating state data.
  • the scheduling strategy to be updated can be understood as a scheduling strategy that needs to be updated by a policy update object.
  • the policy update object can be understood as an object that can update the scheduling strategy to be updated based on preset update conditions.
  • the policy update object can be understood as a power system
  • the preset update condition can be understood as a requirement on the power system, for example, a requirement on the power supply load of the power system, on the power consumption load of the power system, on the voltage of the transmission lines of the power system, or on switching specific equipment in the power system on or off. In actual application, this requirement can be set according to the actual application scenario, and this specification does not specifically limit it.
  • the similarity can be obtained with tools such as neural network models, programs, or robots.
  • the maximum similarity is determined, and the historical operating state data corresponding to the maximum similarity is used as the similar operating state data of the current operating state data.
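  • The selection of similar operating state data can be sketched as follows. The similarity measure used here, 1 / (1 + Euclidean distance), is an assumption chosen because it yields values in (0, 1]; the specification leaves the measure open. The function names and the layout of the state vectors are likewise hypothetical.

```python
import math


def similarity(state_a, state_b):
    """Map the Euclidean distance between two state vectors into (0, 1]."""
    return 1.0 / (1.0 + math.dist(state_a, state_b))


def most_similar_record(current_state, history):
    """Return the (state, strategy) record whose state is most similar.

    history -- non-empty list of (historical_state, historical_strategy) pairs
    """
    return max(history, key=lambda record: similarity(current_state, record[0]))


# Hypothetical records: state = (load in MW, A on?, B on?, C on?).
history = [
    ((180.0, 0, 1, 1), {"A": 0, "B": 90, "C": 90}),
    ((240.0, 1, 1, 1), {"A": 50, "B": 80, "C": 100}),
]
current = (235.0, 1, 1, 1)
similar_state, historical_strategy = most_similar_record(current, history)
```

  • The historical scheduling strategy returned for the most similar record can then be passed, together with the current operating state data, to the initial decision object.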
  • the policy update object can update the scheduling policy to be updated based on preset update conditions, obtain an updated scheduling policy, and use the updated scheduling policy as an initial scheduling policy.
  • the historical scheduling strategy is a scheduling strategy generated by the deep reinforcement learning model and the mathematical model based on the similar operating state data.
  • the historical scheduling strategy is used as the reference value of the deep reinforcement learning model, and the scheduling strategy for the power system is obtained by inputting the historical scheduling strategy and the current operating state data of the power system into the deep reinforcement learning model.
  • the scheduling strategy is modified to obtain the modified scheduling strategy, and the modified scheduling strategy is sent to the decision-making scheduling module as a scheduling strategy that needs to be processed again by the mathematical model.
  • the decision-making scheduling module can receive the scheduling strategy returned by the operation and maintenance personnel of the power system, and the scheduling strategy can then be input into the mathematical model for processing.
  • the historical operating state data with the maximum similarity to the current operating state data is determined as the similar operating state data of the current operating state data; the historical scheduling strategy corresponding to the similar operating state data and the current operating state data are input into the initial decision object, so that the scheduling strategy to be updated of the target energy system is obtained quickly, which improves the efficiency of generating the scheduling strategy.
  • the initial decision object needs to be generated, so that the target scheduling strategy of the target energy system can subsequently be obtained based on the initial decision object and the target decision object. The specific implementation is as follows.
  • steps 1 to 3 are also included:
  • Step 1 Determine the simulated running state data based on the state simulation module.
  • the state simulation module can be understood as a module capable of modeling the current operating state data of the target energy system, for example, the state simulation module can be a simulator.
  • the simulated operating state data is the operating state data simulated by the state simulation module.
  • the simulated operating state data may be the operating state data of the power system simulated by the simulator.
  • the simulated operating state data may be the operating state data of the oil extraction system simulated by a simulator.
  • the process of obtaining the initial decision object can be understood as the process of training the deep reinforcement learning model, so as to obtain the trained deep reinforcement learning model.
  • Determining the simulated operating state data based on the state simulation module includes:
  • Sample operating state data is determined based on the state simulation module.
  • the simulated running state data of the state simulation module is used as sample running state data for training the model.
  • Step 2 Input the simulation operation state data into the decision object to be processed to obtain a simulation scheduling strategy.
  • the decision object to be processed can be understood as a deep reinforcement learning model to be trained.
  • the simulation scheduling strategy can be understood as the scheduling strategy obtained after the deep reinforcement learning model calculates the simulation running status data.
  • the decision object to be processed may include a strategy determination module. Based on this, the simulation scheduling strategy can be understood as a scheduling strategy generated by the strategy determination module based on the simulated running status data.
  • Inputting the simulated operating state data into the decision object to be processed to obtain a simulated scheduling strategy includes:
  • the sample running status data is input into the decision-making model to be processed to obtain a simulation scheduling strategy.
  • the scheduling policy output by the deep reinforcement learning model is obtained.
  • the deep reinforcement learning model includes a policy determination sub-model and a policy evaluation sub-model. Based on this, inputting the simulated operating state data into the decision object to be processed to obtain the simulated scheduling strategy can also be understood as inputting the simulated operating state data into the policy determination sub-model of the decision object to be processed, and computing the simulated scheduling strategy through the policy determination sub-model.
  • Step 3 Process the decision object to be processed based on the simulated scheduling strategy to obtain an initial decision object.
  • in the case that the initial decision object is a deep reinforcement learning model, the initial decision object can be understood as a trained deep reinforcement learning model.
  • the processing of the decision object to be processed based on the simulation scheduling strategy to obtain an initial decision object includes:
  • the decision model to be processed is trained based on the simulated scheduling strategy to obtain an initial decision model.
  • the deep reinforcement learning model is trained through the simulation scheduling policy until the training completion condition is met.
  • the processing of the decision object to be processed based on the simulation scheduling strategy to obtain an initial decision object includes:
  • the decision object to be processed is processed to obtain an initial decision object.
  • the simulation strategy evaluation result can be understood as the evaluation result of the simulation scheduling strategy by the state simulation module, for example, the simulation strategy evaluation result can be an evaluation score.
  • the simulated scheduling policy is input into the simulator.
  • the simulated scheduling strategy can be evaluated through the simulator, so as to obtain the simulated strategy evaluation result of the simulated scheduling strategy.
  • the evaluation result of the simulation strategy is the evaluation score.
  • the simulator continues to generate new simulated operating state data, the deep reinforcement learning model continues to be trained on this data, and the operation is repeated until the deep reinforcement learning model reaches the training stop condition.
  • the training stop condition can be determined based on the simulation strategy evaluation result; when it is determined that the simulation strategy evaluation result satisfies the training stop condition, it can be determined that the training of the deep reinforcement learning model has been completed.
  • the evaluation result of the simulated strategy can be any score in the interval [0,1], where 0 means that the effect of the simulated scheduling strategy is poor and 1 means that it is good. Based on this, when the simulator evaluates the most recent 10 consecutive simulated scheduling strategies, if all 10 receive a score of 1, the deep reinforcement learning model has reached the training stop condition.
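  • The stop condition in this example (the most recent 10 consecutive scores all equal to 1) can be tracked with a fixed-size window, as in the following sketch. The class name `TrainingStopCondition` and its interface are assumptions for illustration; only the window size of 10 and the score of 1 come from the example above.

```python
from collections import deque


class TrainingStopCondition:
    """Signal a stop once the last `window` scores all reach `target`."""

    def __init__(self, window=10, target=1.0):
        self.recent = deque(maxlen=window)  # keeps only the newest scores
        self.target = target

    def update(self, score):
        """Record one evaluation score and report whether to stop training."""
        self.recent.append(score)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(s >= self.target for s in self.recent)


# Hypothetical score stream from the simulator.
stop = TrainingStopCondition()
results = [stop.update(s) for s in [0.4] + [1.0] * 10]
```

  • In this usage, the stop signal is raised only on the last score, once the earlier 0.4 has dropped out of the 10-score window.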
  • the deep reinforcement learning model can include two sub-models: one is the policy determination sub-model, and the other is the policy evaluation sub-model. Therefore, training the deep reinforcement learning model can be understood as training the policy determination sub-model and the policy evaluation sub-model; the trained deep reinforcement learning model is determined based on the trained policy determination sub-model and the trained policy evaluation sub-model.
  • the specific implementation method is as follows.
  • the processing of the decision object to be processed based on the evaluation result of the simulation strategy and the simulation scheduling strategy to obtain an initial decision object includes:
  • An initial decision object is determined based on the processed policy determination module and the processed policy evaluation module.
  • the policy determination module may be a policy determination sub-model
  • the policy evaluation module may be a policy evaluation sub-model
  • after the policy determination sub-model outputs the simulated scheduling policy, the simulated scheduling policy is input into the simulator.
  • the simulated scheduling policy can be evaluated through the simulator, so as to obtain the policy evaluation result of the simulated scheduling policy.
  • after the strategy evaluation result of the simulated scheduling strategy is obtained, the simulator continues to generate new simulated operating state data, the policy determination sub-model continues to be trained on the new simulated operating state data, and the operation is repeated until the policy determination sub-model reaches the training stop condition.
  • the training stop condition may be determined based on the strategy evaluation result; if it is determined that the strategy evaluation result satisfies the training stop condition, then it may be determined that the policy determination sub-model has completed training.
  • the policy evaluation result can be any score in the interval [0,1], where 0 means that the effect of the simulated scheduling strategy is poor and 1 means that it is good. Based on this, when the simulator evaluates the most recent 10 consecutive simulated scheduling strategies, if all 10 receive a score of 1, the policy determination sub-model has reached the training stop condition.
  • the simulated scheduling policy output by the policy determination sub-model can be used as sample data, and the simulated policy evaluation result of the simulated scheduling policy can be used as the sample label; the policy evaluation sub-model is trained with the sample data and sample labels until it converges, at which point it is determined that the policy evaluation sub-model meets the training stop condition.
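  • Training an evaluator from sample data and sample labels until convergence can be sketched as follows. The sketch assumes a linear evaluator trained by gradient descent on the mean squared error and treats "convergence" as the loss changing by less than a tolerance; the actual form of the policy evaluation sub-model is not stated in the specification, and all names, data, and hyperparameters here are hypothetical.

```python
def train_evaluator(samples, labels, lr=0.1, tol=1e-9, max_epochs=10000):
    """Fit score ~ w . x + b by gradient descent until the loss stops changing.

    samples -- list of feature tuples (for example, scaled strategy values)
    labels  -- list of evaluation scores, one per sample
    """
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    prev_loss = float("inf")
    for _ in range(max_epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            loss += err * err / n
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        if abs(prev_loss - loss) < tol:  # convergence: loss no longer changes
            break
        prev_loss = loss
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical one-feature data generated from score = 0.2 + 0.6 * x.
samples = [(0.0,), (0.5,), (1.0,)]
labels = [0.2, 0.5, 0.8]
w, b = train_evaluator(samples, labels)
```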
  • the simulated operation status data determined by the state simulation module is input into the decision object to be processed to obtain the simulation scheduling strategy; and the decision object to be processed is processed based on the simulation scheduling strategy to obtain the initial decision object.
  • the target scheduling strategy of the target energy system can be obtained based on the initial decision object and the target decision object, and the current operating state data of the target energy system can be quickly adjusted through the target scheduling strategy, so that the target energy system has large-scale processing and rapid response capabilities, alleviating the difficulties faced by manual scheduling schemes.
  • FIG. 2 shows a flowchart of a processing process of a data processing method provided by an embodiment of this specification.
  • the data processing method provided in this specification in the real-time dispatching scenario of the power system is divided into two parts: an offline training part, and an online real-time/quasi-real-time dispatching part.
  • the offline training part pre-trains the deep reinforcement learning (DRL) model used in the scheduling process before the real-time scheduling of the power system, so that the trained deep reinforcement learning model can subsequently generate scheduling strategies.
  • in the online real-time/quasi-real-time scheduling part, after the training of the deep reinforcement learning model is completed, mathematical modeling is combined with reinforcement learning: the ACOPF model or DCOPF model obtained through mathematical modeling is combined with the deep reinforcement learning (DRL) model to generate dispatch decision results for the power system based on the real environmental data of the power system.
  • in the real-time scheduling scenario of the power system, the data processing method provided in this specification first uses the simulator to train the deep reinforcement learning model offline, as shown in the dotted box labeled "offline training" in Figure 2. This specifically includes the following steps.
  • Step 202 Based on the sample data provided by the simulator, train the action decision model in the deep reinforcement learning model.
  • the simulator can be understood as the state simulation module in the above embodiment;
  • the sample data can be understood as the simulated running state data in the above embodiment;
  • the deep reinforcement learning model can be understood as the initial decision object in the above embodiment;
  • the action decision model can be understood as the policy determination module in the above embodiment.
  • the deep reinforcement learning model in this embodiment itself has two sub-models, one is an action policy model (Actor), and the other is an action evaluation model (Critic).
  • the action policy model can be understood as the policy determination sub-model in the above embodiment; the action evaluation model can be understood as the policy evaluation model in the above embodiment.
  • the simulator simulates the current state St of the power system; the simulated current state St is used as a training sample and input into the action policy model to be trained in order to train it. After receiving the current state St, the action policy model to be trained responds with an action strategy and outputs the action At, where the current state St can be understood as the simulated operating state data in the above embodiment, and the action At can be understood as the simulated scheduling strategy in the above embodiment.
  • Step 204 Evaluate the action output by the action decision model through the simulator.
  • the action policy model to be trained outputs the action At
  • the action At is input into the simulator.
  • the action At can be evaluated by the simulator, so as to obtain the immediate reward Rt of the action At.
  • the immediate reward Rt can be understood as the simulation strategy evaluation result in the above embodiment.
  • the simulator continues to generate a new current state St+1, the action policy model continues to be trained on the current state St+1, and the operation is repeated until the action policy model reaches the training stop condition.
  • the training stop condition can be determined based on the immediate reward Rt; when it is determined that the immediate reward Rt satisfies the training stop condition, it can be determined that the training of the action policy model has been completed.
  • the immediate reward Rt can be any score in the interval [0,1]. Based on this, when the simulator evaluates the most recent 10 consecutive actions At, if all 10 receive a score of 1, the action decision model has reached the training stop condition.
  • Step 206 Train the action evaluation model based on the action output by the action decision model and the immediate reward of the action.
  • the action evaluation model can be understood as the policy evaluation model in the above embodiment.
  • the function of the action evaluation model is to evaluate the average reward that can be obtained after taking the action At in the state St, which includes the current immediate reward Rt and the possible average reward in the future.
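  • In standard actor-critic notation, the quantity described above can be written as follows, with S_t, A_t, and R_t corresponding to St, At, and Rt. The discount factor γ ∈ [0, 1) and the expectation over future states are assumptions of this sketch; the specification does not give a formula.

$$
Q(S_t, A_t) = \mathbb{E}\left[\, R_t + \gamma \, Q(S_{t+1}, A_{t+1}) \,\right]
$$

  • Here R_t is the immediate reward returned by the simulator, and the term γ Q(S_{t+1}, A_{t+1}) stands for the possible average reward in the future.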
  • the action At output by the action decision model is used as sample data and the immediate reward of the action is used as the sample label; the action evaluation model is trained with the sample data and sample labels until it converges, at which point it is determined that the action evaluation model has reached the training stop condition.
  • the output of the action decision model (action At) can be used as the input of the action evaluation model.
  • the action evaluation model is trained with the action At and the corresponding immediate reward Rt.
  • the action evaluation model can also be trained with the action At+1, obtained by the action decision model processing the current state St+1, and the immediate reward Rt+1 corresponding to the action At+1.
  • Step 208 Deploy the trained deep reinforcement learning model online.
  • through offline training, the deep reinforcement learning model can learn a good action policy model and the corresponding action evaluation model. After the training is completed, the trained deep reinforcement learning model can be deployed online, so that power system dispatching strategies can subsequently be generated in real time based on the trained deep reinforcement learning model.
  • the data processing method provided in this specification under the scenario of real-time scheduling of the power system after training the deep reinforcement learning model, can perform online real-time or quasi-real-time scheduling based on the deep reinforcement learning model.
  • the deep reinforcement learning model and mathematical model can be used to respond to action decisions based on the observed real power grid environment state.
  • Step 210 Obtain the real grid environment status.
  • the real power grid environment state can be understood as the current operating state data in the above embodiment.
  • the real grid environment state St is obtained, and the real grid environment state St is input into the deep reinforcement learning model.
  • the grid environment state St can be understood as the current operating state data in the above embodiment.
  • Step 212 The deep reinforcement learning model obtains an initial action strategy based on the real power grid environment state.
  • the initial action strategy can be understood as an initial scheduling strategy.
  • the action At is obtained in response to the real power grid environment state St, and the action At can be: in a power system containing generator A, generator B, and generator C, generator A is turned on and set to generate at its minimum quota (50 MW), the power generation of generator B is set to 100 MW, and the power generation of generator C is set to 100 MW.
  • the action At may be understood as the initial scheduling policy in the above embodiment.
  • the variable in the action At is adjusted to obtain the variable-adjusted action A1 and action A2.
  • the action A1 can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 80 MW;
  • the action A2 can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 70 MW, and set the power generation of generator C to 100 MW.
  • the power generation (100 MW) of generator C in the action At is determined as a decision variable affecting the long-term effect, and the power generation of generator A (50 MW) in the action At is determined as a discrete decision variable.
  • Fix the discrete decision variable and the decision variable affecting the long-term effect in the action At, that is, set the discrete decision variable and the decision variable affecting the long-term effect as unmodifiable, so as to obtain the initial action strategy, and input the initial action strategy To the mathematical model (ACOPF model or DCOPF model).
  • each continuous decision variable in the action can be perturbed individually to observe the change in the Critic's output; if the change is large, the decision variable can be considered to have a large impact on future rewards.
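  • The perturbation test described above can be sketched as follows. The function `long_term_variables`, the relative step of 20%, the threshold of 0.5, and the toy critic are all assumptions made for illustration; the coefficients of the toy critic were picked only so that one variable dominates, and a trained Critic would take their place in practice.

```python
def long_term_variables(critic, state, action, continuous, rel_step=0.2, threshold=0.5):
    """Return the continuous variables whose perturbation moves the critic most.

    critic     -- callable (state, action) -> estimated long-term score
    continuous -- names of the continuous decision variables to test
    rel_step   -- assumed relative size of each perturbation
    threshold  -- assumed minimum score change that counts as "large"
    """
    baseline = critic(state, action)
    selected = []
    for name in continuous:
        perturbed = dict(action)
        perturbed[name] = action[name] * (1.0 - rel_step)  # perturb one variable only
        if abs(critic(state, perturbed) - baseline) >= threshold:
            selected.append(name)
    return selected


def toy_critic(state, action):
    """Hypothetical critic: very sensitive to C, only mildly sensitive to B."""
    return 1.0 - 0.035 * (100 - action["C"]) - 0.1 / 30 * (100 - action["B"])


action = {"A": 50, "B": 100, "C": 100}
fixed_long_term = long_term_variables(toy_critic, None, action, ["B", "C"])
```

  • With this toy critic, lowering C by 20% changes the score by about 0.7, while lowering B by 20% changes it by less than 0.1, so only C is selected as a variable to fix.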
  • issues such as unit combination, ramping, and N-1 power grid safety criteria can be dealt with from the perspective of long-term profit maximization.
  • Step 214 Adjust other variables in the initial action strategy through a mathematical model, so as to obtain a scheduling decision result.
  • the mathematical model can be understood as the target decision model in the above embodiment, and the scheduling decision result can be understood as the target scheduling policy in the above embodiment.
  • the remaining variables in the initial action strategy are adjusted and modified through the mathematical model, which performs a further single-step optimization and guarantees the physical safety constraints, so as to obtain the target scheduling strategy.
  • the initial action strategy can be to turn on generator A, and set the generator A to generate the minimum quota (50 megawatts), set the power generation of generator B to 100 megawatts, and set the power generation of generator C to 100 megawatts.
  • the power generation capacity of the generator A (50 megawatts) and the power generation capacity of the generator C (100 megawatts) have been fixed.
  • the initial action strategy is input into the mathematical model, and the power generation of generator B (100 MW) in the initial action strategy is adjusted through the mathematical model, so as to obtain the target dispatching strategy.
  • the single-step optimization of the initial action strategy and the guarantee of physical safety constraints can avoid the problem of potential safety hazards in the decision-making results of reinforcement learning.
  • the processing speed of the mathematical model is further improved.
  • the target scheduling strategy can be: turn on generator A and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 80 MW, and set the power generation of generator C to 100 MW.
  • Step 216 Adjust the current operating state of the power system based on the scheduling decision result.
  • generator A in the power system is turned on, and the generator A is set to perform minimum quota power generation (50 megawatts), the power generation of generator B is set to 80 megawatts, and the generator C's power generation is set to 100 MW.
  • the data processing method provided in this specification combines mathematical modeling (ACOPF/DCOPF) with reinforcement learning.
  • the results of the discrete decision variables and of the decision variables that affect long-term effects are fixed, and the remaining decision variables are solved through mathematical modeling, which speeds up the solving of the mathematical model and ensures that the solution results meet the safety constraints.
  • owing to the influence of reinforcement learning, the optimization of long-term goals is also taken into account, and the final decision-making time can be kept within 5 minutes.
  • the effect of real-time/quasi-real-time dispatching is basically achieved, making the safe operation of the power grid capable of real-time dispatching.
  • the results of the entire scheduling decision also take into account the long-term overall benefits.
  • Fig. 3 shows a flow chart of another data processing method provided according to an embodiment of the present specification, which specifically includes the following steps.
  • Step 302 Obtain current operating state data of the target power system.
  • Step 304 Input the current operating state data into an initial decision object to obtain an initial dispatch strategy of the target power system.
  • Step 306 Input the initial dispatch strategy into the target decision object to obtain the target dispatch strategy of the target power system.
  • Step 308 Adjust the current operating state data of the target power system based on the target dispatch strategy.
  • the current operating state data of the target power system can be obtained and input into the initial decision object to obtain the initial dispatch strategy of the target power system; the initial dispatch strategy is then input into the target decision object to obtain the target dispatch strategy of the target power system. The current operating state data of the target power system is adjusted based on the target dispatch strategy, so that the target power system has large-scale processing and rapid response capabilities, which alleviates the difficulties faced by manual dispatching schemes.
  • another data processing method provided in this specification, applied to the scenario of adjusting the current operating state of a power system, can obtain the current operating state parameters of the power system. For example, the parameters may indicate that the current power load of the power system is high and that, among the three generators in the power system (generator A, generator B, and generator C), generator A is off while generators B and C are on.
  • after the current operating state data of the power system is obtained, it is input into the deep reinforcement learning model, which judges the current state of the power system from that data and generates a scheduling strategy.
  • when the deep reinforcement learning model judges from the current operating state data that the current power load of the power system is high, it can determine a dispatch strategy that can cope with the high power load.
  • the scheduling strategy output by the deep reinforcement learning model can be as follows: turn on generator A in the power system and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 90 MW.
  • for the case where the target scheduling strategy is: turn on generator A and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 80 MW, and set the power generation of generator C to 100 MW.
  • the initial decision object may also include two sub-modules, namely a strategy determination module and a strategy evaluation module. On this basis, the two modules can process the current operating state data of the target power system to obtain the initial dispatch strategy of the target power system, which improves the processing efficiency of the subsequent target decision object and further gives the target power system large-scale processing and rapid response capabilities.
  • the way is as follows.
  • the inputting the current operating state data into the initial decision-making object to obtain the initial dispatch strategy of the target power system includes:
  • the policy evaluation module based on the initial decision object processes the scheduling policy to be processed, and obtains a policy evaluation result corresponding to the scheduling policy to be processed;
  • An initial dispatch strategy of the target power system is determined based on the dispatch strategy to be processed and a corresponding strategy evaluation result.
  • the policy determination module is a policy determination sub-model in the deep reinforcement learning model. On this basis, after the current operating state data of the power system is obtained, it is input into the policy determination sub-model; when the sub-model determines that the current power load of the power system is relatively high, it determines a dispatch strategy to be processed that can cope with that load.
  • the scheduling strategy output by the strategy determination sub-model can be as follows: turn on generator A in the power system and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 100 MW.
  • after the strategy determination sub-model obtains the dispatch strategy to be processed based on the current operating state data of the power system, it can determine the variables that can be adjusted in that strategy; such a variable can be the power generation of a generator.
  • the variable-adjusted dispatch strategy A to be processed can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 80 MW.
  • the variable-adjusted dispatch strategy B to be processed can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 70 MW, and set the power generation of generator C to 100 MW.
  • after the variables in the scheduling strategy to be processed are adjusted, the scheduling strategy to be processed, the variable-adjusted strategy A, and the variable-adjusted strategy B are each input into the decision evaluation sub-module for evaluation, and the evaluation results are obtained;
  • the evaluation result of the scheduling strategy to be processed can be 1 point;
  • the evaluation result of the variable-adjusted scheduling strategy A can be 0.3 points, and the evaluation result of the variable-adjusted scheduling strategy B can be 0.9 points.
  • the three evaluation results are used as the strategy evaluation results corresponding to the data to be processed.
  • the power generation of generator B (100 MW) in the scheduling strategy to be processed is determined as a decision variable affecting the long-term effect.
  • the power generation capacity (50 megawatts) of the generator A in the action At is determined as a discrete decision variable.
  • Another data processing method provided in this specification processes the current operating state data of the target power system through the initial decision object and the target decision object, obtains the target dispatch strategy of the target power system, and uses that strategy to quickly adjust the current operating state data of the target power system. This gives the target power system large-scale processing and rapid response capabilities and alleviates the difficulties faced by manual dispatching schemes.
  • FIG. 4 shows a schematic structural diagram of a data processing device provided by an embodiment of this specification. As shown in Figure 4, the device includes:
  • the data acquisition module 402 is configured to acquire the current operating state data of the target energy system
  • the first strategy acquisition module 404 is configured to input the current operating state data into an initial decision object to obtain an initial dispatch strategy of the target energy system;
  • the second strategy acquisition module 406 is configured to input the initial dispatch strategy into the target decision object, and obtain the target dispatch strategy of the target energy system;
  • the adjustment module 408 is configured to adjust the current operating state data of the target energy system based on the target dispatch strategy.
  • the first policy acquisition module 404 is configured to:
  • the policy evaluation module based on the initial decision object processes the scheduling policy to be processed, and obtains a policy evaluation result corresponding to the scheduling policy to be processed;
  • An initial scheduling policy is determined based on the pending scheduling policy and a corresponding policy evaluation result.
  • the data processing device further includes a processing module configured to:
  • the decision object to be processed is processed based on the simulation scheduling strategy to obtain an initial decision object.
  • the processing module is further configured to:
  • the decision object to be processed is processed to obtain an initial decision object.
  • the processing module is further configured to:
  • An initial decision object is determined based on the processed policy determination module and the processed policy evaluation module.
  • the processing module is further configured to:
  • Sample operating state data is determined based on the state simulation module.
  • the processing module is further configured to:
  • the sample operating state data is input into the decision-making model to be processed to obtain a simulation scheduling strategy.
  • the processing module is further configured to:
  • the decision model to be processed is trained based on the simulated scheduling strategy to obtain an initial decision model.
  • the first policy acquiring module 404 is further configured to
  • the historical scheduling policy is a historical target scheduling policy obtained based on a target decision object
  • An initial scheduling policy sent by the policy update object is received, wherein the initial scheduling policy is obtained by updating the scheduling policy to be updated based on preset update conditions by the policy update object.
  • the data processing device provided in this specification processes the current operating state data of the target energy system through the initial decision object and the target decision object, obtains the target scheduling strategy of the target energy system, and uses that strategy to quickly adjust the current operating state data of the target energy system. This gives the target energy system large-scale processing and rapid response capabilities and alleviates the difficulties faced by manual scheduling schemes.
  • FIG. 5 shows a schematic structural diagram of another data processing device provided by an embodiment of this specification.
  • the device includes:
  • a data acquisition module 502 configured to acquire current operating state data of the target power system
  • the first strategy acquisition module 504 is configured to input the current operating state data into an initial decision object, and obtain an initial dispatch strategy of the target power system;
  • the second strategy acquisition module 506 is configured to input the initial dispatch strategy into the target decision object, and obtain the target dispatch strategy of the target power system;
  • the adjustment module 508 is configured to adjust the current operating state data of the target power system based on the target dispatch strategy.
  • the first policy acquisition module 504 is further configured to:
  • the policy evaluation module based on the initial decision object processes the scheduling policy to be processed, and obtains a policy evaluation result corresponding to the scheduling policy to be processed;
  • An initial dispatch strategy of the target power system is determined based on the dispatch strategy to be processed and a corresponding strategy evaluation result.
  • Another data processing device provided in this specification processes the current operating state data of the target power system through the initial decision object and the target decision object, obtains the target dispatch strategy of the target power system, and uses that strategy to quickly adjust the current operating state data of the target power system. This gives the target power system large-scale processing and rapid response capabilities and alleviates the difficulties faced by manual dispatching schemes.
  • FIG. 6 shows a structural block diagram of a computing device 600 provided according to an embodiment of this specification.
  • Components of the computing device 600 include, but are not limited to, memory 610 and processor 620 .
  • the processor 620 is connected to the memory 610 through the bus 630, and the database 650 is used for saving data.
  • Computing device 600 also includes an access device 640 that enables computing device 600 to communicate via one or more networks 660 .
  • Examples of these networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet.
  • Access device 640 may include one or more of any type of wired or wireless network interface (e.g., a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, etc.
  • the above-mentioned components of the computing device 600 and other components not shown in FIG. 6 may also be connected to each other, for example, through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 6 is only for the purpose of illustration, rather than limiting the scope of this description. Those skilled in the art can add or replace other components as needed.
  • Computing device 600 may be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile telephones (e.g., smartphones), wearable computing devices (e.g., smart watches, smart glasses, etc.), or other types of mobile devices, or stationary computing devices such as desktop computers or PCs.
  • Computing device 600 may also be a mobile or stationary server.
  • the processor 620 is configured to execute computer-executable instructions that, when executed, implement the steps of the above-mentioned data processing method.
  • An embodiment of the present specification also provides a computer-readable storage medium, which stores computer-executable instructions, and implements the steps of the above-mentioned data processing method when the computer-executable instructions are executed by a processor.
  • An embodiment of the present specification also provides a computer program, wherein, when the computer program is executed in a computer, the computer is caused to execute the steps of the above data processing method.
  • the computer instructions include computer program code, which may be in source code form, object code form, executable file or some intermediate form or the like.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc.
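
The two-stage scheme described in the list above (a learned model proposes an initial strategy, the discrete decisions are then held fixed, and a mathematical model recomputes the remaining variables) can be illustrated with a minimal Python sketch. This is a sketch under stated assumptions, not the method of the specification: the generator limits, costs, the 230 MW demand, and the names `Generator` and `solve_remaining` are hypothetical, and a cheapest-first dispatch without network constraints stands in for the ACOPF/DCOPF solve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generator:
    name: str
    p_min: float  # minimum output when the unit is on, in MW
    p_max: float  # maximum output, in MW
    cost: float   # hypothetical linear cost per MWh


def solve_remaining(gens, fixed_mw, demand_mw):
    """Dispatch the non-fixed generators cheapest first.

    Stand-in for the mathematical model: only the power balance and the
    per-unit output limits are enforced (no network constraints).
    """
    result = dict(fixed_mw)
    free = [g for g in gens if g.name not in fixed_mw]
    # Every free unit starts at its minimum output.
    for g in free:
        result[g.name] = g.p_min
    residual = demand_mw - sum(result.values())
    if residual < 0:
        raise ValueError("fixed and minimum outputs already exceed demand")
    # Fill the remaining demand in merit order.
    for g in sorted(free, key=lambda g: g.cost):
        step = min(g.p_max - g.p_min, residual)
        result[g.name] += step
        residual -= step
    if residual > 1e-9:
        raise ValueError("demand cannot be met within generator limits")
    return result


# Hypothetical unit data.
gens = [
    Generator("A", p_min=50.0, p_max=150.0, cost=40.0),
    Generator("B", p_min=20.0, p_max=120.0, cost=30.0),
    Generator("C", p_min=20.0, p_max=100.0, cost=20.0),
]
# Initial strategy from the learned model: A on at minimum quota, B 100, C 90.
initial = {"A": 50.0, "B": 100.0, "C": 90.0}
# Keep only the discrete decision fixed (A is on at its 50 MW minimum quota).
fixed = {"A": initial["A"]}
target = solve_remaining(gens, fixed, demand_mw=230.0)
print(target)
```

With these assumed numbers the recomputed strategy sets generator B to 80 MW and generator C to 100 MW, which matches the shape of the example in the list. Fixing the on/off decision beforehand is what keeps the second stage a purely continuous problem.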

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

Embodiments of the present application provide a data processing method and apparatus. The data processing method comprises: obtaining current operational state data of a target energy system; inputting the current operational state data into an initial decision object to obtain an initial scheduling policy of the target energy system; inputting the initial scheduling policy into a target decision object to obtain a target scheduling policy of the target energy system; and adjusting the current operational state data of the target energy system on the basis of the target scheduling policy. The present invention causes the target energy system to have large-scale processing and quick response capabilities, and difficulties faced by a manual scheduling scheme are solved.

Description

Data processing method and apparatus
This application claims priority to Chinese patent application No. 202210046910.8, entitled "Data processing method and apparatus", filed with the China Patent Office on January 17, 2022, the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of this specification relate to the technical field of energy regulation, and in particular to a data processing method.
Background
With the continuous development of science and technology, the energy industry is building new energy systems dominated by new energy sources, such as new power systems. However, because of the volatility and intermittency of new energy sources, power system dispatch must be able to respond quickly and to handle dispatch at large scale, and traditional manual dispatch clearly can no longer meet these requirements.
Summary of the invention
In view of this, the embodiments of this specification provide a data processing method. One or more embodiments of this specification also relate to another data processing method, a data processing device, another data processing device, a computing device, a computer-readable storage medium, and a computer program, so as to overcome technical defects in the prior art.
According to a first aspect of the embodiments of this specification, a data processing method is provided, including:
obtaining current operating state data of a target energy system;
inputting the current operating state data into an initial decision object to obtain an initial scheduling strategy of the target energy system;
inputting the initial scheduling strategy into a target decision object to obtain a target scheduling strategy of the target energy system;
adjusting the current operating state data of the target energy system based on the target scheduling strategy.
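
The four steps of the first aspect can be read as a small pipeline. The following is a minimal Python sketch of that flow, not the implementation of the specification: `dispatch_once`, the stub callables, the `LIMITS` table, and all numeric values are hypothetical. An equal split of the load stands in for the initial decision object, and a clamp to generator limits stands in for the target decision object.

```python
from typing import Callable, Dict, List

State = Dict[str, float]     # e.g. {"load_mw": 330.0}
Strategy = Dict[str, float]  # generator name -> output in MW


def dispatch_once(
    read_state: Callable[[], State],
    initial_decision: Callable[[State], Strategy],
    target_decision: Callable[[Strategy], Strategy],
    apply_strategy: Callable[[Strategy], None],
) -> Strategy:
    state = read_state()               # obtain current operating state data
    initial = initial_decision(state)  # initial scheduling strategy
    target = target_decision(initial)  # target scheduling strategy
    apply_strategy(target)             # adjust the system
    return target


# Hypothetical (p_min, p_max) limits in MW for three generators.
LIMITS = {"A": (50.0, 150.0), "B": (20.0, 120.0), "C": (20.0, 100.0)}


def read_state() -> State:
    return {"load_mw": 330.0}


def initial_decision(state: State) -> Strategy:
    # Stand-in for the learned model: split the load equally.
    share = state["load_mw"] / len(LIMITS)
    return {name: share for name in LIMITS}


def target_decision(initial: Strategy) -> Strategy:
    # Stand-in for the mathematical model: enforce the output limits only.
    return {n: min(max(p, LIMITS[n][0]), LIMITS[n][1]) for n, p in initial.items()}


applied: List[Strategy] = []
result = dispatch_once(read_state, initial_decision, target_decision, applied.append)
print(result)
```

In this toy run the clamp lowers generator C from 110 MW to its assumed 100 MW limit. A real target decision object would also rebalance the other units so that the load is still met.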
According to a second aspect of the embodiments of this specification, a data processing device is provided, including:
a data acquisition module configured to obtain current operating state data of a target energy system;
a first strategy acquisition module configured to input the current operating state data into an initial decision object to obtain an initial scheduling strategy of the target energy system;
a second strategy acquisition module configured to input the initial scheduling strategy into a target decision object to obtain a target scheduling strategy of the target energy system;
an adjustment module configured to adjust the current operating state data of the target energy system based on the target scheduling strategy.
According to a third aspect of the embodiments of this specification, another data processing method is provided, including:
obtaining current operating state data of a target power system;
inputting the current operating state data into an initial decision object to obtain an initial scheduling strategy of the target power system;
inputting the initial scheduling strategy into a target decision object to obtain a target scheduling strategy of the target power system;
adjusting the current operating state data of the target power system based on the target scheduling strategy.
According to a fourth aspect of the embodiments of this specification, another data processing device is provided, including:
a data acquisition module configured to obtain current operating state data of a target power system;
a first strategy acquisition module configured to input the current operating state data into an initial decision object to obtain an initial scheduling strategy of the target power system;
a second strategy acquisition module configured to input the initial scheduling strategy into a target decision object to obtain a target scheduling strategy of the target power system;
an adjustment module configured to adjust the current operating state data of the target power system based on the target scheduling strategy.
According to a fifth aspect of the embodiments of this specification, a computing device is provided, including:
a memory and a processor;
the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the data processing method.
According to a sixth aspect of the embodiments of this specification, a computer-readable storage medium is provided, which stores computer-executable instructions that, when executed by a processor, implement the steps of the data processing method.
According to a seventh aspect of the embodiments of this specification, a computer program is provided, wherein, when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method.
The data processing method provided in this specification includes: obtaining current operating state data of a target energy system; inputting the current operating state data into an initial decision object to obtain an initial scheduling strategy of the target energy system; inputting the initial scheduling strategy into a target decision object to obtain a target scheduling strategy of the target energy system; and adjusting the current operating state data of the target energy system based on the target scheduling strategy.
Specifically, the method processes the current operating state data of the target energy system through the initial decision object and the target decision object, obtains the target scheduling strategy of the target energy system, and uses that strategy to quickly adjust the current operating state data of the target energy system. This gives the target energy system large-scale processing and rapid response capabilities and alleviates the difficulties faced by manual scheduling schemes.
Description of the drawings
Fig. 1 is a flowchart of a data processing method provided by an embodiment of this specification;
Fig. 2 is a flowchart of the processing procedure of a data processing method provided by an embodiment of this specification;
Fig. 3 is a flowchart of another data processing method provided by an embodiment of this specification;
Fig. 4 is a schematic structural diagram of a data processing device provided by an embodiment of this specification;
Fig. 5 is a schematic structural diagram of another data processing device provided by an embodiment of this specification;
Fig. 6 is a structural block diagram of a computing device provided by an embodiment of this specification.
Detailed description
Many specific details are set forth in the following description to provide a thorough understanding of this specification. However, this specification can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its substance, so this specification is not limited by the specific implementations disclosed below.
The terms used in one or more embodiments of this specification are for the purpose of describing particular embodiments only and are not intended to limit one or more embodiments of this specification. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, the first may also be referred to as the second, and similarly the second may also be referred to as the first. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
First, the terms used in one or more embodiments of this specification are explained.
Mathematical modeling: building a model of the problem to be solved by mathematical means and solving it with a solver.
Reinforcement learning: a machine learning approach in artificial intelligence that can learn an action strategy solely from the rewards and penalties given by the environment.
Power flow calculation (PF, Power Flow): the calculation that determines the steady-state operating parameters of each part of the power system from the given grid structure, parameters, and operating conditions of components such as generators and loads.
Optimal power flow calculation (OPF, Optimal Power Flow): determining generator power and node voltage magnitudes, subject to the steady-state power flow and the operating constraints of other equipment, so that a given performance index of the system (such as generation cost or network loss) is optimized.
ACOPF: an optimal power flow calculation model.
DCOPF: an optimal power flow calculation model.
To better serve the "dual carbon" strategic goal, many energy industries are building new energy systems dominated by new energy sources. For example, the electric power industry is building a new type of power system dominated by new energy sources, but the transition faces many challenges. On the one hand, the volatility and intermittency of new energy sources force grid dispatch to respond quickly. On the other hand, the grid connection of new energy sources at large scale forces grid dispatch to handle dispatch at large scale. Traditional day-level manual dispatch clearly can no longer meet these requirements.
Against this background, this specification presents three schemes: a power flow calculation scheme, a unit commitment scheduling scheme, and a reinforcement-learning-based scheduling scheme. The first is the power flow calculation (OPF) scheme. It uses mathematical modeling to determine the generator power outputs and the node voltage magnitudes, subject to the steady-state power flow equations and the operating constraints of other equipment components, so that a chosen performance index of the system (such as generation cost or network loss) is optimized. It considers only single-step decisions, and it is usually divided into ACOPF and DCOPF.
Because the ACOPF optimization problem contains nonlinear functions, it is usually slow to solve. On a power system with 10,000 nodes, solving ACOPF takes about 10 minutes.
DCOPF simplifies and approximates the operation of the grid on the basis of ACOPF so that the mathematical model becomes linear, which speeds up the solution by roughly a factor of two. A multi-step DCOPF, however, is still slow.
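For reference, the difference between the two models can be written out with the standard textbook power flow equations. The symbols below (c_g, P_g, V_i, θ_i, G_ij, B_ij, x_ij) are conventional notation assumed for illustration and do not appear in this specification; θ_ij denotes θ_i − θ_j.

```latex
% Generic OPF objective: minimize a performance index such as generation cost
\min_{P_g,\,V,\,\theta} \; \sum_{g} c_g(P_g)

% ACOPF: nonlinear active and reactive power balance at each node i
P_i = V_i \sum_{j} V_j \left( G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij} \right), \qquad
Q_i = V_i \sum_{j} V_j \left( G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij} \right)

% DCOPF: assume V_i \approx 1, small angle differences, and negligible line resistance,
% which leaves a linear relation between injections and voltage angles
P_i \approx \sum_{j} \frac{\theta_i - \theta_j}{x_{ij}}
```

The trigonometric terms in the AC equations are the nonlinear functions that make ACOPF slow, and dropping them is what makes the DC model linear.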
The ACOPF and DCOPF models both use mathematical modeling to compute a single-step power flow. The advantage is that the decision results are safe and easy to interpret. The disadvantage is that, in a grid with 10,000 nodes, ACOPF is estimated to take about 10 minutes and DCOPF about 5 minutes. This means the current decision cannot be made from the perspective of long-term, multi-step optimization. It also means discrete decisions such as unit start-up and shutdown cannot be included, because solving would then take even longer.
The second is the unit commitment scheduling (SCUC, Security Constrained Unit Commitment) scheme. It decides the start-stop state and the output of each unit at each time step according to how good the overall objective is across multiple time steps, and it is usually used as a daily scheduling scheme. Although decisions can be made one day in advance, the volatility of new energy makes it difficult for day-level decisions to cope with real-time changes.
Although the SCUC scheme has ample time for start-stop decisions and multi-step optimization, the volatility of new energy can be modeled and handled accurately only in real-time or near-real-time scheduling. Reinforcement learning can learn and optimize long-term objectives offline, in a data-driven manner, by means of a simulator. Its advantage is that, with a well-designed model, it can make decisions on unit commitment, ramping, and the N-1 grid security criterion from the perspective of maximizing long-term return, and it can make online scheduling decisions within seconds. Its disadvantage is that the decision results may carry safety risks.
The third is the reinforcement-learning-based scheduling scheme. Reinforcement learning is suited to sequential decision-making and is therefore often used to optimize long-term objectives. Its decision response time is very short, usually on the order of seconds. However, because reinforcement learning learns in a data-driven manner, it is poor at handling problems with hard constraints. In grid dispatch specifically, the decisions produced by reinforcement learning may fail to satisfy the security constraints, so a dispatch scheme based entirely on reinforcement learning carries safety risks.
In view of this, this specification provides a data processing method. It also relates to a data processing apparatus, another data processing method, another data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, each of which is described in detail in the following embodiments.
Fig. 1 shows a flowchart of a data processing method according to an embodiment of this specification, which includes the following steps.
Step 102: Obtain current operating state data of a target energy system.
The target energy system can be understood as a system capable of processing energy, and it differs depending on the scenario in which the data processing method is applied. For example, when the method is applied to a new energy scenario, the target energy system can be a new energy power system, such as a solar power generation system, a wind power generation system, or a hydroelectric power generation system. When the method is applied to oil extraction, the target energy system can be an oil extraction system.
When the target energy system is a power system, the current operating state data can be understood as the operating state parameters of the grid structure and of components such as generators in the power system, together with parameters such as the load of the power system. These include, but are not limited to, the on/off state of a generator, the generation efficiency of a generator, and the load of the power system. In practice, when the power system is a hydroelectric power generation system, the current operating state data can be the operating state parameters of the grid structure and of components such as hydroelectric generating units, together with parameters such as the supply load of the system. Likewise, when the power system is a wind power generation system, the current operating state data can be the operating state parameters of the grid structure and of components such as wind turbines, together with parameters such as the supply load of the system.
When the target energy system is an oil extraction system, the current operating state data can be understood as the operating state parameters of components such as pumps, valves, and oil pipelines in the oil extraction system, including but not limited to the on/off state of the pumps and valves and the pressure of the oil pipelines.
For ease of understanding, the data processing method is described below using a power system as the target energy system. The power system can be any new energy power system, such as a solar, wind, hydroelectric, or tidal power generation system.
Specifically, the data processing method provided in this specification obtains the current operating state data of the target energy system.
Obtaining the current operating state data is further described below using a scenario in which the data processing method is applied to adjust the current operating state of a power system. Here the target energy system is a power system, and the current operating state data are the operating state parameters of the grid structure and of components such as generators in the power system.
In this scenario, the data processing method obtains the current operating state parameters of the power system. These parameters may indicate that the current power load of the power system is high and that, of the three generators in the power system (generator A, generator B, and generator C), generator A is off while generators B and C are on.
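The operating state in this example can be captured in a small data structure. The following is a minimal sketch only: the class names, the field names, and the output values for generators B and C are hypothetical and are not specified by this description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class GeneratorState:
    """Operating state of one generator (field names are hypothetical)."""
    is_on: bool
    output_mw: float  # current power output in megawatts


@dataclass(frozen=True)
class OperatingState:
    """Snapshot of the power system's current operating state."""
    load_level: str  # e.g. "high"; a real system would likely use measured values
    generators: Dict[str, GeneratorState] = field(default_factory=dict)


# The example from the text: high load, generator A off, generators B and C on.
# The 80 MW outputs for B and C are placeholders, not values from the text.
current_state = OperatingState(
    load_level="high",
    generators={
        "A": GeneratorState(is_on=False, output_mw=0.0),
        "B": GeneratorState(is_on=True, output_mw=80.0),
        "C": GeneratorState(is_on=True, output_mw=80.0),
    },
)

# List the generators that are currently running.
running = sorted(name for name, g in current_state.generators.items() if g.is_on)
print(running)  # ['B', 'C']
```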
In a specific implementation, the current operating state data of the power system can be obtained in any suitable manner, and this specification places no specific limitation on it. For example, the data can be determined by means of the various sensors deployed in the power system.
In addition, the data processing method provided in this specification can be applied in a decision scheduling module capable of adjusting the current operating state data of the target energy system. The decision scheduling module obtains a target scheduling policy by means of an initial decision object and a target decision object, and adjusts the current operating state data of the target energy system based on that target scheduling policy. The decision scheduling module can be deployed inside the target energy system or be independent of it. In practice, the decision scheduling module can be a decision scheduling platform or decision scheduling server independent of the power system, or it can be a decision scheduling device, decision scheduling server, or the like deployed within the power system. This specification places no specific limitation on this.
Step 104: Input the current operating state data into an initial decision object to obtain an initial scheduling policy for the target energy system.
The initial decision object can be understood as an object capable of obtaining the initial scheduling policy of the target energy system from the current operating state data. In practice, the initial decision object can be a deep learning model, an electronic device, an application program, or the like. For ease of understanding, the data processing method is described below using a deep reinforcement learning model as the initial decision object. In that case, the deep reinforcement learning model can be any model capable of obtaining the initial scheduling policy of the target energy system from the current operating state data.
The initial scheduling policy can be understood as a policy for adjusting the current operating state data of the target object. For example, when the target energy system is a power system, the initial scheduling policy can be a policy for adjusting the current operating state data of the power system.
Continuing the example above, the initial decision object is a deep reinforcement learning model. After the current operating state data of the power system are obtained, they are input into the deep reinforcement learning model, which assesses the current state of the power system from these data and generates a scheduling policy. For example, when the model determines from the current operating state data that the current power load of the power system is high, it can determine a scheduling policy that copes with the high load. The scheduling policy output by the model may be: start generator A and set it to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 90 MW.
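The mapping from operating state to scheduling policy can be sketched as a simple callable interface. The sketch below assumes a policy is represented as generator name mapped to target output in MW; the type aliases, the rule-based stand-in function, and the state layout are all hypothetical and only mimic the output of the example, they are not the deep reinforcement learning model itself.

```python
from typing import Callable, Dict

# A scheduling policy is modelled here as generator name -> target output in MW.
Policy = Dict[str, float]
# An operating state is reduced to a plain dict for this sketch.
State = Dict[str, object]
# The initial decision object is any callable from state to policy.
DecisionObject = Callable[[State], Policy]


def toy_initial_decision_object(state: State) -> Policy:
    """Hypothetical rule-based stand-in for the deep reinforcement learning model."""
    if state["load_level"] == "high":
        # Values from the example: start A at its 50 MW minimum, B at 100, C at 90.
        return {"A": 50.0, "B": 100.0, "C": 90.0}
    # Otherwise keep the current outputs unchanged.
    return dict(state["outputs_mw"])


state = {"load_level": "high", "outputs_mw": {"A": 0.0, "B": 80.0, "C": 80.0}}
decide: DecisionObject = toy_initial_decision_object
initial_policy = decide(state)
print(initial_policy)  # {'A': 50.0, 'B': 100.0, 'C': 90.0}
```

Treating the decision object as a callable keeps the sketch agnostic about whether it is a model, a device, or an application program, as the description allows.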
In an embodiment of this specification, the initial decision object can further include two sub-modules: a policy determination module and a policy evaluation module. The current operating state data of the target energy system can then be processed by these two modules to obtain the initial scheduling policy of the target energy system. This improves the processing efficiency of the subsequent target decision object and further gives the target energy system the capability for large-scale processing and rapid response. A specific implementation is as follows.
Inputting the current operating state data into the initial decision object to obtain the initial scheduling policy of the target energy system includes steps 1042 to 1046:
Step 1042: Input the current operating state data into the policy determination module of the initial decision object to obtain a pending scheduling policy.
The policy determination module can be understood as the module of the initial decision object that processes the current operating state data and obtains the pending scheduling policy. When the initial decision object is a deep reinforcement learning model, the policy determination module can be a policy determination sub-model of that model.
Specifically, the current operating state data of the target energy system are input into the policy determination module of the initial decision object, which processes them to obtain the pending scheduling policy of the target energy system.
Continuing the example above, the policy determination module is the policy determination sub-model of the deep reinforcement learning model. After the current operating state data of the power system are obtained, they are input into the policy determination sub-model, which, when the current power load of the power system is high, determines a scheduling policy that copes with that load. The scheduling policy output by the policy determination sub-model may be: start generator A and set it to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 100 MW.
Step 1044: Process the pending scheduling policy with the policy evaluation module of the initial decision object to obtain a policy evaluation result corresponding to the pending scheduling policy.
The policy evaluation module can be understood as the module of the initial decision object that evaluates scheduling policies. When the initial decision object is a deep reinforcement learning model, the policy evaluation module can be a policy evaluation sub-model of that model.
The policy evaluation result is the result obtained when the policy evaluation module evaluates the pending scheduling policy. In practice, the policy evaluation result can be an evaluation score, for example 1 or 0.9. When the policy evaluation module is a policy evaluation sub-model, the pending scheduling policy can be input into the policy evaluation sub-model to obtain the corresponding evaluation result.
Specifically, after the pending scheduling policy is obtained through the policy determination module of the initial decision object, it can be evaluated by the policy evaluation module of the initial decision object to obtain the corresponding policy evaluation result.
In practice, to improve the processing efficiency of the target decision object and give the target energy system the capability for large-scale processing and rapid response, the data processing method provided in the embodiments of this specification fixes specific variables in the pending scheduling policy, that is, it sets them as non-adjustable. The subsequent target decision object then needs to process only the variables in the initial scheduling policy that remain adjustable, which improves its processing efficiency. A specific variable here can be understood as a variable that has a large negative impact on the effect of the scheduling policy.
Accordingly, after the pending scheduling policy is obtained through the policy determination sub-model, the variables in that scheduling policy can also be adjusted to obtain a variable-adjusted pending scheduling policy.
After the variables in the pending scheduling policy are adjusted, the policy evaluation sub-model can evaluate the pending scheduling policy and the variable-adjusted pending scheduling policy separately. Based on the two evaluation results, the variables that have a large negative impact on the effect of the scheduling policy are identified in the pending scheduling policy. These variables are then set as non-adjustable, which improves the processing efficiency of the subsequent target decision object and avoids the poor scheduling results that would follow if the target decision object modified them.
The manner of adjusting the variables in the pending scheduling policy, and of evaluating the pending scheduling policy and the variable-adjusted pending scheduling policy, is as follows.
Processing the pending scheduling policy with the policy evaluation module of the initial decision object to obtain the policy evaluation result corresponding to the pending scheduling policy includes:
modifying a decision parameter in the pending scheduling policy based on a preset parameter adjustment rule to obtain an adjusted pending scheduling policy;
inputting the pending scheduling policy and the adjusted pending scheduling policy into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the pending scheduling policy.
The decision parameter can be understood as a variable in the pending scheduling policy that can be adjusted. For example, the decision parameter can be the output (50 MW) that the scheduling policy sets for generator A in the power system.
The preset parameter adjustment rule can be understood as a rule for modifying the decision parameters in the pending scheduling policy. For example, when the decision parameter is the output set for generator A (50 MW), the preset parameter adjustment rule can be to lower the output of generator A in the scheduling policy by 10 MW.
Specifically, after the pending scheduling policy is obtained through the policy determination module of the initial decision object, the decision parameters in the pending scheduling policy can be identified and adjusted based on the preset parameter adjustment rule to obtain the adjusted pending scheduling policy.
The pending scheduling policy and the adjusted pending scheduling policy are then input into the policy evaluation module of the initial decision object, which evaluates both to obtain the policy evaluation result of the pending scheduling policy.
Continuing the example above, the decision parameter can be a variable in the pending scheduling policy that can be adjusted, and the pending scheduling policy can be: set generator A to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 100 MW. After the policy determination sub-model obtains the pending scheduling policy from the current operating state data of the power system, the adjustable variables in the pending scheduling policy can be identified. Such a variable can be the output of a generator.
The generator outputs in the pending scheduling policy are then adjusted according to the preset rule for modifying variables to obtain a variable-adjusted pending scheduling policy. The variable-adjusted pending scheduling policy can be: set generator A to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 80 MW.
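The adjustment step can be sketched as a function that shifts one decision parameter and leaves the rest of the policy unchanged. This is a minimal illustration: the function name and the dictionary representation of a policy are assumptions, and the 20 MW reduction for generator C is inferred from the 100 MW and 80 MW values in the example rather than stated as a rule in the text.

```python
from typing import Dict

Policy = Dict[str, float]  # generator name -> target output in MW


def apply_adjustment_rule(policy: Policy, generator: str, delta_mw: float) -> Policy:
    """Return a copy of the policy with one decision parameter shifted by delta_mw."""
    adjusted = dict(policy)  # copy so the pending policy itself is left untouched
    adjusted[generator] = policy[generator] + delta_mw
    return adjusted


# Pending policy from the example: A at 50 MW, B at 100 MW, C at 100 MW.
pending = {"A": 50.0, "B": 100.0, "C": 100.0}

# Lower generator C by 20 MW to reproduce the adjusted policy in the example.
adjusted = apply_adjustment_rule(pending, "C", -20.0)
print(adjusted)  # {'A': 50.0, 'B': 100.0, 'C': 80.0}
```

Returning a copy matters here because the original pending policy is still needed afterwards, since both versions are passed to the policy evaluation module.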
After the variables in the pending scheduling policy are adjusted, the pending scheduling policy and the variable-adjusted pending scheduling policy are each input into the policy evaluation sub-model for evaluation, which yields the policy evaluation result for the pending scheduling policy.
In the embodiments provided in this specification, the adjusted pending scheduling policy, obtained by adjusting the decision parameters in the pending scheduling policy, is input together with the pending scheduling policy into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the pending scheduling policy. This makes it possible to generate the initial scheduling policy from the policy evaluation result, further improves the processing efficiency of the target decision object, and gives the target energy system the capability for large-scale processing and rapid response.
Further, inputting the pending scheduling policy and the adjusted pending scheduling policy into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the pending scheduling policy includes:
inputting the pending scheduling policy into the policy evaluation module of the initial decision object to obtain a first evaluation result of the pending scheduling policy;
inputting the adjusted pending scheduling policy into the policy evaluation module of the initial decision object to obtain a second evaluation result of the pending scheduling policy;
determining the policy evaluation result of the pending scheduling policy based on the first evaluation result and the second evaluation result.
The first evaluation result can be understood as the policy evaluation module's evaluation of the pending scheduling policy. The second evaluation result can be understood as the policy evaluation module's evaluation of the adjusted pending scheduling policy. In practice, an evaluation result can be an evaluation score, for example 1 or 0.9.
Specifically, after the adjusted pending scheduling policy is obtained, the pending scheduling policy is input into the policy evaluation module of the initial decision object, which evaluates it to obtain the first evaluation result. The adjusted pending scheduling policy is likewise input into the policy evaluation module, which evaluates it to obtain the second evaluation result. The policy evaluation result of the pending scheduling policy is then determined based on the first evaluation result and the second evaluation result.
Continuing the example above, variable-adjusted pending scheduling policy A can be: set generator A to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 80 MW. Variable-adjusted pending scheduling policy B can be: set generator A to generate at its minimum limit (50 MW), set the output of generator B to 70 MW, and set the output of generator C to 100 MW.
The pending scheduling policy, variable-adjusted pending scheduling policy A, and variable-adjusted pending scheduling policy B are each input into the policy evaluation sub-model for evaluation. The evaluation result of the pending scheduling policy can be 1, that of variable-adjusted pending scheduling policy A can be 0.3, and that of variable-adjusted pending scheduling policy B can be 0.9. The three evaluation results together are then taken as the policy evaluation result corresponding to the pending scheduling policy.
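The evaluation step can be sketched as scoring the pending policy and each adjusted variant with the same evaluator and collecting the scores. In the sketch below, the evaluator is a stub that simply looks up the example scores from the text (1, 0.3, and 0.9); it stands in for the policy evaluation sub-model, and all names are hypothetical.

```python
from typing import Callable, Dict

Policy = Dict[str, float]  # generator name -> target output in MW
Evaluator = Callable[[Policy], float]


def evaluate_policies(
    pending: Policy, variants: Dict[str, Policy], evaluator: Evaluator
) -> Dict[str, float]:
    """Score the pending policy (first result) and every adjusted variant."""
    results = {"pending": evaluator(pending)}
    for name, variant in variants.items():
        results[name] = evaluator(variant)
    return results


def stub_evaluator(policy: Policy) -> float:
    """Stand-in for the policy evaluation sub-model: returns the example scores."""
    example_scores = {
        (50.0, 100.0, 100.0): 1.0,  # pending scheduling policy
        (50.0, 100.0, 80.0): 0.3,   # variable-adjusted policy A (C lowered)
        (50.0, 70.0, 100.0): 0.9,   # variable-adjusted policy B (B lowered)
    }
    return example_scores[(policy["A"], policy["B"], policy["C"])]


pending = {"A": 50.0, "B": 100.0, "C": 100.0}
variants = {
    "adjusted_A": {"A": 50.0, "B": 100.0, "C": 80.0},
    "adjusted_B": {"A": 50.0, "B": 70.0, "C": 100.0},
}
evaluation = evaluate_policies(pending, variants, stub_evaluator)
print(evaluation)  # {'pending': 1.0, 'adjusted_A': 0.3, 'adjusted_B': 0.9}
```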
In the embodiments provided in this specification, the policy evaluation result of the pending scheduling policy is determined based on the first evaluation result and the second evaluation result. This makes it possible to generate the initial scheduling policy from the policy evaluation result, further improves the processing efficiency of the target decision object, and gives the target energy system the capability for large-scale processing and rapid response.
Step 1046: Determine the initial scheduling policy based on the pending scheduling policy and the corresponding policy evaluation result.
Specifically, determining the initial scheduling policy based on the pending scheduling policy and the corresponding policy evaluation result includes:
determining a first parameter in the pending scheduling policy based on the policy evaluation result;
determining a second parameter in the pending scheduling policy based on a preset determination condition;
setting the first parameter and the second parameter in the pending scheduling policy as fixed parameters to obtain the initial scheduling policy.
Here, the first parameter can be understood as a decision variable that affects long-term effects. The preset determination condition can be set according to the actual application scenario; for example, the preset determination condition may be a condition for identifying discrete decision variables. The second parameter can be understood as a discrete decision variable identified in the to-be-processed scheduling strategy based on the preset determination condition. A fixed parameter can be understood as a parameter that cannot be modified or adjusted.
Continuing the above example, the strategy evaluation result includes the following: the evaluation result of the to-be-processed scheduling strategy may be 1 point, the evaluation result of the variable-adjusted to-be-processed scheduling strategy A may be 0.3 points, and the evaluation result of the variable-adjusted to-be-processed scheduling strategy B may be 0.9 points. Based on this strategy evaluation result, the power generation of generator B (100 MW) in the to-be-processed scheduling strategy is determined to be a decision variable that affects long-term effects. In addition, based on the preset determination condition, the power generation of generator A (50 MW) in the action At is determined to be a discrete decision variable.
The decision variables that affect long-term effects and the discrete decision variables in the to-be-processed scheduling strategy are then set as fixed parameters, that is, decision variables that cannot be modified or adjusted, thereby obtaining the initial scheduling strategy.
In the embodiments of this specification, the initial scheduling strategy is obtained by setting, as fixed parameters, the first parameter determined from the strategy evaluation result and the second parameter determined from the preset determination condition. This reduces the workload of the subsequent target decision object and increases its processing speed.
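The parameter-fixing step above can be sketched in Python. This is a minimal illustration, not the patented implementation: the names `SchedulingStrategy`, `select_first_parameters`, and `build_initial_strategy`, the score-drop threshold of 0.5, and the mapping of variant A to generator B and variant B to generator C are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulingStrategy:
    """Decision variables plus the names of the variables that are frozen."""
    variables: dict
    fixed: set = field(default_factory=set)


def select_first_parameters(baseline_score, variant_scores, drop_threshold):
    """Pick variables whose adjustment lowers the score by at least drop_threshold.

    variant_scores maps the name of the adjusted variable to the score of the
    strategy obtained after adjusting it (a hypothetical selection rule).
    """
    return {
        name
        for name, score in variant_scores.items()
        if baseline_score - score >= drop_threshold
    }


def build_initial_strategy(pending, first_params, second_params):
    """Freeze the first and second parameters; other variables stay adjustable."""
    return SchedulingStrategy(
        dict(pending.variables), set(first_params) | set(second_params)
    )


pending = SchedulingStrategy(
    {"gen_A_mw": 50.0, "gen_B_mw": 100.0, "gen_C_mw": 100.0}
)
# Assumption: variant A (0.3 points) adjusted generator B and
# variant B (0.9 points) adjusted generator C.
first = select_first_parameters(
    1.0, {"gen_B_mw": 0.3, "gen_C_mw": 0.9}, drop_threshold=0.5
)
# Assumption: the preset determination condition marks generator A's
# on/minimum-output setting as a discrete decision variable.
second = {"gen_A_mw"}
initial = build_initial_strategy(pending, first, second)
```

Keeping the frozen names in a separate set leaves the variable values untouched, so a downstream optimizer only needs to check membership in `fixed` before changing a value.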
Step 106: Input the initial scheduling strategy into a target decision object to obtain a target scheduling strategy of the target energy system.
In practical applications, the initial decision object can very quickly generate, based on the current operating state data of the target energy system, a scheduling strategy for adjusting that data. However, the initial decision object may not be good at handling problems that contain hard constraints. The hard constraints can be designed according to the actual application scenario. For example, when the target energy system is a power system, a hard constraint may be that no line in the power system may exceed its limit, or that the voltage in the grid may not exceed a preset voltage threshold.
Based on this, in the data processing method provided in this specification, after the initial decision object generates the initial scheduling strategy from the current operating state data, the initial scheduling strategy is input into the target decision object, and the target decision object adjusts the initial scheduling strategy, so that the adjustment strategy does not cause the target energy system to violate the hard constraints.
Here, the target decision object can be understood as an object capable of obtaining the target scheduling strategy of the target energy system based on the initial scheduling strategy. In practical applications, the target decision object may be a mathematical model, an electronic device, an application program, or the like. For ease of understanding, the data processing method provided in this specification is described below taking the case where the target decision object is a mathematical model as an example.
Continuing the above example, specific variables in the initial scheduling strategy (the discrete decision variables and the decision variables that affect long-term effects) have already been fixed. Based on this, the initial scheduling strategy is input into the mathematical model, and the mathematical model adjusts and modifies the decision variables in the initial scheduling strategy that have not been fixed, so as to perform further single-step optimization and guarantee the physical safety constraints, thereby obtaining the target scheduling strategy.
Suppose the initial scheduling strategy is: turn on generator A and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 100 MW. Here, the power generation of generator A (50 MW) and the power generation of generator C (100 MW) have already been fixed. Based on this, the initial scheduling strategy is input into the mathematical model, and the mathematical model adjusts the power generation of generator B (100 MW) in the initial scheduling strategy, thereby obtaining the target scheduling strategy. The target scheduling strategy may be: turn on generator A and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 80 MW, and set the power generation of generator C to 100 MW.
In this way, single-step optimization of the initial scheduling strategy and a guarantee of the physical safety constraints are achieved, which avoids potential safety hazards in the decision results of reinforcement learning. At the same time, because the values of some decision variables have already been fixed, the processing speed of the mathematical model is further improved.
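As a rough sketch of how a mathematical model might adjust only the unfixed variable while respecting a hard constraint, the following Python function moves generator B's output as little as possible so that total generation stays within a limit. The function name, the 230 MW total limit, and the 0 to 150 MW range for generator B are hypothetical values chosen so that the result matches the 80 MW in the example; the specification does not state which constraint produces that value.

```python
def adjust_free_variable(strategy, fixed, free_name,
                         total_limit_mw, lower_mw, upper_mw):
    """Change one unfixed output minimally so the total respects a hard limit.

    Solves: minimise |x - x0| subject to
        sum(fixed outputs) + x <= total_limit_mw and lower_mw <= x <= upper_mw,
    whose solution is x0 clamped to the feasible interval.
    """
    fixed_total = sum(v for name, v in strategy.items() if name in fixed)
    headroom = total_limit_mw - fixed_total
    if headroom < lower_mw:
        raise ValueError("hard constraint cannot be met with the fixed variables")
    target = dict(strategy)
    target[free_name] = max(lower_mw, min(strategy[free_name], headroom, upper_mw))
    return target


initial_strategy = {"gen_A_mw": 50.0, "gen_B_mw": 100.0, "gen_C_mw": 100.0}
target_strategy = adjust_free_variable(
    initial_strategy,
    fixed={"gen_A_mw", "gen_C_mw"},  # already frozen in the initial strategy
    free_name="gen_B_mw",
    total_limit_mw=230.0,            # hypothetical hard constraint
    lower_mw=0.0,
    upper_mw=150.0,
)
```

Because only one variable is free, the optimization reduces to a clamp. With more free variables a real system would presumably call a linear or mixed-integer solver, but fixing variables beforehand shrinks that problem in the same way.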
Step 108: Adjust the current operating state data of the target energy system based on the target scheduling strategy.
Continuing the above example, suppose the target scheduling strategy is: turn on generator A and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 80 MW, and set the power generation of generator C to 100 MW. Based on this target scheduling strategy, generator A in the power system is turned on and set to generate at its minimum quota (50 MW), the power generation of generator B is set to 80 MW, and the power generation of generator C is set to 100 MW.
In the data processing method provided in this specification, the current operating state data of the target energy system is processed by the initial decision object and the target decision object to obtain the target scheduling strategy of the target energy system, and the current operating state data of the target energy system is then quickly adjusted according to the target scheduling strategy. This gives the target energy system large-scale processing and rapid-response capabilities and alleviates the difficulties faced by manual scheduling schemes.
In an embodiment provided in this specification, after the target scheduling strategy for the power system is obtained, the current target scheduling strategy and the current operating state data of the power system can be saved. When operating state data corresponding to this saved operating state data is obtained later, the saved target scheduling strategy is used as an initial value, so that a new scheduling strategy can be generated quickly from that initial value, which improves the efficiency of generating the scheduling strategy. Based on this, in the process of determining the scheduling strategy for the power system with the deep reinforcement learning model, the deep reinforcement learning model can refer to historically saved scheduling strategies and obtain the scheduling strategy for the power system from a historically saved scheduling strategy together with the current operating state data of the power system.
In addition, after the current operating state data of the power system is input into the deep reinforcement learning model and the scheduling strategy for the power system is obtained, the scheduling strategy can also be sent to a strategy update object. The strategy update object modifies the scheduling strategy based on the current demand conditions for the power system, and the modified scheduling strategy is input into the mathematical model to generate the target scheduling strategy, which improves how well the scheduling strategy fits the power system. The specific implementation is as follows.
The inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target energy system includes:
determining similarities between historical operating state data and the current operating state data, and determining the historical operating state data corresponding to the maximum similarity among the similarities as similar operating state data of the current operating state data;
obtaining a historical scheduling strategy corresponding to the similar operating state data, wherein the historical scheduling strategy is a target scheduling strategy previously obtained based on the target decision object;
inputting the current operating state data and the historical scheduling strategy into the initial decision object to obtain a to-be-updated scheduling strategy of the target energy system;
sending the to-be-updated scheduling strategy of the target energy system to a strategy update object;
receiving the initial scheduling strategy sent by the strategy update object, wherein the initial scheduling strategy is obtained by the strategy update object updating the to-be-updated scheduling strategy based on a preset update condition.
Here, the historical operating state data can be understood as operating state data of the target energy system that was previously acquired and saved. The historical scheduling strategy can be understood as a previously saved target scheduling strategy. The similarity can be understood as a numerical value indicating how similar the historical operating state data is to the current operating state data, for example, any value in the interval [0, 1] or [0, 100]. The similar operating state data can be understood as the historical operating state data that is most similar to the current operating state data. The to-be-updated scheduling strategy can be understood as a scheduling strategy that needs to be updated by the strategy update object. The strategy update object can be understood as an object capable of updating the to-be-updated scheduling strategy based on the preset update condition; for example, it may be operation and maintenance personnel of the power system, an operation and maintenance robot of the power system, a neural network model, or a program. The preset update condition can be understood as a demand on the power system, for example, a power supply load demand, a power consumption load demand, a voltage demand on the transmission lines of the power system, or a demand to turn specific equipment in the power system on or off. In practical applications, the demand can be set according to the actual application scenario, and this specification places no specific limitation on it.
Specifically, after the current operating state data of the target energy system is obtained, the historically saved historical operating state data can be obtained, and the similarity between the historical operating state data and the current operating state data can be determined. There may be one or more pieces of historical operating state data. In practical applications, the similarity can be obtained with tools such as a neural network model, a program, or a robot.
The maximum similarity is determined from the similarities between each piece of historical operating state data and the current operating state data, and the historical operating state data corresponding to the maximum similarity is taken as the similar operating state data of the current operating state data.
The historical scheduling strategy corresponding to the similar operating state data is determined, and the historical scheduling strategy and the current operating state data of the target energy system are input into the initial decision object, thereby obtaining the to-be-updated scheduling strategy of the target energy system.
The to-be-updated scheduling strategy is then sent to the strategy update object, and the initial scheduling strategy returned by the strategy update object is received. The strategy update object can update the to-be-updated scheduling strategy based on the preset update condition to obtain an updated scheduling strategy, and the updated scheduling strategy is used as the initial scheduling strategy.
Continuing the above example, after the current operating state data of the power system is obtained, the historically saved historical operating state data of the power system can be obtained, and the similarity between each piece of historical operating state data and the current operating state data is determined. The similarities are sorted in descending order, the maximum similarity is determined from the sorted result, and the historical operating state data corresponding to the maximum similarity is taken as the historical operating state data most similar to the current operating state data, that is, the similar operating state data.
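The retrieval of the most similar historical record described above might look like the following Python sketch. Cosine similarity is used purely as an assumed measure, since the specification leaves the similarity computation open (a neural network model, a program, and so on); the state vectors and the labels `"strategy_1"` and `"strategy_2"` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_record(current_state, history):
    """Return the (state, saved_target_strategy) pair most similar to current_state."""
    if not history:
        raise ValueError("no historical operating state data saved")
    return max(
        history,
        key=lambda record: cosine_similarity(current_state, record[0]),
    )


# Hypothetical state vectors, e.g. loads on three buses in MW.
history = [
    ([100.0, 50.0, 20.0], "strategy_1"),
    ([10.0, 80.0, 90.0], "strategy_2"),
]
similar_state, historical_strategy = most_similar_record([95.0, 55.0, 25.0], history)
```

Taking the maximum directly gives the same record as sorting the similarities in descending order and picking the first entry, without the cost of a full sort.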
The historical scheduling strategy corresponding to the similar operating state data is determined from the saved historical scheduling strategies. This historical scheduling strategy is the scheduling strategy that the deep reinforcement learning model and the mathematical model generated based on the similar operating state data.
The historical scheduling strategy is used as a reference value for the deep reinforcement learning model: the historical scheduling strategy and the current operating state data of the power system are input into the deep reinforcement learning model to obtain the scheduling strategy for the power system.
In addition, after the scheduling strategy for the power system is obtained, in order to match the scheduling strategy more closely to the current demand of the power system, the scheduling strategy also needs to be sent to the operation and maintenance personnel of the power system. The operation and maintenance personnel modify the scheduling strategy according to the current demand of the power system (the power supply load demand) to obtain a modified scheduling strategy, and send the modified scheduling strategy to the decision scheduling module as the scheduling strategy that the mathematical model needs to process further.
The decision scheduling module can receive the scheduling strategy returned by the operation and maintenance personnel of the power system and can subsequently input it into the mathematical model for processing.
In the embodiments of this specification, the historical operating state data corresponding to the maximum similarity between the historical operating state data and the current operating state data is determined as the similar operating state data of the current operating state data, and the historical scheduling strategy corresponding to the similar operating state data is input into the initial decision object together with the current operating state data. The to-be-updated scheduling strategy of the target energy system is thus obtained quickly, which improves the efficiency of generating the scheduling strategy.
Furthermore, sending the to-be-updated scheduling strategy of the target energy system to the strategy update object, and receiving the initial scheduling strategy that the strategy update object obtains by updating the to-be-updated scheduling strategy based on the preset update condition, helps the subsequently generated scheduling strategy fit the power system better.
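The data flow of this embodiment (historical strategy as a reference, then revision by the strategy update object) can be outlined as below. Both `toy_initial_decision` and `make_override_updater` are hypothetical stand-ins: the first merely rescales the historical strategy in place of a real deep reinforcement learning model, and the second forces selected variables to required values in place of a human operator or program.

```python
def generate_initial_strategy(current_state, historical_strategy,
                              initial_decision, strategy_updater):
    """Run the initial decision object with a reference strategy, then apply the update."""
    to_be_updated = initial_decision(current_state, historical_strategy)
    return strategy_updater(to_be_updated)


def toy_initial_decision(state, historical_strategy):
    # Stand-in for the deep reinforcement learning model (assumption):
    # rescale the historical strategy so total generation matches the load.
    scale = state["load_mw"] / sum(historical_strategy.values())
    return {name: value * scale for name, value in historical_strategy.items()}


def make_override_updater(overrides):
    # Stand-in for the strategy update object (assumption): force selected
    # variables to required values, e.g. a device that must stay switched off.
    def update(strategy):
        return {**strategy, **overrides}
    return update


historical = {"gen_A_mw": 50.0, "gen_B_mw": 80.0, "gen_C_mw": 100.0}
initial_strategy = generate_initial_strategy(
    {"load_mw": 460.0},
    historical,
    toy_initial_decision,
    make_override_updater({"gen_A_mw": 0.0}),
)
```

In this sketch the override leaves total generation below the load; in the flow described above, reconciling such gaps with the hard constraints would be the job of the mathematical model that processes the initial scheduling strategy next.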
In the embodiments provided in this specification, before the target scheduling strategy is generated based on the initial decision object and the target decision object, the initial decision object itself needs to be generated, so that the target scheduling strategy of the target energy system can subsequently be obtained based on the initial decision object and the target decision object. The specific implementation is as follows.
Before the obtaining the current operating state data of the target energy system, the method further includes Step 1 to Step 3:
Step 1: Determine simulated operating state data based on a state simulation module.
Here, the state simulation module can be understood as a module capable of simulating the current operating state data of the target energy system; for example, the state simulation module may be a simulator. Correspondingly, the simulated operating state data is the operating state data simulated by the state simulation module. For example, when the target energy system is a power system, the simulated operating state data may be operating state data of the power system simulated by the simulator; when the target energy system is an oil extraction system, the simulated operating state data may be operating state data of the oil extraction system simulated by the simulator.
In addition, when the initial decision object is a deep reinforcement learning model, the process of obtaining the initial decision object can be understood as the process of training the deep reinforcement learning model so as to obtain a trained deep reinforcement learning model. Based on this, the determining simulated operating state data based on the state simulation module includes:
determining sample operating state data based on the state simulation module.
Specifically, the simulated operating state data from the state simulation module is used as the sample operating state data for training the model.
Step 2: Input the simulated operating state data into a to-be-processed decision object to obtain a simulated scheduling strategy.
Here, the to-be-processed decision object can be understood as the deep reinforcement learning model to be trained. Correspondingly, the simulated scheduling strategy can be understood as the scheduling strategy that the deep reinforcement learning model obtains by computing on the simulated operating state data. In an embodiment of this specification, the to-be-processed decision object may contain a strategy determination module; in that case, the simulated scheduling strategy can be understood as the scheduling strategy generated by the strategy determination module based on the simulated operating state data.
Specifically, the inputting the simulated operating state data into the to-be-processed decision object to obtain a simulated scheduling strategy includes:
inputting the sample operating state data into a to-be-processed decision model to obtain the simulated scheduling strategy.
Continuing the above example, in the process of training the deep reinforcement learning model, the operating state data of the power system simulated by the simulator is used as sample data, and the sample data is input into the deep reinforcement learning model to be trained for computation, thereby obtaining the scheduling strategy output by the deep reinforcement learning model.
In practical applications, the deep reinforcement learning model includes a strategy determination sub-model and a strategy evaluation sub-model. Based on this, inputting the simulated operating state data into the to-be-processed decision object to obtain the simulated scheduling strategy can also be understood as inputting the simulated operating state data into the strategy determination sub-model of the to-be-processed decision object, where the strategy determination sub-model computes on the simulated operating state data to obtain the simulated scheduling strategy.
Step 3: Process the to-be-processed decision object based on the simulated scheduling strategy to obtain the initial decision object.
Here, when the initial decision object is a deep reinforcement learning model, the initial decision object can be understood as the trained deep reinforcement learning model. Based on this, the processing the to-be-processed decision object based on the simulated scheduling strategy to obtain the initial decision object includes:
training the to-be-processed decision model based on the simulated scheduling strategy to obtain an initial decision model.
Continuing the above example, after the simulated scheduling strategy is obtained from the deep reinforcement learning model to be trained, the deep reinforcement learning model is trained with the simulated scheduling strategy until the training completion condition is met.
Specifically, the processing the to-be-processed decision object based on the simulated scheduling strategy to obtain the initial decision object includes:
evaluating the simulated scheduling strategy based on the state simulation module to obtain a simulated strategy evaluation result;
processing the to-be-processed decision object based on the simulated strategy evaluation result and the simulated scheduling strategy to obtain the initial decision object.
Here, the simulated strategy evaluation result can be understood as the state simulation module's evaluation of the simulated scheduling strategy; for example, the simulated strategy evaluation result may be an evaluation score.
Continuing the above example, after the deep reinforcement learning model to be trained outputs the simulated scheduling strategy, the simulated scheduling strategy is input into the simulator. The simulator evaluates the simulated scheduling strategy to obtain its simulated strategy evaluation result, which here is an evaluation score.
In practical applications, after the simulated strategy evaluation result is obtained, the simulator continues to generate new simulated operating state data, and the deep reinforcement learning model continues to be trained with that data. This operation is repeated until the deep reinforcement learning model reaches the training stop condition.
The training stop condition can be determined based on the simulated strategy evaluation result: when the simulated strategy evaluation result is determined to satisfy the training stop condition, the training of the deep reinforcement learning model can be considered complete. For example, the simulated strategy evaluation result may be any score in the interval [0, 1], where 0 indicates that the simulated scheduling strategy performs poorly and 1 indicates that it performs well. On this basis, if the simulator's evaluations of the 10 most recent consecutive simulated scheduling strategies are all 1 point, the deep reinforcement learning model has reached the training stop condition.
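The simulate, decide, evaluate, and update cycle with the "last 10 scores are all 1" stop condition can be sketched as follows. This is a toy loop under stated assumptions: `ToyPolicy` has a single gain parameter and a fixed-step update rule instead of a real reinforcement learning update, and the scoring rule in `evaluate` (1 point if the output is within 5% of the load) is invented for the example.

```python
from collections import deque


def train_until_stable(simulate_state, policy, evaluate, update,
                       window=10, target_score=1.0, max_steps=10_000):
    """Repeat the training cycle until the last `window` scores all reach target_score.

    Returns the number of steps taken, or None if max_steps is exhausted.
    """
    recent = deque(maxlen=window)  # sliding window of the most recent scores
    for step in range(1, max_steps + 1):
        state = simulate_state()           # simulator produces a new state
        strategy = policy(state)           # model outputs a simulated strategy
        score = evaluate(state, strategy)  # simulator scores it in [0, 1]
        update(state, strategy, score)     # one training update
        recent.append(score)
        if len(recent) == window and all(s >= target_score for s in recent):
            return step
    return None


class ToyPolicy:
    """Hypothetical one-parameter policy: output = gain * load."""

    def __init__(self):
        self.gain = 0.5

    def __call__(self, load):
        return self.gain * load

    def update(self, load, output, score):
        if score < 1.0:
            self.gain += 0.1  # crude fixed-step update, for illustration only


def evaluate(load, output):
    # Invented scoring rule: 1 point if the output is within 5% of the load.
    return 1.0 if abs(output - load) <= 0.05 * load else 0.0


policy = ToyPolicy()
steps = train_until_stable(lambda: 100.0, policy, evaluate, policy.update)
```

A bounded `deque` keeps only the most recent scores, so the check naturally requires ten consecutive top scores rather than ten in total.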
In practical applications, the deep reinforcement learning model may include two sub-models: a strategy determination sub-model and a strategy evaluation sub-model. The training of the deep reinforcement learning model can therefore be understood as the training of the strategy determination sub-model and the strategy evaluation sub-model, and the trained deep reinforcement learning model is determined from the trained strategy determination sub-model and the trained strategy evaluation sub-model. The specific implementation is as follows.
The processing the to-be-processed decision object based on the simulated strategy evaluation result and the simulated scheduling strategy to obtain the initial decision object includes:
processing a strategy determination module in the to-be-processed decision object based on the simulated strategy evaluation result to obtain a processed strategy determination module;
processing a strategy evaluation module in the to-be-processed decision object based on the simulated strategy evaluation result and the simulated scheduling strategy to obtain a processed strategy evaluation module;
determining the initial decision object based on the processed strategy determination module and the processed strategy evaluation module.
Here, when the to-be-processed decision object is a deep reinforcement learning model, the strategy determination module may be the strategy determination sub-model, and the strategy evaluation module may be the strategy evaluation sub-model.
沿用上例,策略确定子模型在输出模拟调度策略之后,将该模拟调度策略输入至仿真器中。通过仿真器能够对该模拟调度策略进行评估,从而获得该模拟调度策略的策略评估结果。Following the above example, after the policy determination sub-model outputs the simulation scheduling policy, it inputs the simulation scheduling policy into the simulator. The simulated scheduling policy can be evaluated through the simulator, so as to obtain the policy evaluation result of the simulated scheduling policy.
In practice, once the strategy evaluation result of the simulated scheduling strategy is obtained, the simulator generates new simulated operating state data, and the strategy determination sub-model continues to be trained on that new data. This is repeated until the strategy determination sub-model reaches the training stop condition.
The training stop condition may be determined based on the strategy evaluation result: when the strategy evaluation result satisfies the training stop condition, the strategy determination sub-model is considered to have completed training. For example, the strategy evaluation result may be any score in the interval [0, 1], where 0 indicates that the simulated scheduling strategy performs poorly and 1 indicates that it performs well. On this basis, if the simulator scores each of the 10 most recent consecutive simulated scheduling strategies at 1, the strategy determination sub-model has reached the training stop condition.
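The stop condition in this example can be sketched as a sliding-window check. This is a minimal illustration, not the patented implementation; the names `make_stop_checker`, `window`, and `target` are hypothetical, and the window of 10 and target score of 1 are taken from the example above.

```python
from collections import deque


def make_stop_checker(window=10, target=1.0):
    """Return a function that records one score per call and reports whether
    the last `window` scores have all reached `target`."""
    recent = deque(maxlen=window)  # only the most recent `window` scores are kept

    def record(score):
        recent.append(score)
        return len(recent) == window and all(s >= target for s in recent)

    return record


# Two weak evaluations followed by ten perfect ones.
check = make_stop_checker()
scores = [0.4, 0.9] + [1.0] * 10
stopped_at = next((i for i, s in enumerate(scores) if check(s)), None)
print(stopped_at)
```

The check only fires once ten consecutive scores of 1 have been seen, so a single weak evaluation resets progress toward the stop condition.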
When the strategy evaluation sub-model is trained, the simulated scheduling strategy output by the strategy determination sub-model may be used as sample data, and the simulated strategy evaluation result of that strategy as the sample label. The strategy evaluation sub-model is trained on the sample data and sample labels until it converges, at which point it is determined to have reached the training stop condition.
In the data processing method provided by the embodiments of this specification, the simulated operating state data determined by the state simulation module is input into the pending decision object to obtain a simulated scheduling strategy, and the pending decision object is processed based on the simulated scheduling strategy to obtain the initial decision object. The target scheduling strategy of the target energy system can subsequently be obtained based on the initial decision object and the target decision object, and the current operating state data of the target energy system can be adjusted quickly through that target scheduling strategy. This gives the target energy system large-scale processing and fast-response capabilities and alleviates the difficulties faced by manual scheduling schemes.
The data processing method is further described below with reference to FIG. 2, taking its application to real-time scheduling of a power system as an example. FIG. 2 shows a processing flowchart of a data processing method provided by an embodiment of this specification.
As shown in FIG. 2, the data processing method provided in this specification for real-time scheduling of a power system has two parts: an offline training part and an online real-time/quasi-real-time scheduling part. In the offline training part, before the power system is scheduled in real time, the deep reinforcement learning (DRL) model used in the scheduling process is trained in advance, so that the trained deep reinforcement learning model can subsequently generate scheduling strategies. In the online real-time/quasi-real-time scheduling part, after training of the deep reinforcement learning model is complete, mathematical modeling is combined with reinforcement learning: the ACOPF model or DCOPF model obtained through mathematical modeling works together with the deep reinforcement learning (DRL) model to generate a scheduling decision result for the power system based on the real environment data of the power system.
Specifically, the method first uses a simulator to train the deep reinforcement learning model offline, as shown in the dashed "Offline training" box in FIG. 2. This includes the following steps.
Step 202: Train the action decision model in the deep reinforcement learning model based on sample data provided by the simulator.
Here, the simulator can be understood as the state simulation module in the above embodiments; the sample data as the simulated operating state data; the deep reinforcement learning model as the initial decision object; and the action decision model as the strategy determination module.
The deep reinforcement learning model in this embodiment itself has two sub-models: an action strategy model (Actor), also referred to in this embodiment as the action decision model, and an action evaluation model (Critic). The action strategy model can be understood as the strategy determination sub-model in the above embodiments, and the action evaluation model as the strategy evaluation sub-model.
Specifically, the simulator simulates the current state St of the power system. The simulated current state St is used as a training sample and input into the action strategy model to be trained, in order to train it. After receiving the current state St, the model responds with an action strategy and outputs an action At. The current state St can be understood as the simulated current operating state data in the above embodiments, and the action At as the simulated scheduling strategy.
Step 204: Evaluate the action output by the action decision model using the simulator.
Specifically, after the action strategy model being trained outputs the action At, the action At is input into the simulator. The simulator evaluates the action At and produces its immediate return Rt (the reward). The immediate return Rt can be understood as the simulated strategy evaluation result in the above embodiments.
In practice, once the immediate return Rt of the action At is obtained, the simulator generates a new current state St+1, and the action strategy model continues to be trained on St+1. This is repeated until the action strategy model reaches the training stop condition.
The training stop condition may be determined based on the immediate return Rt: when Rt satisfies the training stop condition, the action strategy model is considered to have completed training. For example, the immediate return Rt may be any score in the interval [0, 1]. On this basis, if the simulator scores each of the 10 most recent consecutive actions At at 1, the action decision model has reached the training stop condition.
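The interaction cycle of steps 202 and 204 (state St, action At, immediate return Rt, next state St+1) can be sketched as follows. Everything here is a toy under stated assumptions: `ToySimulator`, its reward formula, `balanced_actor`, and the `update_actor` callback are hypothetical stand-ins, and the actual learning update is left abstract because the source does not specify it.

```python
import random


class ToySimulator:
    """Hypothetical stand-in for the grid simulator; the state is one load value in MW."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        return self.rng.uniform(200.0, 260.0)

    def step(self, state, action):
        # Reward in [0, 1]: highest when total generation matches the load.
        mismatch = abs(sum(action.values()) - state)
        reward = max(0.0, 1.0 - mismatch / 100.0)
        return reward, self.rng.uniform(200.0, 260.0)


def interact(simulator, actor, update_actor, steps):
    """Run the St -> At -> Rt -> St+1 cycle and return the visited transitions."""
    state = simulator.reset()
    transitions = []
    for _ in range(steps):
        action = actor(state)                               # actor answers St with At
        reward, next_state = simulator.step(state, action)  # simulator scores At
        update_actor(state, action, reward)                 # learning step, left abstract
        transitions.append((state, action, reward))
        state = next_state                                  # continue from St+1
    return transitions


def balanced_actor(load):
    # Toy policy: generator A at 50 MW, the rest split evenly between B and C.
    rest = (load - 50.0) / 2.0
    return {"A": 50.0, "B": rest, "C": rest}


transitions = interact(ToySimulator(), balanced_actor, lambda s, a, r: None, steps=5)
```

In a real training run the loop would end when the stop condition on the recent returns is met rather than after a fixed number of steps.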
Step 206: Train the action evaluation model based on the action output by the action decision model and the immediate return of that action.
Here, the action evaluation model can be understood as the strategy evaluation sub-model in the above embodiments. Its role is to estimate the average return that can be obtained after taking the action At in the state St, which includes both the current immediate return Rt and the possible average return in the future.
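One common way to write the quantity such an evaluation model estimates is the action-value function below. The discount factor \gamma and the expectation over future trajectories are assumptions of this sketch; the source only states that the estimate covers the immediate return and the possible average future return.

```latex
Q(S_t, A_t) = \mathbb{E}\left[ R_t + \sum_{k=1}^{\infty} \gamma^{k} R_{t+k} \,\middle|\, S_t, A_t \right], \qquad 0 \le \gamma < 1
```

The first term is the immediate return Rt, and the discounted sum stands for the average return expected from later steps.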
Specifically, the action At output by the action decision model is used as sample data and the immediate return of that action as the sample label. The action evaluation model is trained on the sample data and sample labels until it converges, at which point it is determined to have reached the training stop condition.
In practice, the output of the action decision model (the action At) can serve as the input of the action evaluation model. Accordingly, while the action evaluation model is being trained on the action At and its immediate return Rt, training can continue with the action At+1 that the action decision model produces from the current state St+1, together with the immediate return Rt+1 corresponding to At+1.
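Training the action evaluation model on (action, immediate return) pairs is a supervised regression problem. The sketch below fits a plain linear model with per-sample gradient steps; the linear form, the learning rate, and the three toy samples are assumptions made for illustration, and a real Critic would typically be a neural network that also sees the state.

```python
def predict(weights, bias, features):
    return bias + sum(w * x for w, x in zip(weights, features))


def mean_squared_error(weights, bias, samples):
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in samples) / len(samples)


def train_critic(samples, epochs=200, learning_rate=0.1):
    """Fit a linear model to (action, immediate return) pairs with per-sample gradient steps."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical samples: generator outputs (A, B, C) in units of 100 MW, each
# labelled with an immediate return in [0, 1].
samples = [((0.5, 1.0, 1.0), 1.0), ((0.5, 1.0, 0.8), 0.3), ((0.5, 0.7, 1.0), 0.9)]
loss_before = mean_squared_error([0.0, 0.0, 0.0], 0.0, samples)
weights, bias = train_critic(samples)
loss_after = mean_squared_error(weights, bias, samples)
```

Comparing `loss_before` with `loss_after` gives a simple convergence signal; new pairs such as (At+1, Rt+1) can be appended to `samples` as the actor keeps interacting with the simulator.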
Step 208: Deploy the trained deep reinforcement learning model online.
Through continuous interaction with the simulator and continuous exploration of the action space, together with a well-designed model structure and learning strategy, the deep reinforcement learning model can learn a good action strategy model and the corresponding action evaluation model through offline training. Once training is complete, the trained deep reinforcement learning model can be deployed online, so that scheduling strategies for the power system can subsequently be generated in real time based on it.
Specifically, after the deep reinforcement learning model has been trained, the method can perform online real-time or quasi-real-time scheduling based on it. During such scheduling, the deep reinforcement learning model and the mathematical model respond with action decisions based on the observed real state of the grid environment, as shown in the dashed "Online real-time/quasi-real-time scheduling" box in FIG. 2. This includes the following steps.
Step 210: Obtain the real grid environment state.
Here, the real grid environment state can be understood as the current operating state data in the above embodiments.
Specifically, the real grid environment state St is obtained by observing the real grid environment and is then input into the deep reinforcement learning model.
Step 212: The deep reinforcement learning model obtains an initial action strategy based on the real grid environment state.
Here, the initial action strategy can be understood as the initial scheduling strategy.
Specifically, the action strategy model in the deep reinforcement learning model responds to the real grid environment state St with an action At. For a power system containing generator A, generator B, and generator C, the action At may be: turn on generator A and set it to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 100 MW. The action At can be understood as the initial scheduling strategy in the above embodiments.
The variables in the action At are then adjusted to obtain the variable-adjusted actions A1 and A2. Action A1 may be: set generator A to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 80 MW.
Action A2 may be: set generator A to generate at its minimum limit (50 MW), set the output of generator B to 70 MW, and set the output of generator C to 100 MW.
The action evaluation model evaluates the action At and the variable-adjusted actions A1 and A2 to obtain their immediate returns Rt: the action At scores 1, the action A1 scores 0.3, and the action A2 scores 0.9.
Based on these immediate returns Rt, the output of generator C (100 MW) in the action At is determined to be a decision variable that affects the long-term effect, because changing it (action A1) lowers the score the most. The output of generator A (50 MW) in the action At is determined to be a discrete decision variable.
The discrete decision variable and the decision variable affecting the long-term effect in the action At are then fixed, that is, set as unmodifiable, which yields the initial action strategy. The initial action strategy is input into the mathematical model (the ACOPF model or the DCOPF model).
In a specific implementation, during online scheduling the action evaluation model (Critic) learned through deep reinforcement learning is used to determine which decision variables have a greater influence on the long-term effect. Specifically, each continuous decision variable in the action can be perturbed individually while observing how the Critic's output changes; if the change is large, that decision variable can be considered to have a large influence on future returns. In addition, problems such as unit commitment, ramping, and the N-1 grid security criterion can be handled from the perspective of maximizing long-term returns.
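The per-variable perturbation described above can be sketched as follows. `critic_sensitivity` and `toy_critic` are hypothetical names; the toy critic is constructed only so that it reproduces the scores 1, 0.3, and 0.9 quoted for At, A1, and A2, and it ignores the state argument.

```python
def critic_sensitivity(critic, state, action, perturbations):
    """Perturb one continuous decision variable at a time and record how much
    the critic's score changes relative to the unperturbed action."""
    baseline = critic(state, action)
    changes = {}
    for name, delta in perturbations.items():
        perturbed = dict(action)          # copy so only one variable differs
        perturbed[name] = action[name] + delta
        changes[name] = abs(critic(state, perturbed) - baseline)
    return changes


def toy_critic(state, action):
    # Hypothetical critic reproducing the example scores: At -> 1, A1 -> 0.3, A2 -> 0.9.
    score = 1.0 - (100.0 - action["B"]) / 300.0 - 0.035 * (100.0 - action["C"])
    return max(0.0, min(1.0, score))


action_t = {"A": 50.0, "B": 100.0, "C": 100.0}
changes = critic_sensitivity(toy_critic, None, action_t, {"B": -30.0, "C": -20.0})
most_influential = max(changes, key=changes.get)
```

Generator A is left out of `perturbations` because its setting is treated as a discrete decision variable in the example; only the continuous outputs of B and C are perturbed.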
Step 214: Adjust the other variables in the initial action strategy through the mathematical model to obtain the scheduling decision result.
Here, the mathematical model can be understood as the target decision object in the above embodiments, and the scheduling decision result as the target scheduling strategy.
Specifically, after the discrete decision variable and the decision variable affecting the long-term effect in the action At are fixed to obtain the initial action strategy, the mathematical model adjusts and modifies the remaining variables in the initial action strategy. This performs a further single-step optimization and enforces the physical security constraints, yielding the target scheduling strategy.
For example, the initial action strategy may be: turn on generator A and set it to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 100 MW, where the output of generator A (50 MW) and the output of generator C (100 MW) have already been fixed. The initial action strategy is input into the mathematical model, which adjusts the output of generator B (100 MW) to obtain the target scheduling strategy. Applying single-step optimization and physical security constraints to the initial action strategy avoids the safety risks that reinforcement learning decisions may carry, and because some decision variables are already fixed, the mathematical model is solved faster.
The target scheduling strategy may be: turn on generator A and set it to generate at its minimum limit (50 MW), set the output of generator B to 80 MW, and set the output of generator C to 100 MW.
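The idea of solving only for the unfixed variables can be illustrated with a deliberately simplified stand-in for the mathematical model. The sketch below enforces nothing but a single-bus power balance; it is not an ACOPF or DCOPF solver, and the 230 MW demand and the 0 to 120 MW range for generator B are hypothetical values chosen so that the result matches the 80 MW in the example.

```python
def dispatch_free_generators(strategy, fixed, demand_mw, limits):
    """Keep fixed outputs, then set the free generators so total output meets demand.

    Single-bus power balance only; the network and security constraints of a
    real ACOPF/DCOPF model are deliberately left out.
    """
    result = dict(strategy)
    residual = demand_mw - sum(strategy[name] for name in fixed)
    for name in strategy:
        if name in fixed:
            continue
        low, high = limits[name]
        result[name] = min(max(residual, low), high)  # clamp to the operating range
        residual -= result[name]
    if abs(residual) > 1e-9:
        raise ValueError("demand cannot be met with the fixed variables and limits")
    return result


initial = {"A": 50.0, "B": 100.0, "C": 100.0}
target = dispatch_free_generators(
    initial,
    fixed={"A", "C"},
    demand_mw=230.0,             # hypothetical load, not given in the source
    limits={"B": (0.0, 120.0)},  # hypothetical operating range for generator B
)
print(target)
```

With A and C held at 50 MW and 100 MW, only one unknown remains, which is why fixing part of the decision shrinks the problem the mathematical model has to solve.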
Step 216: Adjust the current operating state of the power system based on the scheduling decision result.
Specifically, based on the scheduling decision result, generator A in the power system is turned on and set to generate at its minimum limit (50 MW), the output of generator B is set to 80 MW, and the output of generator C is set to 100 MW.
In practice, there are also cases where the immediate return after the decision response matters more, for example when grid lines that exceed their limits must be restored immediately. In such cases, all decision variables in the reinforcement learning model's output (the initial action strategy) other than the discrete decision variables can be readjusted in the ACOPF/DCOPF mathematical model, which makes it easier to optimize the current return.
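The choice of which variables to hold fixed in the two situations can be expressed as a small helper. The function name and the boolean flag are hypothetical; the variable names follow the generator example, with A as the discrete variable and C as the variable fixed for long-term effect.

```python
def select_fixed_variables(discrete, long_term, prioritize_immediate_return):
    """Return the names of the decision variables to hold fixed before the
    mathematical model is solved."""
    if prioritize_immediate_return:
        # Only discrete variables stay fixed; everything else is re-optimized.
        return set(discrete)
    return set(discrete) | set(long_term)


normal = select_fixed_variables({"A"}, {"C"}, prioritize_immediate_return=False)
urgent = select_fixed_variables({"A"}, {"C"}, prioritize_immediate_return=True)
```

In the urgent case the mathematical model has more freedom, at the cost of giving up the long-term guidance carried by the fixed continuous variables.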
The data processing method provided in this specification combines ACOPF/DCOPF mathematical modeling with reinforcement learning. Starting from the reinforcement learning output, the discrete decision variables and the decision variables affecting the long-term effect are fixed, and the mathematical model is solved for the remaining decision variables. This both speeds up the solution of the mathematical model and ensures that the solution satisfies the security constraints, while the influence of reinforcement learning allows long-term objectives to be optimized as well. The final decision time can be kept within 5 minutes, which essentially achieves real-time/quasi-real-time scheduling and gives safe grid operation a real-time scheduling capability. Because reinforcement learning participates, the scheduling decision result also takes the long-term overall return into account.
FIG. 3 shows a flowchart of another data processing method provided according to an embodiment of this specification, which specifically includes the following steps.
Step 302: Obtain the current operating state data of the target power system.
Step 304: Input the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target power system.
Step 306: Input the initial scheduling strategy into the target decision object to obtain the target scheduling strategy of the target power system.
Step 308: Adjust the current operating state data of the target power system based on the target scheduling strategy.
Specifically, in this other data processing method, the current operating state data of the target power system is obtained and input into the initial decision object to obtain the initial scheduling strategy of the target power system. The initial scheduling strategy is then input into the target decision object to obtain the target scheduling strategy of the target power system. The current operating state data of the target power system is adjusted based on the target scheduling strategy, which gives the target power system large-scale processing and fast-response capabilities and alleviates the difficulties faced by manual scheduling schemes.
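Steps 302 to 308 form a simple pipeline, which can be sketched as a composition of three callables. All function names and the toy stand-ins below are hypothetical; in the described method the initial decision object would be the trained deep reinforcement learning model and the target decision object the ACOPF/DCOPF model.

```python
def run_scheduling(state, initial_decision_object, target_decision_object, apply_strategy):
    """Chain steps 304 to 308 for a state obtained in step 302."""
    initial_strategy = initial_decision_object(state)           # step 304
    target_strategy = target_decision_object(initial_strategy)  # step 306
    return apply_strategy(state, target_strategy)               # step 308


# Toy stand-ins using the numbers from the generator example.
def toy_initial(state):
    return {"A": 50.0, "B": 100.0, "C": 100.0}


def toy_target(strategy):
    adjusted = dict(strategy)
    adjusted["B"] = 80.0  # the only variable left adjustable in the example
    return adjusted


def toy_apply(state, strategy):
    return {**state, "outputs_mw": dict(strategy)}


current_state = {"load": "high", "outputs_mw": {"A": 0.0, "B": 100.0, "C": 100.0}}  # step 302
new_state = run_scheduling(current_state, toy_initial, toy_target, toy_apply)
```

Passing the two decision objects as parameters keeps the pipeline independent of how each one is implemented.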
Continuing the above example, when this other data processing method is applied to adjusting the current operating state of a power system, the current operating state data of the power system is obtained. This data may indicate that the current electrical load of the power system is high and that, of the three generators in the power system (generator A, generator B, and generator C), generator A is off while generator B and generator C are on.
After the current operating state data of the power system is obtained, it is input into the deep reinforcement learning model, which assesses the current state of the power system based on that data and generates a scheduling strategy. For example, when the deep reinforcement learning model determines from the current operating state data that the current electrical load of the power system is high, it can determine a scheduling strategy that copes with the high load. The scheduling strategy output by the deep reinforcement learning model may be: turn on generator A in the power system and set it to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 100 MW.
After the deep reinforcement learning model outputs the scheduling strategy, specific variables in it (the discrete decision variables and the decision variables affecting the long-term effect) are fixed to obtain the initial scheduling strategy. The initial scheduling strategy is input into the mathematical model, which adjusts and modifies the decision variables that have not been fixed. This performs a further single-step optimization and enforces the physical security constraints, yielding the target scheduling strategy. The target scheduling strategy may be: turn on generator A and set it to generate at its minimum limit (50 MW), set the output of generator B to 80 MW, and set the output of generator C to 100 MW.
Based on the target scheduling strategy, generator A in the power system is turned on and set to generate at its minimum limit (50 MW), the output of generator B is set to 80 MW, and the output of generator C is set to 100 MW.
In an embodiment provided in this specification, the initial decision object may further include two sub-modules: a strategy determination module and a strategy evaluation module. The current operating state data of the target power system can then be processed by the strategy determination module and the strategy evaluation module of the initial decision object to obtain the initial scheduling strategy of the target power system. This improves the processing efficiency of the subsequent target decision object and further gives the target power system large-scale processing and fast-response capabilities. A specific implementation is as follows.
Inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target power system includes:
inputting the current operating state data into the strategy determination module of the initial decision object to obtain a pending scheduling strategy;
processing the pending scheduling strategy with the strategy evaluation module of the initial decision object to obtain the strategy evaluation result corresponding to the pending scheduling strategy; and
determining the initial scheduling strategy of the target power system based on the pending scheduling strategy and the corresponding strategy evaluation result.
Continuing the above example, the strategy determination module is the strategy determination sub-model in the deep reinforcement learning model. After the current operating state data of the power system is obtained, it is input into the strategy determination sub-model. When the current electrical load of the power system is high, the strategy determination sub-model determines a pending scheduling strategy that can cope with that load. The scheduling strategy output by the strategy determination sub-model may be: turn on generator A in the power system and set it to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 100 MW.
After the strategy determination sub-model obtains the pending scheduling strategy based on the current operating state data of the power system, the adjustable variables in the pending scheduling strategy can be identified; such a variable may be the output of a generator.
The generator outputs in the pending scheduling strategy are then adjusted according to a preset variable-modification rule to obtain variable-adjusted pending scheduling strategies. The variable-adjusted pending scheduling strategy A may be: set generator A in the power system to generate at its minimum limit (50 MW), set the output of generator B to 100 MW, and set the output of generator C to 80 MW. The variable-adjusted pending scheduling strategy B may be: set generator A in the power system to generate at its minimum limit (50 MW), set the output of generator B to 70 MW, and set the output of generator C to 100 MW.
After the variables in the pending scheduling strategy are adjusted, the pending scheduling strategy, the variable-adjusted pending scheduling strategy A, and the variable-adjusted pending scheduling strategy B are each input into the strategy evaluation sub-model for evaluation. The evaluation result of the pending scheduling strategy may be 1, that of the variable-adjusted pending scheduling strategy A may be 0.3, and that of the variable-adjusted pending scheduling strategy B may be 0.9. These three evaluation results are then used as the strategy evaluation result corresponding to the pending scheduling strategy.
通过该策略评估结果,将待处理调度策略中发电机B的发电量(100兆瓦)确定为影响长期效果的决策变量。并基于该预设确定条件将该动作At中发电机A的发电量(50兆瓦)确定为离散决策变量。Based on the strategy evaluation result, the power generation of generator B (100 MW) in the dispatch strategy to be processed is identified as a decision variable affecting the long-term effect. Based on the preset determination condition, the power generation of generator A (50 MW) in the action At is identified as a discrete decision variable.
之后将该待处理调度策略中影响长期效果的决策变量和离散决策变量设置为固定参数,也即是不可进行修改和调整的决策变量,从而获得初始调度策略。The decision variables affecting the long-term effect and the discrete decision variables in the dispatch strategy to be processed are then set as fixed parameters, that is, decision variables that cannot be modified or adjusted, thereby obtaining the initial dispatch strategy.
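The selection of fixed parameters described above can be sketched as follows. This is a minimal illustration only, not the implementation of this specification: the base strategy, the score table, the score-drop threshold of 0.5, and the rule that a generator held at its minimum limit counts as discrete are all assumptions introduced for the example, as are the function names.

```python
# Illustrative sketch: all names, values, and thresholds below are assumptions.

def changed_variables(base, candidate):
    """Names of the variables whose value differs between two strategies."""
    return [name for name in base if candidate[name] != base[name]]

def long_term_variables(base, candidates, evaluate, drop_threshold):
    """Variables whose adjustment lowers the evaluation score by more than the threshold."""
    base_score = evaluate(base)
    selected = set()
    for candidate in candidates:
        if base_score - evaluate(candidate) > drop_threshold:
            selected.update(changed_variables(base, candidate))
    return selected

def discrete_variables(base, min_limits_mw):
    """Assumed preset condition: a generator held at its minimum limit is discrete."""
    return {name for name, mw in base.items() if mw == min_limits_mw.get(name)}

def build_initial_strategy(base, fixed):
    """Mark fixed variables; the remaining ones stay adjustable downstream."""
    return {
        "outputs_mw": dict(base),
        "fixed": sorted(fixed),
        "adjustable": sorted(set(base) - set(fixed)),
    }

# Hypothetical strategies: each candidate changes exactly one generator output.
base = {"A": 50, "B": 100, "C": 80}
adjust_b = {"A": 50, "B": 70, "C": 80}
adjust_c = {"A": 50, "B": 100, "C": 100}

# Hypothetical score table standing in for the evaluation sub-module.
scores = {(50, 100, 80): 1.0, (50, 70, 80): 0.3, (50, 100, 100): 0.9}

def evaluate(strategy):
    return scores[(strategy["A"], strategy["B"], strategy["C"])]

fixed = long_term_variables(base, [adjust_b, adjust_c], evaluate, 0.5)
fixed |= discrete_variables(base, {"A": 50})
initial_strategy = build_initial_strategy(base, fixed)
print(initial_strategy)
```

Under these assumed values, changing generator B lowers the score sharply, so B is fixed as a long-term variable, generator A is fixed as a discrete variable, and only generator C remains adjustable.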
本说明书提供的另一种数据处理方法,通过初始决策对象以及目标决策对象,对目标电力系统的当前运行状态数据进行处理,获得该目标电力系统的目标调度策略,并通过该目标调度策略快速地对目标电力系统的当前运行状态数据进行调整,使得目标电力系统具备大规模处理,以及快速响应的能力,缓解人工调度方案所面临的困境。The other data processing method provided in this specification processes the current operating state data of the target power system through the initial decision object and the target decision object to obtain the target dispatch strategy of the target power system, and uses that strategy to quickly adjust the current operating state data of the target power system. This gives the target power system large-scale processing and rapid-response capabilities and alleviates the difficulties faced by manual dispatch schemes.
上述为本实施例的另一种数据处理方法的示意性方案。需要说明的是,该另一种数据处理方法的技术方案与上述的一种数据处理方法的技术方案属于同一构思,另一种数据处理方法的技术方案未详细描述的细节内容,均可以参见上述一种数据处理方法的技术方案的描述。The foregoing is a schematic solution of another data processing method in this embodiment. It should be noted that the technical solution of this other data processing method belongs to the same concept as the technical solution of the data processing method described above; for details not described in detail here, refer to the description of the technical solution of the data processing method described above.
与上述方法实施例相对应,本说明书还提供了一种数据处理装置实施例,图4示出了本说明书一个实施例提供的一种数据处理装置的结构示意图。如图4所示,该装置包括:Corresponding to the foregoing method embodiments, this specification also provides an embodiment of a data processing device. FIG. 4 shows a schematic structural diagram of a data processing device provided by an embodiment of this specification. As shown in Figure 4, the device includes:
数据获取模块402,被配置为获取目标能源系统的当前运行状态数据;The data acquisition module 402 is configured to acquire the current operating state data of the target energy system;
第一策略获取模块404,被配置为将所述当前运行状态数据输入初始决策对象,获得所述目标能源系统的初始调度策略;The first strategy acquisition module 404 is configured to input the current operating state data into an initial decision object to obtain an initial dispatch strategy of the target energy system;
第二策略获取模块406,被配置为将所述初始调度策略输入目标决策对象,获得所述目标能源系统的目标调度策略;The second strategy acquisition module 406 is configured to input the initial dispatch strategy into the target decision object, and obtain the target dispatch strategy of the target energy system;
调整模块408,被配置为基于所述目标调度策略对所述目标能源系统的所述当前运行状态数据进行调整。The adjustment module 408 is configured to adjust the current operating state data of the target energy system based on the target dispatch strategy.
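The four modules listed above form a simple pipeline, which can be sketched as follows. This is a hedged illustration: the class name, the callables, and the toy dispatch rules (an even split of the remaining load and an 80 MW cap on generator C) are hypothetical stand-ins, since the decision objects themselves are not specified at this point.

```python
from typing import Callable, Dict

State = Dict[str, float]
Strategy = Dict[str, float]

class DataProcessingApparatus:
    """Wires the four modules together; each module is an injected callable."""

    def __init__(self,
                 acquire: Callable[[], State],
                 initial_decision: Callable[[State], Strategy],
                 target_decision: Callable[[Strategy], Strategy],
                 adjust: Callable[[State, Strategy], State]):
        self.acquire = acquire
        self.initial_decision = initial_decision
        self.target_decision = target_decision
        self.adjust = adjust

    def run(self) -> State:
        state = self.acquire()                   # data acquisition module
        initial = self.initial_decision(state)   # first strategy acquisition module
        target = self.target_decision(initial)   # second strategy acquisition module
        return self.adjust(state, target)        # adjustment module

# Hypothetical stand-ins for the modules.
def acquire() -> State:
    return {"load_mw": 230.0, "gen_A_mw": 50.0, "gen_B_mw": 80.0, "gen_C_mw": 80.0}

def initial_decision(state: State) -> Strategy:
    # Assumed rule: generator A at 50 MW, remaining load split evenly between B and C.
    rest = (state["load_mw"] - 50.0) / 2
    return {"gen_A_mw": 50.0, "gen_B_mw": rest, "gen_C_mw": rest}

def target_decision(initial: Strategy) -> Strategy:
    # Assumed refinement: cap generator C at 80 MW and shift the excess to generator B.
    excess = max(0.0, initial["gen_C_mw"] - 80.0)
    target = dict(initial)
    target["gen_C_mw"] -= excess
    target["gen_B_mw"] += excess
    return target

def adjust(state: State, target: Strategy) -> State:
    # Overwrite the generator set-points in the operating state with the target strategy.
    return {**state, **target}

apparatus = DataProcessingApparatus(acquire, initial_decision, target_decision, adjust)
new_state = apparatus.run()
print(new_state)
```

Passing the modules in as callables keeps the sketch agnostic about how each decision object is realized, for example as a trained model or as a solver.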
可选地,所述第一策略获取模块404,被配置为:Optionally, the first policy acquisition module 404 is configured to:
将所述当前运行状态数据输入初始决策对象的策略确定模块,获得待处理调度策略;Inputting the current running state data into the policy determination module of the initial decision object to obtain the scheduling policy to be processed;
基于所述初始决策对象的策略评估模块对所述待处理调度策略进行处理,获得所述待处理调度策略对应的策略评估结果;Processing the scheduling policy to be processed based on the policy evaluation module of the initial decision object to obtain a policy evaluation result corresponding to the scheduling policy to be processed;
基于所述待处理调度策略以及对应的策略评估结果确定初始调度策略。An initial scheduling policy is determined based on the pending scheduling policy and a corresponding policy evaluation result.
可选地,所述第一策略获取模块404,被配置为:Optionally, the first policy acquisition module 404 is configured to:
基于预设参数调整规则对所述待处理调度策略中的决策参数进行修改,获得调整后的待处理调度策略;modifying the decision parameters in the scheduling strategy to be processed based on preset parameter adjustment rules to obtain the adjusted scheduling strategy to be processed;
将所述待处理调度策略以及所述调整后的待处理调度策略输入所述初始决策对象的策略评估模块,获得所述待处理调度策略的策略评估结果。Inputting the scheduling policy to be processed and the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object to obtain a policy evaluation result of the scheduling policy to be processed.
可选地,所述第一策略获取模块404,被配置为:Optionally, the first policy acquisition module 404 is configured to:
将所述待处理调度策略输入所述初始决策对象的策略评估模块,获得所述待处理调度策略的第一评估结果;inputting the scheduling policy to be processed into the policy evaluation module of the initial decision object, and obtaining a first evaluation result of the scheduling policy to be processed;
将所述调整后的待处理调度策略输入所述初始决策对象的策略评估模块,获得所述调整后的待处理调度策略的第二评估结果;inputting the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object, and obtaining a second evaluation result of the adjusted scheduling policy to be processed;
基于所述第一评估结果以及所述第二评估结果确定所述待处理调度策略的策略评估结果。Determine a policy evaluation result of the scheduling policy to be processed based on the first evaluation result and the second evaluation result.
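One possible reading of the adjustment and evaluation steps above is sketched below. The adjustment rule (shifting each decision parameter by fixed deltas and clipping to limits), the evaluator, and all names are assumptions made for illustration; the specification does not fix a particular rule or scoring function here.

```python
def adjusted_strategies(strategy, deltas_mw, limits_mw):
    """Assumed preset rule: shift one parameter at a time by each delta, clipped to its limits."""
    candidates = []
    for name, value in strategy.items():
        low, high = limits_mw[name]
        for delta in deltas_mw:
            new_value = min(high, max(low, value + delta))
            if new_value != value:  # skip shifts that clipping turns into no-ops
                candidates.append({**strategy, name: new_value})
    return candidates

def policy_evaluation_result(strategy, candidates, evaluator):
    """Collect the first result (original) and the second results (adjusted strategies)."""
    return {
        "first": evaluator(strategy),
        "second": [evaluator(candidate) for candidate in candidates],
    }

# Hypothetical evaluator: penalizes mismatch between total generation and a 180 MW load.
def evaluator(strategy, load_mw=180.0):
    return 1.0 - abs(sum(strategy.values()) - load_mw) / load_mw

strategy = {"B": 100.0, "C": 80.0}
limits_mw = {"B": (50.0, 120.0), "C": (50.0, 80.0)}
candidates = adjusted_strategies(strategy, deltas_mw=(-30.0, 20.0), limits_mw=limits_mw)
result = policy_evaluation_result(strategy, candidates, evaluator)
print(len(candidates), result)
```

Here the upward shift of generator C is clipped back to its limit and discarded, so three adjusted strategies are evaluated alongside the original one.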
可选地,所述第一策略获取模块404,被配置为:Optionally, the first policy acquisition module 404 is configured to:
基于所述策略评估结果确定所述待处理调度策略中的第一参数;determining a first parameter in the pending scheduling policy based on the policy evaluation result;
基于预设确定条件确定所述待处理调度策略中的第二参数;determining a second parameter in the scheduling policy to be processed based on preset determination conditions;
将所述待处理调度策略中所述第一参数以及所述第二参数设置为固定参数,获得初始调度策略。 Setting the first parameter and the second parameter in the pending scheduling strategy as fixed parameters to obtain an initial scheduling strategy.
可选地,所述数据处理装置还包括处理模块,被配置为:Optionally, the data processing device further includes a processing module configured to:
基于状态模拟模块确定模拟运行状态数据;Determining simulated running state data based on the state simulation module;
将所述模拟运行状态数据输入待处理决策对象,获得模拟调度策略;Inputting the simulated running status data into the decision object to be processed to obtain a simulated scheduling strategy;
基于所述模拟调度策略对所述待处理决策对象进行处理,获得初始决策对象。The decision object to be processed is processed based on the simulation scheduling strategy to obtain an initial decision object.
可选地,所述处理模块,还被配置为:Optionally, the processing module is further configured to:
基于所述状态模拟模块对所述模拟调度策略进行评估,获得模拟策略评估结果;Evaluating the simulation scheduling strategy based on the state simulation module to obtain a simulation strategy evaluation result;
基于所述模拟策略评估结果以及所述模拟调度策略对所述待处理决策对象进行处理,获得初始决策对象。Based on the evaluation result of the simulation policy and the simulation scheduling policy, the decision object to be processed is processed to obtain an initial decision object.
可选地,所述处理模块,还被配置为:Optionally, the processing module is further configured to:
基于所述模拟策略评估结果对所述待处理决策对象中的策略确定模块进行处理,获得处理后的所述策略确定模块;Processing the policy determination module in the pending decision object based on the simulated policy evaluation result to obtain the processed policy determination module;
基于所述模拟策略评估结果以及所述模拟调度策略,对所述待处理决策对象中的策略评估模块进行处理,获得处理后的所述策略评估模块;Based on the simulated policy evaluation result and the simulated scheduling policy, process the policy evaluation module in the pending decision object to obtain the processed policy evaluation module;
基于处理后的所述策略确定模块以及处理后的所述策略评估模块,确定初始决策对象。An initial decision object is determined based on the processed policy determination module and the processed policy evaluation module.
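The simulation-based processing of the two modules resembles an actor-critic training loop. The sketch below uses REINFORCE with a running-average baseline as a stand-in technique; it is not stated in this specification that this algorithm is used, and the simulator, reward, parameters, and learning rates are all hypothetical. As a simplification, the evaluation module here is updated from the evaluation result alone.

```python
import random

class StateSimulator:
    """Hypothetical state simulation module: samples demand and scores strategies."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def sample_state(self):
        return self.rng.uniform(100.0, 300.0)  # simulated demand in MW

    def evaluate(self, demand_mw, generation_mw):
        # Relative imbalance between generation and demand, as a non-positive score.
        return -abs(generation_mw - demand_mw) / demand_mw

class PolicyModule:
    """Gaussian policy over the generation-to-demand ratio with learnable mean theta."""
    def __init__(self, rng, theta=0.5, sigma=0.1, lr=0.01):
        self.rng, self.theta, self.sigma, self.lr = rng, theta, sigma, lr

    def act(self, demand_mw):
        ratio = self.rng.gauss(self.theta, self.sigma)
        return ratio, ratio * demand_mw

    def update(self, ratio, advantage):
        # REINFORCE step: advantage times the gradient of the log-probability w.r.t. theta.
        self.theta += self.lr * advantage * (ratio - self.theta) / self.sigma ** 2

class EvaluationModule:
    """Running-average baseline standing in for the policy evaluation module."""
    def __init__(self, lr=0.1):
        self.baseline, self.lr = 0.0, lr

    def predict(self):
        return self.baseline

    def update(self, reward):
        self.baseline += self.lr * (reward - self.baseline)

def train(steps=2000, seed=0):
    simulator = StateSimulator(seed)
    policy = PolicyModule(random.Random(seed + 1))
    critic = EvaluationModule()
    for _ in range(steps):
        demand = simulator.sample_state()               # simulated operating state
        ratio, generation = policy.act(demand)          # simulated scheduling strategy
        reward = simulator.evaluate(demand, generation) # simulated evaluation result
        policy.update(ratio, reward - critic.predict()) # process policy determination module
        critic.update(reward)                           # process policy evaluation module
    return policy, critic

policy, critic = train()
print(policy.theta, critic.baseline)
```

In this toy setting the learned ratio should settle near 1, meaning generation tracks demand; the processed modules together would play the role of the initial decision object.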
可选地,所述处理模块,还被配置为:Optionally, the processing module is further configured to:
基于状态模拟模块确定样本运行状态数据。Sample operating state data is determined based on the state simulation module.
可选地,所述处理模块,还被配置为:Optionally, the processing module is further configured to:
将所述样本运行状态数据输入待处理决策模型,获得模拟调度策略。The sample running status data is input into the decision-making model to be processed to obtain a simulation scheduling strategy.
可选地,所述处理模块,还被配置为:Optionally, the processing module is further configured to:
基于所述模拟调度策略对所述待处理决策模型进行训练,获得初始决策模型。The decision model to be processed is trained based on the simulated scheduling strategy to obtain an initial decision model.
可选地,所述第一策略获取模块404,还被配置为:Optionally, the first policy acquisition module 404 is further configured to:
确定历史运行状态数据与所述当前运行状态数据的相似度,将所述相似度中最大相似度对应的历史运行状态数据,确定为所述当前运行状态数据的相似运行状态数据;determining the similarity between the historical operating state data and the current operating state data, and determining the historical operating state data corresponding to the maximum similarity among the similarities as similar operating state data to the current operating state data;
获取所述相似运行状态数据对应的历史调度策略,其中,所述历史调度策略为历史基于目标决策对象获得的目标调度策略;Obtaining a historical scheduling policy corresponding to the similar operating status data, wherein the historical scheduling policy is a historical target scheduling policy obtained based on a target decision object;
将所述当前运行状态数据以及所述历史调度策略输入至初始决策对象,获得所述目标能源系统的待更新调度策略;Inputting the current operating state data and the historical scheduling strategy into the initial decision object to obtain a to-be-updated scheduling strategy of the target energy system;
将所述目标能源系统的待更新调度策略发送至策略更新对象;Send the to-be-updated scheduling policy of the target energy system to the policy update object;
接收所述策略更新对象发送的初始调度策略,其中,所述初始调度策略为所述策略更新对象基于预设更新条件对所述待更新调度策略进行更新获得。An initial scheduling policy sent by the policy update object is received, wherein the initial scheduling policy is obtained by updating the scheduling policy to be updated based on preset update conditions by the policy update object.
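The retrieval of the most similar historical state and the hand-off to the policy update object can be sketched as follows. The similarity measure (derived from Euclidean distance), the state features, and both stand-in objects are assumptions; the specification does not name a particular similarity measure or update condition here.

```python
import math

def similarity(a, b):
    """Assumed similarity in (0, 1], derived from Euclidean distance."""
    return 1.0 / (1.0 + math.dist(a, b))

def most_similar_record(current_state, history):
    """Pick the historical record whose state has the maximum similarity."""
    return max(history, key=lambda record: similarity(current_state, record["state"]))

def obtain_initial_strategy(current_state, history, initial_decision, policy_update):
    record = most_similar_record(current_state, history)
    to_update = initial_decision(current_state, record["strategy"])  # to-be-updated strategy
    return policy_update(to_update)                                  # initial strategy

# Hypothetical history: state = [load_mw, wind_mw], strategy = generator outputs in MW.
history = [
    {"state": [150.0, 10.0], "strategy": {"A": 50.0, "B": 50.0, "C": 40.0}},
    {"state": [225.0, 45.0], "strategy": {"A": 50.0, "B": 70.0, "C": 60.0}},
    {"state": [300.0, 80.0], "strategy": {"A": 50.0, "B": 90.0, "C": 80.0}},
]

def initial_decision(state, historical_strategy):
    # Stand-in: reuse the historical strategy and let generator B absorb the net-load change.
    strategy = dict(historical_strategy)
    net_load = state[0] - state[1]
    strategy["B"] = net_load - strategy["A"] - strategy["C"]
    return strategy

def policy_update(strategy, low=50.0, high=100.0):
    # Stand-in for a preset update condition: clamp every output to [low, high] MW.
    return {name: min(high, max(low, mw)) for name, mw in strategy.items()}

current_state = [230.0, 40.0]
best = most_similar_record(current_state, history)
initial_strategy = obtain_initial_strategy(current_state, history, initial_decision, policy_update)
print(best["state"], initial_strategy)
```

Reusing the strategy of the nearest historical state acts as a warm start, so the initial decision object only has to account for the difference between the two states.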
本说明书提供的数据处理装置,通过初始决策对象以及目标决策对象,对目标能源系统的当前运行状态数据进行处理,获得该目标能源系统的目标调度策略,并通过该目标调度策略快速地对目标能源系统的当前运行状态数据进行调整,使得目标能源系统具备大规模处理,以及快速响应的能力,缓解人工调度方案所面临的困境。The data processing device provided in this specification processes the current operating state data of the target energy system through the initial decision object and the target decision object to obtain the target scheduling strategy of the target energy system, and uses that strategy to quickly adjust the current operating state data of the target energy system. This gives the target energy system large-scale processing and rapid-response capabilities and alleviates the difficulties faced by manual scheduling schemes.
上述为本实施例的一种数据处理装置的示意性方案。需要说明的是,该一种数据处理装置的技术方案与上述的一种数据处理方法的技术方案属于同一构思,一种数据处理装置的技术方案未详细描述的细节内容,均可以参见上述一种数据处理方法的技术方案的描述。The foregoing is a schematic solution of a data processing device in this embodiment. It should be noted that the technical solution of the data processing device belongs to the same concept as the technical solution of the data processing method described above; for details not described in detail here, refer to the description of the technical solution of the data processing method described above.
与上述方法实施例相对应,本说明书还提供了另一种数据处理装置实施例,图5示出了本说明书一个实施例提供的另一种数据处理装置的结构示意图。如图5所示,该装置包括:Corresponding to the foregoing method embodiments, this specification also provides another embodiment of a data processing device. FIG. 5 shows a schematic structural diagram of another data processing device provided by an embodiment of this specification. As shown in FIG. 5, the device includes:
数据获取模块502,被配置为获取目标电力系统的当前运行状态数据;A data acquisition module 502 configured to acquire current operating state data of the target power system;
第一策略获取模块504,被配置为将所述当前运行状态数据输入初始决策对象,获得所述目标电力系统的初始调度策略;The first strategy acquisition module 504 is configured to input the current operating state data into an initial decision object, and obtain an initial dispatch strategy of the target power system;
第二策略获取模块506,被配置为将所述初始调度策略输入目标决策对象,获得所述目标电力系统的目标调度策略;The second strategy acquisition module 506 is configured to input the initial dispatch strategy into the target decision object, and obtain the target dispatch strategy of the target power system;
调整模块508,被配置为基于所述目标调度策略对所述目标电力系统的所述当前运行状态数据进行调整。The adjustment module 508 is configured to adjust the current operating state data of the target power system based on the target dispatch strategy.
可选地,所述第一策略获取模块504,还被配置为:Optionally, the first policy acquisition module 504 is further configured to:
将所述当前运行状态数据输入初始决策对象的策略确定模块,获得待处理调度策略;Inputting the current running state data into the policy determination module of the initial decision object to obtain the scheduling policy to be processed;
基于所述初始决策对象的策略评估模块对所述待处理调度策略进行处理,获得所述待处理调度策略对应的策略评估结果;Processing the scheduling policy to be processed based on the policy evaluation module of the initial decision object to obtain a policy evaluation result corresponding to the scheduling policy to be processed;
基于所述待处理调度策略以及对应的策略评估结果确定所述目标电力系统的初始调度策略。An initial dispatch strategy of the target power system is determined based on the dispatch strategy to be processed and a corresponding strategy evaluation result.
本说明书提供的另一种数据处理装置,通过初始决策对象以及目标决策对象,对目标电力系统的当前运行状态数据进行处理,获得该目标电力系统的目标调度策略,并通过该目标调度策略快速地对目标电力系统的当前运行状态数据进行调整,使得目标电力系统具备大规模处理,以及快速响应的能力,缓解人工调度方案所面临的困境。The other data processing device provided in this specification processes the current operating state data of the target power system through the initial decision object and the target decision object to obtain the target dispatch strategy of the target power system, and uses that strategy to quickly adjust the current operating state data of the target power system. This gives the target power system large-scale processing and rapid-response capabilities and alleviates the difficulties faced by manual dispatch schemes.
上述为本实施例的另一种数据处理装置的示意性方案。需要说明的是,该另一种数据处理装置的技术方案与上述的另一种数据处理方法的技术方案属于同一构思,另一种数据处理装置的技术方案未详细描述的细节内容,均可以参见上述另一种数据处理方法的技术方案的描述。The foregoing is a schematic solution of another data processing device in this embodiment. It should be noted that the technical solution of this other data processing device belongs to the same concept as the technical solution of the other data processing method described above; for details not described in detail here, refer to the description of the technical solution of that other data processing method.
图6示出了根据本说明书一个实施例提供的一种计算设备600的结构框图。该计算设备600的部件包括但不限于存储器610和处理器620。处理器620与存储器610通过总线630相连接,数据库650用于保存数据。FIG. 6 shows a structural block diagram of a computing device 600 provided according to an embodiment of this specification. Components of the computing device 600 include, but are not limited to, memory 610 and processor 620 . The processor 620 is connected to the memory 610 through the bus 630, and the database 650 is used for saving data.
计算设备600还包括接入设备640,接入设备640使得计算设备600能够经由一个或多个网络660通信。这些网络的示例包括公用交换电话网(PSTN)、局域网(LAN)、广域网(WAN)、个域网(PAN)或诸如因特网的通信网络的组合。接入设备640可以包括有线或无线的任何类型的网络接口(例如,网络接口卡(NIC))中的一个或多个,诸如IEEE802.11无线局域网(WLAN)无线接口、全球微波互联接入(Wi-MAX)接口、以太网接口、通用串行总线(USB)接口、蜂窝网络接口、蓝牙接口、近场通信(NFC)接口,等等。Computing device 600 also includes an access device 640 that enables computing device 600 to communicate via one or more networks 660 . Examples of these networks include the Public Switched Telephone Network (PSTN), Local Area Network (LAN), Wide Area Network (WAN), Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 640 may include one or more of any type of network interface (e.g., a network interface card (NIC)), wired or wireless, such as an IEEE 802.11 wireless local area network (WLAN) wireless interface, Worldwide Interoperability for Microwave Access ( Wi-MAX) interface, Ethernet interface, Universal Serial Bus (USB) interface, cellular network interface, Bluetooth interface, Near Field Communication (NFC) interface, etc.
在本说明书的一个实施例中,计算设备600的上述部件以及图6中未示出的其他部件也可以彼此相连接,例如通过总线。应当理解,图6所示的计算设备结构框图仅仅是出于示例的目的,而不是对本说明书范围的限制。本领域技术人员可以根据需要,增添或替换其他部件。In an embodiment of the present specification, the above-mentioned components of the computing device 600 and other components not shown in FIG. 6 may also be connected to each other, for example, through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 6 is only for the purpose of illustration, rather than limiting the scope of this description. Those skilled in the art can add or replace other components as needed.
计算设备600可以是任何类型的静止或移动计算设备,包括移动计算机或移动计算设备(例如,平板计算机、个人数字助理、膝上型计算机、笔记本计算机、上网本等)、移动电话(例如,智能手机)、可佩戴的计算设备(例如,智能手表、智能眼镜等)或其他类型的移动设备,或者诸如台式计算机或PC的静止计算设备。计算设备600还可以是移动式或静止式的服务器。 Computing device 600 may be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile telephones (e.g., smartphones), ), wearable computing devices (eg, smart watches, smart glasses, etc.), or other types of mobile devices, or stationary computing devices such as desktop computers or PCs. Computing device 600 may also be a mobile or stationary server.
其中,处理器620用于执行如下计算机可执行指令,该计算机可执行指令被处理器620执行时实现上述数据处理方法的步骤。Wherein, the processor 620 is configured to execute the following computer-executable instructions; when executed by the processor 620, the computer-executable instructions implement the steps of the above-mentioned data processing method.
上述为本实施例的一种计算设备的示意性方案。需要说明的是,该计算设备的技术方案与上述的数据处理方法的技术方案属于同一构思,计算设备的技术方案未详细描述的细节内容,均可以参见上述数据处理方法的技术方案的描述。The foregoing is a schematic solution of a computing device in this embodiment. It should be noted that the technical solution of the computing device and the above-mentioned technical solution of the data processing method belong to the same concept, and details not described in detail in the technical solution of the computing device can refer to the description of the technical solution of the above-mentioned data processing method.
本说明书一实施例还提供一种计算机可读存储介质,其存储有计算机可执行指令,该计算机可执行指令被处理器执行时实现上述数据处理方法的步骤。An embodiment of the present specification also provides a computer-readable storage medium, which stores computer-executable instructions, and implements the steps of the above-mentioned data processing method when the computer-executable instructions are executed by a processor.
上述为本实施例的一种计算机可读存储介质的示意性方案。需要说明的是,该存储介质的技术方案与上述的数据处理方法的技术方案属于同一构思,存储介质的技术方案未详细描述的细节内容,均可以参见上述数据处理方法的技术方案的描述。The foregoing is a schematic solution of a computer-readable storage medium in this embodiment. It should be noted that the technical solution of the storage medium and the above-mentioned technical solution of the data processing method belong to the same idea, and details of the technical solution of the storage medium that are not described in detail can be found in the description of the technical solution of the above-mentioned data processing method.
本说明书一实施例还提供一种计算机程序,其中,当所述计算机程序在计算机中执行时,令计算机执行上述数据处理方法的步骤。An embodiment of the present specification also provides a computer program, wherein, when the computer program is executed in a computer, the computer is caused to execute the steps of the above data processing method.
上述为本实施例的一种计算机程序的示意性方案。需要说明的是,该计算机程序的技术方案与上述的数据处理方法的技术方案属于同一构思,计算机程序的技术方案未详细描述的细节内容,均可以参见上述数据处理方法的技术方案的描述。The foregoing is a schematic solution of a computer program in this embodiment. It should be noted that the technical solution of the computer program and the technical solution of the above-mentioned data processing method belong to the same concept, and details not described in detail in the technical solution of the computer program can refer to the description of the technical solution of the above-mentioned data processing method.
上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。The foregoing describes specific embodiments of this specification. Other implementations are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible or may be advantageous in certain embodiments.
所述计算机指令包括计算机程序代码,所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是,所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减,例如在某些司法管辖区,根据立法和专利实践,计算机可读介质不包括电载波信号和电信信号。The computer instructions include computer program code, which may be in source code form, object code form, executable file or some intermediate form or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM, Read-Only Memory) , Random Access Memory (RAM, Random Access Memory), electrical carrier signal, telecommunication signal and software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, computer-readable media Excludes electrical carrier signals and telecommunication signals.
需要说明的是,对于前述的各方法实施例,为了简便描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本说明书实施例并不受所描述的动作顺序的限制,因为依据本说明书实施例,某些步骤可以采用其它顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定都是本说明书实施例所必须的。It should be noted that, for the sake of simplicity of description, the aforementioned method embodiments are expressed as a series of action combinations, but those skilled in the art should know that the embodiments of this specification are not limited by the described action sequences. Because according to the embodiment of the present specification, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the embodiments of the specification.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。In the above-mentioned embodiments, the descriptions of each embodiment have their own emphases, and for parts not described in detail in a certain embodiment, reference may be made to relevant descriptions of other embodiments.
以上公开的本说明书优选实施例只是用于帮助阐述本说明书。可选实施例并没有详尽叙述所有的细节,也不限制该发明仅为所述的具体实施方式。显然,根据本说明书实施例的内容,可作很多的修改和变化。本说明书选取并具体描述这些实施例,是为了更好地解释本说明书实施例的原理和实际应用,从而使所属技术领域技术人员能很好地理解和利用本说明书。本说明书仅受权利要求书及其全部范围和等效物的限制。 The preferred embodiments of the present specification disclosed above are only for helping to explain the present specification. Alternative embodiments are not exhaustive in all detail, nor are the inventions limited to specific implementations described. Obviously, many modifications and changes can be made according to the contents of the embodiments of this specification. This specification selects and specifically describes these embodiments in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can well understand and use this specification. This specification is to be limited only by the claims, along with their full scope and equivalents.

Claims (14)

  1. 一种数据处理方法,包括:A data processing method, comprising:
    获取目标能源系统的当前运行状态数据;Obtain the current operating status data of the target energy system;
    将所述当前运行状态数据输入初始决策对象,获得所述目标能源系统的初始调度策略;Inputting the current operating status data into an initial decision object to obtain an initial scheduling strategy for the target energy system;
    将所述初始调度策略输入目标决策对象,获得所述目标能源系统的目标调度策略;Inputting the initial scheduling strategy into the target decision object to obtain the target scheduling strategy of the target energy system;
    基于所述目标调度策略对所述目标能源系统的所述当前运行状态数据进行调整。The current operating state data of the target energy system is adjusted based on the target dispatch strategy.
  2. 根据权利要求1所述的数据处理方法,所述将所述当前运行状态数据输入初始决策对象,获得所述目标能源系统的初始调度策略,包括:The data processing method according to claim 1, wherein said inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target energy system comprises:
    将所述当前运行状态数据输入初始决策对象的策略确定模块,获得待处理调度策略;Inputting the current running state data into the policy determination module of the initial decision object to obtain the scheduling policy to be processed;
    基于所述初始决策对象的策略评估模块对所述待处理调度策略进行处理,获得所述待处理调度策略对应的策略评估结果;Processing the scheduling policy to be processed based on the policy evaluation module of the initial decision object to obtain a policy evaluation result corresponding to the scheduling policy to be processed;
    基于所述待处理调度策略以及对应的策略评估结果确定初始调度策略。An initial scheduling policy is determined based on the pending scheduling policy and a corresponding policy evaluation result.
  3. 根据权利要求2所述的数据处理方法,所述基于所述初始决策对象的策略评估模块对所述待处理调度策略进行处理,获得所述待处理调度策略对应的策略评估结果,包括:The data processing method according to claim 2, wherein said processing the scheduling policy to be processed based on the policy evaluation module of the initial decision object to obtain the policy evaluation result corresponding to the scheduling policy to be processed comprises:
    基于预设参数调整规则对所述待处理调度策略中的决策参数进行修改,获得调整后的待处理调度策略;modifying the decision parameters in the scheduling strategy to be processed based on preset parameter adjustment rules to obtain the adjusted scheduling strategy to be processed;
    将所述待处理调度策略以及所述调整后的待处理调度策略输入所述初始决策对象的策略评估模块,获得所述待处理调度策略的策略评估结果。Inputting the scheduling policy to be processed and the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object to obtain a policy evaluation result of the scheduling policy to be processed.
  4. 根据权利要求3所述的数据处理方法,所述将所述待处理调度策略以及所述调整后的待处理调度策略输入所述初始决策对象的策略评估模块,获得所述待处理调度策略的策略评估结果,包括:The data processing method according to claim 3, wherein said inputting the scheduling policy to be processed and the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the scheduling policy to be processed comprises:
    将所述待处理调度策略输入所述初始决策对象的策略评估模块,获得所述待处理调度策略的第一评估结果;inputting the scheduling policy to be processed into the policy evaluation module of the initial decision object, and obtaining a first evaluation result of the scheduling policy to be processed;
    将所述调整后的待处理调度策略输入所述初始决策对象的策略评估模块,获得所述调整后的待处理调度策略的第二评估结果;inputting the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object, and obtaining a second evaluation result of the adjusted scheduling policy to be processed;
    基于所述第一评估结果以及所述第二评估结果确定所述待处理调度策略的策略评估结果。Determine a policy evaluation result of the scheduling policy to be processed based on the first evaluation result and the second evaluation result.
  5. 根据权利要求2所述的数据处理方法,所述基于所述待处理调度策略以及对应的策略评估结果确定初始调度策略,包括:The data processing method according to claim 2, said determining an initial scheduling strategy based on said scheduling strategy to be processed and a corresponding strategy evaluation result, comprising:
    基于所述策略评估结果确定所述待处理调度策略中的第一参数;determining a first parameter in the pending scheduling policy based on the policy evaluation result;
    基于预设确定条件确定所述待处理调度策略中的第二参数;determining a second parameter in the scheduling policy to be processed based on preset determination conditions;
    将所述待处理调度策略中所述第一参数以及所述第二参数设置为固定参数,获得初始调度策略。Setting the first parameter and the second parameter in the pending scheduling strategy as fixed parameters to obtain an initial scheduling strategy.
  6. 根据权利要求1所述的数据处理方法,所述获取目标能源系统的当前运行状态数据之前,还包括:The data processing method according to claim 1, further comprising, before said acquiring the current operating state data of the target energy system:
    基于状态模拟模块确定模拟运行状态数据;Determining simulated running state data based on the state simulation module;
    将所述模拟运行状态数据输入待处理决策对象,获得模拟调度策略;Inputting the simulated running status data into the decision object to be processed to obtain a simulated scheduling strategy;
    基于所述模拟调度策略对所述待处理决策对象进行处理,获得初始决策对象。The decision object to be processed is processed based on the simulation scheduling strategy to obtain an initial decision object.
  7. 根据权利要求6所述的数据处理方法,所述基于所述模拟调度策略对所述待处理决策对象进行处理,获得初始决策对象,包括:The data processing method according to claim 6, wherein said processing the decision object to be processed based on the simulated scheduling strategy to obtain the initial decision object comprises:
    基于所述状态模拟模块对所述模拟调度策略进行评估,获得模拟策略评估结果;Evaluating the simulation scheduling strategy based on the state simulation module to obtain a simulation strategy evaluation result;
    基于所述模拟策略评估结果以及所述模拟调度策略对所述待处理决策对象进行处理,获得初始决策对象。Based on the evaluation result of the simulation policy and the simulation scheduling policy, the decision object to be processed is processed to obtain an initial decision object.
  8. 根据权利要求6所述的数据处理方法,所述基于所述模拟策略评估结果以及所述模拟调度策略对所述待处理决策对象进行处理,获得初始决策对象,包括:The data processing method according to claim 6, wherein said processing the decision object to be processed based on the simulated policy evaluation result and the simulated scheduling policy to obtain the initial decision object comprises:
    基于所述模拟策略评估结果对所述待处理决策对象中的策略确定模块进行处理,获得处理后的所述策略确定模块;Processing the policy determination module in the pending decision object based on the simulated policy evaluation result to obtain the processed policy determination module;
    基于所述模拟策略评估结果以及所述模拟调度策略,对所述待处理决策对象中的策略评估模块进行处理,获得处理后的所述策略评估模块;Based on the simulated policy evaluation result and the simulated scheduling policy, process the policy evaluation module in the pending decision object to obtain the processed policy evaluation module;
    基于处理后的所述策略确定模块以及处理后的所述策略评估模块,确定初始决策对象。An initial decision object is determined based on the processed policy determination module and the processed policy evaluation module.
  9. 根据权利要求1所述的数据处理方法,所述将所述当前运行状态数据输入初始决策对象,获得所述目标能源系统的初始调度策略,包括:The data processing method according to claim 1, wherein said inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target energy system comprises:
    确定历史运行状态数据与所述当前运行状态数据的相似度,将所述相似度中最大相似度对应的历史运行状态数据,确定为所述当前运行状态数据的相似运行状态数据;determining the similarity between the historical operating state data and the current operating state data, and determining the historical operating state data corresponding to the maximum similarity among the similarities as similar operating state data to the current operating state data;
    获取所述相似运行状态数据对应的历史调度策略,其中,所述历史调度策略为历史基于目标决策对象获得的目标调度策略;Obtaining a historical scheduling policy corresponding to the similar operating status data, wherein the historical scheduling policy is a historical target scheduling policy obtained based on a target decision object;
    将所述当前运行状态数据以及所述历史调度策略输入至初始决策对象,获得所述目标能源系统的待更新调度策略;Inputting the current operating state data and the historical scheduling strategy into the initial decision object to obtain a to-be-updated scheduling strategy of the target energy system;
    将所述目标能源系统的待更新调度策略发送至策略更新对象;Send the to-be-updated scheduling policy of the target energy system to the policy update object;
    接收所述策略更新对象发送的初始调度策略,其中,所述初始调度策略为所述策略更新对象基于预设更新条件对所述待更新调度策略进行更新获得。An initial scheduling policy sent by the policy update object is received, wherein the initial scheduling policy is obtained by updating the scheduling policy to be updated based on preset update conditions by the policy update object.
  10. A data processing method, comprising:
    obtaining current operating state data of a target power system;
    inputting the current operating state data into an initial decision object to obtain an initial scheduling policy of the target power system;
    inputting the initial scheduling policy into a target decision object to obtain a target scheduling policy of the target power system; and
    adjusting the current operating state data of the target power system based on the target scheduling policy.
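The four steps of claim 10 form a simple two-stage pipeline, sketched below. The class `DispatchPipeline` and its three callables are hypothetical names, and the claim does not say how either decision object is implemented, so plain functions stand in for them.

```python
from dataclasses import dataclass
from typing import Callable, List

State = List[float]
Policy = List[float]


@dataclass
class DispatchPipeline:
    """Two-stage scheduling: a coarse initial policy refined into a target policy."""
    initial_decision: Callable[[State], Policy]   # stand-in for the initial decision object
    target_decision: Callable[[Policy], Policy]   # stand-in for the target decision object
    apply: Callable[[State, Policy], State]       # how a policy adjusts the operating state

    def step(self, state: State) -> State:
        initial_policy = self.initial_decision(state)
        target_policy = self.target_decision(initial_policy)
        # Adjust the current operating state based on the target policy.
        return self.apply(state, target_policy)
```

As an illustration only, an initial stage that raises every setpoint by 5.0 followed by a target stage that caps setpoints at 22.0 would turn the state `[10.0, 20.0]` into `[15.0, 22.0]` when `apply` simply adopts the target setpoints.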
  11. The data processing method according to claim 10, wherein inputting the current operating state data into the initial decision object to obtain the initial scheduling policy of the target power system comprises:
    inputting the current operating state data into a policy determination module of the initial decision object to obtain a to-be-processed scheduling policy;
    processing the to-be-processed scheduling policy by a policy evaluation module of the initial decision object to obtain a policy evaluation result corresponding to the to-be-processed scheduling policy; and
    determining the initial scheduling policy of the target power system based on the to-be-processed scheduling policy and the corresponding policy evaluation result.
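One way to read claim 11 is as a propose-then-score loop, sketched below. The claim does not say how the evaluation result is used to pick the initial policy; this sketch assumes the policy determination module yields several candidate policies and the highest-scoring one is kept. The names `select_initial_policy`, `propose`, and `evaluate` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Policy = List[float]


def select_initial_policy(
    state: State,
    propose: Callable[[State], List[Policy]],   # stand-in for the policy determination module
    evaluate: Callable[[State, Policy], float], # stand-in for the policy evaluation module
) -> Tuple[Policy, float]:
    """Return the candidate policy with the highest evaluation score, and that score."""
    candidates = propose(state)
    if not candidates:
        raise ValueError("policy determination module returned no candidates")
    # Score every to-be-processed policy, then keep the best-scoring one.
    scored = [(evaluate(state, policy), policy) for policy in candidates]
    best_score, best_policy = max(scored, key=lambda pair: pair[0])
    return best_policy, best_score
```

A single-candidate variant, in which the score only decides whether to accept the policy or fall back to a default, would fit the claim wording equally well.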
  12. A data processing apparatus, comprising:
    a data acquisition module, configured to obtain current operating state data of a target energy system;
    a first policy acquisition module, configured to input the current operating state data into an initial decision object to obtain an initial scheduling policy of the target energy system;
    a second policy acquisition module, configured to input the initial scheduling policy into a target decision object to obtain a target scheduling policy of the target energy system; and
    an adjustment module, configured to adjust the current operating state data of the target energy system based on the target scheduling policy.
  13. A computing device, comprising:
    a memory and a processor;
    wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the data processing method according to any one of claims 1 to 9 or claims 10 to 11.
  14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method according to any one of claims 1 to 9 or claims 10 to 11.
PCT/CN2023/072238 2022-01-17 2023-01-16 Data processing method and apparatus WO2023134759A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210046910.8 2022-01-17
CN202210046910.8A CN114066333A (en) 2022-01-17 2022-01-17 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2023134759A1 (en) 2023-07-20

Family

ID=80231046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072238 WO2023134759A1 (en) 2022-01-17 2023-01-16 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN114066333A (en)
WO (1) WO2023134759A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738239A (en) * 2023-08-11 2023-09-12 浙江菜鸟供应链管理有限公司 Model training method, resource scheduling method, device, system, equipment and medium
CN116957362A (en) * 2023-09-18 2023-10-27 国网江西省电力有限公司经济技术研究院 Multi-target planning method and system for regional comprehensive energy system

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN114066333A (en) * 2022-01-17 2022-02-18 阿里巴巴达摩院(杭州)科技有限公司 Data processing method and device
CN114338231B (en) * 2022-02-22 2023-10-31 浙江网商银行股份有限公司 Policy processing method and system
CN114666152A (en) * 2022-04-08 2022-06-24 广州能信数字科技有限公司 Method for ensuring accurate and safe transmission of power grid dispatching command
CN114709880B (en) * 2022-06-06 2022-08-30 阿里巴巴达摩院(杭州)科技有限公司 Control method and system of unit in target power grid, storage medium and processor
CN115081940B (en) * 2022-07-21 2022-11-22 阿里巴巴达摩院(杭州)科技有限公司 Resource scheduling method, power resource allocation method and device
CN115953009B (en) * 2023-03-01 2023-07-21 阿里巴巴(中国)有限公司 Scheduling method of power system and training method of scheduling decision model

Citations (5)

Publication number Priority date Publication date Assignee Title
US20170104846A1 (en) * 2015-10-13 2017-04-13 Sap Portals Israel Ltd. Managing identical data requests
CN111476622A (en) * 2019-11-21 2020-07-31 北京沃东天骏信息技术有限公司 Article pushing method and device and computer readable storage medium
CN111859039A (en) * 2020-07-16 2020-10-30 河海大学常州校区 Workshop disturbance decision method and device based on improved case reasoning technology
CN113869795A (en) * 2021-10-26 2021-12-31 大连理工大学 Long-term scheduling method for industrial byproduct gas system
CN114066333A (en) * 2022-01-17 2022-02-18 阿里巴巴达摩院(杭州)科技有限公司 Data processing method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN112670982B (en) * 2020-12-14 2022-11-08 广西电网有限责任公司电力科学研究院 Active power scheduling control method and system for micro-grid based on reward mechanism

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN116738239A (en) * 2023-08-11 2023-09-12 浙江菜鸟供应链管理有限公司 Model training method, resource scheduling method, device, system, equipment and medium
CN116738239B (en) * 2023-08-11 2023-11-24 浙江菜鸟供应链管理有限公司 Model training method, resource scheduling method, device, system, equipment and medium
CN116957362A (en) * 2023-09-18 2023-10-27 国网江西省电力有限公司经济技术研究院 Multi-target planning method and system for regional comprehensive energy system

Also Published As

Publication number Publication date
CN114066333A (en) 2022-02-18

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23740100; Country of ref document: EP; Kind code of ref document: A1)