WO2023134759A1 - Data processing method and apparatus

Info

Publication number: WO2023134759A1
Application number: PCT/CN2023/072238
Authority: WO
Inventors: 杨超; 钮孟洋; 韩佳澦; 杨程; 辛焱
Original assignee: 阿里巴巴达摩院(杭州)科技有限公司
Priority date: 2022-01-17
Filing date: 2023-01-16
Publication date: 2023-07-20
Also published as: CN114066333A

Abstract

Embodiments of the present application provide a data processing method and apparatus. The data processing method comprises: obtaining current operational state data of a target energy system; inputting the current operational state data into an initial decision object to obtain an initial scheduling policy of the target energy system; inputting the initial scheduling policy into a target decision object to obtain a target scheduling policy of the target energy system; and adjusting the current operational state data of the target energy system on the basis of the target scheduling policy. The present invention causes the target energy system to have large-scale processing and quick response capabilities, and difficulties faced by a manual scheduling scheme are solved.

Description

Data processing method and device

This application claims the priority of the Chinese patent application with the application number 202210046910.8 and the application title "data processing method and device" submitted to the China Patent Office on January 17, 2022, the entire contents of which are incorporated in this application by reference.

Technical field

The embodiments of this specification relate to the technical field of energy regulation, and in particular to a data processing method.

Background art

With the continuous development of science and technology, the energy industry is building new energy systems in which new energy is the main source, such as a new type of power system. However, because new energy is volatile and intermittent, power system dispatch must be able to respond quickly and to handle scheduling at large scale, and the traditional manual scheduling method clearly cannot meet these requirements.

Summary of the invention

In view of this, the embodiments of this specification provide a data processing method. One or more embodiments of this specification also relate to another data processing method, a data processing device, another data processing device, a computing device, a computer-readable storage medium, and a computer program, to solve the technical defects existing in the prior art.

According to the first aspect of the embodiments of this specification, a data processing method is provided, including:

Obtaining the current operating status data of the target energy system;

Inputting the current operating status data into an initial decision object to obtain an initial scheduling strategy for the target energy system;

Inputting the initial scheduling strategy into the target decision object to obtain the target scheduling strategy of the target energy system;

Adjusting the current operating status data of the target energy system based on the target scheduling strategy.

According to a second aspect of the embodiments of this specification, a data processing device is provided, including:

A data acquisition module configured to acquire current operating state data of the target energy system;

The first strategy acquisition module is configured to input the current operating status data into an initial decision object to obtain an initial dispatch strategy of the target energy system;

The second strategy acquisition module is configured to input the initial dispatch strategy into the target decision-making object, and obtain the target dispatch strategy of the target energy system;

An adjustment module configured to adjust the current operating state data of the target energy system based on the target dispatch strategy.

According to the third aspect of the embodiments of this specification, another data processing method is provided, including:

Obtaining the current operating state data of the target power system;

Inputting the current operating state data into an initial decision object to obtain an initial dispatch strategy of the target power system;

Inputting the initial dispatch strategy into a target decision object to obtain a target dispatch strategy of the target power system;

Adjusting the current operating state data of the target power system based on the target dispatch strategy.

According to the fourth aspect of the embodiments of this specification, another data processing device is provided, including:

A data acquisition module configured to acquire current operating state data of the target power system;

The first strategy acquisition module is configured to input the current operating state data into an initial decision object to obtain an initial dispatch strategy of the target power system;

The second strategy acquisition module is configured to input the initial dispatch strategy into the target decision object, and obtain the target dispatch strategy of the target power system;

An adjustment module configured to adjust the current operating state data of the target power system based on the target dispatch strategy.

According to a fifth aspect of the embodiments of this specification, a computing device is provided, including:

memory and processor;

The memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions. When the computer-executable instructions are executed by the processor, the steps of the data processing method are implemented.

According to a sixth aspect of the embodiments of the present specification, there is provided a computer-readable storage medium, which stores computer-executable instructions, and implements the steps of the data processing method when the computer-executable instructions are executed by a processor.

According to a seventh aspect of the embodiments of the present specification, a computer program is provided, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method.

The data processing method provided in this specification includes: obtaining the current operating state data of the target energy system; inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target energy system; inputting the initial scheduling strategy into the target decision object to obtain the target scheduling strategy of the target energy system; and adjusting the current operating state data of the target energy system based on the target scheduling strategy.

Specifically, the method processes the current operating state data of the target energy system through the initial decision object and the target decision object to obtain the target scheduling strategy of the target energy system, and then quickly adjusts the current operating state data of the target energy system according to that strategy. This gives the target energy system large-scale processing and rapid response capabilities and alleviates the difficulties faced by manual scheduling schemes.
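As a rough illustration of the four steps above, the following Python sketch chains the two decision objects. The type aliases, the function names, the 330 MW load, and the 100 MW capacity are all hypothetical. The toy stand-ins are not the decision objects of this specification; they only show how data flows from operating state to initial strategy to target strategy to adjusted state.

```python
from typing import Callable, Dict

State = Dict[str, float]     # hypothetical: operating-state values keyed by name
Strategy = Dict[str, float]  # hypothetical: generator set-points in MW


def run_dispatch(
    state: State,
    initial_decision: Callable[[State], Strategy],
    target_decision: Callable[[Strategy], Strategy],
    apply_strategy: Callable[[State, Strategy], State],
) -> State:
    """Chain the four steps: state -> initial strategy -> target strategy -> new state."""
    initial_strategy = initial_decision(state)
    target_strategy = target_decision(initial_strategy)
    return apply_strategy(state, target_strategy)


def toy_initial_decision(state: State) -> Strategy:
    # Assumption: split the load evenly over three generators.
    share = state["load_mw"] / 3.0
    return {"A": share, "B": share, "C": share}


def toy_target_decision(strategy: Strategy) -> Strategy:
    # Assumption: clamp every set-point to a 100 MW capacity.
    return {name: min(mw, 100.0) for name, mw in strategy.items()}


def toy_apply(state: State, strategy: Strategy) -> State:
    # Write the set-points back into a copy of the operating state.
    new_state = dict(state)
    for name, mw in strategy.items():
        new_state[f"gen_{name}_mw"] = mw
    return new_state


new_state = run_dispatch(
    {"load_mw": 330.0}, toy_initial_decision, toy_target_decision, toy_apply
)
```

Passing the decision objects in as plain callables keeps the sketch neutral about whether each one is a model, a device, or an application program.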

Brief description of the drawings

Fig. 1 is a flow chart of a data processing method provided by an embodiment of this specification;

Fig. 2 is a processing flow chart of a data processing method provided by an embodiment of this specification;

Fig. 3 is a flowchart of another data processing method provided by an embodiment of this specification;

Fig. 4 is a schematic structural diagram of a data processing device provided by an embodiment of this specification;

Fig. 5 is a schematic structural diagram of another data processing device provided by an embodiment of this specification;

Fig. 6 is a structural block diagram of a computing device provided by an embodiment of this specification.

Detailed description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the specification. However, this specification can be implemented in many other ways different from those described here, and those skilled in the art can make similar extensions without violating the connotation of this specification, so this specification is not limited by the specific implementations disclosed below.

Terms used in one or more embodiments of this specification are for the purpose of describing specific embodiments only, and are not intended to limit one or more embodiments of this specification. As used in one or more embodiments of this specification and the appended claims, the singular forms "a", "an", and "the" are also intended to include the plural forms unless the context clearly dictates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of the present specification refers to and includes any or all possible combinations of one or more associated listed items.

It should be understood that although the terms first, second, etc. may be used to describe various information in one or more embodiments of the present specification, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, the first may also be referred to as the second, and similarly, the second may also be referred to as the first without departing from the scope of one or more embodiments of the present specification. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination."

First, the terms involved in one or more embodiments of this specification are explained.

Mathematical modeling: building a mathematical model of the problem to be solved and solving it with a solver.

Reinforcement learning: a machine learning method in artificial intelligence that learns an action strategy solely from the rewards and penalties given by the environment.

Power flow calculation (PF, Power Flow): given the power grid structure, its parameters, and the operating conditions of components such as generators and loads, the calculation that determines the steady-state operating parameters of each part of the power system.

Optimal power flow (OPF, Optimal Power Flow): subject to the steady-state power flow and the operating constraints of other equipment components, determining the generator power and the node voltage magnitudes so that a chosen performance index of the system (such as generation cost or network loss) is optimized.

ACOPF: An optimal power flow model based on the alternating-current (AC) power flow equations.

DCOPF: An optimal power flow model based on a linearized (DC) approximation of the power flow equations.
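The definition of optimal power flow above can be written compactly as a constrained optimization problem. The notation below is a generic sketch rather than notation taken from this specification: $P_G$ (generator power), $V$ (node voltage magnitudes), $f$ (the performance index), $g$ (the steady-state power flow equations), and $h$ (the operating constraints of other equipment components) are all assumed symbols.

```latex
\begin{aligned}
\min_{P_G,\,V}\quad & f(P_G, V) \\
\text{s.t.}\quad & g(P_G, V) = 0 \\
& h(P_G, V) \le 0
\end{aligned}
```

In ACOPF the equality constraints $g$ are nonlinear, while DCOPF replaces them with a linear approximation.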

To better serve the strategic goal of "dual carbon" (carbon peaking and carbon neutrality), many energy industries are building new energy systems with new energy as the main body; for example, the electric power industry is building a new type of power system with new energy as the main body. This brings many challenges. On the one hand, because new energy is volatile and intermittent, grid dispatch must be able to respond quickly; on the other hand, dispatch must be able to handle scheduling at large scale. The traditional day-ahead manual scheduling method clearly cannot meet these requirements.

Against this background, this specification first describes three schemes: an optimal power flow scheme, a unit commitment scheduling scheme, and a reinforcement-learning-based scheduling scheme. The first is the optimal power flow (OPF) scheme. It uses mathematical modeling to determine the generator power and the node voltage magnitudes, subject to the steady-state power flow and the operating constraints of other equipment components, so that a chosen performance index of the system (such as generation cost or network loss) is optimized. It considers only single-step decisions, and it is usually divided into ACOPF and DCOPF.

Because the ACOPF optimization problem contains nonlinear functions, it usually takes a long time to solve. On a power system with about 10,000 nodes, solving ACOPF takes about 10 minutes.

DCOPF simplifies and approximates the operation of the grid on the basis of ACOPF, turning the mathematical model into a linear one. This more than doubles the solution speed, but a multi-step DCOPF is still too slow for time-sensitive dispatch.

Both the ACOPF model and the DCOPF model use mathematical modeling to compute a single-step power flow. Their advantage is that the decision results are safe and interpretable. Their disadvantage is that they are slow: on a power grid with about 10,000 nodes, ACOPF is estimated to take about 10 minutes and DCOPF about 5 minutes. In addition, first, they optimize only the current decision; second, they cannot make discrete decisions such as switching units on and off without making the solution even more time-consuming.

The second is the security-constrained unit commitment (SCUC, Security Constrained Unit Commitment) scheme. It determines the start-stop decision and the output decision of each unit at each time step so as to optimize an overall objective over multiple time steps, and it is usually used for day-ahead scheduling. Although decisions can be made one day in advance, the volatility of new energy makes it difficult for day-level decisions to cope with real-time changes.

Although the SCUC scheme has sufficient time for start-stop and multi-step decision optimization, the volatility of new energy can only be modeled and handled accurately during real-time or quasi-real-time scheduling. Reinforcement learning can learn to optimize long-term goals offline, in a data-driven manner, using a simulator. Its advantage is that, with a reasonably designed model, it can make decisions that maximize long-term return on issues such as unit commitment, ramping, and the N-1 grid security criterion, and online scheduling decisions can be made within seconds. Its disadvantage is that the resulting decisions may carry potential safety hazards.

The third is the scheduling scheme based on reinforcement learning. Reinforcement learning is suited to sequential decision-making scenarios, so it is often used to optimize long-term goals. Its decision response time is very fast, usually on the order of seconds. However, because reinforcement learning is data-driven, it is not good at handling problems with hard constraints. In the power grid dispatching scenario, a decision produced by reinforcement learning may violate the security constraints, so grid dispatch based on reinforcement learning alone carries security risks.
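To make the hard-constraint concern concrete, the sketch below checks a proposed strategy against two simple constraints: per-generator output limits and a power balance with the load. The function name, the limit values, the 240 MW load, and the choice of constraints are illustrative assumptions, not the security constraints used in this specification.

```python
from typing import Dict, List, Tuple


def find_violations(
    strategy_mw: Dict[str, float],
    limits_mw: Dict[str, Tuple[float, float]],
    load_mw: float,
    tolerance_mw: float = 1e-6,
) -> List[str]:
    """Return a description of every constraint the strategy violates."""
    violations = []
    for name, output in strategy_mw.items():
        low, high = limits_mw[name]
        # A switched-off generator (0 MW) is allowed; a running one must stay in range.
        if output != 0.0 and not (low <= output <= high):
            violations.append(f"{name}: {output} MW outside [{low}, {high}] MW")
    total = sum(strategy_mw.values())
    # Assumed power balance: total generation must match the load.
    if abs(total - load_mw) > tolerance_mw:
        violations.append(f"balance: {total} MW generated for {load_mw} MW load")
    return violations


# Hypothetical (minimum, maximum) output limits per generator, in MW.
limits = {"A": (50.0, 120.0), "B": (40.0, 100.0), "C": (40.0, 100.0)}

safe = find_violations({"A": 50.0, "B": 100.0, "C": 90.0}, limits, load_mw=240.0)
unsafe = find_violations({"A": 30.0, "B": 110.0, "C": 100.0}, limits, load_mw=240.0)
```

Here the second strategy balances the load but puts generators A and B outside their limits, which is the kind of result a purely data-driven policy can produce.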

Based on this, this specification provides a data processing method. This specification also relates to a data processing device, another data processing method, another data processing device, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.

Fig. 1 shows a flow chart of a data processing method provided according to an embodiment of this specification, which specifically includes the following steps.

Step 102: Obtain current operating state data of the target energy system.

Wherein, the target energy system can be understood as a system capable of processing energy; when the data processing method provided in this specification is applied in different scenarios, the target energy system differs accordingly. For example, when the data processing method is applied to a new energy scenario, the target energy system can be understood as a new energy power system, such as a solar power system, a wind power system, or a hydropower system. When the data processing method is applied to oil extraction, the target energy system can be understood as an oil extraction system.

In the case where the target energy system is a power system, the current operating state data can be understood as parameters such as the grid structure of the power system, the operating state parameters of components such as generators, and the load of the power system, including but not limited to the switch state of each generator, the power generation efficiency of each generator, and the load of the power system. In practical applications, when the power system is a hydroelectric power generation system, the current operating state data can be parameters such as the power grid structure of the hydroelectric power generation system, the operating state parameters of components such as hydroelectric generating units, and the power supply load of the hydroelectric power generation system. Similarly, when the power system is a wind power generation system, the current operating state data may be parameters such as the power grid structure of the wind power generation system, the operating state parameters of components such as wind turbine generator sets, and the power supply load of the wind power generation system.

In the case where the target energy system is an oil extraction system, the current operating status data can be understood as the operating status parameters of components such as pump valves and oil pipelines in the oil extraction system, including but not limited to the switch status of pump valves, the pressure in oil pipelines, etc.

To facilitate understanding, the following takes an electric power system as an example of the target energy system to illustrate the data processing method provided in this specification, wherein the electric power system can be any new energy power system, such as a solar power generation system, a wind power generation system, a hydroelectric power generation system, a tidal power generation system, etc.

Specifically, in the data processing method provided in this specification, the current operating state of the target energy system can be obtained.

Taking the scenario where the data processing method is applied to adjust the current operating state of the power system as an example, the acquisition of the current operating state data of the target energy system will be further described. The target energy system is an electric power system, and the current operating state data is an operating state parameter of components such as a power grid structure and a generator in the electric power system.

Based on this, when the data processing method of this specification is applied to adjusting the current operating state of the power system, it can obtain the current operating state parameters of the power system. For example, the current operating state parameters may indicate that the current power load of the power system is relatively high and that, of the three generators in the power system (generator A, generator B, and generator C), generator A is off while generator B and generator C are on.

In a specific implementation, the current operating state data of the power system can be obtained in any suitable way, and this specification does not specifically limit this. For example, the current operating state data of the power system can be determined through various sensors deployed in the power system.
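One possible in-memory representation of the example state (generator A off, generators B and C on, load relatively high) is sketched below. The class names, field names, and all numeric values are hypothetical; the specification does not prescribe a data format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GeneratorState:
    is_on: bool        # switch state of the generator
    output_mw: float   # current power output in MW (assumed field)


@dataclass(frozen=True)
class OperatingState:
    load_mw: float                         # current power load of the system
    generators: Dict[str, GeneratorState]  # per-generator state keyed by name


# Hypothetical values matching the example: A is off, B and C are on.
state = OperatingState(
    load_mw=240.0,
    generators={
        "A": GeneratorState(is_on=False, output_mw=0.0),
        "B": GeneratorState(is_on=True, output_mw=100.0),
        "C": GeneratorState(is_on=True, output_mw=90.0),
    },
)

# Names of the generators that are currently switched on.
running = sorted(name for name, gen in state.generators.items() if gen.is_on)
```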

In addition, the data processing method provided in this specification can be applied to a decision scheduling module that can adjust the current operating state data of the target energy system. The decision scheduling module can obtain the target scheduling strategy through the initial decision object and the target decision object, and adjust the current operating state data of the target energy system based on the target scheduling strategy. The decision scheduling module can be deployed in the target energy system or can be independent of it. In practical applications, the decision scheduling module can be a decision scheduling platform or decision scheduling server independent of the power system, or it can be a decision scheduling device, a decision scheduling server, etc. deployed in the power system. This specification does not specifically limit this.

Step 104: Input the current operating status data into an initial decision object to obtain an initial scheduling strategy for the target energy system.

Wherein, the initial decision object can be understood as an object that can obtain the initial scheduling strategy of the target energy system based on the current operating state data. In practical applications, the initial decision object can be a deep learning model, an electronic device, or an application program. For ease of understanding, the following describes the data processing method provided in this specification by taking a deep reinforcement learning model as an example of the initial decision object. In that case, the deep reinforcement learning model can be understood as any model that can obtain the initial scheduling strategy of the target energy system based on the current operating state data.

The initial scheduling strategy can be understood as a strategy for adjusting the current operating state data of the target energy system. For example, in the case that the target energy system is an electric power system, the initial scheduling strategy may be a strategy for adjusting the current operating state data of the electric power system.

Following the above example, the initial decision object is a deep reinforcement learning model. Based on this, after the current operating state data of the power system is obtained, it is input into the deep reinforcement learning model, which judges the current state of the power system from that data and generates a scheduling strategy accordingly. For example, when the deep reinforcement learning model judges from the current operating state data that the current power load of the power system is high, it can determine a scheduling strategy that can cope with the high load. The scheduling strategy output by the deep reinforcement learning model may be as follows: turn on generator A in the power system and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 90 MW.
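The input/output contract of the initial decision object in this example can be sketched with a hand-written rule standing in for the trained deep reinforcement learning model. The function name and the 200 MW threshold for "high load" are hypothetical; only the output set-points for the high-load case come from the example above.

```python
from typing import Dict


def toy_initial_decision(
    load_mw: float, high_load_threshold_mw: float = 200.0
) -> Dict[str, float]:
    """Rule-based stand-in for the model: map the load to generator set-points in MW."""
    if load_mw > high_load_threshold_mw:
        # High load: turn on generator A at its minimum quota, as in the example.
        return {"A": 50.0, "B": 100.0, "C": 90.0}
    # Otherwise keep generator A off (assumed behaviour, not from the specification).
    return {"A": 0.0, "B": 100.0, "C": 90.0}


high_load_strategy = toy_initial_decision(260.0)
low_load_strategy = toy_initial_decision(150.0)
```

A trained model would replace the `if` rule with a learned mapping, but the shape of the call stays the same: operating state in, scheduling strategy out.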

In an embodiment of this specification, the initial decision object may also include two sub-modules, namely a policy determination module and a policy evaluation module. The current operating state data of the target energy system is processed through these two sub-modules to obtain the initial scheduling strategy of the target energy system, which improves the processing efficiency of the subsequent target decision object and further gives the target energy system large-scale processing and rapid response capabilities. The specific implementation is as follows.

Inputting the current operating state data into the initial decision object to obtain the initial scheduling strategy of the target energy system includes steps 1042 to 1046:

Step 1042: Input the current running state data into the policy determination module of the initial decision object to obtain the scheduling policy to be processed.

Wherein, the policy determination module can be understood as a module in the initial decision object that can process the current running state data and obtain the scheduling policy to be processed. In the case that the initial decision object is a deep reinforcement learning model, the policy determination module may be a policy determination sub-model in the deep reinforcement learning model.

Specifically, the current operating state data of the target energy system is input into the policy determination module of the initial decision object, and the current operating state data is processed by the policy determination module to obtain the pending scheduling strategy of the target energy system.

Following the above example, the policy determination module is a policy determination sub-model in the deep reinforcement learning model. Based on this, after the current operating state data of the power system is obtained, it is input into the policy determination sub-model of the deep reinforcement learning model. When the policy determination sub-model determines that the current power load of the power system is relatively high, it determines a scheduling strategy that can cope with that load. The scheduling strategy output by the policy determination sub-model may be as follows: turn on generator A in the power system and set it to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 100 MW.

Step 1044: Process the scheduling policy to be processed based on the policy evaluation module of the initial decision object, and obtain a policy evaluation result corresponding to the scheduling policy to be processed.

Wherein, the policy evaluation module can be understood as a module in the initial decision object that can evaluate a scheduling strategy. In the case that the initial decision object is a deep reinforcement learning model, the policy evaluation module can be a policy evaluation sub-model in the deep reinforcement learning model.

The policy evaluation result can be understood as the result obtained when the policy evaluation module evaluates the scheduling policy to be processed; in practical applications, the policy evaluation result may be an evaluation score, for example 1 point or 0.9 points. In the case that the policy evaluation module is a policy evaluation sub-model, the scheduling policy to be processed can be input into the policy evaluation sub-model to obtain an evaluation result corresponding to the scheduling policy to be processed.

Specifically, after the pending scheduling policy is obtained through the policy determination module of the initial decision object, the pending scheduling policy can be evaluated by the policy evaluation module in the initial decision object, thereby obtaining the policy evaluation result corresponding to the pending scheduling policy.

In practical applications, in order to improve the processing efficiency of the target decision object, so that the target energy system has large-scale processing and rapid response capabilities, the data processing method provided by the embodiments of this specification fixes specific variables in the pending scheduling policy, that is, it sets those variables to be non-adjustable. The subsequent target decision object then only needs to process the variables that remain adjustable in the initial scheduling strategy, which improves its processing efficiency. Here, a specific variable can be understood as a variable that has a relatively large negative impact on the effect of the scheduling strategy.

Based on this, after the pending scheduling policy is obtained through the policy determination sub-model, the variables in the scheduling policy can also be adjusted, so as to obtain the pending scheduling policy after variable adjustment.

Moreover, after the variables in the pending scheduling policy are adjusted, the pending scheduling policy and the pending scheduling policy after variable adjustment can each be evaluated through the policy evaluation sub-model. Based on the evaluation result of the pending scheduling policy and the evaluation result of the pending scheduling policy after variable adjustment, the variable that has a relatively large negative impact on the effect of the scheduling strategy is identified in the pending scheduling policy. That variable is then set as non-adjustable, which improves the processing efficiency of the subsequent target decision object and avoids the scheduling strategy being degraded by the target decision object modifying that variable.
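One possible reading of this procedure is a per-variable sensitivity check: perturb one variable at a time, compare the evaluation score before and after, and mark a variable as non-adjustable when changing it lowers the score by more than a threshold. The sketch below follows that reading. The function names, the toy scoring function, the 250 MW load, the 10 MW step, and the 0.1 threshold are all assumptions for illustration and are not taken from this specification.

```python
from typing import Callable, Dict, Set

Strategy = Dict[str, float]  # hypothetical: generator set-points in MW


def find_fixed_variables(
    strategy: Strategy,
    evaluate: Callable[[Strategy], float],
    step_mw: float,
    drop_threshold: float,
) -> Set[str]:
    """Return the variables whose adjustment lowers the score by more than the threshold."""
    base_score = evaluate(strategy)
    fixed = set()
    for name in strategy:
        adjusted = dict(strategy)
        adjusted[name] = strategy[name] - step_mw  # perturb one variable only
        drop = base_score - evaluate(adjusted)
        if drop > drop_threshold:
            fixed.add(name)  # changing this variable hurts the strategy: freeze it
    return fixed


def toy_evaluate(strategy: Strategy) -> float:
    # Hypothetical scorer standing in for the policy evaluation sub-model:
    # the score falls with the mismatch against an assumed 250 MW load, and
    # falls sharply if generator A runs below its 50 MW minimum quota.
    load_mw = 250.0
    mismatch = abs(sum(strategy.values()) - load_mw)
    score = 1.0 - mismatch / load_mw
    if strategy["A"] < 50.0:
        score -= 0.5
    return score


pending = {"A": 50.0, "B": 100.0, "C": 100.0}
fixed = find_fixed_variables(pending, toy_evaluate, step_mw=10.0, drop_threshold=0.1)
```

With these toy numbers only generator A's set-point is frozen, so a downstream optimizer would be left with the set-points of B and C as its free variables.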

Wherein, the manner of adjusting the variables in the pending scheduling strategy and evaluating the pending scheduling strategy and the pending scheduling strategy after variable adjustment is as follows.

Processing the scheduling policy to be processed based on the policy evaluation module of the initial decision object, and obtaining a policy evaluation result corresponding to the scheduling policy to be processed, includes:

modifying the decision parameters in the scheduling strategy to be processed based on preset parameter adjustment rules to obtain the adjusted scheduling strategy to be processed;

Inputting the scheduling policy to be processed and the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object to obtain a policy evaluation result of the scheduling policy to be processed.

Wherein, the decision parameter can be understood as a variable that can be adjusted in the pending scheduling policy; for example, the decision parameter can be the power generation (50 MW) set for generator A in the power system in the scheduling strategy.

The preset parameter modification rule can be understood as a rule for modifying the decision parameters in the pending scheduling policy. For example, when the decision parameter is the power generation (50 MW) set for generator A, the preset parameter modification rule can be to reduce the power generation of generator A in the scheduling strategy by 10 MW.
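The example rule (reduce generator A's power generation by 10 MW) can be sketched as a small function that returns an adjusted copy of the strategy and leaves the original untouched, so that both versions remain available for evaluation. The function name and the dictionary representation are hypothetical.

```python
from typing import Dict


def apply_adjustment_rule(
    strategy_mw: Dict[str, float], generator: str, delta_mw: float
) -> Dict[str, float]:
    """Return a copy of the strategy with one generator's set-point shifted by delta_mw."""
    adjusted = dict(strategy_mw)  # copy, so the pending strategy is preserved
    adjusted[generator] = adjusted[generator] + delta_mw
    return adjusted


pending = {"A": 50.0, "B": 100.0, "C": 100.0}
# Example rule from the text: reduce generator A's power generation by 10 MW.
adjusted = apply_adjustment_rule(pending, "A", -10.0)
```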

Specifically, after the pending scheduling policy is obtained through the policy determination module of the initial decision object, the decision parameters in the pending scheduling policy can be identified and then adjusted based on the preset parameter modification rule, so as to obtain the adjusted pending scheduling policy.

The pending scheduling policy and the adjusted pending scheduling policy are then input into the policy evaluation module of the initial decision object, which evaluates both, thereby obtaining the policy evaluation result of the pending scheduling policy.

Following the above example, the decision parameter is a variable that can be adjusted in the pending scheduling policy, and the pending scheduling policy may be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 100 MW. Based on this, after the policy determination sub-model obtains the pending scheduling policy from the current operating state data of the power system, the adjustable variable in the pending scheduling policy can be determined; this variable may be the power generation of a generator.

The variable is then modified based on the preset rule, that is, the power generation of a generator in the dispatching strategy to be processed is adjusted, so as to obtain the dispatching strategy to be processed after variable adjustment. The adjusted strategy may be: generator A in the power system is set to generate its minimum quota (50 MW), generator B is set to generate 100 MW, and generator C is set to generate 80 MW.
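The adjustment step can be sketched as follows. This is only an illustration under stated assumptions: the strategy is modeled as a mapping from generator name to power generation in MW, and the names `adjust_strategy` and `reduce_c_by_20` are hypothetical, not taken from the specification. The rule reproduces the example in which generator C is lowered from 100 MW to 80 MW.

```python
from typing import Callable, Dict

# Assumption: a scheduling strategy is a mapping generator name -> power generation in MW.
Strategy = Dict[str, float]


def adjust_strategy(strategy: Strategy, rule: Callable[[Strategy], Strategy]) -> Strategy:
    """Apply a modification rule to a copy, leaving the pending strategy untouched."""
    return rule(dict(strategy))


def reduce_c_by_20(strategy: Strategy) -> Strategy:
    # Hypothetical preset parameter modification rule: lower generator C by 20 MW.
    strategy["C"] -= 20.0
    return strategy


pending = {"A": 50.0, "B": 100.0, "C": 100.0}
adjusted = adjust_strategy(pending, reduce_c_by_20)
print(adjusted)  # {'A': 50.0, 'B': 100.0, 'C': 80.0}
```

Working on a copy keeps the pending strategy available, which matters because both the pending and the adjusted strategies are evaluated afterwards.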

After the variables in the scheduling strategy to be processed have been adjusted, the scheduling strategy to be processed and the scheduling strategy to be processed after variable adjustment are each input into the decision evaluation sub-module for evaluation, so as to obtain the decision evaluation sub-module's policy evaluation result for the pending scheduling policy.

In the embodiment provided in this specification, the adjusted scheduling strategy to be processed is obtained by adjusting the decision parameters in the scheduling strategy to be processed, and the scheduling strategy to be processed and the adjusted scheduling strategy to be processed are input into the strategy evaluation module of the initial decision object to obtain the strategy evaluation result of the scheduling strategy to be processed. This makes it convenient to subsequently generate an initial scheduling strategy based on the strategy evaluation result, further improves the processing efficiency of the target decision object, and enables the target energy system to have large-scale processing and rapid response capabilities.

Further, the inputting the pending scheduling policy and the adjusted pending scheduling policy into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the pending scheduling policy includes:

inputting the scheduling policy to be processed into the policy evaluation module of the initial decision object, and obtaining a first evaluation result of the scheduling policy to be processed;

inputting the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object, and obtaining a second evaluation result of the adjusted scheduling policy to be processed;

determining a policy evaluation result of the scheduling policy to be processed based on the first evaluation result and the second evaluation result.

Here, the first evaluation result can be understood as the policy evaluation module's evaluation result for the scheduling policy to be processed. The second evaluation result can be understood as the policy evaluation module's evaluation result for the adjusted scheduling strategy to be processed. In practical applications, an evaluation result may be an evaluation score, such as 1 point or 0.9 points.

Specifically, after the adjusted scheduling strategy to be processed is obtained, the scheduling strategy to be processed is input into the strategy evaluation module of the initial decision object and evaluated by that module to obtain the first evaluation result of the scheduling strategy to be processed. The adjusted scheduling strategy to be processed is likewise input into the strategy evaluation module of the initial decision object and evaluated by that module to obtain the second evaluation result of the adjusted scheduling strategy to be processed. The policy evaluation result of the scheduling policy to be processed is then determined based on the first evaluation result and the second evaluation result.

Following the above example, the scheduling strategy A to be processed after variable adjustment may be: set generator A in the power system to generate its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 80 MW. The scheduling strategy B to be processed after variable adjustment may be: set generator A in the power system to generate its minimum quota (50 MW), set the power generation of generator B to 70 MW, and set the power generation of generator C to 100 MW.

Based on this, the scheduling strategy to be processed, the scheduling strategy A to be processed after variable adjustment, and the scheduling strategy B to be processed after variable adjustment are each input into the decision evaluation sub-module for evaluation, so as to obtain their evaluation results. The evaluation result of the pending scheduling strategy may be 1 point, the evaluation result of the pending scheduling strategy A after variable adjustment may be 0.3 points, and the evaluation result of the pending scheduling strategy B after variable adjustment may be 0.9 points. Afterwards, the three evaluation results are used together as the strategy evaluation result corresponding to the scheduling strategy to be processed.
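A minimal sketch of how the evaluation results might be collected is given below. The policy evaluation module is replaced by a stand-in lookup table that simply returns the scores quoted in the example; in the specification it is a trained module, so `stub_evaluate`, `SCORES`, and `collect_evaluations` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict

Strategy = Dict[str, float]  # assumption: generator name -> power generation in MW


def collect_evaluations(
    evaluate: Callable[[Strategy], float],
    pending: Strategy,
    adjusted: Dict[str, Strategy],
) -> Dict[str, float]:
    """Score the pending strategy and each adjusted variant with the same evaluator."""
    results = {"pending": evaluate(pending)}
    for name, strategy in adjusted.items():
        results[name] = evaluate(strategy)
    return results


# Stand-in for the policy evaluation module: a table holding the example scores.
SCORES = {
    (50.0, 100.0, 100.0): 1.0,
    (50.0, 100.0, 80.0): 0.3,
    (50.0, 70.0, 100.0): 0.9,
}


def stub_evaluate(strategy: Strategy) -> float:
    return SCORES[(strategy["A"], strategy["B"], strategy["C"])]


pending = {"A": 50.0, "B": 100.0, "C": 100.0}
adjusted = {
    "adjusted_A": {"A": 50.0, "B": 100.0, "C": 80.0},
    "adjusted_B": {"A": 50.0, "B": 70.0, "C": 100.0},
}
results = collect_evaluations(stub_evaluate, pending, adjusted)
print(results)  # {'pending': 1.0, 'adjusted_A': 0.3, 'adjusted_B': 0.9}
```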

In the embodiment provided in this specification, the policy evaluation result of the scheduling policy to be processed is determined based on the first evaluation result of the scheduling policy to be processed and the second evaluation result of the adjusted scheduling policy to be processed. This makes it convenient to generate an initial scheduling strategy based on the policy evaluation result, further improves the processing efficiency of the target decision object, and enables the target energy system to have large-scale processing and rapid response capabilities.

Step 1046: Determine an initial scheduling strategy based on the scheduling strategy to be processed and the corresponding strategy evaluation result.

Specifically, the determining an initial scheduling strategy based on the scheduling strategy to be processed and the corresponding strategy evaluation result includes:

determining a first parameter in the pending scheduling policy based on the policy evaluation result;

determining a second parameter in the scheduling policy to be processed based on preset determination conditions;

Setting the first parameter and the second parameter in the pending scheduling strategy as fixed parameters to obtain an initial scheduling strategy.

Here, the first parameter can be understood as a decision variable that affects long-term effects. The preset determination condition can be set according to the actual application scenario; for example, the preset determination condition may be to identify discrete decision variables. The second parameter can be understood as a discrete decision variable determined from the scheduling strategy to be processed based on the preset determination condition. Fixed parameters can be understood as parameters that cannot be modified or adjusted.

Following the above example, the policy evaluation result includes the evaluation result of the scheduling policy to be processed, which may be 1 point; the evaluation result of the scheduling policy A to be processed after variable adjustment, which may be 0.3 points; and the evaluation result of the scheduling policy B to be processed after variable adjustment, which may be 0.9 points. Based on this policy evaluation result, the power generation of generator C (100 MW) in the dispatching strategy to be processed is determined to be the decision variable affecting the long-term effect, since adjusting it lowers the score the most. In addition, based on the preset determination condition, the power generation (50 MW) of generator A in the action At is determined to be a discrete decision variable.

Afterwards, the decision variables and discrete decision variables that affect the long-term effect in the pending scheduling strategy are set as fixed parameters, that is, decision variables that cannot be modified and adjusted, so as to obtain the initial scheduling strategy.

In the embodiment of this specification, the initial scheduling strategy is obtained by setting, as fixed parameters, the first parameter determined based on the strategy evaluation result and the second parameter determined based on the preset determination condition in the scheduling strategy to be processed. This reduces the workload of the subsequent target decision object and increases its processing speed.
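The specification does not state how the evaluation scores are turned into the first parameter. One plausible reading, consistent with the later example in which the power generation of generators A and C is fixed, is to fix every variable whose adjustment lowers the score by at least some threshold. The sketch below assumes that reading; the function name `select_fixed_parameters` and the threshold value of 0.5 are hypothetical.

```python
from typing import Dict, Iterable, Set


def select_fixed_parameters(
    base_score: float,
    perturbed_scores: Dict[str, float],
    discrete: Iterable[str],
    drop_threshold: float = 0.5,  # assumed value, not from the specification
) -> Set[str]:
    """Return the names of the parameters to fix in the initial scheduling strategy.

    perturbed_scores maps a parameter name to the score obtained after
    adjusting only that parameter.
    """
    # First parameter(s): variables whose adjustment causes a large score drop.
    first = {
        name
        for name, score in perturbed_scores.items()
        if base_score - score >= drop_threshold
    }
    # Second parameter(s): variables selected by the preset determination condition.
    return first | set(discrete)


fixed = select_fixed_parameters(
    base_score=1.0,
    perturbed_scores={"C": 0.3, "B": 0.9},  # strategy A changed C, strategy B changed B
    discrete=["A"],  # generator A's minimum-quota setting treated as discrete
)
print(sorted(fixed))  # ['A', 'C']
```

Generator B is left unfixed, so it remains available for adjustment by the target decision object.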

Step 106: Input the initial dispatch strategy into the target decision object to obtain the target dispatch strategy of the target energy system.

In practical applications, the initial decision object can very quickly generate a scheduling strategy based on the current operating state data of the target energy system in order to adjust that data. However, the initial decision object may not be good at handling problems with hard constraints. The hard constraints can be designed according to the actual application scenario; for example, when the target energy system is a power system, a hard constraint may be that the lines in the power system must not exceed their limits, or that the voltage in the grid must not exceed a preset voltage threshold.

Based on this, in the data processing method provided in this specification, after the initial decision object generates an initial scheduling strategy based on the current operating state data, the initial scheduling strategy is input into the target decision object and adjusted by the target decision object, so as to avoid the situation in which the scheduling strategy causes the target energy system to violate the hard constraints.

Here, the target decision object can be understood as an object that can obtain the target scheduling strategy of the target energy system based on the initial scheduling strategy. In practical applications, the target decision object may be a mathematical model, an electronic device, an application program, and so on. For ease of understanding, the data processing method provided in this specification is described below by taking the case in which the target decision object is a mathematical model as an example.

Following the above example, the specific variables (discrete decision variables and decision variables affecting long-term effects) in the initial scheduling strategy have already been fixed. Based on this, the initial scheduling strategy is input into the mathematical model, and the unfixed decision variables in the initial scheduling strategy are adjusted and modified by the mathematical model, which performs further single-step optimization and guarantees the physical safety constraints, so as to obtain the target scheduling strategy.

For example, the initial scheduling strategy may be: generator A is turned on and set to generate its minimum quota (50 MW), the power generation of generator B is set to 100 MW, and the power generation of generator C is set to 100 MW. Among these, the power generation of generator A (50 MW) and the power generation of generator C (100 MW) have been fixed. Based on this, the initial dispatch strategy is input into the mathematical model, and the power generation of generator B (100 MW) in the initial dispatch strategy is adjusted by the mathematical model, so as to obtain the target dispatch strategy. The target scheduling strategy may be: generator A is turned on and set to generate its minimum quota (50 MW), the power generation of generator B is set to 80 MW, and the power generation of generator C is set to 100 MW.

In this way, the single-step optimization of the initial action strategy and the guarantee of physical safety constraints can be realized, and the problem of potential safety hazards in the decision-making results of reinforcement learning can be avoided. At the same time, the processing speed of the mathematical model is further improved because the results of some decision variables have been fixed.
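The role of the mathematical model can be illustrated with a deliberately small stand-in. The sketch below is not an optimal power flow solver such as the ACOPF or DCOPF model named later in the specification; it only enforces one assumed hard constraint, a power balance against a hypothetical demand of 230 MW, plus assumed output limits for the single unfixed generator. The function name and all numeric limits are illustrative.

```python
from typing import Dict


def solve_free_generation(
    fixed: Dict[str, float],
    free_name: str,
    demand_mw: float,
    min_mw: float,
    max_mw: float,
) -> Dict[str, float]:
    """Set the one unfixed generator so that total generation equals demand.

    Raises ValueError when the required output violates the generator's limits,
    i.e. when the hard constraints cannot be satisfied.
    """
    required = demand_mw - sum(fixed.values())
    if not (min_mw <= required <= max_mw):
        raise ValueError("hard constraint violated: no feasible output for " + free_name)
    target = dict(fixed)
    target[free_name] = required
    return target


# Generators A and C are fixed by the initial decision object; B is free.
fixed = {"A": 50.0, "C": 100.0}
target = solve_free_generation(fixed, "B", demand_mw=230.0, min_mw=0.0, max_mw=120.0)
print(target)  # {'A': 50.0, 'C': 100.0, 'B': 80.0}
```

Because two of the three variables are already fixed, the remaining problem is one-dimensional, which illustrates why fixing variables in advance speeds up the mathematical model.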

Step 108: Adjust the current operating state data of the target energy system based on the target dispatch strategy.

Following the above example, the target scheduling strategy is: generator A is turned on and set to generate its minimum quota (50 MW), the power generation of generator B is set to 80 MW, and the power generation of generator C is set to 100 MW. Based on the target scheduling strategy, generator A in the power system is turned on and set to generate its minimum quota (50 MW), the power generation of generator B is set to 80 MW, and the power generation of generator C is set to 100 MW.

In the data processing method provided in this specification, the current operating state data of the target energy system is processed through the initial decision object and the target decision object to obtain the target scheduling strategy of the target energy system, and the current operating state data of the target energy system is quickly adjusted through the target scheduling strategy. This enables the target energy system to have large-scale processing and rapid response capabilities, alleviating the difficulties faced by manual scheduling schemes.

In an embodiment provided in this specification, after the target dispatch strategy for the power system is obtained, the current target dispatch strategy and the current operating state data of the power system can be saved. When operating state data corresponding to the saved operating state data is subsequently obtained, the saved target scheduling strategy is used as an initial value, so that a new scheduling strategy can be generated quickly from that initial value, thereby improving the efficiency of generating the scheduling strategy. Based on this, in the process of determining the dispatch strategy for the power system with the deep reinforcement learning model, the deep reinforcement learning model can refer to the historically saved dispatch strategy and obtain the dispatch strategy for the power system from the historically saved dispatch strategy and the current operating state data of the power system.

In addition, after the current operating state data of the power system is input into the deep reinforcement learning model and the dispatching strategy for the power system is obtained, the dispatching strategy can also be sent to a policy update object. The policy update object modifies the dispatching strategy based on the current demand on the power system, and the modified dispatching strategy is input into the mathematical model to generate the target dispatching strategy, so as to improve how well the dispatching strategy fits the power system. The specific implementation is as follows.

The inputting the current operating state data into the initial decision-making object to obtain the initial scheduling strategy of the target energy system includes:

determining the similarity between the historical operating state data and the current operating state data, and determining the historical operating state data corresponding to the maximum similarity among the similarities as similar operating state data to the current operating state data;

Obtaining a historical scheduling policy corresponding to the similar operating status data, wherein the historical scheduling policy is a historical target scheduling policy obtained based on a target decision object;

Inputting the current operating state data and the historical scheduling strategy into the initial decision object to obtain the scheduling strategy to be updated of the target energy system;

Send the to-be-updated scheduling policy of the target energy system to the policy update object;

An initial scheduling policy sent by the policy update object is received, wherein the initial scheduling policy is obtained by updating the scheduling policy to be updated based on preset update conditions by the policy update object.

Here, the historical operating state data can be understood as operating state data of the target energy system that was acquired and saved in the past. The historical scheduling strategy can be understood as a target scheduling strategy that was saved in the past. The similarity can be understood as a numerical value indicating how similar the historical operating state data is to the current operating state data, for example any value in the interval [0, 1] or [0, 100]. Similar operating state data can be understood as the historical operating state data that is most similar to the current operating state data. The scheduling strategy to be updated can be understood as a scheduling strategy that needs to be updated by the policy update object. The policy update object can be understood as an object that can update the scheduling strategy to be updated based on preset update conditions; for example, it may be operation and maintenance personnel of the power system, an operation and maintenance robot of the power system, a neural network model, or a program. The preset update condition can be understood as a demand on the power system, for example the power supply load demand of the power system, the power load demand of the power system, the voltage demand of the transmission lines of the power system, or the on/off requirements of specific equipment in the power system. In actual application, this demand can be set according to the actual application scenario, and this specification places no specific restriction on it.

Specifically, after the current operating state data of the target energy system is obtained, the historically saved historical operating state data can be obtained and the similarity between the historical operating state data and the current operating state data can be determined. There may be one or more pieces of historical operating state data. In practical applications, the similarity can be computed by tools such as neural network models, programs, or robots.

From the similarities between each piece of historical operating state data and the current operating state data, the maximum similarity is determined, and the historical operating state data corresponding to the maximum similarity is used as the similar operating state data of the current operating state data.
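The lookup of similar operating state data can be sketched as follows. The specification leaves the similarity measure open, so cosine similarity over a numeric state vector is used here purely as an assumption; for non-negative features it yields a value in [0, 1]. The names `cosine_similarity` and `most_similar`, the feature vectors, and the strategy labels are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two state vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    current: Sequence[float],
    history: List[Tuple[Sequence[float], str]],
) -> Tuple[Sequence[float], str]:
    """Return the (historical state, saved strategy) record with maximum similarity."""
    return max(history, key=lambda record: cosine_similarity(current, record[0]))


# Illustrative, normalized state vectors paired with labels for saved strategies.
history = [
    ([0.1, 0.9, 0.2], "historical strategy 1"),
    ([0.7, 0.3, 0.6], "historical strategy 2"),
]
current = [0.8, 0.3, 0.5]
similar_state, historical_strategy = most_similar(current, history)
print(historical_strategy)  # historical strategy 2
```

Using `max` with a key function gives the same result as sorting the similarities in descending order and taking the first entry, without the cost of a full sort.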

The historical dispatching strategy corresponding to the similar operating state data is then determined, and the historical dispatching strategy and the current operating state data of the target energy system are input into the initial decision object, so as to obtain the dispatching strategy to be updated of the target energy system.

Then send the scheduling policy to be updated to the policy update object, and receive the initial scheduling policy returned by the policy update object. Wherein, the policy update object can update the scheduling policy to be updated based on preset update conditions, obtain an updated scheduling policy, and use the updated scheduling policy as an initial scheduling policy.
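As a rough illustration of an automated policy update object, the sketch below applies one assumed preset update condition: total generation must meet a minimum power supply load. The function name `update_strategy`, the choice of which generator absorbs the shortfall, and the 250 MW figure are all hypothetical; in the specification the policy update object may equally be operation and maintenance personnel.

```python
from typing import Dict


def update_strategy(
    to_update: Dict[str, float],
    min_total_mw: float,
    adjustable: str,
) -> Dict[str, float]:
    """Raise one adjustable generator so that total output meets the demand floor."""
    updated = dict(to_update)  # keep the strategy to be updated unchanged
    shortfall = min_total_mw - sum(updated.values())
    if shortfall > 0:
        updated[adjustable] += shortfall
    return updated


to_update = {"A": 50.0, "B": 80.0, "C": 100.0}
initial_strategy = update_strategy(to_update, min_total_mw=250.0, adjustable="B")
print(initial_strategy)  # {'A': 50.0, 'B': 100.0, 'C': 100.0}
```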

Following the above example, after the current operating state data of the power system is obtained, the historically saved historical operating state data of the power system can be obtained, and the similarity between the historical operating state data and the current operating state data can be determined. The similarities are sorted in descending order, the maximum similarity is determined from the sorting result, and the historical operating state data corresponding to the maximum similarity is taken as the historical operating state data with the highest similarity to the current operating state data, that is, the similar operating state data.

From the saved historical scheduling policies, the historical scheduling policy corresponding to the similar operating state data is determined. Here, the historical scheduling strategy is a scheduling strategy generated by the deep reinforcement learning model and the mathematical model based on the similar operating state data.

The historical scheduling strategy is used as the reference value of the deep reinforcement learning model, and the scheduling strategy for the power system is obtained by inputting the historical scheduling strategy and the current operating state data of the power system into the deep reinforcement learning model.

In addition, after the scheduling strategy for the power system is obtained, in order to match the scheduling strategy with the current demand of the power system, the scheduling strategy needs to be sent to the operation and maintenance personnel of the power system. The operation and maintenance personnel modify the scheduling strategy according to the current demand of the power system (for example, the power supply load demand) to obtain the modified scheduling strategy, and the modified scheduling strategy is sent to the decision-making scheduling module as a scheduling strategy that needs to be processed again by the mathematical model.

The decision-making scheduling module can receive the scheduling strategy returned by the operation and maintenance personnel of the power system, and the scheduling strategy can then be input into the mathematical model for processing.

In the embodiment of this specification, the historical operating state data corresponding to the maximum similarity between the historical operating state data and the current operating state data is determined as the similar operating state data of the current operating state data, and the historical scheduling strategy corresponding to the similar operating state data is input into the initial decision object together with the current operating state data. In this way the scheduling strategy to be updated of the target energy system is obtained quickly, which improves the efficiency of generating the scheduling strategy.

Moreover, by sending the scheduling strategy to be updated of the target energy system to the policy update object, and receiving the initial scheduling strategy that the policy update object obtains by updating the scheduling strategy to be updated based on the preset update conditions, the subsequently generated scheduling strategy can be made to fit the power system better.

In the embodiments provided in this specification, before the target dispatch strategy is generated based on the initial decision object and the target decision object, the initial decision object needs to be generated, so that the target scheduling strategy of the target energy system can be obtained based on the initial decision object and the target decision object. The specific implementation is as follows.

Before the acquisition of the current operating state data of the target energy system, steps 1 to 3 are also included:

Step 1: Determine the simulated running state data based on the state simulation module.

Here, the state simulation module can be understood as a module capable of simulating the operating state data of the target energy system; for example, the state simulation module may be a simulator. Correspondingly, the simulated operating state data is the operating state data simulated by the state simulation module. For example, in the case where the target energy system is a power system, the simulated operating state data may be the operating state data of the power system simulated by the simulator. In the case where the target energy system is an oil extraction system, the simulated operating state data may be the operating state data of the oil extraction system simulated by a simulator.

In addition, when the initial decision object is a deep reinforcement learning model, the process of obtaining the initial decision object can be understood as the process of training the deep reinforcement learning model, so as to obtain the trained deep reinforcement learning model. Based on this, the determining of the simulated running state data based on the state simulation module includes:

Sample operating state data is determined based on the state simulation module.

Specifically, the simulated running state data of the state simulation module is used as sample running state data for training the model.

Step 2: Input the simulation operation state data into the decision object to be processed to obtain a simulation scheduling strategy.

Among them, the decision object to be processed can be understood as a deep reinforcement learning model to be trained. Correspondingly, the simulation scheduling strategy can be understood as the scheduling strategy obtained after the deep reinforcement learning model calculates the simulation running status data. In an embodiment of the present specification, the decision object to be processed may include a strategy determination module. Based on this, the simulation scheduling strategy can be understood as a scheduling strategy generated by the strategy determination module based on the simulated running status data.

Specifically, the input of the simulated running status data into the decision object to be processed to obtain a simulated scheduling strategy includes:

The sample running status data is input into the decision-making model to be processed to obtain a simulation scheduling strategy.

Following the above example, in the process of training the deep reinforcement learning model, it is necessary to use the operating state data of the power system simulated by the simulator as sample data, and input the sample data into the deep reinforcement learning model to be trained for calculation. Thus, the scheduling policy output by the deep reinforcement learning model is obtained.

In practical applications, the deep reinforcement learning model includes a policy determination sub-model and a policy evaluation sub-model. Based on this, inputting the simulated operation status data into the decision object to be processed to obtain the simulation scheduling strategy can also be understood as inputting the simulated running status data into the strategy determination sub-model of the decision object to be processed, which computes the simulation scheduling strategy from that data.

Step 3: Process the decision object to be processed based on the simulated scheduling strategy to obtain an initial decision object.

Wherein, in the case that the initial decision object is a deep reinforcement learning model, the initial decision object can be understood as a trained deep reinforcement learning model. Based on this, the processing of the decision object to be processed based on the simulation scheduling strategy to obtain an initial decision object includes:

The decision model to be processed is trained based on the simulated scheduling strategy to obtain an initial decision model.

Following the above example, after the simulation scheduling policy is obtained through the deep reinforcement learning model to be trained, the deep reinforcement learning model is trained through the simulation scheduling policy until the training completion condition is met.

Specifically, the processing of the decision object to be processed based on the simulation scheduling strategy to obtain an initial decision object includes:

Evaluating the simulation scheduling strategy based on the state simulation module to obtain a simulation strategy evaluation result;

Based on the evaluation result of the simulation policy and the simulation scheduling policy, the decision object to be processed is processed to obtain an initial decision object.

Wherein, the simulation strategy evaluation result can be understood as the evaluation result of the simulation scheduling strategy by the state simulation module, for example, the simulation strategy evaluation result can be an evaluation score.

Following the above example, after the deep reinforcement learning model to be trained outputs the simulated scheduling policy, the simulated scheduling policy is input into the simulator. The simulated scheduling strategy can be evaluated through the simulator, so as to obtain the simulated strategy evaluation result of the simulated scheduling strategy. Wherein, the evaluation result of the simulation strategy is the evaluation score.

In practical applications, after the simulation policy evaluation result is obtained, the simulator continues to generate new simulated running status data, and the deep reinforcement learning model continues to be trained with that data. This operation is repeated until the deep reinforcement learning model reaches the training stop condition.

Here, the training stop condition can be determined based on the simulation strategy evaluation result; when it is determined that the simulation strategy evaluation result satisfies the training stop condition, it can be determined that the training of the deep reinforcement learning model has been completed. For example, the evaluation result of the simulated strategy can be any score in the interval [0,1], where 0 means that the effect of the simulated scheduling strategy is poor and 1 means that the effect of the simulated scheduling strategy is good. Based on this, if the simulator's evaluation results for the most recent 10 consecutive simulated scheduling strategies are all 1 point, the deep reinforcement learning model has reached the training stop condition.
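The stop condition in this example can be sketched with a fixed-size window over the most recent scores. The class name `StopCondition` is hypothetical, and the score sequence below is made up solely to exercise the check.

```python
from collections import deque


class StopCondition:
    """Signals a stop once the last `window` evaluation scores all reach `target`."""

    def __init__(self, window: int = 10, target: float = 1.0) -> None:
        # A deque with maxlen discards the oldest score automatically.
        self.scores = deque(maxlen=window)
        self.target = target

    def update(self, score: float) -> bool:
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and all(s >= self.target for s in self.scores)


# Illustrative scores: two imperfect evaluations followed by ten perfect ones.
scores = [0.4, 0.8] + [1.0] * 10
stop = StopCondition()
stopped_at = None
for step, score in enumerate(scores, start=1):
    if stop.update(score):
        stopped_at = step
        break
print(stopped_at)  # 12
```

Training stops only after the twelfth evaluation, because the window still contains the 0.8 score until ten consecutive scores of 1 have been recorded.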

In practical applications, the deep reinforcement learning model can include two sub-models: one is the policy determination sub-model, and the other is the policy evaluation sub-model. Therefore, the training process of the deep reinforcement learning model can be understood as the process of training the policy determination sub-model and the policy evaluation sub-model, and the trained deep reinforcement learning model is determined from the trained policy determination sub-model and the trained policy evaluation sub-model. The specific implementation is as follows.

The processing of the decision object to be processed based on the evaluation result of the simulation strategy and the simulation scheduling strategy to obtain an initial decision object includes:

Processing the policy determination module in the pending decision object based on the simulated policy evaluation result to obtain the processed policy determination module;

Based on the simulated policy evaluation result and the simulated scheduling policy, process the policy evaluation module in the pending decision object to obtain the processed policy evaluation module;

An initial decision object is determined based on the processed policy determination module and the processed policy evaluation module.

Wherein, when the decision object to be processed is a deep reinforcement learning model, the policy determination module may be a policy determination sub-model, and the policy evaluation module may be a policy evaluation sub-model.

Following the above example, after the policy determination sub-model outputs the simulation scheduling policy, it inputs the simulation scheduling policy into the simulator. The simulated scheduling policy can be evaluated through the simulator, so as to obtain the policy evaluation result of the simulated scheduling policy.

In practical applications, after the strategy evaluation result of the simulated scheduling strategy is obtained, the simulator continues to generate new simulated operating state data, and the policy determination sub-model continues to be trained with the new simulated operating state data. This operation is repeated until the policy determination sub-model reaches the training stop condition.

Here, the training stop condition may be determined based on the strategy evaluation result; if it is determined that the strategy evaluation result satisfies the training stop condition, it may be determined that the policy determination sub-model has completed training. For example, the policy evaluation result can be any score in the interval [0,1], where 0 means that the effect of the simulated scheduling strategy is poor and 1 means that the effect of the simulated scheduling strategy is good. Based on this, if the simulator's evaluation results for the most recent 10 consecutive simulated scheduling strategies are all 1 point, the policy determination sub-model has reached the training stop condition.

In the training process of the policy evaluation sub-model, the simulated scheduling policy output by the policy determination sub-model can be used as sample data, and the simulated policy evaluation result of the simulated scheduling policy can be used as the sample label. The policy evaluation sub-model is trained with the sample data and sample labels until it converges, at which point it is determined that the policy evaluation sub-model has reached the training stop condition.
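The supervised training described above can be sketched as follows. This is only a minimal illustration, not the method of the embodiment: the linear model, the mean-squared-error loss, the learning rate, the convergence tolerance, and all sample values are assumptions made for the example. Each sample is a hypothetical simulated scheduling policy encoded as two normalized generator setpoints, and each label is a hypothetical simulator score in [0,1].

```python
def train_evaluator(samples, labels, lr=0.1, tol=1e-12, max_epochs=10_000):
    """Fit score ~ w.x + b by batch gradient descent on mean squared error.

    Training stops when the loss improves by less than `tol` between epochs,
    which is used here as a stand-in for "the sub-model converges".
    """
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_epochs):
        preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in samples]
        errors = [p - y for p, y in zip(preds, labels)]
        loss = sum(e * e for e in errors) / n
        if prev_loss - loss < tol:          # convergence check
            break
        prev_loss = loss
        # Gradient step for every weight, then for the bias.
        for j in range(dim):
            w[j] -= lr * 2 / n * sum(e * x[j] for e, x in zip(errors, samples))
        b -= lr * 2 / n * sum(errors)
    return w, b, loss


# Hypothetical sample data (policies) and sample labels (simulator scores).
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
labels = [0.1, 0.4, 0.7, 1.0]
w, b, loss = train_evaluator(samples, labels)
```

In this toy data the labels are an exact linear function of the samples, so the fitted evaluator reproduces the simulator scores closely; a real policy evaluation sub-model would be a neural network trained on far more samples.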

In the data processing method provided by the embodiment of this specification, the simulated operating state data determined by the state simulation module is input into the decision object to be processed to obtain the simulated scheduling strategy, and the decision object to be processed is processed based on the simulated scheduling strategy to obtain the initial decision object. In this way, the target scheduling strategy of the target energy system can be obtained based on the initial decision object and the target decision object, and the current operating state data of the target energy system can be quickly adjusted through the target scheduling strategy, so that the target energy system has large-scale processing and rapid response capabilities, alleviating the difficulties faced by manual scheduling schemes.

The data processing method is further described below with reference to FIG. 2, taking the application of the data processing method provided in this specification to the real-time dispatching scenario of an electric power system as an example. Wherein, FIG. 2 shows a flowchart of the processing procedure of a data processing method provided by an embodiment of this specification.

Referring to FIG. 2, the data processing method provided in this specification in the real-time dispatching scenario of the power system is divided into two parts: an offline training part and an online real-time/quasi-real-time dispatching part. The offline training part pre-trains the deep reinforcement learning (DRL) model used in the scheduling process before the real-time scheduling of the power system, so that the trained deep reinforcement learning model can subsequently generate scheduling strategies. In the online real-time/quasi-real-time dispatching part, after the training of the deep reinforcement learning model is completed, mathematical modeling and reinforcement learning are combined: the ACOPF model or DCOPF model obtained through mathematical modeling is combined with the deep reinforcement learning model to generate dispatch decision results for the power system based on the real environmental data of the power system.

Specifically, in the real-time dispatching scenario of the power system, the data processing method provided in this specification first uses the simulator to train the deep reinforcement learning model offline, as shown in the dotted box labeled "offline training" in FIG. 2. This specifically includes the following steps.

Step 202: Based on the sample data provided by the simulator, train the action decision model in the deep reinforcement learning model.

Wherein, the simulator can be understood as the state simulation module in the above embodiment; the sample data can be understood as the simulated operating state data in the above embodiment; the deep reinforcement learning model can be understood as the initial decision object in the above embodiment; and the action decision model can be understood as the policy determination module in the above embodiment.

The deep reinforcement learning model in this embodiment has two sub-models: an action policy model (Actor), which is also referred to in this embodiment as the action decision model, and an action evaluation model (Critic). The action policy model can be understood as the policy determination sub-model in the above embodiment; the action evaluation model can be understood as the policy evaluation sub-model in the above embodiment.

Specifically, the simulator simulates the current state St of the power system. The simulated current state St is used as a training sample and input into the action policy model to be trained, so as to train it. After receiving the current state St, the action policy model to be trained responds with an action strategy and outputs the action At. Wherein, the current state St can be understood as the simulated operating state data in the above embodiment, and the action At can be understood as the simulated scheduling policy in the above embodiment.

Step 204: Evaluate the action output by the action decision model through the simulator.

Specifically, after the action policy model to be trained outputs the action At, the action At is input into the simulator. The simulator evaluates the action At, so as to obtain the immediate return Rt (reward) of the action At. Wherein, the immediate return Rt can be understood as the simulated policy evaluation result in the above embodiment.

In practical applications, after the immediate return Rt of the action At is obtained, the simulator continues to generate a new current state St+1, the action policy model continues to be trained on the current state St+1, and this operation is repeated until the action policy model reaches the training stop condition.

Wherein, the training stop condition can be determined based on the immediate return Rt; when the immediate return Rt satisfies the training stop condition, it can be determined that the training of the action policy model has been completed. For example, the immediate return Rt can be any score in the interval [0,1]. Based on this, when the simulator evaluates the most recent 10 consecutive actions At, if all 10 receive a score of 1, the action policy model has reached the training stop condition.
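The interaction between the simulator and the action policy model in Steps 202 and 204 can be sketched as a loop. The sketch below is an illustration under loud assumptions: `ToySimulator`, `TablePolicy`, the two load states, and the trial-and-error table update are hypothetical stand-ins, not the deep reinforcement learning update used by the embodiment. Only the loop shape (St, At, Rt, St+1) and the stop condition of 10 consecutive scores of 1 come from the text.

```python
from collections import deque
from itertools import cycle


class ToySimulator:
    """Hypothetical simulator: emits load states and scores actions in [0, 1]."""

    def __init__(self):
        self._states = cycle(["low_load", "high_load"])
        self._best = {"low_load": 2, "high_load": 3}  # generators that should be on

    def next_state(self):
        return next(self._states)

    def evaluate(self, state, action):
        return 1.0 if action == self._best[state] else 0.0


class TablePolicy:
    """Stand-in for the Actor: a state -> action lookup table."""

    def __init__(self, actions=(2, 3)):
        self.actions = actions
        self.table = {}

    def act(self, state):
        return self.table.setdefault(state, self.actions[0])

    def update(self, state, action, reward):
        if reward < 1.0:  # poor result: try the next candidate action
            i = self.actions.index(action)
            self.table[state] = self.actions[(i + 1) % len(self.actions)]


def train(policy, simulator, window=10, max_steps=1000):
    """Run the S_t -> A_t -> R_t -> S_t+1 loop until `window` scores in a row are 1."""
    recent = deque(maxlen=window)
    state = simulator.next_state()                   # S_t
    for step in range(1, max_steps + 1):
        action = policy.act(state)                   # A_t
        reward = simulator.evaluate(state, action)   # R_t
        policy.update(state, action, reward)
        recent.append(reward)
        if len(recent) == window and all(r == 1.0 for r in recent):
            return step                              # training stop condition met
        state = simulator.next_state()               # S_t+1
    return max_steps


policy = TablePolicy()
steps = train(policy, ToySimulator())
```

In this toy run the policy makes one wrong choice for the high-load state, corrects it, and then needs ten consecutive perfect scores before the loop ends.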

Step 206: Train the action evaluation model based on the action output by the action decision model and the immediate return of the action.

Wherein, the action evaluation model can be understood as the policy evaluation sub-model in the above embodiment. The function of the action evaluation model is to evaluate the average return that can be obtained after taking the action At in the state St, which includes the current immediate return Rt and the possible average return in the future.

Specifically, the action At output by the action decision model is used as sample data and the immediate return of the action is used as the sample label. The action evaluation model is trained with the sample data and sample labels until it converges, at which point it is determined that the action evaluation model has reached the training stop condition.

In practical applications, the output of the action decision model (the action At) can be used as the input of the action evaluation model. Based on this, while the action evaluation model is being trained on the action At and the corresponding immediate return Rt, it can also be trained on the action At+1, which the action decision model obtains by processing the current state St+1, and on the immediate return Rt+1 corresponding to the action At+1.
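One common way to make an evaluation model estimate "the immediate return plus the possible future return" from consecutive transitions is a temporal-difference update. The specification does not state which update rule is used, so the SARSA-style rule below, the tabular critic, the learning rate `alpha`, the discount factor `gamma`, and the state and action names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def td_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """One SARSA-style temporal-difference step on a tabular critic `q`."""
    # Target = immediate return R_t + discounted estimate for (S_t+1, A_t+1).
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


q = defaultdict(float)  # critic estimates, initialised to 0

# Hypothetical transitions (S_t, A_t, R_t, S_t+1, A_t+1).
transitions = [
    ("high_load", "start_A", 1.0, "normal_load", "hold"),
    ("normal_load", "hold", 0.5, "normal_load", "hold"),
]

for _ in range(200):  # repeated sweeps until the estimates settle
    for s, a, r, s_next, a_next in transitions:
        td_update(q, s, a, r, s_next, a_next)
```

With these numbers the estimate for ("normal_load", "hold") settles near 0.5 / (1 - 0.9) = 5.0, and the estimate for ("high_load", "start_A") near 1.0 + 0.9 × 5.0 = 5.5, which shows how the critic value exceeds the immediate return once future returns are included.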

Step 208: Deploy the trained deep reinforcement learning model online.

Through continuous interaction with the simulator and continuous exploration of the action space, coupled with a reasonably designed model structure and learning strategy, the deep reinforcement learning model can learn a well-performing action policy model and the corresponding action evaluation model through offline training. After the training is completed, the trained deep reinforcement learning model can be deployed online, so that dispatching strategies for the power system can subsequently be generated in real time based on the trained deep reinforcement learning model.

Specifically, in the real-time dispatching scenario of the power system, after the deep reinforcement learning model has been trained, the data processing method provided in this specification can perform online real-time or quasi-real-time dispatching based on the deep reinforcement learning model. During real-time or quasi-real-time dispatching, the deep reinforcement learning model and the mathematical model can be used to make action decisions in response to the observed real power grid environment state, as shown in the dotted box labeled "online real-time/quasi-real-time scheduling" in FIG. 2. This specifically includes the following steps.

Step 210: Obtain the real grid environment status.

Wherein, the real power grid environment state can be understood as the current operating state data in the above embodiment.

Specifically, the real power grid environment state St is obtained by observing the real power grid environment, and the real power grid environment state St is input into the deep reinforcement learning model.

Step 212: The deep reinforcement learning model obtains an initial action strategy based on the real power grid environment state.

Wherein, the initial action strategy can be understood as an initial scheduling strategy.

Specifically, the action policy model in the deep reinforcement learning model produces the action At in response to the real power grid environment state St. For a power system containing generator A, generator B, and generator C, the action At may be: turn on generator A and set generator A to generate at its minimum limit (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 100 MW. Wherein, the action At may be understood as the initial scheduling policy in the above embodiment.

The variables in the action At are adjusted to obtain the variable-adjusted actions A1 and A2. The action A1 can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 80 MW.

The action A2 can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 70 MW, and set the power generation of generator C to 100 MW.

Use the action evaluation model to evaluate the action At and the variable-adjusted actions A1 and A2 to obtain their respective immediate returns Rt: the immediate return of the action At is 1 point, that of the variable-adjusted action A1 is 0.3 points, and that of the variable-adjusted action A2 is 0.9 points.

Based on the immediate returns Rt, the power generation of generator C (100 MW) in the action At is determined as a decision variable affecting the long-term effect, because adjusting it (action A1) changes the evaluation far more than adjusting the power generation of generator B (action A2). The power generation of generator A (50 MW) in the action At is determined as a discrete decision variable.

Fix the discrete decision variable and the decision variable affecting the long-term effect in the action At, that is, set them as unmodifiable, so as to obtain the initial action strategy, and input the initial action strategy into the mathematical model (ACOPF model or DCOPF model).

In specific implementations, during online scheduling, the action evaluation model (Critic) learned through deep reinforcement learning is used to judge which decision variables have a greater influence on the long-term effect. Specifically, each continuous decision variable in the action can be perturbed individually and the change in the Critic output observed; if the change is large, the decision variable can be considered to have a greater impact on future returns. Moreover, issues such as unit commitment, ramping, and the N-1 power grid security criterion can be handled from the perspective of long-term return maximization.
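The perturbation procedure just described can be sketched as follows. The function name `rank_by_critic_sensitivity`, the dictionary encoding of an action, and the lookup-table `toy_critic` are hypothetical; the table simply reproduces the three scores from the example above (1, 0.3, and 0.9) in place of a trained Critic.

```python
def rank_by_critic_sensitivity(action, critic, delta):
    """Perturb each continuous variable alone and rank by |change in critic output|."""
    base = critic(action)
    impact = {}
    for name, step in delta.items():
        perturbed = dict(action)            # copy so only one variable changes
        perturbed[name] = action[name] + step
        impact[name] = abs(critic(perturbed) - base)
    # Largest change first: these variables matter most for future returns.
    return sorted(impact.items(), key=lambda kv: kv[1], reverse=True)


# Stand-in critic: scores taken from the example (A_t, A1, A2).
_scores = {(50, 100, 100): 1.0, (50, 100, 80): 0.3, (50, 70, 100): 0.9}


def toy_critic(action):
    return _scores[(action["A"], action["B"], action["C"])]


action_t = {"A": 50, "B": 100, "C": 100}     # generator outputs in MW
ranking = rank_by_critic_sensitivity(action_t, toy_critic, {"B": -30, "C": -20})
```

With these scores, perturbing generator C moves the Critic output by 0.7 while perturbing generator B moves it by about 0.1, so generator C is ranked first.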

Step 214: Adjust other variables in the initial action strategy through a mathematical model, so as to obtain a scheduling decision result.

Wherein, the mathematical model can be understood as the target decision object in the above embodiment, and the scheduling decision result can be understood as the target scheduling policy in the above embodiment.

Specifically, after the discrete decision variables and the decision variables affecting the long-term effect in the action At are fixed and the initial action strategy is obtained, the remaining variables in the initial action strategy are adjusted and modified through the mathematical model, which performs a further single-step optimization and guarantees the physical safety constraints, so as to obtain the target scheduling strategy.

For example, the initial action strategy can be to turn on generator A, and set the generator A to generate the minimum quota (50 megawatts), set the power generation of generator B to 100 megawatts, and set the power generation of generator C to 100 megawatts. Among them, the power generation capacity of the generator A (50 megawatts) and the power generation capacity of the generator C (100 megawatts) have been fixed. Based on this, the initial action strategy is input into the mathematical model, and the power generation of generator B (100 MW) in the initial action strategy is adjusted through the mathematical model, so as to obtain the target dispatching strategy. In this way, the single-step optimization of the initial action strategy and the guarantee of physical safety constraints can avoid the problem of potential safety hazards in the decision-making results of reinforcement learning. At the same time, because the results of some decision variables have been fixed, the processing speed of the mathematical model is further improved.

Wherein, the target scheduling strategy can be: turn on generator A and set generator A to generate at its minimum quota (50 MW), set the power generation of generator B to 80 MW, and set the power generation of generator C to 100 MW.
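The idea of re-solving only the unfixed variables can be illustrated with a toy dispatch. The sketch below is explicitly not an ACOPF or DCOPF solver: it ignores network, voltage, and line-flow constraints and simply fills the remaining demand cheapest-first within generator limits. The demand of 230 MW, the limits, and the cost figure are hypothetical values chosen so that the result matches the example above.

```python
def redispatch(initial, fixed, demand, limits, cost):
    """Re-solve only the free generator outputs so that total output equals `demand`.

    Toy merit-order dispatch used as a stand-in for the mathematical model.
    """
    result = {g: initial[g] for g in fixed}      # frozen variables are copied unchanged
    free = sorted((g for g in initial if g not in fixed), key=lambda g: cost[g])
    residual = demand - sum(result.values())
    # Start every free unit at its lower limit ...
    for g in free:
        result[g] = limits[g][0]
        residual -= limits[g][0]
    # ... then fill the remaining demand cheapest-first up to each upper limit.
    for g in free:
        extra = min(max(residual, 0.0), limits[g][1] - limits[g][0])
        result[g] += extra
        residual -= extra
    if abs(residual) > 1e-9:
        raise ValueError("demand cannot be met within the generator limits")
    return result


initial_strategy = {"A": 50, "B": 100, "C": 100}   # MW, output of the Actor
target = redispatch(
    initial_strategy,
    fixed={"A", "C"},            # discrete and long-term variables stay fixed
    demand=230,                  # hypothetical system demand in MW
    limits={"B": (40, 120)},     # hypothetical (min, max) output of generator B
    cost={"B": 30.0},            # hypothetical marginal cost
)
```

Because two of the three variables are frozen, the remaining problem has a single unknown, which is why fixing variables shortens the solve time of the mathematical model.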

Step 216: Adjust the current operating state of the power system based on the scheduling decision result.

Specifically, based on the scheduling decision result, generator A in the power system is turned on, and the generator A is set to perform minimum quota power generation (50 megawatts), the power generation of generator B is set to 80 megawatts, and the generator C's power generation is set to 100 MW.

In practical applications, there are also cases where more attention is paid to the immediate return after the decision response, for example, when a line in the power grid that has exceeded its limit must be restored immediately. In such cases, except for the discrete decision variables, all other decision variables can be readjusted by the mathematical model (ACOPF/DCOPF), which makes it easier to optimize the immediate return.
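The two modes, long-term-aware dispatch and immediate-return dispatch, differ only in which variables are frozen before the mathematical model runs. A minimal sketch, with a hypothetical function name and flag:

```python
def variables_to_freeze(discrete, long_term, prioritize_immediate):
    """Return the decision variables the mathematical model must not change."""
    frozen = set(discrete)              # discrete variables are always fixed
    if not prioritize_immediate:
        frozen |= set(long_term)        # keep long-term variables fixed as well
    return frozen


normal_mode = variables_to_freeze({"A"}, {"C"}, prioritize_immediate=False)
urgent_mode = variables_to_freeze({"A"}, {"C"}, prioritize_immediate=True)
```

Here `normal_mode` contains generators A and C, while `urgent_mode` contains only generator A, leaving every continuous variable free for the mathematical model to readjust.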

The data processing method provided in this specification combines mathematical modeling (ACOPF/DCOPF) with reinforcement learning. On the basis of the output of reinforcement learning, the discrete decision variables and the decision variables that affect long-term effects are fixed, and the remaining decision variables are solved by the mathematical model. This speeds up the solution of the mathematical model and ensures that the solution satisfies the safety constraints. The final decision-making time can be controlled within 5 minutes, which basically achieves real-time/quasi-real-time dispatching and gives the power grid real-time dispatching capability for safe operation. At the same time, because of the participation of reinforcement learning, the overall scheduling decision also takes the long-term overall return into account.

Fig. 3 shows a flow chart of another data processing method provided according to an embodiment of the present specification, which specifically includes the following steps.

Step 302: Obtain current operating state data of the target power system.

Step 304: Input the current operating state data into an initial decision object to obtain an initial dispatch strategy of the target power system.

Step 306: Input the initial dispatch strategy into the target decision object to obtain the target dispatch strategy of the target power system.

Step 308: Adjust the current operating state data of the target power system based on the target dispatch strategy.

Specifically, in another data processing method provided in this specification, the current operating state data of the target power system can be obtained and input into the initial decision object to obtain the initial dispatching strategy of the target power system; the initial dispatching strategy is then input into the target decision object to obtain the target dispatching strategy of the target power system. The current operating state data of the target power system is adjusted based on the target dispatching strategy, so that the target power system has large-scale processing and rapid response capabilities, alleviating the difficulties faced by manual scheduling schemes.
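Steps 302 to 308 form a simple pipeline, which can be sketched as below. The function `dispatch_once` and the four stub callables are hypothetical placeholders: in the embodiment the initial decision object would be the trained deep reinforcement learning model and the target decision object would be the ACOPF/DCOPF model.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Strategy = Dict[str, float]


def dispatch_once(
    read_state: Callable[[], State],
    initial_decision: Callable[[State], Strategy],
    target_decision: Callable[[Strategy], Strategy],
    apply_strategy: Callable[[Strategy], None],
) -> Strategy:
    state = read_state()                 # Step 302: current operating state data
    initial = initial_decision(state)    # Step 304: initial dispatch strategy
    target = target_decision(initial)    # Step 306: target dispatch strategy
    apply_strategy(target)               # Step 308: adjust the power system
    return target


# Stub components with made-up values, for illustration only.
applied: List[Strategy] = []
result = dispatch_once(
    read_state=lambda: {"load_mw": 230.0},
    initial_decision=lambda s: {"A": 50, "B": 100, "C": 100},
    target_decision=lambda p: {**p, "B": 80},
    apply_strategy=applied.append,
)
```

Passing the components as callables keeps the pipeline independent of how each decision object is implemented.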

Following the above example, when another data processing method provided in this specification is applied to the scenario of adjusting the current operating state of the power system, the current operating state parameters of the power system can be obtained. The current operating state parameters may indicate that the current power load of the power system is high and that, among the three generators in the power system (generator A, generator B, and generator C), generator A is off while generator B and generator C are on.

After the current operating state data of the power system is obtained, the current operating state data is input into the deep reinforcement learning model, which judges the current state of the power system based on the current operating state data and generates a scheduling strategy accordingly. For example, when the deep reinforcement learning model judges from the current operating state data that the current power load of the power system is high, it can determine a dispatch strategy that can cope with the high power load. The scheduling strategy output by the deep reinforcement learning model can be as follows: turn on generator A in the power system and set generator A to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 90 MW.

After the deep reinforcement learning model outputs the scheduling strategy, specific variables in the scheduling strategy (the discrete decision variables and the decision variables affecting long-term effects) can be fixed to obtain the initial scheduling strategy. The initial scheduling strategy is then input into the mathematical model, and the unfixed decision variables in the initial scheduling strategy are adjusted and modified through the mathematical model, which performs a further single-step optimization and guarantees the physical safety constraints, so as to obtain the target scheduling strategy. For example, the target scheduling strategy may be: turn on generator A and set generator A to generate at its minimum quota (50 MW), set the power generation of generator B to 80 MW, and set the power generation of generator C to 100 MW.

Based on the target scheduling strategy, generator A in the power system is turned on and set to generate at its minimum quota (50 MW), the power generation of generator B is set to 80 MW, and the power generation of generator C is set to 100 MW.

In an embodiment provided in this specification, the initial decision object may also include two sub-modules, namely a strategy determination module and a strategy evaluation module. Based on this, the strategy determination module and the strategy evaluation module in the initial decision object may process the current operating state data of the target power system to obtain the initial dispatch strategy of the target power system, thereby improving the processing efficiency of the subsequent target decision object and further enabling the target power system to have large-scale processing and rapid response capabilities. The specific implementation is as follows.

The inputting the current operating state data into the initial decision-making object to obtain the initial dispatch strategy of the target power system includes:

Inputting the current running state data into the policy determination module of the initial decision object to obtain the scheduling policy to be processed;

Processing the scheduling policy to be processed based on the policy evaluation module of the initial decision object, to obtain a policy evaluation result corresponding to the scheduling policy to be processed;

An initial dispatch strategy of the target power system is determined based on the dispatch strategy to be processed and a corresponding strategy evaluation result.

Following the above example, the policy determination module is the policy determination sub-model in the deep reinforcement learning model. Based on this, after the current operating state data of the power system is obtained, the current operating state data is input into the policy determination sub-model in the deep reinforcement learning model. When the policy determination sub-model determines that the current power load of the power system is relatively high, it determines a dispatch strategy to be processed that can cope with the power load of the power system. The dispatch strategy to be processed that is output by the policy determination sub-model can be as follows: turn on generator A in the power system and set generator A to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 100 MW.

After the strategy determination sub-model obtains the dispatch strategy to be processed based on the current operating state data of the power system, it can determine the variable that can be adjusted in the dispatch strategy to be processed, and the variable can be the power generation of the generator.

The variables are then modified based on preset rules, so that the power generation of the generators in the dispatching strategy to be processed is adjusted and variable-adjusted dispatching strategies to be processed are obtained. The variable-adjusted dispatching strategy A to be processed can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 100 MW, and set the power generation of generator C to 80 MW. The variable-adjusted dispatching strategy B to be processed can be: set generator A in the power system to generate at its minimum quota (50 MW), set the power generation of generator B to 70 MW, and set the power generation of generator C to 100 MW.

After the variables in the scheduling strategy to be processed are adjusted, the scheduling strategy to be processed, the variable-adjusted scheduling strategy A to be processed, and the variable-adjusted scheduling strategy B to be processed are each input into the policy evaluation sub-model for evaluation, so as to obtain the evaluation results. The evaluation result of the scheduling strategy to be processed can be 1 point, the evaluation result of the variable-adjusted scheduling strategy A to be processed can be 0.3 points, and the evaluation result of the variable-adjusted scheduling strategy B to be processed can be 0.9 points. Afterwards, the three evaluation results are used as the policy evaluation results corresponding to the scheduling strategy to be processed.

According to the policy evaluation results, the power generation of generator C (100 MW) in the scheduling strategy to be processed is determined as a decision variable affecting the long-term effect, because adjusting it (scheduling strategy A) changes the evaluation result far more than adjusting the power generation of generator B (scheduling strategy B). Based on the preset determination condition, the power generation of generator A (50 MW) in the scheduling strategy to be processed is determined as a discrete decision variable.

Then the decision variables affecting the long-term effect and the discrete decision variables in the scheduling strategy to be processed are set as fixed parameters, that is, decision variables that cannot be modified or adjusted, so as to obtain the initial scheduling strategy.

Another data processing method provided in this specification processes the current operating state data of the target power system through the initial decision object and the target decision object to obtain the target dispatching strategy of the target power system, and quickly adjusts the current operating state data of the target power system through the target dispatching strategy, so that the target power system has large-scale processing and rapid response capabilities, alleviating the difficulties faced by manual dispatching schemes.

The foregoing is a schematic solution of another data processing method in this embodiment. It should be noted that the technical solution of this other data processing method belongs to the same concept as the technical solution of the data processing method described above; for details not described in the technical solution of this other data processing method, refer to the description of the technical solution of the data processing method described above.

Corresponding to the foregoing method embodiments, this specification also provides an embodiment of a data processing device. FIG. 4 shows a schematic structural diagram of a data processing device provided by an embodiment of this specification. As shown in Figure 4, the device includes:

The data acquisition module 402 is configured to acquire the current operating state data of the target energy system;

The first strategy acquisition module 404 is configured to input the current operating state data into an initial decision object to obtain an initial dispatch strategy of the target energy system;

The second strategy acquisition module 406 is configured to input the initial dispatch strategy into the target decision object, and obtain the target dispatch strategy of the target energy system;

The adjustment module 408 is configured to adjust the current operating state data of the target energy system based on the target dispatch strategy.

Optionally, the first policy acquisition module 404 is configured to:

An initial scheduling policy is determined based on the pending scheduling policy and a corresponding policy evaluation result.

Optionally, the first policy acquisition module 404 is configured to:

inputting the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object, and obtaining a second evaluation result of the adjusted scheduling policy to be processed;

Optionally, the first policy acquisition module 404 is configured to:

Optionally, the data processing device further includes a processing module configured to:

Determining simulated running state data based on the state simulation module;

Inputting the simulated running status data into the decision object to be processed to obtain a simulated scheduling strategy;

The decision object to be processed is processed based on the simulation scheduling strategy to obtain an initial decision object.

Optionally, the processing module is further configured to:

Sample operating state data is determined based on the state simulation module.

Optionally, the processing module is further configured to:

Optionally, the first policy acquiring module 404 is further configured to

The data processing device provided in this specification processes the current operating state data of the target energy system through the initial decision object and the target decision object to obtain the target scheduling strategy of the target energy system, and quickly adjusts the current operating state data of the target energy system through the target scheduling strategy, so that the target energy system has large-scale processing and rapid response capabilities, alleviating the difficulties faced by manual scheduling schemes.

The foregoing is a schematic solution of a data processing device in this embodiment. It should be noted that the technical solution of the data processing device and the technical solution of the above-mentioned data processing method belong to the same concept; for details not described in the technical solution of the data processing device, refer to the description of the technical solution of the above-mentioned data processing method.

Corresponding to the foregoing method embodiments, this specification also provides another embodiment of a data processing device. FIG. 5 shows a schematic structural diagram of another data processing device provided by an embodiment of this specification. As shown in FIG. 5, the device includes:

A data acquisition module 502 configured to acquire current operating state data of the target power system;

The first strategy acquisition module 504 is configured to input the current operating state data into an initial decision object, and obtain an initial dispatch strategy of the target power system;

The second strategy acquisition module 506 is configured to input the initial dispatch strategy into the target decision object, and obtain the target dispatch strategy of the target power system;

The adjustment module 508 is configured to adjust the current operating state data of the target power system based on the target dispatch strategy.

Optionally, the first policy acquisition module 504 is further configured to:

Another data processing device provided in this specification processes the current operating state data of the target power system through the initial decision object and the target decision object to obtain the target dispatching strategy of the target power system, and quickly adjusts the current operating state data of the target power system through the target dispatching strategy, so that the target power system has large-scale processing and rapid response capabilities, alleviating the difficulties faced by manual dispatching schemes.

The foregoing is a schematic solution of another data processing device in this embodiment. It should be noted that the technical solution of this other data processing device belongs to the same concept as the technical solution of the other data processing method described above; for details not described in the technical solution of this other data processing device, refer to the description of the technical solution of the other data processing method described above.

FIG. 6 shows a structural block diagram of a computing device 600 provided according to an embodiment of this specification. Components of the computing device 600 include, but are not limited to, memory 610 and processor 620. The processor 620 is connected to the memory 610 through the bus 630, and the database 650 is used for saving data.

Computing device 600 also includes an access device 640 that enables computing device 600 to communicate via one or more networks 660. Examples of these networks include the Public Switched Telephone Network (PSTN), Local Area Network (LAN), Wide Area Network (WAN), Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 640 may include one or more of any type of network interface (e.g., a network interface card (NIC)), wired or wireless, such as an IEEE 802.11 wireless local area network (WLAN) wireless interface, Worldwide Interoperability for Microwave Access (Wi-MAX) interface, Ethernet interface, Universal Serial Bus (USB) interface, cellular network interface, Bluetooth interface, Near Field Communication (NFC) interface, etc.

In an embodiment of the present specification, the above-mentioned components of the computing device 600 and other components not shown in FIG. 6 may also be connected to each other, for example, through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 6 is only for the purpose of illustration, rather than limiting the scope of this description. Those skilled in the art can add or replace other components as needed.

Computing device 600 may be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile telephones (e.g., smartphones), wearable computing devices (e.g., smart watches, smart glasses, etc.), or other types of mobile devices, or stationary computing devices such as desktop computers or PCs. Computing device 600 may also be a mobile or stationary server.

The processor 620 is configured to execute computer-executable instructions. When the computer-executable instructions are executed by the processor 620, the steps of the above-mentioned data processing method are implemented.

The foregoing is a schematic solution of a computing device in this embodiment. It should be noted that the technical solution of the computing device and the above-mentioned technical solution of the data processing method belong to the same concept, and details not described in detail in the technical solution of the computing device can refer to the description of the technical solution of the above-mentioned data processing method.

An embodiment of the present specification also provides a computer-readable storage medium, which stores computer-executable instructions, and implements the steps of the above-mentioned data processing method when the computer-executable instructions are executed by a processor.

The foregoing is a schematic solution of a computer-readable storage medium in this embodiment. It should be noted that the technical solution of the storage medium and the above-mentioned technical solution of the data processing method belong to the same idea, and details of the technical solution of the storage medium that are not described in detail can be found in the description of the technical solution of the above-mentioned data processing method.

An embodiment of the present specification also provides a computer program, wherein, when the computer program is executed in a computer, the computer is caused to execute the steps of the above data processing method.

The foregoing is a schematic solution of a computer program in this embodiment. It should be noted that the technical solution of the computer program and the technical solution of the above-mentioned data processing method belong to the same concept, and details not described in detail in the technical solution of the computer program can refer to the description of the technical solution of the above-mentioned data processing method.

The foregoing describes specific embodiments of this specification. Other implementations are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible or may be advantageous in certain embodiments.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, computer-readable media exclude electrical carrier signals and telecommunication signals.

It should be noted that, for simplicity of description, the aforementioned method embodiments are expressed as a series of combined actions, but those skilled in the art should know that the embodiments of this specification are not limited by the described order of actions, because according to the embodiments of this specification certain steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the embodiments of this specification.

In the above-mentioned embodiments, the descriptions of each embodiment have their own emphases, and for parts not described in detail in a certain embodiment, reference may be made to relevant descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are only intended to help explain the present specification. The optional embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and changes can be made according to the contents of the embodiments of this specification. This specification selects and specifically describes these embodiments in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can well understand and use this specification. This specification is to be limited only by the claims, along with their full scope and equivalents.

Claims

1. A data processing method, comprising:

obtaining current operating state data of a target energy system;

inputting the current operating state data into an initial decision object to obtain an initial scheduling policy of the target energy system;

inputting the initial scheduling policy into a target decision object to obtain a target scheduling policy of the target energy system; and

adjusting the current operating state data of the target energy system based on the target scheduling policy.
2. The data processing method according to claim 1, wherein inputting the current operating state data into the initial decision object to obtain the initial scheduling policy of the target energy system comprises:

inputting the current operating state data into a policy determination module of the initial decision object to obtain a scheduling policy to be processed;

processing the scheduling policy to be processed based on a policy evaluation module of the initial decision object to obtain a policy evaluation result corresponding to the scheduling policy to be processed; and

determining the initial scheduling policy based on the scheduling policy to be processed and the corresponding policy evaluation result.
3. The data processing method according to claim 2, wherein processing the scheduling policy to be processed based on the policy evaluation module of the initial decision object to obtain the policy evaluation result corresponding to the scheduling policy to be processed comprises:

modifying decision parameters in the scheduling policy to be processed based on a preset parameter adjustment rule to obtain an adjusted scheduling policy to be processed; and

inputting the scheduling policy to be processed and the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the scheduling policy to be processed.
4. The data processing method according to claim 3, wherein inputting the scheduling policy to be processed and the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object to obtain the policy evaluation result of the scheduling policy to be processed comprises:

inputting the scheduling policy to be processed into the policy evaluation module of the initial decision object to obtain a first evaluation result of the scheduling policy to be processed;

inputting the adjusted scheduling policy to be processed into the policy evaluation module of the initial decision object to obtain a second evaluation result of the adjusted scheduling policy to be processed; and

determining the policy evaluation result of the scheduling policy to be processed based on the first evaluation result and the second evaluation result.
5. The data processing method according to claim 2, wherein determining the initial scheduling policy based on the scheduling policy to be processed and the corresponding policy evaluation result comprises:

determining a first parameter in the scheduling policy to be processed based on the policy evaluation result;

determining a second parameter in the scheduling policy to be processed based on a preset determination condition; and

setting the first parameter and the second parameter in the scheduling policy to be processed as fixed parameters to obtain the initial scheduling policy.
6. The data processing method according to claim 1, wherein before obtaining the current operating state data of the target energy system, the method further comprises:

determining simulated operating state data based on a state simulation module;

inputting the simulated operating state data into a decision object to be processed to obtain a simulated scheduling policy; and

processing the decision object to be processed based on the simulated scheduling policy to obtain the initial decision object.
7. The data processing method according to claim 6, wherein processing the decision object to be processed based on the simulated scheduling policy to obtain the initial decision object comprises:

evaluating the simulated scheduling policy based on the state simulation module to obtain a simulated policy evaluation result; and

processing the decision object to be processed based on the simulated policy evaluation result and the simulated scheduling policy to obtain the initial decision object.
8. The data processing method according to claim 6, wherein processing the decision object to be processed based on the simulated policy evaluation result and the simulated scheduling policy to obtain the initial decision object comprises:

processing a policy determination module in the decision object to be processed based on the simulated policy evaluation result to obtain a processed policy determination module;

processing a policy evaluation module in the decision object to be processed based on the simulated policy evaluation result and the simulated scheduling policy to obtain a processed policy evaluation module; and

determining the initial decision object based on the processed policy determination module and the processed policy evaluation module.
9. The data processing method according to claim 1, wherein inputting the current operating state data into the initial decision object to obtain the initial scheduling policy of the target energy system comprises:

determining similarities between historical operating state data and the current operating state data, and determining the historical operating state data corresponding to the maximum similarity as similar operating state data of the current operating state data;

obtaining a historical scheduling policy corresponding to the similar operating state data, wherein the historical scheduling policy is a historical target scheduling policy obtained based on the target decision object;

inputting the current operating state data and the historical scheduling policy into the initial decision object to obtain a scheduling policy to be updated of the target energy system;

sending the scheduling policy to be updated of the target energy system to a policy update object; and

receiving the initial scheduling policy sent by the policy update object, wherein the initial scheduling policy is obtained by the policy update object updating the scheduling policy to be updated based on a preset update condition.
10. A data processing method, comprising:

obtaining current operating state data of a target power system;

inputting the current operating state data into an initial decision object to obtain an initial scheduling policy of the target power system;

inputting the initial scheduling policy into a target decision object to obtain a target scheduling policy of the target power system; and

adjusting the current operating state data of the target power system based on the target scheduling policy.
11. The data processing method according to claim 10, wherein inputting the current operating state data into the initial decision object to obtain the initial scheduling policy of the target power system comprises:

inputting the current operating state data into a policy determination module of the initial decision object to obtain a scheduling policy to be processed;

processing the scheduling policy to be processed based on a policy evaluation module of the initial decision object to obtain a policy evaluation result corresponding to the scheduling policy to be processed; and

determining the initial scheduling policy of the target power system based on the scheduling policy to be processed and the corresponding policy evaluation result.
12. A data processing device, comprising:

a data acquisition module configured to obtain current operating state data of a target energy system;

a first policy acquisition module configured to input the current operating state data into an initial decision object to obtain an initial scheduling policy of the target energy system;

a second policy acquisition module configured to input the initial scheduling policy into a target decision object to obtain a target scheduling policy of the target energy system; and

an adjustment module configured to adjust the current operating state data of the target energy system based on the target scheduling policy.
13. A computing device, comprising:

a memory and a processor;

wherein the memory is configured to store computer-executable instructions, the processor is configured to execute the computer-executable instructions, and when the computer-executable instructions are executed by the processor, the steps of the data processing method according to any one of claims 1 to 9 or claims 10 to 11 are implemented.
14. A computer-readable storage medium storing computer-executable instructions, wherein when the computer-executable instructions are executed by a processor, the steps of the data processing method according to any one of claims 1 to 9 or claims 10 to 11 are implemented.