CN112287972A

CN112287972A - Power system power flow adjusting method based on reinforcement learning and multi-source data integration

Info

Publication number: CN112287972A
Application number: CN202011039205.2A
Authority: CN
Inventors: Hu Wei (胡伟); Wu Shuang (吴双); Zhong Lijun (仲立军); Sheng Yinbo (盛银波); Guo Qiuting (郭秋婷)
Original assignee: Tsinghua University; State Grid Corp of China SGCC; Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Current assignee: Tsinghua University; State Grid Corp of China SGCC; Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2021-01-29

Abstract

The invention discloses a method and a device for adjusting power system power flow based on reinforcement learning and multi-source data integration. The method comprises: integrating the power flow data of the power system by using database technology and storing them in a database, extracting base power flow data from the database, generating power flow samples from the base data by using multithreading, and storing the generated samples in the database; constructing a power system reinforcement learning environment, and constructing within it a value function training model suitable for power flow adjustment; and, using an intelligent adjustment strategy based on a reinforcement learning algorithm, training the value function training model with the SARSA algorithm on the power flow data in the database so as to generate a power flow adjustment strategy for the power system. The method is more intelligent and automated than manual adjustment, is efficient, and can quickly expand the power flow sample library.

Description

Power system power flow adjusting method based on reinforcement learning and multi-source data integration

Technical Field

The invention relates to the technical field of electric power system analysis and calculation, in particular to a method and a device for adjusting power system load flow based on reinforcement learning and multi-source data integration.

Background

In the analysis and compilation of power system operating modes, power flow calculation is undoubtedly the most basic and important task. Multi-source heterogeneous data are difficult to store and analyze, which makes power flow analysis of the power system inconvenient. In addition, in traditional operating-mode calculation, the power flow is adjusted manually based on experience; this approach is coarse and inefficient and cannot meet the current requirements for refined management of the power system.

As the requirements for refined analysis of power system operating modes increase, a large number of operating power flow scenarios must be generated and adjusted. Traditional power flow adjustment methods rely mainly on model solving and manual parameter tuning. Model solving obtains a convergent power flow by solving the constrained power flow equations, but it is difficult to model and solve when the system is large. Manual parameter tuning depends heavily on operator experience: adjustment efficiency is low, convergence is hard to achieve, the work is time-consuming and labor-intensive, and the results are limited. Therefore, to meet current and future power flow adjustment needs in operating-mode analysis, a more intelligent and refined means of power flow adjustment is urgently needed to cope with the variable actual operating states of the power system and to ensure the overall safety, stability, economy and flexibility of the system.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, an object of the present invention is to provide a power system power flow adjustment method based on reinforcement learning and multi-source data integration, which applies an automatic reinforcement learning adjustment strategy to power flow adjustment and sample generation, making power flow adjustment intelligent while generating diversified power flow samples, thereby improving adjustment efficiency.

The invention further aims to provide a power system flow adjusting device based on reinforcement learning and multi-source data integration.

In order to achieve the above object, an embodiment of the present invention provides a power flow adjustment method for an electric power system based on reinforcement learning and multi-source data integration, including the following steps:

S1, integrating the power flow data of the power system by using database technology and storing them in a database, extracting base power flow data from the database, generating power flow samples from the base data by using multithreading, and storing the generated samples in the database;

S2, constructing a power system reinforcement learning environment, and constructing in this environment a value function training model suitable for power flow adjustment;

and S3, using an intelligent adjustment strategy based on a reinforcement learning algorithm, training the value function training model for power flow adjustment with the SARSA algorithm on the power flow data in the database, so as to generate a power flow adjustment strategy for the power system.

In order to achieve the above object, an embodiment of the present invention provides a power flow adjusting apparatus for an electrical power system based on reinforcement learning and multi-source data integration, including:

the multi-source data integration module is configured to integrate the power flow data of the power system by using database technology and store them in a database, extract base power flow data from the database, generate power flow samples from the base data by using multithreading, and store the generated samples in the database;

the model building module is configured to build a power system reinforcement learning environment and, in this environment, build a value function training model suitable for power flow adjustment;

and the model training module is configured to use an intelligent adjustment strategy based on a reinforcement learning algorithm to train the value function training model for power flow adjustment with the SARSA algorithm on the power flow data in the database, so as to generate a power flow adjustment strategy for the power system.

The method and the device for adjusting power system power flow based on reinforcement learning and multi-source data integration have the following advantages:

(1) By using a database, power flow calculation data from multi-source heterogeneous power system tools can be integrated, making the data easy to store and read; and performing power flow calculations with multithreading makes sample generation more efficient.

(2) In constructing the reinforcement learning model, the characteristics of power system power flow calculation are fully considered, a value function model suitable for power flow adjustment is built, and uncertainty is deliberately introduced, so that non-convergent power flows are adjusted intelligently while diversified convergent power flows are generated automatically.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow chart of a power system flow adjustment method based on reinforcement learning and multi-source data integration according to an embodiment of the invention;

FIG. 2 is a flow chart of a power system flow adjustment method based on reinforcement learning and multi-source data integration according to another embodiment of the present invention;

FIG. 3 is a block diagram of a multi-threaded computing framework according to one embodiment of the invention;

FIG. 4 is a state transition diagram of a reinforcement learning model according to one embodiment of the present invention;

FIG. 5 is a flow chart of policy learning that takes into account power system flow adjustment features according to one embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a power flow adjustment apparatus of an electrical power system based on reinforcement learning and multi-source data integration according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting it.

The following describes a power flow adjustment method and device based on reinforcement learning and multi-source data integration, which are provided by the embodiments of the present invention, with reference to the accompanying drawings.

First, a power flow adjustment method of an electric power system based on reinforcement learning and multi-source data integration according to an embodiment of the present invention will be described with reference to the accompanying drawings.

Fig. 1 is a flow chart of a power flow adjustment method of an electric power system based on reinforcement learning and multi-source data integration according to an embodiment of the invention.

As shown in fig. 1, the power flow adjustment method based on reinforcement learning and multi-source data integration includes the following steps:

Step S1: integrating the power flow data of the power system by using database technology and storing them in a database, extracting base power flow data from the database, generating power flow samples from the base data by using multithreading, and storing the generated samples in the database.

Further, in one embodiment of the present invention, integrating the power flow data of the power system and storing them in the database by using database technology comprises: unpacking the multi-source power flow data of the power system comprehensive analysis program, and integrating and storing them in the database organized by the active power, reactive power, voltage magnitude and voltage phase angle of generators and loads, together with line information.

Referring to FIG. 2, the embodiment of the invention comprises three stages: multi-source data integration, model construction and model training. In the first stage, multi-source data integration, the power flow data of the power system are normalized and stored using database technology, and samples are generated with multithreading to efficiently supplement the power flow sample library. In the second stage, model construction, a reinforcement learning adjustment model suitable for power flow adjustment is constructed according to the characteristics of power system power flow calculation. In the third stage, model training, a reinforcement learning algorithm is applied, a value function training model suitable for power flow adjustment is defined, and uncertainty is deliberately introduced, so that the model generates diversified convergent power flow data while completing the power flow adjustment task. Traditional power flow adjustment based on manual experience is time-consuming and labor-intensive, whereas a reinforcement learning algorithm is adaptive and intelligent and can adjust the power flow automatically with little manual intervention, which improves adjustment efficiency.

Specifically, in the multi-source data integration stage, the power flow data of the power system must be integrated into a unified form for organized management and storage. The most commonly used power flow calculation tool in power system operating-mode calculation is the Power System Analysis Software Package (PSASP) developed by the China Electric Power Research Institute. PSASP normally uses a graphical user interface, which makes it convenient for operating-mode engineers to adjust the power flow by hand. This conventional approach is intuitive but inefficient. Because the core of this application is automated, intelligent power flow adjustment, the power flow calculation data in PSASP must be integrated, processed and stored, and they are managed with a MySQL database.

In practice, PSASP power flow calculations can generally be run only from the graphical interface, because the power flow input data are stored in several separate files whose contents are cross-referenced. To adjust the power flow intelligently, these input data must be accessed from outside the program, but the PSASP file organization cannot be used directly from outside, so the multi-source calculation data must be extracted, integrated and processed. The data files are LF.L0, LF.L1, LF.L2, LF.L3, LF.L4, LF.L5 and LF.L6:

LF.L0 contains control information, including the total numbers of buses, AC lines, transformer windings, DC lines, generators, loads and areas;
LF.L1 contains bus data; each row is one bus record, including detailed bus information, base voltage, area, and upper and lower voltage limits;
LF.L2 contains AC line data; each row is one AC line record, including the switch states at both ends, detailed line parameters, owning area, line capacity and line name;
LF.L3 and LF.L4 contain transformer and DC line data, respectively; each row is one record, with contents similar to LF.L2;
LF.L5 contains generator data; each row is one generator record, including a validity flag, the row number of the generator's bus in LF.L1, node type, active power P, reactive power Q, voltage magnitude, voltage phase angle, upper and lower output limits, and generator name;
LF.L6 contains load data; each row is one load record, with contents similar to LF.L5.

After the multi-source power flow data are unpacked, they are integrated and stored in the MySQL database organized by the active power, reactive power, voltage magnitude and voltage phase angle of generators and loads, together with line information, to facilitate later access.
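The unpacked generator and load records can be pictured as rows of one relational table. The snippet below is a minimal sketch, not the PSASP file format or the authors' MySQL schema: it uses Python's built-in sqlite3 as a stand-in for MySQL and assumes a hypothetical comma-separated record layout (valid_flag, bus_no, node_type, P, Q, V, theta, name) loosely derived from the LF.L5 field list. The table name `injection` and the helper functions are illustrative only.

```python
import sqlite3

# Hypothetical table for generator (LF.L5) and load (LF.L6) records.
# The real PSASP column order, units and delimiters are NOT given in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS injection (
    kind       TEXT NOT NULL,      -- 'gen' or 'load'
    name       TEXT NOT NULL,
    bus_no     INTEGER NOT NULL,   -- row number of the bus in LF.L1
    node_type  INTEGER,
    p          REAL,               -- active power
    q          REAL,               -- reactive power
    v_mag      REAL,               -- voltage magnitude
    v_ang      REAL                -- voltage phase angle
)
"""

def parse_record(line: str) -> tuple:
    """Split one hypothetical comma-separated record into typed fields."""
    flag, bus_no, node_type, p, q, v, theta, name = (f.strip() for f in line.split(","))
    return int(flag), int(bus_no), int(node_type), float(p), float(q), float(v), float(theta), name

def load_records(conn: sqlite3.Connection, kind: str, lines: list[str]) -> int:
    """Insert the valid records of one kind ('gen' or 'load'); return the number inserted."""
    rows = []
    for line in lines:
        flag, bus_no, node_type, p, q, v, theta, name = parse_record(line)
        if flag == 1:  # assumption: a validity flag of 1 marks an active record
            rows.append((kind, name, bus_no, node_type, p, q, v, theta))
    conn.executemany("INSERT INTO injection VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
gens = ["1, 3, 0, 1.20, 0.35, 1.02, 0.0, G1", "0, 5, 1, 0.80, 0.10, 1.00, -2.1, G2"]
loads = ["1, 4, 1, 0.90, 0.30, 0.98, -3.5, L1"]
print(load_records(conn, "gen", gens), load_records(conn, "load", loads))
```

Keeping generators and loads in one table with a `kind` column mirrors the single "organization form" described above; separate tables per file would work equally well.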

Multithreading is used to compute power flows and expand the power flow sample library. When the power flow is adjusted intelligently by reinforcement learning, power flow samples are generated automatically as the agent interacts with the environment, growing from none to some and from few to many. If some power flow samples are prepared before model training, training becomes more efficient and takes less time. Therefore, in this stage, multithreading is adopted: base power flow data are selected from the database and power flow samples are then generated, as shown in FIG. 3. Threads 1, 2 and 3 are three threads opened for the sample generation program; each runs independently and performs its own power flow calculations, and the results of all threads are collected in the database to supplement the power flow sample library.
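The three-thread scheme of FIG. 3 might look like the minimal sketch below. `toy_power_flow` is a hypothetical placeholder for a real solver call (for example a PSASP run), and the ±20% load perturbation, the base case and the sqlite3 table are illustrative assumptions. Each worker uses its own seeded random generator and returns its samples, and only the main thread writes to the database, which avoids sharing one database connection across threads.

```python
import random
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def toy_power_flow(loads: list[float], gen_capacity: float) -> bool:
    """Hypothetical stand-in for a real power flow solver:
    'converges' when total load stays within available generation."""
    return sum(loads) <= gen_capacity

def generate_samples(base_loads: list[float], gen_capacity: float, n: int, seed: int) -> list[tuple]:
    """One worker: perturb the base case n times and run the toy solver on each case."""
    rng = random.Random(seed)  # per-thread RNG, so threads share no random state
    samples = []
    for _ in range(n):
        loads = [p * rng.uniform(0.8, 1.2) for p in base_loads]  # assumed +/-20% load scaling
        samples.append((repr(loads), int(toy_power_flow(loads, gen_capacity))))
    return samples

base_loads = [0.9, 1.1, 0.6]  # hypothetical base case, p.u.
with ThreadPoolExecutor(max_workers=3) as pool:  # three independent threads, as in FIG. 3
    futures = [pool.submit(generate_samples, base_loads, 2.8, 50, seed) for seed in range(3)]
    results = [row for f in futures for row in f.result()]

# Results are merged into the sample library only after all threads finish.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (loads TEXT, converged INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", results)
print(conn.execute("SELECT COUNT(*), SUM(converged) FROM sample").fetchone())
```

In CPython, threads speed this step up only when the solver call releases the GIL (for instance while waiting on an external executable); for a CPU-bound pure-Python solver, `concurrent.futures.ProcessPoolExecutor` is the usual drop-in alternative.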

Step S2: constructing a power system reinforcement learning environment, and constructing in this environment a value function training model suitable for power flow adjustment.

Further, the power system reinforcement learning environment comprises an agent, an environment, actions, states and rewards. The power system itself acts as the environment that interacts with the agent; the state set comprises the bus voltage levels and line power flow levels of the power system; the action set comprises increasing or decreasing each controllable variable; and the reward value is customized according to the power flow adjustment task.
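One way to encode this state set and action set for a tabular learner is sketched below. The variable names such as `G1.P`, the step sizes, and the rounding used to discretise the continuous state are hypothetical choices not taken from the text; the sketch only shows that every controllable variable contributes one "increase" action and one "decrease" action, and that the state combines bus voltages with line flow levels.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """One discrete action: move one controllable variable by one signed step."""
    variable: str  # hypothetical naming, e.g. "G1.P" = active output of generator G1
    delta: float   # signed step size

def build_action_set(controllables: dict[str, float]) -> list[Action]:
    """Each controllable variable yields an 'increase' and a 'decrease' action."""
    actions = []
    for name, step in controllables.items():
        actions.append(Action(name, +step))
        actions.append(Action(name, -step))
    return actions

def state_vector(bus_voltages: list[float], line_loadings: list[float]) -> tuple:
    """State = bus voltage levels + line flow levels, rounded into a hashable, discretised key."""
    return tuple(round(v, 2) for v in bus_voltages) + tuple(round(x, 1) for x in line_loadings)

actions = build_action_set({"G1.P": 0.1, "G1.Q": 0.05, "G2.P": 0.1})
print(len(actions), state_vector([1.012, 0.987], [0.46]))
```

Rounding is only one way to make states usable as table keys; a function approximator would take the unrounded vector instead.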

The basic reinforcement learning framework consists of five parts: agent, environment, action, state and reward. Reinforcement learning learns a mapping from environment states to actions, with the goal of letting the controller obtain the maximum cumulative reward while interacting with the environment. To apply reinforcement learning to power system power flow adjustment, a reinforcement learning environment model suitable for power flow adjustment is constructed according to the characteristics of the power system, as shown in Table 1:

Table 1 Power flow adjustment environment model

In the power flow adjustment task, the power system itself acts as the environment that interacts with the agent; the state set comprises the bus voltage levels and line power flow levels of the power system; the action set comprises increasing or decreasing each controllable variable (for example, increasing the active output of one generator is one action in the set, and decreasing the reactive output of another generator is another); and the reward value is customized according to the power flow adjustment task. In the present invention, the reward value is defined by Eq. (1):

R = R₁ + R₂ + R₃    (1)

The reward value R consists of three parts: R₁, R₂ and R₃. R₁ is derived from the output of a power flow convergence classifier. In the invention, a deep neural network (DNN) with three hidden layers serves as the convergence classifier; its output layer uses a sigmoid activation function, so its output value f lies between 0 and 1, as shown in Eq. (2). R₁ returns different values depending on whether f is less than p₁, greater than p₂, or between p₁ and p₂.
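A forward pass of such a convergence classifier, followed by a piecewise R₁, is sketched below in plain Python. The hidden-layer widths, the ReLU hidden activation, the six input features and the thresholds p₁ = 0.3, p₂ = 0.7 with their reward values are all assumptions: the text fixes only the three hidden layers, the sigmoid output and the existence of p₁ and p₂. The weights are random, so the score is meaningless until the network is trained on labelled convergent and non-convergent cases.

```python
import math
import random

def dense(x, w, b):
    """Fully connected layer; w is a list of rows (out_dim x in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, t) for t in v]

def init_layer(n_in, n_out, rng):
    """Random Gaussian weights scaled by 1/sqrt(n_in), zero biases."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def convergence_score(x, layers):
    """Three ReLU hidden layers, then one sigmoid output neuron, so f is in (0, 1)."""
    h = x
    for w, b in layers[:-1]:
        h = relu(dense(h, w, b))
    w, b = layers[-1]
    z = dense(h, w, b)[0]
    return 1.0 / (1.0 + math.exp(-z))

def reward_r1(f, p1=0.3, p2=0.7, low=-1.0, mid=0.0, high=1.0):
    """Piecewise R1 on the classifier output f; thresholds and values are hypothetical."""
    if f < p1:
        return low
    if f > p2:
        return high
    return mid

rng = random.Random(0)
sizes = [6, 16, 16, 16, 1]  # input features, three hidden layers, one output (assumed widths)
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
f = convergence_score([1.0, 0.98, 1.02, 0.5, 0.4, 0.7], layers)
print(round(f, 3), reward_r1(f))
```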

R₂ represents the imbalance of the power system, as shown in Eq. (3). ΣQ_gen denotes the total reactive output of the generators in an area, ΣQ_load the total reactive load in the same area, and B_q = ΣQ_gen / ΣQ_load the ratio of the area's reactive output to its reactive demand, which reflects how well reactive power is balanced in that area.

R₃ is the penalty term of the reward value: after the agent executes an action, R₃ is a negative value if a generator output exceeds its limits, and R₃ = 0 otherwise. The total reward of one interaction is the sum of these three terms.

The state transitions of the power system environment under reinforcement learning are shown in FIG. 4, where S_i denotes the state at time step i, a_i the action taken in state S_i, and r_i the immediate reward obtained by taking that action, after which the state moves from S_i to the next state S_{i+1}. The cumulative return is given by Eq. (4), where γ is the discount factor; the larger γ is, the more weight rewards at future time steps carry in the cumulative return.

Step S3: using an intelligent adjustment strategy based on a reinforcement learning algorithm, training the value function training model for power flow adjustment with the SARSA algorithm on the power flow data in the database, so as to generate a power flow adjustment strategy for the power system.

In this embodiment, the reinforcement learning algorithm takes the characteristics of power system power flow adjustment into account: the SARSA algorithm is used to train the reinforcement learning model and generate a power flow adjustment strategy, with the algorithm flow shown in FIG. 5. The specific procedure is as follows:

1) randomly initialize all state-action values Q(s, a);

2) set the current episode index i = 1; while i < T:

a) initialize s as the initial state, and select an action a with the ε-greedy algorithm;

b) execute action a, move from state s to the next state s', and obtain the reward R;

c) select the next action a' with the ε-greedy algorithm;

d) update the state-action value Q(s, a) according to the following equation, where α is the learning rate and γ is the discount factor:

Q(s,a) = Q(s,a) + α(R + γQ(s',a') - Q(s,a))

e) set s' as the current state s and a' as the current action a;

f) if s is a terminal state, end the current episode and set i = i + 1; otherwise, return to b).
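The SARSA steps above can be exercised end to end on a deliberately tiny, hypothetical environment. Here the state m is a generation-load mismatch measured in discrete steps, the two actions raise or lower one generator's output by one step, and m = 0 stands for a converged power flow (the terminal state). The rewards (+10 on convergence, -1 per step), α, γ, ε and the 100-step cap per episode are illustrative assumptions, not values from the text. One standard detail the steps leave implicit is that the bootstrap term γQ(s', a') is dropped when s' is terminal.

```python
import random
from collections import defaultdict

ACTIONS = (+1, -1)  # hypothetical: raise / lower one generator's output by one step

def step(m, a):
    """Toy environment: m is the generation-load mismatch in discrete steps, clamped to [-5, 5];
    m == 0 stands for a converged power flow and ends the episode."""
    m_next = max(-5, min(5, m + a))
    reward = 10.0 if m_next == 0 else -1.0
    return m_next, reward, m_next == 0

def epsilon_greedy(Q, s, eps, rng):
    """Explore with probability eps, otherwise pick the action with the highest Q(s, a)."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def sarsa(T=500, alpha=0.1, gamma=0.9, eps=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(lambda: rng.uniform(-0.01, 0.01))  # 1) random Q(s, a), created on first access
    for i in range(1, T):                               # 2) episodes i = 1, 2, ... while i < T
        s = rng.choice([m for m in range(-5, 6) if m != 0])  # a) initial (non-converged) state
        a = epsilon_greedy(Q, s, eps, rng)              # a) first action
        for _ in range(max_steps):
            s_next, r, done = step(s, a)                # b) act, observe R and s'
            a_next = epsilon_greedy(Q, s_next, eps, rng)  # c) choose a' epsilon-greedily
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])   # d) SARSA update
            s, a = s_next, a_next                       # e) s <- s', a <- a'
            if done:                                    # f) terminal state: end the episode
                break
    return Q

Q = sarsa(T=2000)
policy = {m: max(ACTIONS, key=lambda a: Q[(m, a)]) for m in range(-5, 6) if m != 0}
print(policy)
```

After training, the greedy policy should lower generation when m > 0 and raise it when m < 0. In a real application, `step` would call a power flow solver and the state would be built from bus voltages and line flows.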

In summary, database technology is used to integrate the power system power flow data, the data of the power system comprehensive analysis program (PSASP) are converted and parsed, and a reinforcement learning method that accounts for power system characteristics is used to adjust the power flow, so that non-convergent power flows are adjusted to convergence and diversified convergent power flow samples are formed.

According to the power system power flow adjustment method based on reinforcement learning and multi-source data integration provided by the embodiment of the invention, power flow data are integrated using database and multithreading technology, and the data of the power system comprehensive analysis program are converted and parsed to form a large number of power flow samples that expand the sample library; a power system reinforcement learning environment is then constructed for interaction between the agent and the environment, generating a power flow adjustment strategy that adjusts non-convergent power flows to convergence. Compared with traditional manual power flow adjustment, the method is intelligent, automated and more efficient, can quickly expand the power flow sample library, is well suited to power system operating-mode analysis and calculation, and has broad application prospects.

Next, a power flow adjustment device for an electric power system based on reinforcement learning and multi-source data integration according to an embodiment of the present invention will be described with reference to the drawings.

As shown in fig. 6, the power system flow adjusting apparatus based on reinforcement learning and multi-source data integration includes: a multi-source data integration module 601, a model construction module 602, and a model training module 603.

The multi-source data integration module 601 is configured to integrate the power flow data of the power system by using database technology and store them in a database, extract base power flow data from the database, generate power flow samples from the base data by using multithreading, and store the generated samples in the database.

The model building module 602 is configured to build a power system reinforcement learning environment, and in the power system reinforcement learning environment, a value function training model suitable for load flow adjustment is built.

The model training module 603 is configured to use an intelligent adjustment strategy based on a reinforcement learning algorithm to train the value function training model for power flow adjustment with the SARSA algorithm on the power flow data in the database, so as to generate a power flow adjustment strategy for the power system.

Further, in one embodiment of the present invention, the power system reinforcement learning environment comprises an agent, an environment, actions, states and rewards;

the power system itself participates in interaction with the intelligent agent as an environment, the state set comprises bus voltage level and line power flow level in the power system, the action set comprises increase or decrease of all controllable variables, and the return value is self-defined according to the power flow adjustment task.

Further, in one embodiment of the present invention, the reward value is:

R = R₁ + R₂ + R₃

where R₁ is the output given by the power flow convergence classifier, R₂ represents the imbalance of the power system, and R₃ is the penalty term of the reward value.

Further, in an embodiment of the present invention, the model training module is specifically configured to:

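1) randomly initialize all state-action values Q(s, a);

2) set the current episode index i = 1; while i < T:

a) initialize s as the initial state, and select an action a with the ε-greedy algorithm;

b) execute action a, move from state s to the next state s', and obtain the reward R;

c) select the next action a' with the ε-greedy algorithm;

d) update the state-action value Q(s, a) according to the following equation:

Q(s,a) = Q(s,a) + α(R + γQ(s',a') - Q(s,a))

e) set s' as the current state s and a' as the current action a;

f) if s is a terminal state, end the current episode and set i = i + 1; otherwise, return to b).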

It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.

According to the power system power flow adjusting device based on reinforcement learning and multi-source data integration provided by the embodiment of the invention, power flow data are integrated using database and multithreading technology, and the data of the power system comprehensive analysis program are converted and parsed to form a large number of power flow samples that expand the sample library; a power system reinforcement learning environment is then constructed for interaction between the agent and the environment, generating a power flow adjustment strategy that adjusts non-convergent power flows to convergence. Compared with traditional manual power flow adjustment, the device is intelligent, automated and more efficient, can quickly expand the power flow sample library, is well suited to power system operating-mode analysis and calculation, and has broad application prospects.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art may combine the different embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they do not contradict each other.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A power system power flow adjusting method based on reinforcement learning and multi-source data integration is characterized by comprising the following steps:

2. The method of claim 1, wherein integrating the power flow data of the power system and storing them in a database by using database technology comprises: unpacking the multi-source power flow data of the power system comprehensive analysis program, and integrating and storing them in the database organized by the active power, reactive power, voltage magnitude and voltage phase angle of generators and loads, together with line information.

3. The method of claim 1, wherein the power system reinforcement learning environment comprises: an agent, an environment, actions, states and rewards;

4. The method of claim 3, wherein the reward value is:

R = R₁ + R₂ + R₃

5. The method according to claim 1, wherein step S3 further comprises:

1) randomly initializing all state-action values Q(s, a);

2) setting the current episode index i = 1; while i < T:

c) selecting the next action a' with the ε-greedy algorithm;

Q(s,a) = Q(s,a) + α(R + γQ(s',a') - Q(s,a))

e) setting s' as the current state s and a' as the current action a;

6. A power system power flow adjusting device based on reinforcement learning and multi-source data integration, characterized by comprising:

7. The apparatus of claim 6, wherein integrating the power flow data of the power system and storing them in the database by using database technology comprises: unpacking the multi-source power flow data of the power system comprehensive analysis program, and integrating and storing them in the database organized by the active power, reactive power, voltage magnitude and voltage phase angle of generators and loads, together with line information.

8. The apparatus of claim 6, wherein the power system reinforcement learning environment comprises: an agent, an environment, actions, states and rewards;

9. The apparatus of claim 6, wherein the reward value is:

R = R₁ + R₂ + R₃

10. The apparatus of claim 6, wherein the model training module is specifically configured to:

1) randomly initializing all state-action values Q(s, a);

2) setting the current episode index i = 1; while i < T:

c) selecting the next action a' with the ε-greedy algorithm;

Q(s,a) = Q(s,a) + α(R + γQ(s',a') - Q(s,a))

e) setting s' as the current state s and a' as the current action a;