CN112287972A - Power system power flow adjusting method based on reinforcement learning and multi-source data integration - Google Patents
- Publication number
- CN112287972A (application number CN202011039205.2A)
- Authority
- CN
- China
- Prior art keywords
- power system
- database
- data
- reinforcement learning
- power
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a method and a device for adjusting power system power flow based on reinforcement learning and multi-source data integration. The method comprises the following steps: integrating the power flow data of the power system by using a database technology and storing it in a database, extracting basic power flow data from the database, generating power flow samples from the basic power flow data by using a multithreading technology, and storing the generated samples in the database; constructing a power system reinforcement learning environment and, within that environment, a value function training model suitable for power flow adjustment; and, based on an intelligent adjustment strategy using a reinforcement learning algorithm, training the value function training model with the SARSA algorithm on the power flow data in the database so as to generate a power system power flow adjustment strategy. Compared with manual adjustment, the method is more intelligent and automated, is efficient, and can quickly expand the power flow sample library.
Description
Technical Field
The invention relates to the technical field of electric power system analysis and calculation, in particular to a method and a device for adjusting power system load flow based on reinforcement learning and multi-source data integration.
Background
In the analysis and compilation of power system operation modes, power flow calculation is the most basic and important task. Multi-source heterogeneous data is difficult to store and analyze, which hinders power flow analysis of the power system. In addition, in traditional operation mode calculation, the power flow is adjusted manually according to experience; this approach is coarse and inefficient, and cannot meet the requirements of fine-grained management of present-day power systems.
As the analysis of power system operation modes becomes more refined, a large number of operating power flow cases must be generated and adjusted. Traditional power flow adjustment methods follow two main ideas: model solving and manual parameter adjustment. In model solving, a constrained power flow equation is solved to obtain a convergent power flow, which is difficult to model and solve when the system is large. Manual parameter adjustment relies heavily on human experience: adjustment efficiency is low, power flow convergence is hard to achieve, the work is time-consuming and laborious, and the adjustment results are limited. Therefore, to solve present and future power flow adjustment problems in power system operation modes, a more intelligent and refined power flow adjustment means is urgently needed to cope with the variable operating states of real power systems, so that the overall safety, stability, economy and flexibility of the system are ensured.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide a power system flow adjustment method based on reinforcement learning and multi-source data integration, which uses an automatic reinforcement learning adjustment strategy for power system flow adjustment and sample generation, and can generate diversified flow samples while achieving intellectualization of flow adjustment, thereby improving adjustment efficiency.
The invention further aims to provide a power system flow adjusting device based on reinforcement learning and multi-source data integration.
In order to achieve the above object, an embodiment of the present invention provides a power flow adjustment method for an electric power system based on reinforcement learning and multi-source data integration, including the following steps:
S1, integrating the power flow data of the power system by using a database technology and storing it in a database, extracting basic power flow data from the database, generating power flow samples from the basic power flow data by using a multithreading technology, and storing the generated power flow samples in the database;
S2, constructing a power system reinforcement learning environment, and constructing a value function training model suitable for power flow adjustment in the power system reinforcement learning environment;
S3, based on the intelligent adjustment strategy of the reinforcement learning algorithm, training the value function training model for power flow adjustment by adopting the SARSA algorithm according to the power flow data in the database, so as to generate the power flow adjustment strategy of the power system.
In order to achieve the above object, an embodiment of the present invention provides a power flow adjusting apparatus for an electrical power system based on reinforcement learning and multi-source data integration, including:
the multi-source data integration module is used for integrating the power flow data of the power system by using a database technology and storing it in a database, extracting basic power flow data from the database, generating power flow samples from the basic power flow data by using a multithreading technology, and storing the generated power flow samples in the database;
the model building module is used for building a power system reinforcement learning environment, and in the power system reinforcement learning environment, a value function training model suitable for load flow adjustment is built;
and the model training module is used for training the power flow adjustment value function training model by adopting an SARSA algorithm according to the power flow data in the database based on the intelligent adjustment strategy of the reinforcement learning algorithm so as to generate the power flow adjustment strategy of the power system.
The method and the device for adjusting power system power flow based on reinforcement learning and multi-source data integration have the following advantages:
(1) by adopting the database, the load flow calculation data of the multi-source heterogeneous power system can be integrated, so that the data can be conveniently stored and read; and the load flow calculation is carried out by adopting a multithreading technology, so that the efficiency of generating the sample can be improved.
(2) In the construction of the reinforcement learning model, the characteristics of power system load flow calculation are fully considered, a value function model suitable for power system load flow adjustment is constructed, uncertainty factors are artificially added, and diversified convergent load flows can be automatically generated while non-convergent load flows are intelligently adjusted.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a power system flow adjustment method based on reinforcement learning and multi-source data integration according to an embodiment of the invention;
FIG. 2 is a flow chart of a power system flow adjustment method based on reinforcement learning and multi-source data integration according to another embodiment of the present invention;
FIG. 3 is a block diagram of a multi-threaded computing framework according to one embodiment of the invention;
FIG. 4 is a state transition diagram of a reinforcement learning model according to one embodiment of the present invention;
FIG. 5 is a flow chart of policy learning that takes into account power system flow adjustment features according to one embodiment of the present invention;
fig. 6 is a schematic structural diagram of a power flow adjustment apparatus of an electrical power system based on reinforcement learning and multi-source data integration according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a power flow adjustment method and device based on reinforcement learning and multi-source data integration, which are provided by the embodiments of the present invention, with reference to the accompanying drawings.
First, a power flow adjustment method of an electric power system based on reinforcement learning and multi-source data integration according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a flow chart of a power flow adjustment method of an electric power system based on reinforcement learning and multi-source data integration according to an embodiment of the invention.
As shown in fig. 1, the power flow adjustment method based on reinforcement learning and multi-source data integration includes the following steps:
Step S1, integrating the power flow data of the power system by using a database technology and storing it in a database, extracting basic power flow data from the database, generating power flow samples from the basic power flow data by using a multithreading technology, and storing the generated power flow samples in the database.
Further, in one embodiment of the present invention, integrating the power flow data of the power system and storing it in the database by using the database technology comprises: unpacking the multi-source power flow data in the Power System Analysis Software Package, and integrating and storing the data in the database organized by the active power, reactive power, voltage amplitude and voltage phase angle of generators and loads, together with line information.
Referring to fig. 2, the embodiment of the invention includes three stages: multi-source data integration, model construction and model training. In the first stage, multi-source data integration, the power system power flow data is normalized and stored by using a database technology, samples are generated by using a multithreading technology, and the power flow sample library is supplemented efficiently. In the second stage, model construction, a reinforcement learning adjustment model suitable for power system power flow adjustment is constructed according to the characteristics of power flow calculation. In the third stage, model training, a reinforcement learning algorithm is adopted, a value function training model suitable for power flow adjustment is defined, and uncertainty is deliberately added, so that the model can generate diversified convergent power flow data while completing the power flow adjustment task. Traditional power flow adjustment relies on manual experience and is time-consuming and laborious, whereas a reinforcement learning algorithm is adaptive and can carry out power flow adjustment automatically with little manual intervention, so the efficiency of power flow adjustment can be improved.
In particular, in the multi-source data integration stage, the power flow data of the power system must be integrated into a unified form for organization, management and storage. The power flow calculation tool most commonly used in power system operation mode calculation is the Power System Analysis Software Package (PSASP) developed by the China Electric Power Research Institute. The program usually provides a graphical operation interface, which makes it convenient for operation mode engineers to adjust the power flow manually. This conventional adjustment method is intuitive but inefficient. The core of the present application is the automation and intellectualization of power flow adjustment, so the power flow calculation data in PSASP must be integrated, processed and stored, and is managed with a MySQL database.
Specifically, the power flow calculation of PSASP can generally be run only from the graphical interface, because the power flow calculation input data is stored across several separate files whose contents reference one another. To realize intelligent power flow adjustment, this input data must be accessed from outside the program, but the PSASP data files cannot be used directly from outside in their native organization, so the multi-source calculation data must be extracted, integrated and processed. The data files are LF.L0, LF.L1, LF.L2, LF.L3, LF.L4, LF.L5 and LF.L6. LF.L0 contains control information, including the total number of buses, the total number of AC lines, the total number of transformer windings, the number of DC lines, the number of generators, the number of loads, the number of areas and the like. LF.L1 contains bus data; each row is one bus record, including detailed bus information, reference voltage, area, upper and lower voltage limits and the like. LF.L2 contains AC line data; each row is one AC line record, including the switching states at both ends of the line, detailed line parameters, the area to which the line belongs, line capacity, line name and the like. Each row of LF.L3 and LF.L4 is a transformer record and a DC line record, respectively, with data similar to that of LF.L2. LF.L5 contains generator data; each row is one generator record, including a validity flag, the row number of the generator's bus in LF.L1, the node type, active power P, reactive power Q, voltage amplitude, voltage phase angle, upper and lower limits of generator output, generator name and the like. LF.L6 contains load data; each row is one load record, with data similar to that of LF.L5. After the multi-source power flow data is unpacked, it is integrated and stored in a MySQL database organized by the active power, reactive power, voltage amplitude and voltage phase angle of generators and loads, together with line information, so that it can be called conveniently later.
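The storage layout described above can be illustrated with a minimal sketch. It is not the patent's schema: the table names, column names and helper functions are hypothetical, and Python's built-in `sqlite3` module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only and are
# not taken from PSASP or from the patent. sqlite3 stands in for MySQL.
SCHEMA = """
CREATE TABLE generator (
    case_id INTEGER, bus_row INTEGER, name TEXT,
    p REAL, q REAL, v_mag REAL, v_angle REAL
);
CREATE TABLE load (
    case_id INTEGER, bus_row INTEGER, name TEXT,
    p REAL, q REAL, v_mag REAL, v_angle REAL
);
CREATE TABLE ac_line (
    case_id INTEGER, from_bus INTEGER, to_bus INTEGER,
    name TEXT, capacity REAL
);
"""


def store_case(conn, case_id, generators, loads, lines):
    """Insert one unpacked power flow case into the three tables."""
    conn.executemany(
        "INSERT INTO generator VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(case_id, *g) for g in generators],
    )
    conn.executemany(
        "INSERT INTO load VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(case_id, *ld) for ld in loads],
    )
    conn.executemany(
        "INSERT INTO ac_line VALUES (?, ?, ?, ?, ?)",
        [(case_id, *ln) for ln in lines],
    )
    conn.commit()


def total_generation(conn, case_id):
    """Return (sum of P, sum of Q) over all generators of one case."""
    return conn.execute(
        "SELECT SUM(p), SUM(q) FROM generator WHERE case_id = ?", (case_id,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_case(
    conn,
    case_id=1,
    generators=[(1, "G1", 100.0, 20.0, 1.02, 0.0), (2, "G2", 50.0, 10.0, 1.01, -1.5)],
    loads=[(3, "L1", 140.0, 25.0, 0.98, -3.0)],
    lines=[(1, 3, "AC1", 200.0)],
)
```

Keying every row by a case identifier is one way to let many generated samples share the same tables; the patent does not state how samples are distinguished.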
The power flow sample library is calculated and expanded by using a multithreading technology. When reinforcement learning is used for intelligent power flow adjustment, power flow samples are generated automatically, from none to some and from few to many, as the agent interacts with the environment. If some power flow sample data is prepared before model training, training efficiency improves and training time is shortened. Therefore, in this stage, a multithreading technique is adopted: basic power flow data is selected from the database and power flow samples are generated from it, as shown in fig. 3. Thread 1, thread 2 and thread 3 are three threads opened for the sample generation program. Each thread runs independently and performs its own power flow calculations, and the calculation results are collected in the database to supplement the power flow sample library.
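The three-thread arrangement of fig. 3 can be sketched as follows. This is an illustration under stated assumptions: `run_power_flow` is a hypothetical placeholder for a call to an external power flow solver, and the random scaling of generator output is invented purely to produce distinct samples.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_power_flow(base_case, seed):
    """Hypothetical stand-in for one power flow calculation.

    A real implementation would call an external solver; here the base
    generator outputs are simply scaled by a random factor.
    """
    rng = random.Random(seed)
    scale = rng.uniform(0.9, 1.1)
    return {
        "seed": seed,
        "gen_p": {name: p * scale for name, p in base_case.items()},
        "converged": True,  # a real solver would report this
    }


def generate_samples(base_case, n_samples, n_threads=3):
    """Run independent calculations on a small thread pool and collect them."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map returns results in the order of the submitted seeds.
        return list(pool.map(lambda s: run_power_flow(base_case, s), range(n_samples)))


samples = generate_samples({"G1": 100.0, "G2": 50.0}, n_samples=6)
```

Because each calculation depends only on its own inputs, the results can be gathered in any order and written to the database afterwards. In CPython, threads mainly help when the solver runs outside the interpreter (for example as a separate program), which is assumed here.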
Step S2, constructing a power system reinforcement learning environment, and constructing a value function training model suitable for power flow adjustment in the power system reinforcement learning environment.
Further, the power system reinforcement learning environment includes: agent, environment, action, state, and return. The power system itself acts as the environment and interacts with the agent; the state set comprises the bus voltage levels and line power flow levels in the power system; the action set comprises the increase or decrease of every controllable variable; and the return value is defined according to the power flow adjustment task.
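An action set of the kind described above, one increase and one decrease per controllable variable, can be sketched as follows. The `Action` class, the step sizes and the restriction to generator active and reactive output are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Action:
    generator: str   # name of the generator being adjusted
    quantity: str    # "P" (active output) or "Q" (reactive output)
    delta: float     # signed change applied to the set point


def build_action_set(generators, step_p=10.0, step_q=5.0):
    """One increase and one decrease for each controllable variable.

    The step sizes are hypothetical; the source does not specify them.
    """
    actions = []
    for gen, (qty, step) in product(generators, (("P", step_p), ("Q", step_q))):
        actions.append(Action(gen, qty, +step))
        actions.append(Action(gen, qty, -step))
    return actions


def apply_action(setpoints, action):
    """Return a new set-point dictionary with the action applied."""
    new = {g: dict(v) for g, v in setpoints.items()}
    new[action.generator][action.quantity] += action.delta
    return new


actions = build_action_set(["G1", "G2"])
setpoints = {"G1": {"P": 100.0, "Q": 20.0}, "G2": {"P": 50.0, "Q": 10.0}}
after = apply_action(setpoints, actions[0])
```

With two generators this gives eight discrete actions, which keeps the action space small enough for a tabular value function.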
The basic framework of reinforcement learning comprises five parts of an agent, an environment, an action, a state and a return, and is learning of mapping the environment state to the action, and the aim is to enable a controller to obtain the maximum accumulated return in the interaction process with the environment. In order to apply reinforcement learning to power flow adjustment of a power system, the reinforcement learning environment model suitable for power flow adjustment is constructed according to the characteristics of the power system, and is specifically shown in table 1:
TABLE 1 Power flow adjustment environment model
In the power flow adjustment task, the power system itself acts as the environment and interacts with the agent. The state set comprises the bus voltage levels and line power flow levels in the power system. The action set comprises the increase or decrease of every controllable variable: for example, increasing the active output of one generator is an action in the action set, and decreasing the reactive output of another generator is also an action in the action set. The return value is defined according to the power flow adjustment task. In the present invention, the return value is defined as in (1):
$$R = R_1 + R_2 + R_3 \qquad (1)$$
The return value R contains three parts: $R_1$, $R_2$ and $R_3$. $R_1$ is the output result given by the power flow convergence device. The invention uses a deep neural network (DNN) comprising three hidden layers as the power flow convergence device; the output layer of the DNN uses a sigmoid activation function, $\sigma(x) = 1/(1 + e^{-x})$, so the output value $f$ ranges from 0 to 1, as shown in (2). $R_1$ returns a different value depending on whether $f$ is less than $p_1$, greater than $p_2$, or between $p_1$ and $p_2$.
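The structure of such a convergence device, three hidden layers followed by a single sigmoid output, can be sketched as a plain forward pass. The layer widths, the ReLU hidden activation, the random weight initialization and the input features are all assumptions; the source specifies only the three hidden layers and the sigmoid output. The network below is untrained, so its score carries no meaning beyond lying in the sigmoid's range.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights, bias, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(rng, n_in, n_out):
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_features, hidden=(16, 16, 16), seed=0):
    """Three hidden layers plus one output unit (widths are hypothetical)."""
    rng = random.Random(seed)
    sizes = (n_features, *hidden, 1)
    return [init_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]


def convergence_score(layers, features):
    """Forward pass: ReLU on hidden layers (an assumption), sigmoid on the output."""
    x = list(features)
    for weights, bias in layers[:-1]:
        x = relu(dense(weights, bias, x))
    weights, bias = layers[-1]
    return sigmoid(dense(weights, bias, x)[0])


layers = build_network(n_features=4)
f = convergence_score(layers, [1.02, 0.98, 0.5, -0.2])
```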
$R_2$ represents the unbalance amount of the power system, as shown in equation (3). $\sum Q_{gen}$ represents the reactive output of the generators in a certain area, $\sum Q_{load}$ represents the reactive load in the same area, and $B_q = \sum Q_{gen} / \sum Q_{load}$ is the ratio of the reactive power output to the reactive demand of the area, which reflects the degree of reactive balance of the area.
$R_3$ is the penalty term in the return value: after the agent executes an action, a negative value is returned if the output of a generator exceeds its limits; otherwise $R_3$ equals 0. The total return value of one interaction is the sum of the three return values.
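The three-part return value can be sketched as below. Only the overall structure (a convergence term, a reactive balance term, a limit penalty, summed) comes from the text; the thresholds `p1` and `p2`, the numeric values returned for each case, and the mapping from the balance ratio to the second term are hypothetical choices made for illustration.

```python
def r1_convergence(f, p1=0.3, p2=0.7):
    """Convergence term from the classifier output f (thresholds and values assumed)."""
    if f < p1:
        return -1.0
    if f > p2:
        return 1.0
    return 0.0


def reactive_balance(q_gen, q_load):
    """Ratio of total reactive output to total reactive demand in one area."""
    return sum(q_gen) / sum(q_load)


def r2_balance(b_q):
    """Balance term: assumed here to penalize deviation of the ratio from 1."""
    return -abs(b_q - 1.0)


def r3_penalty(outputs, limits, penalty=-1.0):
    """Negative value if any generator output is outside its limits, else 0."""
    for name, value in outputs.items():
        low, high = limits[name]
        if not low <= value <= high:
            return penalty
    return 0.0


def total_return(f, q_gen, q_load, outputs, limits):
    """Sum of the three terms for one interaction."""
    b_q = reactive_balance(q_gen, q_load)
    return r1_convergence(f) + r2_balance(b_q) + r3_penalty(outputs, limits)


limits = {"G1": (0.0, 150.0)}
good = total_return(0.9, [30.0, 20.0], [40.0, 10.0], {"G1": 100.0}, limits)
bad = total_return(0.1, [30.0, 20.0], [40.0, 10.0], {"G1": 200.0}, limits)
```

In this example the balanced, in-limit, confidently convergent case scores higher than the case that is likely non-convergent and violates a generator limit.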
The state transition relationship of the power system environment under reinforcement learning is shown in FIG. 4, where $S_i$ denotes the state, the subscript $i$ denotes a time step, $a_i$ denotes the action taken in state $S_i$, and $r_i$ denotes the immediate return obtained by taking that action, after which $S_i$ transfers to the next state $S_{i+1}$. The total return value is given by (4), where $\gamma$ is the attenuation (discount) coefficient; a larger coefficient gives returns at future times a larger influence on the accumulated return value.
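Since equation (4) is not reproduced in this text, the sketch below assumes the standard discounted sum used in reinforcement learning, in which the return at step k after the current one is weighted by the attenuation coefficient raised to the power k. The function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Accumulate rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2] + ...

    Working backwards avoids computing explicit powers of gamma.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```

With a coefficient of 0 only the immediate return counts, and values closer to 1 give later returns more weight.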
Step S3, based on the intelligent adjustment strategy of the reinforcement learning algorithm, training the value function training model for power flow adjustment by adopting the SARSA algorithm according to the power flow data in the database, so as to generate a power system power flow adjustment strategy.
In the embodiment of the invention, the characteristic of power system power flow adjustment is considered in the reinforcement learning algorithm, the SARSA algorithm is adopted to train the reinforcement learning model, a power system power flow adjustment strategy is generated, and the algorithm flow chart is shown in FIG. 5. The specific process is as follows:
1) randomly initializing all state-action values Q(s, a);
2) setting the current round i to 1, and while i < T:
a) initializing s as the first state, and selecting an action a by the epsilon-greedy algorithm;
b) executing the action a, whereupon the state s transfers to the next state s' and a return value R is returned;
c) selecting the next action a' in state s' by the epsilon-greedy algorithm;
d) updating the state-action value Q(s, a) according to the following equation:
Q(s,a)=Q(s,a)+α(R+γQ(s',a')-Q(s,a))
e) setting s' as the current state s and a' as the current action a;
f) if s is the termination state, ending the current round and setting i = i + 1; otherwise returning to step b).
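The steps above can be sketched as a tabular SARSA loop. The environment here is a deliberately tiny stand-in and not a power system: the state is an integer "imbalance" between -3 and 3, the two actions move it by one, and reaching 0 plays the role of a convergent power flow. The toy environment, its rewards, the hyperparameters, and the zero (rather than random) initialization of Q are all assumptions made to keep the example short and reproducible.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # decrease or increase the (toy) controllable variable


def step(state, action):
    """Toy environment: move the imbalance by one, clamp it, finish at 0."""
    nxt = max(-3, min(3, state + action))
    done = nxt == 0
    reward = 1.0 if done else -0.1
    return nxt, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # zero-initialized here; the text uses random values
    for _ in range(episodes):
        s = rng.choice((-3, -2, -1, 1, 2, 3))      # a) initial state and action
        a = epsilon_greedy(q, s, epsilon, rng)
        for _ in range(max_steps):
            s2, r, done = step(s, a)               # b) act, observe s' and R
            a2 = epsilon_greedy(q, s2, epsilon, rng)  # c) choose a' in s'
            # d) on-policy update; no bootstrap from a terminal state
            target = r if done else r + gamma * q[(s2, a2)]
            q[(s, a)] += alpha * (target - q[(s, a)])
            s, a = s2, a2                          # e) shift to the next pair
            if done:                               # f) end of the round
                break
    return q


q = sarsa()
```

After training, the greedy action in each state moves the imbalance towards zero, which mirrors the idea of steering a non-convergent case towards convergence.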
In summary, database technology is used to integrate the power system power flow data, the Power System Analysis Software Package (PSASP) data is converted and parsed, and a reinforcement learning method that takes power system characteristics into account is used to adjust the power flow, so that non-convergent power flows are adjusted to convergence and diversified convergent power flow samples are formed.
According to the power system power flow adjustment method based on reinforcement learning and multi-source data integration provided by the embodiment of the invention, power system power flow data is integrated by using a database technology and a multithreading technology, and PSASP data is converted and parsed to form a large number of power flow samples that expand the sample library. A power system reinforcement learning environment is then constructed so that the agent can interact with the environment, a power flow adjustment strategy is generated, and non-convergent power flows are adjusted to convergence. Compared with the traditional manual power flow adjustment method, the method is intelligent and automated, can quickly expand the power flow sample library, and is more efficient. It is well suited to operation mode analysis and calculation of power systems and has wide application prospects.
Next, a power flow adjustment device for an electric power system based on reinforcement learning and multi-source data integration according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 6 is a schematic structural diagram of a power flow adjustment apparatus of an electrical power system based on reinforcement learning and multi-source data integration according to an embodiment of the present invention.
As shown in fig. 6, the power system flow adjusting apparatus based on reinforcement learning and multi-source data integration includes: a multi-source data integration module 601, a model construction module 602, and a model training module 603.
The multi-source data integration module 601 is configured to integrate the power flow data of the power system by using a database technology and store it in a database, extract basic power flow data from the database, generate power flow samples from the basic power flow data by using a multithreading technology, and store the generated power flow samples in the database.
The model building module 602 is configured to build a power system reinforcement learning environment, and in the power system reinforcement learning environment, a value function training model suitable for load flow adjustment is built.
The model training module 603 is configured to train a power flow adjustment value function training model by using an SARSA algorithm according to power flow data in a database based on an intelligent adjustment strategy of a reinforcement learning algorithm, so as to generate a power flow adjustment strategy of an electric power system.
Further, in one embodiment of the present invention, integrating the power flow data of the power system and storing it in the database by using the database technology comprises: unpacking the multi-source power flow data in the Power System Analysis Software Package, and integrating and storing the data in the database organized by the active power, reactive power, voltage amplitude and voltage phase angle of generators and loads, together with line information.
Further, in one embodiment of the present invention, the power system reinforcement learning environment includes: agent, environment, action, state, and return;
the power system itself participates in interaction with the intelligent agent as an environment, the state set comprises bus voltage level and line power flow level in the power system, the action set comprises increase or decrease of all controllable variables, and the return value is self-defined according to the power flow adjustment task.
Further, in one embodiment of the present invention, the return value is:
$$R = R_1 + R_2 + R_3$$
wherein $R_1$ is the output result given by the power flow convergence device, $R_2$ represents the unbalance amount of the power system, and $R_3$ is the penalty term in the return value.
Further, in an embodiment of the present invention, the model training module is specifically configured to:
1) randomly initializing all state-action values Q(s, a);
2) setting the current round i to 1, and while i < T:
a) initializing s as the first state, and selecting an action a by the epsilon-greedy algorithm;
b) executing the action a, whereupon the state s transfers to the next state s' and a return value R is returned;
c) selecting the next action a' in state s' by the epsilon-greedy algorithm;
d) updating the state-action value Q(s, a) according to the following equation:
Q(s,a)=Q(s,a)+α(R+γQ(s',a')-Q(s,a))
e) setting s' as the current state s and a' as the current action a;
f) if s is the termination state, ending the current round and setting i = i + 1; otherwise returning to step b).
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.
According to the power system trend adjusting device based on reinforcement learning and multi-source data integration, provided by the embodiment of the invention, power system trend data are integrated by utilizing a database technology and a multi-thread technology, power system comprehensive analysis program data are converted and analyzed to form a large number of power system trend samples so as to expand a sample library, and then a power system reinforcement learning environment is constructed for realizing interaction between an intelligent agent and the environment, so that a power system trend adjusting strategy is generated, and non-convergent trends are adjusted to be convergent. Compared with the traditional method for manually adjusting the trend, the method has the characteristics of intellectualization and automation, and can quickly expand the trend sample library. Therefore, compared with the traditional manual method, the method is more efficient, is very suitable for mode analysis and calculation of the power system, and has wide application prospect.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples, and features of different embodiments or examples, described in this specification can be combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A power system power flow adjusting method based on reinforcement learning and multi-source data integration is characterized by comprising the following steps:
S1, integrating power flow data of the power system by using a database technology and storing the power flow data into a database, extracting basic power flow data from the database, generating power flow samples from the basic power flow data by using a multithreading technology, and storing the generated power flow samples into the database;
S2, constructing a power system reinforcement learning environment, and constructing, in the power system reinforcement learning environment, a value function training model suitable for power flow adjustment;
S3, based on the intelligent adjustment strategy of the reinforcement learning algorithm, training the value function training model for power flow adjustment by using the SARSA algorithm according to the power flow data in the database, so as to generate a power flow adjustment strategy of the power system.
2. The method of claim 1, wherein the integrating and storing of power flow data of the power system into a database using a database technology comprises: unpacking the multi-source power flow data in the power system comprehensive analysis program, and integrating and storing the data into the database according to the organization forms of active power, reactive power, voltage amplitude, voltage phase angle and line information of generators and loads.
3. The method of claim 1, wherein the power system reinforcement learning environment comprises: agent, environment, action, state, and reward;
the power system itself participates, as the environment, in interaction with the agent; the state set comprises bus voltage levels and line power flow levels in the power system; the action set comprises increases or decreases of all controllable variables; and the return value is defined according to the power flow adjustment task.
4. The method of claim 3, wherein the reward value is:
R=R1+R2+R3
wherein R1 is the output result given by the power flow convergence device, R2 represents the unbalance amount of the power system, and R3 is a penalty term in the reward value.
5. The method according to claim 1, wherein the S3 further comprises:
1) randomly initializing all state-action values Q(s, a);
2) setting the current round i to 1 and, while i < T, repeating:
a) initializing s as the first state, and selecting an action a by the epsilon-greedy algorithm;
b) executing the action a, whereupon the state s transitions to the next state s' and a return value R is returned;
c) selecting the next action a' by the epsilon-greedy algorithm;
d) updating the state-action value Q(s, a) according to the following equation:
Q(s,a)=Q(s,a)+α(R+γQ(s',a')-Q(s,a))
e) setting s' as the current state s and a' as the current action a;
f) if s is the termination state, ending the current round and setting i = i + 1; otherwise, returning to b).
6. A power system power flow adjusting device based on reinforcement learning and multi-source data integration, characterized by comprising:
a multi-source data integration module, used for integrating power flow data of the power system by using a database technology and storing the power flow data into a database, extracting basic power flow data from the database, generating power flow samples from the basic power flow data by using a multithreading technology, and storing the generated power flow samples into the database;
a model building module, used for constructing a power system reinforcement learning environment and constructing, in the power system reinforcement learning environment, a value function training model suitable for power flow adjustment;
and a model training module, used for training, based on the intelligent adjustment strategy of the reinforcement learning algorithm, the value function training model for power flow adjustment by using the SARSA algorithm according to the power flow data in the database, so as to generate a power flow adjustment strategy of the power system.
7. The apparatus of claim 6, wherein the integrating and storing of the power flow data of the power system into the database using the database technology comprises: unpacking the multi-source power flow data in the power system comprehensive analysis program, and integrating and storing the data into the database according to the organization forms of active power, reactive power, voltage amplitude, voltage phase angle and line information of generators and loads.
8. The apparatus of claim 6, wherein the power system reinforcement learning environment comprises: agent, environment, action, state, and reward;
the power system itself participates, as the environment, in interaction with the agent; the state set comprises bus voltage levels and line power flow levels in the power system; the action set comprises increases or decreases of all controllable variables; and the return value is defined according to the power flow adjustment task.
9. The apparatus of claim 6, wherein the reward value is:
R=R1+R2+R3
wherein R1 is the output result given by the power flow convergence device, R2 represents the unbalance amount of the power system, and R3 is a penalty term in the reward value.
10. The apparatus of claim 6, wherein the model training module is specifically configured to:
1) randomly initializing all state-action values Q(s, a);
2) setting the current round i to 1 and, while i < T, repeating:
a) initializing s as the first state, and selecting an action a by the epsilon-greedy algorithm;
b) executing the action a, whereupon the state s transitions to the next state s' and a return value R is returned;
c) selecting the next action a' by the epsilon-greedy algorithm;
d) updating the state-action value Q(s, a) according to the following equation:
Q(s,a)=Q(s,a)+α(R+γQ(s',a')-Q(s,a))
e) setting s' as the current state s and a' as the current action a;
f) if s is the termination state, ending the current round and setting i = i + 1; otherwise, returning to b).
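For illustration only, the reward R = R1 + R2 + R3 and the action set recited in claims 3, 4, 8 and 9 might be sketched in Python as below. The claims fix only the roles of the three terms, so everything concrete here is an assumption: the `FlowResult` fields, the sign and size of each term, the weights `w_imbalance` and `w_penalty`, and the variable names passed to `action_set` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowResult:
    """Hypothetical summary of one power flow calculation."""
    converged: bool        # verdict of the convergence check
    imbalance_mw: float    # system unbalance amount
    limit_violations: int  # number of out-of-limit quantities


def reward(result, w_imbalance=0.01, w_penalty=0.5):
    """R = R1 + R2 + R3, with assumed forms for each term."""
    r1 = 1.0 if result.converged else -1.0        # convergence result
    r2 = -w_imbalance * abs(result.imbalance_mw)  # unbalance amount
    r3 = -w_penalty * result.limit_violations     # penalty term
    return r1 + r2 + r3


def action_set(controllable):
    """One 'increase' (+1) and one 'decrease' (-1) action per variable."""
    return [(name, direction) for name in controllable for direction in (+1, -1)]
```

Under these assumptions, `action_set(["G1_P", "G2_V"])` yields four actions, and a non-convergent case with a 50 MW imbalance and two violations receives a reward of -2.5, so the agent is pushed toward convergent, balanced, in-limit operating points.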
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011039205.2A CN112287972A (en) | 2020-09-28 | 2020-09-28 | Power system power flow adjusting method based on reinforcement learning and multi-source data integration |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112287972A (en) | 2021-01-29 |
Family
ID=74422683
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011039205.2A Pending CN112287972A (en) | 2020-09-28 | 2020-09-28 | Power system power flow adjusting method based on reinforcement learning and multi-source data integration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112287972A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113746855A (en) * | 2021-09-09 | 2021-12-03 | 国网电子商务有限公司 | Data access method of energy industry cloud network and related equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120078436A1 (en) * | 2010-09-27 | 2012-03-29 | Patel Sureshchandra B | Method of Artificial Nueral Network Loadflow computation for electrical power system |
CN102521343A (en) * | 2011-12-09 | 2012-06-27 | 山东大学 | Transformation method of input data of simulation software of power system |
CN103455591A (en) * | 2013-08-30 | 2013-12-18 | 国家电网公司 | Standard data exchange interface method of parallel cooperative system |
CN111179121A (en) * | 2020-01-17 | 2020-05-19 | 华南理工大学 | Power grid emergency control method based on expert system and deep reverse reinforcement learning |
CN111209710A (en) * | 2020-01-07 | 2020-05-29 | 中国电力科学研究院有限公司 | Automatic adjustment method and device for load flow calculation convergence |
CN111626539A (en) * | 2020-03-03 | 2020-09-04 | 中国南方电网有限责任公司 | Power grid operation section dynamic generation method based on Q reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Liu Yuliang et al.: "Deep Learning" (《深度学习》), 30 November 2019 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210129 |