CN116826762B - Intelligent power distribution network voltage safety control method, device, equipment and medium thereof - Google Patents

Publication number
CN116826762B
Authority
CN
China
Prior art keywords
network
voltage
value
intelligent
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311092375.0A
Other languages
Chinese (zh)
Other versions
CN116826762A (en)
Inventor
穆朝絮
徐娜
史亚坤
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202311092375.0A
Publication of CN116826762A
Application granted
Publication of CN116826762B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12 Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/04 Power grid distribution networks
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10 Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/30 Reactive power compensation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Power Engineering (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention provides a voltage safety control method, device and equipment for an intelligent power distribution network, and a medium thereof. The method comprises: obtaining, in real time, an observed quantity of each power generation node of the intelligent power distribution network, wherein the intelligent power distribution network comprises n power generation nodes, n is an integer greater than or equal to 2, and the observed quantity comprises a node voltage value, an active power value and a reactive power value; and, for each power generation node, repeatedly performing the following operation: when the node voltage value is not within a preset interval, regulating the node voltage value by means of an agent to obtain a target regulation strategy, so that the voltage of the intelligent power distribution network operates safely and stably. The agent consists of a strategy network with a monotonic constraint, an evaluation network and a target evaluation network; the strategy network generates an initial regulation strategy, and the evaluation network and the target evaluation network are both used to optimize the initial regulation strategy to obtain the target regulation strategy.

Description

Intelligent power distribution network voltage safety control method, device, equipment and medium thereof
Technical Field
The invention relates to the technical field of artificial intelligence and deep reinforcement learning, in particular to a method, a device, equipment and a medium for controlling voltage safety of an intelligent power distribution network.
Background
With the development of emerging technologies, large numbers of distributed renewable energy sources and intelligent power electronic devices are being connected to the power grid. This has accelerated the development of the smart grid, but it has also made the grid more complex and increased random fluctuation during operation. Because the uncertainty of renewable energy sources makes stable operating conditions more variable, a faster and more stable means of voltage regulation is needed to maintain local stability under sudden faults and to prevent large-area cascading failures.
In the course of realizing the invention, it was found that the prior art still has shortcomings with respect to the complex decision problem of voltage safety evaluation and intelligent autonomous control of the intelligent power distribution network: the safe operation of the intelligent power distribution network under sudden faults is not fully considered.
Disclosure of Invention
In view of the above problems, the invention provides a method, a device, equipment and a medium for controlling voltage safety of an intelligent power distribution network.
According to a first aspect of the present invention, there is provided a voltage safety control method for an intelligent power distribution network, including:
obtaining, in real time, an observed quantity of each power generation node of the intelligent power distribution network, wherein the intelligent power distribution network comprises n power generation nodes, n is an integer greater than or equal to 2, and the observed quantity comprises: a node voltage value, an active power value and a reactive power value;
for each power generation node, repeatedly performing the following operation:
when the node voltage value is not within a preset interval, regulating the node voltage value by means of an agent to obtain a target regulation strategy, so that the voltage of the intelligent power distribution network operates safely and stably, wherein the agent consists of a strategy network with a monotonic constraint, an evaluation network and a target evaluation network, the strategy network is used to generate an initial regulation strategy, and the evaluation network and the target evaluation network are both used to optimize the initial regulation strategy to obtain the target regulation strategy.
According to an embodiment of the present invention, the agent is constructed based on a fully connected neural network comprising: an input layer, a hidden layer and an output layer;
the intelligent power distribution network voltage safety control method further comprises: determining a mask layer according to the preset interval and the constraint conditions of reactive compensation, wherein the mask layer is used to control the monotonicity between the input and the output of the strategy network;
constructing the strategy network from the input layer, the mask layer, the hidden layer and the output layer, in that order;
and constructing the evaluation network and the target evaluation network, each from the input layer, the hidden layer and the output layer, in that order.
According to an embodiment of the invention, the agent is obtained by training in advance according to an optimization objective function;
the intelligent power distribution network voltage safety control method further comprises:
determining a first objective function according to the voltage condition of the power generation nodes and the line loss condition of the power transmission line;
determining a second objective function according to the voltage value and the voltage condition of the preset interval;
determining the optimization objective function based on the first objective function and the second objective function.
According to an embodiment of the invention, determining a first objective function based on a voltage condition of a power generation node and a line loss condition of a power transmission line comprises:
determining a first sub-objective function according to the voltage value and the voltage amplitude fluctuation range;
determining a second sub-objective function according to the line loss of the power transmission line;
determining the first objective function based on the first sub-objective function and the second sub-objective function.
According to an embodiment of the present invention, the first sub-objective function is represented by formula (1):
(1) [formula not reproduced in this text]
wherein the quantities in formula (1) are: the reward value obtained according to the first sub-objective function; the lower limit value of the voltage amplitude fluctuation range; the upper limit value of the voltage amplitude fluctuation range; and the voltage value of the i-th power generation node;
the second sub-objective function is represented by formula (2):
(2) [formula not reproduced in this text]
wherein the quantities in formula (2) are: the reward value obtained according to the second sub-objective function; the line loss function of the power transmission line; L, the Laplacian matrix corresponding to the power generation nodes of the intelligent power distribution network, which is related to the line impedance of the power transmission line; the terminal voltage at the nodes to which the agents are connected; and T, which denotes the transpose;
the first objective function is represented by formula (3):
(3) [formula not reproduced in this text]
wherein the quantities in formula (3) are: the total reward value obtained according to the first objective function; and the coefficients of the first sub-objective function and the second sub-objective function, respectively, each of which takes a value in [0,1].
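Formulas (1) to (3) are not legible in this text, so the following is only a minimal sketch of one reward that is consistent with the definitions above, not the patent's own formulas. It assumes that the first sub-objective penalizes the distance of each node voltage from the interval between the lower and upper limits, that the line loss is the quadratic form of the voltage vector with the Laplacian matrix L, and that the first objective is a weighted sum of the two. The names `voltage_reward`, `line_loss_reward`, `total_reward`, `alpha` and `beta`, and all numeric values, are hypothetical.

```python
import numpy as np

def voltage_reward(v, v_min=0.95, v_max=1.05):
    """Assumed first sub-objective: negative total distance outside [v_min, v_max]."""
    v = np.asarray(v, dtype=float)
    below = np.clip(v_min - v, 0.0, None)   # amount by which a node is under the lower limit
    above = np.clip(v - v_max, 0.0, None)   # amount by which a node is over the upper limit
    return -float(np.sum(below + above))

def line_loss_reward(v, laplacian):
    """Assumed second sub-objective: negative quadratic form v^T L v."""
    v = np.asarray(v, dtype=float)
    return -float(v @ laplacian @ v)

def total_reward(v, laplacian, alpha=0.5, beta=0.5):
    """Assumed first objective: weighted sum with coefficients in [0, 1]."""
    return alpha * voltage_reward(v) + beta * line_loss_reward(v, laplacian)

# Hypothetical 3-node radial feeder 1-2-3 with unit line weights.
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
v = np.array([1.00, 1.06, 0.94])
print(voltage_reward(v), line_loss_reward(v, L), total_reward(v, L))
```

With a graph Laplacian, the quadratic form equals the sum of squared voltage differences across the lines, which is why it can serve as a simple stand-in for a line loss term.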
According to an embodiment of the present invention, the pre-training method includes:
initializing the parameters of the strategy network, the evaluation network and the target evaluation network;
determining training samples by simulating voltage violation operation scenarios of the intelligent power distribution network;
determining, based on a deep reinforcement learning algorithm, the loss functions corresponding to the strategy network, the evaluation network and the target evaluation network;
updating the parameters of the strategy network, the evaluation network and the target evaluation network, respectively, based on the training samples and the loss functions;
and obtaining the trained agent when the number of iterations reaches a preset number.
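The loss functions themselves are not given in this passage. As a hedged illustration only, the sketch below shows three pieces that commonly appear when an evaluation network and a target evaluation network are trained together in actor-critic methods: a bootstrapped target, a mean-squared loss, and a soft update of the target parameters. Whether the patent uses exactly these forms is an assumption, and `gamma`, `tau` and the function names are hypothetical.

```python
import numpy as np

def td_target(reward, next_q_target, gamma=0.99):
    """Bootstrapped target: reward plus discounted Q value from the target evaluation network."""
    return reward + gamma * next_q_target

def critic_loss(q_pred, target):
    """Mean squared error between predicted Q values and their targets."""
    q_pred = np.asarray(q_pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((q_pred - target) ** 2))

def soft_update(target_params, online_params, tau=0.005):
    """Move each target-network parameter a fraction tau toward the online parameter."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

# Toy usage with made-up numbers.
target = td_target(reward=1.0, next_q_target=2.0, gamma=0.5)
loss = critic_loss([1.0, 2.0], [1.0, 4.0])
new_target_params = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
print(target, loss, new_target_params[0])
```

A slowly moving target network is a common design choice because it keeps the regression target for the evaluation network from shifting at every update.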
According to an embodiment of the invention, the training samples comprise a plurality of quadruples, each consisting of the sample observed quantity at the current moment, the sample reactive compensation quantity at the current moment, the sample target value at the current moment and the sample observed quantity at the next moment;
wherein determining the training samples by simulating voltage violation operation scenarios of the intelligent power distribution network includes:
determining the sample observed quantity at the current moment by simulating a voltage violation operation scenario of the intelligent power distribution network;
determining the sample reactive compensation quantity at the current moment according to the sample observed quantity at the current moment;
determining the sample target value according to the sample reactive compensation quantity at the current moment and the first objective function;
and regulating the sample observed quantity at the current moment according to the sample target value to obtain the sample observed quantity at the next moment.
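The four-element samples described above can be held in a simple replay buffer. The following is a minimal sketch under assumptions: the class name `ReplayBuffer`, the field names and the example numbers are hypothetical, and the observation is represented as a (voltage, active power, reactive power) tuple.

```python
from collections import deque, namedtuple
import random

# One training sample: current observation, reactive compensation, target (reward) value,
# and the observation at the next moment.
Transition = namedtuple("Transition", ["obs", "action", "target", "next_obs"])

class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, target, next_obs):
        self.buffer.append(Transition(obs, action, target, next_obs))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=2)
buf.add((1.07, 0.50, 0.10), -0.02, -0.02, (1.04, 0.50, 0.08))
buf.add((0.93, 0.40, 0.05), 0.03, -0.02, (0.96, 0.40, 0.08))
buf.add((1.06, 0.45, 0.07), -0.01, -0.01, (1.05, 0.45, 0.06))  # evicts the first entry
batch = buf.sample(2)
print(len(buf), batch[0].obs)
```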
A second aspect of the present invention provides a voltage safety control device for an intelligent power distribution network, including: an acquisition module for obtaining, in real time, an observed quantity of each power generation node of the intelligent power distribution network, wherein the intelligent power distribution network comprises n power generation nodes, n is an integer greater than or equal to 2, and the observed quantity comprises: a node voltage value, an active power value and a reactive power value; and a processing module for repeatedly performing the following operation for each power generation node: when the node voltage value is not within a preset interval, regulating the node voltage value by means of an agent to obtain a target regulation strategy, so that the voltage of the intelligent power distribution network operates safely and stably, wherein the agent consists of a strategy network with a monotonic constraint, an evaluation network and a target evaluation network, the strategy network is used to generate an initial regulation strategy, and the evaluation network and the target evaluation network are both used to optimize the initial regulation strategy to obtain the target regulation strategy.
A third aspect of the present invention provides an electronic device comprising: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.
A fourth aspect of the invention also provides a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the above method.
A fifth aspect of the invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method.
According to the embodiments of the invention, whether the voltage of the intelligent power distribution network is operating stably can be determined by observing whether the voltage of each power generation node remains within the preset interval. When a voltage leaves the preset interval, that is, when stable operation can no longer be maintained, the intelligent power distribution network invokes the agent to output a stability control strategy that restores the voltage of each power generation node to the preset interval. The method controls the voltage optimally in real time and can adapt to the changing structure, source-side fluctuation and load fluctuation of the smart grid. The trained agent can give a safety optimization strategy within sub-second time, which greatly improves the safe operation capability of the smart grid. Under renewable energy and load fluctuation, or under dangerous faults, the method can perform online control quickly and accurately, coordinate assets among different distributed energy sources and keep the voltage stable.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following description of embodiments of the invention with reference to the accompanying drawings, in which:
FIG. 1 (a) shows a physical architecture diagram of a smart distribution network according to an embodiment of the present invention;
FIG. 1 (b) shows a smart distribution network information physical model architecture diagram according to an embodiment of the present invention;
FIG. 2 illustrates a flow chart of a smart distribution network voltage security control method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a smart distribution network voltage security control method according to another embodiment of the present invention;
FIG. 4 illustrates a distributed agent training architecture diagram in accordance with an embodiment of the present invention;
FIG. 5 (a) shows a graph comparing the reward values of the UMSAC algorithm and the SAC algorithm according to an embodiment of the present invention;
FIG. 5 (b) shows a graph comparing the voltage stability of the UMSAC algorithm and the SAC algorithm according to an embodiment of the present invention;
FIG. 6 shows a block diagram of a smart distribution network voltage safety control device according to an embodiment of the present invention; and
FIG. 7 shows a block diagram of an electronic device adapted to implement the smart distribution network voltage security control method according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions such as "at least one of A, B and C, etc." are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In the technical solution of the invention, the collection, storage, use, processing, transmission, provision, disclosure and application of the data involved (including, but not limited to, personal information of users) all comply with relevant laws and regulations and do not violate public order and good morals.
In the technical scheme of the embodiment of the invention, the authorization or the consent of the user is obtained before the personal information of the user is obtained or acquired.
In the course of realizing the invention, it was found that the smart grid is developing rapidly and that large-scale wind power, photovoltaic power and various distributed energy sources are being integrated into the power distribution network. As a result, the interaction among renewable generation sources is more complex, voltage fluctuation is more frequent, and problems such as delayed voltage recovery caused by various faults are more prominent. Voltage stability control schemes have also changed substantially, and further coordination is needed to ensure voltage stability.
Because the intelligent power distribution network is high-dimensional, highly nonlinear and strongly time-varying, optimal control and control decision-making for it are difficult. With respect to the complex decision problem of voltage safety evaluation and intelligent autonomous control of the intelligent power distribution network, the prior art still has shortcomings, and the safe operation of the intelligent power distribution network under sudden faults is not fully considered.
Deep reinforcement learning (DRL) has been applied in fields such as autonomous driving, industrial automation, medical care and natural language processing. DRL combines the perception capability of deep learning (DL) with the decision-making capability of reinforcement learning (RL) and performs control based on real-time input data. Through representation learning, DL learns the internal rules and representation levels of sample data; a deep neural network is used to describe the input-output relationship of the complex intelligent power distribution network system and can autonomously extract effective features from a large number of observation samples. On this basis, a voltage safety control method for an intelligent power distribution network is provided.
An embodiment of the invention provides a voltage safety control method for an intelligent power distribution network, comprising: obtaining, in real time, an observed quantity of each power generation node of the intelligent power distribution network, wherein the intelligent power distribution network comprises n power generation nodes, n is an integer greater than or equal to 2, and the observed quantity comprises: a node voltage value, an active power value and a reactive power value; and, for each power generation node, repeatedly performing the following operation: when the node voltage value is not within a preset interval, regulating the node voltage value by means of an agent to obtain a target regulation strategy, so that the voltage of the intelligent power distribution network operates safely and stably, wherein the agent consists of a strategy network with a monotonic constraint, an evaluation network and a target evaluation network, the strategy network is used to generate an initial regulation strategy, and the evaluation network and the target evaluation network are both used to optimize the initial regulation strategy to obtain the target regulation strategy.
FIG. 1 (a) shows a physical architecture diagram of a smart distribution network according to an embodiment of the present invention; fig. 1 (b) shows a smart distribution network information physical model architecture diagram according to an embodiment of the present invention.
A conventional passive power distribution network has no active power supply capability. With the continuous development of distributed energy sources such as photovoltaic and wind power, the construction of the intelligent power distribution network is gradually maturing, as shown in FIG. 1 (a). Compared with the conventional passive power distribution network, the intelligent power distribution network copes better with uncertainty, offers higher power supply reliability, avoids large-scale power failures and secures the power supply to users.
As shown in FIG. 1 (b), the voltage safety control method of this embodiment mainly optimizes the transient performance of the node voltages of the intelligent power distribution network. The intelligent power distribution network is modeled as a radially distributed voltage control system. The physical layer of the system consists of power transmission lines, transformers, loads, distributed power sources and the like. The information layer abstracts the intelligent power distribution network into a topological structure: the active outputs of the distributed power sources are uniformly abstracted into information nodes, and links such as information processing and information transmission are abstracted into information branches. A reactive power compensator is embedded in the control loop of each distributed power inverter, and each node that performs reactive voltage regulation on an information node is regarded as an agent. The agents are assumed to be capable of neighborhood communication within an area.
It should be understood that the number of power lines, transformers, loads, distributed power sources, and agents in fig. 1 (b) is merely illustrative. There may be any number of power lines, transformers, loads, distributed power supplies, and agents, as desired for implementation.
Based on the scenario described above, the voltage safety control method of the intelligent power distribution network according to the embodiment of the invention is described in detail below with reference to FIG. 2 to FIG. 5.
Fig. 2 shows a flowchart of a smart distribution network voltage security control method according to an embodiment of the present invention.
As shown in fig. 2, the smart distribution network voltage safety control method 200 of this embodiment includes operations S210 to S220.
In operation S210, the observed quantity of each power generation node is obtained in real time from the intelligent power distribution network, where the intelligent power distribution network includes n power generation nodes, n is an integer greater than or equal to 2, and the observed quantity includes: a node voltage value, an active power value and a reactive power value.
According to the embodiment of the invention, a reactive power compensator can be embedded in the inverter control loop to regulate the voltage value. The power generation nodes in the intelligent power distribution network can be denoted 1, 2, …, i, …, n, where n is the total number of nodes. The relationship between reactive power and voltage at a power generation node can be determined from the relationship between the node voltage and the apparent power of the power generation node and the relationship among active power, reactive power and apparent power.
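For reference, the standard relationships among these quantities can be written as follows. The symbols are introduced here for illustration and are not taken from the patent: for node i, S_i is the complex apparent power, P_i the active power, Q_i the reactive power, V_i the voltage phasor and I_i the injected current phasor.

```latex
S_i = P_i + jQ_i = V_i I_i^{*}, \qquad
|S_i| = \sqrt{P_i^{2} + Q_i^{2}} = |V_i|\,|I_i|
```

These identities link the three observed quantities (voltage, active power and reactive power), which is why changing the reactive power injected at a node is a usable handle on its voltage.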
In operation S220, the following operation is repeatedly performed for each power generation node:
when the node voltage value is not within the preset interval, the node voltage value is regulated by means of an agent to obtain a target regulation strategy, so that the voltage of the intelligent power distribution network operates safely and stably, wherein the agent consists of a strategy network with a monotonic constraint, an evaluation network and a target evaluation network, the strategy network is used to generate an initial regulation strategy, and the evaluation network and the target evaluation network are both used to optimize the initial regulation strategy to obtain the target regulation strategy.
According to an embodiment of the present invention, the preset interval may be determined according to the voltage amplitude fluctuation range of the power generation node. For example, when the lower limit value of the voltage amplitude fluctuation range is 0.95 and the upper limit value is 1.05, the preset interval may be [0.95, 1.05].
According to the embodiment of the invention, the agent can perform reactive compensation so that the voltage value stays within the preset interval and the voltage of the intelligent power distribution network operates safely and stably. The node voltage value is input into the strategy network, which outputs an initial reactive compensation amount. The node voltage value and the initial reactive compensation amount are then input into the evaluation network and the target evaluation network, each of which outputs an evaluation index value, namely the Q value corresponding to the initial reactive compensation amount. When the evaluation index meets a threshold, the target reactive compensation amount is determined, and the target regulation strategy is generated according to the target reactive compensation amount.
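The decision flow just described can be sketched as a single control step. This is an illustration under stated assumptions rather than the patent's implementation: the three networks are replaced by toy callables, combining the two Q values with `min` is an assumption, and `control_step`, `q_threshold`, the sign convention of the action and all numbers are hypothetical.

```python
def control_step(v, policy, critic, target_critic,
                 v_min=0.95, v_max=1.05, q_threshold=0.0):
    """Return (status, action) for one node voltage reading."""
    if v_min <= v <= v_max:
        return "in_range", 0.0            # voltage already inside the preset interval
    action = policy(v)                    # initial reactive compensation amount
    # Assumption: the more pessimistic of the two Q values is compared with the threshold.
    q_value = min(critic(v, action), target_critic(v, action))
    status = "accepted" if q_value >= q_threshold else "rejected"
    return status, action

# Toy stand-ins for the trained networks (hypothetical).
toy_policy = lambda v: 0.5 * (v - 1.0)
toy_critic = lambda v, a: 1.0
toy_target_critic = lambda v, a: 0.5

print(control_step(1.00, toy_policy, toy_critic, toy_target_critic))
print(control_step(1.08, toy_policy, toy_critic, toy_target_critic))
print(control_step(0.90, toy_policy, toy_critic, toy_target_critic, q_threshold=0.6))
```

In a real deployment the callables would be the trained strategy, evaluation and target evaluation networks, and a rejected action would need a fallback, which this passage does not specify.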
According to the embodiment of the invention, whether the voltage of the intelligent power distribution network is operating stably can be determined by observing whether the voltage of each power generation node remains within the preset interval. When a voltage leaves the preset interval, that is, when stable operation can no longer be maintained, the intelligent power distribution network invokes the agent to output a stability control strategy that restores the voltage of each power generation node to the preset interval. The method controls the voltage optimally in real time and can adapt to the changing structure, source-side fluctuation and load fluctuation of the smart grid. The trained agent can give a safety optimization strategy within sub-second time, which greatly improves the safe operation capability of the smart grid. Under renewable energy and load fluctuation, or under dangerous faults, the method can perform online control quickly and accurately, coordinate assets among different distributed energy sources and keep the voltage stable.
According to the embodiment of the invention, the safety optimization control strategy, designed on the basis of a monotonic neural network, fully considers the safety and robustness of the intelligent power distribution network under sudden faults. It also takes into account the physical correlation between voltage and reactive compensation, which ensures the stability of the closed-loop system.
According to an embodiment of the present invention, the agent is built based on a fully connected neural network, which may include: an input layer, a hidden layer, and an output layer.
The intelligent power distribution network voltage safety control method can further comprise the following steps: determining a mask layer according to a preset interval and constraint conditions of reactive compensation, wherein the mask layer is used for controlling monotonicity between input and output of a strategy network; constructing a strategy network according to the input layer, the mask layer, the hidden layer and the output layer in sequence; and respectively constructing an evaluation network and a target evaluation network according to the input layer, the hidden layer and the output layer in sequence.
According to an embodiment of the invention, the constraint of reactive compensation can be seen as a constraint of an input-output monotonic problem. Common convolution network connection can be adopted between hidden layers in the constructed strategy network. The hidden layer can enable the fully connected neural network to have better expression capability. And a mask layer is added between the input layer and the hidden layer, so that a monotonically increasing rule between input and output can be ensured. The evaluation network and the target evaluation network may employ a common fully-connected neural network.
For example, the relation between the input node voltage $v$ of the strategy network and the output action amount $a$ (i.e., the reactive compensation amount) may be as shown in the following formula (4):
$a = f_{\phi}(v)$ (4)
wherein $f_{\phi}$ represents the function of the strategy network fitted by the fully connected neural network, and $\phi$ represents the parameters that the strategy network needs to learn.
According to the embodiment of the invention, knowledge in the reactive compensation field is considered in the structural design of the strategy network, the design of the strategy network structure is guided based on the voltage preset interval and the logic constraint between the voltage preset interval and the reactive compensation, a neural network with monotonic constraint is constructed, the physical behavior of the voltage reactive compensation is simulated, the search space of parameters is reduced, and the usability of a model is improved.
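The role of the mask layer can be illustrated with a minimal sketch. It assumes one common way of enforcing monotonicity, namely multiplying the magnitudes of the first-layer weights by a sign mask and keeping later weights non-negative with a monotone activation; the text does not specify the exact construction, and the class name `MonotonicPolicy`, the parameter `q_max` and the layer sizes are hypothetical.

```python
import math
import random


class MonotonicPolicy:
    """Tiny network: input -> mask layer -> hidden (tanh) -> output.

    Assumption (not taken from the patent): monotonicity is obtained by using
    mask * |w| for the first-layer weights and |w| for the output weights.
    """

    def __init__(self, n_hidden=8, mask=1.0, q_max=1.0, seed=0):
        rng = random.Random(seed)
        self.mask = mask      # +1.0: non-decreasing output, -1.0: non-increasing
        self.q_max = q_max    # hypothetical bound on the reactive compensation
        self.w1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = rng.uniform(-1, 1)

    def forward(self, v):
        # Mask layer: the sign of every input weight is fixed by the mask.
        hidden = [math.tanh(self.mask * abs(w) * v + b)
                  for w, b in zip(self.w1, self.b1)]
        # Non-negative output weights preserve the direction of monotonicity.
        out = sum(abs(w) * h for w, h in zip(self.w2, hidden)) + self.b2
        # Squash into [-q_max, q_max] to respect a compensation limit.
        return self.q_max * math.tanh(out)
```

Because every stage is monotone in its input, the learned weights can take any value without breaking the monotonic relation, which is what narrows the parameter search space.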
According to the embodiment of the invention, the agent is trained in advance according to the optimized objective function.
The intelligent power distribution network voltage safety control method can further comprise the following steps:
determining a first objective function according to the voltage condition of the power generation node and the line loss condition of the power transmission line; determining a second objective function according to the voltage value and the voltage condition of the preset interval; an optimization objective function is determined based on the first objective function and the second objective function.
According to the embodiment of the invention, the intelligent power distribution network voltage safety control problem can be modeled as a Markov decision process, which can be described by a four-tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$. $\mathcal{S}$ represents the observed quantity of each power generation node of the intelligent power distribution network. Each power generation unit can be used as a decentralized agent, and the node voltage, active power and reactive power $s_i = (v_i, p_i, q_i)$ are selected as the observed quantity, where $v_i$ represents the generator bus voltage vector at power generation node $i$ and is the main observed quantity, and $p_i$ and $q_i$, the active power and reactive power of node $i$, can be obtained by optimization calculation and are used for power flow calculation in the intelligent power distribution network environment, ensuring that the power flow of the intelligent power distribution network converges after the agent takes an action. $\mathcal{A}$ represents the action: reactive compensation is used to guarantee voltage stability, and when the node voltage deviates, the agent takes an action to generate the action amount, namely the reactive compensation amount. $\mathcal{P}$ represents the state transition probability. $\mathcal{R}$ represents the reward value obtained by interaction with the intelligent power distribution network environment, i.e., the value of the first objective function. The reward value is the information fed back to the agent by the intelligent power distribution network environment, and the larger the reward value, the better the performance.
The optimization objective of the first objective function may be to minimize distribution losses on the transmission line and to ensure that the voltage amplitude and reactive power injection remain within the constraint interval.
The optimization objective of the second objective function may be to maximize the entropy term, which encourages a degree of random exploration of new scheduling strategies, and to minimize the penalty for node voltage overrun. The first objective function and the second objective function may be superimposed to obtain the optimization objective function.
According to the embodiment of the invention, when the intelligent agent is trained, the voltage condition of the power generation nodes and the line loss condition of the power transmission line are taken into consideration, and the node voltage value condition of each power generation node obtained in real time in the intelligent power distribution network is taken into consideration, so that the intelligent agent can learn a better strategy.
According to an embodiment of the present invention, determining the first objective function according to the voltage condition of the power generation node and the line loss condition of the power transmission line may include:
determining a first sub-objective function according to the voltage value and the voltage amplitude fluctuation range; determining a second sub-objective function according to the line loss of the power transmission line; the first objective function is determined based on the first sub-objective function and the second sub-objective function.
According to an embodiment of the invention, the first sub-objective function may be used to calculate a reward value for the node voltage. The second sub-objective function may be used to calculate a reward value that takes line loss into account.
According to the embodiment of the invention, the reward mechanism under voltage deviation is determined according to the voltage condition of the power generation node, and the reward mechanism under line loss is determined according to the line loss of the transmission line. Both are used to train the agent and help the agent learn a better strategy.
According to an embodiment of the present invention, the first sub-objective function is represented by formula (1):
(1);
wherein the quantities in formula (1) are: the reward value obtained according to the first sub-objective function; the lower limit value of the voltage amplitude fluctuation range; the upper limit value of the voltage amplitude fluctuation range; and the voltage value of the $i$-th power generation node.
The second sub-objective function is represented by equation (2):
(2);
wherein the quantities in formula (2) are: the reward value obtained according to the second sub-objective function; the line loss function of the transmission line; $L$, the Laplacian matrix corresponding to the power generation nodes of the intelligent power distribution network, which is related to the line impedance of the transmission line; the generator terminal voltage at the node where the agent is connected; and $T$, which denotes the transpose.
The first objective function is represented by formula (3):
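$r = \omega_1 r_v + \omega_2 r_l$ (3);
wherein $r$ represents the total reward value obtained according to the first objective function, $r_v$ and $r_l$ represent the reward values obtained according to the first sub-objective function and the second sub-objective function respectively, and $\omega_1$ and $\omega_2$ represent the coefficients of the first sub-objective function and the second sub-objective function respectively, with $\omega_1, \omega_2 \in [0,1]$.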
According to the embodiment of the invention, the first sub-objective function is determined according to the upper limit value and the lower limit value of the voltage amplitude fluctuation range and the node voltage value, the second sub-objective function is determined according to the line loss function of the transmission line, and the first objective function is finally obtained. This function determines the reward value obtained by interaction with the intelligent power distribution network environment and helps train the agent to obtain a better regulation strategy, so that the voltage of the intelligent power distribution network operates safely and stably.
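The way the two sub-objectives combine into one reward can be sketched as follows. Since formulas (1) and (2) are not legible in this text, the sketch assumes a voltage reward that is zero inside the band and equal to the negative deviation outside it, and a line loss equal to the quadratic form of the voltage vector with the Laplacian matrix; the band of 0.95 to 1.05 p.u., the equal weights and all function names are hypothetical.

```python
def voltage_reward(v, v_min=0.95, v_max=1.05):
    """Assumed form: 0 inside [v_min, v_max], negative deviation outside."""
    total = 0.0
    for vi in v:
        if vi < v_min:
            total -= v_min - vi
        elif vi > v_max:
            total -= vi - v_max
    return total


def line_loss(v, laplacian):
    """Quadratic form v^T L v, with L the network Laplacian matrix."""
    n = len(v)
    return sum(v[i] * laplacian[i][j] * v[j]
               for i in range(n) for j in range(n))


def total_reward(v, laplacian, w_v=0.5, w_l=0.5):
    """Weighted sum of both sub-rewards; each weight lies in [0, 1]."""
    if not (0.0 <= w_v <= 1.0 and 0.0 <= w_l <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    # Line loss enters with a negative sign so that a larger reward is better.
    return w_v * voltage_reward(v) + w_l * (-line_loss(v, laplacian))
```

For a two-node line with `laplacian = [[1, -1], [-1, 1]]`, equal voltages give zero loss and zero reward, while an overvoltage at one node lowers both terms.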
According to the embodiment of the invention, the optimization objective function differs from that of other deep reinforcement learning methods: the optimization objective includes maximizing the expected reward value while also maximizing the information entropy. This allows the agent to explore the observation space fully and to discover more feasible schemes, so as to avoid falling into a local optimum and to improve anti-interference capability.
For example, the optimization objective function $J(\pi)$ can be represented by the following formula (5):
(5);
wherein the reward term represents the reward value obtained after the agent, in observed quantity $s_t$ at time $t$, takes an action producing action amount $a_t$; the expectation represents the expected value of the sum of rewards subsequently obtainable after actions are selected according to the optimized scheduling strategy $\pi$, where $(s_t, a_t)$ obeys the distribution induced by $\pi$ and the rewards obtained by the individual agents can be averaged; the entropy term represents the entropy of the optimized scheduling strategy $\pi$ when the agent is in observed quantity $s_t$ at time $t$; the temperature coefficient represents the degree to which random exploration of new strategies is encouraged; the penalty term in the optimization objective function represents the penalty value for voltage overrun at each node caused by the action amount $a_t$ generated by the agent at time $t$, weighted by the penalty coefficient; and $\pi$ represents the optimized scheduling strategy.
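Formula (5) is not legible in this text. As a sketch only, one form consistent with the terms described above is given below. The symbols $r$, $\mathcal{H}$, $\alpha$ (temperature coefficient), $\beta$ (penalty coefficient), $c$ (penalty term) and $\rho_\pi$ (trajectory distribution under $\pi$) are assumptions that follow common SAC notation, and the sign and placement of the penalty term are likewise assumed.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t)
              + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)
              - \beta \, c(a_t) \right]
```

With $\beta = 0$ this reduces to the usual maximum-entropy objective of SAC.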
Based on the above formula (5), the optimal scheduling strategy $\pi^{*}$ can be expressed as shown in the following formula (6):
$\pi^{*} = \arg\max_{\pi} J(\pi)$ (6).
According to an embodiment of the present invention, the pre-training method may include:
initializing parameters corresponding to the strategy network, the evaluation network and the target evaluation network; determining training samples by simulating a voltage violation operation scenario of the intelligent power distribution network; determining loss functions corresponding to the strategy network, the evaluation network and the target evaluation network based on a deep reinforcement learning algorithm; updating the parameters corresponding to the strategy network, the evaluation network and the target evaluation network respectively, based on the training samples and the loss functions; and obtaining the trained agent when the number of iterations reaches the preset number.
According to the embodiment of the invention, training samples can be input into the strategy network, which outputs a sample strategy; the sample strategy is input into the evaluation network and the target evaluation network respectively, which output an evaluation result and a target evaluation result; and based on the evaluation result, the target evaluation result and the evaluation loss function, the parameters of the evaluation network and the target evaluation network are updated by a stochastic gradient method, wherein the evaluation loss function is determined according to the deep reinforcement learning algorithm.
Wherein the parameters of the strategy network may be expressed as $\phi$, and the parameters corresponding to the evaluation network and the target evaluation network may be expressed as $\theta$ and $\bar{\theta}$. The deep reinforcement learning algorithm may include the SAC algorithm. The loss functions corresponding to the strategy network, the evaluation network and the target evaluation network can be determined according to the SAC algorithm. The SAC algorithm evaluates the strategy $\pi$ with a soft value function and updates the parameters of the soft Q function $Q_{\theta}(s_t, a_t)$ using a function approximator, where $Q_{\theta}(s_t, a_t)$ is the evaluation value of the action amount $a_t$, namely the optimal value obtainable by subsequently following the optimal strategy after the agent, in observed quantity $s_t$ at time $t$, takes an action producing $a_t$. The soft Q function is updated by minimizing the Bellman residual, as shown in the following formula (7):
(7);
wherein $\mathbb{E}_{(s_t, a_t)}$ represents the expected value of the sum of rewards obtained when $(s_t, a_t)$ is selected at random; $\mathbb{E}_{s_{t+1} \sim p}$ represents the expected value over the state that the agent reaches with probability $p$ when in observed quantity $s_{t+1}$ at time $t+1$; and $V(s_{t+1})$ is the value function, representing the value obtained by following the optimal strategy when in observed quantity $s_{t+1}$ at time $t+1$. The value function is implicitly parameterized by the soft Q function parameters, and the parameters are updated by a stochastic gradient method, as shown in the following formula (8):
(8);
wherein $\gamma$ represents the discount factor, and $Q(s_{t+1}, a_{t+1})$ represents the soft Q function at time $t+1$.
The strategy network parameters are learned and updated by minimizing the KL divergence, as shown in the following formula (9):
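Formulas (7) and (8) are not legible in this text. For orientation, the corresponding expressions from the published SAC algorithm are reproduced below; whether the patent's formulas match them exactly, for example with respect to the penalty term, is an assumption. Here $\mathcal{D}$ denotes the experience pool, $\theta$ and $\bar{\theta}$ the evaluation and target evaluation network parameters, $\phi$ the strategy network parameters and $\alpha$ the temperature coefficient.

```latex
J_Q(\theta) = \mathbb{E}_{(s_t, a_t) \sim \mathcal{D}}
  \left[ \tfrac{1}{2} \Big( Q_\theta(s_t, a_t)
       - \big( r(s_t, a_t)
       + \gamma \, \mathbb{E}_{s_{t+1} \sim p} [ V_{\bar{\theta}}(s_{t+1}) ] \big) \Big)^{2} \right]

V(s_t) = \mathbb{E}_{a_t \sim \pi}
  \left[ Q(s_t, a_t) - \alpha \log \pi(a_t \mid s_t) \right]

\hat{\nabla}_\theta J_Q(\theta) = \nabla_\theta Q_\theta(s_t, a_t)
  \Big( Q_\theta(s_t, a_t) - \big( r(s_t, a_t)
       + \gamma \big( Q_{\bar{\theta}}(s_{t+1}, a_{t+1})
       - \alpha \log \pi_\phi(a_{t+1} \mid s_{t+1}) \big) \big) \Big)
```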
(9);
wherein, in formula (9), the expectation represents the expected value of the sum of rewards subsequently obtainable over the action amounts $a_t$ generated by the agent at time $t$ and all possible optimized scheduling strategies selected according to the strategy network parameters $\phi$; and $\pi_{\phi}(a_t \mid s_t)$ represents the optimized scheduling strategy when the observed quantity of each power generation node of the intelligent power distribution network is $s_t$ and the output action amount is $a_t$.
The optimization solution in the above equation (9) is calculated from the probability gradient in the following equation (10):
(10);
wherein $Q(s_t, a_t)$ represents the evaluation value of the action amount $a_t$, namely the optimal value obtainable by subsequently following the optimal strategy after the agent, in observed quantity $s_t$ at time $t$, takes an action producing $a_t$; and $\epsilon$ represents the input noise, on the basis of which the action amount $a_t$ is reparameterized so that the gradient estimate is unbiased.
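Formulas (9) and (10) are likewise not legible here. In the published SAC algorithm, the strategy objective and its reparameterized gradient take the form below, with the action written as $a_t = f_\phi(\epsilon_t; s_t)$ and $\epsilon_t$ drawn from a fixed noise distribution $\mathcal{N}$; that the patent uses exactly this form is an assumption.

```latex
J_\pi(\phi) = \mathbb{E}_{s_t \sim \mathcal{D},\, \epsilon_t \sim \mathcal{N}}
  \left[ \alpha \log \pi_\phi\big( f_\phi(\epsilon_t; s_t) \mid s_t \big)
       - Q_\theta\big( s_t, f_\phi(\epsilon_t; s_t) \big) \right]

\hat{\nabla}_\phi J_\pi(\phi) = \nabla_\phi \, \alpha \log \pi_\phi(a_t \mid s_t)
  + \big( \nabla_{a_t} \alpha \log \pi_\phi(a_t \mid s_t)
        - \nabla_{a_t} Q(s_t, a_t) \big) \, \nabla_\phi f_\phi(\epsilon_t; s_t)
```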
It should be noted that the two hyperparameters in the optimization objective function, the temperature coefficient and the penalty coefficient, are obtained randomly at the beginning of training, but they change unpredictably as the scheduling strategy improves. In order to find a stochastic scheduling strategy that yields the maximum expected benefit while satisfying the minimum expected entropy constraint and the minimum penalty, the entropy term and the penalty term are used as constraints. Since the scheduling strategy at time $t$ only affects future target values, the maximum expected benefit problem is converted into a dual problem, and the objective function with the entropy constraint is optimized recursively starting from the final step, which yields the automatically adjusted temperature coefficient and penalty coefficient, as shown in the following formulas (11)-(12):
(11);
(12);
wherein the left-hand sides of formulas (11) and (12) represent the automatically updated optimal solutions of the hyperparameters, and the optimization target of the hyperparameter variables is the same as the optimization target; the expectation represents the expected value of the sum of rewards subsequently obtainable when the agent takes an action producing action amount $a_t$ at time $t$ and the optimal scheduling strategy is selected; the strategy term represents the optimized scheduling strategy when, at time $t$, the observed quantity of each power generation node of the intelligent power distribution network is $s_t$ and the output action amount is $a_t$; the initial entropy value and the initial penalty value can be given manually according to actual conditions and experience during training; and the temperature coefficient and the penalty coefficient corresponding to time $t$ can be used as the hyperparameters.
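The automatic adjustment of the temperature coefficient can be sketched as a single gradient step on the standard SAC temperature objective $J(\alpha) = \mathbb{E}[-\alpha(\log \pi(a_t \mid s_t) + \bar{\mathcal{H}})]$, where $\bar{\mathcal{H}}$ plays the role of the manually given entropy value. This is an illustration of the published SAC update, not the patent's formula (11); the function name, the learning rate and the clipping at zero are assumptions, and the penalty coefficient could be adjusted analogously against its own threshold.

```python
def update_temperature(alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on J(alpha) = mean(-alpha * (log_pi + H_bar)).

    alpha          -- current temperature coefficient
    log_probs      -- log pi(a_t | s_t) for a batch of sampled actions
    target_entropy -- manually chosen entropy target H_bar (assumed name)
    """
    # dJ/dalpha = mean(-(log_pi + H_bar))
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    # Keep the temperature non-negative (a simplification; many
    # implementations optimize log(alpha) instead).
    return max(alpha - lr * grad, 0.0)
```

When the strategy's entropy estimate falls below the target, the step raises the temperature and so strengthens exploration; when the entropy exceeds the target, the temperature is lowered.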
According to the embodiment of the invention, the unconstrained monotonic deep neural network architecture is incorporated into the deep reinforcement learning algorithm, so that the continuous monotonic characteristic between reactive power and voltage can be effectively simulated, and the training speed is improved.
According to an embodiment of the present invention, the training samples may include a plurality of four-tuples, each composed of the sample observed quantity at the current time, the sample reactive compensation amount at the current time, the sample target value at the current time, and the sample observed quantity at the next time.
Determining the training samples by simulating a voltage violation operation scenario of the intelligent power distribution network may include:
determining a sample observed quantity at the current moment by simulating a voltage violation operation scene of the intelligent power distribution network; determining the sample reactive compensation quantity at the current moment according to the sample observed quantity at the current moment; determining a sample target value according to the sample reactive compensation quantity at the current moment and the first target function; and regulating and controlling the sample observed quantity at the current moment according to the sample target value to obtain the sample observed quantity at the next moment.
According to embodiments of the invention, voltage violation operation scenarios may include scenarios in which distributed energy sources generate a large amount of power under light load, scenarios of power supply under heavy load, and the like. The sample observed quantity at the current time can be input into the strategy network for greedy learning, which outputs the sample reactive compensation amount at the current time. The determined training samples may be stored in an experience pool for training the agent.
According to the embodiment of the invention, the constructed agent can be trained, with disturbances added at random during training through simulation, so that the intelligent power distribution network can quickly generate voltage safety regulation decisions. The trained agent can also be tested; during testing the agent outputs a real-time decision scheme so that the voltage of the intelligent power distribution network operates safely and stably.
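An experience pool holding such four-tuples can be sketched as a bounded buffer with random sampling. The class name `ExperiencePool`, the field names and the capacity are hypothetical; the text only states that the four-tuples are stored for training.

```python
import random
from collections import deque, namedtuple

# Four-tuple: current observation, reactive compensation, target value,
# next observation.
Transition = namedtuple("Transition", ["obs", "action", "target", "next_obs"])


class ExperiencePool:
    """Fixed-capacity buffer; the oldest samples are dropped first."""

    def __init__(self, capacity=10000, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, obs, action, target, next_obs):
        self.buffer.append(Transition(obs, action, target, next_obs))

    def sample(self, batch_size):
        # Sample without replacement, capped at the current pool size.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```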
FIG. 3 is a flow chart of a smart distribution network voltage security control method according to another embodiment of the present invention; fig. 4 shows a distributed agent training architecture diagram according to an embodiment of the present invention.
As shown in fig. 3, the intelligent power distribution network voltage safety control method of this embodiment may include: adding new energy and load fluctuation scenarios to the simulated intelligent power distribution network environment and obtaining the observed quantity of each power generation node in real time, so as to simulate actual operating conditions. The power flow is calculated from the observed quantities and checked for convergence. If the power flow converges, it is determined whether each node voltage lies within the preset interval. If it does, no action is performed. If it does not, the agent is used for regulation and generates a decision. Reactive compensation is performed according to the decision, for example by adjusting a static var generator (SVG) and capacitors, so that the capacitance value and the real-time output of the static var generator are changed and the safe and stable operation of the smart grid is ensured.
Wherein the agent may be trained based on a voltage stabilization training scheme that uses a monotonic strategy network. As shown in fig. 3, a voltage violation scenario may be simulated, and training samples are generated by greedy learning of the neural network and stored in an experience pool. Historical voltage violation scenarios and reactive power optimization scheduling strategies can be used to generate prior knowledge. The agent is trained on these data through continuous offline iterative training so that it obtains the maximum reward value within a decision period; the trained agent is then applied to the simulated intelligent power distribution network environment and outputs, in real time, the observed quantities of safe grid operation and the corresponding reactive power optimization scheduling strategy.
Since the number of agents in each intelligent power distribution network is taken to be the number of distributed power sources connected to it, the data volume and the training time grow considerably as the number of nodes and the number of connected distributed power sources increase. Therefore, the distributed agent training architecture shown in fig. 4 may be used to train the agents. Note that the model learning in fig. 4 can be regarded as agent learning. Each agent can update its own local scheduling strategy to the latest learned strategy based on the optimal scheduling strategy, run $k$ steps in its environment and store the results in the experience pool, and after running $k$ steps transmit the training samples to the agent for learning.
According to the embodiment of the invention, distributed agents are used to solve the voltage stability problem of the nodes connected to the same intelligent power distribution network, which improves data utilization and training efficiency. Using a real intelligent power distribution network scenario, a power flow convergence test can be carried out in each training round, so that the real intelligent power distribution network environment is fully simulated and the simulation test results are highly reliable.
It should be noted that, during operation of the simulated intelligent power distribution network environment designed by the invention, operation of the smart grid is terminated immediately when either of the following two situations occurs: first, as shown in fig. 3, the grid is unbalanced so that the power flow calculation cannot converge; second, the voltage goes outside the controllable range, for example because of an action performed by the agent.
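The decision flow of fig. 3 can be sketched as one control step. The sketch assumes that the power flow convergence result is supplied as a boolean, that the preset interval is 0.95 to 1.05 p.u., and that the agent is any callable mapping an observation to a reactive compensation amount; the function name `regulate` and these values are hypothetical.

```python
def regulate(observations, agent, flow_converged, v_min=0.95, v_max=1.05):
    """Return {node: reactive compensation} for nodes outside the interval.

    observations   -- {node: (voltage, active_power, reactive_power)}
    agent          -- callable taking one observation tuple (assumed interface)
    flow_converged -- result of the power flow calculation (assumed input)
    """
    if not flow_converged:
        # Non-convergent power flow: stop instead of acting.
        raise RuntimeError("power flow did not converge")
    decisions = {}
    for node, (v, p, q) in observations.items():
        if v_min <= v <= v_max:
            continue  # voltage inside the preset interval: no action
        decisions[node] = agent((v, p, q))
    return decisions
```

A toy agent such as `lambda obs: 1.0 - obs[0]` is enough to exercise the flow: only nodes whose voltage lies outside the interval receive a decision.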
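The k-step collection performed by each distributed agent can be sketched as follows. The environment is abstracted as a function returning a reward and the next observation, and the strategy as a callable; both interfaces and the function name `collect_k_steps` are assumptions made for illustration.

```python
def collect_k_steps(policy, env_step, obs, k):
    """Run k steps with the local strategy copy and return the samples.

    policy   -- callable: observation -> action (local copy of the strategy)
    env_step -- callable: (observation, action) -> (reward, next_observation)
    Returns the list of four-tuples and the final observation.
    """
    samples = []
    for _ in range(k):
        action = policy(obs)
        reward, next_obs = env_step(obs, action)
        samples.append((obs, action, reward, next_obs))
        obs = next_obs
    # After k steps the batch would be sent to the learner in one transfer.
    return samples, obs
```

Sending samples in batches of k steps rather than one at a time reduces communication between the distributed agents and the learner.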
FIG. 5 (a) shows a UMSAC algorithm versus SAC algorithm prize value comparison graph in accordance with an embodiment of the present invention; fig. 5 (b) shows a graph of the comparison of the voltage stability of the UMSAC algorithm with the SAC algorithm according to an embodiment of the present invention.
According to an embodiment of the present invention, the algorithm used by the agent may be regarded as an unconstrained monotonic soft actor-critic algorithm, abbreviated as UMSAC. The agent's training may be set to 200 rounds, with at most 60 time steps per round and each time step corresponding to a 5-minute sampling interval, so as to ensure that the voltage remains stable at different times of day. On this basis, the invention can perform grouped random tests: fault scenarios are selected at random and input to each agent, and the average reward value, voltage stability, action cost and the like of the different agents under the different scenarios are calculated.
The reward value is the information fed back to the agent by the intelligent power distribution network environment, and the larger the reward value, the better the performance. The action cost can be the cost imposed on the whole grid system by the reactive compensator and the like when reactive compensation is applied to the voltage. The voltage performance retention rate fully reflects the power supply quality of the intelligent power distribution network under the different algorithms.
Fig. 5 (a) shows the reward values obtained while training the agent. The larger the reward value, the smaller the node voltage deviation and the line loss of the intelligent power distribution network, and the UMSAC algorithm is clearly better than the SAC algorithm. Fig. 5 (b) shows the test of the trained agent: voltage violation scenarios are added at random during testing, and the agent gives a control strategy within sub-second time. The test results show that the UMSAC algorithm ensures safe and stable grid operation when regulating the voltage of the intelligent power distribution network.
According to the embodiment of the invention, a distributed voltage safety optimization control technique based on deep reinforcement learning is proposed for the voltage deviation caused by renewable energy fluctuation and load demand changes during operation of an intelligent power distribution network system with a large number of connected distributed energy sources, so as to reduce voltage violations and minimize network loss. The distributed energy source at each node performs active power scheduling according to the day-ahead optimal active power scheduling result, each node observes its terminal voltage based on that result, and an intelligent power distribution network simulation environment and an interacting agent are constructed for the voltage stability problem. The agent is trained with the UMSAC algorithm to learn the optimal voltage control strategy, and violating voltages are quickly restored by detecting the local voltage and adjusting the reactive power at the inverter side. Simulation shows that the agent can interact with the smart grid environment in real time and give a voltage compensation decision within sub-second time, achieving a good voltage stability control effect.
Based on the intelligent power distribution network voltage safety control method, the invention further provides an intelligent power distribution network voltage safety control device. The device will be described in detail below in connection with fig. 6.
Fig. 6 shows a block diagram of a voltage safety control device for a smart distribution network according to an embodiment of the present invention.
As shown in fig. 6, the intelligent power distribution network voltage safety control apparatus 600 of this embodiment includes an acquisition module 610 and a processing module 620.
The acquisition module 610 is configured to obtain, in real time, the observed quantity of each power generation node from the intelligent power distribution network, where the intelligent power distribution network includes n power generation nodes, n is an integer greater than or equal to 2, and the observed quantity includes a node voltage value, an active power value and a reactive power value. In an embodiment, the acquisition module 610 may be configured to perform operation S210 described above, which is not repeated here.
The processing module 620 is configured to repeatedly perform the following operation for each power generation node: when the node voltage value is not within the preset interval, regulating the voltage value by using the agent to obtain a target regulation strategy, so that the voltage of the intelligent power distribution network operates safely and stably, wherein the agent consists of a strategy network with a monotonic constraint, an evaluation network and a target evaluation network, the strategy network is used for generating an initial regulation strategy, and the evaluation network and the target evaluation network are both used for optimizing the initial regulation strategy to obtain the target regulation strategy. In an embodiment, the processing module 620 may be configured to perform operation S220 described above, which is not repeated here.
According to an embodiment of the present invention, an agent is constructed based on a fully connected neural network including: an input layer, a hidden layer, and an output layer.
Wherein, the intelligent power distribution network voltage safety control device 600 may further include: the device comprises a first determining module, a first constructing module and a second constructing module.
The first determining module is used for determining a mask layer according to a preset interval and constraint conditions of reactive compensation, wherein the mask layer is used for controlling monotonicity between input and output of the strategy network.
The first construction module is used for constructing a strategy network according to the input layer, the mask layer, the hidden layer and the output layer in sequence.
The second construction module is used for constructing an evaluation network and a target evaluation network according to the input layer, the hidden layer and the output layer in sequence.
According to the embodiment of the invention, the agent is trained in advance according to the optimized objective function.
Wherein, the intelligent power distribution network voltage safety control device 600 may further include: the system comprises a second determining module, a third determining module and a fourth determining module.
The second determining module is used for determining a first objective function according to the voltage condition of the power generation node and the line loss condition of the power transmission line.
The third determining module is used for determining a second objective function according to the voltage value and the voltage condition of the preset interval.
The fourth determination module is used for determining an optimized objective function based on the first objective function and the second objective function.
According to an embodiment of the present invention, the acquisition module 610 and the processing module 620 may be combined and implemented in one module, or either of them may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the invention, at least one of the acquisition module 610 and the processing module 620 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware by any other reasonable way of integrating or packaging circuits, or in any one of software, hardware and firmware or a suitable combination of the three. Alternatively, at least one of the acquisition module 610 and the processing module 620 may be at least partially implemented as a computer program module, which, when executed, may perform the corresponding functions.
Fig. 7 shows a block diagram of an electronic device adapted to implement the intelligent power distribution network voltage safety control method according to an embodiment of the present invention.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present invention includes a processor 701 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flow according to an embodiment of the invention.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. The processor 701 performs various operations of the method flow according to an embodiment of the present invention by executing programs in the ROM 702 and/or the RAM 703. Note that the program may be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in one or more memories.
According to an embodiment of the invention, the electronic device 700 may further comprise an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read therefrom is installed into the storage section 708 as needed.
The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.
According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to an embodiment of the invention, the computer-readable storage medium may include ROM 702 and/or RAM 703 and/or one or more memories other than ROM 702 and RAM 703 described above.
Embodiments of the present invention also include a computer program product comprising a computer program containing program code for performing the method shown in the flowcharts. When the computer program product is run on a computer system, the program code causes the computer system to carry out the methods provided by embodiments of the present invention.
The above-described functions defined in the system/apparatus of the embodiment of the present invention are performed when the computer program is executed by the processor 701. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the embodiment of the present invention are performed when the computer program is executed by the processor 701. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
According to embodiments of the present invention, program code for carrying out the computer programs provided by embodiments of the present invention may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Such programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the invention and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the invention. In particular, the features recited in the various embodiments of the invention and/or in the claims can be combined and/or integrated in various ways without departing from the spirit and teachings of the invention. All such combinations and/or integrations fall within the scope of the invention.
The embodiments of the present invention are described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the invention is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims (10)

1. An intelligent power distribution network voltage safety control method, characterized by comprising:
obtaining an observed quantity of each power generation node in real time from an intelligent power distribution network, wherein the intelligent power distribution network comprises n power generation nodes, n is an integer greater than or equal to 2, and the observed quantity comprises: a node voltage value, an active power value and a reactive power value;
for each of the power generation nodes, repeating the following operations:
in a case where the node voltage value is not within a preset interval, regulating the node voltage value by using an agent to obtain a target regulation strategy, so as to ensure that the voltage of the intelligent power distribution network operates safely and stably, wherein the agent consists of a strategy network with a monotonic constraint, an evaluation network and a target evaluation network; the strategy network is used for generating an initial regulation strategy; the evaluation network and the target evaluation network are both used for optimizing the initial regulation strategy to obtain the target regulation strategy; the agent is used for performing reactive compensation on the power generation node according to the node voltage value, the active power value and the reactive power value, and regulating the node voltage value through the reactive compensation; and the agent is constructed based on a fully-connected neural network, the fully-connected neural network comprising: an input layer, a hidden layer, and an output layer;
the method further comprises:
determining a mask layer according to the preset interval and the constraint condition of reactive compensation, wherein the mask layer is used for controlling monotonicity between the input and the output of the strategy network;
constructing the strategy network according to the input layer, the mask layer, the hidden layer and the output layer in sequence;
wherein the agent is trained in advance according to an optimization objective function, and the optimization objectives of the optimization objective function comprise maximization of an expected reward value and maximization of an information entropy value; the training samples used in the advance training comprise a plurality of quadruples, each quadruple consisting of a sample observed quantity at the current moment, a sample reactive compensation quantity at the current moment, a sample target value at the current moment and a sample observed quantity at the next moment; the sample observed quantity at the current moment is determined by simulating a voltage violation operation scene of the intelligent power distribution network, the sample reactive compensation quantity at the current moment is determined according to the sample observed quantity at the current moment, the sample target value at the current moment is determined according to the sample reactive compensation quantity at the current moment and a first objective function, and the sample observed quantity at the next moment is obtained by regulating the sample observed quantity at the current moment according to the sample target value; the first objective function is determined based on the voltage condition of the power generation node and the line loss condition of the power transmission line.
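One way to picture the monotonic constraint recited in claim 1 is a mask that fixes the sign of selected input weights. The sketch below is illustrative only and is not taken from the patent: the class name `MaskedMonotonePolicy`, the choice of `tanh` activations, the sign convention (reactive compensation never increases as the node voltage rises), and the bound `q_max` standing in for the reactive compensation constraint are all assumptions.

```python
import math
import random


class MaskedMonotonePolicy:
    """Hypothetical policy: input -> mask -> hidden (tanh) -> output (tanh)."""

    def __init__(self, n_hidden=8, q_max=1.0, seed=0):
        rng = random.Random(seed)
        # Raw weights for the three inputs: voltage, active power, reactive power.
        self.w_in = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
        self.b_in = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_out = 0.0
        self.q_max = q_max  # assumed bound on the reactive compensation
        # Mask entry -1 forces a non-positive effective weight; 0 leaves it free.
        self.mask = [-1, 0, 0]

    @staticmethod
    def _masked(weight, mask_entry):
        return mask_entry * abs(weight) if mask_entry != 0 else weight

    def act(self, voltage, active_power, reactive_power):
        x = (voltage, active_power, reactive_power)
        hidden = []
        for row, bias in zip(self.w_in, self.b_in):
            z = bias + sum(self._masked(w, m) * xi
                           for w, m, xi in zip(row, self.mask, x))
            hidden.append(math.tanh(z))
        # Non-negative hidden-to-output weights preserve the sign set by the mask.
        y = self.b_out + sum(abs(w) * h for w, h in zip(self.w_out, hidden))
        # tanh keeps the output inside [-q_max, q_max].
        return self.q_max * math.tanh(y)


policy = MaskedMonotonePolicy(q_max=0.5)
print(policy.act(1.08, 0.5, 0.2) <= policy.act(0.92, 0.5, 0.2))
```

Because every path from the voltage input to the output passes through one non-positive weight, non-negative weights and increasing activations, the output cannot increase when only the voltage increases, whatever the raw weights are.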
2. The method according to claim 1, wherein the method further comprises:
constructing the evaluation network and the target evaluation network respectively, each according to the input layer, the hidden layer and the output layer in sequence.
3. The method according to claim 1, wherein the method further comprises:
determining a second objective function according to the voltage value and the voltage condition of the preset interval;
the optimization objective function is determined based on the first objective function and the second objective function.
4. A method according to claim 3, wherein said determining a first objective function based on the voltage condition of the power generation node and the line loss condition of the power transmission line comprises:
determining a first sub-objective function according to the voltage value and the voltage amplitude fluctuation range;
determining a second sub-objective function according to the line loss of the power transmission line;
and determining the first objective function according to the first sub-objective function and the second sub-objective function.
5. The method of claim 4, wherein
the first sub-objective function is represented by formula (1):
[formula (1) is not reproduced in the source text]
wherein the terms of formula (1) represent, respectively: the reward value obtained from the first sub-objective function; the lower limit value of the voltage amplitude fluctuation range; the upper limit value of the voltage amplitude fluctuation range; and the voltage value of the i-th power generation node;
the second sub-objective function is represented by formula (2):
[formula (2) is not reproduced in the source text]
wherein the terms of formula (2) represent, respectively: the reward value obtained from the second sub-objective function; the line loss function of the power transmission line; L, the Laplacian matrix corresponding to the power generation nodes of the intelligent power distribution network, which is associated with the line impedance of the power transmission line; the terminal voltage across the agent; and T, the transpose;
the first objective function is represented by formula (3):
[formula (3) is not reproduced in the source text]
wherein the terms of formula (3) represent, respectively: the total reward value obtained from the first objective function; and the coefficients of the first sub-objective function and the second sub-objective function, each of which lies in [0,1].
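Formulas (1) to (3) are not legible in this text, so the following sketch only illustrates the structure that the definitions describe: a voltage term bounded by lower and upper limits, a line-loss term built from the Laplacian matrix and the voltages, and a weighted sum with coefficients in [0,1]. The specific forms chosen here (a penalty on the distance outside the band, and the quadratic form -vᵀLv) are assumptions, as are all function and parameter names.

```python
def voltage_reward(voltages, v_min, v_max):
    """Assumed form: penalize only the distance each voltage lies outside [v_min, v_max]."""
    penalty = 0.0
    for v in voltages:
        penalty += max(v_min - v, 0.0) + max(v - v_max, 0.0)
    return -penalty


def line_loss_reward(voltages, laplacian):
    """Assumed form: negative quadratic form -v^T L v as a line-loss proxy."""
    n = len(voltages)
    quad = sum(voltages[i] * laplacian[i][j] * voltages[j]
               for i in range(n) for j in range(n))
    return -quad


def total_reward(voltages, laplacian, v_min, v_max, alpha, beta):
    """Weighted sum of the two sub-objectives; both coefficients must lie in [0, 1]."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("coefficients must lie in [0, 1]")
    return (alpha * voltage_reward(voltages, v_min, v_max)
            + beta * line_loss_reward(voltages, laplacian))


# Two nodes joined by one line of unit admittance (hypothetical per-unit values).
L = [[1.0, -1.0], [-1.0, 1.0]]
print(total_reward([1.0, 1.08], L, 0.95, 1.05, alpha=0.5, beta=0.5))
```

With these assumed forms the reward is zero when all voltages are equal and inside the band, and becomes more negative as voltages leave the band or diverge from one another.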
6. A method according to claim 3, wherein the pre-training method comprises:
initializing parameters corresponding to the strategy network, the evaluation network and the target evaluation network;
determining a training sample by simulating a voltage violation operation scene of the intelligent power distribution network;
determining loss functions corresponding to the strategy network, the evaluation network and the target evaluation network based on a deep reinforcement learning algorithm;
based on the training samples and the loss functions, respectively updating the parameters corresponding to the strategy network, the evaluation network and the target evaluation network;
and under the condition that the iteration times meet the preset times, obtaining the trained intelligent agent.
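The claim does not name the deep reinforcement learning algorithm or its loss functions. Since the optimization objective combines expected reward with information entropy, one plausible reading is a soft actor-critic style update. The sketch below shows three standard pieces of such an update under that assumption: an entropy-regularized temporal-difference target, a squared-error loss for the evaluation network, and a Polyak (soft) update of the target evaluation network. The names and the hyperparameters `gamma`, `alpha` and `tau` are hypothetical.

```python
def soft_td_target(reward, next_q_target, next_log_prob, gamma=0.99, alpha=0.2):
    """Entropy-regularized target: r + gamma * (Q_target(s', a') - alpha * log pi(a'|s'))."""
    return reward + gamma * (next_q_target - alpha * next_log_prob)


def critic_loss(q_value, td_target):
    """Squared error between the evaluation network's estimate and the target."""
    return 0.5 * (q_value - td_target) ** 2


def polyak_update(target_params, online_params, tau=0.005):
    """Move each target-network parameter a fraction tau toward the online parameter."""
    return [(1.0 - tau) * t + tau * p for t, p in zip(target_params, online_params)]


# One illustrative step with scalar stand-ins for network outputs.
y = soft_td_target(reward=1.0, next_q_target=2.0, next_log_prob=-0.5,
                   gamma=0.9, alpha=0.2)
print(y, critic_loss(2.5, y), polyak_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

In a full training loop these steps would be repeated on sampled batches until the preset number of iterations is reached, with gradient steps on the strategy and evaluation networks in between.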
7. The method of claim 6, wherein the determining a training sample by simulating a voltage violation operation scene of the intelligent power distribution network comprises:
determining a sample observed quantity at the current moment by simulating the voltage violation operation scene of the intelligent power distribution network;
determining the sample reactive compensation quantity at the current moment according to the sample observed quantity at the current moment;
determining the sample target value according to the sample reactive compensation quantity at the current moment and the first target function;
and regulating and controlling the sample observed quantity at the current moment according to the sample target value to obtain the sample observed quantity at the next moment.
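The four quantities produced by the steps of claim 7 form one training sample. A minimal sketch of how such samples could be stored and drawn for training follows. The patent does not describe a storage structure, so the `Transition` and `ReplayBuffer` names, the fixed capacity, and uniform random sampling are assumptions.

```python
import random
from collections import deque
from typing import NamedTuple, Tuple

Observation = Tuple[float, float, float]  # (node voltage, active power, reactive power)


class Transition(NamedTuple):
    observation: Observation        # sample observed quantity at the current moment
    compensation: float             # sample reactive compensation quantity
    target_value: float             # sample target value from the first objective function
    next_observation: Observation   # sample observed quantity at the next moment


class ReplayBuffer:
    """Fixed-capacity store that discards the oldest sample when full."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition):
        self._items.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the number stored.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)


buffer = ReplayBuffer(capacity=1000, seed=0)
buffer.add(Transition((1.08, 0.5, 0.2), -0.1, -0.03, (1.04, 0.5, 0.1)))
print(len(buffer), buffer.sample(1)[0].target_value)
```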
8. An intelligent power distribution network voltage safety control device, the device comprising:
an acquisition module, used for acquiring an observed quantity of each power generation node in real time from an intelligent power distribution network, wherein the intelligent power distribution network comprises n power generation nodes, n is an integer greater than or equal to 2, and the observed quantity comprises: a node voltage value, an active power value and a reactive power value;
a processing module, used for repeatedly executing the following operations for each power generation node:
in a case where the node voltage value is not within a preset interval, regulating the node voltage value by using an agent to obtain a target regulation strategy, so as to ensure that the voltage of the intelligent power distribution network operates safely and stably, wherein the agent consists of a strategy network with a monotonic constraint, an evaluation network and a target evaluation network; the strategy network is used for generating an initial regulation strategy; the evaluation network and the target evaluation network are both used for optimizing the initial regulation strategy to obtain the target regulation strategy; the agent is used for regulating the node voltage value through reactive compensation according to the node voltage value, the active power value and the reactive power value; and the agent is constructed based on a fully-connected neural network, the fully-connected neural network comprising: an input layer, a hidden layer, and an output layer;
the device further comprises:
a first determining module, used for determining a mask layer according to the preset interval and the constraint condition of reactive compensation, wherein the mask layer is used for controlling monotonicity between the input and the output of the strategy network;
a first construction module, used for constructing the strategy network according to the input layer, the mask layer, the hidden layer and the output layer in sequence;
wherein the agent is trained in advance according to an optimization objective function, and the optimization objectives of the optimization objective function comprise maximization of an expected reward value and maximization of an information entropy value; the training samples used in the advance training comprise a plurality of quadruples, each quadruple consisting of a sample observed quantity at the current moment, a sample reactive compensation quantity at the current moment, a sample target value at the current moment and a sample observed quantity at the next moment; the sample observed quantity at the current moment is determined by simulating a voltage violation operation scene of the intelligent power distribution network, the sample reactive compensation quantity at the current moment is determined according to the sample observed quantity at the current moment, the sample target value at the current moment is determined according to the sample reactive compensation quantity at the current moment and a first objective function, and the sample observed quantity at the next moment is obtained by regulating the sample observed quantity at the current moment according to the sample target value; the first objective function is determined based on the voltage condition of the power generation node and the line loss condition of the power transmission line.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-7.
CN202311092375.0A 2023-08-29 2023-08-29 Intelligent power distribution network voltage safety control method, device, equipment and medium thereof Active CN116826762B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311092375.0A CN116826762B (en) 2023-08-29 2023-08-29 Intelligent power distribution network voltage safety control method, device, equipment and medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311092375.0A CN116826762B (en) 2023-08-29 2023-08-29 Intelligent power distribution network voltage safety control method, device, equipment and medium thereof

Publications (2)

Publication Number Publication Date
CN116826762A CN116826762A (en) 2023-09-29
CN116826762B true CN116826762B (en) 2023-12-19

Family

ID=88122475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311092375.0A Active CN116826762B (en) 2023-08-29 2023-08-29 Intelligent power distribution network voltage safety control method, device, equipment and medium thereof

Country Status (1)

Country Link
CN (1) CN116826762B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117526443B (en) * 2023-11-07 2024-04-26 北京清电科技有限公司 Power system-based power distribution network optimization regulation and control method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112564189A (en) * 2020-12-15 2021-03-26 深圳供电局有限公司 Active and reactive power coordinated optimization control method
CN113078641A (en) * 2021-04-29 2021-07-06 国网山东省电力公司经济技术研究院 Power distribution network reactive power optimization method and device based on evaluator and reinforcement learning
CN115313403A (en) * 2022-07-22 2022-11-08 浙江工业大学 Real-time voltage regulation and control method based on deep reinforcement learning algorithm
CN116243727A (en) * 2023-03-17 2023-06-09 厦门大学 Unmanned carrier countermeasure and obstacle avoidance method for progressive deep reinforcement learning
CN116629461A (en) * 2023-07-25 2023-08-22 山东大学 Distributed optimization method, system, equipment and storage medium for active power distribution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
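Coordinated optimization of active distribution networks based on the deep deterministic policy gradient algorithm; Gong Jinxia; Liu Yanmin; Automation of Electric Power Systems (No. 06); full text *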

Also Published As

Publication number Publication date
CN116826762A (en) 2023-09-29

Similar Documents

Publication Publication Date Title
Zhang et al. Dynamic energy conversion and management strategy for an integrated electricity and natural gas system with renewable energy: Deep reinforcement learning approach
Shayeghi et al. Multi-machine power system stabilizers design using chaotic optimization algorithm
EP3568810B1 (en) Action selection for reinforcement learning using neural networks
Nie et al. Optimizing the post-disaster control of islanded microgrid: A multi-agent deep reinforcement learning approach
CN116826762B (en) Intelligent power distribution network voltage safety control method, device, equipment and medium thereof
CN114217524B (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
US20210133376A1 (en) Systems and methods of parameter calibration for dynamic models of electric power systems
CN114362187B (en) Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
Zhang et al. A novel deep reinforcement learning enabled sparsity promoting adaptive control method to improve the stability of power systems with wind energy penetration
CN115833101B (en) Power scheduling method, device, electronic equipment and storage medium
Xie et al. Distributional deep reinforcement learning-based emergency frequency control
CN115293052A (en) Power system active power flow online optimization control method, storage medium and device
CN112930541A (en) Determining a control strategy by minimizing delusional effects
CN114784823A (en) Micro-grid frequency control method and system based on depth certainty strategy gradient
CN116629461B (en) Distributed optimization method, system, equipment and storage medium for active power distribution network
US20230344242A1 (en) Method for automatic adjustment of power grid operation mode base on reinforcement learning
Zhang et al. Deep reinforcement learning for load shedding against short-term voltage instability in large power systems
CN115149544A (en) Soft Actor-critical algorithm-based power distribution network continuous reactive voltage optimization method
Zeng et al. Distributed deep reinforcement learning-based approach for fast preventive control considering transient stability constraints
Dehnavi et al. Retracted: New Deep Learning-Based Approach for Wind Turbine Output Power Modeling and Forecasting
Zhang et al. Physical-model-free intelligent energy management for a grid-connected hybrid wind-microturbine-PV-EV energy system via deep reinforcement learning approach
Hayou et al. Lora+: Efficient low rank adaptation of large models
CN117833263A (en) New energy power grid voltage control method and system based on DDPG
CN116436003B (en) Active power distribution network risk constraint standby optimization method, system, medium and equipment
Stulov et al. Learning model of generator from terminal data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant