CN117154845A - Power grid operation adjustment method based on a generative decision model - Google Patents

Power grid operation adjustment method based on a generative decision model

Info

Publication number
CN117154845A
CN117154845A (application CN202311103915.0A)
Authority
CN
China
Prior art keywords
model
power grid
strategy
decision
power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311103915.0A
Other languages
Chinese (zh)
Inventor
周号益
朱天晨
仇越
孙庆赟
姜春阳
李建欣
胡春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202311103915.0A
Publication of CN117154845A
Legal status: Pending


Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/04Power grid distribution networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Power Engineering (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Geometry (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention realizes a power grid operation adjustment method based on a generative decision model. A Transformer model is selected as the decision model with a causal self-attention mechanism, and an intelligent grid decision method framework is built on this mechanism. The constructed grid dispatching model is designed in two parts, a reward distribution model and a strategy model, and outputs the action at the current moment, so that the dispatching strategy of the grid system is generated in real time. A grid simulation environment module is constructed to simulate the actual operating conditions of the grid. On this basis, the strategy network and the reward distribution network are updated and optimized efficiently and alternately through a two-level optimization algorithm. Finally, by inputting states and reward feedback from the grid environment, power adjustment and topology adjustment schemes for the grid system are generated efficiently and in real time, assisting human experts in dispatching decisions, realizing safe and stable operation of the grid system, and reducing dispatching decision cost and energy consumption.

Description

Power grid operation adjustment method based on a generative decision model
Technical Field
The invention relates to the technical field of information, and in particular to a power grid operation adjustment method based on a generative decision model.
Background
The power grid is an interconnected system that delivers electricity from the production end (power stations) to the consumption end (buildings, factories, etc.). Stable operation of the grid system is a dynamic balancing process subject to frequent, severe and unpredictable supply-demand changes, extreme weather, equipment faults and other abnormal events; external intervention and regulation are needed to avoid blackouts, or even safety accidents, caused by imbalance of the grid system, which would seriously affect the economy and public safety.
Traditional power system dispatching methods mainly comprise manual regulation based on expert experience and methods that build and solve mathematical models. However, with the rise and continued development of the new type of power system, characterized by growing grid scale and a changing energy structure, the operation dispatching task of the grid system exhibits expanding decision dimensions and increasing decision difficulty; traditional methods driven by expert experience or mathematical models are limited by scarce expertise, poor scalability and difficult dynamic optimization, and are ill-suited to the dispatching decision scenarios of complex new power systems.
Data-driven intelligent grid dispatching methods have gradually risen with the development of big data and artificial intelligence technology, providing a new paradigm for grid operation decisions and forming an important piece of the digital and intelligent transformation of the power system in the information age. Reinforcement learning, a branch of machine learning and artificial intelligence for decision-making, has an agent learn continuously during training from the rewards and penalties it obtains, finally making high-level decisions based on the accumulated experience. Combined with deep learning, reinforcement learning has shown capabilities surpassing human experts in complex, sequential decision tasks such as Go, video games and robot control. Reinforcement learning techniques are therefore a mainstream route for future data-driven grid system dispatching decisions.
Existing reinforcement-learning-based grid system dispatching methods can be divided mainly into behavior cloning methods based on expert examples and reward-driven online reinforcement learning methods. The former learns the dispatching policy by having the agent imitate high-quality expert dispatching examples for the grid system. However, this approach ignores the fact that errors in expert examples can mislead the agent's learning; moreover, expert examples cannot guarantee comprehensiveness, because actual dispatching tasks often contain situations the examples do not cover, and blind cloning of expert strategies cannot guarantee the agent's transferability to unknown situations. The latter has the agent interact with a grid simulation environment and learn the grid dispatching strategy from the environment's reward feedback through continuous exploration and trial-and-error. However, this approach depends heavily on dense reward signal feedback from the grid environment, whereas for highly complex new power systems the impact of dispatching decisions is often lagged and untimely, so the reward signals are delayed and sparse; the agent's learning efficiency suffers, and spurious correlations may even arise between the strategy and the dispatching objectives. For example, the serious consequences of a single dispatching error may only surface many simulation rounds later, as non-convergent power flow or large-scale line disconnection and blackout.
In summary, a new design for the grid dispatching system is needed to address the noisiness and incompleteness of expert examples and the delay and sparsity of grid-environment reward signal feedback.
Decision Transformer, by taking the GPT-2 model (a Transformer model) as the decision network, establishes deep associations among the sequences of environmental states, dispatching strategies and reward feedback through an attention mechanism, and its sequence-modeling formulation of reinforcement learning provides a new line of thought for the dispatching tasks of new grid systems.
The existing reinforcement-learning-based grid system dispatching decision methods have several defects and shortcomings.
First, many online reinforcement learning methods rely on frequent interaction between the agent and the grid system for trial-and-error learning, which generally makes model training inefficient and requires building a grid system simulation environment at high cost and technical effort to support training.
Second, because of the instability of the new power system and the unpredictability of power demand, the exploration- and interaction-based learning mode also produces a large variance in the model's policy gradient, leading to oscillating learning curves, slow convergence and reduced learning efficiency.
Third, reinforcement learning methods depend heavily on dense reward signal feedback from the grid environment, whereas for highly complex new power systems the impact of dispatching decisions is often lagged and untimely, so the reward signals are delayed and sparse; the agent's learning efficiency suffers, and spurious associations may even arise between policies and dispatching objectives. For example, the serious consequences of a single dispatching error may only surface many simulation rounds later, as non-convergent power flow or large-scale line disconnection and blackout.
Disclosure of Invention
Therefore, the invention first provides a power grid operation adjustment method based on a generative decision model. A Transformer model is selected as the decision model with a causal self-attention mechanism, and an intelligent grid decision method framework is built on this mechanism. The constructed grid dispatching model is designed in two parts, a reward distribution model and a strategy model: the reward signal is adaptively redistributed by means of the reward distribution network, and the strategy model receives the reward-corrected sequence τ'_t as input and outputs the action A_t at the current time, so that the dispatching strategy of the grid system is generated in real time. The online grid dispatching strategy consists mainly of a power dispatching strategy and a topology dispatching strategy: the power dispatching strategy mainly comprises adjusting the power of generating units such as thermal and new-energy units in the system, the charging and discharging power of energy storage equipment, and the adjustable load power; the topology dispatching strategy mainly comprises deciding the disconnection and connection of each line in the grid system and the connection of each branch line and busbar in the substations. A grid simulation environment module is constructed to simulate the actual operating conditions of the grid: it receives the action A_t as the power and topology adjustment dispatching strategy, takes into account power demand changes and random environmental factors, and computes by simulation, according to the underlying power-flow convergence principle, the next state S_{t+1} and the grid environment reward r_{t+1}; this is repeated until the grid dispatching task ends. On this basis, the strategy network and the reward distribution network are updated and optimized efficiently and alternately through a two-level optimization algorithm.
The input of the reward distribution model is the state S_t of the grid environment at a certain moment and the action A_t, and its output is the reward f_t assigned by the model to adopting that strategy in that state. By inputting the historical multi-step state sequence (S_{t-L}, ..., S_{t-2}, S_{t-1}) and action sequence (A_{t-L}, ..., A_{t-2}, A_{t-1}), a reconstructed reward distribution scheme (f_{t-L}, ..., f_{t-2}, f_{t-1}) is obtained for the sequence, from which the new target reward R'_t = R'_{t-1} - f_{t-1} is calculated. The sequence corrected by the reward distribution network, τ'_t = (R'_{t-L}, S_{t-L}, A_{t-L}, ..., R'_{t-1}, S_{t-1}, A_{t-1}, R'_t, S_t), serves as the input sequence of the Transformer strategy model.
The strategy model receives the reward-corrected sequence τ'_t produced by the reward distribution network and, by means of the internal relations between the grid dispatching strategy, the grid state and the target reward constructed by the self-attention mechanism, infers the next reasonable grid dispatching adjustment scheme A_t. The specific implementation is as follows:
(1) Calculating the embedding vector: the model first maps the input sequence τ'_t to a grid information sequence X in the embedding space, so that the state, action and reward information of the grid system are uniformly represented in the mapped embedding space and the relations among them can be computed: X = (x_1, x_2, ...), x_k = E_x(I), I ∈ τ'_t.
(2) Calculating the attention: the Transformer strategy network adopts the GPT-2 model structure, stacking several Decoders into a multi-layer structure, where each Decoder consists of a masked multi-head self-attention layer and a feed-forward neural network, and the output of each layer undergoes residual connection and layer normalization. The attention principle is shown below, where Q, K, V are respectively the query, key and value sub-vectors of the grid information sequence X, W_Q, W_K and W_V are respectively the linear transformation matrices of the three sub-vectors at this layer, M is a mask matrix whose function is to mask future information when computing the attention weights of the grid information sequence, thereby realizing the causal mechanism, and d_k is the dimension of the key sub-vector, serving to normalize the attention weights:
Q = X·W_Q
K = X·W_K
V = X·W_V
Attention(Q, K, V) = softmax(Q·K^T / √d_k + M)·V
Meanwhile, a multi-head mechanism is adopted: several attention heads compute several attention patterns, and different heads model the correlations within the grid information sequence from different features and angles, such as correlations in the time domain and in the frequency domain, so as to better capture the details and complexity of the grid information.
The specific implementation of the two-level optimization algorithm is as follows:
For the training of the model, data examples consisting of states S_real, rewards r_real and actions A_real sampled from actual grid operation dispatching scenarios are used to train the model offline, and the error between the action prediction A_pred output by the strategy model and the ground-truth action A_real in the data example is taken as the loss function of the strategy model, for example the squared error

L_train(φ, θ) = (1/Γ)·Σ_t ‖A_pred,t - A_real,t‖²

where A_pred depends on θ through the reward-corrected input sequence τ'_t.
The data set is then divided into a training set and a validation set at a certain ratio, and the loss functions of the strategy model and the reward distribution model are corrected to the two-level form

min_θ L_val(φ*(θ), θ) + λ·Ω(θ)   s.t.   φ*(θ) = argmin_φ L_train(φ, θ)

where L_val and L_train are the errors of the strategy model on the validation set and the training set, φ and θ are respectively the parameters of the strategy model and the reward distribution model, Γ is the total length of the decision sequence, and λ is the weight factor of the regularization term Ω. To improve the training efficiency of the model, the outer optimization objective is approximately replaced as follows:
(1) The outer optimization objective is approximated by a one-step gradient update:
∇_θ L_val(φ*(θ), θ) ≈ ∇_θ L_val(φ - ξ·∇_φ L_train(φ, θ), θ)
(2) The one-step gradient is expanded via the chain rule, writing φ' = φ - ξ·∇_φ L_train(φ, θ):
= ∇_θ L_val(φ', θ) - ξ·∇²_{θ,φ} L_train(φ, θ)·∇_{φ'} L_val(φ', θ)
(3) The vector product of the second-order and first-order gradients is further replaced, via a Taylor expansion, by the finite difference
∇²_{θ,φ} L_train(φ, θ)·∇_{φ'} L_val(φ', θ) ≈ (∇_θ L_train(φ⁺, θ) - ∇_θ L_train(φ⁻, θ)) / (2ε),  where φ± = φ ± ε·∇_{φ'} L_val(φ', θ)
the invention has the technical effects that:
by inputting state and rewarding feedback from the power grid environment, a power adjustment and topology adjustment scheme of the power grid system is generated efficiently in real time, scheduling decisions are assisted by human experts, safe and stable operation of the power grid system is realized, and scheduling decision cost and energy consumption are reduced.
In particular, has the following advantages:
1. A causal self-attention-based Transformer model is used as the decision network to establish deep associations and long-sequence dependencies among grid environment states, dispatching strategies and reward feedback, overcoming challenges such as the instability of the new power system and power demand fluctuations.
2. Model training is driven by offline data, avoiding the heavy dependence of online training on interaction with the grid environment and exploring the dispatching strategy of the grid system in a more efficient and stable manner.
3. The delayed reward signals of the grid system are adaptively redistributed through a reward distribution network; the delayed, sparse reward feedback of the grid environment is reconstructed into a more reasonable reward distribution, assisting the strategy model in learning and decision-making and improving its learning efficiency.
4. A two-level optimization algorithm decouples the dispatching strategy problem from the reward distribution problem, ensuring the training efficiency of both the strategy model and the reward distribution model.
Drawings
FIG. 1 is the causal self-attention model based grid intelligent decision method framework;
FIG. 2 is the reward distribution network;
FIG. 3 is the Transformer policy network architecture.
Detailed Description
The following is a preferred embodiment of the present invention, and the technical solution of the present invention is further described with reference to the accompanying drawings, but the present invention is not limited to this embodiment.
The invention provides a power grid operation adjustment method based on a generative decision model. A strategy model is constructed based on a causal self-attention mechanism, and the reward signals are adaptively reassigned by means of a reward distribution network. On this basis, the strategy network and the reward distribution network are updated and optimized efficiently and alternately through a two-level optimization algorithm.
The Transformer model is chosen as the decision model for our causal self-attention mechanism. The Transformer is a neural network model based on an attention mechanism; it learns the relationships between different positions in an input sequence through multi-head self-attention, thereby modeling and predicting the sequence. The Transformer model was originally used for natural language processing tasks such as machine translation and has since achieved significant results in fields such as computer vision and speech recognition. Because its self-attention sequence-modeling mechanism gives it a natural advantage in processing long decision sequences, the Transformer model has great potential in the dispatching decision tasks of continuously operating grid systems with long spatio-temporal dependencies.
Causal self-attention model-based intelligent power grid decision-making method framework
The framework adopted by the invention is shown in FIG. 1 and comprises two large modules, the grid dispatching model and the grid simulation environment. The grid dispatching model is responsible for generating the dispatching strategy of the grid system in real time; the grid simulation environment simulates the actual operating conditions of the grid and performs simulation calculations in real time according to the dispatching strategy and random events. The interaction between the grid dispatching model and the grid simulation environment revolves around three concepts: state, action and reward.
(1) State: in the power system dispatching task, a state is defined as S = (X, T), where X is the attribute state, representing attributes such as the active power P, reactive power Q and voltage V of each element in the grid system (generators, loads, energy storage, lines), together with other information such as line losses, generator maintenance and outage faults; T represents the topology of the grid system at a given moment;
(2) Action: in the power system dispatching task, the action represents the quantitative dispatching strategy of the grid system, expressed as A = (ΔP, ΔT), where ΔP represents the adjustment of the power of each unit of the grid system and ΔT represents the adjustment of the topology of the grid system;
(3) Reward: in the grid simulation environment, the dispatching strategy is comprehensively evaluated in real time, in the form of numerical reward feedback based on the operating state of the grid system and integrating aspects such as environmental friendliness, economy and safety. The reward function of the system is defined as follows, where r_a to r_f respectively denote the line-overload penalty, the new-energy curtailment penalty, the unit operating-cost penalty, the balancing-unit limit-violation penalty, the reactive-power limit-violation penalty and the node-voltage limit-violation penalty (penalties take negative values), and a_a to a_f denote the corresponding weight coefficients (a minimal data-structure sketch of S, A and r follows):

r = a_a·r_a + a_b·r_b + a_c·r_c + a_d·r_d + a_e·r_e + a_f·r_f
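The state, action and reward definitions above map naturally onto simple data types. The following is a minimal sketch under assumed, illustrative encodings; none of the field or function names below are fixed by the patent text:

from dataclasses import dataclass
import numpy as np

@dataclass
class GridState:      # S = (X, T)
    X: np.ndarray     # per-element attributes: P, Q, V, losses, outages, ...
    T: np.ndarray     # grid topology at time t (e.g. an adjacency matrix)

@dataclass
class GridAction:     # A = (dP, dT)
    dP: np.ndarray    # power adjustments for units, storage, adjustable loads
    dT: np.ndarray    # line and substation bus-bar switching decisions

def reward(penalties: dict, weights: dict) -> float:
    """Weighted sum r = sum_i a_i * r_i over the six penalty terms above;
    each penalty r_i is negative when the corresponding limit is violated."""
    return sum(weights[k] * penalties[k] for k in penalties)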
In the actual operation dispatching process, the model realizes real-time dispatching of the grid system through continuous interaction with the grid simulation environment, maintaining the current and historical multi-step information as a sequence of "reward-state-action" triples (R_t, S_t, A_t), where S_t and A_t respectively denote the state and action at time t, and R_t = R_{t-1} - r_{t-1} denotes the target reward at time t. At any time t, the grid dispatching model receives the state S_t of the grid simulation environment at the current time and the grid environment reward r_t, and incorporates them into the historical multi-step information sequence; the new sequence τ_t = (R_{t-L}, S_{t-L}, A_{t-L}, ..., R_{t-1}, S_{t-1}, A_{t-1}, R_t, S_t) is then used as the input of the grid dispatching model at time t, where L denotes the historical multi-step training length of the decision model.
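As an illustration of this bookkeeping, the sketch below (class and method names are assumptions) maintains the last L "reward-state-action" triples and decrements the target reward by each realized reward:

from collections import deque

class TripleBuffer:
    """Sliding window of (R_k, S_k, A_k) triples; R is the return-to-go."""
    def __init__(self, L: int, target_return: float):
        self.R = target_return        # R_0: the desired total return
        self.buf = deque(maxlen=L)    # keeps at most L historical triples

    def record(self, S_t, A_t, r_t: float):
        # Store the triple for time t, then update R_{t+1} = R_t - r_t.
        self.buf.append((self.R, S_t, A_t))
        self.R -= r_t

    def context(self, S_t):
        # tau_t = (R_{t-L}, S_{t-L}, A_{t-L}, ..., R_t, S_t): the history
        # plus the current target reward and state, awaiting action A_t.
        return list(self.buf) + [(self.R, S_t)]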
The grid dispatching model is mainly divided into the reward distribution model and the strategy model. The reward distribution network corrects the reward signals by redistributing the rewards in the historical sequence, optimizing the reward distribution and producing the redistributed reward-corrected sequence τ'_t. The strategy network receives the reward-corrected sequence τ'_t as input and outputs the action A_t at the current time. The grid simulation environment receives the action A_t as the power and topology adjustment strategy, takes into account power demand changes and random environmental factors, and computes by simulation, according to the underlying power-flow convergence principle, the next state S_{t+1} and the grid environment reward r_{t+1}; this process repeats until the grid dispatching task ends.
Reward distribution network
The reward distribution network is mainly responsible for redistributing the sparse, delayed reward signals from the grid environment over the historical multi-step sequence, thereby providing corrected reward-signal input for the strategy network.
As shown in FIG. 2, the input of the reward distribution network is the state S_t of the grid environment at a certain moment and the action A_t, and its output is the reward f_t assigned by the model to adopting that strategy in that state. By inputting the historical multi-step state sequence (S_{t-L}, ..., S_{t-2}, S_{t-1}) and action sequence (A_{t-L}, ..., A_{t-2}, A_{t-1}), a reconstructed reward distribution scheme (f_{t-L}, ..., f_{t-2}, f_{t-1}) is obtained for the sequence, from which the new target reward R'_t = R'_{t-1} - f_{t-1} is calculated. The sequence corrected by the reward distribution network, τ'_t = (R'_{t-L}, S_{t-L}, A_{t-L}, ..., R'_{t-1}, S_{t-1}, A_{t-1}, R'_t, S_t), serves as the input sequence of the Transformer strategy model.
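A minimal sketch of such a redistribution network is given below, assuming an MLP head and illustrative layer sizes (the patent does not fix the network architecture); f_t is predicted from (S_t, A_t) and the corrected targets follow R'_t = R'_{t-1} - f_{t-1}:

import torch
import torch.nn as nn

class RewardRedistribution(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, S: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # S: (L, state_dim), A: (L, action_dim) -> f: (L,)
        return self.net(torch.cat([S, A], dim=-1)).squeeze(-1)

def corrected_targets(R0: float, f: torch.Tensor) -> torch.Tensor:
    # R'_0 = R0, R'_t = R'_{t-1} - f_{t-1}  =>  R'_t = R0 - sum_{k<t} f_k
    shifted = torch.cat([f.new_zeros(1), f[:-1]])
    return R0 - torch.cumsum(shifted, dim=0)

The corrected targets R'_t then replace the original target rewards when assembling τ'_t for the strategy model.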
Transformer policy network
The strategy network of the invention receives the reward-corrected sequence τ'_t produced by the reward distribution network and, by means of the internal relations between the grid dispatching strategy, the grid state and the target reward constructed by the self-attention mechanism, infers the next reasonable grid dispatching adjustment scheme A_t. The specific principle is as follows.
(1) Calculating the embedding vector: the model first maps the input sequence τ'_t to a grid information sequence X in the embedding space, so that the state, action and reward information of the grid system are uniformly represented in the mapped embedding space and the relations among them can be computed: X = (x_1, x_2, ...), x_k = E_x(I), I ∈ τ'_t.
(2) Calculating the attention: the Transformer strategy network of the present invention adopts the GPT-2 model structure, forming a multi-layer structure as shown in FIG. 3 by stacking several Decoders, where each Decoder consists of a masked multi-head self-attention layer (Masked Multi-Head Self-Attention) and a feed-forward neural network (Feed Forward Network), and the output of each layer undergoes residual connection (Residual Connections) and layer normalization (Layer Normalization).
The attention principle of the model is shown in the following formulas, where Q, K, V are respectively the query, key and value sub-vectors of the grid information sequence X, and W_Q, W_K, W_V are the linear transformation matrices generating the three sub-vectors at this layer. M is a mask matrix whose function is to mask future information when computing the attention weights of the grid information sequence, thereby realizing the causal mechanism. d_k is the dimension of the key sub-vector and serves to normalize the attention weights:
Q = X·W_Q
K = X·W_K
V = X·W_V
Attention(Q, K, V) = softmax(Q·K^T / √d_k + M)·V
Meanwhile, the model adopts a multi-head mechanism: several attention heads compute several attention patterns, and different heads model the correlations within the grid information sequence from different features and angles, such as correlations in the time domain and in the frequency domain, so as to better capture the details and complexity of the grid information.
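As an illustration, the sketch below renders the masked multi-head self-attention and the Decoder layer described above in PyTorch. The Q/K/V projections and the mask M follow the formulas in the text; hidden sizes, the FFN expansion factor and the norm placement are assumptions, not details given by the patent.

import math
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        self.W_Q = nn.Linear(d_model, d_model, bias=False)
        self.W_K = nn.Linear(d_model, d_model, bias=False)
        self.W_V = nn.Linear(d_model, d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        B, L, D = X.shape
        split = lambda t: t.view(B, L, self.h, self.d_k).transpose(1, 2)
        Q, K, V = split(self.W_Q(X)), split(self.W_K(X)), split(self.W_V(X))
        # M: upper-triangular -inf mask hides future positions (causality)
        M = torch.triu(X.new_full((L, L), float("-inf")), diagonal=1)
        A = torch.softmax(Q @ K.transpose(-2, -1) / math.sqrt(self.d_k) + M,
                          dim=-1)
        return self.proj((A @ V).transpose(1, 2).reshape(B, L, D))

class DecoderBlock(nn.Module):
    """Masked self-attention + feed-forward, each with residual + LayerNorm."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = CausalSelfAttention(d_model, n_heads)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        X = self.ln1(X + self.attn(X))   # residual connection + layer norm
        return self.ln2(X + self.ffn(X))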
Two-level optimization algorithm
For the training of the model, data examples consisting of states S_real, rewards r_real and actions A_real sampled from actual grid operation dispatching scenarios are used to train the model offline, and the error between the action prediction A_pred output by the strategy model and the ground-truth action A_real in the data example is taken as the loss function of the strategy model, for example the squared error

L_train(φ, θ) = (1/Γ)·Σ_t ‖A_pred,t - A_real,t‖²

where A_pred depends on θ through the reward-corrected input sequence τ'_t.
to ensure efficient alternate training and updating of policy and reward distribution networks, we designed a two-layer optimization algorithm to decouple scheduling policy and reward distribution problems. Specifically, we divide the data set into training and validation sets at a certain ratio, and correct the loss functions of the strategy model and the reward distribution model as follows:
wherein,and->Respectively, policy models are in verification set and testError in the set, phi and theta are parameters of the strategy model and the rewarding distribution model respectively, Γ is the total length of the decision sequence, and λ is the weight factor of the regularization term. To improve the training efficiency of the model, the outer layer optimization target is approximately replaced by the following method:
(1) We approximate the replacement of the outer layer optimization objective by a one-step gradient approximation:
(2) A step degree is unfolded through a chain type derivation rule:
(3) The vector product of the first and second order gradients is further replaced by taylor expansion:
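As an illustration of the alternating update that this approximation enables, the sketch below implements the first-order variant (the second-order finite-difference correction of step (3) is omitted for brevity). All function and parameter names are assumptions:

import torch

def bilevel_step(policy, reward_net, train_batch, val_batch,
                 loss_fn, opt_phi, opt_theta):
    # Inner level: min_phi L_train(phi, theta). The corrected sequence
    # tau'_t is built by routing rewards through reward_net, so the loss
    # depends on both parameter sets.
    opt_phi.zero_grad()
    loss_fn(policy, reward_net, train_batch).backward()
    opt_phi.step()

    # Outer level: min_theta L_val(phi*(theta), theta), with phi*(theta)
    # approximated by the freshly updated phi (one-step approximation).
    opt_theta.zero_grad()
    loss_fn(policy, reward_net, val_batch).backward()
    opt_theta.step()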

Claims (4)

1. A power grid operation adjustment method based on a generative decision model, characterized by: selecting a Transformer model as the decision model with a causal self-attention mechanism, constructing an intelligent grid decision method framework based on the causal self-attention mechanism, adopting the constructed grid dispatching model designed in two parts, a reward distribution model and a strategy model, adaptively redistributing the reward signal by means of the reward distribution network, the strategy model receiving the reward-corrected sequence τ'_t as input and outputting the action A_t at the current time, so that the online dispatching strategy of the grid system is generated in real time; specifically, the online grid dispatching strategy comprises a power dispatching strategy and a topology dispatching strategy, wherein the power dispatching strategy mainly comprises adjusting the power of the different generating units in the system, the charging and discharging power of the energy storage equipment and the adjustable load power, and the topology dispatching strategy mainly comprises deciding the disconnection and connection of each line in the grid system and the connection of each branch line and busbar in the substations; constructing a grid simulation environment module to simulate the actual operating conditions of the grid, the module receiving the action A_t as the online power and topology dispatching strategy, taking into account power demand changes and random environmental factors, and computing by simulation, according to the underlying power-flow convergence principle, the next state S_{t+1} and the grid environment reward r_{t+1}, repeating until the grid dispatching task ends; and on this basis, updating and optimizing the strategy network and the reward distribution network efficiently and alternately through a two-level optimization algorithm, finally realizing the decisions on the disconnection and connection of each line in the grid system, on the energy power adjustment dispatching strategy, and on the connection of each branch line and busbar in the substations.
2. The power grid operation adjustment method based on a generative decision model according to claim 1, characterized in that: the input of the reward distribution model is the state S_t of the grid environment at a certain moment and the action A_t, and its output is the reward f_t assigned by the model to adopting that strategy in that state; by inputting the historical multi-step state sequence (S_{t-L}, ..., S_{t-2}, S_{t-1}) and action sequence (A_{t-L}, ..., A_{t-2}, A_{t-1}), a reconstructed reward distribution scheme (f_{t-L}, ..., f_{t-2}, f_{t-1}) is obtained for the sequence, from which the new target reward R'_t = R'_{t-1} - f_{t-1} is calculated, and the sequence corrected by the reward distribution network, τ'_t = (R'_{t-L}, S_{t-L}, A_{t-L}, ..., R'_{t-1}, S_{t-1}, A_{t-1}, R'_t, S_t), serves as the input sequence of the Transformer strategy model.
3. The power grid operation adjustment method based on a generative decision model according to claim 2, characterized in that: the strategy model receives the reward-corrected sequence τ'_t produced by the reward distribution network and, by means of the internal relations between the grid dispatching strategy, the grid state and the target reward constructed by the self-attention mechanism, infers the next reasonable grid dispatching adjustment scheme A_t, the specific implementation being as follows:
(1) Calculating the embedding vector: the model first maps the input sequence τ'_t to a grid information sequence X in the embedding space, so that the state, action and reward information of the grid system are uniformly represented in the mapped embedding space and the relations among them can be computed: X = (x_1, x_2, ...), x_k = E_x(I), I ∈ τ'_t;
(2) Calculating the attention: the Transformer strategy network adopts the GPT-2 model structure, forming a multi-layer structure by stacking several Decoders, where each Decoder consists of a masked multi-head self-attention layer and a feed-forward neural network and the output of each layer undergoes residual connection and layer normalization; the attention principle is as follows, where Q, K, V are respectively the query, key and value sub-vectors of the grid information sequence X, W_Q, W_K, W_V are respectively the linear transformation matrices generating the three sub-vectors at this layer, M is a mask matrix used to mask future information when computing the attention weights of the grid information sequence, thereby realizing the causal mechanism, and d_k is the dimension of the key sub-vector, serving to normalize the attention weights:
Q = X·W_Q
K = X·W_K
V = X·W_V
Attention(Q, K, V) = softmax(Q·K^T / √d_k + M)·V
meanwhile, a multi-head mechanism is adopted: several attention heads compute several attention patterns, and different heads model the correlations within the grid information sequence from different features and angles, such as correlations in the time domain and in the frequency domain, so as to better capture the details and complexity of the grid information.
4. The power grid operation adjustment method based on a generative decision model according to claim 3, characterized in that the specific implementation of the two-level optimization algorithm is as follows:
for the training of the model, data examples consisting of states S_real, rewards r_real and actions A_real sampled from actual grid operation dispatching scenarios are used to train the model offline, and the error between the action prediction A_pred output by the strategy model and the ground-truth action A_real in the data example is taken as the loss function of the strategy model, for example the squared error L_train(φ, θ) = (1/Γ)·Σ_t ‖A_pred,t - A_real,t‖²;
the data set is then divided into a training set and a validation set at a certain ratio, and the loss functions of the strategy model and the reward distribution model are corrected to the two-level form min_θ L_val(φ*(θ), θ) + λ·Ω(θ) s.t. φ*(θ) = argmin_φ L_train(φ, θ), where L_val and L_train are the errors of the strategy model on the validation set and the training set, φ and θ are respectively the parameters of the strategy model and the reward distribution model, Γ is the total length of the decision sequence, and λ is the weight factor of the regularization term Ω; to improve the training efficiency of the model, the outer optimization objective is approximately replaced as follows:
(1) the outer optimization objective is approximated by a one-step gradient update: ∇_θ L_val(φ*(θ), θ) ≈ ∇_θ L_val(φ - ξ·∇_φ L_train(φ, θ), θ);
(2) the one-step gradient is expanded via the chain rule, writing φ' = φ - ξ·∇_φ L_train(φ, θ): = ∇_θ L_val(φ', θ) - ξ·∇²_{θ,φ} L_train(φ, θ)·∇_{φ'} L_val(φ', θ);
(3) the vector product of the second-order and first-order gradients is further replaced, via a Taylor expansion, by the finite difference ∇²_{θ,φ} L_train(φ, θ)·∇_{φ'} L_val(φ', θ) ≈ (∇_θ L_train(φ⁺, θ) - ∇_θ L_train(φ⁻, θ)) / (2ε), where φ± = φ ± ε·∇_{φ'} L_val(φ', θ).
CN202311103915.0A 2023-08-30 2023-08-30 Power grid operation adjustment method based on a generative decision model Pending CN117154845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311103915.0A CN117154845A (en) 2023-08-30 Power grid operation adjustment method based on a generative decision model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311103915.0A CN117154845A (en) 2023-08-30 Power grid operation adjustment method based on a generative decision model

Publications (1)

Publication Number Publication Date
CN117154845A true CN117154845A (en) 2023-12-01

Family

ID=88911319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311103915.0A Pending CN117154845A (en) 2023-08-30 2023-08-30 Power grid operation adjustment method based on generation type decision model

Country Status (1)

Country Link
CN (1) CN117154845A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117973233A (en) * 2024-03-29 2024-05-03 合肥工业大学 Converter control model training and oscillation suppression method based on deep reinforcement learning
CN118037968A (en) * 2023-12-27 2024-05-14 南京南瑞水利水电科技有限公司 Hydropower GIS situation simulation deduction method and system based on illusion engine


Similar Documents

Publication Publication Date Title
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
CN117154845A (en) Power grid operation adjustment method based on generation type decision model
CN114118375B (en) Sequential diagram transducer-based continuous dynamic network characterization learning method
CN114362187B (en) Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
CN110264012B (en) Renewable energy power combination prediction method and system based on empirical mode decomposition
CN111917134B (en) Power distribution network dynamic autonomous reconstruction method and system based on data driving
CN115940294B (en) Multi-stage power grid real-time scheduling strategy adjustment method, system, equipment and storage medium
CN113141012A (en) Power grid power flow regulation and control decision reasoning method based on deep deterministic strategy gradient network
CN112651519A (en) Secondary equipment fault positioning method and system based on deep learning theory
Chen et al. A multivariate grey RBF hybrid model for residual useful life prediction of industrial equipment based on state data
CN114429248A (en) Transformer apparent power prediction method
CN117220318B (en) Power grid digital driving control method and system
CN114384931A (en) Unmanned aerial vehicle multi-target optimal control method and device based on strategy gradient
CN112101651B (en) Electric energy network coordination control method, system and information data processing terminal
Li et al. Multiagent deep meta reinforcement learning for sea computing-based energy management of interconnected grids considering renewable energy sources in sustainable cities
Abiyev Fuzzy wavelet neural network for prediction of electricity consumption
CN114707613B (en) Layered depth strategy gradient network-based power grid regulation and control method
CN116300755A (en) Double-layer optimal scheduling method and device for heat storage-containing heating system based on MPC
Huang et al. Short-term tie-line power prediction based on CNN-LSTM
Chiaberge et al. Mixing fuzzy, neural and genetic algorithms in an integrated design environment for intelligent controllers
CN114298429A (en) Power distribution network scheme aided decision-making method, system, device and storage medium
CN114372418A (en) Wind power space-time situation description model establishing method
Qin et al. Data-based reinforcement learning with application to wind turbine pitch control
Obert et al. Efficient distributed energy resource voltage control using ensemble deep reinforcement learning
Patil et al. A Comprehensive Analysis of Artificial Intelligence Integration in Electrical Engineering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination