CN115360772A

CN115360772A - Power system active safety correction control method, system, equipment and storage medium

Info

Publication number: CN115360772A
Application number: CN202210289577.3A
Authority: CN
Inventors: 王一迪; 於益军; 李立新; 刘金波; 马晓忱; 李理; 吕闫; 唐俊刺; 李铁; 李桐; 徐瑕龄; 韩巍; 罗雅迪; 孙博; 刘蒙; 张�浩; 曹坤; 王淼; 狄方春; 张�杰
Original assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI; State Grid Tianjin Electric Power Co Ltd; State Grid Jibei Electric Power Co Ltd; State Grid Liaoning Electric Power Co Ltd; Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI; State Grid Tianjin Electric Power Co Ltd; State Grid Jibei Electric Power Co Ltd; State Grid Liaoning Electric Power Co Ltd; Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Priority date: 2022-03-23
Filing date: 2022-03-23
Publication date: 2022-11-18
Anticipated expiration: 2042-03-23
Also published as: CN115360772B

Abstract

The invention discloses a method, system, device and storage medium for active safety correction control of a power system. The method comprises: acquiring real-time operation data of the power system; inputting the real-time operation data into a trained agent to obtain an active safety correction control scheme for the power system; and performing active safety correction control on the power system according to that scheme.

Description

Power system active safety correction control method, system, equipment and storage medium

Technical Field

The invention belongs to the technical field of smart grids and relates to active safety correction control, in particular to an active safety correction control method, system, device and storage medium for a power system.

Background

The task of a power system is to supply users, without interruption, with sufficient electrical energy whose quality complies with regulations. Grid interconnection keeps deepening, new technologies are being applied, and large amounts of random, intermittent and time-varying renewable generation are being connected. As a result, fluctuations in system power and changes in power flow have increased greatly, the power system has become increasingly complex, the form and characteristics of the grid are changing profoundly, and security and stability levels constrain each other. At the same time, stricter requirements for economic grid operation and the constraints introduced by electricity marketization push the operating point of the power system toward its security boundary. Together these factors create new security problems for the power system. Safety correction of the power system provides the theoretical basis and technical support for keeping the grid as economical and secure as possible under severe conditions, and operators need a good system security analysis tool that supplies safe operating strategies to raise the level of secure operation. Without changing the structure of the transmission network, traditional active safety correction of a power system uses one of two methods:

(1) Sensitivity analysis method. One or more generators with high sensitivity to the power of a target transmission section (or group of sections) are selected, and their outputs are adjusted to change the power transmitted across the target section.

(2) Optimization (mathematical programming) method. The active safety correction problem of the power system is converted into an optimization problem whose objective is the minimum number of adjusted elements or the minimum total adjustment, and it is solved by mathematical programming subject to the various security constraints of the system.

The sensitivity method has poor accuracy and sometimes exhibits a see-saw phenomenon, in which the same generating units are adjusted back and forth repeatedly. The mathematical programming method tends to adjust too many devices, computes slowly, and may fail to converge. Because active safety correction control is used in real time, traditional methods struggle to meet speed and accuracy requirements at the same time, and in practical active safety correction control accuracy is usually sacrificed for computation speed.
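As a rough illustration of the sensitivity method in (1), the Python sketch below picks the generator whose output has the largest absolute sensitivity on one overloaded section. It then computes the linearised change that brings the section back to its limit. The function name and the sensitivity values are hypothetical. The sketch assumes sensitivities are measured against a slack bus that absorbs the resulting power imbalance. It handles a single section only, and the see-saw effect described above typically appears when several sections are corrected this way in turn.

```python
def sensitivity_correction(section_flow, section_limit, sensitivities):
    """Return (generator index, output change in MW) that removes an overload
    on one section, using a first-order (linear) sensitivity model.

    sensitivities[j] is the assumed dP_section / dP_Gj, with the slack bus
    taking up the power imbalance caused by the change (assumption).
    """
    overload = section_flow - section_limit
    if overload <= 0:
        return None, 0.0  # section already within its limit
    # Choose the generator with the largest absolute sensitivity.
    j = max(range(len(sensitivities)), key=lambda k: abs(sensitivities[k]))
    if sensitivities[j] == 0:
        raise ValueError("no generator influences this section")
    # Linear estimate: new_flow = flow + s_j * delta; solve new_flow = limit.
    delta = -overload / sensitivities[j]
    return j, delta


if __name__ == "__main__":
    gen, delta = sensitivity_correction(120.0, 100.0, [0.2, -0.5, 0.7])
    print(gen, round(delta, 2))  # generator 2, reduced by about 28.57 MW
```

Because the result is only a first-order estimate, a real implementation would re-run the load flow after the change. If that run reveals another overload, the selection step is repeated.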

Disclosure of Invention

The object of the invention is to overcome the shortcomings of the prior art by providing a method, system, device and storage medium for active safety correction control of a power system that carry out active safety correction both quickly and accurately.

To achieve this object, the invention adopts the following technical solution:

In a first aspect, the invention provides an active safety correction control method for a power system, comprising:

acquiring real-time operation data of the power system;

inputting the real-time operation data of the power system into a trained agent to obtain an active safety correction control scheme for the power system, and performing active safety correction control on the power system according to that scheme.

In a further refinement of the active safety correction control method of the power system:

before the real-time operation data of the power system is input into the trained agent, the method further comprises:

establishing a classified experience replay pool comprising a success experience replay area and a failure experience replay area;

extracting samples from the classified experience replay pool;

training the agent with the extracted samples to obtain the trained agent.

The state space s used when training the agent with the extracted samples is:

$$s = \left[ P_{Gj}^{i},\; P_{Lk}^{i},\; P_{Dm}^{i} \right],\quad i = 1,\ldots,M$$

where $P_{Gj}^{i}$ is the active output of generator j in the i-th sample, $P_{Lk}^{i}$ is the line power flow of line k in the i-th sample, $P_{Dm}^{i}$ is the power of load m in the i-th sample, $j = 1,\ldots,n_{gen}$, $k = 1,\ldots,n_{line}$, $m = 1,\ldots,n_{load}$, $n_{gen}$ is the number of generators, $n_{line}$ the number of branches, $n_{load}$ the number of loads, and M the number of samples taken.

The action space used when training the agent with the extracted samples is as follows:

the actions are the controllable variables of the load flow calculation, namely the continuous generator output adjustments. The action of the agent at time t is $a_t = [\Delta P_{G1}, \ldots, \Delta P_{Gn_{gen}}]$, where $\Delta P_{Gj}$ is the power adjustment of generator j.
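The Actor network described later ends in a Sigmoid layer, so its raw outputs lie in [0, 1]. The Python sketch below shows one way such outputs might be turned into adjustments $\Delta P_{Gj}$ and new set-points. It assumes each output u is mapped linearly onto a symmetric range [-dp_max, +dp_max] and that the result is clipped to each unit's limits. The mapping, the names `scale_action` and `apply_action`, and `dp_max` are all assumptions for illustration and are not taken from the source.

```python
def scale_action(sigmoid_out, dp_max):
    """Map Sigmoid outputs in [0, 1] to adjustments in [-dp_max, +dp_max] (MW).

    Hypothetical linear mapping: u = 0.5 means 'no change'.
    """
    return [(2.0 * u - 1.0) * m for u, m in zip(sigmoid_out, dp_max)]


def apply_action(p_gen, delta, p_min, p_max):
    """Add the adjustments a_t = [dP_G1, ...] and clip to each unit's limits."""
    return [min(max(p + d, lo), hi) for p, d, lo, hi in zip(p_gen, delta, p_min, p_max)]


if __name__ == "__main__":
    delta = scale_action([0.5, 1.0, 0.25], [10.0, 20.0, 40.0])
    print(delta)  # [0.0, 20.0, -20.0]
    print(apply_action([50.0, 90.0, 30.0], delta, [0.0, 0.0, 20.0], [100.0, 100.0, 100.0]))
    # [50.0, 100.0, 20.0]
```

Clipping is only one option. The method described here penalises output-limit violations through the reward term $r_2$ instead, so an implementation could skip the clip and let the reward shape the policy.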

The reward function used when training the agent with the extracted samples is:

$$R = \nu_1 r_1 + \nu_2 r_2 + \nu_3 r_3$$

where $r_1$, $r_2$ and $r_3$ are the reward values for line limit violations, for generator output limit violations and for generator cost, respectively, and $\nu_1$, $\nu_2$ and $\nu_3$ are weight coefficients.

The reward value $r_1$ for line limit violations is computed from the following quantities: $n_{line}$, the number of branches of the grid; $I_i$ and $T_i$, the current and the thermal limit of branch i, respectively; and ε, a constant.

The reward value $r_2$ for the generator output limit constraint is computed from the following quantities: $n_{gen}$, the number of generators; $P_{Gi}$, the output of generator i; and $P_{Gi\max}$ and $P_{Gi\min}$, the upper and lower limits of the generator's active output.

The reward value $r_3$ for generator cost is computed from the following quantities: n, the number of units; $P_{Gi}$, the output of generator i; a, b and c, the generation cost coefficients; and d, the start-stop coefficient of the unit.

In a second aspect, the invention provides an active safety correction control system for a power system, comprising:

an acquisition module for acquiring real-time operation data of the power system;

a control module for inputting the real-time operation data of the power system into the trained agent to obtain an active safety correction control scheme for the power system, and for performing active safety correction control on the power system according to that scheme.

In a third aspect, the invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the above active safety correction control method for a power system.

In a fourth aspect, the invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above active safety correction control method for a power system.

The invention has the following beneficial effects:

The method, system, device and storage medium perform active safety correction control of a power system by deep reinforcement learning and do not depend on a specific physical model at run time. This avoids the accuracy and computation-time problems caused by repeated adjustment.

Further, the classified experience replay pool comprises a success experience replay area and a failure experience replay area, which improves the efficiency of agent training.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a flow chart of agent training;

FIG. 3 is a structural diagram of the system of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments, and do not limit the scope of the disclosure of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

There is shown in the drawings a schematic block diagram of a disclosed embodiment in accordance with the invention. The figures are not drawn to scale, wherein certain details are exaggerated and some details may be omitted for clarity of presentation. The shapes of the various regions, layers and their relative sizes, positional relationships are shown in the drawings as examples only, and in practice deviations due to manufacturing tolerances or technical limitations are possible, and a person skilled in the art may additionally design regions/layers with different shapes, sizes, relative positions, according to the actual needs.

As is known, safety correction in the steady-state domain eliminates potential limit violations by re-dispatching the controllable variables of the system. It is generally divided into two sub-problems: active safety correction and reactive safety correction.

Active safety correction control: when the active power on a branch of the system exceeds its limit, the active outputs of generators are adjusted to redistribute the power flow and eliminate the branch overload.

Power flow limit violation: the branch current exceeds its rated value.

Generator output: the active power that a generating unit injects into the grid.

Example one

Referring to FIG. 1, the active safety correction control method for a power system according to the present invention comprises the following steps:

1) Collect multi-scenario grid data, including power flow violation information, the topology of the power system, line parameters, generator parameters, load parameters and grid power flow information.

2) Perform load flow calculation on the multi-scenario grid data to obtain real-time grid operation data;

3) Determine whether any power flow limit in the grid is violated. If so, use the trained agent to obtain an active safety correction control scheme from the grid operation data, and perform active safety correction control on the power system according to that scheme; otherwise, no active safety correction is required.
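Steps 1) to 3) amount to a check-then-act loop. The Python sketch below assumes the load flow solver and the trained agent are supplied as callables. The names `run_load_flow` and `agent` are hypothetical placeholders, not part of any named library. A branch counts as violated when the magnitude of its flow exceeds its limit.

```python
def overloaded_branches(flows, limits):
    """Indices of branches whose flow magnitude exceeds its limit."""
    return [k for k, (f, lim) in enumerate(zip(flows, limits)) if abs(f) > lim]


def correction_step(grid_data, limits, run_load_flow, agent):
    """Steps 2)-3): run a load flow and, only if a branch is overloaded, ask
    the trained agent for a correction scheme. Returns None when no
    correction is needed."""
    flows = run_load_flow(grid_data)            # step 2): real-time operation data
    if not overloaded_branches(flows, limits):
        return None                             # grid is secure, nothing to do
    return agent(grid_data, flows)              # step 3): agent proposes the scheme


if __name__ == "__main__":
    fake_flow = lambda data: data["flows"]          # stand-in for a real load flow
    fake_agent = lambda data, flows: [-10.0, 10.0]  # stand-in for the trained TD3 agent
    print(correction_step({"flows": [80.0, 130.0]}, [100.0, 100.0], fake_flow, fake_agent))
    # [-10.0, 10.0]
    print(correction_step({"flows": [80.0, 90.0]}, [100.0, 100.0], fake_flow, fake_agent))
    # None
```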

The specific process of training the agent is as follows:

31) Build a classified experience replay pool;

The state, action, reward and next state of each step, $(s_t, a_t, r, s_{t+1})$, form one experience, which is stored in the classified experience replay pool. The pool counters the data correlation and non-stationary distribution that arise when the agent interacts with the environment;

32) Train the TD3 agent offline;

Draw several experiences (samples) from the classified experience replay pool and train the TD3 agent with them. During training, the grid power flow is recalculated after each action until all power flow violations are eliminated or the maximum number of iterations is reached;
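The inner loop of step 32) recalculates the power flow after every action until no violation remains or an iteration cap is reached. The Python sketch below illustrates it with a toy linear (DC-style) flow model. The shift-factor matrix `H`, the base flows and the hand-written policy are hypothetical stand-ins. In the method described here, the policy would be the TD3 Actor and the flows would come from a full load flow calculation.

```python
def line_flows(p_gen, shift_factors, base_flows):
    """Toy linear flow model (assumed): flow_k = base_k + sum_j H[k][j] * P_Gj."""
    return [b + sum(h * p for h, p in zip(row, p_gen))
            for row, b in zip(shift_factors, base_flows)]


def has_violation(flows, limits):
    return any(abs(f) > lim for f, lim in zip(flows, limits))


def run_episode(p_gen, policy, shift_factors, base_flows, limits, max_iter=10):
    """Apply the policy, recompute flows after each action, and stop when no
    line is overloaded or max_iter actions have been taken.

    Returns (final outputs, number of actions taken, success flag). The flag
    is what would route the episode's experiences into T_S or T_f.
    """
    for step in range(max_iter):
        flows = line_flows(p_gen, shift_factors, base_flows)
        if not has_violation(flows, limits):
            return p_gen, step, True
        delta = policy(p_gen, flows)
        p_gen = [p + d for p, d in zip(p_gen, delta)]
    flows = line_flows(p_gen, shift_factors, base_flows)
    return p_gen, max_iter, not has_violation(flows, limits)


if __name__ == "__main__":
    H = [[0.5, -0.5]]                         # one line, two generators (hypothetical)
    shift_10mw = lambda p, f: [-10.0, 10.0]   # balanced re-dispatch: changes sum to 0
    print(run_episode([100.0, 40.0], shift_10mw, H, [0.0], [20.0]))
    # ([90.0, 50.0], 1, True)
```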

33) Obtain the trained TD3 agent;

By maximizing the accumulated return, the TD3 agent learns to handle a variety of violation conditions. The trained TD3 agent can make real-time decisions for the online power system and provide active safety correction control schemes for power flow violations.

34) Perform active safety correction control on the operating power system with the trained TD3 agent in response to grid problems;

When a power flow violation exists, the trained TD3 agent outputs an active safety correction control scheme and the grid performs active safety correction control according to it, so an adjustment scheme is given online in a single interaction. When the grid operation data are within a reasonable range, that is, the operating conditions are satisfied, the grid can operate safely and no active safety correction is needed.

The modeling process of the active safety correction control decision is as follows:

311) Obtain the state space;

The state consists of the variables of the units and the surrounding grid that the agent can observe. The state space should cover as many decision-relevant factors as possible. Its continuous state variables comprise generator output, load power and line power flow, characterized as:

$$s = \left[ P_{Gj}^{i},\; P_{Lk}^{i},\; P_{Dm}^{i} \right],\quad i = 1,\ldots,M \tag{1}$$

where $P_{Gj}^{i}$ is the active output of generator j in the i-th sample, $P_{Lk}^{i}$ is the active power flow of line k in the i-th sample, $P_{Dm}^{i}$ is the power of load m in the i-th sample, $j = 1,\ldots,n_{gen}$, $k = 1,\ldots,n_{line}$, $m = 1,\ldots,n_{load}$, $n_{gen}$ is the number of generators, $n_{line}$ the number of branches, $n_{load}$ the number of loads, and M the number of samples.
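The state described above is a per-sample concatenation of three groups of measurements. The Python sketch below builds that vector for one sample and stacks M samples. The ordering (generators, then lines, then loads) and the function names are assumptions for illustration. A practical implementation would usually also normalise each feature before feeding it to the networks.

```python
def build_state(gen_p, line_p, load_p):
    """One sample's state: generator outputs, then line active flows, then loads."""
    return [float(x) for x in (*gen_p, *line_p, *load_p)]


def build_state_batch(samples):
    """Stack M samples, each a (gen_p, line_p, load_p) tuple, into an
    M x (n_gen + n_line + n_load) nested list."""
    return [build_state(gen, line, load) for gen, line, load in samples]


if __name__ == "__main__":
    samples = [
        ([120.0, 80.0], [35.0, -12.5, 60.0], [90.0, 110.0]),
        ([118.0, 85.0], [33.0, -10.0, 62.0], [92.0, 111.0]),
    ]
    batch = build_state_batch(samples)
    print(len(batch), len(batch[0]))  # 2 7  (2 samples x (2 + 3 + 2) features)
```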

312) Determine the action space;

313) The reward function used in agent training is:

$$R = \nu_1 r_1 + \nu_2 r_2 + \nu_3 r_3 \tag{2}$$

where $r_1$, $r_2$ and $r_3$ are the reward values for line limit violations, for generator output limit violations and for generator cost, respectively, and $\nu_1$, $\nu_2$ and $\nu_3$ are weight coefficients;

The reward value $r_1$ for line limit violations is computed from the following quantities: $n_{line}$, the number of branches of the grid; $I_i$ and $T_i$, the current and the thermal limit of branch i, respectively; and ε, a small constant, generally taken between 0 and 1, that prevents the denominator from becoming 0.

The reward value $r_2$ for the generator output limit constraint is computed from the following quantities: $n_{gen}$, the number of generators; $P_{Gi}$, the output of generator i; and $P_{Gi\max}$ and $P_{Gi\min}$, the upper and lower limits of the generator's active output.

The reward value $r_3$ for generator cost is computed from the following quantities: n, the number of units; $P_{Gi}$, the output of generator i; a, b and c, the generation cost coefficients; and d, the start-stop coefficient of the unit.
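The exact expressions for $r_1$ and $r_2$ are not given here, so the Python sketch below uses hypothetical forms that respect the quantities listed above. For $r_1$, it takes minus the total loading ratio $I_i/(T_i+\varepsilon)$ above 1. For $r_2$, it takes minus the total MW outside $[P_{Gi\min}, P_{Gi\max}]$. For $r_3$, it assumes a quadratic cost $aP^2 + bP + c$ plus the start-stop coefficient d treated as an additive term, negated so that cheaper dispatch earns more reward. The weights ν are placeholders, and all function names are illustrative.

```python
def line_reward(currents, thermal_limits, eps=0.5):
    """Hypothetical r1: minus the total loading ratio above 1.0 on every
    branch; eps keeps the denominator away from zero."""
    return -sum(max(0.0, i / (t + eps) - 1.0) for i, t in zip(currents, thermal_limits))


def gen_limit_reward(p_gen, p_min, p_max):
    """Hypothetical r2: minus the total MW by which outputs leave [P_min, P_max]."""
    return -sum(max(0.0, p - hi) + max(0.0, lo - p) for p, lo, hi in zip(p_gen, p_min, p_max))


def cost_reward(p_gen, a, b, c, d):
    """r3 as a negative generation cost, assuming a*P^2 + b*P + c per unit
    plus its start-stop coefficient d (treated as an additive term)."""
    return -sum(ai * p * p + bi * p + ci + di for p, ai, bi, ci, di in zip(p_gen, a, b, c, d))


def total_reward(r1, r2, r3, nu=(1.0, 1.0, 0.01)):
    """R = nu1*r1 + nu2*r2 + nu3*r3 (weights are placeholders)."""
    return nu[0] * r1 + nu[1] * r2 + nu[2] * r3


if __name__ == "__main__":
    r1 = line_reward([120.0, 80.0], [100.0, 100.0])
    r2 = gen_limit_reward([110.0, 40.0], [10.0, 10.0], [100.0, 100.0])
    r3 = cost_reward([110.0, 40.0], [0.01, 0.02], [2.0, 3.0], [5.0, 5.0], [0.0, 0.0])
    print(r1, r2, r3, total_reward(r1, r2, r3))
```

Because the cost term is typically orders of magnitude larger than the violation terms, its weight ν₃ generally has to be small. Otherwise the agent may trade security for cheaper dispatch.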

The classified experience replay pool in step 31) is divided into a success experience replay area and a failure experience replay area, as follows:

The conventional TD3 algorithm uses a single experience replay area to break the correlation and non-stationary distribution of the data. Tuples $(s_t, a_t, r, s_{t+1})$ are stored in it as units, and at each update the Actor and Critic networks randomly draw a subset of samples for optimization. This uniform random sampling leads to low training efficiency and poor convergence. The classified experience replay pool is therefore split into a success area and a failure area. If the power flow is no longer out of limit after safety correction, the task has succeeded and the experience is stored in the success area, denoted $T_S$. If the power flow is still out of limit, the task has failed and the experience is stored in the failure area, denoted $T_f$. Because rewards in reinforcement learning are delayed, some of the experiences stored in $T_S$ immediately before a failure are also related to that failure, so this portion of experience is drawn from $T_S$ in proportion η.

The improved sampling mode is as follows:

At each update step, a successful experience $(s_t, a_t, r, s_{t+1})$ is stored in $T_S$, and a failed experience is stored in $T_f$; at the same time, failure experiences are drawn from $T_f$ in proportion η.
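The Python sketch below shows one possible data structure for the classified replay pool. Its sampling rule is an interpretation: each mini-batch takes a fraction η of its samples from $T_f$ and the remainder from $T_S$, falling back to whatever is available. The source's additional handling of the $T_S$ experiences stored just before a failure is not modelled. The class name, the capacity and the default η are assumptions.

```python
import random
from collections import deque, namedtuple

# One experience (s_t, a_t, r, s_{t+1}).
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ClassifiedReplayPool:
    """Replay pool split into a success area T_S and a failure area T_f.

    Assumption: each mini-batch takes a fraction eta of its samples from T_f
    and the rest from T_S.
    """

    def __init__(self, capacity=100_000, eta=0.3, seed=None):
        self.success = deque(maxlen=capacity)  # T_S
        self.failure = deque(maxlen=capacity)  # T_f
        self.eta = eta
        self.rng = random.Random(seed)

    def store(self, transition, succeeded):
        """Route an experience by whether the correction removed all violations."""
        (self.success if succeeded else self.failure).append(transition)

    def sample(self, batch_size):
        n_fail = min(len(self.failure), round(self.eta * batch_size))
        n_succ = min(len(self.success), batch_size - n_fail)
        batch = (self.rng.sample(list(self.failure), n_fail)
                 + self.rng.sample(list(self.success), n_succ))
        self.rng.shuffle(batch)
        return batch


if __name__ == "__main__":
    pool = ClassifiedReplayPool(eta=0.3, seed=0)
    for k in range(20):
        ok = bool(k % 2)
        pool.store(Transition([k], [0.0], 1.0 if ok else -1.0, [k + 1]), succeeded=ok)
    batch = pool.sample(10)
    print(sum(t.reward < 0 for t in batch), "failed of", len(batch))  # 3 failed of 10
```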

A mini-batch of training data is sampled from the reconstructed experience replay pool. TD3 updates the parameters of the current networks by gradient ascent (Actor) and gradient descent (Critic), and updates the parameters of the target networks by a moving average, so that the target parameters change slowly and learning is more stable.

In step 32), the TD3 agent is trained offline as follows:

Referring to FIG. 2, the TD3 agent consists of current networks and target networks, each divided into a policy network that executes actions and a value network that evaluates the quality of those actions. Every policy and value network is a fully connected neural network with an input layer, an output layer and several hidden layers. The Actor part has two networks, the Actor network and the Actor_target network; the Critic part has four, the Critic_1, Critic_target1, Critic_2 and Critic_target2 networks.

The Actor network is the policy network: it takes the state as input and outputs the action to take in the current state. The Critic networks are the current value networks: they take the state and action as input and output the value of executing that action in the current environment state, thereby evaluating the Actor's output and generating the gradient used to update the Actor. The Actor_target network is the target policy network, which selects the best next action for the next state sampled from the experience replay pool and is updated accordingly. The Critic_target1 and Critic_target2 networks are the target value networks, which compute target values and are likewise updated.

The Actor network, which fits the policy, has parameters θ. Its input is the current state $s_t$ and its output is the generator action $a_t$. Its hidden layers use the ReLU activation for nonlinearity, its output layer uses the Sigmoid activation, and its parameters are updated by the deterministic policy gradient theorem:

$$\nabla_\theta J \approx \frac{1}{N}\sum_{i=1}^{N} \nabla_a Q(s_i, a)\big|_{a=\mu(s_i)}\, \nabla_\theta \mu(s_i)$$

where $\nabla_\theta J$ is the gradient of the objective function with respect to θ, N is the number of samples randomly drawn from the success experience replay pool, μ and Q denote the Actor network and the Critic network, i is the sample index, $s_i$ is the state vector of the i-th sample, and a is the action at the current time.

The Actor_target network has parameters θ'. Its input is the next state $s_{t+1}$ and its output is the action $a_{t+1}$ for the next state.

The Critic_1 and Critic_2 networks, which fit the state-action value function, have parameters $w_1$ and $w_2$. Their inputs are the current state $s_t$ and the action actually executed, $a_t$, and their outputs are the state-action values $Q_{w_1}$ and $Q_{w_2}$. Their hidden layers use the ReLU activation for nonlinearity, their output layers use the Tanh activation, and their parameters are updated by mini-batch stochastic gradient descent:

$$\nabla_{w_j} L = -\frac{2}{N}\sum_{i=1}^{N}\left(y_i - Q_{w_j}(s_i, a_i)\right)\nabla_{w_j} Q_{w_j}(s_i, a_i),\quad j = 1, 2$$

where $\nabla_{w_j} L$ is the gradient of the loss function, and $y_i$ and $a_i$ are the temporal-difference target value and the action of the i-th sample, respectively.
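The six networks and their activations can be sketched in plain Python. The forward pass below follows the description: ReLU hidden layers, a Sigmoid output for the Actor, and a Tanh output for the Critics, whose input is the state concatenated with the action. Each target network starts as a copy of its current network. Layer sizes, seeds and the initialisation range are assumptions. Gradient-based training is left out; a real implementation would normally use a deep-learning framework for that.

```python
import copy
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MLP:
    """Fully connected network: ReLU on hidden layers, `out_act` on the output layer."""

    def __init__(self, sizes, out_act, seed=0):
        rng = random.Random(seed)
        self.out_act = out_act
        # One (weights, biases) pair per layer; small random weights, zero biases.
        self.layers = [
            ([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def __call__(self, x):
        last = len(self.layers) - 1
        for idx, (w, b) in enumerate(self.layers):
            z = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
            act = self.out_act if idx == last else relu
            x = [act(v) for v in z]
        return x


def make_td3_networks(n_state, n_action, hidden=64):
    """Actor (Sigmoid output), two Critics (Tanh output), and a target copy of each."""
    actor = MLP([n_state, hidden, hidden, n_action], sigmoid, seed=1)
    critic1 = MLP([n_state + n_action, hidden, hidden, 1], math.tanh, seed=2)
    critic2 = MLP([n_state + n_action, hidden, hidden, 1], math.tanh, seed=3)
    return {
        "actor": actor, "actor_target": copy.deepcopy(actor),
        "critic_1": critic1, "critic_target1": copy.deepcopy(critic1),
        "critic_2": critic2, "critic_target2": copy.deepcopy(critic2),
    }


def q_value(critic, state, action):
    """Critic input is the state concatenated with the action."""
    return critic(list(state) + list(action))[0]


if __name__ == "__main__":
    nets = make_td3_networks(n_state=7, n_action=2, hidden=16)
    s = [1.2, 0.8, 0.35, -0.125, 0.6, 0.9, 1.1]
    a = nets["actor"](s)
    print(a, q_value(nets["critic_1"], s, a))
```

One consequence of the Tanh output worth noting: Q values are bounded to (-1, 1), so the rewards and the discounted return would need to be scaled into that range.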

The Critic_target1 and Critic_target2 networks have parameters $w'_1$ and $w'_2$. Their input is the next state $s_{t+1}$ together with the next-state action output by the target policy network, and their outputs are the next-state action values $Q'_{w_1}$ and $Q'_{w_2}$. To prevent overestimation of the Q value, the TD3 agent takes the smaller of the two target networks' Q values. Both Critic networks are therefore updated with the same temporal-difference target y of equation (9) and share the loss function of equation (10):

$$y_i = r_i + \gamma \min_{j=1,2} Q_{w'_j}\left(s_i', \mu'(s_i')\right) \tag{9}$$

$$L(w_j) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - Q_{w_j}(s_i, a_i)\right)^2,\quad j = 1, 2 \tag{10}$$

where $r_i$ is the reward, γ is the discount factor, μ' is the Actor_target network, $s_i'$ is the next state of the i-th sample, and $w_j$ and $w'_j$ are the parameters of the j-th Critic network and of the corresponding Critic_target network, respectively.
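Equations (9) and (10) reduce to a few lines of arithmetic once the target critics' next-state values are known. The Python sketch below computes the clipped double-Q target and the shared mean-squared loss. The Q values in the example are made-up numbers, not results from the method.

```python
def td_targets(rewards, gamma, q1_next, q2_next):
    """Equation (9) per sample: y_i = r_i + gamma * min(Q'_1, Q'_2), where the
    Q' values come from the two target critics evaluated at the next state
    and the Actor_target's action. Taking the minimum curbs over-estimation."""
    return [r + gamma * min(q1, q2) for r, q1, q2 in zip(rewards, q1_next, q2_next)]


def critic_loss(targets, q_values):
    """Equation (10): mean squared TD error, shared by Critic_1 and Critic_2."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


if __name__ == "__main__":
    y = td_targets([1.0, -0.5], 0.9, [2.0, 0.4], [3.0, 0.2])
    print(y, critic_loss(y, [2.5, -0.3]))
```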

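After the Critic has been updated twice, the Actor is updated. The parameters of the target policy network and the target value networks are then obtained from those of the policy network and the value networks by a moving average:

$$\theta' \leftarrow \tau\theta + (1-\tau)\theta', \qquad w'_j \leftarrow \tau w_j + (1-\tau)w'_j,\quad j = 1, 2$$

where τ (0 < τ < 1, usually small) is the moving-average coefficient; the next-state action fed to the target value networks is the output of the Actor_target network.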
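The delayed update schedule and the moving-average update can be sketched as follows. With `policy_delay=2`, the Actor and all target networks are updated once for every two Critic updates, as described above. The update callables, the flat parameter lists and the value of `tau` are simplifications chosen for illustration.

```python
def soft_update(target, source, tau):
    """Moving average: target <- tau * source + (1 - tau) * target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


def train(n_steps, update_critics, update_actor, params, target_params,
          policy_delay=2, tau=0.005):
    """Critics are updated every step; the Actor and all target networks only
    every `policy_delay` steps. params / target_params map network names to
    flat parameter lists (a simplification)."""
    actor_updates = 0
    for step in range(1, n_steps + 1):
        update_critics()
        if step % policy_delay == 0:
            update_actor()
            actor_updates += 1
            for name, p in params.items():
                target_params[name] = soft_update(target_params[name], p, tau)
    return actor_updates


if __name__ == "__main__":
    calls = []
    params = {"actor": [1.0], "critic_1": [2.0]}
    targets = {"actor": [0.0], "critic_1": [0.0]}
    n = train(4, lambda: calls.append("critic"), lambda: calls.append("actor"),
              params, targets, tau=0.5)
    print(n, calls.count("critic"), targets)  # 2 4 {'actor': [0.75], 'critic_1': [1.5]}
```

A small τ (values around 0.005 are common) makes the targets trail the current networks slowly, which is the stabilising effect described in the text.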

At regular training intervals, the current value networks draw a fixed number of samples from the classified experience replay pool. For each $(s_t, a_t)$, the current value network gives the Q value of executing $a_t$ in state $s_t$, and from $s_{t+1}$ the target networks give the approximate target y for Q. The value networks are updated by gradient descent on the loss function so that Q approaches y, and the policy network is updated by policy-gradient ascent on the objective function so that the model favors actions with higher Q values.

The method uses deep neural networks and the TD3 algorithm. Through reinforcement learning, it automatically learns a globally optimal decision from interaction with the environment and quickly provides the optimal strategy online, meeting the speed and accuracy requirements of online active safety correction. In addition, the TD3 algorithm supports parallel computation, which greatly shortens computation time where hardware allows, making it well suited to fast optimization of the grid active safety correction control problem. The invention does not depend on a system model and scales well. It is also suitable for active safety correction decisions across multiple scenarios: because the TD3 agent is trained on multi-scenario grid operation data, the trained agent can handle a variety of power flow violations.

Example two

Referring to FIG. 3, the active safety correction control system for a power system according to the present invention comprises:

an acquisition module 1 for acquiring real-time operation data of the power system;

a control module 2 for inputting the real-time operation data of the power system into the trained agent to obtain an active safety correction control scheme for the power system, and for performing active safety correction control on the power system according to that scheme.

Further, the control module 2 includes:

a building module 21 for building a classified experience replay pool comprising a success experience replay area and a failure experience replay area;

an extraction module 22 for extracting samples from the classified experience replay pool;

a training module 23 for training the agent with the extracted samples to obtain the trained agent.

All relevant content of the steps in the above embodiment of the active safety correction control method applies to the functional descriptions of the corresponding functional modules of the active safety correction control system in this embodiment and is not repeated here.

The division of modules in the embodiments of the present application is schematic and is only a logical functional division; other divisions are possible in an actual implementation. In addition, the functional modules in each embodiment may be integrated in one processor, may exist physically separately, or two or more modules may be integrated in one module. An integrated module may be implemented in hardware or as a software functional module.

Example three

A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the active safety correction control method for a power system. The memory may include internal memory, such as high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage. The processor, the network interface and the memory are interconnected by an internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus or the like, and which may be divided into an address bus, a data bus, a control bus and so on. The memory stores programs; specifically, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile storage and provides instructions and data to the processor.

Example four

A computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the active safety correction control method for a power system. The medium includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include random access memory (RAM) and/or cache memory; non-volatile memory may include read-only memory (ROM), hard disks, flash memory, optical discs, magnetic disks and the like.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, those skilled in the art will understand that modifications and equivalent substitutions may be made to the embodiments of the invention without departing from the spirit and scope of the invention, and all such modifications and substitutions are intended to be covered by the claims.

Claims

1. An active safety correction control method for a power system is characterized by comprising the following steps:

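acquiring real-time operation data of the power system;

inputting the real-time operation data of the power system into the trained agent to obtain an active safety correction control scheme of the power system, and then performing active safety correction control on the power system according to the active safety correction control scheme of the power system.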

2. The power system active safety correction control method according to claim 1, wherein before inputting the real-time operation data of the power system into the trained agent, the method further comprises:

extracting samples from the categorized experience replay pool; and

training the agent by using the extracted samples.
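Claim 2 does not state how experiences are categorized or how a batch is split across categories. The following is a minimal sketch under two stated assumptions: each transition is tagged with a category label (for example, whether the state contained a line overload), and a batch is drawn as evenly as possible from every non-empty category. The class `CategorizedReplayPool` and its methods are hypothetical names, not taken from the patent.

```python
import random
from collections import defaultdict, deque
from typing import Any, Dict, Hashable, List, Tuple

# (state, action, reward, next_state); the tuple layout is an assumption.
Transition = Tuple[Any, Any, float, Any]


class CategorizedReplayPool:
    """Hypothetical experience replay pool that keeps one buffer per category."""

    def __init__(self, capacity_per_category: int = 10000, seed: int = 0):
        # One bounded FIFO buffer per category; the oldest entries are evicted first.
        self._buffers: Dict[Hashable, deque] = defaultdict(
            lambda: deque(maxlen=capacity_per_category))
        self._rng = random.Random(seed)

    def add(self, category: Hashable, transition: Transition) -> None:
        self._buffers[category].append(transition)

    def __len__(self) -> int:
        return sum(len(buf) for buf in self._buffers.values())

    def sample(self, batch_size: int) -> List[Transition]:
        # Assumption: split the batch evenly across non-empty categories so that
        # rare categories (e.g. overload states) are not crowded out.
        cats = [c for c, buf in self._buffers.items() if buf]
        if not cats:
            return []
        per_cat, extra = divmod(batch_size, len(cats))
        batch: List[Transition] = []
        for idx, cat in enumerate(cats):
            k = per_cat + (1 if idx < extra else 0)
            items = list(self._buffers[cat])
            if k <= len(items):
                batch.extend(self._rng.sample(items, k))  # without replacement
            else:
                batch.extend(self._rng.choices(items, k=k))  # with replacement
        return batch
```

Sampling per category, rather than uniformly over the whole pool, is one common way to keep infrequent but important cases represented in every training batch. Whether the patent balances categories this way is not stated in this text.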

3. The power system active safety correction control method according to claim 2, wherein the state space s in the process of training the agent by using the extracted samples is formed, for each extracted sample i = 1, …, M, from:

the active output of generator j in the i-th sample, j = 1, …, n_gen;

the active power flow of line k in the i-th sample, k = 1, …, n_line;

the power of load m in the i-th sample, m = 1, …, n_load;

wherein n_gen is the number of generators, n_line is the number of branches, n_load is the number of loads, and M is the number of extracted samples.
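The state in claim 3 combines, for each of the M extracted samples, the generator active outputs, the line active power flows and the load powers. A minimal sketch of assembling it, assuming the three groups are simply concatenated per sample into a row of an M × (n_gen + n_line + n_load) matrix; the function name `build_state` and the concatenation order are assumptions.

```python
from typing import List, Sequence


def build_state(gen_p: Sequence[Sequence[float]],
                line_p: Sequence[Sequence[float]],
                load_p: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack per-sample features into an M x (n_gen + n_line + n_load) matrix.

    gen_p[i][j]  : active output of generator j in sample i
    line_p[i][k] : active power flow on line k in sample i
    load_p[i][m] : power of load m in sample i
    """
    m = len(gen_p)
    if not (len(line_p) == m and len(load_p) == m):
        raise ValueError("all inputs must contain the same number of samples M")
    state = [list(g) + list(l) + list(d) for g, l, d in zip(gen_p, line_p, load_p)]
    # Every row must have the same width n_gen + n_line + n_load.
    if len({len(row) for row in state}) > 1:
        raise ValueError("inconsistent n_gen, n_line or n_load across samples")
    return state
```

For one sample with two generators, three lines and one load, `build_state([[100.0, 80.0]], [[45.0, -12.5, 30.0]], [[175.0]])` returns `[[100.0, 80.0, 45.0, -12.5, 30.0, 175.0]]`.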

4. The power system active safety correction control method according to claim 2, wherein the reward function in the process of training the agent by using the extracted samples is:

$$R = \nu_1 r_1 + \nu_2 r_2 + \nu_3 r_3$$

wherein r₁, r₂ and r₃ are the reward value for the line out-of-limit condition, the reward value for the generator output limit constraint and the reward value for the generator cost, respectively, and ν₁, ν₂ and ν₃ are weighting coefficients.
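The combined reward in claim 4 is a weighted sum of the three component rewards. A minimal sketch follows; the function name and the default weights of 1.0 are placeholders, since the claim does not give values for ν₁, ν₂ and ν₃.

```python
from typing import Tuple


def total_reward(r1: float, r2: float, r3: float,
                 weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum R = nu1*r1 + nu2*r2 + nu3*r3.

    r1: line out-of-limit reward, r2: generator output limit reward,
    r3: generator cost reward. Default weights are placeholders only.
    """
    nu1, nu2, nu3 = weights
    return nu1 * r1 + nu2 * r2 + nu3 * r3
```

With illustrative weights (0.5, 0.3, 0.2), the component rewards -2.0, -0.5 and -1.0 combine to approximately -1.35. The weights set the trade-off between relieving overloads, respecting generator limits and keeping generation cost low.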

5. The power system active safety correction control method according to claim 4, characterized in that the reward value r₁ for the line out-of-limit condition is:

wherein n_line is the number of branches of the power grid, I_i and T_i are the current and the thermal limit of branch i, respectively, and ε is a constant.
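The expression for r₁, and the role of ε in it, cannot be recovered from this text. As a hedged illustration of the quantities it is built from, the sketch below computes each branch's loading ratio I_i / T_i and lists the branches whose current exceeds the thermal limit. It leaves open how these values become r₁. Taking the current magnitude `abs(I_i)` and the function name are assumptions.

```python
from typing import List, Sequence, Tuple


def branch_loading(currents: Sequence[float],
                   thermal_limits: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Return |I_i| / T_i for each branch and the indices of overloaded branches."""
    if len(currents) != len(thermal_limits):
        raise ValueError("currents and thermal_limits must both have n_line entries")
    ratios: List[float] = []
    overloaded: List[int] = []
    for i, (current, limit) in enumerate(zip(currents, thermal_limits)):
        if limit <= 0:
            raise ValueError(f"thermal limit of branch {i} must be positive")
        rho = abs(current) / limit  # loading ratio of branch i
        ratios.append(rho)
        if rho > 1.0:  # current exceeds the thermal limit
            overloaded.append(i)
    return ratios, overloaded
```

For currents `[500.0, 1100.0, -300.0]` and limits `[1000.0, 1000.0, 600.0]`, the ratios are about `[0.5, 1.1, 0.5]` and only branch 1 is reported as overloaded.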

6. The power system active safety correction control method according to claim 4, characterized in that the reward value r₂ for the generator output limit constraint is:

wherein n_gen is the number of generators, P_Gi is the active output of generator i, and P_Gi,max and P_Gi,min are the upper and lower limits of the active output of generator i, respectively.
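The exact form of r₂ is not given in this text. It is defined in terms of each generator's output P_Gi relative to its limits P_Gi,min and P_Gi,max. A minimal sketch of that underlying quantity, assuming it is measured as the distance by which P_Gi falls outside [P_Gi,min, P_Gi,max], in the same units as the inputs. The function name is hypothetical.

```python
from typing import List, Sequence


def generator_limit_violation(p_gen: Sequence[float],
                              p_min: Sequence[float],
                              p_max: Sequence[float]) -> List[float]:
    """Amount by which each P_Gi lies outside [P_Gi,min, P_Gi,max]; 0.0 if within limits."""
    if not (len(p_gen) == len(p_min) == len(p_max)):
        raise ValueError("p_gen, p_min and p_max must all have n_gen entries")
    violations: List[float] = []
    for p, lo, hi in zip(p_gen, p_min, p_max):
        if lo > hi:
            raise ValueError("lower limit exceeds upper limit")
        # For a valid limit pair, at most one of the two terms is non-zero.
        violations.append(max(0.0, p - hi) + max(0.0, lo - p))
    return violations
```

For outputs `[120.0, 40.0, 75.0]` with lower limits `[50.0, 50.0, 0.0]` and upper limits `[100.0, 100.0, 80.0]`, this returns `[20.0, 10.0, 0.0]`: the first generator is above its upper limit, the second is below its lower limit, and the third is within its limits.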

7. The power system active safety correction control method according to claim 4, characterized in that the reward value r₃ for the generator cost is:

8. An active safety correction control system for a power system, comprising:

the acquisition module (1) is used for acquiring real-time operation data of the power system;

and the control module (2) is used for inputting the real-time operation data of the power system into the trained intelligent agent to obtain an active safety correction control scheme of the power system, and then performing active safety correction control on the power system according to the active safety correction control scheme of the power system.
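The two modules of claim 8 can be sketched structurally as follows. The sketch assumes the acquisition module reads a snapshot from some data source, and that the trained agent is any callable mapping that snapshot to a control scheme, represented here as per-generator active output adjustments. All class names, the `Protocol` interfaces and the dictionary formats are hypothetical, because the claim does not specify interfaces.

```python
from typing import Callable, Dict, Protocol

Snapshot = Dict[str, float]  # hypothetical: measurement name -> value
Scheme = Dict[str, float]    # hypothetical: generator id -> active output adjustment


class DataSource(Protocol):
    def read(self) -> Snapshot: ...


class Actuator(Protocol):
    def apply(self, scheme: Scheme) -> None: ...


class AcquisitionModule:
    """Module (1): acquires real-time operation data of the power system."""

    def __init__(self, source: DataSource):
        self._source = source

    def acquire(self) -> Snapshot:
        return self._source.read()


class ControlModule:
    """Module (2): feeds the data to the trained agent and executes its scheme."""

    def __init__(self, agent: Callable[[Snapshot], Scheme], actuator: Actuator):
        self._agent = agent
        self._actuator = actuator

    def control(self, snapshot: Snapshot) -> Scheme:
        scheme = self._agent(snapshot)   # obtain the correction control scheme
        self._actuator.apply(scheme)     # perform the correction control
        return scheme
```

Keeping the agent behind a plain callable separates the trained policy from data acquisition and actuation. A rule-based stand-in can therefore replace it in tests.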

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the power system active safety correction control method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the power system active safety correction control method according to any one of claims 1 to 7.