CN114784823A - Micro-grid frequency control method and system based on depth certainty strategy gradient - Google Patents


Info

Publication number
CN114784823A
CN114784823A
Authority
CN
China
Prior art keywords: network, strategy, micro-grid, action, evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210399513.9A
Other languages
Chinese (zh)
Inventor
刘智伟
刘香港
池明
刘骁康
叶林涛
王燕舞
肖江文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202210399513.9A priority Critical patent/CN114784823A/en
Publication of CN114784823A publication Critical patent/CN114784823A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J 3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J 3/24 Arrangements for preventing or reducing oscillations of power in networks
    • H02J 3/241 The oscillation concerning frequency
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J 3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J 3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J 3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J 3/466 Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J 2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J 2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Power Engineering (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a micro-grid frequency control method and system based on the deep deterministic policy gradient, belonging to the field of power system frequency control. The method takes the frequency deviation of the micro-grid system and its integral as training data and trains an agent with the twin-delayed deep deterministic policy gradient algorithm. The trained agent is applied to a micro-grid system containing new energy sources: the state information of the current system is input into an actor-critic (AC) framework, the optimal action is selected and converted into an actual command for the governor valve opening of the synchronous generator, and the frequency of the micro-grid is thereby controlled. The invention uses a model-free deep reinforcement learning algorithm to train the agent to adaptively learn the frequency changes of the power grid. Because a micro-grid containing new energy sources is random and intermittent, the method does not need to rely on an idealized mathematical model that deviates considerably from the real environment; it requires only the system inputs and reward values for continuous learning iterations, and achieves a better control effect on the micro-grid.

Description

Micro-grid frequency control method and system based on depth certainty strategy gradient
Technical Field
The invention relates to the field of power system frequency control, and in particular to a micro-grid frequency control method and system based on the deep deterministic policy gradient.
Background
With the development of power system frequency control and the continuous introduction of new energy sources into the power system, novel frequency control methods are needed to meet the challenge of stabilizing the system frequency within the specified range of 50 ± 0.2 Hz. This has motivated a large number of researchers to work on problems related to power system frequency control.
However, due to the complexity of the power system environment, most researchers use simple linear approximations of the grid when designing controllers, so grid characteristics that are hard to describe are mathematically modeled or linearized with large errors. On the other hand, with the continuing development of the "dual carbon" strategy, a certain proportion of new energy generation equipment is being introduced into micro-grid systems; this equipment brings randomness and intermittency, which makes frequency control of the system difficult.
Current frequency control methods either model the power system too simply, fail to consider the characteristics of the new energy sources in the system, or, as with traditional control methods, cannot adaptively learn changes in system parameters. Realizing a novel method that is independent of the system model and can adaptively adjust to parameter changes is therefore particularly important.
A micro-grid system containing new energy generation equipment comprises modules such as wind power generation, photovoltaic generation, a battery energy storage system, an AC tie line, a synchronous generator, and electric loads. Because wind and photovoltaic plants are highly random and their direct control effect is poor, control is applied mainly to the synchronous generator. The frequency signal of the micro-grid system can be detected by a sensor; its difference from the nominal frequency gives the frequency deviation, which together with its integral serves as the observable state of the grid.
Therefore, a novel frequency control method for the micro-grid system that combines the above points, effectively copes with changes in system model parameters, and does not require consideration of model complexity is of great significance.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a micro-grid frequency control method and system based on the deep deterministic policy gradient. The aim is to provide a frequency control method for a novel micro-grid system based on deep reinforcement learning: for a micro-grid containing wind power, photovoltaic equipment, and energy storage equipment whose randomness and intermittency prevent accurate modeling, the agent can learn continuously through offline training without an accurate model being established.
In order to achieve the above object, in one aspect the invention provides a micro-grid frequency control method based on the deep deterministic policy gradient. It is an action-decision method that trains an agent with the twin-delayed deep deterministic policy gradient algorithm to replace a conventional controller for frequency control, and specifically comprises the following steps:
A new-energy environment comprising a wind turbine generator, a photovoltaic generation system, and a battery energy storage system is modeled. The environment is set up to obtain system data as the agent's input and to obtain the environment's feedback as a measure of how well the agent is trained; the method itself is model-free. A training environment corresponding to the micro-grid simulation model is defined under the AC framework; the frequency deviation of the micro-grid system (the difference between the current frequency and the system's nominal frequency) and the integral of the frequency deviation are measured by a sensor as the state observation, which serves as the agent's observation input, and the agent is trained with the twin-delayed deep deterministic policy gradient algorithm.
The trained agent is applied to a micro-grid system containing new energy sources. The state information of the current micro-grid is input into the actor-critic framework, ensuring that the policy network selects the optimal action under the Q-value evaluation of the evaluation network. The optimal action is converted by a back-end controller into an actual command for the governor valve opening of the synchronous generator, so that the micro-grid system is controlled to restore balance when the power is unbalanced and the frequency is kept within the specified range of 50 ± 0.2 Hz, ensuring stable system operation.
Further, to ensure the correctness of the derivation of the method's steps, the method is established on the following mathematical foundations:
The Markov decision process (MDP) is the basic mathematical model of reinforcement learning. An MDP can be represented as a tuple (S, A, P, R, γ), where S is the state space, A is the action space, P is the transition function giving the probability matrix of transitioning from each state s ∈ S to the other states, R is the reward function, and 0 ≤ γ ≤ 1 is the discount factor.
The policy of the agent is a mapping from states to a selection probability for each action, usually denoted π. The goal of reinforcement learning is to maximize the long-term return G_t in order to find an optimal policy π*, i.e.
G_t = R(t) + γR(t+1) + γ²R(t+2) + … = Σ_k γ^k R(t+k), k = 0, 1, 2, …
where R(t) denotes the reward at time t; the return value thus includes both current and future rewards. Provided that γ < 1 and R(k) is bounded, G_t is bounded.
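As a minimal sketch (a hypothetical helper, not part of the patent), the bounded discounted return G_t above can be computed backwards over a finite reward sequence:

```python
def discounted_return(rewards, gamma):
    """G_t = sum over k of gamma^k * R(t+k), accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With gamma < 1 and bounded rewards the accumulated value stays bounded, matching the statement above.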
The micro-grid frequency control problem clearly satisfies the Markov assumption, so it can be treated as a Markov decision process. The frequency deviation of the system and its integral are used as the state observations, and the action output by the agent is the finally selected control action. To keep the whole system running continuously, a novel micro-grid system with new energy generation equipment is built, the observations are used as the agent's input, and the agent's output action is applied to the governor valve opening of the synchronous generator.
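The state observation (Δf, ∫Δf) described above can be sketched as follows. The 50 Hz nominal frequency and the 0.01 s time step come from the text; the simple rectangular integration and the helper names are assumptions for illustration:

```python
F_NOMINAL = 50.0   # Hz, nominal system frequency (from the text)
DT = 0.01          # s, sampling period (the 0.01 s time step mentioned in the text)

def observe(f_measured, integral_prev):
    """Return the agent's state s = (delta_f, integral of delta_f).

    delta_f is the measured frequency minus the nominal frequency;
    the integral is accumulated by simple rectangular integration.
    """
    delta_f = f_measured - F_NOMINAL
    integral = integral_prev + delta_f * DT
    return delta_f, integral
```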
Furthermore, the invention adopts a deep reinforcement learning method based on the actor-critic (AC) framework. The AC framework comprises two networks: a policy network, denoted π, and an evaluation network, denoted Qπ(s, a) and defined as
Qπ(s, a) = Eπ[G_t | s_t = s, a_t = a]
where Eπ[·] denotes the expected value of the random variable under the given policy π, and t denotes the time step, selected as 0.01 s. At this stage the state-action value function Q(s, a) is computed by the temporal-difference (TD) method, i.e. Q(s_t, a_t) ← Q(s_t, a_t) + δ_t, where
δ_t = r_t + γQ(s_{t+1}, a_{t+1}) - Q(s_t, a_t)
is the TD error, which can be used to evaluate the last selected action a. Based on the above actor-critic framework, and because the frequency control problem of the micro-grid system has a continuous action space, the invention trains the agent with the twin-delayed deep deterministic policy gradient algorithm. The specific steps of the algorithm are as follows:
1) Initialize the parameters θ_i (i = 1, 2) of the two evaluation main networks and the policy main network parameter φ;
2) Initialize the evaluation target networks θ_i′ (i = 1, 2) and the policy target network φ′;
3) Initialize the experience pool (it stores past learning experience; sampling the data in the pool provides more data for the subsequent training steps and network-parameter updates);
4) Train in a loop:
a. Observe the state s = (Δf, ∫Δf) of the current system through a sensor, give it to the agent, and select the action a = (ΔPc) under the policy π with a decayed ε-greedy strategy; apply the action to the current system, continue to observe the state s′ = (Δf′, ∫Δf′) of the system at the next moment after the current action is taken, and compute the reward value given to the agent for this action through the reward function.
b. Perform batch random sampling in the experience pool and update the parameters of the evaluation main networks by minimizing the loss function
L(θ_i) = E[(r + γQ′(s′, π(s′|φ′)) - Q(s, a|θ_i))²]
where γ denotes the discount factor and Q′(s′, π(s′|φ′)) denotes the state-action value that can be obtained by taking the action a′ = π(s′|φ′) given by the policy in the state s′ = (Δf′, ∫Δf′).
c. If the current loop step is a multiple of 2, update the parameters of the policy network by the deterministic policy gradient
∇_φ J = E[∇_a Q(s, a|θ_1)|_{a=π(s|φ)} ∇_φ π(s|φ)]
and update the target networks by soft update:
φ′ ← τφ + (1-τ)φ′
θ_i′ ← τθ_i + (1-τ)θ_i′
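The two numerical kernels of the loop above, the clipped double-Q target used in the loss and the soft update, can be sketched on plain numbers (a minimal sketch; tau = 0.005 is an assumed value, not stated in the text):

```python
def td3_target(r, q1_next, q2_next, gamma=0.98):
    """Critic target y = r + gamma * min(Q1', Q2'): the smaller of the two
    target-network estimates is used to avoid Q overestimation."""
    return r + gamma * min(q1_next, q2_next)

def soft_update(target_params, main_params, tau=0.005):
    """theta' <- tau*theta + (1-tau)*theta', applied element-wise."""
    return [tau * m + (1 - tau) * t for m, t in zip(main_params, target_params)]
```

The delayed (every-second-step) actor update in step c would call `soft_update` only on those iterations.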
After training is finished, whether the agent has explored the optimal action is judged by observing the convergence of the reward function. The strategy learned by the agent is then placed in the novel micro-grid simulation environment, the frequency behavior under various disturbances is observed, and the control effect of the agent is verified.
In the micro-grid frequency control method based on the deep deterministic policy gradient, a model-free frequency control method is provided: through offline training, actions are continuously explored and learned until a suitable action-decision method is obtained, so that a suitable control action is taken when the frequency of the system changes.
Further, the policy π of the policy network in the actor-critic framework adopts the optimal policy π* that selects the action maximizing the evaluation function Q, where the evaluation takes the smaller of the two Q estimates:
π* = argmax_a min(Q1(s, a|θ1), Q2(s, a|θ2))
where Q_i(s, a|θ_i), i = 1, 2, denotes the value given by evaluation main network i for the action a taken in state s, and min(Q1, Q2) denotes the smaller of Q1 and Q2; taking the smaller value avoids overestimating the state-action value.
Furthermore, the decayed ε-greedy strategy uses a larger ε in the early stage of training to ensure exploration of actions and gradually decays ε in the later stage to ensure selection of the optimal action. The specific decay is
ε ← 0.99ε
where ε denotes the greedy-strategy coefficient.
The exploration noise is Gaussian noise with a pruning function:
noise = clip(N(0, σ), -c, c)
where clip denotes the pruning function: the Gaussian noise N(0, σ) is pruned so that the final noise range is kept within (-c, c).
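Both exploration mechanisms above are one-liners. In this sketch the decay rate 0.99 comes from the text, while the lower floor on ε and the helper names are assumptions:

```python
import random

def decay_epsilon(eps, rate=0.99, floor=0.01):
    """eps <- rate * eps, kept above a small floor (the floor is an assumption)."""
    return max(floor, rate * eps)

def exploration_noise(sigma, c):
    """Gaussian noise N(0, sigma), pruned (clipped) to the range (-c, c)."""
    n = random.gauss(0.0, sigma)
    return max(-c, min(c, n))
```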
In another aspect, the present invention provides a micro-grid frequency control system based on a depth deterministic strategy gradient, including: a computer-readable storage medium and a processor;
the computer readable storage medium is used for storing executable instructions;
the processor is used for reading executable instructions stored in the computer-readable storage medium and executing the method for controlling the frequency of the microgrid based on the depth deterministic strategy gradient.
Through the technical scheme, compared with the prior art, the invention has the following beneficial effects:
1. Traditional frequency control methods require a mathematical model of the power system. The invention improves upon the traditional DDPG: the double evaluation network avoids overestimation of Q, and delayed updates of the target and action networks balance the exploration and exploitation of actions. No complex, hard-to-model mathematical model covering the various generation devices and electrical elements needs to be considered, so the control method has higher adaptability;
2. Compared with existing control methods, the proposed method learns from the observable state of the system, and the policy-gradient algorithm handles the continuous action space well, so the trained agent shows better dynamic performance when deployed in the power system environment for frequency control;
3. Compared with traditional control methods, the method saves the computational effort of designing the controller; in particular, when the model of the grid changes substantially, the change need not be modeled explicitly.
Drawings
FIG. 1 is a diagram of a microgrid system including new energy devices according to the present invention;
FIG. 2 is a detailed block diagram of a policy network in the agent training process provided by the present invention;
FIG. 3 is a detailed block diagram of an evaluation network during the agent training process provided by the present invention;
FIG. 4 is a block diagram of a dual delay deterministic policy gradient algorithm provided by the present invention;
FIG. 5 is a diagram of a reward function for the agent training process of the present invention;
FIG. 6 is a graph of the frequency change of the inventive control method under a step disturbance;
FIG. 7 is a continuous disturbance curve of the simulation environment according to the present invention;
fig. 8 is a frequency variation diagram of the control method provided by the present invention under the condition of continuous disturbance.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a micro-grid frequency control method based on the deep deterministic policy gradient. It is an action-decision method that trains an agent with the twin-delayed deep deterministic policy gradient algorithm to replace a conventional controller for frequency control, and specifically comprises the following steps:
A new-energy environment comprising a wind turbine generator, a photovoltaic generation system, and a battery energy storage system is modeled. The environment is set up to obtain system data as the agent's input and to obtain the environment's feedback as a measure of how well the agent is trained; the method itself is model-free. A training environment corresponding to the micro-grid simulation model is defined under the AC framework; the frequency deviation of the micro-grid system (the difference between the current frequency and the system's nominal frequency) and the integral of the frequency deviation are measured by a sensor as the state observation, which serves as the agent's observation input, and the agent is trained with the twin-delayed deep deterministic policy gradient algorithm.
The trained agent is applied to a micro-grid system containing new energy sources. The state information of the current micro-grid is input into the actor-critic framework, ensuring that the policy network selects the optimal action under the Q-value evaluation of the evaluation network. The optimal action is converted by a back-end controller into an actual command for the governor valve opening of the synchronous generator, so that the micro-grid system is controlled to restore balance when the power is unbalanced and the frequency is kept within the specified range of 50 ± 0.2 Hz, ensuring stable system operation.
Fig. 1 is a structural diagram of a micro-grid system containing new energy devices; the specific mathematical model of the micro-grid system is as follows.
(1) Wind power generator model:
G_WTG(s) = K_WTG / (1 + sT_WTG)
where G_WTG is the transfer function of the wind turbine generator, K_WTG is its gain constant, T_WTG is the wind-turbine time constant, and ΔP_wind and ΔP_WTG are the deviations of the wind turbine input and output, respectively.
(2) Photovoltaic power generation model:
G_PV(s) = K_PV / (1 + sT_PV)
where G_PV is the transfer function of photovoltaic generation, K_PV is its gain constant, T_PV is the photovoltaic time constant, and ΔP_pv and ΔP_PV are the deviations of the photovoltaic input and output, respectively.
(3) Battery energy storage model: the battery energy storage system has a charging mode and a discharging mode; the transfer function 1/(1 + sT_BES) is a simplified description of the battery energy storage system, where ΔP_BES is the power deviation.
(4) Synchronous generator model: represented by the governor-turbine model 1/((1 + sT_g)(1 + sT_t)), where T_g and T_t are the governor and turbine time constants, respectively. The generator is regulated by the control action ΔPc acting on the valve opening.
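Each first-order block K/(1 + sT) above corresponds to the differential equation T·dy/dt + y = K·u. Under the assumption of a simple forward-Euler discretization (a sketch, not the patent's simulation), one simulation step is:

```python
def first_order_step(y, u, K, T, dt):
    """One forward-Euler step of G(s) = K/(1+sT), i.e. T*dy/dt = K*u - y."""
    return y + dt * (K * u - y) / T
```

Chaining such steps (e.g. governor into turbine) approximates the cascaded governor-turbine model; the step size dt must be small relative to T for the explicit scheme to be stable.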
The detailed hyper-parameter design for the agent training process is as follows:
The AC framework of the invention approximates the data mapping relationship with neural networks; see FIG. 2 for the policy (actor) network and FIG. 3 for the evaluation (critic) network. FIG. 2 shows the main structure of the policy network: the frequency deviation and its integral, s = (Δf, ∫Δf), serve as the input values of the input neurons; after passing through 32 hidden-layer neurons, one output neuron outputs the control action ΔPc. FIG. 3 shows the main structure of the evaluation network: (Δf, ∫Δf, ΔPc) serve as the input states of the input neurons; after hidden layers of 32, 16, and 8 neurons, each with a ReLU activation function, the output neuron outputs the corresponding state-action value Q. The main training process is shown in FIG. 4:
1) Initialize the parameters θ_i (i = 1, 2) of the two evaluation main networks and the policy main network parameter φ;
2) Initialize the evaluation target network parameters θ_i′ (i = 1, 2) and the policy target network parameter φ′;
3) Initialize the experience pool (used to store past learning experience for sampling);
4) Train in a loop:
a. Observe the state s = (Δf, ∫Δf) of the current system through a sensor, give it to the agent, and select actions under the policy π with added exploration noise to ensure the exploratory property of the actions. Continue to observe the state s′ = (Δf′, ∫Δf′) of the system at the next moment after the current action is taken, and compute the reward value given to the agent for this action.
b. Perform batch random sampling in the experience pool and update the parameters of the evaluation main networks by minimizing the loss function
L(θ_i) = E[(r + γQ′(s′, π(s′|φ′)) - Q(s, a|θ_i))²]
where γ denotes the discount factor and Q′(s′, π(s′|φ′)) denotes the state-action value that can be obtained by taking the action a′ = π(s′|φ′) given by the policy in the state s′ = (Δf′, ∫Δf′).
c. If the current training count is a multiple of 2, update the parameters of the policy network by the deterministic policy gradient
∇_φ J = E[∇_a Q(s, a|θ_1)|_{a=π(s|φ)} ∇_φ π(s|φ)]
and update the target networks by soft update:
φ′ ← τφ + (1-τ)φ′
θ_i′ ← τθ_i + (1-τ)θ_i′
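The experience pool initialized in step 3) and sampled in step b is, in essence, a bounded buffer with uniform batch sampling. A minimal sketch (class and method names are assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool storing (s, a, r, s') transitions, with batch sampling.

    A bounded deque evicts the oldest transition once capacity is reached.
    """
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Uniform random mini-batch without replacement."""
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)
```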
other hyper-parametric designs are shown in table 1.
TABLE 1
Hyper-parameter Appropriate value
Maximum number of exercises 500
Maximum training step/time 1000
Empirical pool size 64000000
Empirical pool batch size 64
Learning rate of policy network 0.0001
Evaluating learning rate of a network 0.0001
Discount factor 0.98
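Collected as a plain configuration mapping for reference (the key names are illustrative; the values follow Table 1):

```python
HYPERPARAMS = {
    "max_episodes": 500,        # maximum number of training episodes
    "max_steps": 1000,          # maximum training steps per episode
    "buffer_size": 64000000,    # experience pool size
    "batch_size": 64,           # experience pool batch size
    "actor_lr": 1e-4,           # policy network learning rate
    "critic_lr": 1e-4,          # evaluation network learning rate
    "gamma": 0.98,              # discount factor
}
```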
Repeat the above training steps until training is finished, then observe the reward function and perform convergence analysis. As shown in FIG. 5, when the number of training episodes reaches 450 the reward value has converged to about 0. Analyzing the reward function R = e^(n|Δf|) shows that when the control target Δf → 0 is reached the reward value is 0, so FIG. 5 indicates that the reward function has converged, i.e. the agent's policy has reached the optimal state.
After training is finished, the trained agent model is saved and deployed in the novel micro-grid system for real-time control to observe the control effect. FIG. 6 shows that the trained agent restores and stabilizes the micro-grid frequency when wind-farm, photovoltaic, and load-side disturbances occur at 10 s, 20 s, and 30 s, respectively. FIG. 7 shows a continuous disturbance, which better reflects the randomness and intermittency of the wind, photovoltaic, and energy storage devices present in a novel power system. Under this continuous disturbance the system frequency varies as shown in FIG. 8: the frequency fluctuates continuously, but under the agent's continuous real-time control it is kept within 50 ± 0.1 Hz, meeting the grid-specified requirement of 50 ± 0.2 Hz. In addition, FIG. 8 compares the control effect of the invention with that of other controllers under the same disturbance; the invention outperforms the traditional PI controller and the H∞ controller in dynamic characteristics such as overshoot, settling time, and peak time.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. The method for controlling the frequency of the micro-grid based on the deep deterministic policy gradient, characterized by comprising the following steps:
S1. Under the actor-critic (AC) framework, define a training environment corresponding to the micro-grid simulation model, take the measured frequency deviation of the micro-grid system and the integral of the frequency deviation as the agent's training data, and train the agent with the twin-delayed deep deterministic policy gradient algorithm;
S2. Apply the trained agent to a micro-grid system containing new energy sources, input the state information of the current micro-grid system into the actor-critic framework, ensure that the policy network in the framework selects the best action under the Q-value evaluation of the evaluation network, convert it through a back-end controller into an actual command for the governor valve opening of the synchronous generator, control the micro-grid system to restore balance when the power is unbalanced, and maintain the frequency within a preset range to ensure stable system operation.
2. The method of claim 1, wherein the training environment comprises:
and (3) setting the observation state quantity required by the intelligent agent: including frequency deviation Δ f and integral of frequency deviation ^ Δ f, the state space is represented as: s ═ Δ f, — (Δ f —) Δ f;
setting the action of the intelligent agent: the motion space is represented as: a ═ Δ Pc),ΔPcIndicating a valve opening to the synchronous generator regulator;
setting a reward function of the intelligent agent in a training process: r ═ en|Δf|Wherein e is a natural index, n is a constant term, | Δ f | represents the system frequencyAbsolute value of rate deviation;
establishing the state-action value evaluation function:

Q^π(s, a) = E[ R_t | s_t = s, a_t = a ]

wherein the discounted return is

R_t = Σ_{k=0}^∞ γ^k · r_{t+k}

γ being the discount factor, r_{t+k} the reward at step t+k, and k the summation index over future time steps.
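The state, action, and reward definitions above can be sketched as a minimal training environment; the first-order generator/load dynamics and the value n = 5 below are illustrative assumptions, while the reward r = e^(−n|Δf|) and the state s = (Δf, ∫Δf) follow the claim:

```python
import math

class MicrogridEnv:
    """Toy stand-in for the patent's microgrid simulation model."""

    def __init__(self, n=5.0, dt=0.1):
        self.n = n        # reward-shaping constant from r = e^(-n|Δf|)
        self.dt = dt
        self.reset()

    def reset(self, dfreq0=0.5):
        self.dfreq = dfreq0       # frequency deviation Δf
        self.int_dfreq = 0.0      # integral of frequency deviation ∫Δf
        return (self.dfreq, self.int_dfreq)

    def step(self, action):
        # Toy dynamics: the valve command ΔPc counteracts the power imbalance.
        self.dfreq += self.dt * (-0.5 * self.dfreq + action)
        self.int_dfreq += self.dt * self.dfreq
        reward = math.exp(-self.n * abs(self.dfreq))
        return (self.dfreq, self.int_dfreq), reward

env = MicrogridEnv()
s = env.reset(dfreq0=0.0)
_, r = env.step(0.0)   # with Δf = 0 the reward attains its maximum e^0 = 1
```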
3. The method according to claim 2, wherein training the agent with the twin-delayed deep deterministic policy gradient algorithm comprises the following steps:
S11, initializing the parameters θ_i (i = 1, 2) of the two evaluation main networks and the parameter φ of the policy main network with random values, initializing the evaluation target networks θ_i′ and the policy target network φ′, and initializing the experience pool; the policy network comprises a policy main network and a policy target network, and the evaluation network comprises evaluation main network 1, evaluation main network 2, evaluation target network 1, and evaluation target network 2; the evaluation network adopts a double-Q structure, and the network with the smaller Q value of the two is selected during calculation;
S12, observing the current system state s = (Δf, ∫Δf) and supplying it to the agent; under the policy π of the policy network in the actor-critic framework, adding exploration noise and then selecting the action a = (ΔP_c) with a decayed ε-greedy strategy; observing the system state at the next time step after the current action is taken and computing the reward for this action; and storing the tuple (Δf, ∫Δf, Δf′, ∫Δf′, ΔP_c, r) in the experience pool, wherein r is the reward value computed by the reward function and ΔP_c is the specific valve-opening action value currently selected;
S13, performing random batch sampling from the experience pool, taking the sampled state set s = (Δf, ∫Δf) and action set a = (ΔP_c) as inputs to the evaluation main networks to obtain the outputs Q(s, a|θ_i), and updating the parameters of the evaluation main networks by minimizing the error

L(θ_i) = E[ ( r + γ · min_{i=1,2} Q′(s′, π(s′|φ′)|θ_i′) − Q(s, a|θ_i) )² ]

wherein γ denotes the discount factor and Q′(s′, π(s′|φ′)) denotes the state-action value obtained by taking the action a′ = π(s′|φ′) given by the policy π in the state s′ = (Δf′, ∫Δf′);
S14, updating the parameters of the policy main network by the deterministic policy gradient

∇_φ J = E[ ∇_a Q(s, a|θ_1)|_{a=π(s|φ)} · ∇_φ π(s|φ) ]

and then softly updating the parameters of the policy target network and the evaluation target networks:

φ′ ← τφ + (1 − τ)φ′
θ_i′ ← τθ_i + (1 − τ)θ_i′
wherein τ is the soft-update coefficient.
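Steps S11–S14 can be condensed into the following sketch, which uses one-dimensional linear function approximators in place of the patent's neural networks so that the twin-critic target, the deterministic policy gradient, and the soft target updates stay visible; the delayed policy-update schedule of the full twin-delay algorithm and the exploration noise of S12 are omitted here for brevity, and all numeric values are illustrative:

```python
import random

random.seed(42)
gamma, tau, lr = 0.99, 0.005, 0.01   # discount, soft-update rate, step size
theta = [0.1, -0.1]                  # twin critic main parameters theta_1, theta_2
theta_t = list(theta)                # critic target parameters theta_i'
phi, phi_t = 0.5, 0.5                # actor main / target parameters phi, phi'

def Q(th, s, a):
    return th * (s + a)              # toy critic Q(s, a | theta)

def pi(ph, s):
    return ph * s                    # toy actor pi(s | phi)

# Stand-in experience pool of (s, a, r, s') transitions.
replay = [(random.uniform(-1, 1), random.uniform(-1, 1),
           random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(64)]

for s, a, r, s2 in replay:
    # Critic target uses the SMALLER of the two target-critic values (double-Q).
    a2 = pi(phi_t, s2)
    y = r + gamma * min(Q(theta_t[0], s2, a2), Q(theta_t[1], s2, a2))
    # Minimise (y - Q_i)^2 for each critic by gradient descent on theta_i.
    for i in range(2):
        err = y - Q(theta[i], s, a)
        theta[i] += lr * err * (s + a)   # dQ/dtheta = (s + a)
    # Deterministic policy gradient: dQ1/da * dpi/dphi = theta_1 * s.
    phi += lr * theta[0] * s
    # Soft (Polyak) updates of both target networks.
    phi_t = tau * phi + (1 - tau) * phi_t
    for i in range(2):
        theta_t[i] = tau * theta[i] + (1 - tau) * theta_t[i]
```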
4. The method as claimed in claim 3, wherein the policy π of the policy network in the actor-critic framework in S12 adopts the optimal policy π* that selects the action maximizing the smaller of the two Q-value evaluations:

π* = argmax_a min( Q_1(s, a|θ_1), Q_2(s, a|θ_2) )

wherein Q_i(s, a|θ_i), i = 1, 2, are the values computed by the two evaluation main networks for the action a taken in state s, and min(Q_1, Q_2) denotes the smaller of Q_1 and Q_2, which suppresses Q-value overestimation.
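The pessimistic double-Q evaluation of claim 4 amounts to taking the smaller of the two critic outputs; the toy critics below are illustrative stand-ins for Q_1(s, a|θ_1) and Q_2(s, a|θ_2):

```python
def q1(s, a):
    return 1.0 * (s + a)            # toy critic 1

def q2(s, a):
    return 0.8 * (s + a) + 0.1      # toy critic 2

def clipped_q(s, a):
    """Value used for evaluation: the smaller of the two critics, min(Q1, Q2)."""
    return min(q1(s, a), q2(s, a))

v = clipped_q(1.0, 0.5)   # q1 = 1.5, q2 = 1.3 -> the smaller value 1.3 is used
```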
5. The method according to claim 3, wherein the decayed ε-greedy strategy uses a large ε in the early stage of training to ensure exploration of actions and gradually decays ε in the later stage to favour selection of the optimal action, the specific decay being:

ε ← 0.99 · ε

wherein ε denotes the greedy-strategy coefficient;
the exploration noise is Gaussian noise with a pruning (clipping) function:

ε_noise ~ clip( N(0, σ), −c, c )

wherein clip denotes the pruning function that clips the Gaussian noise N(0, σ) so that the final noise range remains within (−c, c).
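The decayed ε-greedy schedule and the clipped Gaussian exploration noise of claim 5 can be sketched as follows; the values σ = 0.2 and c = 0.5 are illustrative assumptions, while the decay factor 0.99 comes from the claim:

```python
import random

def decay_epsilon(eps):
    # Geometric decay from claim 5: eps <- 0.99 * eps.
    return 0.99 * eps

def clipped_gaussian_noise(sigma=0.2, c=0.5, rng=random):
    """clip(N(0, sigma), -c, c): prune the Gaussian sample into (-c, c)."""
    return max(-c, min(c, rng.gauss(0.0, sigma)))

eps = 1.0
for _ in range(100):
    eps = decay_epsilon(eps)   # after 100 decays, eps = 0.99**100

noise = clipped_gaussian_noise()
```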
6. A microgrid frequency control system based on a deep deterministic policy gradient, characterized by comprising: a computer-readable storage medium and a processor;
the computer readable storage medium is used for storing executable instructions;
the processor is configured to read the executable instructions stored in the computer-readable storage medium and execute the microgrid frequency control method based on the deep deterministic policy gradient according to any one of claims 1 to 5.
CN202210399513.9A 2022-04-15 2022-04-15 Micro-grid frequency control method and system based on depth certainty strategy gradient Pending CN114784823A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210399513.9A CN114784823A (en) 2022-04-15 2022-04-15 Micro-grid frequency control method and system based on depth certainty strategy gradient

Publications (1)

Publication Number Publication Date
CN114784823A true CN114784823A (en) 2022-07-22

Family

ID=82429223

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115809597A (en) * 2022-11-30 2023-03-17 东北电力大学 Frequency stabilization system and method for reinforcement learning emergency DC power support
CN115903457A (en) * 2022-11-02 2023-04-04 曲阜师范大学 Low-wind-speed permanent magnet synchronous wind driven generator control method based on deep reinforcement learning
CN116345578A (en) * 2023-05-26 2023-06-27 南方电网数字电网研究院有限公司 Micro-grid operation optimization scheduling method based on depth deterministic strategy gradient
CN116436029A (en) * 2023-03-13 2023-07-14 华北电力大学 New energy station frequency control method based on deep reinforcement learning
CN117477607A (en) * 2023-12-28 2024-01-30 国网江西综合能源服务有限公司 Three-phase imbalance treatment method and system for power distribution network with intelligent soft switch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination