CN116247742A - Power system source-network-load-storage joint regulation and control method and device based on deep reinforcement learning - Google Patents

Power system source-network-load-storage joint regulation and control method and device based on deep reinforcement learning

Info

Publication number: CN116247742A
Application number: CN202310233509.XA
Authority: CN (China)
Prior art keywords: reinforcement learning, storage, source, power system, scheduling scheme
Priority and filing date: 2023-03-13 (the priority date is an assumption and is not a legal conclusion)
Publication date: 2023-06-09
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: Long Yun, Liang Xueqing, Liu Luhao, Lu Youfei, Wu Renbo, Zhang Yang, Zhao Hongwei, Chen Minghui, Zhang Shaofan, Zou Shirong, Cai Yanchun, Liu Xuan, Wang Xiyue, Ke Deping
Current assignee (the listed assignee may be inaccurate): Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original assignee: Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Application filed 2023-03-13 by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd; priority to CN202310233509.XA; published as CN116247742A

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control method, together with a corresponding device, computer equipment, and storage medium. The method comprises the following steps: according to power system forecast information, a source-load-storage scheduling scheme is initially generated with economy as the objective and serves as the basis for agent regulation; a reinforcement learning architecture for source-network-load-storage joint scheduling is designed, and an agent is trained through deep reinforcement learning, with safety as the objective, to learn to correct the source-load-storage scheduling scheme and thereby realize source-network-load-storage joint scheduling; the reward function in the reinforcement learning architecture defines a line power-flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios. The invention designs a reinforcement learning architecture, trains an agent by deep reinforcement learning, and learns to correct the source-load-storage scheduling scheme so as to eliminate the power imbalance caused by forecast errors.

Description

Power system source-network-load-storage joint regulation and control method and device based on deep reinforcement learning
Technical Field
The invention relates to the technical field of power dispatching, and in particular to a deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control method, as well as a device, computer equipment, and a storage medium.
Background
With the development of new-type power systems, the share of renewable energy on the source side keeps rising; its randomness, intermittency, and counter-peak-regulation behavior make its supply reliability lower than that of conventional power sources. On the load side, new demand peaks are emerging, and the power imbalance caused by unexpected power changes places higher requirements on the system's regulation capability. Energy storage, as a flexible resource with bidirectional regulation capability, provides the grid with more dispatchable resources. How to jointly schedule source, network, load, and storage while reserving sufficient margin and sufficient reserve for uncertainty scenarios, so as to improve system economy and security, is a current research hotspot.
At present, source-network-load-storage joint scheduling typically builds a mathematical model and then solves for the optimum with a traditional optimization algorithm; stochastic programming or robust optimization is adopted to handle randomness. These approaches struggle to strike a proper balance between economy and robustness, and real-time decision-making suffers from low solving efficiency.
Deep reinforcement learning combines the strong representation capability of deep learning with the strong decision-making capability of reinforcement learning; it can effectively solve sequential decision problems in nonlinear, uncertain, and complex scenarios and naturally fits the regulation needs of new-type power systems. However, reinforcement learning models are hard to train: facing the high-dimensional state and action spaces of a power system, training easily fails to produce effective decisions.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control method, together with a device, computer equipment, and a storage medium. A source-load-storage scheduling scheme is first generated from forecast information and serves as the basis for agent regulation; on top of this base scheme, a reinforcement learning architecture is designed, and the agent learns from the observable values of the environment to correct the unit outputs, the wind and solar curtailment, and the energy storage charging and discharging, so as to eliminate the power imbalance caused by forecast errors and by unexpected incidents in actual operation.
The first object of the invention is to provide a deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control method.
The second object of the invention is to provide a deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control device.
The third object of the invention is to provide a computer device.
The fourth object of the invention is to provide a storage medium.
The first object of the present invention can be achieved by adopting the following technical scheme:
a deep reinforcement learning-based power system source network charge storage joint regulation and control method, comprising:
according to the prediction information of the power system, a source charge storage scheduling scheme is initially generated with economy as a target, and the source charge storage scheduling scheme is used as a basis for intelligent body regulation;
designing a reinforcement learning architecture of the network-load storage joint scheduling, training an intelligent body through deep reinforcement learning, aiming at safety, and correcting a source-load storage scheduling scheme through learning to realize the source-network-load storage joint scheduling; and the rewarding function in the reinforcement learning architecture guides the intelligent agent to reserve enough margin and enough reserve for the uncertainty scene by defining line tide margin rewards and energy storage rewards.
Further, the line power-flow margin reward is:

$$r_{margin} = \frac{1}{n_L}\sum_{l=1}^{n_L}\left(1-\frac{|P_l|}{P_{l,max}}\right)$$

where $n_L$ is the number of tie lines, $P_l$ is the power flow on line $l$, and $P_{l,max}$ is the maximum power flow of line $l$.
Further, the energy storage reward is:

$$r_{ess} = \frac{P_b\left(\dfrac{SOC_{min}+SOC_{max}}{2}-SOC\right)}{\dfrac{SOC_{max}-SOC_{min}}{2}}$$

where $SOC$ is the current state of charge of the energy storage, $SOC_{min}$ and $SOC_{max}$ are its minimum and maximum state of charge, respectively, and $P_b$ is the energy storage charge/discharge power after the agent's decision.
Further, the reinforcement learning architecture includes:
an environment state quantity: an abstract representation of the situation presented by the current power system environment, i.e., the information the agent can acquire and needs;
an action space: covering the adjustable units, the renewable units, and the energy storage equipment, respectively;
a state transition: the next environment state is determined only by the current environment state and the executed action;
a reward function: used to guide the agent's training so that it makes decisions in the direction of maximizing the cumulative reward.
Further, the environment state quantity, namely the grid state $s_t$ at time $t$, is described as:

$$s_t = (P_{disp}, P_g, P_{w,fc}, P_{s,fc}, P_{l,fc}, \Delta P_{up,max}, \Delta P_{down,max}, S_l, SOC, \rho, \rho_{cur})$$
$$P_g(t) = P_G(t) + P_{disp}(t)$$
$$\Delta P_{up,max} = \min\left(P_{Gmax} - P_G(t+1),\ P_{G,rampup}\right)$$
$$\Delta P_{down,max} = \min\left(P_G(t+1) - P_{Gmin},\ P_{G,rampdown}\right)$$
$$\rho = \frac{P_L}{P_{L,max}}$$

where $P_{disp}(t)$ is the rescheduling value at time $t$, accumulated from the agent's unit rescheduling actions; $P_G(t)$ is the output of the adjustable generator at time $t$ in the day-ahead source-load-storage scheduling scheme; $P_g(t)$ is the unit output at time $t$ in the final scheme obtained by superimposing the agent's actions on the scheduling scheme; $P_{w,fc}$, $P_{s,fc}$, $P_{l,fc}$ are the wind, photovoltaic, and load forecast values, respectively; $\Delta P_{up,max}$ and $\Delta P_{down,max}$ are the unit's maximum upward and downward adjustment space; $S_l$ is the line status; $SOC$ is the current state of charge of the energy storage; $P_{Gmax}$ and $P_{Gmin}$ are the upper and lower output limits of the unit; $P_{G,rampup}$ and $P_{G,rampdown}$ are the unit's upward and downward ramping limits; $\rho$ is the line load rate; $P_L$ is the line power flow; $P_{L,max}$ is the maximum value of the line power flow; $\rho_{cur}$ is the renewable curtailment ratio, accumulated from the agent's actions.
Further, for the agent-related part of the states and actions, the following state transition relations hold:

$$P_{disp}(t+1) = P_{disp}(t) + \Delta P_{disp}$$
$$\rho_{cur}(t+1) = \rho_{cur}(t) + \Delta\rho_{cur}$$
$$P_W = \rho_{cur} \cdot P_{W,max}$$
$$P_S = \rho_{cur} \cdot P_{S,max}$$
$$P_b = P_B + \Delta P_B$$

where $P_W$, $P_S$, and $P_b$ are the actual wind power, photovoltaic, and energy storage outputs after the decision, respectively.
Furthermore, the source-load-storage scheduling scheme is solved day-ahead; during intraday real-time scheduling, the trained agent rapidly outputs action values, including the unit output corrections, the wind and photovoltaic curtailment ratios, and the energy storage charge/discharge power, according to the environment state quantity provided at each time step, thereby realizing source-network-load-storage joint regulation.
Further, the environment is treated as follows during agent training:
when the line load rate is outside the set range, the line is considered soft-overloaded and is disconnected after m time steps, where m is a set value;
when the line load rate is greater than the maximum of the set range, the line is considered hard-overloaded and is disconnected immediately;
a disconnected line reconnects automatically n time steps after the outage, where n is a set value and n > m;
each time step carries a set probability of a line outage;
here the time step differs from the time step of the source-load-storage scheduling scheme: its interval (granularity) is smaller.
The second object of the invention can be achieved by adopting the following technical scheme:
a deep reinforcement learning-based power system source network charge storage joint regulation and control device, the device comprising:
the scheduling scheme generation module is used for preliminarily generating a source charge storage scheduling scheme with economy as a target according to the power system prediction information, wherein the source charge storage scheduling scheme is used as a basis for intelligent body regulation;
the scheduling scheme correction module is used for designing a reinforcement learning architecture of the source network charge storage joint scheduling, training an intelligent body through deep reinforcement learning, aiming at safety, and correcting a source charge storage scheduling scheme through learning to realize the source network charge storage joint scheduling; and the rewarding function in the reinforcement learning architecture guides the intelligent agent to reserve enough margin and enough reserve for the uncertainty scene by defining line tide margin rewards and energy storage rewards.
The third object of the present invention can be achieved by adopting the following technical scheme:
the computer equipment comprises a processor and a memory for storing a program executable by the processor, wherein the power system source network load storage joint regulation method is realized when the processor executes the program stored by the memory.
The fourth object of the present invention can be achieved by adopting the following technical scheme:
a storage medium storing a program, which when executed by a processor, implements the above-mentioned power system source network load storage joint regulation method.
Compared with the prior art, the invention has the following beneficial effects:
drawing on the strong decision-making capability of reinforcement learning in uncertainty scenarios, the invention provides a power system source-network-load-storage scheduling decision method. To improve the training efficiency of the reinforcement learning agent, a source-load-storage scheduling scheme is first generated from forecast information as the basis for agent regulation; a reinforcement learning architecture is designed on top of this scheme, and an agent is trained through deep reinforcement learning. The agent corrects the source-load-storage scheduling scheme according to the observable values of the environment, eliminating the power imbalance caused by forecast errors and by unexpected incidents in actual operation. In addition, by defining the line power-flow margin reward and the energy storage charge/discharge reward, the reward function of the reinforcement learning architecture guides the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios, improving the operational security of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be derived from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a power system source network load storage joint regulation method based on deep reinforcement learning in embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of a power system source network load storage joint control method based on deep reinforcement learning according to embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of the source-load-storage scheduling scheme of embodiment 1 of the present invention.
Fig. 4 is a schematic diagram of the agent regulation scheme of embodiment 1 of the present invention.
Fig. 5 is a schematic diagram of the energy storage charge/discharge scheme of embodiment 1 of the present invention.
Fig. 6 is a block diagram of a power system source network load storage joint regulation device based on deep reinforcement learning according to embodiment 2 of the present invention.
Fig. 7 is a block diagram showing the structure of a computer device according to embodiment 3 of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, embodiments of the present invention; all other embodiments obtained by a person skilled in the art from them without inventive effort fall within the protection scope of the present invention. It should be understood that the description of the specific embodiments is intended for purposes of illustration only and is not intended to limit the scope of the present application.
Embodiment 1:
As shown in Fig. 1 and Fig. 2, the deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control method provided in this embodiment includes the following steps:
S101. Initially generate a source-load-storage scheduling scheme from forecast information as the base scheme.
In this step, a source-load-storage scheduling scheme is initially generated by mathematical modeling from the forecast information and serves as the base scheme for subsequent correction.
Based on grid information, renewable and load forecast information, and the unit start-stop (commitment) plan, the power system arranges the unit outputs and the energy storage charge/discharge schedule with economy as the main objective and 1 h as the time interval.
(1) Control objective.
The control objective is:

$$\min \sum_{t=1}^{T}\left(C_G(t)+C_B(t)\right)$$

where $C_G(t)$ is the total cost of the adjustable generating units at time $t$ and $C_B(t)$ is the total scheduling cost of the energy storage at time $t$.
(1-1) Total cost of the adjustable generator:

$$C_G(t) = aP_G(t)^2 + bP_G(t) + c$$

where $P_G(t)$ is the output of the adjustable generator at time $t$, and $a$, $b$, and $c$ are the generator's composite cost coefficients.
(1-2) Total energy storage scheduling cost.
This embodiment takes a battery energy storage system as an example for the cost analysis. The total cost, which comprises maintenance and degradation costs, is quadratic in the charge/discharge power overall:

$$C_B(t) = \omega_B P_B(t)^2$$

where $P_B(t)$ is the charge/discharge power of the energy storage: when its value is positive, the storage is charging and can be regarded as a load; otherwise it is discharging and can be regarded as generating equipment; $\omega_B$ is the cost coefficient of the storage output.
(2) Constraints.
(2-1) Active power balance constraint:

$$P_w(t) + P_s(t) + P_G(t) = P_l(t) + P_B(t)$$

where $P_w(t)$, $P_s(t)$, and $P_l(t)$ are the forecast wind, photovoltaic, and load powers, respectively.
(2-2) Adjustable unit output and ramping constraints:

$$P_{Gmin} < P_G(t) < P_{Gmax}$$
$$P_{G,rampdown} < P_G(t) - P_G(t-1) < P_{G,rampup}$$

where $P_{Gmax}$ and $P_{Gmin}$ are the upper and lower output limits of the unit, and $P_{G,rampup}$ and $P_{G,rampdown}$ are its upward and downward ramping limits.
(2-3) Energy storage charge/discharge limits:

$$P_{B,pro} < P_B(t) < P_{B,ab}$$

where $P_{B,pro}$ is the maximum discharge power of the storage and $P_{B,ab}$ is its maximum charge power.
(2-4) Energy storage operation constraints:

$$SOC(t) = \frac{E(t)}{E_{max}}$$

$$E(t) = \begin{cases} E(t-1) + \eta_{ab}P_B(t)\Delta t, & P_B(t) \ge 0 \\ E(t-1) + \dfrac{P_B(t)\Delta t}{\eta_{pro}}, & P_B(t) < 0 \end{cases}$$

where $SOC(t)$ is the state of charge of the energy storage, $E(t)$ is the stored energy value at time $t$, and $E_{max}$ is the maximum energy value; the state of charge thus depends on the storage output at the previous moment; $\eta_{ab}$ and $\eta_{pro}$ are the charge and discharge efficiencies, respectively. Owing to the capacity limit, the SOC is bounded between its upper and lower limits during operation, as shown in the first formula below; moreover, so that the next optimization run can continue seamlessly, the storage must return to its initial state at the end of the optimization horizon:

$$SOC_{min} \le SOC(t) \le SOC_{max}$$
$$SOC(0) = SOC(T)$$

where $SOC_{min}$ and $SOC_{max}$ are the minimum and maximum state of charge of the storage, and $T$ is the optimization horizon, taken as one day (24 h) in this embodiment.
(2-5) Spinning reserve constraint.
To ensure that the system retains a certain spare capacity to cope with intraday uncertainty scenarios, the spinning reserve constraint must be satisfied during operation:

$$\sum\left(P_{Gmax} - P_G(t)\right) \ge \varepsilon P_l(t)$$

where $\varepsilon$ is the spinning reserve rate.
With economy as the objective and power balance enforced, this embodiment generates a scheduling scheme with 1 h time steps by mathematical modeling as the base value for each scheduled unit; the time interval is not limited to 1 h.
The modeling above produces the base scheme for unit outputs and energy storage charging/discharging using a deterministic formulation and thus provides a reference for agent training. Adopting other algorithms such as stochastic programming or robust optimization does not affect the design and training of the reinforcement learning architecture.
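As an illustration of this deterministic day-ahead model, the sketch below solves a single-unit instance with the open-source cvxpy package (the embodiment itself uses Gurobi). The split of $P_B$ into separate charging and discharging variables, all numeric data, and the variable names are assumptions introduced for the example, not values from the patent.

```python
# Minimal day-ahead source-load-storage dispatch sketch: one adjustable unit,
# aggregated wind/PV/load forecasts, one battery; all numbers are assumed.
import cvxpy as cp
import numpy as np

T = 24                                     # 1 h steps over one day
P_w = np.full(T, 30.0)                     # wind forecast (MW), assumed
P_s = np.full(T, 20.0)                     # PV forecast (MW), assumed
P_l = np.full(T, 120.0)                    # load forecast (MW), assumed
a, b, c = 0.02, 15.0, 50.0                 # generator cost coefficients
w_B = 0.05                                 # storage cost coefficient omega_B
Pg_min, Pg_max, ramp = 20.0, 150.0, 30.0   # output limits (MW), ramp (MW/h)
Pb_max, E_max = 15.0, 60.0                 # storage power (MW) / energy (MWh)
eta = 0.95                                 # charge = discharge efficiency
soc_min, soc_max, soc0 = 0.1, 0.9, 0.5

P_G = cp.Variable(T)                       # adjustable unit output
P_ch = cp.Variable(T, nonneg=True)         # storage charging power
P_dis = cp.Variable(T, nonneg=True)        # storage discharging power
soc = cp.Variable(T + 1)

cost = cp.sum(a * cp.square(P_G) + b * P_G + c
              + w_B * cp.square(P_ch - P_dis))
cons = [soc[0] == soc0, soc[T] == soc0,    # SOC(0) = SOC(T)
        soc >= soc_min, soc <= soc_max,
        P_ch <= Pb_max, P_dis <= Pb_max,
        P_G >= Pg_min, P_G <= Pg_max,
        # active power balance; net charging P_ch - P_dis plays the role of P_B
        P_w + P_s + P_G == P_l + P_ch - P_dis,
        # SOC recursion with charge/discharge efficiencies (Delta t = 1 h)
        soc[1:] == soc[:-1] + (eta * P_ch - P_dis / eta) / E_max,
        cp.abs(P_G[1:] - P_G[:-1]) <= ramp]

cp.Problem(cp.Minimize(cost), cons).solve()
print("day-ahead unit schedule (MW):", np.round(P_G.value, 1))
```

The quadratic costs and linear constraints make this a convex QP, which is why any off-the-shelf solver can return the 1 h base schedule quickly.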
S102. Design the reinforcement learning architecture for source-network-load-storage joint scheduling.
An agent trained with a reinforcement learning algorithm corrects the scheduling scheme generated by the mathematical modeling, yielding a source-network-load-storage joint scheduling decision scheme with a 5-minute time interval. The scheme generated in step S101 is hereafter referred to as the base scheme.
Following the five-tuple (S, A, P, R, γ) of the Markov decision process in reinforcement learning, modeling the power system environment focuses on the settings of the state S, the action A, the reward R, and the discount factor γ. This embodiment considers continuous decisions at 5-min intervals within one day, i.e., 288 steps per scenario, and selects γ = 0.99.
(1) Environmental state quantity.
The environment state quantity is an abstract representation of the situation presented by the current power system environment; it is the information that the agent can acquire and needs. The grid state $s_t$ at time $t$ can be described as:

$$s_t = (P_{disp}, P_g, P_{w,fc}, P_{s,fc}, P_{l,fc}, \Delta P_{up,max}, \Delta P_{down,max}, S_l, SOC, \rho, \rho_{cur})$$
$$P_g(t) = P_G(t) + P_{disp}(t)$$
$$\Delta P_{up,max} = \min\left(P_{Gmax} - P_G(t+1),\ P_{G,rampup}\right)$$
$$\Delta P_{down,max} = \min\left(P_G(t+1) - P_{Gmin},\ P_{G,rampdown}\right)$$
$$\rho = \frac{P_L}{P_{L,max}}$$

where $P_{disp}$ is shorthand for $P_{disp}(t)$, the rescheduling value at time $t$, accumulated from the agent's unit rescheduling actions; $P_g$ is shorthand for $P_g(t)$, the unit output at time $t$ (note that the uppercase subscript $G$ denotes the base scheduling scheme, while the lowercase $g$ denotes the final scheme obtained by superimposing the agent's actions on the base scheme); $P_{w,fc}$, $P_{s,fc}$, $P_{l,fc}$ are the wind, photovoltaic, and load forecast values; $\Delta P_{up,max}$ and $\Delta P_{down,max}$ are the unit's maximum upward and downward adjustment space; $S_l$ is the line status, a 0-1 variable in which 0 means the line is disconnected and 1 means it is connected; $SOC$ is the current state of charge of the energy storage; $\rho$ is the line load rate, with $P_L$ the line power flow and $P_{L,max}$ its maximum value; $\rho_{cur}$ is the renewable curtailment ratio, accumulated from the agent's actions.
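As a concrete rendering of $s_t$, a flat observation vector might be assembled as below; the dictionary keys, the ordering, and the dtype are implementation assumptions of this sketch, not prescribed by the embodiment.

```python
# Flatten the grid state s_t into a single observation vector for the agent.
import numpy as np

def build_observation(st: dict) -> np.ndarray:
    return np.concatenate([
        st["P_disp"], st["P_g"],                    # rescheduling value, final unit output
        st["P_w_fc"], st["P_s_fc"], st["P_l_fc"],   # wind / PV / load forecasts
        st["dP_up_max"], st["dP_down_max"],         # remaining up/down adjustment space
        st["S_l"].astype(np.float32),               # 0-1 line connection status
        np.atleast_1d(st["SOC"]),                   # storage state of charge
        st["rho"],                                  # per-line load rates P_L / P_L_max
        np.atleast_1d(st["rho_cur"]),               # accumulated curtailment ratio
    ]).astype(np.float32)
```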
(2) Action space.
An action is the control variable of the current time step. The action space defined in this embodiment covers three types of resources: the adjustable units, the renewable units, and the energy storage equipment.

$$a_t = [\Delta P_{disp}, \Delta\rho_{cur}, \Delta P_B]$$

where $\Delta P_{disp}$ is the rescheduling power of the units, $\Delta\rho_{cur}$ is the curtailment adjustment of the renewable units (covering both wind and photovoltaic), and $\Delta P_B$ is the rescheduled energy storage output.
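In code, this continuous action space can be expressed as a Box in Gymnasium conventions; the bounds below are assumptions chosen for illustration.

```python
# Continuous action space a_t = [dP_disp, d_rho_cur, dP_B]; bounds are assumed.
import numpy as np
from gymnasium import spaces

action_space = spaces.Box(
    low=np.array([-10.0, -0.05, -15.0], dtype=np.float32),
    high=np.array([10.0, 0.05, 15.0], dtype=np.float32),
)
a_t = action_space.sample()  # random action, e.g. for environment debugging
```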
(3) State transition.
By the definition of the Markov decision process, the next environment state is determined only by the current environment state and the executed action:

$$P: S \times A \times S \rightarrow [0,1]$$
$$P(s' \mid s, a) = \Pr\left(s_{t+1} = s' \mid s_t = s,\ a_t = a\right)$$

where $P$ is the state transition function of the environment, giving the probability that the state transitions to the next state $s_{t+1} = s'$ after action $a_t = a$ is taken in state $s_t = s$. Because the power system involves multiple uncertainties and strong nonlinearity, the environment with which the reinforcement learning agent interacts, that is, the state transition process, is realized by a power flow simulator: given the environment state and the adjustments issued by the agent, the simulator calculates the grid power flow, outputs the line powers, line currents, unit outputs, and so on, and feeds back the reward.
For the agent-related part of the states and actions, the following state transition relations hold:

$$P_{disp}(t+1) = P_{disp}(t) + \Delta P_{disp}$$
$$\rho_{cur}(t+1) = \rho_{cur}(t) + \Delta\rho_{cur}$$
$$P_W = \rho_{cur} \cdot P_{W,max}$$
$$P_S = \rho_{cur} \cdot P_{S,max}$$
$$P_b = P_B + \Delta P_B$$

where $P_W$, $P_S$, and $P_b$ are the actual wind power, photovoltaic, and energy storage outputs after the decision, respectively.
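One step of this transition could look roughly as follows, with pandapower standing in for the power flow simulator; the network element indices, the state dictionary, and the clipping of the curtailment ratio are assumptions of the sketch.

```python
# One environment step: fold the agent's action into the accumulated state,
# push the resulting injections into the grid model, and run a power flow.
import numpy as np
import pandapower as pp

def transition(net, st, action):
    dP_disp, d_rho_cur, dP_B = action
    st["P_disp"] += dP_disp                          # accumulate unit rescheduling
    st["rho_cur"] = float(np.clip(st["rho_cur"] + d_rho_cur, 0.0, 1.0))
    # renewable outputs P_W, P_S scaled by rho_cur; unit and storage corrected
    net.sgen.at[st["wind_idx"], "p_mw"] = st["rho_cur"] * st["P_W_max"]
    net.sgen.at[st["pv_idx"], "p_mw"] = st["rho_cur"] * st["P_S_max"]
    net.gen.at[st["gen_idx"], "p_mw"] = st["P_G_base"] + st["P_disp"]
    net.storage.at[st["ess_idx"], "p_mw"] = st["P_B_base"] + dP_B   # P_b
    pp.runpp(net)                                    # power flow as the environment
    st["rho"] = net.res_line.loading_percent.values / 100.0  # line load rates
    return st
```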
(4) Reward function.
The reward $r_t$ guides the agent's training so that it makes decisions in the direction of maximizing the cumulative reward. The intraday security goal is achieved through the design of the reward function.
(4-1) To encourage basic power supply and power balance, a positive reward of +1 is set for each time step survived.
(4-2) Considering the system line margin (only tie lines are considered), a line load-rate reward is set with value range [0, 1]; the smaller the line load rate, the larger the reward:

$$r_{margin} = \frac{1}{n_L}\sum_{l=1}^{n_L}\left(1-\frac{|P_l|}{P_{l,max}}\right)$$

where $n_L$ is the number of tie lines, $P_l$ is the power flow on line $l$, and $P_{l,max}$ is the maximum power flow of line $l$.
(4-3) To encourage the storage to keep sufficient reserve, i.e., to discharge when the state of charge is high and to charge when it is low, an energy storage reward is set:

$$r_{ess} = \frac{P_b\left(\dfrac{SOC_{min}+SOC_{max}}{2}-SOC\right)}{\dfrac{SOC_{max}-SOC_{min}}{2}}$$

The reward obtained in a single time step is the sum of these three rewards.
In this embodiment, by defining the line power-flow margin reward and the energy storage charge/discharge reward, the reward function guides the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios, improving the operational security of the system.
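Putting the three terms together, a single-step reward might be computed as below. The two reward formulas are rendered only as images in the source, so the forms used here are assumptions that match the stated properties: a margin term in [0, 1] that grows as line load rates fall, and a storage term that rewards charging at low SOC and discharging at high SOC.

```python
# Single-step reward: survival + tie-line margin + SOC-steering storage term.
import numpy as np

def step_reward(P_l_flow, P_l_max, soc, soc_min, soc_max, P_b):
    r_alive = 1.0                                     # +1 for surviving the step
    rho = np.abs(P_l_flow) / P_l_max                  # tie-line load rates
    r_margin = float(np.mean(1.0 - np.clip(rho, 0.0, 1.0)))   # in [0, 1]
    soc_mid = 0.5 * (soc_min + soc_max)
    # > 0 when charging (P_b > 0) below soc_mid or discharging above it
    r_storage = P_b * (soc_mid - soc) / (0.5 * (soc_max - soc_min))
    return r_alive + r_margin + r_storage
```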
S103. Use the intraday agent to assist in regulating the base scheme.
(1) Agent training.
The agent must make effective decisions in a variety of uncertain environments. So that the policy it generates is oriented toward such environments and has a certain robustness, thereby safeguarding the operational security of the power system, this embodiment treats the environment as follows:
(1-1) When the line load rate is greater than 1 and less than 1.35, the line is considered soft-overloaded and is disconnected after 4 time steps (i.e., 20 min);
(1-2) When the line load rate is greater than 1.35, the line is considered hard-overloaded and is disconnected immediately;
(1-3) A disconnected line reconnects automatically 16 time steps after the outage;
(1-4) Each time step carries a 1% probability of a random line outage.
A 5-min interval is taken as one time step, 288 steps in total, with one day forming one training episode. Following the reinforcement learning algorithm and training procedure, the agent is trained for a number of episodes; the outage rules above are applied at every step, as sketched below.
(2) Agent-assisted regulation process.
The base scheme can be solved day-ahead. During intraday real-time scheduling, the trained agent, given the environment state quantity at each time step, rapidly outputs the action values (the unit output corrections, the wind/photovoltaic curtailment ratios, and the energy storage charge/discharge power), thereby realizing source-network-load-storage joint regulation.
In this embodiment, the agent is trained by deep reinforcement learning with safety as the objective and learns to correct the regulation decisions.
The test system of this embodiment is a modified IEEE 14-node system, which contains renewable units (wind and photovoltaic generation) and an energy storage system with a maximum charge/discharge power of 15 MW. The base scheduling scheme is solved with the Gurobi solver, and the SAC algorithm is selected as the reinforcement learning algorithm for training the agent. For a selected scenario, as shown in Fig. 2, the 1 h resolution regulation scheme is obtained according to step S101; the agent trained under the framework designed in step S102 can then continuously correct the unit outputs and the energy storage charging/discharging on top of the base scheme, as shown in Fig. 3 and Fig. 4. The reinforcement-learning-based active correction mode for power system source-network-load-storage joint scheduling can correct the base scheme rapidly and in real time at 5 min time-step intervals.
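The embodiment names SAC but no particular implementation. One way to reproduce the training loop and the fast intraday inference is sketched below with Stable-Baselines3, where GridJointDispatchEnv is a hypothetical Gym-style wrapper of the environment described above, and the total step count is an assumption.

```python
# Train a SAC agent on the joint-dispatch environment, then use it for fast
# 5-minute intraday corrections. Env class and hyperparameters are assumed.
from stable_baselines3 import SAC

env = GridJointDispatchEnv(base_schedule, forecasts)  # hypothetical wrapper
model = SAC("MlpPolicy", env, gamma=0.99)             # gamma = 0.99 per S102
model.learn(total_timesteps=288 * 2000)               # 288 steps/day episodes
model.save("sac_source_grid_load_storage")

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)    # one forward pass per step
```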
Those skilled in the art will appreciate that all or part of the steps in a method implementing the above embodiments may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer readable storage medium.
It should be noted that although the method operations of the above embodiments are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all illustrated operations must be performed, to achieve the desired result. Rather, the depicted steps may be executed in a different order; additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be decomposed into several.
Embodiment 2:
As shown in Fig. 6, this embodiment provides a deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control device, which includes a scheduling scheme generation module 601 and a scheduling scheme correction module 602, wherein:
the scheduling scheme generation module 601 is configured to initially generate, from power system forecast information, a source-load-storage scheduling scheme with economy as the objective, the scheme serving as the basis for agent regulation;
the scheduling scheme correction module 602 is configured to design a reinforcement learning architecture for source-network-load-storage joint scheduling, to train an agent through deep reinforcement learning with safety as the objective to learn to correct the source-load-storage scheduling scheme, and thereby to realize source-network-load-storage joint scheduling; the reward function in the reinforcement learning architecture defines a line power-flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
For the specific implementation of each module in this embodiment, reference may be made to embodiment 1 above, which is not repeated here. It should be noted that the device provided in this embodiment is illustrated only by the division of the above functional modules; in practical applications, the above functions may be allocated to different functional modules as needed, i.e., the internal structure may be divided into different functional modules to perform all or part of the functions described above.
Embodiment 3:
This embodiment provides a computer device, which may be a computer. As shown in Fig. 7, its components are connected through a system bus 701. The processor 702 provides computing and control capabilities. The memory includes a nonvolatile storage medium 706 and an internal memory 707: the nonvolatile storage medium 706 stores an operating system, a computer program, and a database, while the internal memory 707 provides the runtime environment for the operating system and the computer program in the nonvolatile storage medium. When the processor 702 executes the computer program stored in the memory, the power system source-network-load-storage joint regulation and control method of embodiment 1 above is implemented, as follows:
generating, according to power system forecast information, an initial source-load-storage scheduling scheme with economy as the objective, the scheme serving as the basis for agent regulation;
designing a reinforcement learning architecture for source-network-load-storage joint scheduling and training an agent through deep reinforcement learning, with safety as the objective, to learn to correct the source-load-storage scheduling scheme and thereby realize source-network-load-storage joint scheduling, wherein the reward function in the reinforcement learning architecture defines a line power-flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
Embodiment 4:
This embodiment provides a storage medium, namely a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the power system source-network-load-storage joint regulation and control method of embodiment 1 above is implemented, as follows:
generating, according to power system forecast information, an initial source-load-storage scheduling scheme with economy as the objective, the scheme serving as the basis for agent regulation;
designing a reinforcement learning architecture for source-network-load-storage joint scheduling and training an agent through deep reinforcement learning, with safety as the objective, to learn to correct the source-load-storage scheduling scheme and thereby realize source-network-load-storage joint scheduling, wherein the reward function in the reinforcement learning architecture defines a line power-flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
The computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above embodiments are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification that a person skilled in the art can make, according to the technical solution and the inventive concept of the present invention, within the scope disclosed by this patent falls within the protection scope of the present invention.

Claims (10)

1. A deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control method, characterized in that the method comprises:
generating, according to power system forecast information, an initial source-load-storage scheduling scheme with economy as the objective, the scheme serving as the basis for agent regulation;
designing a reinforcement learning architecture for source-network-load-storage joint scheduling and training an agent through deep reinforcement learning, with safety as the objective, to learn to correct the source-load-storage scheduling scheme and thereby realize source-network-load-storage joint scheduling, wherein the reward function in the reinforcement learning architecture defines a line power-flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
2. The power system source-network-load-storage joint regulation and control method according to claim 1, wherein the line power-flow margin reward is:

$$r_{margin} = \frac{1}{n_L}\sum_{l=1}^{n_L}\left(1-\frac{|P_l|}{P_{l,max}}\right)$$

where $n_L$ is the number of tie lines, $P_l$ is the power flow on line $l$, and $P_{l,max}$ is the maximum power flow of line $l$.
3. The power system source-network-load-storage joint regulation and control method according to claim 1, wherein the energy storage reward is:

$$r_{ess} = \frac{P_b\left(\dfrac{SOC_{min}+SOC_{max}}{2}-SOC\right)}{\dfrac{SOC_{max}-SOC_{min}}{2}}$$

where $SOC$ is the current state of charge of the energy storage, $SOC_{min}$ and $SOC_{max}$ are its minimum and maximum state of charge, respectively, and $P_b$ is the energy storage charge/discharge power after the agent's decision.
4. The power system source-network-load-storage joint regulation and control method according to claim 1, wherein the reinforcement learning architecture comprises:
an environment state quantity: an abstract representation of the situation presented by the current power system environment, i.e., the information the agent can acquire and needs;
an action space: covering the adjustable units, the renewable units, and the energy storage equipment, respectively;
a state transition: the next environment state is determined only by the current environment state and the executed action;
a reward function: used to guide the agent's training so that it makes decisions in the direction of maximizing the cumulative reward.
5. The power system source-network-load-storage joint regulation and control method according to claim 4, wherein the environment state quantity, namely the grid state $s_t$ at time $t$, is described as:

$$s_t = (P_{disp}, P_g, P_{w,fc}, P_{s,fc}, P_{l,fc}, \Delta P_{up,max}, \Delta P_{down,max}, S_l, SOC, \rho, \rho_{cur})$$
$$P_g(t) = P_G(t) + P_{disp}(t)$$
$$\Delta P_{up,max} = \min\left(P_{Gmax} - P_G(t+1),\ P_{G,rampup}\right)$$
$$\Delta P_{down,max} = \min\left(P_G(t+1) - P_{Gmin},\ P_{G,rampdown}\right)$$
$$\rho = \frac{P_L}{P_{L,max}}$$

where $P_{disp}(t)$ is the rescheduling value at time $t$, accumulated from the agent's unit rescheduling actions; $P_G(t)$ is the output of the adjustable generator at time $t$ in the day-ahead source-load-storage scheduling scheme; $P_g(t)$ is the unit output at time $t$ in the final scheme obtained by superimposing the agent's actions on the scheduling scheme; $P_{w,fc}$, $P_{s,fc}$, $P_{l,fc}$ are the wind, photovoltaic, and load forecast values, respectively; $\Delta P_{up,max}$ and $\Delta P_{down,max}$ are the unit's maximum upward and downward adjustment space; $S_l$ is the line status; $SOC$ is the current state of charge of the energy storage; $P_{Gmax}$ and $P_{Gmin}$ are the upper and lower output limits of the unit; $P_{G,rampup}$ and $P_{G,rampdown}$ are the unit's upward and downward ramping limits; $\rho$ is the line load rate; $P_L$ is the line power flow; $P_{L,max}$ is the maximum value of the line power flow; $\rho_{cur}$ is the renewable curtailment ratio, accumulated from the agent's actions.
6. The power system source-network-load-storage joint regulation and control method according to claim 5, wherein, for the agent-related part of the states and actions, the following state transition relations hold:

$$P_{disp}(t+1) = P_{disp}(t) + \Delta P_{disp}$$
$$\rho_{cur}(t+1) = \rho_{cur}(t) + \Delta\rho_{cur}$$
$$P_W = \rho_{cur} \cdot P_{W,max}$$
$$P_S = \rho_{cur} \cdot P_{S,max}$$
$$P_b = P_B + \Delta P_B$$

where $P_W$, $P_S$, and $P_b$ are the actual wind power, photovoltaic, and energy storage outputs after the decision, respectively.
7. The power system source-network-load-storage joint regulation and control method according to claim 5, wherein the source-load-storage scheduling scheme is solved day-ahead; during intraday real-time scheduling, the trained agent rapidly outputs action values, including the unit output corrections, the wind and photovoltaic curtailment ratios, and the energy storage charge/discharge power, according to the environment state quantity provided at each time step, thereby realizing source-network-load-storage joint regulation.
8. The power system source-network-load-storage joint regulation and control method according to any one of claims 1 to 7, wherein the environment is treated as follows during agent training:
when the line load rate is outside the set range, the line is considered soft-overloaded and is disconnected after m time steps, where m is a set value;
when the line load rate is greater than the maximum of the set range, the line is considered hard-overloaded and is disconnected immediately;
a disconnected line reconnects automatically n time steps after the outage, where n is a set value and n > m;
each time step carries a set probability of a line outage;
wherein the time step is smaller than the time step of the source-load-storage scheduling scheme.
9. A deep-reinforcement-learning-based power system source-network-load-storage joint regulation and control device, characterized by comprising:
a scheduling scheme generation module, configured to initially generate, from power system forecast information, a source-load-storage scheduling scheme with economy as the objective, the scheme serving as the basis for agent regulation;
a scheduling scheme correction module, configured to design a reinforcement learning architecture for source-network-load-storage joint scheduling and to train an agent through deep reinforcement learning, with safety as the objective, to learn to correct the source-load-storage scheduling scheme and realize source-network-load-storage joint scheduling, wherein the reward function in the reinforcement learning architecture defines a line power-flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the power system source-network-load-storage joint regulation and control method according to any one of claims 1 to 8.
CN202310233509.XA · filed 2023-03-13 · Power system source-network-load-storage joint regulation and control method and device based on deep reinforcement learning · Pending · CN116247742A (en)

Priority Applications (1)

  • CN202310233509.XA · priority and filing date 2023-03-13 · Power system source-network-load-storage joint regulation and control method and device based on deep reinforcement learning

Publications (1)

  • CN116247742A · publication date 2023-06-09

Family

  • ID=86633026 · one family application: CN202310233509.XA (CN), published as CN116247742A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

  • CN117526443A * · priority 2023-11-07 · published 2024-02-06 · Beijing Qingdian Technology Co Ltd · Novel power system-based power distribution network optimization regulation and control method and system
  • CN117526443B * · priority 2023-11-07 · published 2024-04-26 · Beijing Qingdian Technology Co Ltd · Power system-based power distribution network optimization regulation and control method and system


Legal Events

  • PB01 · Publication
  • SE01 · Entry into force of request for substantive examination
  • CB03 · Change of inventor or designer information
    Inventors after change: Liang Xueqing, Cai Yanchun, Liu Xuan, Wang Xiyue, Ke Deping, Liu Luhao, Lu Youfei, Wu Renbo, Zhang Yang, Zhao Hongwei, Chen Minghui, Zhang Shaofan, Zou Shirong
    Inventors before change: Long Yun, Zou Shirong, Cai Yanchun, Liu Xuan, Wang Xiyue, Ke Deping, Liang Xueqing, Liu Luhao, Lu Youfei, Wu Renbo, Zhang Yang, Zhao Hongwei, Chen Minghui, Zhang Shaofan