Disclosure of Invention
To overcome the defects of the prior art, the invention provides a deep reinforcement learning-based power system source-network-load-storage joint regulation method, a device, computer equipment, and a storage medium. A source-load-storage scheduling scheme is first generated from prediction information to serve as the basis for agent regulation. A reinforcement learning framework is then designed on top of this scheduling scheme: the agent learns from the observable values of the environment to correct the unit output, the wind and solar curtailment, and the energy storage charging and discharging on the basis of the basic scheme, so as to eliminate the power imbalance caused by inaccurate prediction and by sudden accidents in actual operation.
The first object of the invention is to provide a power system source-network-load-storage joint regulation method based on deep reinforcement learning.
The second object of the invention is to provide a power system source-network-load-storage joint regulation device based on deep reinforcement learning.
A third object of the present invention is to provide a computer device.
A fourth object of the present invention is to provide a storage medium.
The first object of the present invention can be achieved by the following technical solution:
a deep reinforcement learning-based power system source-network-load-storage joint regulation method, comprising:
according to the prediction information of the power system, initially generating a source-load-storage scheduling scheme with economy as the objective, the scheme serving as the basis for agent regulation;
designing a reinforcement learning architecture for source-network-load-storage joint scheduling and training an agent through deep reinforcement learning with safety as the objective, the agent learning to correct the source-load-storage scheduling scheme so as to realize source-network-load-storage joint scheduling; the reward function in the reinforcement learning architecture defines a line power flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
Further, the line power flow margin reward is:
where n_L is the number of tie lines, P_l is the power flow on line l, and P_l,max is the maximum power flow of line l.
Further, the energy storage reward is:
where SOC is the current state of charge of the energy storage, SOC_min and SOC_max are respectively its minimum and maximum values, and P_b is the energy storage charge/discharge power after the agent's decision.
Further, the reinforcement learning architecture includes:
environmental state quantity: an abstract representation of the condition presented by the current power system environment, comprising the information that the agent can acquire and requires;
action space: directed respectively at the adjustable units, the renewable energy units, and the energy storage equipment;
state transition: the next environmental state is determined only by the current environmental state and the action performed;
reward function: used to guide the training of the agent to make decisions in the direction of maximizing the cumulative reward.
Further, the environmental state quantity describes the grid state s_t at time t as:
s_t = (P_disp, P_g, P_w,fc, P_s,fc, P_l,fc, ΔP_up,max, ΔP_down,max, S_l, SOC, ρ, ρ_cur)
P_g(t) = P_G(t) + P_disp(t)
ΔP_up,max = min(P_Gmax - P_G(t+1), P_G,rampup)
ΔP_down,max = min(P_G(t+1) - P_Gmin, P_G,rampdown)
where P_disp(t) is the rescheduling value at time t, the accumulation of the agent's unit rescheduling actions; P_G(t) is the output of the adjustable generator at time t in the day-ahead source-load-storage scheduling scheme; P_g(t) is the unit output at time t in the final scheme obtained by superimposing the agent's actions on the scheduling scheme; P_w,fc, P_s,fc, and P_l,fc are respectively the predicted values of wind power, photovoltaic, and load; ΔP_up,max and ΔP_down,max are respectively the unit's maximum upward and downward adjustment spaces; S_l is the line status; SOC is the current state of charge of the energy storage; P_Gmax and P_Gmin are respectively the upper and lower output limits of the unit; P_G,rampup and P_G,rampdown are respectively the upward and downward ramping limits of the unit; ρ is the line load rate; P_L is the line power flow; P_L,max is its maximum value; ρ_cur is the renewable curtailment ratio, the accumulated value of the agent's actions.
Further, for the partial states and actions of the agent, the following state transition relationships hold:
P_W = ρ_cur · P_W,max
P_S = ρ_cur · P_S,max
P_b = P_B + ΔP_B
where P_W, P_S, and P_b are respectively the actual wind power, photovoltaic, and energy storage outputs after the decision.
Furthermore, the source-load-storage scheduling scheme is solved day-ahead. During intraday real-time scheduling, the trained agent rapidly outputs action values, including the unit output correction, the wind power and photovoltaic curtailment ratio, and the energy storage charge/discharge power, according to the environmental state quantity provided at each time step, thereby realizing source-network-load-storage joint regulation.
Further, the environment is treated as follows during the training of the agent:
when the line load rate exceeds the set range but not its maximum value, the line is considered softly overloaded and is disconnected after m time steps, where m is a set value;
when the line load rate is greater than the maximum value of the set range, the line is considered hard overloaded and is disconnected immediately;
the line reconnects automatically n time steps after being disconnected, where n is a set value and n > m;
a line-disconnection probability is set for each time step;
here the time step differs from the time step in the source-load-storage scheduling scheme: its time interval (granularity) is smaller.
The second object of the invention can be achieved by the following technical solution:
a deep reinforcement learning-based power system source-network-load-storage joint regulation device, the device comprising:
a scheduling scheme generation module, used to initially generate a source-load-storage scheduling scheme with economy as the objective according to the power system prediction information, the scheme serving as the basis for agent regulation;
a scheduling scheme correction module, used to design a reinforcement learning architecture for source-network-load-storage joint scheduling and to train an agent through deep reinforcement learning with safety as the objective, the agent learning to correct the source-load-storage scheduling scheme so as to realize source-network-load-storage joint scheduling; the reward function in the reinforcement learning architecture defines a line power flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
The third object of the present invention can be achieved by the following technical solution:
computer equipment comprising a processor and a memory storing a program executable by the processor, wherein the above power system source-network-load-storage joint regulation method is implemented when the processor executes the program stored in the memory.
The fourth object of the present invention can be achieved by the following technical solution:
a storage medium storing a program which, when executed by a processor, implements the above power system source-network-load-storage joint regulation method.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a power system source-network-load-storage scheduling decision method that draws on the strong decision-making capability of reinforcement learning in uncertainty scenarios. To improve the training efficiency of the reinforcement learning agent, a source-load-storage scheduling scheme is first generated from prediction information as the basis for agent regulation; a reinforcement learning framework is designed on top of this scheme, and the agent is trained through deep reinforcement learning. The agent corrects the source-load-storage scheduling scheme according to the observable values of the environment, eliminating the power imbalance caused by inaccurate prediction and by sudden accidents in actual operation. In addition, by defining line power flow margin and energy storage charge/discharge rewards, the reward function in the reinforcement learning architecture guides the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios, improving the safety of system operation.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention; all other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the present invention. It should be understood that the description of the specific embodiments is intended for purposes of illustration only and is not intended to limit the scope of the present application.
Example 1:
as shown in fig. 1 and 2, the deep reinforcement learning-based power system source-network-load-storage joint regulation method provided in this embodiment includes the following steps:
S101, initially generating a source-load-storage scheduling scheme as the basic scheme according to prediction information.
In this step, a source-load-storage scheduling scheme is initially generated through mathematical modeling according to the prediction information and serves as the basic scheme for subsequent correction.
According to grid information, renewable energy and load prediction information, and the unit commitment, the power system arranges the unit output and the energy storage charge/discharge schedule, with economy as the main objective and 1 h as the time interval.
(1) Control objective.
The control objective is:
where C_G(t) is the total cost of the adjustable generator set at time t, and C_B(t) is the total scheduling cost of the energy storage at time t.
(1-1) Total cost of the adjustable generator:
C_G(t) = a·P_G(t)^2 + b·P_G(t) + c
where P_G(t) is the output of the adjustable generator at time t, and a, b, c are the comprehensive cost coefficients of the generator.
(1-2) Total energy storage scheduling cost.
In this embodiment, an energy storage system composed of storage batteries is taken as an example for cost analysis. The total cost comprises maintenance and loss costs and is, overall, quadratic in the charge/discharge power:
C_B(t) = ω_B·P_B(t)^2
where P_B(t) is the charge/discharge power of the energy storage; when its value is positive, the storage is in a charging state and can be regarded as a load, otherwise it is in a discharging state and can be regarded as generation equipment; ω_B is the cost coefficient of the storage output.
(2) Constraint conditions.
(2-1) Active power balance constraint:
P_w(t) + P_s(t) + P_G(t) = P_l(t) + P_B(t)
where P_w(t), P_s(t), and P_l(t) are respectively the predicted wind power, photovoltaic power, and load power.
(2-2) Adjustable unit output and ramping constraints:
P_Gmin < P_G(t) < P_Gmax
P_G,rampdown < P_G(t) - P_G(t-1) < P_G,rampup
where P_Gmax and P_Gmin are respectively the upper and lower output limits of the unit, and P_G,rampup and P_G,rampdown are respectively the upward and downward ramping limits of the unit.
(2-3) Energy storage charge/discharge limits:
P_B,pro < P_B(t) < P_B,ab
where P_B,pro is the maximum discharge power of the storage and P_B,ab is its maximum charge power.
(2-4) Energy storage operation constraints:
where SOC(t) is the state of charge of the energy storage, E(t) is the stored energy at time t, and E_max is the maximum energy value. The state of charge depends on the storage output at the previous moment; η_ab and η_pro are respectively the charging and discharging efficiencies of the storage. Due to capacity limits, the SOC is bounded above and below as shown in the following formula; in addition, to keep successive optimizations continuous, the SOC at the end of the optimization period must equal its initial value, giving the following constraints:
SOC_min ≤ SOC(t) ≤ SOC_max
SOC(0) = SOC(T)
where SOC_min and SOC_max are respectively the minimum and maximum state of charge of the energy storage, and T is the optimization horizon, taken as 24 h (one day) in this embodiment.
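The SOC dynamics equation itself is not reproduced above. A standard battery model consistent with the listed symbols (charging efficiency η_ab, discharging efficiency η_pro, and P_B > 0 meaning charging, per the cost model) might look as follows; the exact form in the patent may differ, so this is an assumption:

```python
def soc_update(soc, p_b, dt, e_max, eta_ab, eta_pro):
    """One plausible SOC update over an interval dt.

    Assumes p_b > 0 is charging (consistent with the cost model above);
    this functional form is an assumption, not quoted from the patent.
    """
    if p_b >= 0:
        # charging: conversion losses reduce the energy actually stored
        return soc + eta_ab * p_b * dt / e_max
    # discharging: more energy is drawn from storage than is delivered
    return soc + p_b * dt / (eta_pro * e_max)
```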
(2-5) Spinning reserve constraint.
To ensure that the system has a certain spare capacity to cope with intraday uncertainty scenarios, the spinning reserve constraint must be satisfied during operation:
where ε is the spinning reserve rate.
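Constraints (2-1) to (2-5) can be checked mechanically for a candidate schedule. The sketch below is a minimal single-period feasibility check; since the spinning reserve formula is not reproduced above, it is checked here as upward headroom of at least ε times the load, which is one plausible reading and an assumption:

```python
def check_constraints(p_w, p_s, p_g, p_l, p_b, *, p_gmin, p_gmax,
                      p_b_pro, p_b_ab, soc, soc_min, soc_max, eps, tol=1e-6):
    """Return the list of violated constraint names for one time step."""
    violations = []
    # (2-1) active power balance
    if abs(p_w + p_s + p_g - (p_l + p_b)) > tol:
        violations.append("power_balance")
    # (2-2) adjustable unit output limits
    if not (p_gmin < p_g < p_gmax):
        violations.append("unit_output")
    # (2-3) storage charge/discharge limits (discharge bound is negative)
    if not (p_b_pro < p_b < p_b_ab):
        violations.append("storage_power")
    # (2-4) state-of-charge bounds
    if not (soc_min <= soc <= soc_max):
        violations.append("soc_bounds")
    # (2-5) spinning reserve, assumed form: upward headroom >= eps * load
    if (p_gmax - p_g) < eps * p_l:
        violations.append("spinning_reserve")
    return violations
```

The ramping constraint is omitted here because it couples two consecutive periods; it would be checked analogously over pairs of time steps.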
This embodiment takes economy as the objective, realizes power balance, and generates a scheduling scheme with 1 h time steps by mathematical modeling as the base value for each dispatchable unit. The time interval in this embodiment is not limited to 1 h.
The modeling here provides the basic scheme of unit output and energy storage charge/discharge using a deterministic method, giving a reference for agent training. Adopting other algorithms such as stochastic programming or robust optimization does not affect the design and training of the reinforcement learning framework.
S102, designing the reinforcement learning architecture for source-network-load-storage joint scheduling.
The scheduling scheme generated by mathematical modeling is corrected by an agent trained with a reinforcement learning algorithm, yielding a source-network-load-storage joint scheduling decision scheme with a time interval of 5 minutes. The scheduling scheme generated in step S101 is hereinafter referred to as the basic scheme.
Following the five-tuple (S, A, P, R, γ) of the Markov decision process in reinforcement learning, the modeling of the power system environment focuses on the settings of the state S, the action A, the reward R, and the discount factor γ. This embodiment considers continuous decisions at 5-min intervals within a day, that is, 288 steps per scenario, and selects γ = 0.99.
(1) Environmental state quantity.
The environmental state quantity is an abstract representation of the condition presented by the current power system environment, comprising the information that the agent can acquire and requires. The grid state s_t at time t can be described as:
s_t = (P_disp, P_g, P_w,fc, P_s,fc, P_l,fc, ΔP_up,max, ΔP_down,max, S_l, SOC, ρ, ρ_cur)
P_g(t) = P_G(t) + P_disp(t)
ΔP_up,max = min(P_Gmax - P_G(t+1), P_G,rampup)
ΔP_down,max = min(P_G(t+1) - P_Gmin, P_G,rampdown)
where P_disp, short for P_disp(t), is the rescheduling value at time t, the accumulation of the agent's unit rescheduling actions; P_g, short for P_g(t), is the unit output at time t (note that the uppercase subscript G denotes the basic scheduling scheme, while the lowercase g denotes the final scheme obtained by superimposing the agent's actions on the basic scheme); P_w,fc, P_s,fc, and P_l,fc are respectively the wind power, photovoltaic, and load predicted values; ΔP_up,max and ΔP_down,max are respectively the unit's maximum upward and downward adjustment spaces; S_l is the line status, a 0-1 variable where 0 means the line is disconnected and 1 means it is connected; SOC is the current state of charge of the energy storage; ρ is the line load rate, with P_L the line power flow and P_L,max its maximum value; ρ_cur is the renewable curtailment ratio, the accumulated value of the agent's actions.
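The state quantity s_t above can be assembled from the scheduling baseline and the agent's accumulated actions. The following sketch illustrates the bookkeeping, including the adjustment margins ΔP_up,max and ΔP_down,max; the container type (a plain dict) and all numeric values are illustrative choices, not specified by the patent.

```python
def adjustment_margins(p_g_next, p_gmax, p_gmin, ramp_up, ramp_down):
    """Maximum upward/downward adjustment space of the unit for the next step."""
    dp_up = min(p_gmax - p_g_next, ramp_up)
    dp_down = min(p_g_next - p_gmin, ramp_down)
    return dp_up, dp_down

def build_state(p_disp, p_G, forecasts, margins, line_status, soc, rho, rho_cur):
    """Pack the observable quantities into the state s_t (here a plain dict)."""
    return {
        "P_disp": p_disp,
        "P_g": p_G + p_disp,       # final unit output = baseline + rescheduling
        "forecasts": forecasts,     # (P_w,fc, P_s,fc, P_l,fc)
        "dP_up_max": margins[0],
        "dP_down_max": margins[1],
        "S_l": line_status,         # 0 = disconnected, 1 = connected
        "SOC": soc,
        "rho": rho,                 # line load rate
        "rho_cur": rho_cur,         # accumulated curtailment ratio
    }
```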
(2) Action space.
The action is the control variable of the current time step. The action space defined in this embodiment comprises three kinds of actions, addressed respectively to the adjustable units, the renewable energy units, and the energy storage equipment:
a_t = [ΔP_disp, Δρ_cur, ΔP_B]
where ΔP_disp is the unit rescheduling power, Δρ_cur is the curtailment ratio adjustment of the renewable units (covering wind power and photovoltaic), and ΔP_B is the rescheduled energy storage output.
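Since P_disp and ρ_cur are accumulations of the agent's actions, applying a_t amounts to incrementing these running totals. Clipping the unit adjustment to [-ΔP_down,max, ΔP_up,max] and the curtailment ratio to [0, 1] is a natural implementation choice assumed here; the patent only defines the action space itself:

```python
def apply_action(p_disp, rho_cur, d_p_disp, d_rho_cur, dp_up_max, dp_down_max):
    """Accumulate a_t = [dP_disp, d_rho_cur, dP_B] into the cumulative controls.

    Clipping to the adjustment margins and to [0, 1] is an assumption.
    Returns the updated (P_disp, rho_cur); dP_B is applied separately
    in the storage transition relation P_b = P_B + dP_B.
    """
    d = max(-dp_down_max, min(d_p_disp, dp_up_max))   # respect unit margins
    new_rho = min(1.0, max(0.0, rho_cur + d_rho_cur))  # keep ratio in [0, 1]
    return p_disp + d, new_rho
```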
(3) State transition.
From the Markov decision process definition, the next environmental state is determined only by the current environmental state and the action performed:
where the state transition function P of the environment represents the probability that the state transitions to the next state s_{t+1} = s' after action a_t = a is taken in state s_t = s. Because the power system contains various uncertainties and strong nonlinearities, the environment with which the reinforcement learning agent interacts, i.e., the state transition process, is provided by a power flow simulator: given the environment state and the adjustments issued by the agent, the simulator computes the grid power flow, outputs line power, line current, unit output, and so on, and feeds back the reward.
For the partial states and actions of the agent, the following state transition relationships hold:
P_W = ρ_cur · P_W,max
P_S = ρ_cur · P_S,max
P_b = P_B + ΔP_B
where P_W, P_S, and P_b are respectively the actual wind power, photovoltaic, and energy storage outputs after the decision.
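These transition relations are direct to implement. The sketch below follows the formulas exactly as written, with ρ_cur applied multiplicatively to the maximum available renewable output:

```python
def renewable_output(rho_cur, p_w_max, p_s_max):
    """Actual wind/PV output after the decision: P_W = rho_cur * P_W,max, etc."""
    return rho_cur * p_w_max, rho_cur * p_s_max

def storage_output(p_B_base, d_p_B):
    """Actual storage charge/discharge power: P_b = P_B + dP_B."""
    return p_B_base + d_p_B
```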
(4) Reward function.
The reward r_t is used to guide the training of the agent toward decisions that maximize the cumulative reward. The intraday safety objective is met by designing the reward function.
(4-1) A positive reward of +1 is given for each time step survived, rewarding basic power supply and power balance.
(4-2) Taking the system line margin into account (only tie lines are considered), a line load rate reward with value range [0, 1] is set; the smaller the line load rate, the larger the reward:
where n_L is the number of tie lines, P_l is the power flow on line l, and P_l,max is the maximum power flow of line l.
(4-3) To encourage the storage to keep sufficient reserve, i.e., to discharge when the state of charge is high and to charge when it is low, an energy storage reward is set:
The reward obtained in a single time step is the sum of these three rewards.
In this embodiment, by defining the line power flow margin and energy storage charge/discharge rewards, the reward function guides the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios, improving the safety of system operation.
S103, using the intraday agent to assist in regulating the basic scheme.
(1) Training the agent.
The agent must make effective decisions in a variety of uncertainty environments. So that the strategy it produces is robust to such environments and the operating safety of the power system is ensured, the environment is treated as follows in this embodiment:
(1-1) when the line load rate is greater than 1 and less than 1.35, the line is considered softly overloaded and is disconnected after 4 time steps (i.e., 20 min);
(1-2) when the line load rate is greater than 1.35, the line is considered hard overloaded and is disconnected immediately;
(1-3) a disconnected line reconnects automatically after 16 time steps;
(1-4) at each time step, a line disconnection occurs with a probability of 1%.
A 5-min interval is taken as one time step, giving 288 steps in total, and one day constitutes one training episode. The agent is trained over many episodes following the chosen reinforcement learning algorithm and training procedure.
(2) Agent-assisted regulation process.
The basic scheme is solved day-ahead. During intraday real-time scheduling, the trained agent rapidly outputs action values, including the unit output correction, the wind power/photovoltaic curtailment ratio, and the energy storage charge/discharge power, according to the environmental state quantity provided at each time step, thereby realizing source-network-load-storage joint regulation.
In this embodiment, the agent is trained by deep reinforcement learning with safety as the objective, learning to correct the scheduling decision.
The test system in this embodiment is a modified IEEE 14-node system containing renewable units (wind power and photovoltaic generation) and an energy storage system with a maximum charge/discharge power of 15 MW. The basic scheduling scheme is solved with the Gurobi solver, and the SAC algorithm is selected as the reinforcement learning algorithm to train the agent. For a selected scenario, shown in fig. 2, a regulation scheme with 1 h resolution is obtained by step S101; the agent trained under the framework designed in step S102 can then continuously correct the unit output and the energy storage charge/discharge on the basis of the basic scheme, as shown in figs. 3 and 4. In this way, reinforcement learning realizes an active correction mode for power system source-network-load-storage joint scheduling, rapidly correcting the basic scheme in real time at 5-min time-step intervals.
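The intraday regulation loop of steps S101 to S103 can be summarized as a 288-step rollout. The sketch below shows only the control flow; `StubGrid` and `RandomAgent` are placeholders standing in for the power flow simulator and the trained SAC agent, respectively, and are not part of the patent.

```python
import random

class StubGrid:
    """Placeholder environment standing in for the power flow simulator."""
    def reset(self):
        self.t = 0
        return {"t": 0}
    def step(self, action):
        self.t += 1
        reward = 1.0                 # survival reward placeholder
        done = self.t >= 288         # 288 five-minute steps = one day
        return {"t": self.t}, reward, done

class RandomAgent:
    """Placeholder for the trained agent: a_t = [dP_disp, d_rho_cur, dP_B]."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
    def act(self, state):
        return [self.rng.uniform(-1.0, 1.0) for _ in range(3)]

def run_episode(env, agent):
    """Roll out one intraday episode; returns (steps, cumulative reward)."""
    state, total, done, steps = env.reset(), 0.0, False, 0
    while not done:
        state, reward, done = env.step(agent.act(state))
        total += reward
        steps += 1
    return steps, total
```

With the stubs above, one episode runs exactly 288 steps, matching the 5-min intraday granularity of this embodiment.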
Those skilled in the art will appreciate that all or part of the steps in a method implementing the above embodiments may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer readable storage medium.
It should be noted that although the method operations of the above embodiments are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all illustrated operations be performed in order to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Example 2:
as shown in fig. 6, this embodiment provides a deep reinforcement learning-based power system source-network-load-storage joint regulation device comprising a scheduling scheme generation module 601 and a scheduling scheme correction module 602, wherein:
the scheduling scheme generation module 601 is configured to initially generate a source-load-storage scheduling scheme with economy as the objective according to the power system prediction information, the scheme serving as the basis for agent regulation;
the scheduling scheme correction module 602 is configured to design a reinforcement learning architecture for source-network-load-storage joint scheduling and to train an agent by deep reinforcement learning with safety as the objective, the agent learning to correct the source-load-storage scheduling scheme so as to realize source-network-load-storage joint scheduling; the reward function in the reinforcement learning architecture defines a line power flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
For the specific implementation of each module in this embodiment, reference may be made to embodiment 1 above, which is not repeated here. It should be noted that the division into the above functional modules is merely an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure may be divided into different functional modules to perform all or part of the functions described above.
Example 3:
This embodiment provides a computer device, which may be a computer. As shown in fig. 7, its components are connected through a system bus 701. The processor 702 provides computing and control capabilities; the memory includes a non-volatile storage medium 706 and an internal memory 707, where the non-volatile storage medium 706 stores an operating system, a computer program, and a database, and the internal memory 707 provides a running environment for the operating system and the computer program in the non-volatile storage medium. When the processor 702 executes the computer program stored in the memory, the power system source-network-load-storage joint regulation method of embodiment 1 above is implemented, as follows:
according to the prediction information of the power system, initially generating a source-load-storage scheduling scheme with economy as the objective, the scheme serving as the basis for agent regulation;
designing a reinforcement learning architecture for source-network-load-storage joint scheduling and training an agent through deep reinforcement learning with safety as the objective, the agent learning to correct the source-load-storage scheduling scheme so as to realize source-network-load-storage joint scheduling; the reward function in the reinforcement learning architecture defines a line power flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
Example 4:
This embodiment provides a storage medium, namely a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the power system source-network-load-storage joint regulation method of embodiment 1 above is implemented, as follows:
according to the prediction information of the power system, initially generating a source-load-storage scheduling scheme with economy as the objective, the scheme serving as the basis for agent regulation;
designing a reinforcement learning architecture for source-network-load-storage joint scheduling and training an agent through deep reinforcement learning with safety as the objective, the agent learning to correct the source-load-storage scheduling scheme so as to realize source-network-load-storage joint scheduling; the reward function in the reinforcement learning architecture defines a line power flow margin reward and an energy storage reward that guide the agent to reserve sufficient margin and sufficient reserve for uncertainty scenarios.
The computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above-mentioned embodiments are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art according to the technical solution and the inventive concept of the present invention, within the scope disclosed by this patent, shall fall within the protection scope of the present invention.