CN115940289A - Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid - Google Patents


Info

Publication number
CN115940289A
Authority
CN
China
Prior art keywords
power
charging integrated
energy storage
station
charging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211625303.3A
Other languages
Chinese (zh)
Inventor
刘曌
段玉戈
许寅
孙庆凯
王希豪
王小君
和敬涵
王颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN202211625303.3A
Publication of CN115940289A
Legal status: Pending

Landscapes

  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

The invention provides a method for operating a light storage and charging integrated station oriented to power balance and new energy consumption of the power grid. Based on the comprehensive service scenario of the light storage and charging integrated station, a combined operation optimization model of the station is established with the minimum operation cost of the station as the objective, the power balance constraint, the main-grid interaction power constraint and the equipment operation constraints as constraint conditions, and the real-time output of the energy storage unit as the decision variable. A Markov decision process method is used to convert the dynamic scheduling problem into a reinforcement learning network for a policy decision problem. The reinforcement learning network is trained offline on historical system state data with the DDPG algorithm and, combined with a time-of-use electricity price mechanism, a minimum-scheduling-cost economic optimization is computed for the combined operation optimization model, yielding the real-time self-optimizing operation output of the energy storage system in the station. Dynamic optimal operation of the light storage and charging integrated station is thereby achieved, and the curse of dimensionality in the computation is effectively avoided.

Description

Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid
Technical Field
The invention relates to the technical field of operation scheduling of electric vehicle charging stations, and in particular to a method for operating a light storage and charging integrated station oriented to grid power balance and new energy consumption.
Background
Against the background of transport electrification, the planning and design of charging facilities and distribution networks must be considered to adapt to the large-scale development of electric vehicles. Meanwhile, energy storage and photovoltaics have received much attention thanks to improved technical performance and falling cost. Photovoltaic generation is low-cost and clean, but it is strongly affected by the external environment and its output is volatile. Configuring an energy storage system can further strengthen the on-site compensation of the electric vehicle charging load: by managing the charge and discharge of the energy storage battery, energy can be shifted in time and space, relieving the supply pressure on the grid during peak periods. Therefore, a diversified application form that organically combines photovoltaics, energy storage and electric vehicles is an important way to connect electric vehicle charging stations with renewable energy, and it can effectively reduce the impact of electric vehicle charging behavior on the power grid.
In existing work, research on the optimal scheduling of integrated stations that combine photovoltaics and energy storage is still relatively scarce; the coordination and complementary optimization of the resources inside the station is mostly addressed with control algorithms based on traditional optimization modeling. Representative examples include an adaptive robust energy-reserve co-optimization scheduling method for a photovoltaic-storage charging tower that minimizes the total daily operating cost, and a four-stage intelligent optimization control algorithm that integrates an electric vehicle bidirectional charging station, photovoltaic generation and stationary battery storage with a commercial building. These algorithms minimize the operating cost related to customer satisfaction while accounting for potential uncertainty, and balance real-time supply and demand among source, storage and load by adjusting the plan. However, such optimization models are mostly built with the maximum profit of the charging station as the objective; for a light storage and charging integrated station, the local consumption of new energy must be considered further alongside economic operation inside the station. Moreover, the optimal operation scheduling methods at the present stage mostly focus on day-ahead scheduling, so they are limited to a fixed scheduling plan and cannot respond dynamically to random changes of sources and loads. Meanwhile, the existing optimized operation models are based on traditional mathematical optimization modeling and therefore still depend on accurate forecasts of renewable generation and load. With the spread of electricity consumption information acquisition systems, the application of data-driven machine learning to power system operation optimization has attracted wide attention from scholars at home and abroad. To optimize the output plan of the resources inside the station and reduce its operating cost, existing reinforcement learning work describes the in-station resource scheduling problem as a constrained Markov decision process and proposes a model-free method based on deep reinforcement learning, which learns directly with a deep neural network to generate a constrained optimal in-station resource output plan. However, although conventional reinforcement learning handles small-scale discrete spaces well, when it deals with tasks whose state variables are continuous, the number of states obtained by discretization grows exponentially with the dimension of the space, i.e., the curse of dimensionality arises and effective learning becomes impossible. For the joint economic operation problem of the light storage and charging integrated station, the load, the photovoltaic generation and the state of charge in the state space are all continuous variables, so conventional reinforcement learning cannot solve it effectively. Likewise, the action of the in-station energy storage system is a continuous variable, and discretizing the action space blurs away a great deal of decision information in the action domain.
Therefore, for the light storage and charging integrated station, the local consumption of new energy still needs to be considered alongside economic operation inside the station. How to coordinate the output of the energy storage system in the station on the basis of the comprehensive service scenario of an electric vehicle charging station that integrates photovoltaics and energy storage, while effectively avoiding the curse of dimensionality and preserving the information of the whole action domain, so as to achieve optimal operation of the light storage and charging station oriented to grid power balance and new energy consumption, is a problem urgently to be solved in the prior art.
Disclosure of Invention
The invention provides a method for operating a light storage and charging integrated station for power balance and new energy consumption of a power grid, and aims to solve the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
The embodiment of the invention provides a method for operating a light storage and charging integrated station for power balance and new energy consumption of a power grid, which comprises the following steps:
s1, establishing a combined operation optimization model of the optical storage and charging integrated station by taking the minimum operation cost of the optical storage and charging integrated station as a target, taking power balance constraint, main power grid interaction power constraint and equipment operation constraint as constraint conditions and taking the real-time output of an energy storage unit as a decision variable based on a comprehensive service scene of the optical storage and charging integrated station;
s2, converting the dynamic scheduling problem in the combined operation optimization model into a reinforcement learning network of a strategy decision problem by using a Markov decision process method;
s3, training the reinforcement learning network in an off-line manner by adopting historical system state data based on a deep deterministic strategy gradient (DDPG) algorithm to obtain a trained reinforcement learning network;
and S4, according to the trained reinforcement learning network, combining a time-of-use electricity price mechanism to carry out scheduling cost minimum economic optimization calculation on the combined operation optimization model of the light storage and charging integrated station to obtain a real-time self-optimization-approaching operation output result of the energy storage system in the light storage and charging integrated station, and further realize the optimal operation of the light storage and charging integrated station.
Preferably, the combined operation optimization model includes:
the objective function shown in the following equation (1):
F = min(C_E + C_BES)    (1)
where F is the operation cost of the light storage and charging integrated station; C_E is the cost of purchasing electricity from the power grid,
C_E = Σ_t ε_e(t) P_grid(t) Δt
P_grid(t) is the power exchanged between the system and the main grid in time period t, a positive value indicating that the system purchases electricity from the main grid and a negative value indicating that surplus electricity is fed back to the grid; ε_e(t) is the electricity price in time period t; Δt is the length of the time interval; C_BES is the charge/discharge depreciation cost of the electrical energy storage,
C_BES = ρ_BES Σ_t |P_BES(t)| Δt
P_BES(t) is the charging or discharging power of the electrical energy storage in time period t, a positive value indicating that the storage is discharging and a negative value indicating that it is charging; ρ_BES is the depreciation cost coefficient of the electrical energy storage;
the constraints are as follows:
1) Power balance constraint:
at time t, the electric power balance constraint is shown in the following equation (2):
P_grid(t) + P_pv(t) + P_BES(t) = P_load(t)    (2)
where P_pv(t) is the photovoltaic generation power and P_load(t) is the user electrical load demand in time period t;
2) Main grid interaction power constraint, shown in the following equation (3):
P_grid^min ≤ P_grid(t) ≤ P_grid^max    (3)
where P_grid^min and P_grid^max are the lower and upper limits of the power exchanged between the system and the main grid;
3) Equipment operation constraint, shown in the following equation (4):
P_BES^min ≤ P_BES(t) ≤ P_BES^max    (4)
where P_BES^min and P_BES^max are the lower and upper limits of the electrical energy storage charging/discharging power;
for the electrical energy storage device, the following constraints also apply:
C_SOC^min ≤ C_SOC(t) ≤ C_SOC^max
C_SOC(t) = C_SOC(t-1) - η_BES P_BES(t) Δt / Q_BES,  with C_SOC(0) = C_SOC(T) = C_SOC^0
where C_SOC^min and C_SOC^max are the lower and upper limits of the electrical energy storage state of charge; C_SOC(t) is the state of charge of the electrical energy storage in time period t; Q_BES is the capacity of the electrical energy storage; C_SOC^0 is the initial state of charge of the electrical energy storage; η_BES is the charge/discharge coefficient of the electrical energy storage, with η_BES = 1/η_dis when the storage discharges (P_BES(t) ≥ 0) and η_BES = η_ch when it charges (P_BES(t) < 0); η_ch and η_dis are the charging efficiency and discharging efficiency of the electrical energy storage, respectively.
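As a concrete illustration of the objective (1) and the constraints above, the following Python sketch evaluates the operation cost F = C_E + C_BES of a candidate storage schedule and checks feasibility; it is a minimal sketch, and all numeric values (prices, limits, capacity, efficiencies) are assumed placeholders rather than values from the patent.

```python
# Minimal sketch: evaluate F = C_E + C_BES of equation (1) and check the constraints for a
# candidate storage schedule. All numeric values are assumed placeholders.
import numpy as np

dt = 0.25                                   # interval length Δt in hours (15 min, assumed)
rho_bes = 0.05                              # storage depreciation coefficient ρ_BES (assumed)
eta_ch, eta_dis = 0.95, 0.95                # charging / discharging efficiencies (assumed)
q_bes, soc_0 = 200.0, 0.5                   # capacity Q_BES (kWh) and initial SOC (assumed)
soc_min, soc_max = 0.2, 0.9                 # SOC limits (assumed)
p_bes_min, p_bes_max = -50.0, 50.0          # storage power limits, equation (4) (kW, assumed)
p_grid_min, p_grid_max = -100.0, 150.0      # grid exchange limits, equation (3) (kW, assumed)

def evaluate_schedule(p_load, p_pv, p_bes, price):
    """Return (operating cost F, feasibility flag) for one scheduling day."""
    p_grid = p_load - p_pv - p_bes                       # power balance, equation (2)
    c_e = np.sum(price * p_grid * dt)                    # electricity purchase cost C_E
    c_bes = rho_bes * np.sum(np.abs(p_bes) * dt)         # storage depreciation cost C_BES
    # SOC trajectory: discharging (P_BES >= 0) drains 1/η_dis, charging stores η_ch
    delta = np.where(p_bes >= 0, -p_bes / eta_dis, -p_bes * eta_ch) * dt / q_bes
    soc = soc_0 + np.cumsum(delta)
    feasible = (np.all((p_grid >= p_grid_min) & (p_grid <= p_grid_max))
                and np.all((p_bes >= p_bes_min) & (p_bes <= p_bes_max))
                and np.all((soc >= soc_min) & (soc <= soc_max))
                and abs(soc[-1] - soc_0) < 1e-3)         # SOC restored at the end of the day
    return c_e + c_bes, feasible

# toy 4-interval usage example
print(evaluate_schedule(np.array([60.0, 80, 90, 70]), np.array([0.0, 30, 40, 10]),
                        np.array([-20.0, 0, 30, -10]), np.array([0.3, 0.7, 1.0, 0.7])))
```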
Preferably, step S2 comprises:
converting the operation cost minimization problem of the light storage and charging integrated station into the reward maximization form of an agent, as shown in the following formula (5):
r_t(s_t, a_t) = -λ (C_E(s_t, a_t) + C_BES(s_t, a_t))    (5)
where r_t(s_t, a_t) is the total reward obtained by the agent in scheduling time period t; s_t is the observed state of the light storage and charging integrated station in scheduling time period t, s_t = {P_load(t), P_pv(t), C_soc(t-1), t}, in which P_load(t), P_pv(t), C_soc(t-1) and t are the user electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period, respectively; a_t is the dynamic economic dispatch action of the energy storage system of the station, which can be represented by the output of the in-station equipment in time period t, a_t = {P_BES(t)}; λ is the scaling factor applied to the cost; C_E(s_t, a_t) is the cost of purchasing electricity from the power grid in time period t with the agent in state s_t; C_BES(s_t, a_t) is the charge/discharge depreciation cost of the electrical energy storage in time period t with the agent in state s_t;
the action-value function Q^π(s, a) of the following formula (6) is used to evaluate the dynamic economic dispatch action a_t taken by the energy storage system of the station in state s_t; the larger Q^π(s, a), the better a_t:
Q^π(s_t, a_t) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k}(s_{t+k}, a_{t+k}) ]    (6)
where E_π(·) is the expectation under the target policy π; γ ∈ [0, 1] is the discount factor, representing the influence of future rewards on the cumulative reward; the larger γ, the more important future rewards are; r_{t+k} is the total reward obtained by the agent in time period t+k; s_{t+k} is the state of the light storage and charging integrated station in time period t+k; a_{t+k} is the action executed by the station in time period t+k; k ∈ N* represents the generation of the agent's cyclic learning;
the optimal target policy π* is obtained so as to maximize the action-value function, according to the following formula (7):
π* = arg max_{a ∈ A} Q^π(s, a)    (7)
where A is the action set of the agent.
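A minimal sketch of the reward mapping of formula (5) and of the state and action tuples defined above; the symbol lam stands for the scaling factor λ, and the constant and function names are illustrative assumptions.

```python
# Minimal sketch of the reward of formula (5); lam (λ) is an assumed scaling constant.
def reward(c_e_t, c_bes_t, lam=1e-3):
    """r_t(s_t, a_t) = -λ (C_E(s_t, a_t) + C_BES(s_t, a_t)) for one scheduling interval."""
    return -lam * (c_e_t + c_bes_t)

def make_state(p_load_t, p_pv_t, soc_prev, t):
    """Observed state s_t = {P_load(t), P_pv(t), C_soc(t-1), t}; the action is a_t = {P_BES(t)}."""
    return (p_load_t, p_pv_t, soc_prev, t)
```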
Preferably, step S3 comprises:
the historical system state data are observed states of the light storage and charging integrated station, including the system electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period of the station;
the reinforcement learning network comprises a value network and a policy network; from the policy network π(s|θ^π) and the value network Q(s, a|θ^Q), two independent target networks π′(s|θ^π′) and Q′(s, a|θ^Q′) are created, initialized as shown in the following formulas (8) and (9):
θ^π′ ← θ^π    (8)
θ^Q′ ← θ^Q    (9)
a cyclic optimization period T is set and the historical system state data are input to the policy network, the value network and their target networks; after a batch of data is trained, the DDPG algorithm updates the parameters of the current policy network and value network through gradient ascent or gradient descent, and then updates the parameters of their target networks through a soft update method; after T cycles, offline learning of the DDPG algorithm is completed and the trained reinforcement learning network is obtained.
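The offline training cycle described above can be sketched as follows; env, agent and their methods are assumed placeholder objects standing for the historical-data environment and the DDPG learner, not an interface defined by the patent.

```python
# Structure-only sketch of the offline training cycle of step S3 (placeholder objects).
def train_offline(env, agent, episodes=5000, horizon=96):
    for _ in range(episodes):                 # cyclic optimization period T
        s = env.reset()                       # historical load / PV / SOC / time period
        for _ in range(horizon):              # e.g. 24 h at 15-min intervals
            a = agent.act(s, explore=True)    # a_t = π(s_t|θ^π) + v_t
            s_next, r, done = env.step(a)     # apply P_BES(t), observe the cost-based reward
            agent.buffer.add(s, a, r, s_next, done)
            agent.update()                    # mini-batch update of value and policy networks,
                                              # followed by soft update of the target networks
            s = s_next
            if done:
                break
    return agent
```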
Preferably, step S4 includes:
when a scheduling task is received, in each time period, selecting the scheduling action a_t with the trained reinforcement learning network according to the current system state s_t;
executing the action a_t, entering the next environment state, and obtaining the reward r_t;
then collecting the in-station state information s_{t+1} of time period t+1 together with the time-of-use electricity price information as a new sample, and making the dynamic scheduling decision for that time period, i.e., the real-time self-optimizing operation output result of the energy storage system in the light storage and charging integrated station.
Preferably, when the reinforcement learning network is trained offline, it is trained with the action a_t given by the following equation (10):
a_t = π(s_t|θ^π) + v_t    (10)
where v_t is random noise.
According to the technical scheme provided by the above operation method of the light storage and charging integrated station for grid power balance and new energy consumption, the mathematical model is converted into a reinforcement learning network that a reinforcement learning algorithm can solve, historical data are input and the network is trained with the DDPG algorithm, and real-time parameters within the optimization period are then fed to the trained network. This improves the economy of scheduling and operating the light storage and charging integrated station, realizes local consumption of new energy resources, and, by adopting the DDPG algorithm, effectively overcomes the curse of dimensionality and the suboptimal scheduling strategies introduced by discretization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of an operation method of a light storage and charging integrated station for power balance and new energy consumption of a power grid according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a framework of an operation method of a light storage and charging integrated station for power balance and new energy consumption of a power grid according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a reinforcement learning process according to the present embodiment;
FIG. 4 is a schematic flowchart of the reinforcement learning process according to the present embodiment;
FIG. 5 is a statistical representation of historical data according to an embodiment;
FIG. 6 is a time of use electricity price information diagram;
FIG. 7 is a trend graph of training results of DDPG algorithm;
FIG. 8 is a diagram of energy storage system scheduling results;
fig. 9 is a schematic diagram of a situation in which an integrated station exchanges electric power with a main grid.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained by taking specific embodiments as examples with reference to the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Examples
The embodiment of the invention provides a method for operating a light storage and charging integrated station oriented to grid power balance and new energy consumption, which specifically comprises the following steps, as shown in fig. 1 and fig. 2:
the method comprises the following steps of S1, establishing a combined operation optimization model of the optical storage and charging integrated station based on a comprehensive service scene of the optical storage and charging integrated station by taking the minimization of the operation cost of the optical storage and charging integrated station as a target, taking power balance constraint, main power grid interaction power constraint and equipment operation constraint as constraint conditions and taking the real-time output of an energy storage unit as a decision variable.
The combined operation optimization model comprises the following.
The objective of the economic dispatch problem of the light storage and charging integrated station is to minimize the station operating cost, which includes the cost of purchasing electricity from the grid and the charge/discharge depreciation cost of the electrical energy storage. The objective function is shown in the following equation (1):
F = min(C_E + C_BES)    (1)
where F is the operation cost of the light storage and charging integrated station; C_E is the cost of purchasing electricity from the power grid,
C_E = Σ_t ε_e(t) P_grid(t) Δt
P_grid(t) is the power exchanged between the system and the main grid in time period t, a positive value indicating that the system purchases electricity from the main grid and a negative value indicating that surplus electricity is fed back to the grid; ε_e(t) is the electricity price in time period t; Δt is the length of the time interval; C_BES is the charge/discharge depreciation cost of the electrical energy storage,
C_BES = ρ_BES Σ_t |P_BES(t)| Δt
P_BES(t) is the charging or discharging power of the electrical energy storage in time period t, a positive value indicating that the storage is discharging and a negative value indicating that it is charging; ρ_BES is the depreciation cost coefficient of the electrical energy storage.
To ensure that the output of the energy storage system in the light storage and charging integrated station can realize real-time self-optimizing operation as far as possible while the equipment operating conditions are satisfied, the following constraints need to be considered: the power balance constraint, the main-grid interaction power constraint and the equipment operation constraints. The specific constraints are as follows.
Power balance constraint:
at time t, the electric power balance constraint is shown in the following equation (2):
P_grid(t) + P_pv(t) + P_BES(t) = P_load(t)    (2)
where P_pv(t) is the photovoltaic generation power and P_load(t) is the user electrical load demand in time period t.
Considering the operation stability of the grid side, the main grid imposes upper and lower limits on the power interaction with the light storage and charging integrated station; the main-grid interaction power constraint is shown in the following equation (3):
P_grid^min ≤ P_grid(t) ≤ P_grid^max    (3)
where P_grid^min and P_grid^max are the lower and upper limits of the power exchanged between the system and the main grid.
Each device in the light storage and charging integrated station has upper and lower operating limits; the equipment operation constraint is shown in the following equation (4):
P_BES^min ≤ P_BES(t) ≤ P_BES^max    (4)
where P_BES^min and P_BES^max are the lower and upper limits of the electrical energy storage charging/discharging power.
For the electrical energy storage device, damage from deep charging and discharging must also be avoided, so the state of charge (SOC) of the electrical energy storage is limited within a certain range; in addition, to ensure continuous and stable operation of the electrical energy storage, its stored energy is required to be equal at the beginning and the end of a scheduling period. These constraints are shown in the following equations (5) and (6):
C_SOC^min ≤ C_SOC(t) ≤ C_SOC^max    (5)
C_SOC(t) = C_SOC(t-1) - η_BES P_BES(t) Δt / Q_BES,  with C_SOC(0) = C_SOC(T) = C_SOC^0    (6)
where C_SOC^min and C_SOC^max are the lower and upper limits of the electrical energy storage state of charge; C_SOC(t) is the state of charge of the electrical energy storage in time period t; Q_BES is the capacity of the electrical energy storage; C_SOC^0 is the initial state of charge of the electrical energy storage; η_BES is the charge/discharge coefficient of the electrical energy storage, with η_BES = 1/η_dis when the storage discharges (P_BES(t) ≥ 0) and η_BES = η_ch when it charges (P_BES(t) < 0); η_ch and η_dis are the charging efficiency and discharging efficiency of the electrical energy storage, respectively.
By modeling the optimal operation of the light storage and charging integrated station, the coordinated scheduling relationship among all devices in the integrated station can be determined, and the system power constraint and the device operation constraint are met.
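For illustration, a single-interval transition consistent with equations (2)-(6) can be sketched as follows; the parameter names and default values are assumptions, not values from the patent.

```python
# Sketch of one scheduling-interval transition: given the current SOC and the dispatch
# action P_BES(t), compute the grid exchange and the next SOC (assumed parameters).
def step(soc, p_load_t, p_pv_t, p_bes_t, dt=0.25, q_bes=200.0,
         eta_ch=0.95, eta_dis=0.95, p_bes_lim=(-50.0, 50.0)):
    # clip the action to the equipment operation constraint (4)
    p_bes_t = max(p_bes_lim[0], min(p_bes_lim[1], p_bes_t))
    # power balance (2): what PV and storage do not cover is exchanged with the main grid
    p_grid_t = p_load_t - p_pv_t - p_bes_t
    # SOC recursion (6): discharging drains 1/η_dis per unit delivered, charging stores η_ch
    delta = (-p_bes_t / eta_dis if p_bes_t >= 0 else -p_bes_t * eta_ch) * dt / q_bes
    return p_grid_t, soc + delta
```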
And S2, the dynamic scheduling problem in the combined operation optimization model is converted into a reinforcement learning network for a policy decision problem by using the Markov decision process method.
The combined operation optimization model can be written compactly as the following formula (7):
min_{P_BES(t)} Σ_t [ ε_e(t) P_grid(t) + ρ_BES |P_BES(t)| ] Δt,  subject to constraints (2)-(6)    (7)
The observed state of the light storage and charging integrated station comprises the user electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period. For the integrated station, the state is represented as s_t = {P_load(t), P_pv(t), C_soc(t-1), t}, where P_load(t), P_pv(t), C_soc(t-1) and t are the user electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period, respectively.
In time period t, the action in the light storage and charging integrated station can be represented by the output of the in-station equipment, i.e., by P_BES(t):
a_t = {P_BES(t)}    (8)
The goal of optimal operation of the light storage and charging integrated station is to minimize the operating cost of the station. The operation cost minimization problem is converted into the reward maximization form of an agent, as shown in the following formula (9):
r_t(s_t, a_t) = -λ (C_E(s_t, a_t) + C_BES(s_t, a_t))    (9)
where r_t(s_t, a_t) is the total reward obtained by the agent in scheduling time period t; s_t is the observed state of the light storage and charging integrated station in scheduling time period t; a_t is the dynamic economic dispatch action of the energy storage system of the station; λ is the scaling factor applied to the cost; C_E(s_t, a_t) is the cost of purchasing electricity from the power grid in time period t with the agent in state s_t; C_BES(s_t, a_t) is the charge/discharge depreciation cost of the electrical energy storage in time period t with the agent in state s_t.
Once a state s_t in the operation of the light storage and charging integrated station is determined, the action-value function Q^π(s, a) of the following formula (10) is used to evaluate the dynamic economic dispatch action a_t taken by the energy storage system in state s_t; the larger Q^π(s, a), the better a_t:
Q^π(s_t, a_t) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k}(s_{t+k}, a_{t+k}) ]    (10)
where E_π(·) is the expectation under the target policy π; γ ∈ [0, 1] is the discount factor, representing the influence of future rewards on the cumulative reward; the larger γ, the more important future rewards are; r_{t+k} is the total reward obtained by the agent in time period t+k; s_{t+k} is the state of the light storage and charging integrated station in time period t+k; a_{t+k} is the action executed by the station in time period t+k; k ∈ N* represents the generation of the agent's cyclic learning.
The goal of the combined operation optimization model of the light storage and charging integrated station is to find an optimal policy π that maximizes the action-value function; the optimal target policy is obtained according to the following formula (11):
π* = arg max_{a ∈ A} Q^π(s, a)    (11)
where A is the action set of the agent.
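A small sketch of the discounted return whose expectation defines Q^π in formula (10); the discount factor and the sample rewards are illustrative values only.

```python
# Sketch of the discounted return behind the action-value function of formula (10).
def discounted_return(rewards, gamma=0.99):
    weight, total = 1.0, 0.0
    for r in rewards:            # r_t, r_{t+1}, ... along one scheduling trajectory
        total += weight * r
        weight *= gamma          # future rewards are weighted by γ^k
    return total

print(discounted_return([-1.2, -0.8, -0.5, -1.0]))   # per-interval rewards (negative scaled costs)
```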
And S3, the reinforcement learning network is trained offline with historical system state data based on the deep deterministic policy gradient (DDPG) algorithm, to obtain the trained reinforcement learning network, as shown in the schematic diagram of the reinforcement learning process in fig. 3 and the schematic flowchart of reinforcement learning in fig. 4.
The historical system state data are observed states of the light storage and charging integrated station, comprising the system electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period of the station.
the basic components of the training include a set of states S characterizing the environment, a set of actions A characterizing the actions of the agent, and a reward r for the agent, the environment providing the agent with an observed state S during a time period t t The intelligent agent belongs to S and is based on strategy pi and integrated station state S t Generating an operating State a t
Because reinforcement learning data have the Markov property and therefore do not satisfy the assumption of independent and identically distributed samples required for training a neural network, DDPG stores the data explored from the environment in a replay pool R when generating sample data; to guarantee the learning effect, at every update the value network and the policy network randomly draw a portion of samples from the pool for optimization, which reduces instability.
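A minimal sketch of such a replay pool R with uniform random sampling; the capacity and batch size are assumed values.

```python
# Minimal experience replay pool: store transitions and sample them uniformly at random.
import random
from collections import deque

class ReplayPool:
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.data.append((s, a, r, s_next, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.data, batch_size)   # breaks temporal correlation
        return tuple(map(list, zip(*batch)))           # states, actions, rewards, next_states, dones
```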
The reinforcement learning network comprises a value network and a policy network; from the policy network π(s|θ^π) and the value network Q(s, a|θ^Q), two independent target networks π′(s|θ^π′) and Q′(s, a|θ^Q′) are created, initialized as shown in the following formulas (12) and (13):
θ^π′ ← θ^π    (12)
θ^Q′ ← θ^Q    (13)
A cyclic optimization period T is set and the historical system state data are input to the policy network, the value network and their target networks. After a mini-batch of data is trained, the DDPG algorithm updates the parameters of the current (online) policy network and value network through gradient ascent or gradient descent, and then updates the parameters of the target networks of the policy network and the value network through a soft update method. After T cycles, offline learning of the DDPG algorithm is completed and the trained DDPG network is obtained.
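Assuming PyTorch, the policy network π(s|θ^π), the value network Q(s, a|θ^Q) and their independent target copies of formulas (12)-(13) could be sketched as follows; the layer sizes and the four-dimensional state {P_load, P_pv, C_soc, t} are illustrative assumptions.

```python
# Sketch of the actor (policy) and critic (value) networks and their target copies.
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):                       # π(s|θ^π): state -> normalized P_BES(t) in [-1, 1]
    def __init__(self, s_dim=4, a_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, a_dim), nn.Tanh())

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):                      # Q(s,a|θ^Q): (state, action) -> scalar value
    def __init__(self, s_dim=4, a_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_target = copy.deepcopy(actor)           # π′(s|θ^π′), initialized with θ^π′ ← θ^π, formula (12)
critic_target = copy.deepcopy(critic)         # Q′(s,a|θ^Q′), initialized with θ^Q′ ← θ^Q, formula (13)
```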
The specific training content is as follows:
1) Value network training
In the value network, the parameters are optimized by minimizing the loss function L(θ^Q), as shown in the following formula (14):
L(θ^Q) = E[(y_t - Q(s_t, a_t|θ^Q))^2]    (14)
where θ^Q are the parameters of the current network in the value network; y_t is the target Q value; E(·) is the expectation function.
y_t = r_t + γ Q′(s_{t+1}, π′(s_{t+1}|θ^π′)|θ^Q′)    (15)
where r_t is the total reward obtained by the agent in time period t; γ ∈ [0, 1] is the discount factor; Q′ is the target value network (the Q value before updating); π′ is the target policy; θ^π′ are the parameters of the target network in the policy network; θ^Q′ are the parameters of the target network in the value network.
In time period t, the light storage and charging integrated station executes the action a_t and then enters the next state s_{t+1}, i.e., the updated state-of-charge value of the electrical energy storage and the electrical load and photovoltaic generation observed in the next period.
The gradient of L(θ^Q) with respect to θ^Q is given (up to a constant factor) by the following formula (16):
∇_{θ^Q} L(θ^Q) = -E[(y_t - Q(s_t, a_t|θ^Q)) ∇_{θ^Q} Q(s_t, a_t|θ^Q)]    (16)
where y_t - Q(s_t, a_t|θ^Q) is the temporal-difference error. The network is updated according to the gradient rule, giving the following update formula:
θ^Q ← θ^Q + μ_Q (y_t - Q(s_t, a_t|θ^Q)) ∇_{θ^Q} Q(s_t, a_t|θ^Q)    (17)
where μ_Q is the value network learning rate.
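A sketch of the value-network update of formulas (14)-(17), again assuming PyTorch; the networks are any modules with the signatures sketched above, s, a, r, s_next stand for a mini-batch drawn from the replay pool, and the optimizer and learning rate are assumed choices.

```python
# Sketch of the critic (value network) update: TD target, squared-error loss, gradient step.
import torch

def update_critic(critic, critic_target, actor_target, critic_opt, s, a, r, s_next, gamma=0.99):
    with torch.no_grad():
        a_next = actor_target(s_next)                          # π′(s_{t+1}|θ^π′)
        y = r + gamma * critic_target(s_next, a_next)          # target y_t, formula (15)
    q = critic(s, a)
    loss = ((y - q) ** 2).mean()                               # L(θ^Q), formula (14)
    critic_opt.zero_grad()
    loss.backward()                                            # gradient of formula (16)
    critic_opt.step()                                          # θ^Q update, formula (17)
    return loss.item()

# e.g. critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # learning rate μ_Q (assumed)
```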
2) Policy network training
The value network provides the gradient information ∇_a Q(s, a|θ^Q) as the direction of action improvement. To update the policy network, the sampled policy gradient is used, as shown in the following formula (18):
∇_{θ^π} J ≈ E[ ∇_a Q(s, a|θ^Q)|_{s=s_t, a=π(s_t)} · ∇_{θ^π} π(s|θ^π)|_{s=s_t} ]    (18)
The policy network parameters θ^π are updated according to the deterministic policy gradient:
θ^π ← θ^π + μ_π ∇_{θ^π} J    (19)
where μ_π is the policy network learning rate.
Further, the target networks are softly updated:
θ^Q′ ← τ θ^Q + (1 - τ) θ^Q′    (20)
θ^π′ ← τ θ^π + (1 - τ) θ^π′    (21)
where τ is the soft update coefficient, τ < 1.
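A sketch of the policy-network update of formulas (18)-(19) and the soft target updates of formulas (20)-(21), under the same assumed PyTorch setup; the value of τ and the learning rate are placeholders.

```python
# Sketch of the actor (policy network) update and the soft update of both target networks.
import torch

def update_actor_and_targets(actor, critic, actor_target, critic_target, actor_opt, s, tau=0.005):
    # sampled policy gradient (18): ascend Q(s, π(s)) by descending its negative
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()                                   # θ^π update, formula (19)
    # soft updates θ^Q′ ← τθ^Q + (1-τ)θ^Q′ and θ^π′ ← τθ^π + (1-τ)θ^π′, formulas (20)-(21)
    with torch.no_grad():
        for net, target in ((critic, critic_target), (actor, actor_target)):
            for p, p_targ in zip(net.parameters(), target.parameters()):
                p_targ.mul_(1 - tau).add_(tau * p)

# e.g. actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)   # learning rate μ_π (assumed)
```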
And S4, according to the trained reinforcement learning network and combined with a time-of-use electricity price mechanism, a minimum-scheduling-cost economic optimization calculation is performed on the combined operation optimization model of the light storage and charging integrated station, obtaining the real-time self-optimizing operation output result of the energy storage system in the station and thereby achieving optimal operation of the light storage and charging integrated station.
When a scheduling task is received, in each time period the scheduling action a_t is selected by the trained reinforcement learning network according to the current system state s_t.
The action a_t is executed, the system enters the next environment state, and the reward r_t is obtained.
Then the in-station state information s_{t+1} of time period t+1 and the time-of-use electricity price information are collected as a new sample, and the dynamic scheduling decision for that time period is made, i.e., the real-time self-optimizing operation output result of the energy storage system in the light storage and charging integrated station.
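The online dispatch of step S4 can be sketched as the following loop; env and trained_actor are assumed placeholder objects (the real-time station state source and the trained policy network), not an interface defined by the patent.

```python
# Structure-only sketch of real-time dispatch: the trained policy maps each observed state
# to the storage output P_BES(t) for that interval.
def dispatch_one_day(env, trained_actor, horizon=96):
    s = env.reset()                                  # current load, PV, SOC and time period
    outputs = []
    for _ in range(horizon):
        a = trained_actor.act(s, explore=False)      # a_t selected without exploration noise
        s, r, done = env.step(a)                     # execute P_BES(t); collect s_{t+1} and r_t
        outputs.append(a)                            # real-time self-optimizing output result
        if done:
            break
    return outputs
```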
Preferably, when the reinforcement learning network is trained offline, it is trained with the action a_t given by the following equation (22):
a_t = π(s_t|θ^π) + v_t    (22)
where v_t is random noise. Adding the random noise v_t to the action a_t = {P_BES(t)} increases the DDPG algorithm's ability to explore the environment while interacting with the light storage and charging integrated station, so that a better dynamic scheduling strategy is learned.
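A minimal sketch of the exploration rule of equation (22), assuming Gaussian noise; the noise scale and the power limits are illustrative assumptions.

```python
# Exploration during offline training: add random noise v_t to the deterministic policy output.
import numpy as np

def explore_action(pi_s_t, sigma=0.1, p_bes_lim=(-50.0, 50.0)):
    a_t = pi_s_t + np.random.normal(0.0, sigma)      # a_t = π(s_t|θ^π) + v_t
    return float(np.clip(a_t, *p_bes_lim))           # keep the action within equipment limits
```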
The following is a specific example of the operation method of the light storage and charging integrated station for power grid power balance and new energy consumption according to the embodiment:
fig. 5 shows a statistical chart of partial historical detection sample data of 2 month and 1 day 2021, which specifically includes data of electrical load and photovoltaic power generation power required by a user within one day. The electricity rate adopts a time-of-use electricity rate mechanism, as shown in fig. 6, wherein the peak time period is 12:00-19:00, plateau period 07:00-12: 00. 19:00-23:00, valley period 23:00-07:00. based on existing photovoltaic output and electric load historical data and by combining time-of-use electricity price information, the scheduling cycle length of the light storage and charging integrated station is set to be 24 hours, the interval between two adjacent time periods is set to be 15min, 5000episodes training is carried out on the intelligent agent by means of a DDPG algorithm, and then convergence is carried out, so that the optimal operation strategy of the electric energy storage system in the power station is obtained. Fig. 5 shows an average reward curve in the training process of the agent, and the algorithm converges after 5000 epsilon classes of training, so as to obtain an optimal dynamic economic dispatching strategy. It can be observed that the value of the reward obtained after the agent performs the scheduling decision is small, since the agent is initially unfamiliar with the environment. As the training process continues, the agent is constantly interacting with the environment and gaining experience, so the overall trend of the reward value is gradually increasing and eventually converging. This indicates that the agent has learned the optimal scheduling strategy that minimizes the system operating cost. As can be seen from fig. 7, the stored energy is charged and discharged under the guidance of electricity prices, and is charged at the valley electricity prices and when the electric load is small for the subsequent peak time, such as 00: 00-00: 30. 03: 45-04: 00 equal time period; discharge at peak power rates and at higher electrical loads to reduce operating costs, such as 12: 00-12: 15. 17: 30-18: 45, etc. periods of time. As can be seen from fig. 8, in the valley electricity price and flat electricity price stage, the light storage and charging integrated station purchases electricity from the main power grid to meet the electricity demand. When the electricity price is the peak electricity price, the photovoltaic power generation system and the electric energy storage equipment in the light storage and charging integrated station generate electric energy to avoid purchasing electricity from a main power grid, so that the operation cost of the power station is reduced.
In summary, the embodiment of the present application focuses on the comprehensive service scenario of an electric vehicle charging station that integrates photovoltaic and energy storage systems, and coordinates the output of the energy storage system in the station in combination with a time-of-use electricity price mechanism, thereby achieving economically optimal and real-time optimized operation of the light storage and charging integrated station. By estimating the optimal policy function with the DDPG algorithm, the curse of dimensionality is effectively avoided, the information of the whole action domain is preserved, and local consumption of new energy resources is realized.
Those skilled in the art should understand that the above-mentioned application types of the input box are only examples, and other existing or future application types of the input box, such as those applicable to the embodiments of the present invention, should be included in the scope of the present invention and are also included herein by reference.
Those skilled in the art will appreciate that the various network elements shown in fig. 2 for simplicity only may be fewer than those in an actual network, but such omissions are clearly not to be considered as a prerequisite to a clear and complete disclosure of the inventive embodiments.
It should be understood by those skilled in the art that the foregoing description of determining the invoking policy according to the user information is only for better illustrating the technical solutions of the embodiments of the present invention, and is not intended to limit the embodiments of the present invention. Any method of determining the invoking policy based on the user attributes is included in the scope of embodiments of the present invention.
Those of ordinary skill in the art will understand that: the drawings are merely schematic representations of one embodiment, and the flow charts in the drawings are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A method for operating a light storage and charging integrated station oriented to grid power balance and new energy consumption, characterized by comprising the following steps:
S1, based on the comprehensive service scenario of the light storage and charging integrated station, establishing a combined operation optimization model of the station with the minimum operation cost of the station as the objective, the power balance constraint, the main-grid interaction power constraint and the equipment operation constraints as constraint conditions, and the real-time output of the energy storage unit as the decision variable;
S2, converting the dynamic scheduling problem in the combined operation optimization model into a reinforcement learning network for a policy decision problem by using the Markov decision process method;
S3, training the reinforcement learning network offline with historical system state data based on the deep deterministic policy gradient (DDPG) algorithm, to obtain a trained reinforcement learning network;
and S4, according to the trained reinforcement learning network and combined with a time-of-use electricity price mechanism, performing a minimum-scheduling-cost economic optimization calculation on the combined operation optimization model of the station, obtaining the real-time self-optimizing operation output result of the energy storage system in the station, and thereby achieving optimal operation of the light storage and charging integrated station.
2. The method of claim 1, wherein the combined operation optimization model comprises:
the objective function shown in the following equation (1):
F = min(C_E + C_BES)    (1)
where F is the operation cost of the light storage and charging integrated station; C_E is the cost of purchasing electricity from the power grid,
C_E = Σ_t ε_e(t) P_grid(t) Δt
P_grid(t) is the power exchanged between the system and the main grid in time period t, a positive value indicating that the system purchases electricity from the main grid and a negative value indicating that surplus electricity is fed back to the grid; ε_e(t) is the electricity price in time period t; Δt is the length of the time interval; C_BES is the charge/discharge depreciation cost of the electrical energy storage,
C_BES = ρ_BES Σ_t |P_BES(t)| Δt
P_BES(t) is the charging or discharging power of the electrical energy storage in time period t, a positive value indicating that the storage is discharging and a negative value indicating that it is charging; ρ_BES is the depreciation cost coefficient of the electrical energy storage;
the constraints are as follows:
1) power balance constraint:
at time t, the electric power balance constraint is shown in the following equation (2):
P_grid(t) + P_pv(t) + P_BES(t) = P_load(t)    (2)
where P_pv(t) is the photovoltaic generation power and P_load(t) is the user electrical load demand in time period t;
2) main grid interaction power constraint, shown in the following equation (3):
P_grid^min ≤ P_grid(t) ≤ P_grid^max    (3)
where P_grid^min and P_grid^max are the lower and upper limits of the power exchanged between the system and the main grid;
3) equipment operation constraint, shown in the following equation (4):
P_BES^min ≤ P_BES(t) ≤ P_BES^max    (4)
where P_BES^min and P_BES^max are the lower and upper limits of the electrical energy storage charging/discharging power;
for the electrical energy storage device, the following constraints also apply:
C_SOC^min ≤ C_SOC(t) ≤ C_SOC^max
C_SOC(t) = C_SOC(t-1) - η_BES P_BES(t) Δt / Q_BES,  with C_SOC(0) = C_SOC(T) = C_SOC^0
where C_SOC^min and C_SOC^max are the lower and upper limits of the electrical energy storage state of charge; C_SOC(t) is the state of charge of the electrical energy storage in time period t; Q_BES is the capacity of the electrical energy storage; C_SOC^0 is the initial state of charge of the electrical energy storage; η_BES is the charge/discharge coefficient of the electrical energy storage, with η_BES = 1/η_dis when the storage discharges (P_BES(t) ≥ 0) and η_BES = η_ch when it charges (P_BES(t) < 0); η_ch and η_dis are the charging efficiency and discharging efficiency of the electrical energy storage, respectively.
3. The method according to claim 1, wherein step S2 comprises:
converting the operation cost minimization problem of the light storage and charging integrated station into the reward maximization form of an agent, as shown in the following formula (5):
r_t(s_t, a_t) = -λ (C_E(s_t, a_t) + C_BES(s_t, a_t))    (5)
where r_t(s_t, a_t) is the total reward obtained by the agent in scheduling time period t; s_t is the observed state of the light storage and charging integrated station in scheduling time period t, s_t = {P_load(t), P_pv(t), C_soc(t-1), t}, in which P_load(t), P_pv(t), C_soc(t-1) and t are the user electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period, respectively; a_t is the dynamic economic dispatch action of the energy storage system of the station, which can be represented by the output of the in-station equipment in time period t, a_t = {P_BES(t)}; λ is the scaling factor applied to the cost; C_E(s_t, a_t) is the cost of purchasing electricity from the power grid in time period t with the agent in state s_t; C_BES(s_t, a_t) is the charge/discharge depreciation cost of the electrical energy storage in time period t with the agent in state s_t;
the action-value function Q^π(s, a) of the following formula (6) is used to evaluate the dynamic economic dispatch action a_t taken by the energy storage system of the station in state s_t; the larger Q^π(s, a), the better a_t:
Q^π(s_t, a_t) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k}(s_{t+k}, a_{t+k}) ]    (6)
where E_π(·) is the expectation under the target policy π; γ ∈ [0, 1] is the discount factor, representing the influence of future rewards on the cumulative reward; the larger γ, the more important future rewards are; r_{t+k} is the total reward obtained by the agent in time period t+k; s_{t+k} is the state of the light storage and charging integrated station in time period t+k; a_{t+k} is the action executed by the station in time period t+k; k ∈ N* represents the generation of the agent's cyclic learning;
the optimal target policy π* is obtained so as to maximize the action-value function, according to the following formula (7):
π* = arg max_{a ∈ A} Q^π(s, a)    (7)
where A is the action set of the agent.
4. The method according to claim 1, wherein step S3 comprises:
the historical system state data are observed states of the light storage and charging integrated station, including the system electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period of the station;
the reinforcement learning network comprises a value network and a policy network; from the policy network π(s|θ^π) and the value network Q(s, a|θ^Q), two independent target networks π′(s|θ^π′) and Q′(s, a|θ^Q′) are created, initialized as shown in the following formulas (8) and (9):
θ^π′ ← θ^π    (8)
θ^Q′ ← θ^Q    (9)
a cyclic optimization period T is set and the historical system state data are input to the policy network, the value network and their target networks; after a batch of data is trained, the DDPG algorithm updates the parameters of the current policy network and value network through gradient ascent or gradient descent, and then updates the parameters of their target networks through a soft update method; after T cycles, offline learning of the DDPG algorithm is completed and the trained reinforcement learning network is obtained.
5. The method of claim 4, wherein step S4 comprises:
when a scheduling task is received, in each time period, selecting the scheduling action a_t with the trained reinforcement learning network according to the current system state s_t;
executing the action a_t, entering the next environment state, and obtaining the reward r_t;
then collecting the in-station state information s_{t+1} of time period t+1 together with the time-of-use electricity price information as a new sample, and making the dynamic scheduling decision for that time period, i.e., the real-time self-optimizing operation output result of the energy storage system in the light storage and charging integrated station.
6. The method of claim 1, wherein, when the reinforcement learning network is trained offline, it is trained with the action a_t given by the following equation (10):
a_t = π(s_t|θ^π) + v_t    (10)
where v_t is random noise.
CN202211625303.3A 2022-12-16 2022-12-16 Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid Pending CN115940289A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211625303.3A CN115940289A (en) 2022-12-16 2022-12-16 Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211625303.3A CN115940289A (en) 2022-12-16 2022-12-16 Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid

Publications (1)

Publication Number Publication Date
CN115940289A true CN115940289A (en) 2023-04-07

Family

ID=86654046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211625303.3A Pending CN115940289A (en) 2022-12-16 2022-12-16 Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid

Country Status (1)

Country Link
CN (1) CN115940289A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116451875A (en) * 2023-06-14 2023-07-18 国网吉林省电力有限公司经济技术研究院 Optical storage and filling integrated station capacity optimization configuration method
CN117254505A (en) * 2023-09-22 2023-12-19 南方电网调峰调频(广东)储能科技有限公司 Energy storage power station optimal operation mode decision method and system based on data processing
CN117254505B (en) * 2023-09-22 2024-03-26 南方电网调峰调频(广东)储能科技有限公司 Energy storage power station optimal operation mode decision method and system based on data processing
CN117879016A (en) * 2024-03-11 2024-04-12 国网江西省电力有限公司经济技术研究院 Charging station configuration optimization method and system based on optical charge integration


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination