CN115940289A - Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid - Google Patents


Info

Publication number
CN115940289A
Authority
CN
China
Prior art keywords
power
charging integrated
energy storage
station
charging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211625303.3A
Other languages
Chinese (zh)
Inventor
刘曌
段玉戈
许寅
孙庆凯
王希豪
王小君
和敬涵
王颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN202211625303.3A
Publication of CN115940289A
Legal status: Pending

Landscapes

  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

The invention provides a method for operating a light storage and charging integrated station oriented to power balance and new energy consumption of the power grid. Based on the comprehensive service scenario of the light storage and charging integrated station, a combined operation optimization model of the station is established with the minimum operation cost of the station as the objective, the power balance constraint, the main-grid interaction power constraint and the equipment operation constraints as constraint conditions, and the real-time output of the energy storage unit as the decision variable. A Markov decision process method is used to convert the dynamic scheduling problem into a reinforcement learning network for a policy decision problem. The reinforcement learning network is trained offline on historical system state data with the DDPG algorithm and, combined with a time-of-use electricity price mechanism, a minimum-scheduling-cost economic optimization is computed for the combined operation optimization model, yielding the real-time self-optimizing operation output of the energy storage system in the station. Dynamic optimal operation of the light storage and charging integrated station is thereby achieved, and the curse of dimensionality in the computation is effectively avoided.

Description

Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid
Technical Field
The invention relates to the technical field of operation scheduling of electric vehicle charging stations, and in particular to a method for operating a light storage and charging integrated station oriented to grid power balance and new energy consumption.
Background
Against the background of transport electrification, the planning and design of charging facilities and distribution networks must be considered to adapt to the large-scale development of electric vehicles. Meanwhile, energy storage and photovoltaics have received much attention thanks to improved technical performance and falling cost. Photovoltaic generation is low-cost and clean, but it is strongly affected by the external environment and its output is volatile. Configuring an energy storage system can further strengthen the on-site compensation of the electric vehicle charging load: by managing the charge and discharge of the energy storage battery, energy can be shifted in time and space, relieving the supply pressure on the grid during peak periods. Therefore, a diversified application form that organically combines photovoltaics, energy storage and electric vehicles is an important way to connect electric vehicle charging stations with renewable energy, and it can effectively reduce the impact of electric vehicle charging behavior on the power grid.
In existing work, research on the optimal scheduling of integrated stations that combine photovoltaics and energy storage is still relatively scarce; the coordination and complementary optimization of the resources inside the station is mostly addressed with control algorithms based on traditional optimization modeling. Representative examples include an adaptive robust energy-reserve co-optimization scheduling method for a photovoltaic-storage charging tower that minimizes the total daily operating cost, and a four-stage intelligent optimization control algorithm that integrates an electric vehicle bidirectional charging station, photovoltaic generation and stationary battery storage with a commercial building. These algorithms minimize the operating cost related to customer satisfaction while accounting for potential uncertainty, and balance real-time supply and demand among source, storage and load by adjusting the plan. However, such optimization models are mostly built with the maximum profit of the charging station as the objective; for a light storage and charging integrated station, the local consumption of new energy must be considered further alongside economic operation inside the station. Moreover, the optimal operation scheduling methods at the present stage mostly focus on day-ahead scheduling, so they are limited to a fixed scheduling plan and cannot respond dynamically to random changes of sources and loads. Meanwhile, the existing optimized operation models are based on traditional mathematical optimization modeling and therefore still depend on accurate forecasts of renewable generation and load. With the spread of electricity consumption information acquisition systems, the application of data-driven machine learning to power system operation optimization has attracted wide attention from scholars at home and abroad. To optimize the output plan of the resources inside the station and reduce its operating cost, existing reinforcement learning work describes the in-station resource scheduling problem as a constrained Markov decision process and proposes a model-free method based on deep reinforcement learning, which learns directly with a deep neural network to generate a constrained optimal in-station resource output plan. However, although conventional reinforcement learning handles small-scale discrete spaces well, when it deals with tasks whose state variables are continuous, the number of states obtained by discretization grows exponentially with the dimension of the space, i.e., the curse of dimensionality arises and effective learning becomes impossible. For the joint economic operation problem of the light storage and charging integrated station, the load, the photovoltaic generation and the state of charge in the state space are all continuous variables, so conventional reinforcement learning cannot solve it effectively. Likewise, the action of the in-station energy storage system is a continuous variable, and discretizing the action space blurs away a great deal of decision information in the action domain.
Therefore, for the light storage and charging integrated station, the local consumption of new energy still needs to be considered alongside economic operation inside the station. How to coordinate the output of the energy storage system in the station on the basis of the comprehensive service scenario of an electric vehicle charging station that integrates photovoltaics and energy storage, while effectively avoiding the curse of dimensionality and preserving the information of the whole action domain, so as to achieve optimal operation of the light storage and charging station oriented to grid power balance and new energy consumption, is a problem urgently to be solved in the prior art.
Disclosure of Invention
The invention provides a method for operating a light storage and charging integrated station for power balance and new energy consumption of a power grid, and aims to solve the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
The embodiment of the invention provides a method for operating a light storage and charging integrated station for power balance and new energy consumption of a power grid, which comprises the following steps:
s1, establishing a combined operation optimization model of the optical storage and charging integrated station by taking the minimum operation cost of the optical storage and charging integrated station as a target, taking power balance constraint, main power grid interaction power constraint and equipment operation constraint as constraint conditions and taking the real-time output of an energy storage unit as a decision variable based on a comprehensive service scene of the optical storage and charging integrated station;
s2, converting the dynamic scheduling problem in the combined operation optimization model into a reinforcement learning network of a strategy decision problem by using a Markov decision process method;
s3, training the reinforcement learning network in an off-line manner by adopting historical system state data based on a deep deterministic strategy gradient (DDPG) algorithm to obtain a trained reinforcement learning network;
and S4, according to the trained reinforcement learning network, combining a time-of-use electricity price mechanism to carry out scheduling cost minimum economic optimization calculation on the combined operation optimization model of the light storage and charging integrated station to obtain a real-time self-optimization-approaching operation output result of the energy storage system in the light storage and charging integrated station, and further realize the optimal operation of the light storage and charging integrated station.
Preferably, the combined operation optimization model includes:
the objective function shown in the following equation (1):
F = min(C_E + C_BES)    (1)
where F is the operation cost of the light storage and charging integrated station; C_E is the cost of purchasing electricity from the power grid,
C_E = Σ_t ε_e(t) P_grid(t) Δt
P_grid(t) is the power exchanged between the system and the main grid in time period t, a positive value indicating that the system purchases electricity from the main grid and a negative value indicating that surplus electricity is fed back to the grid; ε_e(t) is the electricity price in time period t; Δt is the length of the time interval; C_BES is the charge/discharge depreciation cost of the electrical energy storage,
C_BES = ρ_BES Σ_t |P_BES(t)| Δt
P_BES(t) is the charging or discharging power of the electrical energy storage in time period t, a positive value indicating that the storage is discharging and a negative value indicating that it is charging; ρ_BES is the depreciation cost coefficient of the electrical energy storage;
the constraints are as follows:
1) Power balance constraint:
at time t, the electric power balance constraint is shown in the following equation (2):
P_grid(t) + P_pv(t) + P_BES(t) = P_load(t)    (2)
where P_pv(t) is the photovoltaic generation power and P_load(t) is the user electrical load demand in time period t;
2) Main grid interaction power constraint, shown in the following equation (3):
P_grid^min ≤ P_grid(t) ≤ P_grid^max    (3)
where P_grid^min and P_grid^max are the lower and upper limits of the power exchanged between the system and the main grid;
3) Equipment operation constraint, shown in the following equation (4):
P_BES^min ≤ P_BES(t) ≤ P_BES^max    (4)
where P_BES^min and P_BES^max are the lower and upper limits of the electrical energy storage charging/discharging power;
for the electrical energy storage device, the following constraints also apply:
C_SOC^min ≤ C_SOC(t) ≤ C_SOC^max
C_SOC(t) = C_SOC(t-1) - η_BES P_BES(t) Δt / Q_BES,  with C_SOC(0) = C_SOC(T) = C_SOC^0
where C_SOC^min and C_SOC^max are the lower and upper limits of the electrical energy storage state of charge; C_SOC(t) is the state of charge of the electrical energy storage in time period t; Q_BES is the capacity of the electrical energy storage; C_SOC^0 is the initial state of charge of the electrical energy storage; η_BES is the charge/discharge coefficient of the electrical energy storage, with η_BES = 1/η_dis when the storage discharges (P_BES(t) ≥ 0) and η_BES = η_ch when it charges (P_BES(t) < 0); η_ch and η_dis are the charging efficiency and discharging efficiency of the electrical energy storage, respectively.
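As a concrete illustration of the objective (1) and the constraints above, the following Python sketch evaluates the operation cost F = C_E + C_BES of a candidate storage schedule and checks feasibility; it is a minimal sketch, and all numeric values (prices, limits, capacity, efficiencies) are assumed placeholders rather than values from the patent.

```python
# Minimal sketch: evaluate F = C_E + C_BES of equation (1) and check the constraints for a
# candidate storage schedule. All numeric values are assumed placeholders.
import numpy as np

dt = 0.25                                   # interval length Δt in hours (15 min, assumed)
rho_bes = 0.05                              # storage depreciation coefficient ρ_BES (assumed)
eta_ch, eta_dis = 0.95, 0.95                # charging / discharging efficiencies (assumed)
q_bes, soc_0 = 200.0, 0.5                   # capacity Q_BES (kWh) and initial SOC (assumed)
soc_min, soc_max = 0.2, 0.9                 # SOC limits (assumed)
p_bes_min, p_bes_max = -50.0, 50.0          # storage power limits, equation (4) (kW, assumed)
p_grid_min, p_grid_max = -100.0, 150.0      # grid exchange limits, equation (3) (kW, assumed)

def evaluate_schedule(p_load, p_pv, p_bes, price):
    """Return (operating cost F, feasibility flag) for one scheduling day."""
    p_grid = p_load - p_pv - p_bes                       # power balance, equation (2)
    c_e = np.sum(price * p_grid * dt)                    # electricity purchase cost C_E
    c_bes = rho_bes * np.sum(np.abs(p_bes) * dt)         # storage depreciation cost C_BES
    # SOC trajectory: discharging (P_BES >= 0) drains 1/η_dis, charging stores η_ch
    delta = np.where(p_bes >= 0, -p_bes / eta_dis, -p_bes * eta_ch) * dt / q_bes
    soc = soc_0 + np.cumsum(delta)
    feasible = (np.all((p_grid >= p_grid_min) & (p_grid <= p_grid_max))
                and np.all((p_bes >= p_bes_min) & (p_bes <= p_bes_max))
                and np.all((soc >= soc_min) & (soc <= soc_max))
                and abs(soc[-1] - soc_0) < 1e-3)         # SOC restored at the end of the day
    return c_e + c_bes, feasible

# toy 4-interval usage example
print(evaluate_schedule(np.array([60.0, 80, 90, 70]), np.array([0.0, 30, 40, 10]),
                        np.array([-20.0, 0, 30, -10]), np.array([0.3, 0.7, 1.0, 0.7])))
```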
Preferably, step S2 comprises:
converting the operation cost minimization problem of the light storage and charging integrated station into the reward maximization form of an agent, as shown in the following formula (5):
r_t(s_t, a_t) = -λ (C_E(s_t, a_t) + C_BES(s_t, a_t))    (5)
where r_t(s_t, a_t) is the total reward obtained by the agent in scheduling time period t; s_t is the observed state of the light storage and charging integrated station in scheduling time period t, s_t = {P_load(t), P_pv(t), C_soc(t-1), t}, in which P_load(t), P_pv(t), C_soc(t-1) and t are the user electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period, respectively; a_t is the dynamic economic dispatch action of the energy storage system of the station, which can be represented by the output of the in-station equipment in time period t, a_t = {P_BES(t)}; λ is the scaling factor applied to the cost; C_E(s_t, a_t) is the cost of purchasing electricity from the power grid in time period t with the agent in state s_t; C_BES(s_t, a_t) is the charge/discharge depreciation cost of the electrical energy storage in time period t with the agent in state s_t;
the action-value function Q^π(s, a) of the following formula (6) is used to evaluate the dynamic economic dispatch action a_t taken by the energy storage system of the station in state s_t; the larger Q^π(s, a), the better a_t:
Q^π(s_t, a_t) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k}(s_{t+k}, a_{t+k}) ]    (6)
where E_π(·) is the expectation under the target policy π; γ ∈ [0, 1] is the discount factor, representing the influence of future rewards on the cumulative reward; the larger γ, the more important future rewards are; r_{t+k} is the total reward obtained by the agent in time period t+k; s_{t+k} is the state of the light storage and charging integrated station in time period t+k; a_{t+k} is the action executed by the station in time period t+k; k ∈ N* represents the generation of the agent's cyclic learning;
the optimal target policy π* is obtained so as to maximize the action-value function, according to the following formula (7):
π* = arg max_{a ∈ A} Q^π(s, a)    (7)
where A is the action set of the agent.
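A minimal sketch of the reward mapping of formula (5) and of the state and action tuples defined above; the symbol lam stands for the scaling factor λ, and the constant and function names are illustrative assumptions.

```python
# Minimal sketch of the reward of formula (5); lam (λ) is an assumed scaling constant.
def reward(c_e_t, c_bes_t, lam=1e-3):
    """r_t(s_t, a_t) = -λ (C_E(s_t, a_t) + C_BES(s_t, a_t)) for one scheduling interval."""
    return -lam * (c_e_t + c_bes_t)

def make_state(p_load_t, p_pv_t, soc_prev, t):
    """Observed state s_t = {P_load(t), P_pv(t), C_soc(t-1), t}; the action is a_t = {P_BES(t)}."""
    return (p_load_t, p_pv_t, soc_prev, t)
```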
Preferably, step S3 comprises:
the historical system state data are observed states of the light storage and charging integrated station, including the system electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period of the station;
the reinforcement learning network comprises a value network and a policy network; from the policy network π(s|θ^π) and the value network Q(s, a|θ^Q), two independent target networks π′(s|θ^π′) and Q′(s, a|θ^Q′) are created, initialized as shown in the following formulas (8) and (9):
θ^π′ ← θ^π    (8)
θ^Q′ ← θ^Q    (9)
a cyclic optimization period T is set and the historical system state data are input to the policy network, the value network and their target networks; after a batch of data is trained, the DDPG algorithm updates the parameters of the current policy network and value network through gradient ascent or gradient descent, and then updates the parameters of their target networks through a soft update method; after T cycles, offline learning of the DDPG algorithm is completed and the trained reinforcement learning network is obtained.
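The offline training cycle described above can be sketched as follows; env, agent and their methods are assumed placeholder objects standing for the historical-data environment and the DDPG learner, not an interface defined by the patent.

```python
# Structure-only sketch of the offline training cycle of step S3 (placeholder objects).
def train_offline(env, agent, episodes=5000, horizon=96):
    for _ in range(episodes):                 # cyclic optimization period T
        s = env.reset()                       # historical load / PV / SOC / time period
        for _ in range(horizon):              # e.g. 24 h at 15-min intervals
            a = agent.act(s, explore=True)    # a_t = π(s_t|θ^π) + v_t
            s_next, r, done = env.step(a)     # apply P_BES(t), observe the cost-based reward
            agent.buffer.add(s, a, r, s_next, done)
            agent.update()                    # mini-batch update of value and policy networks,
                                              # followed by soft update of the target networks
            s = s_next
            if done:
                break
    return agent
```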
Preferably, step S4 includes:
when a scheduling task is received, in each time period, selecting the scheduling action a_t with the trained reinforcement learning network according to the current system state s_t;
executing the action a_t, entering the next environment state, and obtaining the reward r_t;
then collecting the in-station state information s_{t+1} of time period t+1 together with the time-of-use electricity price information as a new sample, and making the dynamic scheduling decision for that time period, i.e., the real-time self-optimizing operation output result of the energy storage system in the light storage and charging integrated station.
Preferably, when the reinforcement learning network is trained offline, it is trained with the action a_t given by the following equation (10):
a_t = π(s_t|θ^π) + v_t    (10)
where v_t is random noise.
According to the technical scheme provided by the above operation method of the light storage and charging integrated station for grid power balance and new energy consumption, the mathematical model is converted into a reinforcement learning network that a reinforcement learning algorithm can solve, historical data are input and the network is trained with the DDPG algorithm, and real-time parameters within the optimization period are then fed to the trained network. This improves the economy of scheduling and operating the light storage and charging integrated station, realizes local consumption of new energy resources, and, by adopting the DDPG algorithm, effectively overcomes the curse of dimensionality and the suboptimal scheduling strategies introduced by discretization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of an operation method of a light storage and charging integrated station for power balance and new energy consumption of a power grid according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a framework of an operation method of a light storage and charging integrated station for power balance and new energy consumption of a power grid according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a reinforcement learning process according to the present embodiment;
FIG. 4 is a schematic flowchart of the reinforcement learning process according to the present embodiment;
FIG. 5 is a statistical representation of historical data according to an embodiment;
FIG. 6 is a time of use electricity price information diagram;
FIG. 7 is a trend graph of training results of DDPG algorithm;
FIG. 8 is a diagram of energy storage system scheduling results;
fig. 9 is a schematic diagram of a situation in which an integrated station exchanges electric power with a main grid.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained by taking specific embodiments as examples with reference to the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Examples
The embodiment of the invention provides a method for operating a light storage and charging integrated station oriented to grid power balance and new energy consumption, which specifically comprises the following steps, as shown in fig. 1 and fig. 2:
the method comprises the following steps of S1, establishing a combined operation optimization model of the optical storage and charging integrated station based on a comprehensive service scene of the optical storage and charging integrated station by taking the minimization of the operation cost of the optical storage and charging integrated station as a target, taking power balance constraint, main power grid interaction power constraint and equipment operation constraint as constraint conditions and taking the real-time output of an energy storage unit as a decision variable.
The combined operation optimization model comprises the following.
The objective of the economic dispatch problem of the light storage and charging integrated station is to minimize the station operating cost, which includes the cost of purchasing electricity from the grid and the charge/discharge depreciation cost of the electrical energy storage. The objective function is shown in the following equation (1):
F = min(C_E + C_BES)    (1)
where F is the operation cost of the light storage and charging integrated station; C_E is the cost of purchasing electricity from the power grid,
C_E = Σ_t ε_e(t) P_grid(t) Δt
P_grid(t) is the power exchanged between the system and the main grid in time period t, a positive value indicating that the system purchases electricity from the main grid and a negative value indicating that surplus electricity is fed back to the grid; ε_e(t) is the electricity price in time period t; Δt is the length of the time interval; C_BES is the charge/discharge depreciation cost of the electrical energy storage,
C_BES = ρ_BES Σ_t |P_BES(t)| Δt
P_BES(t) is the charging or discharging power of the electrical energy storage in time period t, a positive value indicating that the storage is discharging and a negative value indicating that it is charging; ρ_BES is the depreciation cost coefficient of the electrical energy storage.
To ensure that the output of the energy storage system in the light storage and charging integrated station can realize real-time self-optimizing operation as far as possible while the equipment operating conditions are satisfied, the following constraints need to be considered: the power balance constraint, the main-grid interaction power constraint and the equipment operation constraints. The specific constraints are as follows.
Power balance constraint:
at time t, the electric power balance constraint is shown in the following equation (2):
P_grid(t) + P_pv(t) + P_BES(t) = P_load(t)    (2)
where P_pv(t) is the photovoltaic generation power and P_load(t) is the user electrical load demand in time period t.
Considering the operation stability of the grid side, the main grid imposes upper and lower limits on the power interaction with the light storage and charging integrated station; the main-grid interaction power constraint is shown in the following equation (3):
P_grid^min ≤ P_grid(t) ≤ P_grid^max    (3)
where P_grid^min and P_grid^max are the lower and upper limits of the power exchanged between the system and the main grid.
Each device in the light storage and charging integrated station has upper and lower operating limits; the equipment operation constraint is shown in the following equation (4):
P_BES^min ≤ P_BES(t) ≤ P_BES^max    (4)
where P_BES^min and P_BES^max are the lower and upper limits of the electrical energy storage charging/discharging power.
For the electrical energy storage device, damage from deep charging and discharging must also be avoided, so the state of charge (SOC) of the electrical energy storage is limited within a certain range; in addition, to ensure continuous and stable operation of the electrical energy storage, its stored energy is required to be equal at the beginning and the end of a scheduling period. These constraints are shown in the following equations (5) and (6):
C_SOC^min ≤ C_SOC(t) ≤ C_SOC^max    (5)
C_SOC(t) = C_SOC(t-1) - η_BES P_BES(t) Δt / Q_BES,  with C_SOC(0) = C_SOC(T) = C_SOC^0    (6)
where C_SOC^min and C_SOC^max are the lower and upper limits of the electrical energy storage state of charge; C_SOC(t) is the state of charge of the electrical energy storage in time period t; Q_BES is the capacity of the electrical energy storage; C_SOC^0 is the initial state of charge of the electrical energy storage; η_BES is the charge/discharge coefficient of the electrical energy storage, with η_BES = 1/η_dis when the storage discharges (P_BES(t) ≥ 0) and η_BES = η_ch when it charges (P_BES(t) < 0); η_ch and η_dis are the charging efficiency and discharging efficiency of the electrical energy storage, respectively.
By modeling the optimal operation of the light storage and charging integrated station, the coordinated scheduling relationship among all devices in the integrated station can be determined, and the system power constraint and the device operation constraint are met.
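For illustration, a single-interval transition consistent with equations (2)-(6) can be sketched as follows; the parameter names and default values are assumptions, not values from the patent.

```python
# Sketch of one scheduling-interval transition: given the current SOC and the dispatch
# action P_BES(t), compute the grid exchange and the next SOC (assumed parameters).
def step(soc, p_load_t, p_pv_t, p_bes_t, dt=0.25, q_bes=200.0,
         eta_ch=0.95, eta_dis=0.95, p_bes_lim=(-50.0, 50.0)):
    # clip the action to the equipment operation constraint (4)
    p_bes_t = max(p_bes_lim[0], min(p_bes_lim[1], p_bes_t))
    # power balance (2): what PV and storage do not cover is exchanged with the main grid
    p_grid_t = p_load_t - p_pv_t - p_bes_t
    # SOC recursion (6): discharging drains 1/η_dis per unit delivered, charging stores η_ch
    delta = (-p_bes_t / eta_dis if p_bes_t >= 0 else -p_bes_t * eta_ch) * dt / q_bes
    return p_grid_t, soc + delta
```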
And S2, the dynamic scheduling problem in the combined operation optimization model is converted into a reinforcement learning network for a policy decision problem by using the Markov decision process method.
The combined operation optimization model can be written compactly as the following formula (7):
min_{P_BES(t)} Σ_t [ ε_e(t) P_grid(t) + ρ_BES |P_BES(t)| ] Δt,  subject to constraints (2)-(6)    (7)
The observed state of the light storage and charging integrated station comprises the user electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period. For the integrated station, the state is represented as s_t = {P_load(t), P_pv(t), C_soc(t-1), t}, where P_load(t), P_pv(t), C_soc(t-1) and t are the user electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period, respectively.
In time period t, the action in the light storage and charging integrated station can be represented by the output of the in-station equipment, i.e., by P_BES(t):
a_t = {P_BES(t)}    (8)
The goal of optimal operation of the light storage and charging integrated station is to minimize the operating cost of the station. The operation cost minimization problem is converted into the reward maximization form of an agent, as shown in the following formula (9):
r_t(s_t, a_t) = -λ (C_E(s_t, a_t) + C_BES(s_t, a_t))    (9)
where r_t(s_t, a_t) is the total reward obtained by the agent in scheduling time period t; s_t is the observed state of the light storage and charging integrated station in scheduling time period t; a_t is the dynamic economic dispatch action of the energy storage system of the station; λ is the scaling factor applied to the cost; C_E(s_t, a_t) is the cost of purchasing electricity from the power grid in time period t with the agent in state s_t; C_BES(s_t, a_t) is the charge/discharge depreciation cost of the electrical energy storage in time period t with the agent in state s_t.
Once a state s_t in the operation of the light storage and charging integrated station is determined, the action-value function Q^π(s, a) of the following formula (10) is used to evaluate the dynamic economic dispatch action a_t taken by the energy storage system in state s_t; the larger Q^π(s, a), the better a_t:
Q^π(s_t, a_t) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k}(s_{t+k}, a_{t+k}) ]    (10)
where E_π(·) is the expectation under the target policy π; γ ∈ [0, 1] is the discount factor, representing the influence of future rewards on the cumulative reward; the larger γ, the more important future rewards are; r_{t+k} is the total reward obtained by the agent in time period t+k; s_{t+k} is the state of the light storage and charging integrated station in time period t+k; a_{t+k} is the action executed by the station in time period t+k; k ∈ N* represents the generation of the agent's cyclic learning.
The goal of the combined operation optimization model of the light storage and charging integrated station is to find an optimal policy π that maximizes the action-value function; the optimal target policy is obtained according to the following formula (11):
π* = arg max_{a ∈ A} Q^π(s, a)    (11)
where A is the action set of the agent.
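A small sketch of the discounted return whose expectation defines Q^π in formula (10); the discount factor and the sample rewards are illustrative values only.

```python
# Sketch of the discounted return behind the action-value function of formula (10).
def discounted_return(rewards, gamma=0.99):
    weight, total = 1.0, 0.0
    for r in rewards:            # r_t, r_{t+1}, ... along one scheduling trajectory
        total += weight * r
        weight *= gamma          # future rewards are weighted by γ^k
    return total

print(discounted_return([-1.2, -0.8, -0.5, -1.0]))   # per-interval rewards (negative scaled costs)
```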
And S3, the reinforcement learning network is trained offline with historical system state data based on the deep deterministic policy gradient (DDPG) algorithm, to obtain the trained reinforcement learning network, as shown in the schematic diagram of the reinforcement learning process in fig. 3 and the schematic flowchart of reinforcement learning in fig. 4.
The historical system state data are observed states of the light storage and charging integrated station, comprising the system electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period of the station.
the basic components of the training include a set of states S characterizing the environment, a set of actions A characterizing the actions of the agent, and a reward r for the agent, the environment providing the agent with an observed state S during a time period t t The intelligent agent belongs to S and is based on strategy pi and integrated station state S t Generating an operating State a t
Because reinforcement learning data have the Markov property and therefore do not satisfy the assumption of independent and identically distributed samples required for training a neural network, DDPG stores the data explored from the environment in a replay pool R when generating sample data; to guarantee the learning effect, at every update the value network and the policy network randomly draw a portion of samples from the pool for optimization, which reduces instability.
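A minimal sketch of such a replay pool R with uniform random sampling; the capacity and batch size are assumed values.

```python
# Minimal experience replay pool: store transitions and sample them uniformly at random.
import random
from collections import deque

class ReplayPool:
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.data.append((s, a, r, s_next, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.data, batch_size)   # breaks temporal correlation
        return tuple(map(list, zip(*batch)))           # states, actions, rewards, next_states, dones
```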
The reinforcement learning network comprises a value network and a policy network; from the policy network π(s|θ^π) and the value network Q(s, a|θ^Q), two independent target networks π′(s|θ^π′) and Q′(s, a|θ^Q′) are created, initialized as shown in the following formulas (12) and (13):
θ^π′ ← θ^π    (12)
θ^Q′ ← θ^Q    (13)
A cyclic optimization period T is set and the historical system state data are input to the policy network, the value network and their target networks. After a mini-batch of data is trained, the DDPG algorithm updates the parameters of the current (online) policy network and value network through gradient ascent or gradient descent, and then updates the parameters of the target networks of the policy network and the value network through a soft update method. After T cycles, offline learning of the DDPG algorithm is completed and the trained DDPG network is obtained.
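Assuming PyTorch, the policy network π(s|θ^π), the value network Q(s, a|θ^Q) and their independent target copies of formulas (12)-(13) could be sketched as follows; the layer sizes and the four-dimensional state {P_load, P_pv, C_soc, t} are illustrative assumptions.

```python
# Sketch of the actor (policy) and critic (value) networks and their target copies.
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):                       # π(s|θ^π): state -> normalized P_BES(t) in [-1, 1]
    def __init__(self, s_dim=4, a_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, a_dim), nn.Tanh())

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):                      # Q(s,a|θ^Q): (state, action) -> scalar value
    def __init__(self, s_dim=4, a_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_target = copy.deepcopy(actor)           # π′(s|θ^π′), initialized with θ^π′ ← θ^π, formula (12)
critic_target = copy.deepcopy(critic)         # Q′(s,a|θ^Q′), initialized with θ^Q′ ← θ^Q, formula (13)
```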
The specific training content is as follows:
1) Value network training
In the value network, the parameters are optimized by minimizing the loss function L(θ^Q), as shown in the following formula (14):
L(θ^Q) = E[(y_t - Q(s_t, a_t|θ^Q))^2]    (14)
where θ^Q are the parameters of the current network in the value network; y_t is the target Q value; E(·) is the expectation function.
y_t = r_t + γ Q′(s_{t+1}, π′(s_{t+1}|θ^π′)|θ^Q′)    (15)
where r_t is the total reward obtained by the agent in time period t; γ ∈ [0, 1] is the discount factor; Q′ is the target value network (the Q value before updating); π′ is the target policy; θ^π′ are the parameters of the target network in the policy network; θ^Q′ are the parameters of the target network in the value network.
In time period t, the light storage and charging integrated station executes the action a_t and then enters the next state s_{t+1}, i.e., the updated state-of-charge value of the electrical energy storage and the electrical load and photovoltaic generation observed in the next period.
The gradient of L(θ^Q) with respect to θ^Q is given (up to a constant factor) by the following formula (16):
∇_{θ^Q} L(θ^Q) = -E[(y_t - Q(s_t, a_t|θ^Q)) ∇_{θ^Q} Q(s_t, a_t|θ^Q)]    (16)
where y_t - Q(s_t, a_t|θ^Q) is the temporal-difference error. The network is updated according to the gradient rule, giving the following update formula:
θ^Q ← θ^Q + μ_Q (y_t - Q(s_t, a_t|θ^Q)) ∇_{θ^Q} Q(s_t, a_t|θ^Q)    (17)
where μ_Q is the value network learning rate.
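A sketch of the value-network update of formulas (14)-(17), again assuming PyTorch; the networks are any modules with the signatures sketched above, s, a, r, s_next stand for a mini-batch drawn from the replay pool, and the optimizer and learning rate are assumed choices.

```python
# Sketch of the critic (value network) update: TD target, squared-error loss, gradient step.
import torch

def update_critic(critic, critic_target, actor_target, critic_opt, s, a, r, s_next, gamma=0.99):
    with torch.no_grad():
        a_next = actor_target(s_next)                          # π′(s_{t+1}|θ^π′)
        y = r + gamma * critic_target(s_next, a_next)          # target y_t, formula (15)
    q = critic(s, a)
    loss = ((y - q) ** 2).mean()                               # L(θ^Q), formula (14)
    critic_opt.zero_grad()
    loss.backward()                                            # gradient of formula (16)
    critic_opt.step()                                          # θ^Q update, formula (17)
    return loss.item()

# e.g. critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # learning rate μ_Q (assumed)
```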
2) Policy network training
The value network provides the gradient information ∇_a Q(s, a|θ^Q) as the direction of action improvement. To update the policy network, the sampled policy gradient is used, as shown in the following formula (18):
∇_{θ^π} J ≈ E[ ∇_a Q(s, a|θ^Q)|_{s=s_t, a=π(s_t)} · ∇_{θ^π} π(s|θ^π)|_{s=s_t} ]    (18)
The policy network parameters θ^π are updated according to the deterministic policy gradient:
θ^π ← θ^π + μ_π ∇_{θ^π} J    (19)
where μ_π is the policy network learning rate.
Further, the target networks are softly updated:
θ^Q′ ← τ θ^Q + (1 - τ) θ^Q′    (20)
θ^π′ ← τ θ^π + (1 - τ) θ^π′    (21)
where τ is the soft update coefficient, τ < 1.
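A sketch of the policy-network update of formulas (18)-(19) and the soft target updates of formulas (20)-(21), under the same assumed PyTorch setup; the value of τ and the learning rate are placeholders.

```python
# Sketch of the actor (policy network) update and the soft update of both target networks.
import torch

def update_actor_and_targets(actor, critic, actor_target, critic_target, actor_opt, s, tau=0.005):
    # sampled policy gradient (18): ascend Q(s, π(s)) by descending its negative
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()                                   # θ^π update, formula (19)
    # soft updates θ^Q′ ← τθ^Q + (1-τ)θ^Q′ and θ^π′ ← τθ^π + (1-τ)θ^π′, formulas (20)-(21)
    with torch.no_grad():
        for net, target in ((critic, critic_target), (actor, actor_target)):
            for p, p_targ in zip(net.parameters(), target.parameters()):
                p_targ.mul_(1 - tau).add_(tau * p)

# e.g. actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)   # learning rate μ_π (assumed)
```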
And S4, according to the trained reinforcement learning network and combined with a time-of-use electricity price mechanism, a minimum-scheduling-cost economic optimization calculation is performed on the combined operation optimization model of the light storage and charging integrated station, obtaining the real-time self-optimizing operation output result of the energy storage system in the station and thereby achieving optimal operation of the light storage and charging integrated station.
When a scheduling task is received, in each time period the scheduling action a_t is selected by the trained reinforcement learning network according to the current system state s_t.
The action a_t is executed, the system enters the next environment state, and the reward r_t is obtained.
Then the in-station state information s_{t+1} of time period t+1 and the time-of-use electricity price information are collected as a new sample, and the dynamic scheduling decision for that time period is made, i.e., the real-time self-optimizing operation output result of the energy storage system in the light storage and charging integrated station.
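The online dispatch of step S4 can be sketched as the following loop; env and trained_actor are assumed placeholder objects (the real-time station state source and the trained policy network), not an interface defined by the patent.

```python
# Structure-only sketch of real-time dispatch: the trained policy maps each observed state
# to the storage output P_BES(t) for that interval.
def dispatch_one_day(env, trained_actor, horizon=96):
    s = env.reset()                                  # current load, PV, SOC and time period
    outputs = []
    for _ in range(horizon):
        a = trained_actor.act(s, explore=False)      # a_t selected without exploration noise
        s, r, done = env.step(a)                     # execute P_BES(t); collect s_{t+1} and r_t
        outputs.append(a)                            # real-time self-optimizing output result
        if done:
            break
    return outputs
```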
Preferably, when the reinforcement learning network is trained offline, it is trained with the action a_t given by the following equation (22):
a_t = π(s_t|θ^π) + v_t    (22)
where v_t is random noise. Adding the random noise v_t to the action a_t = {P_BES(t)} increases the DDPG algorithm's ability to explore the environment while interacting with the light storage and charging integrated station, so that a better dynamic scheduling strategy is learned.
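A minimal sketch of the exploration rule of equation (22), assuming Gaussian noise; the noise scale and the power limits are illustrative assumptions.

```python
# Exploration during offline training: add random noise v_t to the deterministic policy output.
import numpy as np

def explore_action(pi_s_t, sigma=0.1, p_bes_lim=(-50.0, 50.0)):
    a_t = pi_s_t + np.random.normal(0.0, sigma)      # a_t = π(s_t|θ^π) + v_t
    return float(np.clip(a_t, *p_bes_lim))           # keep the action within equipment limits
```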
The following is a specific example of the operation method of the light storage and charging integrated station for power grid power balance and new energy consumption according to the embodiment:
fig. 5 shows a statistical chart of partial historical detection sample data of 2 month and 1 day 2021, which specifically includes data of electrical load and photovoltaic power generation power required by a user within one day. The electricity rate adopts a time-of-use electricity rate mechanism, as shown in fig. 6, wherein the peak time period is 12:00-19:00, plateau period 07:00-12: 00. 19:00-23:00, valley period 23:00-07:00. based on existing photovoltaic output and electric load historical data and by combining time-of-use electricity price information, the scheduling cycle length of the light storage and charging integrated station is set to be 24 hours, the interval between two adjacent time periods is set to be 15min, 5000episodes training is carried out on the intelligent agent by means of a DDPG algorithm, and then convergence is carried out, so that the optimal operation strategy of the electric energy storage system in the power station is obtained. Fig. 5 shows an average reward curve in the training process of the agent, and the algorithm converges after 5000 epsilon classes of training, so as to obtain an optimal dynamic economic dispatching strategy. It can be observed that the value of the reward obtained after the agent performs the scheduling decision is small, since the agent is initially unfamiliar with the environment. As the training process continues, the agent is constantly interacting with the environment and gaining experience, so the overall trend of the reward value is gradually increasing and eventually converging. This indicates that the agent has learned the optimal scheduling strategy that minimizes the system operating cost. As can be seen from fig. 7, the stored energy is charged and discharged under the guidance of electricity prices, and is charged at the valley electricity prices and when the electric load is small for the subsequent peak time, such as 00: 00-00: 30. 03: 45-04: 00 equal time period; discharge at peak power rates and at higher electrical loads to reduce operating costs, such as 12: 00-12: 15. 17: 30-18: 45, etc. periods of time. As can be seen from fig. 8, in the valley electricity price and flat electricity price stage, the light storage and charging integrated station purchases electricity from the main power grid to meet the electricity demand. When the electricity price is the peak electricity price, the photovoltaic power generation system and the electric energy storage equipment in the light storage and charging integrated station generate electric energy to avoid purchasing electricity from a main power grid, so that the operation cost of the power station is reduced.
In summary, the embodiment of the present application focuses on the comprehensive service scenario of an electric vehicle charging station that integrates photovoltaic and energy storage systems, and coordinates the output of the energy storage system in the station in combination with a time-of-use electricity price mechanism, thereby achieving economically optimal and real-time optimized operation of the light storage and charging integrated station. By estimating the optimal policy function with the DDPG algorithm, the curse of dimensionality is effectively avoided, the information of the whole action domain is preserved, and local consumption of new energy resources is realized.
Those skilled in the art should understand that the above-mentioned application types of the input box are only examples, and other existing or future application types of the input box, such as those applicable to the embodiments of the present invention, should be included in the scope of the present invention and are also included herein by reference.
Those skilled in the art will appreciate that the various network elements shown in fig. 2 for simplicity only may be fewer than those in an actual network, but such omissions are clearly not to be considered as a prerequisite to a clear and complete disclosure of the inventive embodiments.
It should be understood by those skilled in the art that the foregoing description of determining the invoking policy according to the user information is only for better illustrating the technical solutions of the embodiments of the present invention, and is not intended to limit the embodiments of the present invention. Any method of determining the invoking policy based on the user attributes is included in the scope of embodiments of the present invention.
Those of ordinary skill in the art will understand that: the drawings are merely schematic representations of one embodiment, and the flow charts in the drawings are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A method for operating a light storage and charging integrated station oriented to grid power balance and new energy consumption, characterized by comprising the following steps:
S1, based on the comprehensive service scenario of the light storage and charging integrated station, establishing a combined operation optimization model of the station with the minimum operation cost of the station as the objective, the power balance constraint, the main-grid interaction power constraint and the equipment operation constraints as constraint conditions, and the real-time output of the energy storage unit as the decision variable;
S2, converting the dynamic scheduling problem in the combined operation optimization model into a reinforcement learning network for a policy decision problem by using the Markov decision process method;
S3, training the reinforcement learning network offline with historical system state data based on the deep deterministic policy gradient (DDPG) algorithm, to obtain a trained reinforcement learning network;
and S4, according to the trained reinforcement learning network and combined with a time-of-use electricity price mechanism, performing a minimum-scheduling-cost economic optimization calculation on the combined operation optimization model of the station, obtaining the real-time self-optimizing operation output result of the energy storage system in the station, and thereby achieving optimal operation of the light storage and charging integrated station.
2. The method of claim 1, wherein the combined operation optimization model comprises:
the objective function shown in the following equation (1):
F = min(C_E + C_BES)    (1)
where F is the operation cost of the light storage and charging integrated station; C_E is the cost of purchasing electricity from the power grid,
C_E = Σ_t ε_e(t) P_grid(t) Δt
P_grid(t) is the power exchanged between the system and the main grid in time period t, a positive value indicating that the system purchases electricity from the main grid and a negative value indicating that surplus electricity is fed back to the grid; ε_e(t) is the electricity price in time period t; Δt is the length of the time interval; C_BES is the charge/discharge depreciation cost of the electrical energy storage,
C_BES = ρ_BES Σ_t |P_BES(t)| Δt
P_BES(t) is the charging or discharging power of the electrical energy storage in time period t, a positive value indicating that the storage is discharging and a negative value indicating that it is charging; ρ_BES is the depreciation cost coefficient of the electrical energy storage;
the constraints are as follows:
1) power balance constraint:
at time t, the electric power balance constraint is shown in the following equation (2):
P_grid(t) + P_pv(t) + P_BES(t) = P_load(t)    (2)
where P_pv(t) is the photovoltaic generation power and P_load(t) is the user electrical load demand in time period t;
2) main grid interaction power constraint, shown in the following equation (3):
P_grid^min ≤ P_grid(t) ≤ P_grid^max    (3)
where P_grid^min and P_grid^max are the lower and upper limits of the power exchanged between the system and the main grid;
3) equipment operation constraint, shown in the following equation (4):
P_BES^min ≤ P_BES(t) ≤ P_BES^max    (4)
where P_BES^min and P_BES^max are the lower and upper limits of the electrical energy storage charging/discharging power;
for the electrical energy storage device, the following constraints also apply:
C_SOC^min ≤ C_SOC(t) ≤ C_SOC^max
C_SOC(t) = C_SOC(t-1) - η_BES P_BES(t) Δt / Q_BES,  with C_SOC(0) = C_SOC(T) = C_SOC^0
where C_SOC^min and C_SOC^max are the lower and upper limits of the electrical energy storage state of charge; C_SOC(t) is the state of charge of the electrical energy storage in time period t; Q_BES is the capacity of the electrical energy storage; C_SOC^0 is the initial state of charge of the electrical energy storage; η_BES is the charge/discharge coefficient of the electrical energy storage, with η_BES = 1/η_dis when the storage discharges (P_BES(t) ≥ 0) and η_BES = η_ch when it charges (P_BES(t) < 0); η_ch and η_dis are the charging efficiency and discharging efficiency of the electrical energy storage, respectively.
3. The method according to claim 1, wherein step S2 comprises:
converting the operation cost minimization problem of the light storage and charging integrated station into the reward maximization form of an agent, as shown in the following formula (5):
r_t(s_t, a_t) = -λ (C_E(s_t, a_t) + C_BES(s_t, a_t))    (5)
where r_t(s_t, a_t) is the total reward obtained by the agent in scheduling time period t; s_t is the observed state of the light storage and charging integrated station in scheduling time period t, s_t = {P_load(t), P_pv(t), C_soc(t-1), t}, in which P_load(t), P_pv(t), C_soc(t-1) and t are the user electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period, respectively; a_t is the dynamic economic dispatch action of the energy storage system of the station, which can be represented by the output of the in-station equipment in time period t, a_t = {P_BES(t)}; λ is the scaling factor applied to the cost; C_E(s_t, a_t) is the cost of purchasing electricity from the power grid in time period t with the agent in state s_t; C_BES(s_t, a_t) is the charge/discharge depreciation cost of the electrical energy storage in time period t with the agent in state s_t;
the action-value function Q^π(s, a) of the following formula (6) is used to evaluate the dynamic economic dispatch action a_t taken by the energy storage system of the station in state s_t; the larger Q^π(s, a), the better a_t:
Q^π(s_t, a_t) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k}(s_{t+k}, a_{t+k}) ]    (6)
where E_π(·) is the expectation under the target policy π; γ ∈ [0, 1] is the discount factor, representing the influence of future rewards on the cumulative reward; the larger γ, the more important future rewards are; r_{t+k} is the total reward obtained by the agent in time period t+k; s_{t+k} is the state of the light storage and charging integrated station in time period t+k; a_{t+k} is the action executed by the station in time period t+k; k ∈ N* represents the generation of the agent's cyclic learning;
the optimal target policy π* is obtained so as to maximize the action-value function, according to the following formula (7):
π* = arg max_{a ∈ A} Q^π(s, a)    (7)
where A is the action set of the agent.
4. The method according to claim 1, wherein step S3 comprises:
the historical system state data are observed states of the light storage and charging integrated station, including the system electrical load demand, the photovoltaic generation power, the electrical energy storage state of charge and the scheduling time period of the station;
the reinforcement learning network comprises a value network and a policy network; from the policy network π(s|θ^π) and the value network Q(s, a|θ^Q), two independent target networks π′(s|θ^π′) and Q′(s, a|θ^Q′) are created, initialized as shown in the following formulas (8) and (9):
θ^π′ ← θ^π    (8)
θ^Q′ ← θ^Q    (9)
a cyclic optimization period T is set and the historical system state data are input to the policy network, the value network and their target networks; after a batch of data is trained, the DDPG algorithm updates the parameters of the current policy network and value network through gradient ascent or gradient descent, and then updates the parameters of their target networks through a soft update method; after T cycles, offline learning of the DDPG algorithm is completed and the trained reinforcement learning network is obtained.
5. The method of claim 4, wherein step S4 comprises:
when a scheduling task is received, in each time period, selecting the scheduling action a_t with the trained reinforcement learning network according to the current system state s_t;
executing the action a_t, entering the next environment state, and obtaining the reward r_t;
then collecting the in-station state information s_{t+1} of time period t+1 together with the time-of-use electricity price information as a new sample, and making the dynamic scheduling decision for that time period, i.e., the real-time self-optimizing operation output result of the energy storage system in the light storage and charging integrated station.
6. The method of claim 1, wherein, when the reinforcement learning network is trained offline, it is trained with the action a_t given by the following equation (10):
a_t = π(s_t|θ^π) + v_t    (10)
where v_t is random noise.
CN202211625303.3A 2022-12-16 2022-12-16 Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid Pending CN115940289A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211625303.3A CN115940289A (en) 2022-12-16 2022-12-16 Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211625303.3A CN115940289A (en) 2022-12-16 2022-12-16 Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid

Publications (1)

Publication Number Publication Date
CN115940289A true CN115940289A (en) 2023-04-07

Family

ID=86654046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211625303.3A Pending CN115940289A (en) 2022-12-16 2022-12-16 Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid

Country Status (1)

Country Link
CN (1) CN115940289A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116451875A (en) * 2023-06-14 2023-07-18 国网吉林省电力有限公司经济技术研究院 Optical storage and filling integrated station capacity optimization configuration method
CN117254505A (en) * 2023-09-22 2023-12-19 南方电网调峰调频(广东)储能科技有限公司 Energy storage power station optimal operation mode decision method and system based on data processing
CN117254505B (en) * 2023-09-22 2024-03-26 南方电网调峰调频(广东)储能科技有限公司 Energy storage power station optimal operation mode decision method and system based on data processing
CN117879016A (en) * 2024-03-11 2024-04-12 国网江西省电力有限公司经济技术研究院 Charging station configuration optimization method and system based on optical charge integration


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination