CN112529727A - Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning - Google Patents

Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning

Info

Publication number
CN112529727A
Authority
CN
China
Prior art keywords
energy storage
storage system
energy
grid
action
Prior art date
Legal status
Pending
Application number
CN202011229149.9A
Other languages
Chinese (zh)
Inventor
高强
王昕
潘弘
林烨
叶丽娜
杨强
杨迷霞
Current Assignee
Taizhou Hongyuan Electric Power Design Institute Co ltd
Taizhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Taizhou Hongyuan Electric Power Design Institute Co ltd
Taizhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Taizhou Hongyuan Electric Power Design Institute Co ltd, Taizhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd filed Critical Taizhou Hongyuan Electric Power Design Institute Co ltd
Priority to CN202011229149.9A
Publication of CN112529727A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008: Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28: Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32: Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J7/00: Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
    • H02J7/0047: Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries with monitoring or indicating devices or circuits
    • H02J7/0048: Detection of remaining charge capacity or state of charge [SOC]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00: Details relating to the application field
    • G06F2113/04: Power grid distribution networks
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00: Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E70/00: Other energy conversion or management systems reducing GHG emissions
    • Y02E70/30: Systems combining energy storage with energy generation of non-fossil origin

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a micro-grid energy storage scheduling method, device, and equipment based on deep reinforcement learning. First, a simulation model corresponding to the controlled new energy microgrid is established, and the microgrid energy storage scheduling is converted into a Markov decision problem according to the microgrid simulation model. Second, the established energy storage system agent is trained on the microgrid's day-ahead new energy generation, load, and electricity price data; once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved and training ends. Finally, the trained agent is used for real-time scheduling of the microgrid energy storage system: at each hourly energy scheduling interval, the energy storage system performs charge and discharge control according to the microgrid's real-time generation and load demand. The invention makes full use of the renewable energy in the microgrid, reduces the impact of renewable energy on the main power grid, and minimizes the operating cost of the microgrid.

Description

Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of power dispatching engineering, and particularly relates to real-time dispatching of an energy storage system of a smart power grid.
Background
With the gradual depletion of traditional fossil energy, increasingly prominent environmental pollution, and the ever-growing energy demand of human society, energy and environmental problems are receiving more and more attention from countries all over the world, and developing and utilizing renewable energy is an effective means of addressing environmental pollution and the energy crisis. However, because renewable energy is intermittent and fluctuating, connecting it to the grid directly at large scale affects grid stability. To make full use of renewable energy, the microgrid has received wide attention. A microgrid is a small power system composed of various distributed power sources, loads, energy storage, and associated protection and control devices; connecting renewable energy to the distribution network in microgrid form allows distributed electric energy to be exploited effectively.
Because of the randomness, intermittency, and fluctuation of renewable energy and diversified loads, the operation and control of the microgrid place ever higher demands on flexibility and real-time performance. Since the microgrid faces uncertainty on both the energy side and the load side, it is difficult to model accurately, and its optimization decision scenarios are hard to express as explicit mathematical formulas. Designing a stable microgrid energy scheduling strategy that accounts for the random fluctuation of new energy and load is therefore important for the development of microgrids.
The energy storage system is an important component of the microgrid. Through a reasonable charging and discharging strategy it can improve the microgrid's capacity to absorb distributed new energy, and in daily operation it can cooperate with the microgrid's energy management strategy to reduce operating cost. The key to the microgrid operation optimization problem is to study microgrid scheduling strategies with the control of the energy storage system at their core.
In recent years, with the rise of artificial intelligence, studies applying reinforcement learning to power systems have been increasing. Reinforcement learning is a model-free method for solving sequential decision problems: an agent interacts with an uncertain environment, obtains feedback, and learns a strategy that maximizes the reward obtained from the environment. Energy scheduling of the microgrid energy storage system can be abstracted as a Markov decision process, and applying reinforcement learning to microgrid energy management can effectively overcome the difficulty of accurately modeling microgrid decision optimization and the heavy real-time computation caused by an oversized state space.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a micro-grid energy storage scheduling method based on deep reinforcement learning that adapts well to real-time energy scheduling of a microgrid.
To solve this technical problem, the invention adopts the following technical scheme. The micro-grid energy storage scheduling method based on deep reinforcement learning comprises the following steps:
first, a simulation model corresponding to the controlled new energy microgrid is established, and the microgrid energy storage scheduling is converted into a Markov decision problem according to the microgrid simulation model, so as to establish an energy storage system agent and the corresponding observation state quantities, action values, and reward function;
second, the established energy storage system agent is trained on the microgrid's day-ahead new energy generation, load, and electricity price data; once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved and training ends;
finally, the trained energy storage system agent is used for real-time scheduling of the microgrid energy storage system: at each hourly energy scheduling interval, the energy storage system performs charge and discharge control according to the microgrid's real-time generation and load demand.
Preferably, establishing the microgrid simulation model comprises the following sub-steps:
(1.1) establishing an energy storage system model: the energy storage system is represented by a dynamic model in which $P_b(t)$ is the charging or discharging power of the energy storage system at each time $t$ and $SOC(t)$ is its state of charge; the SOC dynamics of the energy storage system are:

$$SOC(t+1)=\begin{cases}SOC(t)+\dfrac{\eta\,|P_b(t)|\,\Delta t}{E_c}, & \text{charging}\\[4pt]SOC(t)-\dfrac{|P_b(t)|\,\Delta t}{\xi\,E_c}, & \text{discharging}\end{cases}\qquad(1)$$

where $\xi$ and $\eta$ are the discharging and charging efficiencies of the energy storage system respectively; at any moment the energy storage system can only be charging, discharging, or idle; $E_c$ is the capacity of the energy storage system; $\Delta t$ is the time interval over which the stored energy is charged or discharged;
(1.2) setting limiting conditions: for the established energy storage model, the charge/discharge power $P_b(t)$ and the state of charge $SOC(t)$ are limited:

$$P_{b\min}\le P_b(t)\le P_{b\max}\qquad(2)$$
$$SOC_{\min}\le SOC(t)\le SOC_{\max}\qquad(3)$$

where $P_{b\min}$ and $P_{b\max}$ are the minimum and maximum charge/discharge power of the energy storage system, and $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum of the state of charge $SOC(t)$, with $SOC_{\max}\le 1$ and $SOC_{\min}\ge 0$;
(1.3) setting the microgrid power balance limit: the power balance relationship is as follows:

$$P_{balance}(t)=P_{renew}(t)-P_{load}(t)\qquad(4)$$
$$P_{net}(t)=P_{balance}(t)+P_b(t)\qquad(5)$$

where $P_{renew}$ is the total output power of the distributed new energy in the microgrid at time $t$, $P_{load}$ is the power demand of the load in the microgrid at time $t$, and $P_{balance}$ is the difference between renewable generation and load demand: if $P_{balance}>0$, renewable generation is in surplus, otherwise it is insufficient. A positive $P_b(t)$ means the energy storage system emits power and a negative value means it absorbs power. $P_{net}$ is the power traded between the microgrid and the main grid: if positive, the microgrid exports power to the main grid; if negative, the microgrid purchases power from the main grid. A code sketch of this simulation model is given below.
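For illustration only, here is a minimal Python sketch of the simulation model above, under this section's sign convention (positive $P_b$ discharges). All names (`MicrogridEnv`, `e_c`) and the default parameter values are assumptions for the sketch, not values from the patent.

```python
import numpy as np

class MicrogridEnv:
    """Minimal microgrid energy-storage model following Eqs. (1)-(5)."""

    def __init__(self, e_c=100.0, p_bmax=20.0, soc_min=0.0, soc_max=1.0,
                 eta=0.95, xi=0.95, dt=1.0):
        self.e_c = e_c                                 # storage capacity E_c (kWh)
        self.p_bmax = p_bmax                           # power limit (kW), Eq. (2)
        self.soc_min, self.soc_max = soc_min, soc_max  # SOC limits, Eq. (3)
        self.eta, self.xi = eta, xi                    # charging / discharging efficiency
        self.dt = dt                                   # scheduling interval Δt (h)
        self.soc = 0.5                                 # initial state of charge

    def step_storage(self, p_b):
        """Advance SOC by one interval; p_b > 0 discharges, p_b < 0 charges (Eq. 1)."""
        p_b = float(np.clip(p_b, -self.p_bmax, self.p_bmax))   # enforce Eq. (2)
        if p_b >= 0:   # discharging: extra energy is drawn to cover losses
            soc_next = self.soc - p_b * self.dt / (self.xi * self.e_c)
        else:          # charging: only a fraction eta of absorbed energy is stored
            soc_next = self.soc + self.eta * (-p_b) * self.dt / self.e_c
        self.soc = float(np.clip(soc_next, self.soc_min, self.soc_max))  # Eq. (3)
        return self.soc

    def grid_exchange(self, p_renew, p_load, p_b):
        """Power traded with the main grid (Eqs. 4-5); positive = export."""
        p_balance = p_renew - p_load   # Eq. (4)
        return p_balance + p_b         # Eq. (5)
```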
Preferably, establishing the energy storage system agent comprises the following sub-steps:
(2.1) setting the observation state quantities required by the energy storage system agent: these comprise the renewable energy generation, the load demand, the battery state of charge, the electricity purchase price from the grid, and the time of day; generation and load are in kW, the battery state of charge ranges from 0 to 1, the purchase price is in yuan, and the time is the integer hour of the day. The state space is as follows:

$$s_t\in S:\{P_t^{W},P_t^{load},SOC_t,R_t,t\}\qquad(6)$$

In the system state space $S$, $P_t^{W}$ and $P_t^{load}$ are the renewable generation power and the load power at time $t$, $SOC_t$ is the state of charge of the energy storage system at time $t$, $R_t$ is the grid electricity purchase price at time $t$, and $t$ is the agent's current time;
(2.2) setting the action values of the energy storage system agent: the actions of the energy storage system comprise charging, discharging, and no action; an action value of 1 denotes discharging at full rate, 2 denotes charging at full rate, and 0 denotes no action. The action space is expressed as:

$$a_t\in A:\{0,1,2\}\qquad(7)$$

(2.3) setting the reward function of the energy storage system agent during training: when the agent takes action $a_t$ in state $s_t$, the immediate reward obtained is set to the operating cost of the microgrid at time $t$, expressed as:

$$r_t=P_{net}\times R_t\times\Delta t\qquad(8)$$

(2.4) establishing the decision method: a deep neural network approximates the agent's action-value function; the agent receives the state quantities of (2.1), inputs them into the deep neural network, and the network outputs the state-action value $Q(s,a)$ for the observed state. The state-action value function represents the expected long-term return when the agent takes an action in observed state $s_t$:

$$Q(s_t,a_t)=\mathbb{E}\left[\sum_{k=0}^{\infty}\gamma^{k}r_{t+k}\,\middle|\,s_t,a_t\right]\qquad(9)$$

In the formula above, $\gamma$ is a discount factor in the range 0 to 1 representing the importance of long-term return; each state-action value output by the deep neural network corresponds to an action the energy storage agent can take, and the agent selects the action with the maximum Q value. A sketch of such a Q network is given below.
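A minimal sketch of the decision network in PyTorch, assuming a small fully connected architecture; the layer sizes and names are illustrative choices, not specified by the patent.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps the 5-dim state (P_W, P_load, SOC, R, t) to Q values for actions {0, 1, 2}."""

    def __init__(self, state_dim=5, n_actions=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q(s, a) per action, Eq. (9)
        )

    def forward(self, state):
        return self.net(state)

def greedy_action(q_net, state):
    """Pick the action with the maximum Q value, as in step (2.4)."""
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(torch.argmax(q_values).item())
```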
Preferably, training the energy storage system agent comprises the following sub-steps:
(3.1) initializing the neural network, i.e. the value function $Q$, with random weights $\theta$; initializing the replay memory unit $D$; setting $\theta'=\theta$ to initialize the target value function $Q'$; and initializing the error storage unit $P$;
(3.2) acquiring the day-ahead forecast data of the microgrid, containing the microgrid state information required in step (2.1); initializing the battery state SOC; preprocessing the initial state information and converting it into tensors; and adding noise to the new energy generation, load demand, and electricity price data during training;
(3.3) in each training episode, selecting actions according to an ε-greedy strategy, with ε set as:

$$\varepsilon=0.5\times\frac{1}{i+1}\qquad(10)$$

where $i$ is the index of the agent's training episode. A number is drawn uniformly at random from $[0,1]$: if it is greater than ε, the agent selects the action $a_t$ whose estimated action value is maximal; if it is less than ε, an action $a_t$ is selected at random from the action space;
(3.4) in any state $s_t$, the agent executes the action selected in (3.3), observes the reward $r_t$ obtained after executing the action, and transitions to the next state $s_{t+1}$;
(3.5) if the next state $s_{t+1}$ exists, storing the tuple $(s_t,a_t,r_t,s_{t+1})$ in the replay memory unit; once the number of samples stored in the replay memory unit meets the minimum sample count, selecting mini-batches of data from it to train the agent's neural network;
(3.6) calculating the absolute value errors of the value function for the different states in the replay memory unit and storing the errors computed for the different tuples in the error storage unit, whose indices correspond to those of the replay memory unit; the calculation formula is:

$$\left|\,r_t+\gamma\max_{a}Q'(s_{t+1},a;\theta')-Q(s_t,a_t;\theta)\,\right|\qquad(11)$$

(3.7) following the principle of preferentially learning from tuples with larger errors, selecting tuples from the replay memory unit to train the neural network after each interaction between the agent and the environment;
(3.8) inputting the state information $s_t$ of the tuples selected from the replay memory unit into the value function $Q$ and taking its output $Q(s_t,a_t;\theta)$ as the supervised quantity; then also inputting the state information $s_{t+1}$ into the value function $Q$ and obtaining the index of the action with the maximum output action-state value; then obtaining from the target network the value $Q'(s_{t+1},a_{t+1};\theta')$ corresponding to that action index for input $s_{t+1}$; the update target of the neural network is then:

$$y_t=r_t+\gamma\,Q'\!\left(s_{t+1},\ \arg\max_{a}Q(s_{t+1},a;\theta);\ \theta'\right)\qquad(12)$$

(3.9) the parameters of the agent's Q network are updated according to:

$$\theta_{i+1}=\theta_i+\alpha\left[y_t-Q(s_t,a_t;\theta_i)\right]\nabla_{\theta_i}Q(s_t,a_t;\theta_i)\qquad(13)$$

where $\theta_i$ and $\theta_{i+1}$ are the values of $\theta$ at the $i$-th and $(i+1)$-th updates, and $\alpha$ is the learning rate of the neural network.
For the target value function $Q'$, every two training episodes $\theta'$ is set equal to $\theta$ to update the deep neural network parameters of the target value function $Q'$. A sketch of this double-Q update is given below.
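A minimal sketch of the update of steps (3.8)-(3.9) in PyTorch, assuming mini-batches of stored tuples; names such as `q_net`, `target_net`, and the batch layout are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def double_q_update(q_net, target_net, optimizer, batch, gamma=0.95):
    """One training step on a mini-batch of (s, a, r, s') tuples, Eqs. (12)-(13)."""
    states, actions, rewards, next_states = batch  # shapes [B,5], [B] (long), [B], [B,5]

    # Q(s_t, a_t; θ) for the actions actually taken, step (3.8)
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # the online network chooses the next action (the argmax in Eq. 12) ...
        next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        # ... and the target network evaluates it: Q'(s_{t+1}, a*; θ')
        q_next = target_net(next_states).gather(1, next_actions).squeeze(1)
        y = rewards + gamma * q_next               # update target y_t, Eq. (12)

    loss = F.mse_loss(q_sa, y)                     # gradient step realizes Eq. (13)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Every two training episodes the target network would then be synchronized, e.g. `target_net.load_state_dict(q_net.state_dict())`.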
Preferably, said step (3.7) comprises the following sub-steps:
(3.7.1) initializing the index list Index-list;
(3.7.2) calculating the sum Sum of the absolute value errors in the error storage unit;
(3.7.3) according to the number N of tuples required for the Q-network update, randomly generating N numbers from (0, Sum) and arranging them from small to large into a list Rand-list;
(3.7.4) adding the numbers in the error storage unit one by one in order; whenever the running sum becomes larger than the smallest number in Rand-list, storing the index of the last error added into the index list Index-list and removing that smallest number from Rand-list;
(3.7.5) repeating (3.7.4) until N index values are obtained, then fetching the corresponding tuples from the replay memory unit according to the index list Index-list; a code sketch of this sampling routine is given below.
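For illustration, a minimal Python sketch of this cumulative-sum sampling; the function and variable names are assumptions.

```python
import random

def prioritized_sample_indices(errors, n):
    """Sample n replay indices with probability ~ |TD error|, steps (3.7.1)-(3.7.5)."""
    index_list = []                                   # (3.7.1)
    total = sum(errors)                               # (3.7.2) Sum of absolute errors
    rand_list = sorted(random.uniform(0.0, total) for _ in range(n))  # (3.7.3)

    running = 0.0
    for i, err in enumerate(errors):                  # (3.7.4)
        running += err
        while rand_list and running >= rand_list[0]:  # running sum passed a threshold
            index_list.append(i)                      # index of the last error added
            rand_list.pop(0)                          # remove the smallest number
        if not rand_list:                             # (3.7.5) stop after n indices
            break
    return index_list
```

Because the thresholds are uniform over (0, Sum), each tuple is drawn with probability proportional to its stored error, so transitions the network currently predicts poorly are replayed more often.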
The invention also provides a micro-grid energy storage scheduling device based on deep reinforcement learning, which comprises:
an energy storage system agent building module, which converts the microgrid energy storage scheduling into a Markov decision problem according to the simulation model of the controlled new energy microgrid, so as to establish the energy storage system agent;
an energy storage system agent training module, which trains the energy storage system agent on the day-ahead forecast data of the microgrid; once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved and training ends;
an energy storage system dispatch module, in which the trained energy storage system agent schedules the energy storage system in the microgrid, the energy storage system performing charge and discharge control according to the microgrid's real-time generation and load demand at each hourly energy scheduling interval.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the micro-grid energy storage scheduling method based on deep reinforcement learning when executing the computer program.
The invention trains the energy storage system agent on forecast data, uses a deep neural network to fit the state-action value function, and adopts a double Q network to handle the over-estimation problem of DQN.
During training, a prioritized experience replay technique is adopted so that data on which learning is currently poor are learned preferentially, improving data utilization efficiency.
The energy storage system trained through reinforcement learning can meet real-time scheduling requirements, makes full use of the renewable energy in the microgrid, reduces the impact of renewable energy on the main grid, and minimizes the operating cost of the microgrid.
The following detailed description of the present invention will be provided in conjunction with the accompanying drawings.
Drawings
The invention is further described with reference to the accompanying drawings and the detailed description below:
FIG. 1 is a schematic diagram of energy flow in a hybrid microgrid system containing renewable energy sources;
FIG. 2 is a graph of the day-ahead predicted power and the actual output power of the wind generating set;
FIG. 3 is a graph of the day-ahead predicted power and the actual demanded power of the local load;
FIG. 4 is a graph of the day-ahead predicted value and the actual value of the real-time electricity price;
FIG. 5 is a graph illustrating the variation in daily rewards earned during energy storage system training;
FIG. 6 is a diagram of actual wind power generation, load demand, and energy storage system SOC variation.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention; obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
Example one
Aiming at the random volatility and intermittency of the renewable energy and the load demand of a microgrid system, which are difficult to predict accurately, the embodiment of the invention provides a microgrid energy storage scheduling method based on deep reinforcement learning, in which the energy storage system agent obtains reward feedback through interaction with the environment to learn an energy scheduling strategy for the energy storage system. The method comprises the following steps:
Step (1): establishing the simulation model corresponding to the controlled new energy microgrid, specifically comprising the following sub-steps:
(1.1) establishing an energy storage system model: the energy storage system is represented by a dynamic model in which $P_b(t)$ is the charging or discharging power of the energy storage at each time $t$ and $SOC(t)$ is the state of charge of the energy storage battery; the SOC dynamics of the energy storage battery are:

$$SOC(t+1)=\begin{cases}SOC(t)+\dfrac{\eta\,|P_b(t)|\,\Delta t}{E_c}, & \text{charging}\\[4pt]SOC(t)-\dfrac{|P_b(t)|\,\Delta t}{\xi\,E_c}, & \text{discharging}\end{cases}\qquad(1)$$

where $\xi$ and $\eta$ are the discharging and charging efficiencies of the energy storage; at any moment the energy storage system can only be charging, discharging, or idle; $E_c$ is the capacity of the energy storage system; $\Delta t$ is the time interval over which the stored energy is charged or discharged.
(1.2) setting limiting conditions: for the established energy storage model, to ensure its normal operation, the charge/discharge power $P_b(t)$ and the state of charge SOC are limited:

$$P_{b\min}\le P_b(t)\le P_{b\max}\qquad(2)$$
$$SOC_{\min}\le SOC(t)\le SOC_{\max}\qquad(3)$$

where $P_{b\min}$ and $P_{b\max}$ are the minimum and maximum charge/discharge power of the energy storage system, with specific values set according to the capacities of the microgrid and the energy storage system; $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum of the state of charge $SOC(t)$; different upper and lower storage-level limits can be set according to the storage characteristics, and by the definition of SOC, $SOC_{\max}\le 1$ and $SOC_{\min}\ge 0$.
(1.3) setting the microgrid power balance limit: to make full use of renewable energy and smooth the intermittency of new energy and the randomness of load demand, distributed renewable energy is considered first to meet the load demand. The power balance relationship is as follows:

$$P_{balance}(t)=P_{renew}(t)-P_{load}(t)\qquad(4)$$
$$P_{net}(t)=P_{balance}(t)-P_b(t)\qquad(5)$$

where $P_{renew}$ is the total output power of the distributed new energy in the microgrid at time $t$, $P_{load}$ is the power demand of the load in the microgrid at time $t$, and $P_{balance}$ is the difference between renewable generation and load demand: if $P_{balance}>0$, renewable generation is in surplus, otherwise it is insufficient. In this embodiment, a positive $P_b(t)$ means the energy storage system absorbs power and a negative value means it emits power. $P_{net}$ is the power traded between the microgrid and the main grid: if positive, the microgrid exports power to the main grid; if negative, the microgrid purchases power from the main grid.
(1.4) taking the microgrid model as the environment in which the energy storage agent operates, and setting the state quantities, action values, and reward function required by the agent according to the model.
In this embodiment, a microgrid model containing wind power generation is established; its schematic diagram is shown in FIG. 1. In the day-ahead scheduling stage, the predicted wind turbine output, local load demand, and electricity purchase price of the microgrid from the main grid for each period of the coming day are obtained by existing technical means. The data, which contain prediction errors, are shown in FIGS. 2 to 4; the prediction errors follow a Gaussian distribution.
Step (2): establishing an energy storage system agent suitable for reinforcement learning, specifically comprising the following sub-steps:
(2.1) setting the observation state quantities required by the agent: the states the reinforcement learning agent observes in the environment comprise the renewable energy generation, the load demand, the battery state of charge, the electricity purchase price from the grid, and the time of day. The state quantities of different units are processed separately: generation and load are in kW, the battery state of charge ranges from 0 to 1, the purchase price is in yuan, and the time is the integer hour of the day. The state space is as follows:

$$s_t\in S:\{P_t^{W},P_t^{load},SOC_t,R_t,t\}\qquad(6)$$

In the system state space $S$, $P_t^{W}$ and $P_t^{load}$ are the renewable generation power and the load power at time $t$, $SOC_t$ is the state of charge of the energy storage system at time $t$, $R_t$ is the grid electricity purchase price at time $t$, and $t$ is the agent's current time.
(2.2) setting the action values of the agent: the agent implements the scheduling strategy mainly by controlling the actions of the energy storage system, which are charging, discharging, and no action. An action value of 1 denotes discharging at full rate, 2 denotes charging at full rate, and 0 denotes no action:

$$a_t\in A:\{0,1,2\}\qquad(7)$$

(2.3) setting the reward function of the agent during training: when the agent takes action $a_t$ in state $s_t$, the immediate reward obtained is set to the operating cost of the microgrid at time $t$, expressed as:

$$r_t=P_{net}\times R_t\times\Delta t\qquad(8)$$

(2.4) establishing the decision method: a deep neural network approximates the agent's action-value function; the agent receives the state quantities of (2.1), inputs them into the deep neural network, the network outputs the state-action value $Q(s,a)$ for the input state, and the agent then selects the action with the maximum Q value. The state-action value function represents the expected long-term return when the agent takes an action in observed state $s_t$:

$$Q(s_t,a_t)=\mathbb{E}\left[\sum_{k=0}^{\infty}\gamma^{k}r_{t+k}\,\middle|\,s_t,a_t\right]\qquad(9)$$

In the formula above, $\gamma$ is a discount factor in the range 0 to 1 that indicates the importance of long-term return. Each state-action value output by the deep neural network corresponds to an action the energy storage agent can take, and the agent selects the action with the maximum Q value.
Step (3): training the reinforcement learning agent established in step (2), specifically comprising the following sub-steps:
(3.1) initializing the neural network, i.e. the value function $Q$, with random weights $\theta$; initializing the replay memory unit $D$; setting $\theta'=\theta$ to initialize the target value function $Q'$; and initializing the error storage unit $P$.
(3.2) acquiring the day-ahead forecast data of the microgrid, containing the microgrid state information required in step (2.1); initializing the battery state SOC; and preprocessing the initial state information so that it can be input to the neural network. Noise is added to the new energy generation, load demand, and electricity price data during training so that the agent can explore more situations, enhancing reliability (a sketch of the noise injection and ε schedule follows this block).
(3.3) selecting actions according to an ε-greedy strategy in each training episode, with ε set as follows:

$$\varepsilon=0.5\times\frac{1}{i+1}\qquad(10)$$

where $i$ is the index of the agent's training episode. A number is drawn uniformly at random from $[0,1]$: if it is greater than ε, the agent selects the action $a_t$ whose estimated action value is maximal; if it is less than ε, an action $a_t$ is selected at random from the action space.
(3.4) in any state $s_t$, the agent executes the action selected in (3.3), observes the reward $r_t$ obtained after executing the action, and transitions to the next state $s_{t+1}$.
(3.5) if the next state $s_{t+1}$ exists, the tuple $(s_t,a_t,r_t,s_{t+1})$ is stored in the replay memory unit; once the number of samples stored in the replay memory unit meets the minimum sample count, mini-batches of data are selected from it to train the agent's neural network.
(3.6) calculating the absolute value errors of the value function for the different states in the replay memory unit and storing the errors computed for the different tuples in the error storage unit, whose indices correspond to those of the replay memory unit. The calculation formula is as follows:

$$\left|\,r_t+\gamma\max_{a}Q'(s_{t+1},a;\theta')-Q(s_t,a_t;\theta)\,\right|\qquad(11)$$

(3.7) following the principle of preferentially learning from tuples with larger errors, after each interaction between the agent and the environment, 32 tuples are selected from the replay memory unit to train the neural network.
(3.8) the state information $s_t$ of the tuples selected from the replay memory unit is input into the value function $Q$, and its output $Q(s_t,a_t;\theta)$ is taken as the supervised quantity; then the state information $s_{t+1}$ is also input into the value function $Q$ to obtain the index of the action with the maximum output action-state value; then the value $Q'(s_{t+1},a_{t+1};\theta')$ corresponding to that action index for input $s_{t+1}$ is obtained from the target network. The update target of the neural network is then:

$$y_t=r_t+\gamma\,Q'\!\left(s_{t+1},\ \arg\max_{a}Q(s_{t+1},a;\theta);\ \theta'\right)\qquad(12)$$

(3.9) the parameters of the agent's Q network are updated according to the following formula:

$$\theta_{i+1}=\theta_i+\alpha\left[y_t-Q(s_t,a_t;\theta_i)\right]\nabla_{\theta_i}Q(s_t,a_t;\theta_i)\qquad(13)$$

where $\theta_i$ and $\theta_{i+1}$ are the values of $\theta$ at the $i$-th and $(i+1)$-th updates, and $\alpha$ is the learning rate of the neural network.
For the target value function $Q'$, every two training episodes $\theta'$ is set equal to $\theta$ to update the deep neural network parameters of the target value function $Q'$.
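A minimal sketch of the noise injection of step (3.2) and the ε-greedy selection of step (3.3), reusing `greedy_action` from the earlier network sketch; `rel_std` is an assumed noise level, since the patent states only that the prediction errors follow a Gaussian distribution.

```python
import numpy as np

def epsilon(i):
    """Exploration rate schedule of Eq. (10), decaying with training episode i."""
    return 0.5 * (1.0 / (i + 1))

def select_action(q_net, state, i, n_actions=3):
    """Epsilon-greedy choice over the action space {0, 1, 2}, step (3.3)."""
    if np.random.uniform(0.0, 1.0) < epsilon(i):
        return int(np.random.randint(n_actions))   # explore: random action
    return greedy_action(q_net, state)             # exploit: max-Q action

def add_forecast_noise(series, rel_std=0.05, rng=None):
    """Perturb day-ahead generation/load/price series with Gaussian noise, step (3.2)."""
    rng = rng or np.random.default_rng()
    series = np.asarray(series, dtype=float)
    return series * (1.0 + rng.normal(0.0, rel_std, size=series.shape))
```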
Step (4): training the agent, i.e. the energy storage system in the microgrid, on the day-ahead forecast data by the method of step (3); once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved, the trained agent is applied to the real-time management system of the microgrid, and at each hourly energy scheduling interval the energy storage system performs charge and discharge control according to the real-time generation and load demand, realizing economic scheduling of the microgrid.
Preferably, step (3.7) specifically comprises the following sub-steps:
Step 1: initialize the index list Index-list.
Step 2: calculate the sum Sum of the absolute errors in the error storage unit.
Step 3: according to the number N of tuples required for updating the Q-value network (i.e. the neural network), randomly generate N numbers from (0, Sum) and arrange them from small to large into a list Rand-list.
Step 4: add the numbers in the error storage unit one by one in order; whenever the running sum becomes larger than the smallest number in Rand-list, store the index of the last error added into the index list Index-list and remove that smallest number from Rand-list.
Step 5: repeat Step 4 until N index values are obtained, then take the corresponding tuples from the replay memory unit according to the index list Index-list.
Fig. 5 shows how the total electricity purchase cost of the microgrid per day of operation changes with the number of training episodes during training of the energy storage agent. As can be seen from Fig. 5, after a certain number of training episodes the daily operating cost of the microgrid essentially reaches a stable value, indicating that the microgrid energy storage agent has learned a good strategy. The energy storage agent obtained by reinforcement learning training can charge and discharge reasonably according to the state of the microgrid. Fig. 6 shows that, in the face of renewable energy and load fluctuations, the energy storage system can make full use of renewable energy: it stores energy when renewable generation is in surplus and supplies power to local loads when load demand is high, performing peak shaving and valley filling and reducing the electricity purchase cost in the power market during operation.
Example two
The second embodiment provides a micro-grid energy storage scheduling device based on deep reinforcement learning, which comprises:
an energy storage system agent building module, which converts the microgrid energy storage scheduling into a Markov decision problem according to the simulation model of the controlled new energy microgrid, so as to establish the energy storage system agent;
an energy storage system agent training module, which trains the energy storage system agent on the day-ahead forecast data of the microgrid; once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved and training ends;
an energy storage system dispatch module, in which the trained energy storage system agent schedules the energy storage system in the microgrid, the energy storage system performing charge and discharge control according to the microgrid's real-time generation and load demand at each hourly energy scheduling interval.
EXAMPLE III
An electronic device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the micro-grid energy storage scheduling method based on deep reinforcement learning of the first embodiment.
The electronic devices in the embodiments of the present invention may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, PDAs (personal digital assistants), PADs (tablet computers), and the like, and fixed terminals such as desktop computers and the like.
The electronic device may include a processing means (e.g., a central processing unit) that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage means into a Random Access Memory (RAM). In the RAM, various programs and data necessary for the operation of the electronic apparatus are also stored. The processing device, the ROM, and the RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
Generally, the following systems may be connected to the I/O interface: input devices including, for example, touch screens, touch pads, keyboards, mice, etc.; output devices including, for example, Liquid Crystal Displays (LCDs), speakers, vibrators, and the like; storage devices including, for example, magnetic tape, hard disk, etc.; and a communication device. The communication means may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data.
A computer program, carried on a computer readable medium, comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means, or installed from a storage means, or installed from a ROM. The computer program, when executed by a processing device, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, system, or apparatus, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution apparatus, system, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, system, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that the invention is not limited thereto, and may be embodied in other forms without departing from the spirit or essential characteristics thereof. Any modification which does not depart from the functional and structural principles of the present invention is intended to be included within the scope of the claims.

Claims (7)

1. A micro-grid energy storage scheduling method based on deep reinforcement learning, characterized by comprising the following steps:
first, establishing a simulation model corresponding to the controlled new energy microgrid, and converting the microgrid energy storage scheduling into a Markov decision problem according to the microgrid simulation model, so as to establish an energy storage system agent and the corresponding observation state quantities, action values, and reward function;
second, training the established energy storage system agent on the microgrid's day-ahead new energy generation, load, and electricity price data, and, once the reward the agent obtains from the environment stabilizes during training, saving the network parameters and ending training;
finally, using the trained energy storage system agent for real-time scheduling of the microgrid energy storage system, the energy storage system performing charge and discharge control according to the microgrid's real-time generation and load demand at each hourly energy scheduling interval.
2. The micro-grid energy storage scheduling method based on deep reinforcement learning of claim 1, wherein establishing the microgrid simulation model comprises the following sub-steps:
(1.1) establishing an energy storage system model: the energy storage system is represented by a dynamic model in which $P_b(t)$ is the charging or discharging power of the energy storage system at each time $t$ and $SOC(t)$ is its state of charge; the SOC dynamics of the energy storage system are:

$$SOC(t+1)=\begin{cases}SOC(t)+\dfrac{\eta\,|P_b(t)|\,\Delta t}{E_c}, & \text{charging}\\[4pt]SOC(t)-\dfrac{|P_b(t)|\,\Delta t}{\xi\,E_c}, & \text{discharging}\end{cases}\qquad(1)$$

where $\xi$ and $\eta$ are the discharging and charging efficiencies of the energy storage system respectively; at any moment the energy storage system can only be charging, discharging, or idle; $E_c$ is the capacity of the energy storage system; $\Delta t$ is the time interval over which the stored energy is charged or discharged;
(1.2) setting limiting conditions: for the established energy storage model, the charge/discharge power $P_b(t)$ and the state of charge $SOC(t)$ are limited:

$$P_{b\min}\le P_b(t)\le P_{b\max}\qquad(2)$$
$$SOC_{\min}\le SOC(t)\le SOC_{\max}\qquad(3)$$

where $P_{b\min}$ and $P_{b\max}$ are the minimum and maximum charge/discharge power of the energy storage system, and $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum of the state of charge $SOC(t)$, with $SOC_{\max}\le 1$ and $SOC_{\min}\ge 0$;
(1.3) setting the microgrid power balance limit: the power balance relationship is as follows:

$$P_{balance}(t)=P_{renew}(t)-P_{load}(t)\qquad(4)$$
$$P_{net}(t)=P_{balance}(t)+P_b(t)\qquad(5)$$

where $P_{renew}$ is the total output power of the distributed new energy in the microgrid at time $t$, $P_{load}$ is the power demand of the load in the microgrid at time $t$, and $P_{balance}$ is the difference between renewable generation and load demand: if $P_{balance}>0$, renewable generation is in surplus, otherwise it is insufficient; a positive $P_b(t)$ means the energy storage system emits power and a negative value means it absorbs power; $P_{net}$ is the power traded between the microgrid and the main grid: if positive, the microgrid exports power to the main grid; if negative, the microgrid purchases power from the main grid.
3. The micro-grid energy storage scheduling method based on deep reinforcement learning of claim 2, wherein establishing the energy storage system agent comprises the following sub-steps:
(2.1) setting the observation state quantities required by the energy storage system agent: these comprise the renewable energy generation, the load demand, the battery state of charge, the electricity purchase price from the grid, and the time of day; generation and load are in kW, the battery state of charge ranges from 0 to 1, the purchase price is in yuan, and the time is the integer hour of the day; the state space is as follows:

$$s_t\in S:\{P_t^{W},P_t^{load},SOC_t,R_t,t\}\qquad(6)$$

In the system state space $S$, $P_t^{W}$ and $P_t^{load}$ are the renewable generation power and the load power at time $t$, $SOC_t$ is the state of charge of the energy storage system at time $t$, $R_t$ is the grid electricity purchase price at time $t$, and $t$ is the agent's current time;
(2.2) setting the action values of the energy storage system agent: the actions of the energy storage system comprise charging, discharging, and no action; an action value of 1 denotes discharging at full rate, 2 denotes charging at full rate, and 0 denotes no action; the action space is expressed as:

$$a_t\in A:\{0,1,2\}\qquad(7)$$

(2.3) setting the reward function of the energy storage system agent during training: when the agent takes action $a_t$ in state $s_t$, the immediate reward obtained is set to the operating cost of the microgrid at time $t$, expressed as:

$$r_t=P_{net}\times R_t\times\Delta t\qquad(8)$$

(2.4) establishing the decision method: a deep neural network approximates the agent's action-value function; the agent receives the state quantities of (2.1), inputs them into the deep neural network, and the network outputs the state-action value $Q(s,a)$ for the observed state; the state-action value function represents the expected long-term return when the agent takes an action in observed state $s_t$:

$$Q(s_t,a_t)=\mathbb{E}\left[\sum_{k=0}^{\infty}\gamma^{k}r_{t+k}\,\middle|\,s_t,a_t\right]\qquad(9)$$

In the formula above, $\gamma$ is a discount factor in the range 0 to 1 representing the importance of long-term return; each state-action value output by the deep neural network corresponds to an action the energy storage agent can take, and the agent selects the action with the maximum Q value.
4. The micro-grid energy storage scheduling method based on deep reinforcement learning of claim 3, wherein: the training of the energy storage system agent comprises the following substeps:
(3.1) initializing a neural network by using a random weight theta, namely, a value function Q; initializing a playback memory unit D; simultaneously enabling theta 'to be equal to theta to initialize the target value function Q'; initializing an error storage unit P;
(3.2) acquiring day-ahead prediction data of the microgrid, including the state information of the microgrid required in the step (2.1), initializing a battery state SOC, preprocessing the initial state information and converting the initial state information into tensor, and adding noise to the new energy generating capacity, the load demand and the electricity price data during training;
(3.3) in each training period, selecting actions according to an epsilon-greedy strategy, and setting epsilon as follows:
ε=0.5×(1/(i+1)) (10)
where i is the number of cycles of agent training, and is simultaneously [0,1 ]]Randomly generating a number within the range with equal probability, and if the number is greater than epsilon, selecting the action a corresponding to the action cost function for obtaining the maximum estimation value by the intelligent agent at the momentt(ii) a If the number is less than epsilon, then only one action a can be selected randomly from the action spacet
(3.4) agent in any state stExecuting the action according to the action selected in (3.3), and observing the reward r obtained after executing the actiontWhile shifting to the next state st+1
(3.5) if so, the next state st+1Presence, will tuple(s)t,at,rt,st+1) Storing the data into a playback memory unit, and selecting small-batch data from the playback memory unit to train the intelligent neural network after the sample data stored in the memory playback unit meets the requirement of the minimum sample number;
(3.6) calculating the absolute value errors of the cost functions in different states in the playback memory unit, storing the absolute value errors after calculation of different tuples into an error storage unit, wherein the index of the error storage unit corresponds to the playback memory unit, and the calculation formula is as follows:
|R(t+1)+γmaxa[Q(s(t+1),a]-Q(s(t),a(t)| (11)
(3.7) according to the principle of preferentially learning tuples with larger errors, selecting the tuples from the playback memory unit to train the neural network after each time the agent interacts with the environment;
(3.8) inputting the state information st of the tuple selected from the playback memory unit to the value function Q, selecting the maximum action state value Q(s) output by the value function Qt,at(ii) a θ), and Q(s) at that time are comparedt,at) As supervisory information; then the status information st+1The motion index is also input into the value function Q, and the motion index corresponding to the maximum motion state value output at the moment is obtained; then obtaining the input s from the target networkt+1Q'(s) corresponding to the operation of index in the state of (1)t+1,at+1(ii) a θ') value, then the update target for the neural network at this time is:
y_t = r_{t+1} + γ·Q'(s_{t+1}, argmax_a Q(s_{t+1}, a; θ); θ')      (12)
(3.9) the parameters of the Q value network of the agent are updated according to the following formula:
θ_{i+1} = θ_i + α·[y_t − Q(s_t, a_t; θ_i)]·∇_{θ_i} Q(s_t, a_t; θ_i)      (13)
in the above formula, θ_i and θ_{i+1} respectively denote the value of θ at the i-th and (i+1)-th updates, and α denotes the learning rate of the neural network;
for the target value function Q', every two training periods, letting θ' = θ to update the deep neural network parameters of the target value function Q'.
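The following Python sketch is an editorial illustration of substeps (3.1)-(3.9) as one Double-DQN training loop. PyTorch is assumed; the environment interface env (reset, step, num_actions), the network q_net, and all hyperparameter values are hypothetical placeholders rather than anything specified by the patent, and the error-weighted selection of step (3.7) is deferred to the select_indices sketch given under claim 5 below:

    import copy
    import random
    import torch
    import torch.nn.functional as F

    def train(q_net, env, episodes, gamma=0.95, lr=1e-3,
              batch_size=32, min_samples=500):
        # (3.1) value function Q with random weights theta; target Q' with theta' = theta;
        # replay memory D and error storage P (kept index-aligned)
        target_net = copy.deepcopy(q_net)
        optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
        memory, errors = [], []

        for i in range(episodes):
            s_t = env.reset()                         # (3.2) day-ahead data, initial SOC, noise
            epsilon = 0.5 * (1.0 / (i + 1))           # (3.3) formula (10)
            done = False
            while not done:
                if random.random() > epsilon:         # exploit: max-Q action
                    with torch.no_grad():
                        a_t = int(q_net(s_t.unsqueeze(0)).argmax(dim=1).item())
                else:                                 # explore: uniform random action
                    a_t = random.randrange(env.num_actions)
                s_next, r_t, done = env.step(a_t)     # (3.4) execute action, observe reward
                if not done:                          # (3.5) store tuple in D
                    memory.append((s_t, a_t, r_t, s_next))
                    with torch.no_grad():             # (3.6) absolute error, formula (11)
                        max_next = q_net(s_next.unsqueeze(0)).max(dim=1).values.item()
                        q_cur = q_net(s_t.unsqueeze(0))[0, a_t].item()
                    errors.append(abs(r_t + gamma * max_next - q_cur))
                if len(memory) >= min_samples:
                    # (3.7) error-weighted tuple selection, sketched under claim 5 below
                    batch = [memory[j] for j in select_indices(errors, batch_size)]
                    s_b = torch.stack([b[0] for b in batch])
                    a_b = torch.tensor([b[1] for b in batch])
                    r_b = torch.tensor([b[2] for b in batch])
                    sn_b = torch.stack([b[3] for b in batch])
                    with torch.no_grad():             # (3.8) formula (12): action index from Q,
                        best_a = q_net(sn_b).argmax(dim=1, keepdim=True)  # value from Q'
                        y = r_b + gamma * target_net(sn_b).gather(1, best_a).squeeze(1)
                    q_sa = q_net(s_b).gather(1, a_b.unsqueeze(1)).squeeze(1)
                    loss = F.mse_loss(q_sa, y)        # (3.9) gradient step realizes formula (13)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                s_t = s_next
            if i % 2 == 1:                            # sync theta' = theta every two periods
                target_net.load_state_dict(q_net.state_dict())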
5. The micro-grid energy storage scheduling method based on deep reinforcement learning of claim 4, wherein: said step (3.7) comprises the sub-steps of:
(3.7.1) initializing an index list Index-list;
(3.7.2) calculating the sum Sum of the absolute errors in the error storage unit;
(3.7.3) according to the number N of tuples required for the Q-value network update, randomly generating N numbers in (0, Sum) and arranging them into a list Rand-list in ascending order;
(3.7.4) adding up the numbers in the error storage unit one by one in order; whenever the running sum exceeds the smallest number in Rand-list, storing the index of the last error added into the index list Index-list and removing that smallest number from Rand-list;
(3.7.5) repeating (3.7.4) until N index values are obtained, and fetching the corresponding tuples from the replay memory unit according to the index list Index-list, as sketched below.
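Substeps (3.7.1)-(3.7.5) amount to sampling indices with probability proportional to absolute error by walking a cumulative sum. A minimal Python sketch, under the assumption that errors is the error storage unit P represented as a list aligned index-for-index with the replay memory:

    import random

    def select_indices(errors, n):
        index_list = []                              # (3.7.1) initialize Index-list
        total = sum(errors)                          # (3.7.2) Sum of absolute errors
        rand_list = sorted(random.uniform(0, total)  # (3.7.3) N numbers in (0, Sum),
                           for _ in range(n))        #         arranged in ascending order
        running, j = 0.0, 0
        for idx, err in enumerate(errors):           # (3.7.4) accumulate errors in order
            running += err
            while j < n and running > rand_list[j]:  # running sum passed the smallest
                index_list.append(idx)               # remaining random number: record the
                j += 1                               # index and drop that number
            if j == n:                               # (3.7.5) stop once N indices are found
                break
        return index_list

Tuples with larger absolute errors occupy wider intervals of the cumulative sum and are therefore selected more often, which realizes the principle of preferentially learning from larger-error tuples in step (3.7).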
6. A micro-grid energy storage scheduling device based on deep reinforcement learning, characterized by comprising:
an energy storage system agent construction module: converting micro-grid energy storage scheduling into a Markov decision problem according to a simulation model of the controlled new-energy microgrid, so as to establish an energy storage system agent;
an energy storage system agent training module: training the energy storage system agent on day-ahead forecast data of the microgrid; during training, once the reward the agent obtains from the environment has stabilized, saving the network parameters and ending the training;
an energy storage system dispatch module: using the trained energy storage system agent to schedule the energy storage system in the microgrid, the energy storage system performing charge and discharge control according to the real-time generation and load demand of the microgrid at each hourly energy scheduling time.
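As an editorial sketch of the dispatch module's hourly loop (the microgrid interface and its method names are hypothetical, and greedy_action is the helper sketched under claim 3):

    def dispatch_day(agent, microgrid):
        # at each hourly energy scheduling time, observe the real-time
        # generation and load demand, then let the trained agent issue
        # a charge/discharge action; no exploration is used at this stage
        for hour in range(24):
            state = microgrid.observe(hour)       # SOC, generation, load, price, ...
            action = greedy_action(agent, state)
            microgrid.apply_charge_discharge(action)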
7. An electronic device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the micro-grid energy storage scheduling method based on deep reinforcement learning according to any one of claims 1 to 5.
CN202011229149.9A 2020-11-06 2020-11-06 Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning Pending CN112529727A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011229149.9A CN112529727A (en) 2020-11-06 2020-11-06 Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN112529727A true CN112529727A (en) 2021-03-19

Family

ID=74979783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011229149.9A Pending CN112529727A (en) 2020-11-06 2020-11-06 Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112529727A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200133220A1 (en) * 2017-06-30 2020-04-30 Merit Si, Llc Method and system for managing microgrid assets
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN111242388A (en) * 2020-01-22 2020-06-05 上海电机学院 Micro-grid optimization scheduling method considering combined supply of cold, heat and power
CN111352419A (en) * 2020-02-25 2020-06-30 山东大学 Path planning method and system for updating experience playback cache based on time sequence difference
CN111369181A (en) * 2020-06-01 2020-07-03 北京全路通信信号研究设计院集团有限公司 Train autonomous scheduling deep reinforcement learning method and module

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Yadong et al., "Research on Microgrid Energy Storage Scheduling Strategy Based on Deep Reinforcement Learning", 《可再生能源》 (Renewable Energy Resources), vol. 37, no. 8, pages 1220-1228 *

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113270937A (en) * 2021-03-30 2021-08-17 鹏城实验室 Standby battery scheduling method, computer readable storage medium and system
CN113270937B (en) * 2021-03-30 2024-06-21 鹏城实验室 Standby battery scheduling method, computer readable storage medium and system
CN113139682B (en) * 2021-04-15 2023-10-10 北京工业大学 Micro-grid energy management method based on deep reinforcement learning
CN113139682A (en) * 2021-04-15 2021-07-20 北京工业大学 Micro-grid energy management method based on deep reinforcement learning
CN113110493A (en) * 2021-05-07 2021-07-13 北京邮电大学 Path planning equipment and path planning method based on photonic neural network
CN113110493B (en) * 2021-05-07 2022-09-30 北京邮电大学 Path planning equipment and path planning method based on photonic neural network
CN113378456A (en) * 2021-05-21 2021-09-10 青海大学 Multi-park comprehensive energy scheduling method and system
CN113807564B (en) * 2021-07-28 2023-08-04 合肥工业大学 Park microgrid load optimal scheduling method and system based on two-stage reinforcement learning
CN113807564A (en) * 2021-07-28 2021-12-17 合肥工业大学 Garden micro-grid load optimization scheduling method and system based on two-stage reinforcement learning
CN113437753A (en) * 2021-08-25 2021-09-24 广州乐盈信息科技股份有限公司 Energy storage system
CN113779871A (en) * 2021-08-26 2021-12-10 清华大学 Electric heating coupling system scheduling method and device, electronic equipment and storage medium thereof
CN113541272A (en) * 2021-08-26 2021-10-22 山东浪潮科学研究院有限公司 Energy storage battery balanced charging and discharging method and device based on deep learning model and medium
CN113541272B (en) * 2021-08-26 2023-06-02 山东浪潮科学研究院有限公司 Balanced charge and discharge method and equipment for energy storage battery and medium
CN113779871B (en) * 2021-08-26 2024-08-06 清华大学 Electrothermal coupling system scheduling method and device, electronic equipment and storage medium thereof
CN113890112A (en) * 2021-09-29 2022-01-04 合肥工业大学 Power grid prospective scheduling method based on multi-scene parallel learning
CN113890112B (en) * 2021-09-29 2023-09-15 合肥工业大学 Power grid look-ahead scheduling method based on multi-scene parallel learning
CN113962446B (en) * 2021-10-08 2024-06-07 国网安徽省电力有限公司电力科学研究院 Micro-grid group cooperative scheduling method and device, electronic equipment and storage medium
CN113962446A (en) * 2021-10-08 2022-01-21 国网安徽省电力有限公司电力科学研究院 Micro-grid group cooperative scheduling method and device, electronic equipment and storage medium
CN114123256A (en) * 2021-11-02 2022-03-01 华中科技大学 Distributed energy storage configuration method and system adaptive to random optimization decision
CN114123256B (en) * 2021-11-02 2023-10-03 华中科技大学 Distributed energy storage configuration method and system adapting to random optimization decision
WO2023116742A1 (en) * 2021-12-21 2023-06-29 清华大学 Energy-saving optimization method and apparatus for terminal air conditioning system of integrated data center cabinet
CN114219045A (en) * 2021-12-30 2022-03-22 国网北京市电力公司 Dynamic early warning method, system and device for risk of power distribution network and storage medium
CN114362218A (en) * 2021-12-30 2022-04-15 中国电子科技南湖研究院 Deep Q learning-based multi-type energy storage scheduling method and device in microgrid
CN114362218B (en) * 2021-12-30 2024-03-19 中国电子科技南湖研究院 Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning
CN114498750A (en) * 2022-02-14 2022-05-13 华北电力大学 Distributed multi-agent microgrid energy management method based on Q-Learning algorithm
CN114707871B (en) * 2022-04-12 2024-07-09 东南大学 Micro-grid energy management method, device, equipment and storage medium
CN114707871A (en) * 2022-04-12 2022-07-05 东南大学 Microgrid energy management method, device, equipment and storage medium
CN114792974A (en) * 2022-04-26 2022-07-26 南京邮电大学 Method and system for energy optimization management of interconnected micro-grid
CN114742453A (en) * 2022-05-06 2022-07-12 江苏大学 Micro-grid energy management method based on Rainbow deep Q network
CN114971250A (en) * 2022-05-17 2022-08-30 重庆大学 Comprehensive energy economic dispatching system based on deep Q learning
CN114971250B (en) * 2022-05-17 2024-05-07 重庆大学 Comprehensive energy economy dispatching system based on deep Q learning
WO2024022194A1 (en) * 2022-07-26 2024-02-01 中国电力科学研究院有限公司 Power grid real-time scheduling optimization method and system, computer device and storage medium
CN115731072A (en) * 2022-11-22 2023-03-03 东南大学 Microgrid space-time perception energy management method based on safe deep reinforcement learning
CN115731072B (en) * 2022-11-22 2024-01-30 东南大学 Micro-grid space-time perception energy management method based on safety deep reinforcement learning
CN115577647A (en) * 2022-12-09 2023-01-06 南方电网数字电网研究院有限公司 Power grid fault type identification method and intelligent agent construction method
CN116316755B (en) * 2023-03-07 2023-11-14 西南交通大学 Energy management method for electrified railway energy storage system based on reinforcement learning
CN116316755A (en) * 2023-03-07 2023-06-23 西南交通大学 Energy management method for electrified railway energy storage system based on reinforcement learning
CN116488154A (en) * 2023-04-17 2023-07-25 海南大学 Energy scheduling method, system, computer equipment and medium based on micro-grid
CN116780627A (en) * 2023-06-27 2023-09-19 中国电建集团华东勘测设计研究院有限公司 Micro-grid regulation and control method in building park
CN116780627B (en) * 2023-06-27 2024-07-09 中国电建集团华东勘测设计研究院有限公司 Micro-grid regulation and control method in building park

Similar Documents

Publication Publication Date Title
CN112529727A (en) Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning
CN108964050A (en) Micro-capacitance sensor dual-layer optimization dispatching method based on Demand Side Response
Fang et al. Deep reinforcement learning for scenario-based robust economic dispatch strategy in internet of energy
CN112491094B (en) Hybrid-driven micro-grid energy management method, system and device
Zhou et al. Four‐level robust model for a virtual power plant in energy and reserve markets
Liu et al. Deep reinforcement learning based energy storage management strategy considering prediction intervals of wind power
CN111864742B (en) Active power distribution system extension planning method and device and terminal equipment
CN114036825A (en) Collaborative optimization scheduling method, device, equipment and storage medium for multiple virtual power plants
Han et al. Optimization of transactive energy systems with demand response: A cyber‐physical‐social system perspective
CN117522087B (en) Virtual power plant resource allocation method, device, equipment and medium
CN112510690B (en) Optimal scheduling method and system considering wind-fire-storage combination and demand response reward and punishment
Qiu et al. Local integrated energy system operational optimization considering multi‐type uncertainties: A reinforcement learning approach based on improved TD3 algorithm
Cai et al. An improved sequential importance sampling method for reliability assessment of renewable power systems with energy storage
CN112819307A (en) Demand response method and system based on load supervision in smart power grid
An et al. Optimal scheduling for charging and discharging of electric vehicles based on deep reinforcement learning
Zhang et al. Negotiation strategy for discharging price of EVs based on fuzzy Bayesian learning
CN116826796A (en) Demand control method, device and charge storage system
CN116862172A (en) Full-period variable time period ordered power utilization scheduling method and storage medium
Sun et al. Digital twin‐based online resilience scheduling for microgrids: An approach combining imitative learning and deep reinforcement learning
CN113052630B (en) Method for configuring electric power equipment by using model and electric power equipment configuration method
CN115360768A (en) Power scheduling method and device based on muzero and deep reinforcement learning and storage medium
CN114435165A (en) Charging method and device of charging pile, electronic equipment and storage medium
CN118082598B (en) Electric vehicle charging method, apparatus, device, medium, and program product
Evangeline et al. Minimizing voltage fluctuation in stand-alone microgrid system using a Kriging-based multi-objective stochastic optimization algorithm
CN116956018A (en) Training method, training device, training equipment, training medium and training program product for strategy generator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination