CN112529727A - Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning - Google Patents

Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning

Info

Publication number
CN112529727A
Authority
CN
China
Prior art keywords
energy storage
storage system
energy
grid
action
Prior art date
Legal status
Pending
Application number
CN202011229149.9A
Other languages
Chinese (zh)
Inventor
高强
王昕
潘弘
林烨
叶丽娜
杨强
杨迷霞
Current Assignee
Taizhou Hongyuan Electric Power Design Institute Co ltd
Taizhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Taizhou Hongyuan Electric Power Design Institute Co ltd
Taizhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Taizhou Hongyuan Electric Power Design Institute Co ltd, Taizhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd filed Critical Taizhou Hongyuan Electric Power Design Institute Co ltd
Priority to CN202011229149.9A
Publication of CN112529727A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008: Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28: Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32: Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J7/00: Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
    • H02J7/0047: Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries with monitoring or indicating devices or circuits
    • H02J7/0048: Detection of remaining charge capacity or state of charge [SOC]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00: Details relating to the application field
    • G06F2113/04: Power grid distribution networks
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00: Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E70/00: Other energy conversion or management systems reducing GHG emissions
    • Y02E70/30: Systems combining energy storage with energy generation of non-fossil origin

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a micro-grid energy storage scheduling method, device, and equipment based on deep reinforcement learning. First, a simulation model corresponding to the controlled new energy microgrid is established, and the microgrid energy storage scheduling is converted into a Markov decision problem according to the microgrid simulation model. Second, the established energy storage system agent is trained on the microgrid's day-ahead new energy generation, load, and electricity price data; once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved and training ends. Finally, the trained agent is used for real-time scheduling of the microgrid energy storage system: at each hourly energy scheduling interval, the energy storage system performs charge and discharge control according to the microgrid's real-time generation and load demand. The invention makes full use of the renewable energy in the microgrid, reduces the impact of renewable energy on the main power grid, and minimizes the operating cost of the microgrid.

Description

Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of power dispatching engineering, and particularly relates to real-time dispatching of an energy storage system of a smart power grid.
Background
With the gradual depletion of traditional fossil energy, increasingly prominent environmental pollution, and the ever-growing energy demand of human society, energy and environmental problems are receiving more and more attention from countries all over the world, and developing and utilizing renewable energy is an effective means of addressing environmental pollution and the energy crisis. However, because renewable energy is intermittent and fluctuating, connecting it to the grid directly at large scale affects grid stability. To make full use of renewable energy, the microgrid has received wide attention. A microgrid is a small power system composed of various distributed power sources, loads, energy storage, and associated protection and control devices; connecting renewable energy to the distribution network in microgrid form allows distributed electric energy to be exploited effectively.
Because of the randomness, intermittency, and fluctuation of renewable energy and diversified loads, the operation and control of the microgrid place ever higher demands on flexibility and real-time performance. Since the microgrid faces uncertainty on both the energy side and the load side, it is difficult to model accurately, and its optimization decision scenarios are hard to express as explicit mathematical formulas. Designing a stable microgrid energy scheduling strategy that accounts for the random fluctuation of new energy and load is therefore important for the development of microgrids.
The energy storage system is an important component of the microgrid. Through a reasonable charging and discharging strategy it can improve the microgrid's capacity to absorb distributed new energy, and in daily operation it can cooperate with the microgrid's energy management strategy to reduce operating cost. The key to the microgrid operation optimization problem is to study microgrid scheduling strategies with the control of the energy storage system at their core.
In recent years, with the rise of artificial intelligence, studies applying reinforcement learning to power systems have been increasing. Reinforcement learning is a model-free method for solving sequential decision problems: an agent interacts with an uncertain environment, obtains feedback, and learns a strategy that maximizes the reward obtained from the environment. Energy scheduling of the microgrid energy storage system can be abstracted as a Markov decision process, and applying reinforcement learning to microgrid energy management can effectively overcome the difficulty of accurately modeling microgrid decision optimization and the heavy real-time computation caused by an oversized state space.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a micro-grid energy storage scheduling method based on deep reinforcement learning that adapts well to real-time energy scheduling of a microgrid.
To solve this technical problem, the invention adopts the following technical scheme. The micro-grid energy storage scheduling method based on deep reinforcement learning comprises the following steps:
first, a simulation model corresponding to the controlled new energy microgrid is established, and the microgrid energy storage scheduling is converted into a Markov decision problem according to the microgrid simulation model, so as to establish an energy storage system agent and the corresponding observation state quantities, action values, and reward function;
second, the established energy storage system agent is trained on the microgrid's day-ahead new energy generation, load, and electricity price data; once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved and training ends;
finally, the trained energy storage system agent is used for real-time scheduling of the microgrid energy storage system: at each hourly energy scheduling interval, the energy storage system performs charge and discharge control according to the microgrid's real-time generation and load demand.
Preferably, establishing the microgrid simulation model comprises the following sub-steps:
(1.1) establishing an energy storage system model: the energy storage system is represented by a dynamic model in which $P_b(t)$ is the charging or discharging power of the energy storage system at each time $t$ and $SOC(t)$ is its state of charge; the SOC dynamics of the energy storage system are:

$$SOC(t+1)=\begin{cases}SOC(t)+\dfrac{\eta\,|P_b(t)|\,\Delta t}{E_c}, & \text{charging}\\[4pt]SOC(t)-\dfrac{|P_b(t)|\,\Delta t}{\xi\,E_c}, & \text{discharging}\end{cases}\qquad(1)$$

where $\xi$ and $\eta$ are the discharging and charging efficiencies of the energy storage system respectively; at any moment the energy storage system can only be charging, discharging, or idle; $E_c$ is the capacity of the energy storage system; $\Delta t$ is the time interval over which the stored energy is charged or discharged;
(1.2) setting limiting conditions: for the established energy storage model, the charge/discharge power $P_b(t)$ and the state of charge $SOC(t)$ are limited:

$$P_{b\min}\le P_b(t)\le P_{b\max}\qquad(2)$$
$$SOC_{\min}\le SOC(t)\le SOC_{\max}\qquad(3)$$

where $P_{b\min}$ and $P_{b\max}$ are the minimum and maximum charge/discharge power of the energy storage system, and $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum of the state of charge $SOC(t)$, with $SOC_{\max}\le 1$ and $SOC_{\min}\ge 0$;
(1.3) setting the microgrid power balance limit: the power balance relationship is as follows:

$$P_{balance}(t)=P_{renew}(t)-P_{load}(t)\qquad(4)$$
$$P_{net}(t)=P_{balance}(t)+P_b(t)\qquad(5)$$

where $P_{renew}$ is the total output power of the distributed new energy in the microgrid at time $t$, $P_{load}$ is the power demand of the load in the microgrid at time $t$, and $P_{balance}$ is the difference between renewable generation and load demand: if $P_{balance}>0$, renewable generation is in surplus, otherwise it is insufficient. A positive $P_b(t)$ means the energy storage system emits power and a negative value means it absorbs power. $P_{net}$ is the power traded between the microgrid and the main grid: if positive, the microgrid exports power to the main grid; if negative, the microgrid purchases power from the main grid. A code sketch of this simulation model is given below.
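For illustration only, here is a minimal Python sketch of the simulation model above, under this section's sign convention (positive $P_b$ discharges). All names (`MicrogridEnv`, `e_c`) and the default parameter values are assumptions for the sketch, not values from the patent.

```python
import numpy as np

class MicrogridEnv:
    """Minimal microgrid energy-storage model following Eqs. (1)-(5)."""

    def __init__(self, e_c=100.0, p_bmax=20.0, soc_min=0.0, soc_max=1.0,
                 eta=0.95, xi=0.95, dt=1.0):
        self.e_c = e_c                                 # storage capacity E_c (kWh)
        self.p_bmax = p_bmax                           # power limit (kW), Eq. (2)
        self.soc_min, self.soc_max = soc_min, soc_max  # SOC limits, Eq. (3)
        self.eta, self.xi = eta, xi                    # charging / discharging efficiency
        self.dt = dt                                   # scheduling interval Δt (h)
        self.soc = 0.5                                 # initial state of charge

    def step_storage(self, p_b):
        """Advance SOC by one interval; p_b > 0 discharges, p_b < 0 charges (Eq. 1)."""
        p_b = float(np.clip(p_b, -self.p_bmax, self.p_bmax))   # enforce Eq. (2)
        if p_b >= 0:   # discharging: extra energy is drawn to cover losses
            soc_next = self.soc - p_b * self.dt / (self.xi * self.e_c)
        else:          # charging: only a fraction eta of absorbed energy is stored
            soc_next = self.soc + self.eta * (-p_b) * self.dt / self.e_c
        self.soc = float(np.clip(soc_next, self.soc_min, self.soc_max))  # Eq. (3)
        return self.soc

    def grid_exchange(self, p_renew, p_load, p_b):
        """Power traded with the main grid (Eqs. 4-5); positive = export."""
        p_balance = p_renew - p_load   # Eq. (4)
        return p_balance + p_b         # Eq. (5)
```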
Preferably, establishing the energy storage system agent comprises the following sub-steps:
(2.1) setting the observation state quantities required by the energy storage system agent: these comprise the renewable energy generation, the load demand, the battery state of charge, the electricity purchase price from the grid, and the time of day; generation and load are in kW, the battery state of charge ranges from 0 to 1, the purchase price is in yuan, and the time is the integer hour of the day. The state space is as follows:

$$s_t\in S:\{P_t^{W},P_t^{load},SOC_t,R_t,t\}\qquad(6)$$

In the system state space $S$, $P_t^{W}$ and $P_t^{load}$ are the renewable generation power and the load power at time $t$, $SOC_t$ is the state of charge of the energy storage system at time $t$, $R_t$ is the grid electricity purchase price at time $t$, and $t$ is the agent's current time;
(2.2) setting the action values of the energy storage system agent: the actions of the energy storage system comprise charging, discharging, and no action; an action value of 1 denotes discharging at full rate, 2 denotes charging at full rate, and 0 denotes no action. The action space is expressed as:

$$a_t\in A:\{0,1,2\}\qquad(7)$$

(2.3) setting the reward function of the energy storage system agent during training: when the agent takes action $a_t$ in state $s_t$, the immediate reward obtained is set to the operating cost of the microgrid at time $t$, expressed as:

$$r_t=P_{net}\times R_t\times\Delta t\qquad(8)$$

(2.4) establishing the decision method: a deep neural network approximates the agent's action-value function; the agent receives the state quantities of (2.1), inputs them into the deep neural network, and the network outputs the state-action value $Q(s,a)$ for the observed state. The state-action value function represents the expected long-term return when the agent takes an action in observed state $s_t$:

$$Q(s_t,a_t)=\mathbb{E}\left[\sum_{k=0}^{\infty}\gamma^{k}r_{t+k}\,\middle|\,s_t,a_t\right]\qquad(9)$$

In the formula above, $\gamma$ is a discount factor in the range 0 to 1 representing the importance of long-term return; each state-action value output by the deep neural network corresponds to an action the energy storage agent can take, and the agent selects the action with the maximum Q value. A sketch of such a Q network is given below.
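A minimal sketch of the decision network in PyTorch, assuming a small fully connected architecture; the layer sizes and names are illustrative choices, not specified by the patent.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps the 5-dim state (P_W, P_load, SOC, R, t) to Q values for actions {0, 1, 2}."""

    def __init__(self, state_dim=5, n_actions=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q(s, a) per action, Eq. (9)
        )

    def forward(self, state):
        return self.net(state)

def greedy_action(q_net, state):
    """Pick the action with the maximum Q value, as in step (2.4)."""
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(torch.argmax(q_values).item())
```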
Preferably, training the energy storage system agent comprises the following sub-steps:
(3.1) initializing the neural network, i.e. the value function $Q$, with random weights $\theta$; initializing the replay memory unit $D$; setting $\theta'=\theta$ to initialize the target value function $Q'$; and initializing the error storage unit $P$;
(3.2) acquiring the day-ahead forecast data of the microgrid, containing the microgrid state information required in step (2.1); initializing the battery state SOC; preprocessing the initial state information and converting it into tensors; and adding noise to the new energy generation, load demand, and electricity price data during training;
(3.3) in each training episode, selecting actions according to an ε-greedy strategy, with ε set as:

$$\varepsilon=0.5\times\frac{1}{i+1}\qquad(10)$$

where $i$ is the index of the agent's training episode. A number is drawn uniformly at random from $[0,1]$: if it is greater than ε, the agent selects the action $a_t$ whose estimated action value is maximal; if it is less than ε, an action $a_t$ is selected at random from the action space;
(3.4) in any state $s_t$, the agent executes the action selected in (3.3), observes the reward $r_t$ obtained after executing the action, and transitions to the next state $s_{t+1}$;
(3.5) if the next state $s_{t+1}$ exists, storing the tuple $(s_t,a_t,r_t,s_{t+1})$ in the replay memory unit; once the number of samples stored in the replay memory unit meets the minimum sample count, selecting mini-batches of data from it to train the agent's neural network;
(3.6) calculating the absolute value errors of the value function for the different states in the replay memory unit and storing the errors computed for the different tuples in the error storage unit, whose indices correspond to those of the replay memory unit; the calculation formula is:

$$\left|\,r_t+\gamma\max_{a}Q'(s_{t+1},a;\theta')-Q(s_t,a_t;\theta)\,\right|\qquad(11)$$

(3.7) following the principle of preferentially learning from tuples with larger errors, selecting tuples from the replay memory unit to train the neural network after each interaction between the agent and the environment;
(3.8) inputting the state information $s_t$ of the tuples selected from the replay memory unit into the value function $Q$ and taking its output $Q(s_t,a_t;\theta)$ as the supervised quantity; then also inputting the state information $s_{t+1}$ into the value function $Q$ and obtaining the index of the action with the maximum output action-state value; then obtaining from the target network the value $Q'(s_{t+1},a_{t+1};\theta')$ corresponding to that action index for input $s_{t+1}$; the update target of the neural network is then:

$$y_t=r_t+\gamma\,Q'\!\left(s_{t+1},\ \arg\max_{a}Q(s_{t+1},a;\theta);\ \theta'\right)\qquad(12)$$

(3.9) the parameters of the agent's Q network are updated according to:

$$\theta_{i+1}=\theta_i+\alpha\left[y_t-Q(s_t,a_t;\theta_i)\right]\nabla_{\theta_i}Q(s_t,a_t;\theta_i)\qquad(13)$$

where $\theta_i$ and $\theta_{i+1}$ are the values of $\theta$ at the $i$-th and $(i+1)$-th updates, and $\alpha$ is the learning rate of the neural network.
For the target value function $Q'$, every two training episodes $\theta'$ is set equal to $\theta$ to update the deep neural network parameters of the target value function $Q'$. A sketch of this double-Q update is given below.
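A minimal sketch of the update of steps (3.8)-(3.9) in PyTorch, assuming mini-batches of stored tuples; names such as `q_net`, `target_net`, and the batch layout are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def double_q_update(q_net, target_net, optimizer, batch, gamma=0.95):
    """One training step on a mini-batch of (s, a, r, s') tuples, Eqs. (12)-(13)."""
    states, actions, rewards, next_states = batch  # shapes [B,5], [B] (long), [B], [B,5]

    # Q(s_t, a_t; θ) for the actions actually taken, step (3.8)
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # the online network chooses the next action (the argmax in Eq. 12) ...
        next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        # ... and the target network evaluates it: Q'(s_{t+1}, a*; θ')
        q_next = target_net(next_states).gather(1, next_actions).squeeze(1)
        y = rewards + gamma * q_next               # update target y_t, Eq. (12)

    loss = F.mse_loss(q_sa, y)                     # gradient step realizes Eq. (13)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Every two training episodes the target network would then be synchronized, e.g. `target_net.load_state_dict(q_net.state_dict())`.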
Preferably, said step (3.7) comprises the following sub-steps:
(3.7.1) initializing the index list Index-list;
(3.7.2) calculating the sum Sum of the absolute value errors in the error storage unit;
(3.7.3) according to the number N of tuples required for the Q-network update, randomly generating N numbers from (0, Sum) and arranging them from small to large into a list Rand-list;
(3.7.4) adding the numbers in the error storage unit one by one in order; whenever the running sum becomes larger than the smallest number in Rand-list, storing the index of the last error added into the index list Index-list and removing that smallest number from Rand-list;
(3.7.5) repeating (3.7.4) until N index values are obtained, then fetching the corresponding tuples from the replay memory unit according to the index list Index-list; a code sketch of this sampling routine is given below.
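For illustration, a minimal Python sketch of this cumulative-sum sampling; the function and variable names are assumptions.

```python
import random

def prioritized_sample_indices(errors, n):
    """Sample n replay indices with probability ~ |TD error|, steps (3.7.1)-(3.7.5)."""
    index_list = []                                   # (3.7.1)
    total = sum(errors)                               # (3.7.2) Sum of absolute errors
    rand_list = sorted(random.uniform(0.0, total) for _ in range(n))  # (3.7.3)

    running = 0.0
    for i, err in enumerate(errors):                  # (3.7.4)
        running += err
        while rand_list and running >= rand_list[0]:  # running sum passed a threshold
            index_list.append(i)                      # index of the last error added
            rand_list.pop(0)                          # remove the smallest number
        if not rand_list:                             # (3.7.5) stop after n indices
            break
    return index_list
```

Because the thresholds are uniform over (0, Sum), each tuple is drawn with probability proportional to its stored error, so transitions the network currently predicts poorly are replayed more often.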
The invention also provides a micro-grid energy storage scheduling device based on deep reinforcement learning, which comprises:
an energy storage system agent building module, which converts the microgrid energy storage scheduling into a Markov decision problem according to the simulation model of the controlled new energy microgrid, so as to establish the energy storage system agent;
an energy storage system agent training module, which trains the energy storage system agent on the day-ahead forecast data of the microgrid; once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved and training ends;
an energy storage system dispatch module, in which the trained energy storage system agent schedules the energy storage system in the microgrid, the energy storage system performing charge and discharge control according to the microgrid's real-time generation and load demand at each hourly energy scheduling interval.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the micro-grid energy storage scheduling method based on deep reinforcement learning when executing the computer program.
The invention trains the energy storage system agent on forecast data, uses a deep neural network to fit the state-action value function, and adopts a double Q network to handle the over-estimation problem of DQN.
During training, a prioritized experience replay technique is adopted so that data on which learning is currently poor are learned preferentially, improving data utilization efficiency.
The energy storage system trained through reinforcement learning can meet real-time scheduling requirements, makes full use of the renewable energy in the microgrid, reduces the impact of renewable energy on the main grid, and minimizes the operating cost of the microgrid.
The following detailed description of the present invention will be provided in conjunction with the accompanying drawings.
Drawings
The invention is further described with reference to the accompanying drawings and the detailed description below:
FIG. 1 is a schematic diagram of energy flow in a hybrid microgrid system containing renewable energy sources;
FIG. 2 is a graph of the day-ahead predicted power and the actual output power of the wind generating set;
FIG. 3 is a graph of the day-ahead predicted power and the actual demanded power of the local load;
FIG. 4 is a graph of the day-ahead predicted value and the actual value of the real-time electricity price;
FIG. 5 is a graph illustrating the variation in daily rewards earned during energy storage system training;
FIG. 6 is a diagram of actual wind power generation, load demand, and energy storage system SOC variation.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention; obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
Example one
Aiming at the random volatility and intermittency of the renewable energy and the load demand of a microgrid system, which are difficult to predict accurately, the embodiment of the invention provides a microgrid energy storage scheduling method based on deep reinforcement learning, in which the energy storage system agent obtains reward feedback through interaction with the environment to learn an energy scheduling strategy for the energy storage system. The method comprises the following steps:
Step (1): establishing the simulation model corresponding to the controlled new energy microgrid, specifically comprising the following sub-steps:
(1.1) establishing an energy storage system model: the energy storage system is represented by a dynamic model in which $P_b(t)$ is the charging or discharging power of the energy storage at each time $t$ and $SOC(t)$ is the state of charge of the energy storage battery; the SOC dynamics of the energy storage battery are:

$$SOC(t+1)=\begin{cases}SOC(t)+\dfrac{\eta\,|P_b(t)|\,\Delta t}{E_c}, & \text{charging}\\[4pt]SOC(t)-\dfrac{|P_b(t)|\,\Delta t}{\xi\,E_c}, & \text{discharging}\end{cases}\qquad(1)$$

where $\xi$ and $\eta$ are the discharging and charging efficiencies of the energy storage; at any moment the energy storage system can only be charging, discharging, or idle; $E_c$ is the capacity of the energy storage system; $\Delta t$ is the time interval over which the stored energy is charged or discharged.
(1.2) setting limiting conditions: for the established energy storage model, to ensure its normal operation, the charge/discharge power $P_b(t)$ and the state of charge SOC are limited:

$$P_{b\min}\le P_b(t)\le P_{b\max}\qquad(2)$$
$$SOC_{\min}\le SOC(t)\le SOC_{\max}\qquad(3)$$

where $P_{b\min}$ and $P_{b\max}$ are the minimum and maximum charge/discharge power of the energy storage system, with specific values set according to the capacities of the microgrid and the energy storage system; $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum of the state of charge $SOC(t)$; different upper and lower storage-level limits can be set according to the storage characteristics, and by the definition of SOC, $SOC_{\max}\le 1$ and $SOC_{\min}\ge 0$.
(1.3) setting the microgrid power balance limit: to make full use of renewable energy and smooth the intermittency of new energy and the randomness of load demand, distributed renewable energy is considered first to meet the load demand. The power balance relationship is as follows:

$$P_{balance}(t)=P_{renew}(t)-P_{load}(t)\qquad(4)$$
$$P_{net}(t)=P_{balance}(t)-P_b(t)\qquad(5)$$

where $P_{renew}$ is the total output power of the distributed new energy in the microgrid at time $t$, $P_{load}$ is the power demand of the load in the microgrid at time $t$, and $P_{balance}$ is the difference between renewable generation and load demand: if $P_{balance}>0$, renewable generation is in surplus, otherwise it is insufficient. In this embodiment, a positive $P_b(t)$ means the energy storage system absorbs power and a negative value means it emits power. $P_{net}$ is the power traded between the microgrid and the main grid: if positive, the microgrid exports power to the main grid; if negative, the microgrid purchases power from the main grid.
(1.4) taking the microgrid model as the environment in which the energy storage agent operates, and setting the state quantities, action values, and reward function required by the agent according to the model.
In this embodiment, a microgrid model containing wind power generation is established; its schematic diagram is shown in FIG. 1. In the day-ahead scheduling stage, the predicted wind turbine output, local load demand, and electricity purchase price of the microgrid from the main grid for each period of the coming day are obtained by existing technical means. The data, which contain prediction errors, are shown in FIGS. 2 to 4; the prediction errors follow a Gaussian distribution.
Step (2): establishing an energy storage system agent suitable for reinforcement learning, specifically comprising the following sub-steps:
(2.1) setting the observation state quantities required by the agent: the states the reinforcement learning agent observes in the environment comprise the renewable energy generation, the load demand, the battery state of charge, the electricity purchase price from the grid, and the time of day. The state quantities of different units are processed separately: generation and load are in kW, the battery state of charge ranges from 0 to 1, the purchase price is in yuan, and the time is the integer hour of the day. The state space is as follows:

$$s_t\in S:\{P_t^{W},P_t^{load},SOC_t,R_t,t\}\qquad(6)$$

In the system state space $S$, $P_t^{W}$ and $P_t^{load}$ are the renewable generation power and the load power at time $t$, $SOC_t$ is the state of charge of the energy storage system at time $t$, $R_t$ is the grid electricity purchase price at time $t$, and $t$ is the agent's current time.
(2.2) setting the action values of the agent: the agent implements the scheduling strategy mainly by controlling the actions of the energy storage system, which are charging, discharging, and no action. An action value of 1 denotes discharging at full rate, 2 denotes charging at full rate, and 0 denotes no action:

$$a_t\in A:\{0,1,2\}\qquad(7)$$

(2.3) setting the reward function of the agent during training: when the agent takes action $a_t$ in state $s_t$, the immediate reward obtained is set to the operating cost of the microgrid at time $t$, expressed as:

$$r_t=P_{net}\times R_t\times\Delta t\qquad(8)$$

(2.4) establishing the decision method: a deep neural network approximates the agent's action-value function; the agent receives the state quantities of (2.1), inputs them into the deep neural network, the network outputs the state-action value $Q(s,a)$ for the input state, and the agent then selects the action with the maximum Q value. The state-action value function represents the expected long-term return when the agent takes an action in observed state $s_t$:

$$Q(s_t,a_t)=\mathbb{E}\left[\sum_{k=0}^{\infty}\gamma^{k}r_{t+k}\,\middle|\,s_t,a_t\right]\qquad(9)$$

In the formula above, $\gamma$ is a discount factor in the range 0 to 1 that indicates the importance of long-term return. Each state-action value output by the deep neural network corresponds to an action the energy storage agent can take, and the agent selects the action with the maximum Q value.
Step (3): training the reinforcement learning agent established in step (2), specifically comprising the following sub-steps:
(3.1) initializing the neural network, i.e. the value function $Q$, with random weights $\theta$; initializing the replay memory unit $D$; setting $\theta'=\theta$ to initialize the target value function $Q'$; and initializing the error storage unit $P$.
(3.2) acquiring the day-ahead forecast data of the microgrid, containing the microgrid state information required in step (2.1); initializing the battery state SOC; and preprocessing the initial state information so that it can be input to the neural network. Noise is added to the new energy generation, load demand, and electricity price data during training so that the agent can explore more situations, enhancing reliability (a sketch of the noise injection and ε schedule follows this block).
(3.3) selecting actions according to an ε-greedy strategy in each training episode, with ε set as follows:

$$\varepsilon=0.5\times\frac{1}{i+1}\qquad(10)$$

where $i$ is the index of the agent's training episode. A number is drawn uniformly at random from $[0,1]$: if it is greater than ε, the agent selects the action $a_t$ whose estimated action value is maximal; if it is less than ε, an action $a_t$ is selected at random from the action space.
(3.4) in any state $s_t$, the agent executes the action selected in (3.3), observes the reward $r_t$ obtained after executing the action, and transitions to the next state $s_{t+1}$.
(3.5) if the next state $s_{t+1}$ exists, the tuple $(s_t,a_t,r_t,s_{t+1})$ is stored in the replay memory unit; once the number of samples stored in the replay memory unit meets the minimum sample count, mini-batches of data are selected from it to train the agent's neural network.
(3.6) calculating the absolute value errors of the value function for the different states in the replay memory unit and storing the errors computed for the different tuples in the error storage unit, whose indices correspond to those of the replay memory unit. The calculation formula is as follows:

$$\left|\,r_t+\gamma\max_{a}Q'(s_{t+1},a;\theta')-Q(s_t,a_t;\theta)\,\right|\qquad(11)$$

(3.7) following the principle of preferentially learning from tuples with larger errors, after each interaction between the agent and the environment, 32 tuples are selected from the replay memory unit to train the neural network.
(3.8) the state information $s_t$ of the tuples selected from the replay memory unit is input into the value function $Q$, and its output $Q(s_t,a_t;\theta)$ is taken as the supervised quantity; then the state information $s_{t+1}$ is also input into the value function $Q$ to obtain the index of the action with the maximum output action-state value; then the value $Q'(s_{t+1},a_{t+1};\theta')$ corresponding to that action index for input $s_{t+1}$ is obtained from the target network. The update target of the neural network is then:

$$y_t=r_t+\gamma\,Q'\!\left(s_{t+1},\ \arg\max_{a}Q(s_{t+1},a;\theta);\ \theta'\right)\qquad(12)$$

(3.9) the parameters of the agent's Q network are updated according to the following formula:

$$\theta_{i+1}=\theta_i+\alpha\left[y_t-Q(s_t,a_t;\theta_i)\right]\nabla_{\theta_i}Q(s_t,a_t;\theta_i)\qquad(13)$$

where $\theta_i$ and $\theta_{i+1}$ are the values of $\theta$ at the $i$-th and $(i+1)$-th updates, and $\alpha$ is the learning rate of the neural network.
For the target value function $Q'$, every two training episodes $\theta'$ is set equal to $\theta$ to update the deep neural network parameters of the target value function $Q'$.
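A minimal sketch of the noise injection of step (3.2) and the ε-greedy selection of step (3.3), reusing `greedy_action` from the earlier network sketch; `rel_std` is an assumed noise level, since the patent states only that the prediction errors follow a Gaussian distribution.

```python
import numpy as np

def epsilon(i):
    """Exploration rate schedule of Eq. (10), decaying with training episode i."""
    return 0.5 * (1.0 / (i + 1))

def select_action(q_net, state, i, n_actions=3):
    """Epsilon-greedy choice over the action space {0, 1, 2}, step (3.3)."""
    if np.random.uniform(0.0, 1.0) < epsilon(i):
        return int(np.random.randint(n_actions))   # explore: random action
    return greedy_action(q_net, state)             # exploit: max-Q action

def add_forecast_noise(series, rel_std=0.05, rng=None):
    """Perturb day-ahead generation/load/price series with Gaussian noise, step (3.2)."""
    rng = rng or np.random.default_rng()
    series = np.asarray(series, dtype=float)
    return series * (1.0 + rng.normal(0.0, rel_std, size=series.shape))
```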
Step (4): training the agent, i.e. the energy storage system in the microgrid, on the day-ahead forecast data by the method of step (3); once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved, the trained agent is applied to the real-time management system of the microgrid, and at each hourly energy scheduling interval the energy storage system performs charge and discharge control according to the real-time generation and load demand, realizing economic scheduling of the microgrid.
Preferably, step (3.7) specifically comprises the following sub-steps:
Step 1: initialize the index list Index-list.
Step 2: calculate the sum Sum of the absolute errors in the error storage unit.
Step 3: according to the number N of tuples required for updating the Q-value network (i.e. the neural network), randomly generate N numbers from (0, Sum) and arrange them from small to large into a list Rand-list.
Step 4: add the numbers in the error storage unit one by one in order; whenever the running sum becomes larger than the smallest number in Rand-list, store the index of the last error added into the index list Index-list and remove that smallest number from Rand-list.
Step 5: repeat Step 4 until N index values are obtained, then take the corresponding tuples from the replay memory unit according to the index list Index-list.
Fig. 5 shows how the total electricity purchase cost of the microgrid per day of operation changes with the number of training episodes during training of the energy storage agent. As can be seen from Fig. 5, after a certain number of training episodes the daily operating cost of the microgrid essentially reaches a stable value, indicating that the microgrid energy storage agent has learned a good strategy. The energy storage agent obtained by reinforcement learning training can charge and discharge reasonably according to the state of the microgrid. Fig. 6 shows that, in the face of renewable energy and load fluctuations, the energy storage system can make full use of renewable energy: it stores energy when renewable generation is in surplus and supplies power to local loads when load demand is high, performing peak shaving and valley filling and reducing the electricity purchase cost in the power market during operation.
Example two
The second embodiment provides a micro-grid energy storage scheduling device based on deep reinforcement learning, which comprises:
an energy storage system agent building module, which converts the microgrid energy storage scheduling into a Markov decision problem according to the simulation model of the controlled new energy microgrid, so as to establish the energy storage system agent;
an energy storage system agent training module, which trains the energy storage system agent on the day-ahead forecast data of the microgrid; once the reward the agent obtains from the environment stabilizes during training, the network parameters are saved and training ends;
an energy storage system dispatch module, in which the trained energy storage system agent schedules the energy storage system in the microgrid, the energy storage system performing charge and discharge control according to the microgrid's real-time generation and load demand at each hourly energy scheduling interval.
EXAMPLE III
An electronic device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the micro-grid energy storage scheduling method based on deep reinforcement learning of the first embodiment.
The electronic devices in the embodiments of the present invention may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, PDAs (personal digital assistants), PADs (tablet computers), and the like, and fixed terminals such as desktop computers and the like.
The electronic device may include a processing means (e.g., a central processing unit) that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage means into a Random Access Memory (RAM). In the RAM, various programs and data necessary for the operation of the electronic apparatus are also stored. The processing device, the ROM, and the RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
Generally, the following systems may be connected to the I/O interface: input devices including, for example, touch screens, touch pads, keyboards, mice, etc.; output devices including, for example, Liquid Crystal Displays (LCDs), speakers, vibrators, and the like; storage devices including, for example, magnetic tape, hard disk, etc.; and a communication device. The communication means may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data.
A computer program, carried on a computer readable medium, comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means, or installed from a storage means, or installed from a ROM. The computer program, when executed by a processing device, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, system, or apparatus, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution apparatus, system, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, system, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that the invention is not limited thereto, and may be embodied in other forms without departing from the spirit or essential characteristics thereof. Any modification which does not depart from the functional and structural principles of the present invention is intended to be included within the scope of the claims.

Claims (7)

1. A micro-grid energy storage scheduling method based on deep reinforcement learning, characterized by comprising the following steps:
first, establishing a simulation model corresponding to the controlled new energy microgrid, and converting the microgrid energy storage scheduling into a Markov decision problem according to the microgrid simulation model, so as to establish an energy storage system agent and the corresponding observation state quantities, action values, and reward function;
second, training the established energy storage system agent on the microgrid's day-ahead new energy generation, load, and electricity price data, and, once the reward the agent obtains from the environment stabilizes during training, saving the network parameters and ending training;
finally, using the trained energy storage system agent for real-time scheduling of the microgrid energy storage system, the energy storage system performing charge and discharge control according to the microgrid's real-time generation and load demand at each hourly energy scheduling interval.
2. The micro-grid energy storage scheduling method based on deep reinforcement learning of claim 1, wherein establishing the microgrid simulation model comprises the following sub-steps:
(1.1) establishing an energy storage system model: the energy storage system is represented by a dynamic model in which $P_b(t)$ is the charging or discharging power of the energy storage system at each time $t$ and $SOC(t)$ is its state of charge; the SOC dynamics of the energy storage system are:

$$SOC(t+1)=\begin{cases}SOC(t)+\dfrac{\eta\,|P_b(t)|\,\Delta t}{E_c}, & \text{charging}\\[4pt]SOC(t)-\dfrac{|P_b(t)|\,\Delta t}{\xi\,E_c}, & \text{discharging}\end{cases}\qquad(1)$$

where $\xi$ and $\eta$ are the discharging and charging efficiencies of the energy storage system respectively; at any moment the energy storage system can only be charging, discharging, or idle; $E_c$ is the capacity of the energy storage system; $\Delta t$ is the time interval over which the stored energy is charged or discharged;
(1.2) setting limiting conditions: for the established energy storage model, the charge/discharge power $P_b(t)$ and the state of charge $SOC(t)$ are limited:

$$P_{b\min}\le P_b(t)\le P_{b\max}\qquad(2)$$
$$SOC_{\min}\le SOC(t)\le SOC_{\max}\qquad(3)$$

where $P_{b\min}$ and $P_{b\max}$ are the minimum and maximum charge/discharge power of the energy storage system, and $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum of the state of charge $SOC(t)$, with $SOC_{\max}\le 1$ and $SOC_{\min}\ge 0$;
(1.3) setting the microgrid power balance limit: the power balance relationship is as follows:

$$P_{balance}(t)=P_{renew}(t)-P_{load}(t)\qquad(4)$$
$$P_{net}(t)=P_{balance}(t)+P_b(t)\qquad(5)$$

where $P_{renew}$ is the total output power of the distributed new energy in the microgrid at time $t$, $P_{load}$ is the power demand of the load in the microgrid at time $t$, and $P_{balance}$ is the difference between renewable generation and load demand: if $P_{balance}>0$, renewable generation is in surplus, otherwise it is insufficient; a positive $P_b(t)$ means the energy storage system emits power and a negative value means it absorbs power; $P_{net}$ is the power traded between the microgrid and the main grid: if positive, the microgrid exports power to the main grid; if negative, the microgrid purchases power from the main grid.
3. The micro-grid energy storage scheduling method based on deep reinforcement learning of claim 2, wherein establishing the energy storage system agent comprises the following sub-steps:
(2.1) setting the observation state quantities required by the energy storage system agent: these comprise the renewable energy generation, the load demand, the battery state of charge, the electricity purchase price from the grid, and the time of day; generation and load are in kW, the battery state of charge ranges from 0 to 1, the purchase price is in yuan, and the time is the integer hour of the day; the state space is as follows:

$$s_t\in S:\{P_t^{W},P_t^{load},SOC_t,R_t,t\}\qquad(6)$$

In the system state space $S$, $P_t^{W}$ and $P_t^{load}$ are the renewable generation power and the load power at time $t$, $SOC_t$ is the state of charge of the energy storage system at time $t$, $R_t$ is the grid electricity purchase price at time $t$, and $t$ is the agent's current time;
(2.2) setting the action values of the energy storage system agent: the actions of the energy storage system comprise charging, discharging, and no action; an action value of 1 denotes discharging at full rate, 2 denotes charging at full rate, and 0 denotes no action; the action space is expressed as:

$$a_t\in A:\{0,1,2\}\qquad(7)$$

(2.3) setting the reward function of the energy storage system agent during training: when the agent takes action $a_t$ in state $s_t$, the immediate reward obtained is set to the operating cost of the microgrid at time $t$, expressed as:

$$r_t=P_{net}\times R_t\times\Delta t\qquad(8)$$

(2.4) establishing the decision method: a deep neural network approximates the agent's action-value function; the agent receives the state quantities of (2.1), inputs them into the deep neural network, and the network outputs the state-action value $Q(s,a)$ for the observed state; the state-action value function represents the expected long-term return when the agent takes an action in observed state $s_t$:

$$Q(s_t,a_t)=\mathbb{E}\left[\sum_{k=0}^{\infty}\gamma^{k}r_{t+k}\,\middle|\,s_t,a_t\right]\qquad(9)$$

In the formula above, $\gamma$ is a discount factor in the range 0 to 1 representing the importance of long-term return; each state-action value output by the deep neural network corresponds to an action the energy storage agent can take, and the agent selects the action with the maximum Q value.
4. The micro-grid energy storage scheduling method based on deep reinforcement learning of claim 3, wherein: the training of the energy storage system agent comprises the following substeps:
(3.1) initializing a neural network by using a random weight theta, namely, a value function Q; initializing a playback memory unit D; simultaneously enabling theta 'to be equal to theta to initialize the target value function Q'; initializing an error storage unit P;
(3.2) acquiring day-ahead prediction data of the microgrid, including the state information of the microgrid required in the step (2.1), initializing a battery state SOC, preprocessing the initial state information and converting the initial state information into tensor, and adding noise to the new energy generating capacity, the load demand and the electricity price data during training;
(3.3) in each training period, selecting actions according to an epsilon-greedy strategy, and setting epsilon as follows:
ε=0.5×(1/(i+1)) (10)
where i is the number of cycles of agent training, and is simultaneously [0,1 ]]Randomly generating a number within the range with equal probability, and if the number is greater than epsilon, selecting the action a corresponding to the action cost function for obtaining the maximum estimation value by the intelligent agent at the momentt(ii) a If the number is less than epsilon, then only one action a can be selected randomly from the action spacet
(3.4) agent in any state stExecuting the action according to the action selected in (3.3), and observing the reward r obtained after executing the actiontWhile shifting to the next state st+1
(3.5) if so, the next state st+1Presence, will tuple(s)t,at,rt,st+1) Storing the data into a playback memory unit, and selecting small-batch data from the playback memory unit to train the intelligent neural network after the sample data stored in the memory playback unit meets the requirement of the minimum sample number;
(3.6) calculating the absolute value errors of the cost functions in different states in the playback memory unit, storing the absolute value errors after calculation of different tuples into an error storage unit, wherein the index of the error storage unit corresponds to the playback memory unit, and the calculation formula is as follows:
|R(t+1)+γmaxa[Q(s(t+1),a]-Q(s(t),a(t)| (11)
(3.7) according to the principle of preferentially learning tuples with larger errors, selecting the tuples from the playback memory unit to train the neural network after each time the agent interacts with the environment;
(3.8) inputting the state information st of the tuple selected from the playback memory unit to the value function Q, selecting the maximum action state value Q(s) output by the value function Qt,at(ii) a θ), and Q(s) at that time are comparedt,at) As supervisory information; then the status information st+1The motion index is also input into the value function Q, and the motion index corresponding to the maximum motion state value output at the moment is obtained; then obtaining the input s from the target networkt+1Q'(s) corresponding to the operation of index in the state of (1)t+1,at+1(ii) a θ') value, then the update target for the neural network at this time is:
y_t = r_{t+1} + γ·Q'(s_{t+1}, argmax_a Q(s_{t+1}, a; θ); θ')      (12)
(3.9) the parameters of the Q value network of the agent are updated according to the following formula:
θ_{i+1} = θ_i + α·[y_t − Q(s_t, a_t; θ_i)]·∇_{θ_i} Q(s_t, a_t; θ_i)      (13)
in the above formula, θ_i and θ_{i+1} respectively denote the value of θ at the i-th and (i+1)-th updates, and α denotes the learning rate of the neural network;
for the target value function Q', every two training periods, letting θ' = θ to update the deep neural network parameters of the target value function Q'.
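The following Python sketch is an editorial illustration of substeps (3.1)-(3.9) as one Double-DQN training loop. PyTorch is assumed; the environment interface env (reset, step, num_actions), the network q_net, and all hyperparameter values are hypothetical placeholders rather than anything specified by the patent, and the error-weighted selection of step (3.7) is deferred to the select_indices sketch given under claim 5 below:

    import copy
    import random
    import torch
    import torch.nn.functional as F

    def train(q_net, env, episodes, gamma=0.95, lr=1e-3,
              batch_size=32, min_samples=500):
        # (3.1) value function Q with random weights theta; target Q' with theta' = theta;
        # replay memory D and error storage P (kept index-aligned)
        target_net = copy.deepcopy(q_net)
        optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
        memory, errors = [], []

        for i in range(episodes):
            s_t = env.reset()                         # (3.2) day-ahead data, initial SOC, noise
            epsilon = 0.5 * (1.0 / (i + 1))           # (3.3) formula (10)
            done = False
            while not done:
                if random.random() > epsilon:         # exploit: max-Q action
                    with torch.no_grad():
                        a_t = int(q_net(s_t.unsqueeze(0)).argmax(dim=1).item())
                else:                                 # explore: uniform random action
                    a_t = random.randrange(env.num_actions)
                s_next, r_t, done = env.step(a_t)     # (3.4) execute action, observe reward
                if not done:                          # (3.5) store tuple in D
                    memory.append((s_t, a_t, r_t, s_next))
                    with torch.no_grad():             # (3.6) absolute error, formula (11)
                        max_next = q_net(s_next.unsqueeze(0)).max(dim=1).values.item()
                        q_cur = q_net(s_t.unsqueeze(0))[0, a_t].item()
                    errors.append(abs(r_t + gamma * max_next - q_cur))
                if len(memory) >= min_samples:
                    # (3.7) error-weighted tuple selection, sketched under claim 5 below
                    batch = [memory[j] for j in select_indices(errors, batch_size)]
                    s_b = torch.stack([b[0] for b in batch])
                    a_b = torch.tensor([b[1] for b in batch])
                    r_b = torch.tensor([b[2] for b in batch])
                    sn_b = torch.stack([b[3] for b in batch])
                    with torch.no_grad():             # (3.8) formula (12): action index from Q,
                        best_a = q_net(sn_b).argmax(dim=1, keepdim=True)  # value from Q'
                        y = r_b + gamma * target_net(sn_b).gather(1, best_a).squeeze(1)
                    q_sa = q_net(s_b).gather(1, a_b.unsqueeze(1)).squeeze(1)
                    loss = F.mse_loss(q_sa, y)        # (3.9) gradient step realizes formula (13)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                s_t = s_next
            if i % 2 == 1:                            # sync theta' = theta every two periods
                target_net.load_state_dict(q_net.state_dict())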
5. The micro-grid energy storage scheduling method based on deep reinforcement learning of claim 4, wherein: said step (3.7) comprises the sub-steps of:
(3.7.1) initializing an index list Index-list;
(3.7.2) calculating the sum Sum of the absolute errors in the error storage unit;
(3.7.3) according to the number N of tuples required for the Q-value network update, randomly generating N numbers in (0, Sum) and arranging them into a list Rand-list in ascending order;
(3.7.4) adding up the numbers in the error storage unit one by one in order; whenever the running sum exceeds the smallest number in Rand-list, storing the index of the last error added into the index list Index-list and removing that smallest number from Rand-list;
(3.7.5) repeating (3.7.4) until N index values are obtained, and fetching the corresponding tuples from the replay memory unit according to the index list Index-list, as sketched below.
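Substeps (3.7.1)-(3.7.5) amount to sampling indices with probability proportional to absolute error by walking a cumulative sum. A minimal Python sketch, under the assumption that errors is the error storage unit P represented as a list aligned index-for-index with the replay memory:

    import random

    def select_indices(errors, n):
        index_list = []                              # (3.7.1) initialize Index-list
        total = sum(errors)                          # (3.7.2) Sum of absolute errors
        rand_list = sorted(random.uniform(0, total)  # (3.7.3) N numbers in (0, Sum),
                           for _ in range(n))        #         arranged in ascending order
        running, j = 0.0, 0
        for idx, err in enumerate(errors):           # (3.7.4) accumulate errors in order
            running += err
            while j < n and running > rand_list[j]:  # running sum passed the smallest
                index_list.append(idx)               # remaining random number: record the
                j += 1                               # index and drop that number
            if j == n:                               # (3.7.5) stop once N indices are found
                break
        return index_list

Tuples with larger absolute errors occupy wider intervals of the cumulative sum and are therefore selected more often, which realizes the principle of preferentially learning from larger-error tuples in step (3.7).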
6. A micro-grid energy storage scheduling device based on deep reinforcement learning, characterized by comprising:
an energy storage system agent construction module: converting micro-grid energy storage scheduling into a Markov decision problem according to a simulation model of the controlled new-energy microgrid, so as to establish an energy storage system agent;
an energy storage system agent training module: training the energy storage system agent on day-ahead forecast data of the microgrid; during training, once the reward the agent obtains from the environment has stabilized, saving the network parameters and ending the training;
an energy storage system dispatch module: using the trained energy storage system agent to schedule the energy storage system in the microgrid, the energy storage system performing charge and discharge control according to the real-time generation and load demand of the microgrid at each hourly energy scheduling time.
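As an editorial sketch of the dispatch module's hourly loop (the microgrid interface and its method names are hypothetical, and greedy_action is the helper sketched under claim 3):

    def dispatch_day(agent, microgrid):
        # at each hourly energy scheduling time, observe the real-time
        # generation and load demand, then let the trained agent issue
        # a charge/discharge action; no exploration is used at this stage
        for hour in range(24):
            state = microgrid.observe(hour)       # SOC, generation, load, price, ...
            action = greedy_action(agent, state)
            microgrid.apply_charge_discharge(action)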
7. An electronic device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the micro-grid energy storage scheduling method based on deep reinforcement learning according to any one of claims 1 to 5.
CN202011229149.9A 2020-11-06 2020-11-06 Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning Pending CN112529727A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011229149.9A CN112529727A (en) 2020-11-06 2020-11-06 Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN112529727A true CN112529727A (en) 2021-03-19

Family

ID=74979783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011229149.9A Pending CN112529727A (en) 2020-11-06 2020-11-06 Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112529727A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200133220A1 (en) * 2017-06-30 2020-04-30 Merit Si, Llc Method and system for managing microgrid assets
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN111242388A (en) * 2020-01-22 2020-06-05 上海电机学院 Micro-grid optimization scheduling method considering combined supply of cold, heat and power
CN111352419A (en) * 2020-02-25 2020-06-30 山东大学 Path planning method and system for updating experience playback cache based on time sequence difference
CN111369181A (en) * 2020-06-01 2020-07-03 北京全路通信信号研究设计院集团有限公司 Train autonomous scheduling deep reinforcement learning method and module

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Yadong et al., "Research on Microgrid Energy Storage Scheduling Strategy Based on Deep Reinforcement Learning", 《可再生能源》 (Renewable Energy Resources), vol. 37, no. 8, pages 1220-1228 *

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113270937A (en) * 2021-03-30 2021-08-17 鹏城实验室 Standby battery scheduling method, computer readable storage medium and system
CN113270937B (en) * 2021-03-30 2024-06-21 鹏城实验室 Standby battery scheduling method, computer readable storage medium and system
CN113139682B (en) * 2021-04-15 2023-10-10 北京工业大学 Micro-grid energy management method based on deep reinforcement learning
CN113139682A (en) * 2021-04-15 2021-07-20 北京工业大学 Micro-grid energy management method based on deep reinforcement learning
CN113110493A (en) * 2021-05-07 2021-07-13 北京邮电大学 Path planning equipment and path planning method based on photonic neural network
CN113110493B (en) * 2021-05-07 2022-09-30 北京邮电大学 Path planning equipment and path planning method based on photonic neural network
CN113378456A (en) * 2021-05-21 2021-09-10 青海大学 Multi-park comprehensive energy scheduling method and system
CN113807564B (en) * 2021-07-28 2023-08-04 合肥工业大学 Park microgrid load optimal scheduling method and system based on two-stage reinforcement learning
CN113807564A (en) * 2021-07-28 2021-12-17 合肥工业大学 Garden micro-grid load optimization scheduling method and system based on two-stage reinforcement learning
CN113437753A (en) * 2021-08-25 2021-09-24 广州乐盈信息科技股份有限公司 Energy storage system
CN113779871A (en) * 2021-08-26 2021-12-10 清华大学 Electric heating coupling system scheduling method and device, electronic equipment and storage medium thereof
CN113541272A (en) * 2021-08-26 2021-10-22 山东浪潮科学研究院有限公司 Energy storage battery balanced charging and discharging method and device based on deep learning model and medium
CN113541272B (en) * 2021-08-26 2023-06-02 山东浪潮科学研究院有限公司 Balanced charge and discharge method and equipment for energy storage battery and medium
CN113779871B (en) * 2021-08-26 2024-08-06 清华大学 Electrothermal coupling system scheduling method and device, electronic equipment and storage medium thereof
CN113890112A (en) * 2021-09-29 2022-01-04 合肥工业大学 Power grid prospective scheduling method based on multi-scene parallel learning
CN113890112B (en) * 2021-09-29 2023-09-15 合肥工业大学 Power grid look-ahead scheduling method based on multi-scene parallel learning
CN113962446B (en) * 2021-10-08 2024-06-07 国网安徽省电力有限公司电力科学研究院 Micro-grid group cooperative scheduling method and device, electronic equipment and storage medium
CN113962446A (en) * 2021-10-08 2022-01-21 国网安徽省电力有限公司电力科学研究院 Micro-grid group cooperative scheduling method and device, electronic equipment and storage medium
CN114123256A (en) * 2021-11-02 2022-03-01 华中科技大学 Distributed energy storage configuration method and system adaptive to random optimization decision
CN114123256B (en) * 2021-11-02 2023-10-03 华中科技大学 Distributed energy storage configuration method and system adapting to random optimization decision
WO2023116742A1 (en) * 2021-12-21 2023-06-29 清华大学 Energy-saving optimization method and apparatus for terminal air conditioning system of integrated data center cabinet
CN114219045A (en) * 2021-12-30 2022-03-22 国网北京市电力公司 Dynamic early warning method, system and device for risk of power distribution network and storage medium
CN114362218A (en) * 2021-12-30 2022-04-15 中国电子科技南湖研究院 Deep Q learning-based multi-type energy storage scheduling method and device in microgrid
CN114362218B (en) * 2021-12-30 2024-03-19 中国电子科技南湖研究院 Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning
CN114498750A (en) * 2022-02-14 2022-05-13 华北电力大学 Distributed multi-agent microgrid energy management method based on Q-Learning algorithm
CN114707871B (en) * 2022-04-12 2024-07-09 东南大学 Micro-grid energy management method, device, equipment and storage medium
CN114707871A (en) * 2022-04-12 2022-07-05 东南大学 Microgrid energy management method, device, equipment and storage medium
CN114792974A (en) * 2022-04-26 2022-07-26 南京邮电大学 Method and system for energy optimization management of interconnected micro-grid
CN114742453A (en) * 2022-05-06 2022-07-12 江苏大学 Micro-grid energy management method based on Rainbow deep Q network
CN114971250A (en) * 2022-05-17 2022-08-30 重庆大学 Comprehensive energy economic dispatching system based on deep Q learning
CN114971250B (en) * 2022-05-17 2024-05-07 重庆大学 Comprehensive energy economy dispatching system based on deep Q learning
WO2024022194A1 (en) * 2022-07-26 2024-02-01 中国电力科学研究院有限公司 Power grid real-time scheduling optimization method and system, computer device and storage medium
CN115731072A (en) * 2022-11-22 2023-03-03 东南大学 Microgrid space-time perception energy management method based on safe deep reinforcement learning
CN115731072B (en) * 2022-11-22 2024-01-30 东南大学 Micro-grid space-time perception energy management method based on safety deep reinforcement learning
CN115577647A (en) * 2022-12-09 2023-01-06 南方电网数字电网研究院有限公司 Power grid fault type identification method and intelligent agent construction method
CN116316755B (en) * 2023-03-07 2023-11-14 西南交通大学 Energy management method for electrified railway energy storage system based on reinforcement learning
CN116316755A (en) * 2023-03-07 2023-06-23 西南交通大学 Energy management method for electrified railway energy storage system based on reinforcement learning
CN116488154A (en) * 2023-04-17 2023-07-25 海南大学 Energy scheduling method, system, computer equipment and medium based on micro-grid
CN116780627A (en) * 2023-06-27 2023-09-19 中国电建集团华东勘测设计研究院有限公司 Micro-grid regulation and control method in building park
CN116780627B (en) * 2023-06-27 2024-07-09 中国电建集团华东勘测设计研究院有限公司 Micro-grid regulation and control method in building park

Similar Documents

Publication Publication Date Title
CN112529727A (en) Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning
CN108964050A (en) Micro-capacitance sensor dual-layer optimization dispatching method based on Demand Side Response
Fang et al. Deep reinforcement learning for scenario-based robust economic dispatch strategy in internet of energy
CN112491094B (en) Hybrid-driven micro-grid energy management method, system and device
Zhou et al. Four‐level robust model for a virtual power plant in energy and reserve markets
Liu et al. Deep reinforcement learning based energy storage management strategy considering prediction intervals of wind power
CN111864742B (en) Active power distribution system extension planning method and device and terminal equipment
CN114036825A (en) Collaborative optimization scheduling method, device, equipment and storage medium for multiple virtual power plants
Han et al. Optimization of transactive energy systems with demand response: A cyber‐physical‐social system perspective
CN117522087B (en) Virtual power plant resource allocation method, device, equipment and medium
CN112510690B (en) Optimal scheduling method and system considering wind-fire-storage combination and demand response reward and punishment
Qiu et al. Local integrated energy system operational optimization considering multi‐type uncertainties: A reinforcement learning approach based on improved TD3 algorithm
Cai et al. An improved sequential importance sampling method for reliability assessment of renewable power systems with energy storage
CN112819307A (en) Demand response method and system based on load supervision in smart power grid
An et al. Optimal scheduling for charging and discharging of electric vehicles based on deep reinforcement learning
Zhang et al. Negotiation strategy for discharging price of EVs based on fuzzy Bayesian learning
CN116826796A (en) Demand control method, device and charge storage system
CN116862172A (en) Full-period variable time period ordered power utilization scheduling method and storage medium
Sun et al. Digital twin‐based online resilience scheduling for microgrids: An approach combining imitative learning and deep reinforcement learning
CN113052630B (en) Method for configuring electric power equipment by using model and electric power equipment configuration method
CN115360768A (en) Power scheduling method and device based on muzero and deep reinforcement learning and storage medium
CN114435165A (en) Charging method and device of charging pile, electronic equipment and storage medium
CN118082598B (en) Electric vehicle charging method, apparatus, device, medium, and program product
Evangeline et al. Minimizing voltage fluctuation in stand-alone microgrid system using a Kriging-based multi-objective stochastic optimization algorithm
CN116956018A (en) Training method, training device, training equipment, training medium and training program product for strategy generator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination