CN112003269B - Intelligent on-line control method of grid-connected shared energy storage system - Google Patents

Intelligent on-line control method of grid-connected shared energy storage system

Info

Publication number
CN112003269B
CN112003269B (application CN202010754472.1A)
Authority
CN
China
Prior art keywords
cbess
network
soc
grid
value
Prior art date
Legal status
Active
Application number
CN202010754472.1A
Other languages
Chinese (zh)
Other versions
CN112003269A (en)
Inventor
刘友波
宋航
黄媛
刘俊勇
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202010754472.1A priority Critical patent/CN112003269B/en
Publication of CN112003269A publication Critical patent/CN112003269A/en
Application granted granted Critical
Publication of CN112003269B publication Critical patent/CN112003269B/en

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381Dispersed generators
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/04Power grid distribution networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/40Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses an intelligent online control method for a grid-connected shared energy storage system. The method builds two multi-hidden-layer competition (dueling) Q-network models; establishes a Markov decision process for the CBESS and maps its charging and discharging behavior onto a reinforcement learning process based on iterative action-value updates; determines the environmental state features and the instant reward function; and performs E rounds of loop-iterative learning. In each period the MG executes the first planning dispatch of the round, and the agent of the CBESS perceives the environment, including the pre-transaction amounts with the external system, to obtain the first state vector s_t. The main competition Q network takes s_t as input and outputs the Q values corresponding to all actions. The residual capacity SOC_t of the CBESS is updated to SOC_{t+1}; the MG performs a second planning of the period according to the tradable electric quantity actually fed back by the CBESS; s_t, a_t, r_t and s_{t+1} are computed and stored, all hyperparameters of the main competition Q network are updated through gradient back-propagation, the priorities p_i of the data stored in the sumtree are updated, and the parameters of the main competition Q network are copied to the target competition Q network.

Description

Intelligent on-line control method of grid-connected shared energy storage system
Technical Field
The invention relates to the technical field of power system automation, in particular to an intelligent online control method of a grid-connected shared energy storage system.
Background
Unlike a centrally controlled energy storage system (ESS), a shared (community) battery energy storage system (CBESS) is small in scale, generally only a few megawatts in capacity, and is installed on the secondary side of the distribution-substation transformer to mitigate the negative effects of renewable resources and continuous load changes. Once integrated into a grid-connected micro-grid (MG), the CBESS can improve the flexibility and reliability of the MG through rapid charging and discharging. With the deregulation of the distribution market, a CBESS can be operated by an independent enterprise that participates in the market through price-responsive behavior and realizes arbitrage. However, for conventional CBESS optimization decision methods, whether centralized optimal control or distributed coordinated optimization, complex system modeling, unobservable data and various uncertainty factors pose many challenges to model-based physical approaches.
In recent years, machine learning has developed rapidly, and its strong perception and data-analysis capabilities match the requirements of big-data applications in the smart grid. Reinforcement learning (RL) acquires knowledge of the environment through continuous interaction between a decision-making agent and the environment, taking actions that affect the environment so as to achieve a preset target. Deep learning (DL) does not depend on any analytic equation; instead it describes a mathematical problem and its approximate solution using large amounts of existing data, and when applied within RL it can effectively alleviate difficulties such as the intractability of the value function. The present method aims to overcome the difficult modeling, poor extensibility and poor practicality of physical modeling approaches, as well as the difficult solution, poor robustness and slow convergence of traditional intelligent algorithms when the state space is too large.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent online control method of a grid-connected shared energy storage system, which comprises the following steps:
Step one, build two multi-hidden-layer competition (dueling) Q-network models, a main competition Q network and a target competition Q network; the input is the feature vector s_t of the observed state, and the output is the action value Q(s_t, a_t) corresponding to each action a_t in the action set A;
Step two, establish a Markov decision process for the CBESS and map its charging and discharging behavior onto a reinforcement learning process based on iterative action-value updates; determine the environmental state features and the instant reward function;
Step three, enter E rounds of loop-iterative learning; at the start of each round, reinitialize the load curve of the MG, the output of the RDG, the market prices and the SOC of the shared energy storage;
Step four, the MG executes the first planning dispatch of the round; the agent of the CBESS perceives the environment, including the pre-transaction amounts with the external system, and obtains the first state vector s_t;
Step five, use s_t as the input of the main competition Q network to obtain the Q-value outputs corresponding to all actions; select the optimal estimated Q value from the current Q-value output by the ε-greedy method, determine the corresponding action a_t, and execute it;
Step six, update the residual capacity SOC_t of the CBESS to SOC_{t+1}; judge whether SOC_{t+1} exceeds the range [0,1], i.e. whether the limit is violated, and from this compute the termination indicator done_t of the iteration; at the same time compute the instant reward r_t after the action;
Step seven, the MG performs the second planning of the current period according to the tradable electric quantity actually fed back by the CBESS and determines the electric quantity traded with the external system; at the same time it publishes the pre-transaction quantities P^{mg.CHE}_{t+1} and P^{mg.grid}_{t+1} of the next period, which serve as the perceived state information of the agent in the next period; the state of the system is updated to s_{t+1};
Step eight, compute s_t, a_t, r_t and s_{t+1}, and store them together with done_t in the leaf nodes of the sumtree in sequence; if the quantity of stored data reaches the preset mini-batch sampling capacity m, randomly sample m samples from the stored data, calculate the current target Q value and its error, and update all hyperparameters of the main competition Q network through gradient back-propagation;
Step nine, after the Q network is updated, recalculate and update the priorities p_i of the data stored in the sumtree, copy the parameters of the main competition Q network to the target competition Q network, and set the current state s = s_{t+1}; if s is a termination state or the number of iteration rounds T is reached, the iteration of the current round ends and the method returns to step three to continue the loop; otherwise, go to step five to continue the iteration.
Furthermore, the main competition Q network is a multi-hidden-layer architecture with a single-neuron state-value sub-layer and a K-neuron action-advantage sub-layer; the ReLU function is selected as the activation function to speed up convergence; the inter-layer weights ω are initialized with a normal distribution, and the biases b are initialized to constants close to 0; the time-sequence number, the state of charge of the CBESS, the market prices, and the pre-transaction quantities of the MG with the CBESS/upper-level distribution network form the state feature vector s_t used as the network input; the network outputs the optimal discretized charge-discharge action value Q_t, and is finally trained to convergence by prioritized replay of the data.
Further, the action set A is defined as follows: the action space of the CBESS is uniformly discretized into K discrete charge-discharge options P_be^(k):
A = {P_be^(1), P_be^(2), …, P_be^(K)}
where A is the set of all possible actions and P_be^(k) denotes the k-th charge/discharge action in the uniformly discretized action space of the CBESS.
Further, the Markov decision process established for the CBESS maps the CBESS charging and discharging behavior onto a reinforcement learning process based on iterative action-value updates, specifically as follows:
the residual electric quantity of the CBESS changes continuously during charging and discharging, and its variation is related to the charge/discharge energy and the self-discharge in the period; the recursive relationship for charging is
SoC(t) = (1 − σ_sdr)·SoC(t−1) + P_be·(1 − L_c)·Δt / E_cap
and the discharging process is
SoC(t) = (1 − σ_sdr)·SoC(t−1) − P_be·Δt / [E_cap·(1 − L_dc)]
where SoC(t) is the state of charge of the CBESS in period t; P_be(t) is the charge/discharge power of the CBESS in period t; σ_sdr is the self-discharge rate of the storage medium; L_c and L_dc are the charging and discharging losses of the CBESS, respectively; Δt is the duration of each calculation window; and E_cap is the energy capacity of the CBESS;
the maximum allowable charge/discharge power of the CBESS at time t is determined by its charge/discharge characteristics and its remaining state of charge at time t; during operation the following constraint must also be satisfied:
SoC_min ≤ SoC(t) ≤ SoC_max
where SoC_max and SoC_min are the upper and lower limits of the CBESS state-of-charge constraint;
the environmental state features are as follows: the environmental state feature vector perceived by the CBESS at time t is defined as
s_t = [t, SOC_t, price^{b.pre}_t, price^{s.pre}_t, P^{mg.CHE}_t, P^{mg.grid}_t]^T
where t is the time-sequence number; price^{b.pre}_t and price^{s.pre}_t denote the predicted selling and purchasing prices of the upper-level grid at time t; and P^{mg.CHE}_t and P^{mg.grid}_t denote the pre-transaction electric quantities between the micro-grid and the CBESS and the upper-level grid, respectively;
the instant reward function is as follows: the CBESS obtains energy arbitrage by charging during off-peak hours and discharging during peak hours; after the actual transaction powers with the micro-grid and the upper-level grid are determined, the reward benefit r_EAP is calculated according to the real-time price;
the total operation and maintenance cost C_{o,m} of the CBESS includes the basic charge/discharge cost
C_1 = |P_be| · c_be
plus an additional operating-cost term incurred when the state of charge approaches its limits (given as an image in the original);
a negative reward r_line with coefficient σ is added as a penalty to suppress the power fluctuation P_exc_grid at the point of common connection:
r_line = −σ · |P_exc_grid|
if the executed action drives the SOC outside [0,1], a large penalty r_exc is given to prevent the agent from making unreasonable decisions in subsequent learning; the instant reward r_t is then
r_t = r_EAP − C_{o,m} + r_line, or r_t = r_exc when the SOC limit is violated.
further, the MG executes the first planning dispatch of the round, and the agent of the CBESS perceives the environment, including the pre-transaction amounts with the external system, to obtain the first state vector s_t, as follows: the MG model aims to minimize the running cost under the predicted price signal; the objective function of the economic dispatch model minimizes, over the planning horizon T, the sum of the generation costs of the CDGs, the operating costs of the micro-grid energy storage, and the cost of the energy traded with the CBESS and the upper-level distribution network (the full expression is given as an image in the original), where T is the planning period; C^z_{CDG} is the generation cost of the z-th CDG and c^{es}_i is the operating cost of the i-th micro-grid energy storage; P^{z,t}_{CDG} is the power output of the z-th CDG and P^{i,t}_{es} is the charge/discharge power of the i-th micro-grid energy storage; p^{b.grid}_t and p^{s.grid}_t denote the selling and purchasing prices of the upper-level distribution network in each period, and P^{b.CHE}_t and P^{s.CHE}_t denote the selling and purchasing prices issued by the CBESS operator;
based on the forecast data, the micro-grid applies mixed-integer linear programming (MILP) to obtain the transaction quantities P^{mg.CHE}_t and P^{mg.grid}_t with the CBESS and the upper-level distribution network for the period, and publishes this transaction information; the agent of the CBESS obtains the state feature vector s_t = [t, SOC_t, price^{b.pre}_t, price^{s.pre}_t, P^{mg.CHE}_t, P^{mg.grid}_t] by perceiving the external environment.
Further, s_t is used as the input of the main competition Q network to obtain the Q-value outputs corresponding to all actions; the ε-greedy method selects the optimal estimated Q value from the current Q-value output to determine the corresponding action a_t, which is then executed, including the following process:
use s_t as the input of the main competition Q network to obtain the Q-value outputs of all actions; with the ε-greedy method, select the corresponding action a_t in the current Q-value output and execute a_t in state s_t; for the ε-greedy policy, a value ε ∈ (0,1) is first set; with probability (1 − ε) the action a* currently considered optimal (maximum Q value) is chosen greedily, and with probability ε a potential behavior is explored at random from all K discrete candidate actions:
a_t = argmax_a Q(s_t, a; θ) with probability 1 − ε, and a_t drawn uniformly from A with probability ε,
where ε decreases gradually from ε_ini to ε_fin over the course of the iterations.
Further, the residual capacity SOC_t of the CBESS is updated to SOC_{t+1}; whether SOC_{t+1} exceeds the range [0,1] determines whether the limit is violated, from which the termination indicator done_t of the iteration is computed, and the instant reward r_t after the action is computed at the same time. Specifically: update the electric quantity SOC_t of the CBESS to SOC_{t+1}, judge whether the iteration is in a termination state, and compute the instant reward r_t after the action; a binary variable done_t serves as the termination indicator and as the interruption index of each iteration process:
done_t = 1 if the state of charge goes out of limit during energy-storage operation, and done_t = 0 otherwise;
done_t = 1 indicates termination and the iteration is exited, while done_t = 0 indicates the iteration is not terminated.
Further, step eight computes s_t, a_t, r_t and s_{t+1} and stores them together with done_t in the leaf nodes of the sumtree in sequence; if the quantity of stored data reaches the preset mini-batch sampling capacity m, m samples are randomly drawn from the stored data, the current target Q value and its error are calculated, and all hyperparameters of the main competition Q network are updated through gradient back-propagation, where the current target Q value y_j is:
y_j = r_j + γ·Q'(s_{j+1}, argmax_{a'} Q(s_{j+1}, a'; θ); θ'), with y_j = r_j when done_j = 1,
where Q is the main competition Q network (parameters θ) and Q' is the target competition Q network (parameters θ');
a proportional prioritization strategy is adopted, i.e. the sampling probability P(i) of the i-th sample is:
P(i) = p_i^α / Σ_k p_k^α
where α ∈ [0,1] is the power exponent that converts the magnitude of the TD error into a priority; if α = 0, this degenerates to uniform random sampling; p_i is the priority of transition i, calculated as:
p_i = |δ_i| + ζ
where ζ is a small positive offset;
the bias is corrected using importance-sampling (IS) weights to obtain a mean-square-error loss function L_i(θ_i) that takes the sample priorities into account:
L_i(θ_i) = (1/m) · Σ_j ω_j · (y_j − Q(s_j, a_j; θ_i))²
and finally all parameters θ of the main competition Q network are updated through gradient back-propagation of the neural network:
ω_j = (N·P(j))^(−β) / max_i ω_i
θ_i = θ_{i−1} + α·∇_{θ_i} L_i(θ_i)
where ω_j is the IS weight of sample j, and β is a hyperparameter that gradually increases to 1.
The invention has the following beneficial effects: 1. the invention endows the CBESS with strong online learning and decision-making ability in a highly uncertain environment; by approximating the optimal action-value function without relying on any analytic equation, it solves the problem that iterative solution is impossible when the environment state is continuous and the state space is huge;
2. the collaborative optimization of the double competition Q-network structure and the priority replay strategy can effectively alleviate over-estimation by the model, significantly improve the accuracy of the agent's decisions and the robustness of convergence, accelerate the convergence of the algorithm and improve online computational efficiency.
Drawings
FIG. 1 is a flow chart of the intelligent online control method of the grid-connected shared energy storage system;
FIG. 2 is a diagram of the competition Q network;
FIG. 3 is a diagram of the sumtree data structure;
FIG. 4 is a schematic diagram of the algorithm structure of the prioritized experience replay strategy.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following descriptions.
As shown in FIG. 1, the data-driven online control decision method of the invention for a grid-connected shared energy storage system includes the following steps:
S1: build two multi-hidden-layer competition Q-network models, a main competition Q network and a target competition Q network; the input of the models is the feature vector s_t of the observed state, and the output is the action value Q(s_t, a_t) corresponding to each action a_t in the action set A. All parameters of the Q networks, the capacity D of the data-storage structure sumtree and the priority values of its leaf nodes are first initialized.
S2: establish the Markov decision process of the CBESS and map its charging and discharging behavior onto a reinforcement learning process based on iterative action-value updates, determining: 1) the control target of the algorithm: stabilize the power fluctuation at the grid-connection point of the micro-grid as much as possible while maximizing the market arbitrage of the energy storage; 2) the environmental state features: the time-sequence number of the current period, the residual electric quantity of the CBESS, the predicted selling/purchasing prices of the upper-level grid, and the pre-transaction electric quantities with the distribution network/CBESS obtained from the MG's first economic dispatch; 3) the reward function: the energy arbitrage profit r_EAP realized by flexible charging and discharging of the CBESS, the total operation and maintenance cost C_{o,m}, the penalty r_line on power fluctuation at the grid-connection point, and the penalty r_exc on energy-storage SOC limit violation.
S3: before each iteration round starts, the uncertain data, including the load curve of the micro-grid, the renewable distributed generation output and the market price signals, are reinitialized.
S4: the micro-grid performs the pre-planning of each period based on the forecast data to obtain the pre-transaction electric quantities between period t and the CBESS/upper-level distribution network, i.e. P^{mg.CHE}_t and P^{mg.grid}_t, and publishes this information; meanwhile, the agent of the CBESS obtains the state feature vector s_t = [t, SOC_t, price^{b.pre}_t, price^{s.pre}_t, P^{mg.CHE}_t, P^{mg.grid}_t] by perceiving the external environment.
S5: use s_t as the input of the main competition Q network to obtain the Q-value outputs corresponding to all actions; select the optimal estimated Q value from the current Q-value output by the ε-greedy method, determine the corresponding action a_t, and execute it.
S6: update the residual capacity SOC_t of the CBESS to SOC_{t+1}; judge whether SOC_{t+1} exceeds the range [0,1], i.e. whether the limit is violated, and from this compute the termination indicator done_t of the iteration; at the same time compute the instant reward r_t after the action.
S7: the MG performs the second planning of the current period according to the tradable electric quantity actually fed back by the CBESS, determines the electric quantity traded with the external system, and at the same time publishes the pre-transaction quantities P^{mg.CHE}_{t+1} and P^{mg.grid}_{t+1} of the next period as the perceived state information of the agent in the next period; the state of the system is then updated to s_{t+1}.
S8: compute s_t, a_t, r_t and s_{t+1}, and store them together with done_t in the leaf nodes of the sumtree in sequence. Once the quantity of stored data reaches the preset mini-batch sampling capacity m, randomly sample m samples, calculate the current target Q value and its error, and update all hyperparameters of the main competition Q network through gradient back-propagation.
S9: after the Q network is updated, recalculate and update the priorities p_i of the data stored in the sumtree, periodically copy the parameters of the main competition Q network to the target Q network, and set the current state s = s_{t+1}. If s is a termination state or the number of iteration rounds T is reached, the current round of iterations ends and the method returns to S3 to continue the loop; otherwise go to step S5 to continue the iteration.
5.1 The concrete process of step S1 is as follows:
the CBESS interacts with the environment under a control target to obtain feedback rewards by continuously sensing the power demand of the microgrid and the market environment. A multi-hidden-layer master competition Q network architecture having a sub-layer of state values of single neurons and a sub-layer of action dominance of K neurons is constructed, as shown in fig. 2. The corresponding target contention Q network architecture is consistent therewith. The activation function selects the ReLu function to speed up the convergence process. The normal initialization interlayer weight ω and the initialization bias b are all constants tending to 0. Forming a state characteristic vector s by the time sequence number, the charge state of the CBESS, the market price, the pre-trading electric quantity of the MG and the CBESS/superior distribution network tOutputting the optimal discretized charge-discharge action value Q as network inputtAnd finally, performing network training by preferentially playing back data to iteratively converge. In the energy storage intelligent decision method based on model-free reinforcement learning and data driving, a priority proportion sample playback method based on a sumtree data structure is adopted, and simultaneously, the strategy precision and the convergence speed can be remarkably improved after the method is compatible with DDQN, so that the algorithm robustness is increased; meanwhile, the application of the competitive network architecture can enable the agent to quickly identify correct actions during the strategy evaluation, and the method has higher calculation efficiency and considerable fitting precision and stronger self-adaptive capacity.
Sumtree is the binary tree structure shown in FIG. 3. The root node is at the top level, the branch nodes are in the middle, and only the leaf nodes at the bottom store samples. Each parent node contains the sum of its two children; the root node is therefore the sum of all priorities, denoted p_total. Because this data structure provides an efficient way to compute cumulative sums of priorities, the sumtree supports efficient storage, update and sampling of the priority variables. During storage, acquired data are written into the leaf nodes from left to right, and once the leaves are full the oldest data are overwritten one by one from the left. A significant advantage of this approach is that the transitions need not be sorted by priority, which greatly reduces the computational burden and facilitates real-time training. Before iteration begins, the capacity of the sumtree leaves must be determined and the leaf priority values initialized.
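A minimal Python sketch of such a sumtree follows; the class and method names are illustrative assumptions, and the rolling overwrite from the left implements the storage rule just described.

```python
import numpy as np

class SumTree:
    """Binary tree whose leaves hold sample priorities and whose parent nodes
    hold the sum of their two children; the root therefore holds p_total."""
    def __init__(self, capacity):
        self.capacity = capacity                   # number of leaf nodes
        self.tree = np.zeros(2 * capacity - 1)     # priorities, root at index 0
        self.data = np.empty(capacity, dtype=object)
        self.write = 0                             # next leaf to (over)write

    def add(self, priority, sample):
        """Store samples left to right, overwriting the oldest when full."""
        self.data[self.write] = sample
        self.update(self.write + self.capacity - 1, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        """Change a leaf priority and propagate the change up to the root."""
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def get(self, value):
        """Descend from the root along cumulative sums to locate one leaf."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):        # until idx is a leaf
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    @property
    def total(self):
        return self.tree[0]                        # p_total, sum of priorities
```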
When a change of the environment state is perceived, the agent controls the CBESS to feed back a corresponding action a_t. The action space of the CBESS is uniformly discretized into K discrete charge-discharge options P_be^(k):
A = {P_be^(1), P_be^(2), …, P_be^(K)}
where A is the set of all possible actions and P_be^(k) denotes the k-th charge/discharge action in the uniformly discretized action space of the CBESS.
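A one-line illustration of this uniform discretization; the power rating and the value of K are assumptions of the example.

```python
import numpy as np

P_BE_MAX = 1.0  # MW, illustrative charge/discharge rating
K = 11          # illustrative number of discrete options
actions = np.linspace(-P_BE_MAX, P_BE_MAX, K)  # uniform discrete options P_be^(k)
```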
5.2 The concrete process of step S2 is as follows:
Establish the Markov decision process of the CBESS and map its charging and discharging behavior onto a reinforcement learning process based on iterative action-value updates, specifically:
The residual capacity of the CBESS changes continuously during charging and discharging, and its variation is related to the charge/discharge energy and the self-discharge in the period. The recursive relationship for charging is
SoC(t) = (1 − σ_sdr)·SoC(t−1) + P_be·(1 − L_c)·Δt / E_cap
and the discharging process is
SoC(t) = (1 − σ_sdr)·SoC(t−1) − P_be·Δt / [E_cap·(1 − L_dc)]
where SoC(t) is the state of charge of the CBESS in period t; P_be(t) is the charge/discharge power of the CBESS in period t; σ_sdr is the self-discharge rate of the storage medium; L_c and L_dc are the charging and discharging losses of the CBESS, respectively; Δt is the duration of each calculation window; and E_cap is the energy capacity of the CBESS.
The maximum allowable charge/discharge power of the CBESS at time t is determined by its charge/discharge characteristics and its remaining state of charge at time t; during operation the following constraint must also be satisfied:
SoC_min ≤ SoC(t) ≤ SoC_max
where SoC_max and SoC_min are the upper and lower limits of the CBESS state-of-charge constraint.
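For illustration, the two recursions and the limit check can be combined into a single update routine; the sign convention used here (positive P_be charges, negative discharges) is an assumption of this sketch.

```python
SOC_MIN, SOC_MAX = 0.0, 1.0   # illustrative limits; the text checks the range [0, 1]

def soc_update(soc, p_be, dt, e_cap, sigma_sdr, l_c, l_dc):
    """One-step SoC recursion from the charge/discharge formulas above;
    returns the next SoC and the out-of-limit flag used for done_t."""
    soc_next = (1 - sigma_sdr) * soc               # self-discharge over Δt
    if p_be >= 0:                                  # charging: losses shrink stored energy
        soc_next += p_be * (1 - l_c) * dt / e_cap
    else:                                          # discharging: losses inflate energy drawn
        soc_next += p_be * dt / (e_cap * (1 - l_dc))
    done = not (SOC_MIN <= soc_next <= SOC_MAX)
    return soc_next, done
```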
Reinforcement learning learns a mapping from environment states to actions, with the goal of maximizing the accumulated reward the agent obtains during its interaction with the environment. RL uses the Markov decision process (MDP) to simplify the modeling; an MDP is usually defined as a four-tuple (S, A, r, f), where S is the set of all environment states, with s_t ∈ S the state of the agent at time t; A is the set of actions the agent can execute, with a_t ∈ A the action taken at time t; r is the reward function, with r_t ~ r(s_t, a_t) the immediate reward obtained by executing action a_t in state s_t; and f is the state-transition probability distribution function, with s_{t+1} ~ f(s_t, a_t) the probability of transitioning to the next state s_{t+1} after executing a_t in s_t. The goal of the Markov model is to find an optimal policy π* that maximizes the expected sum of rewards from an initial state s:
V^{π*}(s) = max_π E_π [ Σ_{t≥0} γ^t · r_t | s_0 = s ]
where E_π denotes the expectation of the value under policy π, and 0 < γ < 1 is the decay (discount) coefficient in reinforcement learning that characterizes the importance of future rewards.
When the scale of the problem is small, the algorithm is relatively easy to solve. For practical problems, however, the state space is usually very large; the computational cost of conventional iterative solution is too high, and it suffers from difficult convergence, slow convergence speed and a tendency to over-estimate, so the method proposed by the invention is needed for an improved solution. For the online-control data-driven technique of the grid-connected shared energy storage system, the mapping relationships are as follows:
(1) Environmental state features
The environmental state feature vector perceived by the CBESS at time t is defined as
s_t = [t, SOC^{be}_t, price^{b.pre}_t, price^{s.pre}_t, P^{mg.CHE}_t, P^{mg.grid}_t]^T, s_t ∈ S
where t is the time-sequence number; price^{b.pre}_t and price^{s.pre}_t denote the predicted selling and purchasing prices of the upper-level grid at time t; and P^{mg.CHE}_t and P^{mg.grid}_t denote the pre-transaction electric quantities between the micro-grid and the CBESS and the upper-level grid, respectively.
(2) Feedback reward
During continuous perception and learning, after the CBESS selects action a_t in a given environment state s_t, it obtains a single-step instant reward r_t composed as follows.
1) The CBESS achieves an energy arbitrage profit (EAP) by charging during off-peak hours and discharging during peak hours. After the actual transaction powers with the micro-grid and the upper-level grid are determined, the reward benefit r_EAP is calculated according to the real-time price.
2) Besides the basic unit cost c_be of the CBESS, continued operation when its state of charge approaches a limit incurs additional cost. The total operation and maintenance cost C_{o,m} of the CBESS includes the basic charge/discharge cost
C_1 = |P_be| · c_be
plus an additional operating-cost term that grows as the state of charge approaches its limits (formula given as an image in the original).
3) The CBESS has the ability to mitigate the negative impact of the MG on the distribution grid. A negative reward r_line with coefficient σ is therefore added as a penalty to suppress the power fluctuation P_exc_grid at the point of common connection:
r_line = −σ · |P_exc_grid|
4) Once an executed action drives the SOC outside [0,1], a large penalty r_exc must be given to prevent the agent from making unreasonable decisions in subsequent learning. Finally, the instant reward r_t is defined as
r_t = r_EAP − C_{o,m} + r_line, or r_t = r_exc when the SOC limit is violated.
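Under the reconstruction of r_t above, a single-step reward routine could look like this sketch; the magnitude of r_exc and the argument names are illustrative assumptions.

```python
def instant_reward(r_eap, c_om, p_exc_grid, sigma, soc_next, r_exc=-100.0):
    """Instant reward r_t: arbitrage profit minus O&M cost plus the tie-line
    fluctuation penalty, or the large penalty r_exc on an SOC violation."""
    if not 0.0 <= soc_next <= 1.0:
        return r_exc                       # SOC left [0, 1]: penalize heavily
    r_line = -sigma * abs(p_exc_grid)      # grid-connection fluctuation penalty
    return r_eap - c_om + r_line
```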
5.3 The concrete process of step S3 is as follows:
Before each iteration round starts, the uncertain data, including the load curve of the micro-grid, the renewable distributed generation output and the market price signals, are initialized. Specifically, actual values of the load curve, RDG output and market electricity price can be given, with the prediction errors assumed to follow given normal distributions so as to represent the uncertainty fluctuations.
5.4 The concrete process of step S4 is as follows:
For the MG model, whose goal is to minimize the operating cost under the predicted price signal, the objective function of the economic dispatch (ED) model minimizes, over the planning horizon T, the sum of the generation costs of the CDGs, the operating costs of the micro-grid energy storage, and the cost of the energy traded with the CBESS and the upper-level distribution network (the full expression is given as an image in the original), where T is the planning period; C^z_{CDG} is the generation cost of the z-th CDG and c^{es}_i is the operating cost of the i-th micro-grid energy storage; P^{z,t}_{CDG} is the power output of the z-th CDG and P^{i,t}_{es} is the charge/discharge power of the i-th micro-grid energy storage; p^{b.grid}_t and p^{s.grid}_t denote the selling and purchasing prices of the upper-level distribution network in each period, and P^{b.CHE}_t and P^{s.CHE}_t denote the selling and purchasing prices issued by the CBESS operator.
Based on the forecast data, the micro-grid applies mixed-integer linear programming (MILP) to obtain the transaction quantities P^{mg.CHE}_t and P^{mg.grid}_t with the CBESS and the upper-level distribution network for the period, and publishes this transaction information; meanwhile, the agent of the CBESS obtains the state feature vector s_t = [t, SOC_t, price^{b.pre}_t, price^{s.pre}_t, P^{mg.CHE}_t, P^{mg.grid}_t] by perceiving the external environment.
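As a toy illustration of this pre-dispatch step, the sketch below uses the PuLP library with a single CDG and purchase-only trading; this is a deliberate simplification of the patent's MILP, and every name and the reduced cost structure are assumptions of the example.

```python
import pulp

def microgrid_predispatch(T, c_cdg, p_cdg_max, load,
                          price_buy_grid, price_buy_che):
    """Minimal pre-dispatch: one CDG plus purchases from the upper-level grid
    and the CBESS must cover the forecast load at minimum cost."""
    prob = pulp.LpProblem("mg_predispatch", pulp.LpMinimize)
    p_cdg = pulp.LpVariable.dicts("p_cdg", range(T), 0, p_cdg_max)
    p_grid = pulp.LpVariable.dicts("p_grid", range(T), 0)  # bought from grid
    p_che = pulp.LpVariable.dicts("p_che", range(T), 0)    # bought from CBESS
    prob += pulp.lpSum(c_cdg * p_cdg[t]
                       + price_buy_grid[t] * p_grid[t]
                       + price_buy_che[t] * p_che[t] for t in range(T))
    for t in range(T):
        prob += p_cdg[t] + p_grid[t] + p_che[t] == load[t]  # power balance
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    # the published pre-transaction quantities P_t^{mg.CHE} and P_t^{mg.grid}
    return ([p_che[t].value() for t in range(T)],
            [p_grid[t].value() for t in range(T)])
```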
5.5 The concrete process of step S5 is as follows:
Use s_t as the input of the main competition Q network to obtain the Q-value outputs corresponding to all actions; with the ε-greedy method, select the corresponding action a_t in the current Q-value output and execute a_t in state s_t. For the ε-greedy policy, a value ε ∈ (0,1) is first set; with probability (1 − ε) the action a* currently considered optimal (maximum Q value) is chosen greedily, and with probability ε a potential behavior is explored at random from all K discrete candidate actions:
a_t = argmax_a Q(s_t, a; θ) with probability 1 − ε, and a_t drawn uniformly from A with probability ε,
where ε decreases from ε_ini to ε_fin over the course of the iterations, encouraging more exploration early on and focusing mainly on greedy convergence later, so that the algorithm converges stably.
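A small sketch of this annealed ε-greedy selection; the schedule constants are illustrative assumptions.

```python
import numpy as np

def select_action(q_values, step, eps_ini=1.0, eps_fin=0.05, decay_steps=10000):
    """ε-greedy choice over the K discrete charge/discharge options, with ε
    annealed linearly from eps_ini down to eps_fin."""
    eps = max(eps_fin, eps_ini - (eps_ini - eps_fin) * step / decay_steps)
    if np.random.rand() < eps:
        return np.random.randint(len(q_values))  # explore: random action index
    return int(np.argmax(q_values))              # exploit: greedy action a*
```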
5.6 The concrete process of step S6 is as follows:
S6: update the electric quantity SOC_t of the CBESS to SOC_{t+1}, judge whether the iteration is in a termination state, and compute the instant reward r_t after the action. A binary variable done_t serves as the termination indicator and as the interruption index of each iteration process:
done_t = 1 if the state of charge goes out of limit during energy-storage operation, and done_t = 0 otherwise;
done_t = 1 indicates termination and the iteration is exited, while done_t = 0 indicates the iteration is not terminated.
S7: the MG performs a second MILP planning according to the tradable electric quantity actually fed back by the CBESS, determines the electric quantity traded with the external system in this period, and at the same time publishes the pre-transaction quantities P^{mg.CHE}_{t+1} and P^{mg.grid}_{t+1} of the next period as the perceived state information of the agent in the next period; the state of the system is then updated to s_{t+1}.
S8: during the continuous iterative update, the s_t, a_t, r_t and s_{t+1} obtained in each period t, together with the termination indicator done_t, form a quintuple {s_t, a_t, r_t, s_{t+1}, done_t} that is stored in the leaf nodes of the sumtree in sequence. If the stored quantity reaches the maximum capacity of the leaf nodes, the old data are overwritten in rolling fashion as new data are stored, ensuring the validity of the samples. Once the number of samples reaches the mini-batch training size m, m samples (j = 1, 2, …, m) are randomly drawn from the leaf nodes according to the priority replay mechanism, and the current target Q value y_j corresponding to each sample is calculated:
y_j = r_j + γ·Q'(s_{j+1}, argmax_{a'} Q(s_{j+1}, a'; θ); θ'), with y_j = r_j when done_j = 1,
where Q is the main competition Q network (parameters θ) and Q' is the target competition Q network (parameters θ').
In the priority replay mechanism, more important sample data are replayed with higher frequency; the TD error δ therefore needs to be calculated and stored, and samples with larger |δ| are easier to sample. A proportional prioritization strategy is adopted, a stochastic sampling strategy between pure greedy prioritization and uniform sampling, in which the sampling probability P(i) of the i-th sample is
P(i) = p_i^α / Σ_k p_k^α
where α ∈ [0,1] is the power exponent that converts the magnitude of the TD error into a priority; if α = 0, this degenerates to uniform random sampling. p_i is the priority of transition i, calculated as
p_i = |δ_i| + ζ
where ζ is a small positive offset that ensures edge samples with TD error 0 can still be drawn. This process changes the expected distribution of the stochastic updates and hence the solution to which they converge; importance-sampling (IS) weights are therefore used to correct the bias, yielding a mean-square-error loss function L_i(θ_i) that takes the sample priorities into account:
L_i(θ_i) = (1/m) · Σ_j ω_j · (y_j − Q(s_j, a_j; θ_i))²
Finally, all parameters θ of the main competition Q network are updated through gradient back-propagation of the neural network:
ω_j = (N·P(j))^(−β) / max_i ω_i
θ_i = θ_{i−1} + α·∇_{θ_i} L_i(θ_i)
where ω_j is the IS weight of sample j, and β is a hyperparameter that gradually increases to 1. FIG. 4 summarizes the structure of the prioritized experience replay algorithm.
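Putting the double-Q target and the IS-weighted loss together, one mini-batch update might look like the following PyTorch sketch; tensor names and shapes are assumptions of the example, and the returned |δ| values would be written back into the sumtree as new priorities p_i.

```python
import torch

def train_step(main_net, target_net, optimizer, batch, is_weights, gamma=0.99):
    """One mini-batch update of the main competition Q network: double-Q
    target y_j, IS-weighted MSE loss L_i(θ_i), gradient back-propagation."""
    s, a, r, s_next, done = batch                  # tensors of length m
    q = main_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_star = main_net(s_next).argmax(dim=1, keepdim=True)     # main net selects
        q_next = target_net(s_next).gather(1, a_star).squeeze(1)  # target net evaluates
        y = r + gamma * q_next * (1.0 - done)      # y_j = r_j when done_j = 1
    td_error = y - q
    loss = (is_weights * td_error.pow(2)).mean()   # IS-weighted MSE
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return td_error.detach().abs()                 # new |δ| for the sumtree priorities
```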
S9: q network recalculation after update and updating of priority p of stored data in sumtreeiAnd periodically connecting the main competition Q networkCopying the parameters of the network to the target Q network, and simultaneously making the current state s equal to st+1If S is in the termination state, the current iteration is finished, or the iteration number T is reached, all iterations are finished and returned to S3 for circulation, otherwise, the iteration is continued in the step S5.
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise forms disclosed herein; various other combinations, modifications and environments may be resorted to within the scope of the inventive concept described above or apparent to those skilled in the relevant art, and modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. An intelligent online control method of a grid-connected shared energy storage system, characterized by comprising the following steps:
Step one, build two multi-hidden-layer competition Q-network models, a main competition Q network and a target competition Q network; the input is the feature vector s_t of the observed state, and the output is the action value Q(s_t, a_t) corresponding to each action a_t in the action set A;
Step two, establish a Markov decision process for the CBESS and map its charging and discharging behavior onto a reinforcement learning process based on iterative action-value updates; determine the environmental state features and the instant reward function;
Step three, enter E rounds of loop-iterative learning; at the start of each round, reinitialize the load curve of the MG, the output of the RDG, the market prices and the SOC of the shared energy storage;
Step four, the MG executes the first planning dispatch of the round; the agent of the CBESS perceives the environment, including the pre-transaction amounts with the external system, and obtains the first state vector s_t;
Step five, use s_t as the input of the main competition Q network to obtain the Q-value outputs corresponding to all actions, select the optimal estimated Q value from the current Q-value output by the ε-greedy method, determine the corresponding action a_t, and execute it;
Step six, update the residual capacity SOC_t of the CBESS to SOC_{t+1}; judge whether SOC_{t+1} exceeds the range [0,1], i.e. whether the limit is violated, and from this compute the termination indicator done_t of the iteration; at the same time compute the instant reward r_t after the action;
Step seven, the MG performs the second planning of the current period according to the tradable electric quantity actually fed back by the CBESS and determines the electric quantity traded with the external system; at the same time it publishes the pre-transaction quantities P^{mg.CHE}_{t+1} and P^{mg.grid}_{t+1} of the next period, which serve as the perceived state information of the agent in the next period; the state of the system is updated to s_{t+1};
Step eight, compute s_t, a_t, r_t and s_{t+1}, and store them together with done_t in the leaf nodes of the sumtree in sequence; if the quantity of stored data reaches the preset mini-batch sampling capacity m, randomly sample m samples from the stored data, calculate the current target Q value and its error, and update all hyperparameters of the main competition Q network through gradient back-propagation;
Step nine, after the Q network is updated, recalculate and update the priorities p_i of the data stored in the sumtree, copy the parameters of the main competition Q network to the target competition Q network, and set the current state s = s_{t+1}; if s is a termination state or the number of iteration rounds T is reached, the iteration of the current round ends and the method returns to step three to continue the loop; otherwise, go to step five to continue the iteration.
2. The intelligent online control method of the grid-connected shared energy storage system according to claim 1, wherein the main competition Q network is a multi-hidden-layer architecture with a single-neuron state-value sub-layer and a K-neuron action-advantage sub-layer; the ReLU function is selected as the activation function to accelerate convergence; the inter-layer weights ω are initialized with a normal distribution, and the biases b are initialized to constants close to 0; the time-sequence number, the state of charge of the CBESS, the market prices, and the pre-transaction quantities of the MG with the CBESS/upper-level distribution network form the state feature vector s_t used as the network input; the network outputs the optimal discretized charge-discharge action value Q_t, and is finally trained to convergence by prioritized replay of the data.
3. The intelligent online control method of the grid-connected shared energy storage system according to claim 1, wherein the action set A is: the action space of the CBESS is uniformly discretized into K discrete charge-discharge options P_be^(k):
A = {P_be^(1), P_be^(2), …, P_be^(K)}
where A is the set of all possible actions and P_be^(k) denotes the k-th charge/discharge action in the uniformly discretized action space of the CBESS.
4. The intelligent online control method of the grid-connected shared energy storage system according to claim 1, wherein the Markov decision process established for the CBESS maps the CBESS charging and discharging behavior onto a reinforcement learning process based on iterative action-value updates, specifically:
the residual electric quantity of the CBESS changes continuously during charging and discharging, and its variation is related to the charge/discharge energy and the self-discharge in the period; the recursive relationship for charging is
SoC(t) = (1 − σ_sdr)·SoC(t−1) + P_be·(1 − L_c)·Δt / E_cap
and the discharging process is
SoC(t) = (1 − σ_sdr)·SoC(t−1) − P_be·Δt / [E_cap·(1 − L_dc)]
where SoC(t) is the residual electric quantity (state of charge) of the CBESS in period t; P_be(t) is the charge/discharge power of the CBESS in period t; σ_sdr is the self-discharge rate of the storage medium; L_c and L_dc are the charging and discharging losses of the CBESS, respectively; and Δt is the duration of each calculation window;
the maximum allowable charge/discharge power of the CBESS at time t is determined by its charge/discharge characteristics and its remaining state of charge at time t; during operation the following constraint must also be satisfied:
SoC_min ≤ SoC(t) ≤ SoC_max
where SoC_max and SoC_min are the upper and lower limits of the CBESS state-of-charge constraint;
the environmental state features are as follows: the environmental state feature vector perceived by the CBESS at time t is defined as
s_t = [t, SOC_t, price^{b.pre}_t, price^{s.pre}_t, P^{mg.CHE}_t, P^{mg.grid}_t]^T
where t is the time-sequence number; price^{b.pre}_t and price^{s.pre}_t denote the predicted selling and purchasing prices of the upper-level grid at time t, and P^{mg.CHE}_t and P^{mg.grid}_t denote the pre-transaction electric quantities between the micro-grid and the CBESS and the upper-level grid, respectively;
the instant reward function is as follows: the CBESS obtains energy arbitrage by charging during off-peak hours and discharging during peak hours; after the actual transaction powers with the micro-grid and the upper-level grid are determined, the reward benefit r_EAP is calculated according to the real-time price;
the total operation and maintenance cost C_{o,m} of the CBESS includes the basic charge/discharge cost
C_1 = |P_be| · c_be
plus an additional operating-cost term incurred when the state of charge approaches its limits (given as an image in the original);
a negative reward r_line with coefficient σ is added as a penalty to suppress the power fluctuation P_exc_grid at the point of common connection:
r_line = −σ · |P_exc_grid|
if the executed action drives the SOC outside [0,1], a large penalty r_exc is given to prevent the agent from making unreasonable decisions in subsequent learning; the instant reward r_t is:
r_t = r_EAP − C_{o,m} + r_line, or r_t = r_exc when the SOC limit is violated.
5. the method of claim 1, wherein the MG performs a first scheduling in a round to obtain a first state vector s obtained by a proxy-aware environment of a pre-transaction CBESS with an external system tThe method comprises the following steps: for the MG model, which aims to minimize the running cost under the predicted price signal, the objective function of the economic dispatch model is as follows:
Figure FDA0003547888980000033
in the formula, T is a planning period;
Figure FDA0003547888980000034
is the power generation cost of the z-th CDG, ci esThe operation cost of the ith microgrid for energy storage is;
Figure FDA0003547888980000035
is the power output of the z-th CDG,
Figure FDA0003547888980000036
the charging and discharging power of the ith microgrid energy storage;
Figure FDA0003547888980000037
Figure FDA0003547888980000038
respectively represents the selling price and the purchasing price of the superior distribution network in each period,
Figure FDA0003547888980000039
respectively representing the selling price and the purchasing price issued by the CBESS operator;
the micro-grid adopts a mixed integer linear programming method according to the prediction data to obtain the transaction electric quantity between the time interval and the CBESS and the superior distribution network
Figure FDA00035478889800000310
And issuing transaction information to the outside; the agent of CBESS obtains the state feature vector by sensing the external environment
Figure FDA00035478889800000311
6. The method according to claim 1, wherein s_t is used as the input of the main competition Q network to obtain the Q-value outputs corresponding to all actions, the optimal estimated Q value is selected from the current Q-value output by the ε-greedy method, and the action a_t corresponding to the optimal estimated Q value is determined and executed, including the following process:
use s_t as the input of the main competition Q network to obtain the Q-value outputs of all actions; with the ε-greedy method, select the corresponding action a_t in the current Q-value output and execute a_t in state s_t; for the ε-greedy policy, a value ε ∈ (0,1) is first set; with probability (1 − ε) the action a* currently considered optimal (maximum Q value) is chosen greedily, and with probability ε a potential behavior is explored at random from all K discrete candidate actions:
a_t = argmax_a Q(s_t, a; θ) with probability 1 − ε, and a_t drawn uniformly from A with probability ε,
where ε decreases gradually from ε_ini to ε_fin over the course of the iterations.
7. The intelligent online control method of the grid-connected shared energy storage system according to claim 1, wherein the residual capacity SoC_t of the CBESS is updated to SoC_{t+1}; whether SoC_{t+1} exceeds the range [0,1] determines whether the limit is violated, from which the termination indicator done_t of the iteration is computed, and the instant reward r_t after the action is computed at the same time, specifically: update the electric quantity SoC_t of the CBESS to SoC_{t+1}, judge whether the iteration is in a termination state, and compute the instant reward r_t after the action; a binary variable done_t serves as the termination indicator and as the interruption index of each iteration process:
done_t = 1 if the state of charge goes out of limit during energy-storage operation, and done_t = 0 otherwise;
done_t = 1 indicates termination and the iteration is exited, while done_t = 0 indicates the iteration is not terminated.
8. The method according to claim 1, wherein step eight computes s_t, a_t, r_t and s_{t+1} and stores them together with done_t in the leaf nodes of the sumtree in sequence; if the quantity of stored data reaches the preset mini-batch sampling capacity m, m samples are randomly drawn from the stored data, the current target Q value and its error are calculated, and all hyperparameters of the main competition Q network are updated through gradient back-propagation, wherein the current target Q value y_j is:
y_j = r_j + γ·Q'(s_{j+1}, argmax_{a'} Q(s_{j+1}, a'; θ); θ'), with y_j = r_j when done_j = 1;
a proportional prioritization strategy is adopted, i.e. the sampling probability P(i) of the i-th sample is:
P(i) = p_i^α / Σ_k p_k^α
where α ∈ [0,1] is the power exponent that converts the magnitude of the TD error into a priority; if α = 0, this degenerates to uniform random sampling; p_i is the priority of transition i, calculated as:
p_i = |δ_i| + ζ
where ζ is a small positive offset;
the bias is corrected using importance-sampling (IS) weights to obtain a mean-square-error loss function L_i(θ_i) that takes the sample priorities into account; finally all parameters θ of the main competition Q network are updated through gradient back-propagation of the neural network:
ω_j = (N·P(j))^(−β) / max_i ω_i
θ_i = θ_{i−1} + α·∇_{θ_i} L_i(θ_i)
where ω_j is the IS weight of sample j, and β is a hyperparameter that gradually increases to 1.
CN202010754472.1A 2020-07-30 2020-07-30 Intelligent on-line control method of grid-connected shared energy storage system Active CN112003269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010754472.1A CN112003269B (en) 2020-07-30 2020-07-30 Intelligent on-line control method of grid-connected shared energy storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010754472.1A CN112003269B (en) 2020-07-30 2020-07-30 Intelligent on-line control method of grid-connected shared energy storage system

Publications (2)

Publication Number Publication Date
CN112003269A CN112003269A (en) 2020-11-27
CN112003269B true CN112003269B (en) 2022-06-28

Family

ID=73462676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010754472.1A Active CN112003269B (en) 2020-07-30 2020-07-30 Intelligent on-line control method of grid-connected shared energy storage system

Country Status (1)

Country Link
CN (1) CN112003269B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112671033B (en) * 2020-12-14 2022-12-23 广西电网有限责任公司电力科学研究院 Priority-level-considered microgrid active scheduling control method and system
CN112670982B (en) * 2020-12-14 2022-11-08 广西电网有限责任公司电力科学研究院 Active power scheduling control method and system for micro-grid based on reward mechanism
CN113126498A (en) * 2021-04-17 2021-07-16 西北工业大学 Optimization control system and control method based on distributed reinforcement learning
CN114048576B (en) * 2021-11-24 2024-05-10 国网四川省电力公司成都供电公司 Intelligent control method for energy storage system for stabilizing power transmission section tide of power grid
CN114285854B (en) * 2022-03-03 2022-07-05 成都工业学院 Edge computing system and method with storage optimization and security transmission capability
CN116316755B (en) * 2023-03-07 2023-11-14 西南交通大学 Energy management method for electrified railway energy storage system based on reinforcement learning
CN117541036B (en) * 2024-01-10 2024-04-05 中网华信科技股份有限公司 Energy management method and system based on intelligent park

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150184549A1 (en) * 2013-12-31 2015-07-02 General Electric Company Methods and systems for enhancing control of power plant generating units

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep reinforcement learning algorithm for voltage regulation of distribution networks containing energy storage systems; Shi Jingjian et al.; Electric Power Construction; 2020-03-01 (No. 03); full text *

Also Published As

Publication number Publication date
CN112003269A (en) 2020-11-27

Similar Documents

Publication Publication Date Title
CN112003269B (en) Intelligent on-line control method of grid-connected shared energy storage system
CN111884213B (en) Power distribution network voltage adjusting method based on deep reinforcement learning algorithm
Yan et al. Deep reinforcement learning for continuous electric vehicles charging control with dynamic user behaviors
CN110059844B (en) Energy storage device control method based on ensemble empirical mode decomposition and LSTM
CN106600059B (en) Intelligent power grid short-term load prediction method based on improved RBF neural network
CN112614009B (en) Power grid energy management method and system based on deep expectation Q-learning
CN109878369B (en) Electric vehicle charging and discharging optimal scheduling method based on fuzzy PID real-time electricity price
CN110071502B (en) Calculation method for short-term power load prediction
Huang et al. A control strategy based on deep reinforcement learning under the combined wind-solar storage system
CN110751318A (en) IPSO-LSTM-based ultra-short-term power load prediction method
CN110414725B (en) Wind power plant energy storage system scheduling method and device integrating prediction and decision
CN112434848A (en) Nonlinear weighted combination wind power prediction method based on deep belief network
CN115714382A (en) Active power distribution network real-time scheduling method and device based on security reinforcement learning
CN113919545A (en) Photovoltaic power generation power prediction method and system with integration of multiple data models
CN115345380A (en) New energy consumption electric power scheduling method based on artificial intelligence
CN112381359A (en) Multi-critic reinforcement learning power economy scheduling method based on data mining
CN115795992A (en) Park energy Internet online scheduling method based on virtual deduction of operation situation
CN114648170A (en) Reservoir water level prediction early warning method and system based on hybrid deep learning model
CN115049102A (en) Electricity price prediction method and device, mobile terminal and storage medium
CN113972645A (en) Power distribution network optimization method based on multi-agent depth determination strategy gradient algorithm
CN117060408A (en) New energy power generation prediction method and system
CN116937565A (en) Distributed photovoltaic power generation power prediction method, system, equipment and medium
CN116683530A (en) Wind-light-containing hybrid type pumping and storing station cascade reservoir random optimization scheduling method
CN114648178B (en) Operation and maintenance strategy optimization method of electric energy metering device based on DDPG algorithm
CN114971250B (en) Comprehensive energy economy dispatching system based on deep Q learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant