CN112614009B - Power grid energy management method and system based on deep expectation Q-learning - Google Patents

Power grid energy management method and system based on deep expectation Q-learning

Info

Publication number
CN112614009B
CN112614009B CN202011418334.2A
Authority
CN
China
Prior art keywords
learning
power grid
energy management
neural network
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011418334.2A
Other languages
Chinese (zh)
Other versions
CN112614009A (en
Inventor
陈振
韩晓言
丁理杰
魏巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Sichuan Electric Power Co Ltd
Original Assignee
Electric Power Research Institute of State Grid Sichuan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Sichuan Electric Power Co Ltd filed Critical Electric Power Research Institute of State Grid Sichuan Electric Power Co Ltd
Priority to CN202011418334.2A priority Critical patent/CN112614009B/en
Publication of CN112614009A publication Critical patent/CN112614009A/en
Application granted granted Critical
Publication of CN112614009B publication Critical patent/CN112614009B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/004Generation forecast, e.g. methods or systems for forecasting future energy generation
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381Dispersed generators
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00Energy generation through renewable energy sources
    • Y02E10/50Photovoltaic [PV] energy
    • Y02E10/56Power conversion systems, e.g. maximum power point trackers
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The application discloses a power grid energy management method and system based on a double-deep expected Q-learning network algorithm. First, the uncertainty of the photovoltaic output at the prediction point is modeled with a Bayesian neural network to obtain the probability distribution of the photovoltaic output; this probability distribution is then input into a power grid energy management model based on the double-deep expected Q-learning network algorithm to obtain the corresponding photovoltaic generation output strategy, and the system operates each photovoltaic output device according to that strategy. The application formulates the economic dispatch problem of the microgrid as a Markov decision process and maps the objective function and constraints into the reward and penalty function of reinforcement learning, so that the optimal decision is obtained through the agent's ability to learn and interact with the environment. By modeling the uncertainty of the photovoltaic generation output in the learning environment with a Bayesian neural network, random state transitions are properly accounted for in the Markov decision process, and the convergence rate of the algorithm is significantly improved.

Description

Power grid energy management method and system based on deep expectation Q-learning
Technical Field
The application relates to the technical field of power grid energy management systems, in particular to a power grid energy management method and system based on deep expectation Q-learning.
Background
With the development of renewable-energy generation technology, the penetration of distributed sources such as photovoltaics in the power system keeps increasing, which brings problems and even challenges to the safe and economic operation of the power system. The output of distributed sources such as photovoltaics is uncertain and time-varying, being influenced by surrounding environmental factors such as weather, and this makes it difficult to draw up dispatch plans. How to model the uncertainty of photovoltaic output properly and solve the resulting problem efficiently is therefore an important issue worthy of research.
For uncertainty modeling, the common methods are stochastic models, fuzzy models, interval-number models and chance-constrained models. The fitting quality of a stochastic model is limited by the chosen family of distribution functions; an interval-number model describes the uncertainty set with interval numbers and avoids risk under extreme conditions, but the resulting strategies are conservative and sacrifice the economy of system operation; a chance-constrained model tries to balance risk minimization against economic benefit by converting the uncertain scheduling model into a deterministic optimization problem.
Since solving an uncertainty optimization model is quite complex, the nonlinear optimization model is usually linearized before being solved; common methods include mixed-integer programming, dynamic programming, stochastic linear programming, improved differential evolution and the moth-flame optimization algorithm. Classical optimization algorithms struggle to find the global optimum of a nonlinear model, while heuristic algorithms generally take a long time. Against this background, a microgrid with high photovoltaic penetration requires more accurate modeling of the photovoltaic generation output and an efficient solution algorithm.
Deep reinforcement learning is a rapidly developing branch of artificial intelligence that adapts automatically to changes in uncertain factors by continually improving its policy through interaction with the environment and feedback learning. Compared with traditional algorithms, a deep reinforcement learning algorithm does not rely on an explicit objective function; it evaluates decision behavior through a reward function instead, and can produce a corresponding control scheme and optimization strategy for different operating requirements and optimization objectives, enabling real-time decision making.
Disclosure of Invention
In order to model the uncertainty of photovoltaic output properly and solve it efficiently, the application provides a power grid energy management method and system based on a deep expected Q reinforcement learning algorithm, realizing real-time energy and economic dispatch of a microgrid.
The application is realized by the following technical scheme:
the scheme provides a power grid energy management method based on a double-deep expected Q-learning network algorithm, which comprises the following steps of:
s1, modeling uncertainty of photovoltaic output of a predicted point based on a Bayesian neural network and obtaining probability distribution of the photovoltaic output;
s2, inputting probability distribution of photovoltaic output into a power grid energy management model based on a double-depth expected Q-learning network algorithm to obtain a corresponding photovoltaic power generation output strategy;
and S3, operating all the photovoltaic output devices according to the photovoltaic power generation output strategy.
The further optimization scheme is that the power grid energy management model establishment process based on the double-depth expected Q-learning network algorithm is as follows:
the method comprises the following steps of T1, only considering an energy storage system as a controllable resource, taking the lowest daily operation cost as an objective function and meeting the operation constraint of a micro-grid, and establishing a power grid energy management model;
T2, modeling the power grid energy management model in T1 as a Markov decision process;
T3, based on the probability distribution of photovoltaic output and considering the random process of state transition, providing a double-deep expected Q-learning network algorithm by modifying the iteration rule of the Q value on the basis of the traditional model-free algorithm, and solving the Markov decision process;
and T4, setting reasonable parameters to ensure convergence of a neural network learning process, and training a neural network based on a double-depth expected Q-learning network algorithm to obtain a power grid energy management model based on the double-depth expected Q-learning network algorithm.
The further optimization scheme is that the specific modeling process of the predicted point photovoltaic output uncertainty based on the Bayesian neural network in the S1 is as follows:
s11, information of decisive factors, persistence influence factors and bursty influence factors of the predicted points is read, and data preprocessing is carried out;
s12, inputting the preprocessed predicted point decisive factor data and the persistence influence factor data into a deep full-connection layer of the Bayesian neural network, and inputting the preprocessed abrupt influence factor data into a probability layer of the Bayesian neural network for modeling;
s13, obtaining photovoltaic output probability distribution of the predicted point after multiple model training.
The further optimization scheme is that the objective function with the lowest daily operation cost in T1 is as follows: the daily operation cost is the sum of the electricity purchasing cost and the operation cost of the energy storage system in the dispatching period, and is expressed as:
wherein: t is the number of scheduling periods; x is x t The amount of electricity x needed to be exchanged with the main grid for period t t And > 0 represents purchasing electricity from the main power grid, and conversely selling electricity to the main power grid; c b,t /c g,t Representing prices for buying/selling electricity from/to the main grid during period t; τ t For the operation cost of the energy storage system in the period t, || + As a positive function.
The further optimization scheme is that the micro-grid operation constraint in the T1 comprises the following steps: power balance constraints, energy storage system operating constraints, and battery state constraints during a scheduling period.
The further optimization scheme is that the specific modeling process of the Markov decision process in T2 comprises the following steps:
constructing a state space by considering the diversity and the necessity of the system variables;
taking charge and discharge of the energy storage system and the action of buying and selling electric quantity to the power grid into consideration to ensure the power balance inside the system to construct an action space;
mapping the objective function into a rewarding decision function;
the discount rate takes a fixed value of 0.9 in calculation;
the state transition probability is expressed as the probability of the photovoltaic output of the next state.
The further optimization scheme is that the specific method of the step T3 is as follows:
introducing an experience playback mechanism on the basis of a reinforcement learning Q-learning algorithm, storing rewards and state updating conditions obtained by each interaction with the environment, and obtaining an approximate Q value after the parameters of the neural network are converged; selecting decoupling actions of the estimated Q network and the target Q network and calculating a target Q value;
a double-deep expected Q-learning network algorithm is provided on the basis of a double-deep Q-learning network, a Bayesian neural network and deep reinforcement learning are combined, a stochastic process of state transition is represented by the Bayesian neural network, and a Q expected value in a stochastic state is utilized to update the Q network.
The further optimization scheme is that the specific process of updating the Q network by using the Q expected value in the random state is as follows:
firstly, selecting an energy storage system scheduling strategy in an estimated Q network;
then, updating the Q value in the target Q network;
simplifying the model and discretizing the probability density function.
The further optimization scheme is that, when setting reasonable parameters in T4 to ensure convergence of the neural network learning process, the experience replay pool, the exploration rate and the learning rate need to be considered.
The application also provides a grid energy management system based on the double-deep expectation Q-learning network algorithm, which comprises:
the probability distribution acquisition device models uncertainty of photovoltaic output of the predicted point based on the Bayesian neural network and acquires probability distribution of the photovoltaic output;
the first modeling device only considers the energy storage system as a controllable resource, takes the lowest daily operation cost as an objective function and meets the operation constraint of the micro-grid, and establishes a power grid energy management model;
the second modeling device models the power grid energy management model into a Markov decision process;
the solving device considers the random process of state transition, proposes a double-depth expected Q-learning network algorithm by modifying the iteration rule of the Q value on the basis of the traditional model-free algorithm, and solves the Markov decision process;
the model training device sets reasonable parameters to ensure convergence of a neural network learning process, trains a neural network based on a double-depth expected Q-learning network algorithm to obtain a power grid energy management model based on the double-depth expected Q-learning network algorithm;
the power grid energy management system controls all photovoltaic output devices based on a photovoltaic power generation output strategy obtained by a power grid energy management model of a double-depth expected Q-learning network algorithm.
The principle of the application is as follows:
1. modeling the uncertainty of the photovoltaic output of the predicted point based on the Bayesian neural network and obtaining probability distribution of the photovoltaic output;
the Bayesian neural network can obtain a relatively stable prediction model according to a relatively small data volume, so that the fitting problem can not occur; meanwhile, the weights and the biases of the neurons of the probability layer obey a certain probability distribution, and the capability of describing uncertainty variables is provided. The method is characterized in that the photovoltaic output prediction based on the Bayesian neural network needs to analyze various influencing factors, the factors influencing the photovoltaic output are of various types, and the step is to model the photovoltaic output in a classification way:
(1) Decisive factor
The intensity of the illumination radiation is a decisive factor influencing the photovoltaic output. The photovoltaic output can be obtained by the following formula.
P_PV = φ·A·η
wherein: φ is the illumination radiation intensity; A is the total area of the photovoltaic array; η is the photoelectric conversion efficiency; A and η are fixed parameters of the photovoltaic panel.
(2) Persistence influencing factor
Persistent influencing factors are quantities such as temperature, relative humidity and wind speed that affect the photovoltaic output over a longer time. The time over which these factors affect the photovoltaic output is usually longer than the dispatch period, so their effect is mined from historical data. Because these data are complex, have high feature dimensionality and are not linearly related to the photovoltaic output, feeding them directly into the network would increase the training difficulty; they are therefore preprocessed by a regression analysis module and a feature extraction module. First, their Pearson coefficients with the photovoltaic output are computed to quantify the interdependence of temperature, wind speed, relative humidity and photovoltaic output. The correlation coefficient of a persistent influencing factor with the photovoltaic output also depends on the prediction time interval, so the correlation coefficients of temperature, relative humidity and wind speed with the photovoltaic output in different periods are learned from the historical data. Finally, the multidimensional features are mapped to a low dimension by a deep fully connected neural layer, which preserves the integrity of the features while reducing model complexity and improving training efficiency.
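As a rough illustration of the regression-analysis step, the sketch below computes the Pearson coefficient of each persistent factor with the photovoltaic output for one period of the day; the dictionary layout and factor names are assumptions made for the example, not part of the patent.

```python
import numpy as np

def pearson_with_pv(history, period):
    """Pearson correlation of each persistent factor with PV output in one period.

    `history` maps names ("temperature", "humidity", "wind_speed", "pv") to
    arrays of shape (n_days, n_periods); the layout is assumed for illustration.
    """
    pv = history["pv"][:, period]
    coeffs = {}
    for name in ("temperature", "humidity", "wind_speed"):
        x = history[name][:, period]
        # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
        coeffs[name] = float(np.corrcoef(x, pv)[0, 1])
    return coeffs
```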
(3) Factors of bursty influence
Sudden influencing factors, such as haze or moving cloud cover, affect the photovoltaic output over a short time, generally shorter than the dispatch period. Their influence on the photovoltaic output only shows up between adjacent periods; that is, the photovoltaic output at the prediction point is related to the photovoltaic output immediately before it, and the correlation with the output of the period just preceding the prediction point is the highest. Therefore only the output data of the period immediately before the prediction point are fed into the Bayesian neural network, which avoids the data redundancy that multi-period inputs would cause.
The temperature, wind-speed and relative-humidity data obtained from the regression analysis are fed into a deep fully connected layer for feature extraction and dimensionality reduction; the maximum photovoltaic output prediction and the extracted features are then fed into the deep fully connected layer together and, jointly with the photovoltaic output of the period immediately preceding the prediction point, serve as the input of the probabilistic layer of the Bayesian neural network.
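A minimal PyTorch sketch of a network of this shape is given below. The layer widths follow the figures reported later in the embodiment (30, 50 and 55 neurons); the use of a reparameterized Gaussian linear layer as the probabilistic layer, and all class and argument names, are assumptions made for illustration rather than the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Linear layer whose weights and biases are Gaussian, sampled on every forward pass."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -3.0))

    def forward(self, x):
        w = self.w_mu + F.softplus(self.w_rho) * torch.randn_like(self.w_mu)  # sample weights
        b = self.b_mu + F.softplus(self.b_rho) * torch.randn_like(self.b_mu)  # sample biases
        return F.linear(x, w, b)

class PVBayesNet(nn.Module):
    """Fully connected feature extraction for persistent factors + probabilistic output layer."""
    def __init__(self, n_factors=3):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(n_factors, 30), nn.ReLU(),
                                      nn.Linear(30, 50), nn.ReLU())
        # probabilistic layer input: extracted features + deterministic PV estimate
        # + photovoltaic output of the period preceding the prediction point
        self.prob = BayesLinear(50 + 2, 55)
        self.head = nn.Linear(55, 1)

    def forward(self, factors, pv_det, pv_prev):
        h = self.features(factors)
        z = torch.cat([h, pv_det, pv_prev], dim=-1)
        return self.head(torch.relu(self.prob(z)))

# Repeated stochastic forward passes approximate the predictive distribution:
# net = PVBayesNet(); samples = torch.stack([net(f, p, q) for _ in range(100)])
```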
2. Only considering the energy storage system as a controllable resource, taking the lowest daily operation cost as an objective function and meeting the operation constraint of the micro-grid, and establishing a power grid energy management model;
as shown in fig. 3, controllable devices in a microgrid generally include energy storage systems, controllable loads, electric vehicles involved in scheduling, and the like. The application focuses on the modeling and solving of the micro-grid random scheduling based on deep reinforcement learning, so that only the energy storage system is considered as a controllable resource. For scenes containing other controllable devices, only the dimension of the action in the Markov decision process needs to be changed on the basis of the model of the application:
(1) Objective function
And taking the lowest daily operation cost as an objective function, and solving an energy management strategy of the micro-grid. The daily operation cost is the sum of the electricity purchase cost and the operation cost of the energy storage system in the dispatching period, and can be defined as follows:
wherein: t is the number of scheduling periods; x is x t The amount of electricity x needed to be exchanged with the main grid for period t t And > 0 represents purchasing electricity from the main power grid, and conversely selling electricity to the main power grid; c b,t /c g,t Representing prices for buying/selling electricity from/to the main grid during period t; τ t And the operation cost of the energy storage system is t time period. I. + As a positive function.
(2) Constraint conditions
1) Power balance constraint
x_t - P_t^L + P_t^PV - P_t^ESS = 0
wherein: P_t^PV is the photovoltaic generation output at time t, a random variable; P_t^ESS is the power of the energy storage battery at time t, with P_t^ESS > 0 meaning the energy storage system is charging and P_t^ESS < 0 meaning it is discharging; P_t^L is the load power at time t.
2) Energy storage system operation constraints
β_min < β_t < β_max
β_{t+1} = β_t + η_c·P_t^ch·Δt - (P_t^dis/η_d)·Δt
wherein: β_t is the state of charge of the energy storage system at time t, and β_min and β_max are the minimum and maximum values allowed for the state of charge; P_t^ch and P_t^dis are the charging and discharging power of the energy storage system; η_c and η_d are the charging and discharging efficiencies of the energy storage system; P_max^ch and P_max^dis are the maximum charging and discharging power of the energy storage system.
Because of the lifetime degradation and capacity fade of the energy storage system, its per-kWh cost has to be considered in optimization and scheduling. The per-kWh cost is the storage cost obtained by levelizing the whole-life-cycle cost of the energy storage system over the energy it delivers. Defining the per-kWh cost of the energy storage system in operation as λ, the operating cost of the energy storage system in period t can be expressed as:
τ_t = λ·|P_t^ESS|
3) Battery state constraints during scheduling periods
β_0 = β_T
wherein: β_T is the state of charge of the energy storage system at the end of the scheduling period, and β_0 is the state of charge at the beginning of the scheduling period.
The model targets a small microgrid, as shown in Fig. 3: all electrical equipment is supplied by the same distribution feeder and is geographically close together, so power-flow constraints do not need to be considered.
In this model, since the photovoltaic output is an uncertain variable, the objective function is an expected value, and the corresponding stochastic optimization model is formulated accordingly.
3. modeling a grid energy management model as a markov decision process;
when the deep reinforcement learning algorithm is adopted to solve the economic dispatch model, firstly, the power grid energy management model needs to be modeled as a Markov decision process:
(1) State space
The state is the set of observable variables. When constructing the state space, both the diversity and the necessity of the system variables are considered: the state at time t comprises the state of charge of the energy storage system in the microgrid, the real-time load power, the real-time photovoltaic generation power and the predicted photovoltaic output of the next period. The uncertain output power of the next period is represented by the probability distribution produced by the Bayesian neural network, so the state at time t can be written as {β_t, P_t^PV, P_t^L} together with that distribution.
(2) Action space
Actions are the adjustable variables. In this model, the power balance inside the system is maintained through the charging and discharging actions of the energy storage system and the buying and selling of electricity to the grid, with the main grid supporting the microgrid to guarantee its internal energy balance. The action at time t is therefore a discrete set in which the first n elements and the last n elements represent discharging and charging of the energy storage system, respectively.
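One way to realize such a discrete action set, with n discharging levels, one idle action and n charging levels (matching the nine gears described in the embodiment for n = 4), is sketched below; the mapping and its parameters are assumptions for illustration.

```python
def action_to_power(action_index, n_levels=4, p_max=50.0):
    """Map a discrete action index to energy-storage power (charging positive).

    Indices 0 .. n_levels-1          -> discharging at increasing power,
    index   n_levels                 -> no action,
    indices n_levels+1 .. 2*n_levels -> charging at increasing power.
    """
    if action_index < n_levels:                          # discharge gears
        return -(action_index + 1) * p_max / n_levels
    if action_index == n_levels:                         # idle
        return 0.0
    return (action_index - n_levels) * p_max / n_levels  # charge gears
```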
(3) Rewards
In deep reinforcement learning, optimization objectives are mapped into rewards decision functions. According to an objective function modeled by the power grid energy management model, the rewards at the time t are set as follows:
wherein: the first term is the reward for buying/selling electricity from/to the grid, and the second term is the operating reward of the energy storage system. The grid reward is expressed in terms of the electricity bought from and sold to the grid in period t.
The reward of the energy storage system comprises the operation cost τ_t and a penalty υ_t for violating the operating constraints. For the state-of-charge constraint, the violation penalty term υ_t is defined as:
υ_t(s∈ψ, P_t^ESS) = -δ·|P_t^ESS|
wherein: δ is the penalty per unit and can be set to a relatively large number; ψ is the set of violation states during system operation, mainly the state of charge of the energy storage system exceeding its limits. In period t, the violation states can be expressed as:
Δβ > β_max - β_t
Δβ > β_t - β_min
wherein Δβ = η_c·|P_t^ESS|_+·Δt + |-P_t^ESS|_+·Δt/η_d.
For the battery state constraint over the scheduling period, a larger penalty Γ is imposed if the battery state at the end of the period does not equal the initial state. The battery operation reward for period t, and the total reward within one scheduling period, are defined accordingly.
(4) State transition probability and discount rate
In the Markov decision process, the discount rate expresses how much weight is given to future rewards; a fixed value of 0.9 is used in the calculations. After the state s and the action a have been selected, the state of charge of the energy storage system in the next state follows from the operating constraints of the energy storage system and the real-time load power can be read directly, so the state transition probability reduces to the probability of the photovoltaic output of the next state.
4. Considering a state transition random process, providing a double-depth expected Q-learning network algorithm by modifying an iteration rule of a Q value on the basis of a traditional model-free algorithm, and solving a Markov decision process;
the model-free reinforcement learning algorithm obtains a single fixed state transition process according to the interaction of the intelligent agent and the environment, and ignores the random problem of state transition in the learning environment. When the state variable of the reinforcement learning contains uncertain factors, the random transition of the neglected state can influence the convergence speed of the deep reinforcement learning algorithm, so the application provides a double-deep expected Q-learning network algorithm. The dual-deep expectation Q-learning network algorithm combines Bayesian neural networks with deep reinforcement learning, and updates the Q network with the Q expectation in the random state by representing the random process of state transition with the Bayesian neural network. The flow of the algorithm is shown in fig. 4;
(1) Q-learning algorithm in reinforcement learning
Based on the state s, the agent selects an action a with the ε-greedy method, obtains the reward r(s, a), and after entering the state s' updates the value function Q(s, a); here ε is the exploration probability and γ is the attenuation (discount) factor.
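For reference, a standard tabular Q-learning update with ε-greedy action selection looks like the sketch below; the learning rate alpha is a standard ingredient of the update that the text above does not spell out, and all names are illustrative.

```python
import random

def epsilon_greedy(Q, s, actions, eps):
    """Explore with probability eps, otherwise act greedily on the current Q table."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update of the value function Q(s, a)."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
```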
(2) Dual deep Q learning network (double deep Q network, DDQN) algorithm
An experience replay mechanism is introduced to store the reward and state update obtained from each interaction with the environment, and an approximate Q value is obtained once the neural network parameters have converged. Because this Q value is often overestimated, the overestimation is avoided by decoupling action selection in the estimated Q network from the calculation of the target Q value in the target Q network.
The specific algorithm can be expressed as follows:
Q(s, a; θ_t) = r(s, a) + γ·Q(s', a; θ_t)
wherein: θ_e are the parameters of the estimated Q network and θ_t the parameters of the target Q network, and the action a in the target term is the one selected by the estimated Q network. Every fixed number of training steps, the parameters of the estimated Q network are copied to the target Q network, i.e. θ_t ← θ_e.
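A minimal PyTorch sketch of this decoupled target: the action is chosen with the estimated (online) network while its value is read from the target network; tensor shapes and names are assumptions.

```python
import torch

@torch.no_grad()
def ddqn_target(q_est, q_tgt, reward, next_state, gamma=0.9):
    """Double-DQN target: select the action with q_est, evaluate it with q_tgt."""
    a_star = q_est(next_state).argmax(dim=1, keepdim=True)   # action selection (online net)
    q_next = q_tgt(next_state).gather(1, a_star).squeeze(1)  # action evaluation (target net)
    return reward + gamma * q_next

def sync_target(q_est, q_tgt):
    """theta_t <- theta_e: copy the online parameters into the target network."""
    q_tgt.load_state_dict(q_est.state_dict())
```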
(3) Dual-deep expectation Q-learning network (double deep expected Q network, DDEQN) algorithm
Based on DDQN, a DDEQN algorithm is provided, a Bayesian neural network is combined with deep reinforcement learning, a stochastic process of state transition is represented by the Bayesian neural network, and a Q expected value in a stochastic state is utilized to update the Q network.
First, an energy storage system scheduling policy is selected in an estimated Q network:
then, the Q value is updated in the target Q network, and the calculation formula is as follows:
Q(s, a; θ_t) = r(s, a) + γ·E(Q(s', a; θ_t))
wherein E(Q(s', a; θ_t)) is the expected target Q value of action a over the next state s'. In period t, the Bayesian neural network predicts the photovoltaic output of the next period, whose probability density function is ρ(s'), so E(Q(s', a; θ_t)) is the expectation of Q(s', a; θ_t) under ρ(s').
To simplify the model, the probability density function is discretized: the prediction of the Bayesian neural network is sampled and 2m intervals are formed between the minimum and maximum sampled values, each interval being represented by its left endpoint. After repeated sampling the probability of each interval is estimated, and the expected Q value is computed as the probability-weighted sum of the Q values at these discrete points.
The action-value function can then be rewritten in this discretized form.
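The resulting expected target can be approximated as a probability-weighted sum over the 2m discretized photovoltaic-output points, as in this PyTorch sketch; tensor shapes and names are assumptions for illustration.

```python
import torch

@torch.no_grad()
def ddeqn_target(q_est, q_tgt, reward, next_states, probs, gamma=0.9):
    """Expected double-DQN target over a discretized next-state distribution.

    next_states: tensor (batch, 2m, state_dim), one candidate state per PV interval
    probs:       tensor (batch, 2m), estimated probability of each interval
    """
    b, k, d = next_states.shape
    flat = next_states.reshape(b * k, d)
    a_star = q_est(flat).argmax(dim=1, keepdim=True)       # select action with the online net
    q_next = q_tgt(flat).gather(1, a_star).reshape(b, k)   # evaluate it with the target net
    expected_q = (probs * q_next).sum(dim=1)               # E[Q(s', a*)] over the 2m intervals
    return reward + gamma * expected_q
```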
5. setting reasonable parameters to ensure convergence of a neural network learning process, and training a neural network based on a double-deep expected Q-learning network algorithm to obtain a power grid energy management model based on the double-deep expected Q-learning network algorithm.
The experience replay pool, the exploration rate and the learning rate all affect the convergence of the neural network during training, so these parameters must be set reasonably to ensure that the learning process converges.
(1) Experience replay pool: experience replay mainly removes the correlation between experience samples, and training proceeds by randomly sampling from the stored set of past state transitions. Because the model has a relatively large action set, a larger replay pool has to be provided so that the mini-batches drawn during training are sufficiently diverse and comprehensive.
(2) Exploration rate: a fixed ε in the ε-greedy method can prevent the neural network training from converging in its later stages. The application lets ε decrease gradually with the number of training steps while exploring the environment, which yields better convergence.
(3) Learning rate: too high a learning rate can cause overfitting, while too low a learning rate makes convergence slow or even stagnant, so a suitable learning rate has to be found through repeated trials. Since the parameters of the target Q network are copied from the estimated Q network, an appropriate copy frequency must also be set to avoid overestimation.
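As one concrete illustration, the parameter values reported later in the embodiment (replay pool of 4800 samples, mini-batches of 600, ε decayed from 0.1 to 0.001 over 24000 steps, learning rate 0.001, target network updated every 10 training iterations) could be collected in a configuration like the following sketch; the linear decay schedule and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    replay_size: int = 4800    # experience replay pool capacity
    batch_size: int = 600      # mini-batch size for random sampling
    eps_start: float = 0.1     # initial exploration rate
    eps_end: float = 0.001     # final exploration rate
    eps_steps: int = 24000     # number of steps over which epsilon decays
    lr: float = 0.001          # learning rate (Adam)
    target_update: int = 10    # copy theta_e -> theta_t every this many trainings

    def epsilon(self, step: int) -> float:
        """Exploration rate decayed linearly from eps_start to eps_end."""
        frac = min(step / self.eps_steps, 1.0)
        return self.eps_start + frac * (self.eps_end - self.eps_start)
```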
And finally, controlling the operation of each photovoltaic output device based on the photovoltaic power generation output strategy obtained in the power grid energy management model of the double-depth expected Q-learning network algorithm.
Compared with the prior art, the application has the following advantages and beneficial effects:
the application provides a power grid energy management method and a system based on a double-depth expected Q-learning network algorithm, which are used for simulating a micro-grid economic scheduling problem into a Markov decision process, mapping an objective function and constraint conditions into a reinforced learning reward and punishment function, and realizing real-time optimal decision by utilizing the learning and environment interaction capabilities of the objective function and constraint conditions; uncertainty modeling of photovoltaic power generation output in a learning environment is conducted through a Bayesian neural network, state random transfer is properly considered in a Markov decision process, and the convergence rate of an algorithm is remarkably improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application.
FIG. 1 is a flow chart of the method of the present application;
FIG. 2 is a schematic diagram of a photovoltaic output prediction flow based on a Bayesian neural network;
FIG. 3 is a schematic diagram of a microgrid system composition;
FIG. 4 is a neural network training flow diagram of an algorithm;
FIG. 5 is a graph of park photovoltaic output;
FIG. 6 is a plot of campus load;
FIG. 7 is a graph of typical solar farm photovoltaic output versus load;
FIG. 8 is a graph comparing predicted results with actual values of photovoltaic output in different days;
FIG. 9 is a comparison of mode one and mode two convergence behavior;
fig. 10 is a state of charge diagram of the energy storage system in three modes.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present application, the present application will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present application and the descriptions thereof are for illustrating the present application only and are not to be construed as limiting the present application.
Example 1
As shown in fig. 1, a grid energy management method based on a dual-deep expectation Q-learning network algorithm includes the following steps:
s1, modeling uncertainty of photovoltaic output of a predicted point based on a Bayesian neural network and obtaining probability distribution of the photovoltaic output;
s2, inputting probability distribution of photovoltaic output into a power grid energy management model based on a double-depth expected Q-learning network algorithm to obtain a corresponding photovoltaic power generation output strategy;
and S3, operating all the photovoltaic output devices according to the photovoltaic power generation output strategy.
The further optimization scheme is that the power grid energy management model establishment process based on the double-depth expected Q-learning network algorithm is as follows:
the method comprises the following steps of T1, only considering an energy storage system as a controllable resource, taking the lowest daily operation cost as an objective function and meeting the operation constraint of a micro-grid, and establishing a power grid energy management model;
T2, modeling the power grid energy management model in T1 as a Markov decision process;
T3, based on the probability distribution of photovoltaic output and considering the random process of state transition, providing a double-deep expected Q-learning network algorithm by modifying the iteration rule of the Q value on the basis of the traditional model-free algorithm, and solving the Markov decision process;
and T4, setting reasonable parameters to ensure convergence of a neural network learning process, and training a neural network based on a double-depth expected Q-learning network algorithm to obtain a power grid energy management model based on the double-depth expected Q-learning network algorithm.
The further optimization scheme is that the specific modeling process of the predicted point photovoltaic output uncertainty based on the Bayesian neural network in the S1 is as follows:
s11, information of decisive factors, persistence influence factors and bursty influence factors of the predicted points is read, and data preprocessing is carried out;
s12, inputting the preprocessed predicted point decisive factor data and the persistence influence factor data into a deep full-connection layer of the Bayesian neural network, and inputting the preprocessed abrupt influence factor data into a probability layer of the Bayesian neural network for modeling;
s13, obtaining photovoltaic output probability distribution of the predicted point after multiple model training.
The further optimization scheme is that the objective function with the lowest daily operation cost in T1 is as follows: the daily operation cost is the sum of the electricity purchasing cost and the operation cost of the energy storage system in the dispatching period, and is expressed as:
wherein: t is the number of scheduling periods; x is x t The amount of electricity x needed to be exchanged with the main grid for period t t And > 0 represents purchasing electricity from the main power grid, and conversely selling electricity to the main power grid; c b,t /c g,t Representing prices for buying/selling electricity from/to the main grid during period t; τ t For the operation cost of the energy storage system in the period t, || + As a positive function.
The further optimization scheme is that the micro-grid operation constraint in the T1 comprises the following steps: power balance constraints, energy storage system operating constraints, and battery state constraints during a scheduling period.
The further optimization scheme is that the specific modeling process of the Markov decision process in T2 comprises the following steps:
constructing a state space by considering the diversity and the necessity of the system variables;
taking charge and discharge of the energy storage system and the action of buying and selling electric quantity to the power grid into consideration to ensure the power balance inside the system to construct an action space;
mapping the objective function into a rewarding decision function;
the discount rate takes a fixed value of 0.9 in calculation;
the state transition probability is expressed as the probability of the photovoltaic output of the next state.
The further optimization scheme is that the specific method of the step T3 is as follows:
introducing an experience playback mechanism on the basis of a reinforcement learning Q-learning algorithm, storing rewards and state updating conditions obtained by each interaction with the environment, and obtaining an approximate Q value after the parameters of the neural network are converged; selecting decoupling actions of the estimated Q network and the target Q network and calculating a target Q value;
a double-deep expected Q-learning network algorithm is provided on the basis of a double-deep Q-learning network, a Bayesian neural network and deep reinforcement learning are combined, a stochastic process of state transition is represented by the Bayesian neural network, and a Q expected value in a stochastic state is utilized to update the Q network.
The further optimization scheme is that the specific process of updating the Q network by using the Q expected value in the random state is as follows:
firstly, selecting an energy storage system scheduling strategy in an estimated Q network;
then, updating the Q value in the target Q network;
simplifying the model and discretizing the probability density function.
The further optimization scheme is that, when setting reasonable parameters in T4 to ensure convergence of the neural network learning process, the experience replay pool, the exploration rate and the learning rate need to be considered.
Example 2
The embodiment provides a grid energy management system based on a double-deep expectation Q-learning network algorithm, which comprises:
the probability distribution acquisition device models uncertainty of photovoltaic output of the predicted point based on the Bayesian neural network and acquires probability distribution of the photovoltaic output;
the first modeling device only considers the energy storage system as a controllable resource, takes the lowest daily operation cost as an objective function and meets the operation constraint of the micro-grid, and establishes a power grid energy management model;
the second modeling device models the power grid energy management model into a Markov decision process;
the solving device considers the random process of state transition, proposes a double-depth expected Q-learning network algorithm by modifying the iteration rule of the Q value on the basis of the traditional model-free algorithm, and solves the Markov decision process;
the model training device sets reasonable parameters to ensure convergence of a neural network learning process, trains a neural network based on a double-depth expected Q-learning network algorithm to obtain a power grid energy management model based on the double-depth expected Q-learning network algorithm;
the power grid energy management system controls all photovoltaic output devices based on a photovoltaic power generation output strategy obtained by a power grid energy management model of a double-depth expected Q-learning network algorithm.
Example 3
The practical application of the present application is explained using the photovoltaic output and total load data of a small industrial park from May to December.
Assume the photovoltaic output and load power of the industrial park are as shown in Figs. 5, 6 and 7; the other parameters are listed in Table 1.
TABLE 1 energy storage system parameters
After repeated trials, the sample capacity of the experience replay mechanism in the DDEQN algorithm is set to 4800 and the size of each mini-batch to 600; the initial exploration rate is 0.1, the final exploration rate 0.001, and the number of exploration steps 24000; the learning rate is 0.001; and the target Q network parameters are updated once every 10 training iterations.
The Bayesian neural network program for photovoltaic generation output prediction is written in Python using the PyTorch package, and the DDEQN algorithm program is written on the TensorFlow framework; the optimizer is the Adam algorithm, which adapts the learning rate and therefore converges faster and better. The computer hardware is a Core i7-8550U with 8 GB RAM. The Bayesian neural network is trained for 10000 steps, taking 22 h; the DDEQN neural network is trained for 70000 steps, taking 49 h.
(1) Bayesian neural network training results
In the Bayesian neural network, the fully connected layer used for feature extraction has 30 neurons, the next fully connected layer has 50 neurons, and the probabilistic layer has 55 neurons. Two days, July 10 (sunny) and September 6 (rainy), were selected to verify the prediction results.
As can be seen from Fig. 8, the Bayesian neural network has high prediction accuracy. On the sunny day, the predicted mean of the Bayesian neural network is essentially equal to the actual value, the 95% confidence interval is narrow, and the prediction accuracy is high. On the rainy day, because of the complexity and variability of the surrounding environmental factors, the prediction error of the Bayesian neural network is larger around 6:00, but the predictions at other times remain accurate. Although the prediction accuracy is lower than on the sunny day, the predicted values fully follow the trend of the actual output, and the prediction error is within an acceptable range.
(2) Validity verification of DDEQN algorithm
The following three modes were designed for comparative analysis:
mode one: adopting a DDQN algorithm, taking uncertainty of photovoltaic output into consideration, and randomly extracting a predicted result of the Bayesian neural network as input to train the deep neural network;
mode two: adopting a DDEQN algorithm, namely the algorithm provided by the application;
mode three: a random optimization algorithm based on a scene method and considering uncertainty of photovoltaic output.
The scheduling period is one day, divided into 24 time periods. The photovoltaic output and load demand over a scheduling period are shown in Appendix D. A two-step training method is adopted for the neural network: action-optimization training is first carried out on single periods, and overall training is then carried out over the whole scheduling period, which effectively improves the convergence rate of the algorithm.
After each training step, the neural network is tested: the Q values corresponding to the optimal actions over one scheduling period are accumulated and normalized to represent the degree of convergence of the neural network. Defining Θ as the convergence rate, the convergence rate of the neural network after the i-th training step is the accumulated Q value at step i normalized by Q*,
where Q* is the accumulated Q value after the neural network has converged.
To examine the convergence performance of the proposed method, Q* must also be determined in advance. Repeated tests show that both mode one and mode two converge after 70000 training steps, so Q* is taken as the accumulation of the Q values of the optimal actions over one scheduling period at training step 70000.
The training results for mode one and mode two are shown in fig. 9:
when the convergence rate Θ reaches 0.995, the neural network is considered to converge. As can be taken from fig. 2, the pattern two converges at the step of training 35000, and the pattern one converges at the step of training 67000 or so. Therefore, the DDEQN provided by the application has better convergence performance.
(3) Comparison with random optimization algorithm
To simulate the uncertainty of photovoltaic generation output, 10000 scenarios are sampled from the Bayesian neural network and used as the photovoltaic output scenario set of the stochastic optimization model. For comparability, the stochastic optimization algorithm uses the same energy-storage charge/discharge power levels as the deep reinforcement learning algorithms. As can be seen from Table 2, the deep reinforcement learning algorithms adapt better to the uncertainty of photovoltaic output and their results are more economical than those of the stochastic optimization algorithm. In addition, compared with the conventional DDQN algorithm, the proposed DDEQN algorithm achieves a lower operating cost, mainly because of its better convergence performance.
TABLE 2 comparison of the economics of the different modes
In order to further analyze and compare the economy of the deep reinforcement learning algorithm and the random optimization algorithm, the charge and discharge strategies of the energy storage system in one scheduling period in three modes are compared with the charge state. The comparison result is shown in FIG. 10.
In Fig. 10, gears 0 to 3 represent four levels of discharging of the energy storage system, 4 means the energy storage system is idle, and 5 to 8 represent four levels of charging. As can be seen from Fig. 10, the operating strategies of the energy storage system under the DDQN and DDEQN algorithms are very similar; this is because the DDEQN algorithm of the present application builds on the DDQN algorithm and only adds modeling of the state transition to accelerate convergence, so the two algorithms converge to the same point.
The foregoing description of the embodiments has been provided to illustrate the general principles of the application; it is not meant to limit the scope of the application nor to limit the application to the particular embodiments, and any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the application are intended to be included within its scope.

Claims (5)

1. A deep expectation Q-learning based grid energy management method, comprising the steps of:
s1, modeling uncertainty of photovoltaic output of a predicted point based on a Bayesian neural network and obtaining probability distribution of the photovoltaic output;
the modeling of the uncertainty of the photovoltaic output of the predicted point based on the Bayesian neural network comprises the following specific processes:
s11, information of decisive factors, persistence influence factors and bursty influence factors of the predicted points is read, and data preprocessing is carried out;
s12, inputting the preprocessed predicted point decisive factor data and the persistence influence factor data into a deep full-connection layer of the Bayesian neural network, and inputting the preprocessed abrupt influence factor data into a probability layer of the Bayesian neural network for modeling;
s13, obtaining photovoltaic output probability distribution of the predicted point after multiple model training;
s2, inputting probability distribution of photovoltaic output into a power grid energy management model based on a double-depth expected Q-learning network algorithm to obtain a corresponding photovoltaic power generation output strategy;
s3, the system operates all the photovoltaic output devices according to a photovoltaic power generation output strategy;
the power grid energy management model establishment process based on the double-depth expected Q-learning network algorithm comprises the following steps:
the method comprises the following steps of T1, only considering an energy storage system as a controllable resource, taking the lowest daily operation cost as an objective function and meeting the operation constraint of a micro-grid, and establishing a power grid energy management model; the objective function with the lowest daily running cost in T1 is as follows: the daily operation cost is the sum of the electricity purchasing cost and the operation cost of the energy storage system in the dispatching period, and is expressed as:
wherein: t is the number of scheduling periods; x is x t The amount of electricity x needed to be exchanged with the main grid for period t t And > 0 represents purchasing electricity from the main power grid, and conversely selling electricity to the main power grid; c b,t Representing the price of buying electricity from a main power grid in the period t; c g,t Representing the price of selling electricity to the main power grid in the period t; τ t For the operation cost of the energy storage system in the period t, || + Taking a positive function;
the microgrid operational constraints include: power balance constraints, energy storage system operating constraints, and battery state constraints during a scheduling period
T2, modeling the power grid energy management model in the T1 as a Markov decision process;
the specific modeling process of the Markov decision process includes (an illustrative environment sketch is given after claim 1):
constructing a state space in consideration of the diversity and necessity of the system variables;
constructing an action space from the charging and discharging actions of the energy storage system and the amount of electricity bought from or sold to the main grid, so as to ensure the power balance within the system;
mapping the objective function into a reward function;
the discount factor is fixed at 0.9 in the calculation;
the state transition probability is expressed as the probability of the photovoltaic output in the next state;
T3, based on the probability distribution of the photovoltaic output and considering the stochastic process of state transition, proposing a double deep expected Q-learning network algorithm by modifying the Q-value iteration rule of the traditional model-free algorithm, and solving the Markov decision process;
and T4, setting reasonable parameters to ensure convergence of the neural network learning process, and training the neural network with the double deep expected Q-learning network algorithm to obtain the power grid energy management model based on the double deep expected Q-learning network algorithm.
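As an illustration only (not the patented implementation), the hybrid Bayesian neural network described in steps S11-S13 could be sketched in Python/PyTorch as follows: decisive and persistent influencing factors feed deep fully connected layers, abrupt influencing factors feed a probabilistic (weight-sampling) layer, and repeated stochastic forward passes yield the photovoltaic output distribution. All class, function and parameter names here are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Probabilistic layer: weights are drawn from a learned Gaussian
    (reparameterisation trick), so every forward pass is stochastic."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -3.0))

    def forward(self, x):
        w = self.w_mu + F.softplus(self.w_rho) * torch.randn_like(self.w_mu)
        b = self.b_mu + F.softplus(self.b_rho) * torch.randn_like(self.b_mu)
        return F.linear(x, w, b)

class PVUncertaintyNet(nn.Module):
    """Decisive/persistent factors -> deep fully connected layers;
    abrupt factors -> probabilistic layer; head predicts PV output."""
    def __init__(self, n_factor, n_abrupt, hidden=64):
        super().__init__()
        self.factor_branch = nn.Sequential(
            nn.Linear(n_factor, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.abrupt_branch = BayesianLinear(n_abrupt, hidden)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(2 * hidden, 1))

    def forward(self, x_factor, x_abrupt):
        h = torch.cat([self.factor_branch(x_factor),
                       self.abrupt_branch(x_abrupt)], dim=-1)
        return self.head(h)

def pv_output_distribution(model, x_factor, x_abrupt, n_samples=200):
    """Monte Carlo forward passes give an empirical PV output distribution."""
    with torch.no_grad():
        samples = torch.stack([model(x_factor, x_abrupt) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)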
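Likewise, a minimal sketch of the Markov decision process of step T2, under assumed state and action definitions: the state combines the period, the battery state and the sampled photovoltaic output, the action is the storage charge/discharge power, the reward is the negative period cost, and the discount factor is the fixed 0.9 of the claim. The wear-cost coefficient, state bounds and capacity are assumptions, not values from the patent.

import numpy as np

GAMMA = 0.9  # discount factor fixed at 0.9, as in claim 1

class MicrogridMDP:
    """Illustrative MDP: state = (period, battery state, PV output),
    action = storage charge/discharge power, reward = negative period cost."""
    def __init__(self, buy_price, sell_price, pv_sampler, load,
                 soc_min=0.1, soc_max=0.9, capacity=100.0):
        self.buy_price, self.sell_price = buy_price, sell_price
        self.pv_sampler = pv_sampler        # t -> sampled PV output (kW)
        self.load = load                    # t -> load demand (kW)
        self.soc_min, self.soc_max, self.capacity = soc_min, soc_max, capacity

    def reset(self):
        self.t, self.soc = 0, 0.5
        self.pv = self.pv_sampler(self.t)
        return np.array([self.t, self.soc, self.pv])

    def step(self, p_batt):
        """p_batt > 0: charge, p_batt < 0: discharge (kW over one period)."""
        # battery state constraint
        self.soc = np.clip(self.soc + p_batt / self.capacity,
                           self.soc_min, self.soc_max)
        # power balance: the residual demand is exchanged with the main grid
        x_t = self.load(self.t) + p_batt - self.pv
        cost = (self.buy_price[self.t] * max(x_t, 0.0)
                - self.sell_price[self.t] * max(-x_t, 0.0)
                + 0.01 * abs(p_batt))       # assumed storage operation cost tau_t
        self.t += 1
        done = self.t >= len(self.buy_price)
        self.pv = 0.0 if done else self.pv_sampler(self.t)  # stochastic transition
        return np.array([self.t, self.soc, self.pv]), -cost, done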
2. The deep expectation Q-learning based power grid energy management method according to claim 1, wherein the specific method of step T3 is as follows:
introducing an experience replay mechanism on the basis of the reinforcement learning Q-learning algorithm, storing the reward and state update obtained from each interaction with the environment, and obtaining an approximate Q value after the neural network parameters have converged; decoupling action selection and target Q-value calculation between the estimated Q network and the target Q network;
the double deep expected Q-learning network algorithm is proposed on the basis of the double deep Q-learning network: the Bayesian neural network is combined with deep reinforcement learning, the stochastic process of state transition is represented by the Bayesian neural network, and the expected Q value over the stochastic states is used to update the Q network.
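A minimal sketch, assuming PyTorch Q-networks over a discrete action set, of the expected target described in this claim: the estimated (online) network selects the greedy action, the target network evaluates it, and the target is averaged over discretised photovoltaic scenarios for the next state. Function and argument names are hypothetical, not the patent's.

import torch

def expected_double_q_target(online_net, target_net, reward,
                             next_state_scenarios, scenario_probs, gamma=0.9):
    """Expected double-DQN target:
    y = r + gamma * sum_k p_k * Q_target(s'_k, argmax_a Q_online(s'_k, a)),
    where the scenarios s'_k discretise the photovoltaic distribution."""
    expected_q = 0.0
    for s_next, prob in zip(next_state_scenarios, scenario_probs):
        a_star = torch.argmax(online_net(s_next), dim=-1, keepdim=True)  # select
        q_eval = target_net(s_next).gather(-1, a_star).squeeze(-1)       # evaluate
        expected_q = expected_q + prob * q_eval
    return reward + gamma * expected_q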
3. The deep expectation Q-learning based power grid energy management method according to claim 2, wherein the specific process of updating the Q network with the expected Q value over the stochastic states is as follows:
firstly, selecting the energy storage system scheduling strategy with the estimated Q network;
then, updating the Q value in the target Q network;
and simplifying the model and discretizing the probability density function.
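One possible discretisation of the probability density function mentioned above, assuming the Bayesian network's forecast is summarised by a Gaussian mean and standard deviation; the bin count and truncation width are arbitrary choices for the sketch, not values from the patent.

import numpy as np
from scipy.stats import norm

def discretise_pv_distribution(mu, sigma, n_bins=7, width=3.0):
    """Split a Gaussian PV-output forecast into n_bins representative
    values (bin centres) with their probabilities (bin masses)."""
    edges = np.linspace(mu - width * sigma, mu + width * sigma, n_bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    probs = norm.cdf(edges[1:], mu, sigma) - norm.cdf(edges[:-1], mu, sigma)
    probs = probs / probs.sum()   # renormalise the truncated tails
    return centres, probs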
4. The deep expectation Q-learning based power grid energy management method according to claim 1, wherein, when reasonable parameters are set in T4 to ensure convergence of the neural network learning process, the experience replay pool, the exploration rate and the learning rate need to be considered.
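An illustrative parameter set of the kind this claim refers to; every numeric value below is an assumption for the sketch, not a value disclosed in the patent (only the 0.9 discount factor comes from claim 1).

from dataclasses import dataclass

@dataclass
class TrainingConfig:
    replay_pool_size: int = 10_000    # capacity of the experience replay pool
    batch_size: int = 64              # transitions sampled per update
    epsilon_start: float = 1.0        # initial exploration rate (epsilon-greedy)
    epsilon_end: float = 0.05         # floor of the exploration rate
    epsilon_decay: float = 0.995      # multiplicative decay per episode
    learning_rate: float = 1e-3       # optimiser step size
    gamma: float = 0.9                # discount factor, as fixed in claim 1
    target_update_period: int = 200   # steps between target-network syncs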
5. A deep expectation Q-learning based power grid energy management system for implementing the deep expectation Q-learning based power grid energy management method of any one of claims 1-4, comprising:
a probability distribution acquisition device, which models the uncertainty of the photovoltaic output at the prediction point based on the Bayesian neural network and acquires the probability distribution of the photovoltaic output;
a first modeling device, which considers only the energy storage system as a controllable resource, takes the lowest daily operation cost as the objective function while satisfying the micro-grid operation constraints, and establishes the power grid energy management model;
a second modeling device, which models the power grid energy management model as a Markov decision process;
a solving device, which considers the stochastic process of state transition, proposes the double deep expected Q-learning network algorithm by modifying the Q-value iteration rule of the traditional model-free algorithm, and solves the Markov decision process;
a model training device, which sets reasonable parameters to ensure convergence of the neural network learning process and trains the neural network with the double deep expected Q-learning network algorithm to obtain the power grid energy management model based on the double deep expected Q-learning network algorithm;
wherein the power grid energy management system controls all photovoltaic output devices according to the photovoltaic power generation output strategy obtained from the power grid energy management model based on the double deep expected Q-learning network algorithm (an illustrative composition sketch follows this claim).
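Purely as an illustration of the decomposition in claim 5 (all class and method names below are hypothetical and not part of the patent), the five devices could be composed as follows.

class GridEnergyManagementSystem:
    """Hypothetical wiring of the five devices recited in claim 5."""
    def __init__(self, distribution_device, first_modeling_device,
                 second_modeling_device, solving_device, training_device,
                 pv_devices):
        self.distribution_device = distribution_device        # Bayesian NN PV distribution
        self.first_modeling_device = first_modeling_device    # cost objective + constraints
        self.second_modeling_device = second_modeling_device  # Markov decision process
        self.solving_device = solving_device                  # double deep expected Q-learning
        self.training_device = training_device                # parameter setting + training
        self.pv_devices = pv_devices                          # controlled photovoltaic devices

    def run(self, measurements):
        pv_distribution = self.distribution_device.predict(measurements)
        cost_model = self.first_modeling_device.build()
        mdp = self.second_modeling_device.to_mdp(cost_model, pv_distribution)
        policy = self.training_device.train(self.solving_device, mdp)
        strategy = policy.dispatch(pv_distribution)
        for device in self.pv_devices:
            device.apply(strategy)     # operate every PV output device per the strategy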
CN202011418334.2A 2020-12-07 2020-12-07 Power grid energy management method and system based on deep expectation Q-learning Active CN112614009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011418334.2A CN112614009B (en) 2020-12-07 2020-12-07 Power grid energy management method and system based on deep expectation Q-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011418334.2A CN112614009B (en) 2020-12-07 2020-12-07 Power grid energy management method and system based on deep expectation Q-learning

Publications (2)

Publication Number Publication Date
CN112614009A CN112614009A (en) 2021-04-06
CN112614009B true CN112614009B (en) 2023-08-25

Family

ID=75229451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011418334.2A Active CN112614009B (en) 2020-12-07 2020-12-07 Power grid energy management method and system based on deep expectation Q-learning

Country Status (1)

Country Link
CN (1) CN112614009B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113110052B (en) * 2021-04-15 2022-07-26 浙大宁波理工学院 Hybrid energy management method based on neural network and reinforcement learning
CN113139682B (en) * 2021-04-15 2023-10-10 北京工业大学 Micro-grid energy management method based on deep reinforcement learning
CN113098007B (en) * 2021-04-25 2022-04-08 山东大学 Distributed online micro-grid scheduling method and system based on layered reinforcement learning
CN113141017B (en) * 2021-04-29 2022-08-09 福州大学 Control method for energy storage system to participate in primary frequency modulation of power grid based on DDPG algorithm and SOC recovery
CN113572157B (en) * 2021-07-27 2023-08-29 东南大学 User real-time autonomous energy management optimization method based on near-end policy optimization
CN113885330B (en) * 2021-10-26 2022-06-17 哈尔滨工业大学 Information physical system safety control method based on deep reinforcement learning
CN114280491B (en) * 2021-12-23 2024-01-05 中山大学 Retired battery residual capacity estimation method based on active learning
CN114172840B (en) * 2022-01-17 2022-09-30 河海大学 Multi-microgrid system energy routing method based on graph theory and deep reinforcement learning
CN114938372B (en) * 2022-05-20 2023-04-18 天津大学 Federal learning-based micro-grid group request dynamic migration scheduling method and device
CN115334165B (en) * 2022-07-11 2023-10-17 西安交通大学 Underwater multi-unmanned platform scheduling method and system based on deep reinforcement learning
CN116388279B (en) * 2023-05-23 2024-01-23 安徽中超光电科技有限公司 Grid-connected control method and control system for solar photovoltaic power generation system
CN117132089B (en) * 2023-10-27 2024-03-08 邯郸欣和电力建设有限公司 Power utilization strategy optimization scheduling method and device
CN117216720B (en) * 2023-11-07 2024-02-23 天津市普迅电力信息技术有限公司 Multi-system data fusion method for distributed photovoltaic active power
CN117613983B (en) * 2024-01-23 2024-04-16 国网冀北电力有限公司 Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067190A (en) * 2017-05-18 2017-08-18 厦门大学 The micro-capacitance sensor power trade method learnt based on deeply
CN108932671A (en) * 2018-06-06 2018-12-04 上海电力学院 A kind of LSTM wind-powered electricity generation load forecasting method joined using depth Q neural network tune
CN109063841A (en) * 2018-08-27 2018-12-21 北京航空航天大学 A kind of failure mechanism intelligent analysis method based on Bayesian network and deep learning algorithm
CN109581282A (en) * 2018-11-06 2019-04-05 宁波大学 Indoor orientation method based on the semi-supervised deep learning of Bayes
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning
CN111461321A (en) * 2020-03-12 2020-07-28 南京理工大学 Improved deep reinforcement learning method and system based on Double DQN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on micro-grid energy storage scheduling strategy based on deep reinforcement learning; Wang Yadong et al.; Renewable Energy Resources (《可再生能源》); 2019-08-31; Vol. 37, No. 8; pp. 1220-1227 *

Also Published As

Publication number Publication date
CN112614009A (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN112614009B (en) Power grid energy management method and system based on deep expectation Q-learning
Atef et al. Assessment of stacked unidirectional and bidirectional long short-term memory networks for electricity load forecasting
Tan et al. Multi-objective energy management of multiple microgrids under random electric vehicle charging
CN112186743B (en) Dynamic power system economic dispatching method based on deep reinforcement learning
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
Cai et al. Wind speed forecasting based on extreme gradient boosting
CN112491094B (en) Hybrid-driven micro-grid energy management method, system and device
CN112217195B (en) Cloud energy storage charging and discharging strategy forming method based on GRU multi-step prediction technology
Huang et al. A control strategy based on deep reinforcement learning under the combined wind-solar storage system
CN116187601B (en) Comprehensive energy system operation optimization method based on load prediction
CN113887141A (en) Micro-grid group operation strategy evolution method based on federal learning
CN114156951B (en) Control optimization method and device of source network load storage system
Fan et al. Multi-objective LSTM ensemble model for household short-term load forecasting
CN115374692A (en) Double-layer optimization scheduling decision method for regional comprehensive energy system
CN114723230A (en) Micro-grid double-layer scheduling method and system for new energy power generation and energy storage
Bartels et al. Influence of hydrogen on grid investments for smart microgrids
Yu et al. Short-term cooling and heating loads forecasting of building district energy system based on data-driven models
CN111313449B (en) Cluster electric vehicle power optimization management method based on machine learning
Fu et al. Predictive control of power demand peak regulation based on deep reinforcement learning
CN115115145B (en) Demand response scheduling method and system for distributed photovoltaic intelligent residence
CN115511218A (en) Intermittent type electrical appliance load prediction method based on multi-task learning and deep learning
CN115169839A (en) Heating load scheduling method based on data-physics-knowledge combined drive
CN115705608A (en) Virtual power plant load sensing method and device
Zandi et al. An automatic learning framework for smart residential communities
CN113326994A (en) Virtual power plant energy collaborative optimization method considering source load storage interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant