CN110598925A - Energy storage in-trading market decision optimization method based on double-Q learning algorithm - Google Patents


Info

Publication number
CN110598925A
Authority
CN
China
Prior art keywords
energy storage
decision
double
learning algorithm
market
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910832395.4A
Other languages
Chinese (zh)
Inventor
余运俊
蔡振奋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.) 2019-09-04
Filing date 2019-09-04
Publication date 2019-12-20
Application filed by Nanchang University
Priority to CN201910832395.4A
Publication of CN110598925A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q 30/0206 Price or cost determination based on market factors
    • G06Q 30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q 30/0226 Incentive systems for frequent usage, e.g. frequent flyer miles programs or point systems
    • G06Q 30/0283 Price estimation or determination
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A decision optimization method for energy storage in trading markets based on a double-Q learning algorithm comprises the following steps: establishing a mathematical model for energy storage decisions in trading markets; describing the energy storage operation as a Markov decision process; performing iterative training on the two price data sets, using real historical market trading price data and the Double-Q learning algorithm, to obtain a trained Q table; and having the energy storage execute the actions in the trained Q table that maximize the decision objective, obtaining the cumulative reward under joint arbitrage. The Double-Q learning algorithm updates the Q table iteratively with two value functions, which reduces the overestimation bias of the Q-learning algorithm, makes the designed arbitrage strategy more stable, and yields higher long-term arbitrage returns for the energy storage. The arbitrage source is not limited to the electricity market; the carbon market is added, so the arbitrage revenue increases significantly.

Description

Energy storage in-trading market decision optimization method based on double-Q learning algorithm
Technical Field
The invention belongs to the technical field of engineering.
Background
With the increasing penetration of renewable resources, and given the high uncertainty of wind and solar generation, balancing supply and demand efficiently has become important. An energy storage system can continuously absorb energy and release it when needed, so it can meet users' demand for electricity, relieve overload on the grid, optimize the configuration of the power system, help maintain stable grid operation, serve different users' power requirements, and complement variable renewable sources; its economic feasibility is therefore receiving increasing attention. One of the most commonly discussed revenue sources for energy storage is real-time price arbitrage: the storage exploits price differences in the real-time electricity market, charging when prices are low and discharging when prices are high to earn a profit.
Because the growing share of intermittent renewable generation causes large price fluctuations in the real-time electricity market, decision-making for energy storage in trading markets has attracted considerable attention from the research community. However, even when price spreads widen, designing a good strategy that captures significant profit is not straightforward. The most obvious approach is to forecast prices, but forecast accuracy is hard to guarantee. Approximate dynamic programming was later used to derive bidding strategies for energy storage without prior knowledge of the price distribution, but such strategies tend to be computationally expensive because of the high dimensionality of the state space. Reinforcement learning is an online learning technique distinct from supervised and unsupervised learning, and the Q-learning algorithm in reinforcement learning provides an energy storage decision strategy within a data-driven framework.
Existing reinforcement-learning-based energy storage decision methods have the following shortcomings: decisions are made from electricity price information alone, so the source of decision information is limited; and the Q-learning algorithm suffers from overestimation, which makes its performance unstable.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a decision optimization method for energy storage in trading markets based on a double-Q learning algorithm.
The invention is realized by the following technical scheme.
The invention relates to a decision optimization method for energy storage in trading markets based on a double-Q learning algorithm, which comprises the following steps:
step 1: establishing a mathematical model for energy storage decisions in the trading markets;
step 2: describing the energy storage operation as a Markov decision process;
step 3: performing iterative training on the two price data sets, using real historical market trading price data and the Double-Q learning algorithm, to obtain a trained Q table;
step 4: the energy storage executes the actions in the trained Q table that maximize the decision objective, obtaining the cumulative reward under joint arbitrage.
Further, the step 1 comprises the following steps:
step 1-1: determining an objective function of an energy storage decision;
step 1-2: determining a stored electricity constraint of the energy storage system;
step 1-3: and determining charge and discharge power constraints of the energy storage system.
Further, the step 2 comprises the following steps:
step 2-1: setting the action of storing energy as a function of price;
step 2-2: determining an energy storage state space;
step 2-3: determining an energy storage action space;
step 2-4: an action reward function is determined.
Further, the step 3 includes the following steps:
step 3-1: determining the state of the energy storage system;
step 3-2: selecting an energy storage action according to an ε-greedy strategy;
step 3-3: randomly selecting one of the two update functions to update the Q-value table; the trained Q-value table is obtained after 3000 iterations.
Compared with the prior art, the invention has the following beneficial effects: (1) the Double-Q learning algorithm updates the Q table iteratively with two value functions, which reduces the overestimation bias of the Q-learning algorithm, makes the designed arbitrage strategy more stable, and yields higher long-term decision returns for the energy storage; (2) the price data are not limited to the electricity market; the carbon market is added, so the cumulative reward increases significantly.
Drawings
FIG. 1 is a block diagram of the decision-making method for energy storage in trading markets.
FIG. 2 is a block diagram of the Markov decision process.
FIG. 3 is a decision flow diagram of the Double-Q learning algorithm.
Detailed Description
The following description will be made with reference to the drawings.
The invention provides a decision optimization method for energy storage in trading markets based on a double-Q learning algorithm. The method uses the double-Q learning algorithm to make decisions in the electricity and carbon markets of a given region so as to maximize the cumulative reward. A flow chart of the method is shown in FIG. 1, and the method specifically comprises the following steps:
step 1: establishing a mathematical model for energy storage decisions in the electricity market and the carbon market;
step 2: describing the energy storage operation as a Markov decision process;
step 3: performing iterative training on the two price data sets, using real historical carbon price and electricity price data of a given market and the Double-Q learning algorithm, to obtain a trained Q table;
step 4: the energy storage executes the actions in the trained Q table that maximize the decision objective, obtaining the cumulative reward under the decisions of the method.
Further, the step 1 comprises the following steps:
step 1-1: determining an objective function of the energy storage combined arbitrage as follows:
step 1-2: determining a stored charge constraint of the energy storage system as:
step 1-3: determining charge and discharge power constraints of the energy storage system:
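The objective function and constraints referenced in steps 1-1 to 1-3 are not reproduced in this text (they appear as equation images in the original publication). Purely for illustration, and not as the patent's actual equations, a typical formulation of joint electricity/carbon arbitrage for a storage unit over a horizon T could be written as:

```latex
% Illustrative only (assumed, not the patent's equations).
% p_t: electricity price, c_t: carbon price, lambda: assumed emission factor,
% g_t / d_t: charging / discharging power, E_t: stored energy, eta_c / eta_d: efficiencies.
\max_{g_t,\, d_t}\ \sum_{t=1}^{T} \big(p_t + \lambda c_t\big)\big(d_t - g_t\big)\,\Delta t
\qquad \text{s.t.} \qquad
E_{t+1} = E_t + \Big(\eta_c\, g_t - \frac{d_t}{\eta_d}\Big)\Delta t,\quad
E_{\min} \le E_t \le E_{\max},\quad
0 \le g_t \le G_{\max},\quad 0 \le d_t \le D_{\max}.
```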
further, to describe the energy storage operation as a markov decision process:
step 2-1: the action of storing energy is set as a function of price:
step 2-2: determining an energy storage state space function;
S=(P,Q)*E (5)
step 2-3: determining the energy storage action-space function;
step 2-4: determining the action reward function. A minimal illustrative sketch of these elements is given below.
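As a concrete illustration of these MDP elements, the following sketch encodes a discretized state (electricity-price bin, carbon-price bin, stored-energy level), a three-valued action set (charge, idle, discharge) and a price-based reward. The discretization, parameter names and reward shape are assumptions made here for illustration; they are not the patent's functions (5)-(7).

```python
# Assumed discretization and storage parameters (illustrative only).
N_PRICE_BINS = 10        # bins for electricity and carbon prices
N_ENERGY_LEVELS = 5      # discrete stored-energy levels
POWER = 1.0              # energy moved per step when charging or discharging
ACTIONS = (-1, 0, 1)     # -1 = charge, 0 = idle, 1 = discharge

def discretize(value, low, high, n_bins):
    """Map a continuous value to an integer bin index in [0, n_bins - 1]."""
    idx = int((value - low) / (high - low) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def state(elec_price, carbon_price, energy, bounds, e_max):
    """State s = (electricity-price bin, carbon-price bin, energy level)."""
    p = discretize(elec_price, *bounds["elec"], N_PRICE_BINS)
    q = discretize(carbon_price, *bounds["carbon"], N_PRICE_BINS)
    e = discretize(energy, 0.0, e_max, N_ENERGY_LEVELS)
    return (p, q, e)

def reward(action, elec_price, carbon_price, energy, e_max):
    """Revenue when discharging, cost when charging; infeasible moves are penalized."""
    if action == 1 and energy < POWER:           # cannot discharge an empty store
        return -1.0
    if action == -1 and energy > e_max - POWER:  # cannot charge a full store
        return -1.0
    return action * (elec_price + carbon_price) * POWER
```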
Further, following the algorithm flowchart of FIG. 3, step 3 includes the following steps:
step 3-1: acquiring historical price data of electricity and carbon market trading for a given region and determining the state S of the energy storage system according to the state-space function (5); unlike the prior art, a second price signal is added to the decision state space, so that decisions are made over two price distributions;
step 3-2: calculating the reward values of all actions in each state according to the reward function (7), which are used to guide the selection of subsequent actions, as shown in Table 1:
TABLE 1
Reward value table   Action A1    Action A2
State S1             R(S1,A1)     R(S1,A2)
State S2             R(S2,A1)     R(S2,A2)
…                    …            …
State Sn             R(Sn,A1)     R(Sn,A2)
Step 3-3: following the ε-greedy strategy, the algorithm selects a random action from the action function (6) with probability ε ∈ (0, 1), and with probability (1 - ε) it selects the action with the largest value in the reward-value table, which keeps the iteration from settling into a local optimum (a sketch of this selection is given below);
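A minimal sketch of this ε-greedy selection is shown below; the default exploration rate and the list-based value lookup are assumptions for illustration.

```python
import random

def epsilon_greedy(values_row, actions=(-1, 0, 1), eps=0.1):
    """With probability eps pick a random action (explore);
    otherwise pick the action with the largest entry in values_row (exploit)."""
    if random.random() < eps:
        return random.choice(actions)
    best = max(range(len(actions)), key=lambda i: values_row[i])
    return actions[best]
```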
step 3-4: Q-learning is a model-free reinforcement learning technique that finds an optimal action-selection policy for an MDP. It learns an action-value function and can ultimately give the desired action based on the current state and the optimal policy. One advantage is that it does not require a model of the environment in order to compare the expected values of actions. However, the max operation in standard Q-learning uses the same values both to select and to evaluate an action, which makes it more likely to pick overestimated values and leads to over-optimistic value estimates. To avoid this, selection and evaluation are decoupled: at each state update, one of the functions (8) and (9) is chosen at random to update the Q-table values, which avoids overestimating action values. After 3000 iterations the Q table shown in Table 2 is obtained, and the action with the largest Q value in the table is executed to obtain the cumulative reward. A minimal sketch of this decoupled update is given after Table 2.
TABLE 2
Q value table   Action A1    Action A2
State S1        Q(S1,A1)     Q(S1,A2)
State S2        Q(S2,A1)     Q(S2,A2)
…               …            …
State Sn        Q(Sn,A1)     Q(Sn,A2)
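The sketch below illustrates the decoupled update described in step 3-4: two Q tables are kept, one is picked at random for each update, the greedy next action is selected with one table and evaluated with the other. This follows the standard double Q-learning rule; the learning rate, discount factor, table representation and function names are assumptions standing in for the patent's update functions (8) and (9).

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95      # assumed learning rate and discount factor
ACTIONS = (-1, 0, 1)          # charge, idle, discharge

# Two independent Q tables, keyed by (state, action).
Q_A = defaultdict(float)
Q_B = defaultdict(float)

def double_q_update(s, a, r, s_next):
    """Randomly update one table; select the greedy next action with one table
    and evaluate it with the other, which reduces overestimation."""
    if random.random() < 0.5:
        a_star = max(ACTIONS, key=lambda x: Q_A[(s_next, x)])   # select with A
        target = r + GAMMA * Q_B[(s_next, a_star)]              # evaluate with B
        Q_A[(s, a)] += ALPHA * (target - Q_A[(s, a)])
    else:
        b_star = max(ACTIONS, key=lambda x: Q_B[(s_next, x)])   # select with B
        target = r + GAMMA * Q_A[(s_next, b_star)]              # evaluate with A
        Q_B[(s, a)] += ALPHA * (target - Q_B[(s, a)])
```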
FIG. 2 is a block diagram of the Markov decision process, whose objective is to find an optimal policy, i.e., a sequence of actions that maximizes the value function. For the state S at each time step, the agent chooses an appropriate action according to the optimal policy. To maximize the cumulative reward of the decision objective, the energy storage operation is described as a Markov decision process and its state, action, policy and reward elements are defined. The charge/discharge decision is defined as a function of the price information, and a double-Q learning strategy is designed to optimally control the real-time decisions of the energy storage system in the electricity and carbon markets.
FIG. 3 is a flow chart of the double-Q learning algorithm, which, after the price data are fed into the state space for training, performs more stably than Q-learning and reduces overestimation. When applied to joint energy storage arbitrage, the method is first initialized; the current state is then determined and an energy storage action is selected with an ε-greedy policy: with probability ε ∈ (0, 1) a random action is chosen, and with probability (1 - ε) the best action is chosen. The key point is the two update functions used by double-Q learning, one of which determines the value produced by the action while the other updates the Q-value table. A compact training loop combining these elements is sketched below.
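Putting the pieces together, a compact training loop over historical electricity and carbon prices might look like the following sketch (3000 iterations, as in step 3-3). It reuses the illustrative state, reward, epsilon_greedy and double_q_update helpers assumed in the sketches above; price ranges and storage capacity are likewise assumed values, not the patent's data.

```python
def train(prices, episodes=3000, e_max=5.0):
    """prices: list of (electricity_price, carbon_price) tuples from historical data."""
    bounds = {"elec": (0.0, 100.0), "carbon": (0.0, 50.0)}  # assumed price ranges
    for _ in range(episodes):
        energy = 0.0
        for t in range(len(prices) - 1):
            p, c = prices[t]
            s = state(p, c, energy, bounds, e_max)
            q_row = [Q_A[(s, a)] + Q_B[(s, a)] for a in ACTIONS]  # act on combined tables
            a = epsilon_greedy(q_row, actions=ACTIONS)
            r = reward(a, p, c, energy, e_max)
            energy = min(max(energy - a * POWER, 0.0), e_max)     # a = -1 charges, a = 1 discharges
            s_next = state(*prices[t + 1], energy, bounds, e_max)
            double_q_update(s, a, r, s_next)
    return Q_A, Q_B
```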

Claims (4)

1. A decision optimization method for energy storage in trading markets based on a double-Q learning algorithm, characterized by comprising the following steps:
step 1: establishing a mathematical model for energy storage decisions in the trading markets;
step 2: describing the energy storage operation as a Markov decision process;
step 3: performing iterative training on the two price data sets, using real historical market trading price data and the Double-Q learning algorithm, to obtain a trained Q table;
step 4: the energy storage executes the actions in the trained Q table that maximize the decision objective, obtaining the cumulative reward under joint arbitrage.
2. The decision optimization method for energy storage in trading markets based on the double-Q learning algorithm as claimed in claim 1, wherein the step 1 comprises the following steps:
step 1-1: determining an objective function of an energy storage decision;
step 1-2: determining a stored electricity constraint of the energy storage system;
step 1-3: and determining charge and discharge power constraints of the energy storage system.
3. The decision optimization method for energy storage in trading markets based on the double-Q learning algorithm as claimed in claim 1, wherein the step 2 comprises the following steps:
step 2-1: setting the action of storing energy as a function of price;
step 2-2: determining an energy storage state space;
step 2-3: determining an energy storage action space;
step 2-4: an action reward function is determined.
4. The decision optimization method for energy storage in trading markets based on the double-Q learning algorithm as claimed in claim 1, wherein the step 3 comprises the following steps:
step 3-1: determining the state of the energy storage system;
step 3-2: selecting an energy storage action according to an ε-greedy strategy;
step 3-3: randomly selecting one of the two update functions to update the Q-value table; the trained Q-value table is obtained after 3000 iterations.
CN201910832395.4A 2019-09-04 2019-09-04 Energy storage in-trading market decision optimization method based on double-Q learning algorithm Pending CN110598925A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910832395.4A CN110598925A (en) 2019-09-04 2019-09-04 Energy storage in-trading market decision optimization method based on double-Q learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910832395.4A CN110598925A (en) 2019-09-04 2019-09-04 Energy storage in-trading market decision optimization method based on double-Q learning algorithm

Publications (1)

Publication Number Publication Date
CN110598925A true CN110598925A (en) 2019-12-20

Family

ID=68857593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910832395.4A Pending CN110598925A (en) 2019-09-04 2019-09-04 Energy storage in-trading market decision optimization method based on double-Q learning algorithm

Country Status (1)

Country Link
CN (1) CN110598925A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529610A (en) * 2020-11-23 2021-03-19 天津大学 End-to-end electric energy trading market user decision method based on reinforcement learning


Similar Documents

Publication Publication Date Title
US11581740B2 (en) Method, system and storage medium for load dispatch optimization for residential microgrid
Qi et al. Optimal configuration of concentrating solar power in multienergy power systems with an improved variational autoencoder
Bakhtiari et al. Predicting the stochastic behavior of uncertainty sources in planning a stand-alone renewable energy-based microgrid using Metropolis–coupled Markov chain Monte Carlo simulation
CN113991742B (en) Distributed photovoltaic double-layer collaborative optimization investment decision-making method for power distribution network
CN114511132A (en) Photovoltaic output short-term prediction method and prediction system
CN112994092B (en) Independent wind-solar storage micro-grid system size planning method based on power prediction
CN117077960A (en) Day-ahead scheduling optimization method for regional comprehensive energy system
Lin et al. Long-term multi-objective optimal scheduling for large cascaded hydro-wind-photovoltaic complementary systems considering short-term peak-shaving demands
Gao et al. A hybrid improved whale optimization algorithm with support vector machine for short-term photovoltaic power prediction
CN115600757A (en) Coordination optimization method and system for offshore wind power sharing energy storage participation spot market trading
CN116581792A (en) Wind-solar energy storage system capacity planning method based on data model driving
Bødal et al. Capacity expansion planning with stochastic rolling horizon dispatch
Luo et al. A cascaded deep learning framework for photovoltaic power forecasting with multi-fidelity inputs
CN108694475B (en) Short-time-scale photovoltaic cell power generation capacity prediction method based on hybrid model
Liu et al. A novel electricity load forecasting based on probabilistic least absolute shrinkage and selection operator-Quantile regression neural network
CN118432039A (en) New energy in-situ digestion capability assessment method considering distributed shared energy storage
CN110598925A (en) Energy storage in-trading market decision optimization method based on double-Q learning algorithm
Huang et al. Power prediction method of distributed photovoltaic digital twin system based on GA-BP
Zhan et al. Comparing model predictive control and reinforcement learning for the optimal operation of building-PV-battery systems
CN116128211A (en) Wind-light-water combined short-term optimization scheduling method based on wind-light uncertainty prediction scene
CN115456286A (en) Short-term photovoltaic power prediction method
Wang et al. Gradient boosting dendritic network for ultra-short-term PV power prediction
Dong et al. Design and optimal scheduling of forecasting-based campus multi-energy complementary energy system
Abdulla et al. Accounting for forecast uncertainty in the optimized operation of energy storage
CN112510678A (en) Time sequence simulation-based power selling company distributed power capacity configuration method

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
RJ01  Rejection of invention patent application after publication (application publication date: 20191220)