CN110598925A - Energy storage in-trading market decision optimization method based on double-Q learning algorithm - Google Patents

Energy storage in-trading market decision optimization method based on double-Q learning algorithm

Info

Publication number
CN110598925A
CN110598925A (application CN201910832395.4A)
Authority
CN
China
Prior art keywords
energy storage
decision
double
learning algorithm
market
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910832395.4A
Other languages
Chinese (zh)
Inventor
余运俊
蔡振奋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang University filed Critical Nanchang University
Priority to CN201910832395.4A
Publication of CN110598925A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0226 Incentive systems for frequent usage, e.g. frequent flyer miles programs or point systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A decision-optimization method for energy storage in a trading market based on a double-Q learning algorithm, comprising the following steps: establishing a mathematical model of energy-storage decision-making in the trading market; describing the energy-storage operation as a Markov decision process; using real historical market transaction price data and the Double-Q learning algorithm to iteratively train on the two data sets, obtaining a trained Q table; and having the energy storage execute, from the trained Q table, the action that maximizes the decision objective, obtaining the cumulative reward under joint arbitrage. The Double-Q learning algorithm of the invention uses two functions to update the Q table iteratively, which reduces the impact of the overestimation problem of the Q-learning algorithm and makes the designed arbitrage strategy more stable, so that the long-term arbitrage revenue of the energy storage is higher. The source of arbitrage is not limited to the electricity market: the carbon market is added, which significantly increases arbitrage revenue.

Description

A decision-optimization method for energy storage in the trading market based on a double-Q learning algorithm

Technical Field

The invention belongs to the field of engineering technology.

Background Art

With the growing penetration of renewable resources, and given the high uncertainty of wind and solar generation, it is important to balance supply and demand effectively. An energy storage system can continuously absorb energy and release it at appropriate times to meet users' large demand for electricity, relieve overload on the power grid, optimize the configuration of the grid system, maintain fully stable grid operation, and satisfy different users' demands for power. As a complement to variable renewable energy, its economic feasibility is receiving increasing attention. One of the most frequently discussed revenue sources for energy storage is real-time price arbitrage, in which storage exploits the spread in real-time electricity market prices, charging when prices are low and discharging when prices are high to earn a profit.

Because the growing share of intermittent renewable generation has made real-time electricity market prices highly volatile, the decision-making of energy storage in trading markets has received great attention from the research community. However, even as price spreads rise, designing a strategy that captures significant profit is not easy. The first method that comes to mind is to forecast prices, but forecast accuracy is hard to guarantee. Approximate dynamic programming has also been used to derive bidding strategies for energy storage without prior knowledge of the price distribution, but such strategies are often computationally expensive because of the high dimensionality of the state space. Reinforcement learning is an online learning technique distinct from supervised and unsupervised learning; the Q-learning algorithm in reinforcement learning provides an energy-storage decision strategy within a data-driven framework.

Existing reinforcement-learning-based energy-storage decision methods have the following drawbacks: decisions are made only on electricity price information, so the source of decision information is single; and the Q-learning algorithm suffers from an overestimation problem, so its performance is unstable.
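For reference, the tabular Q-learning update that the above paragraphs refer to is conventionally written as (standard textbook notation, not the patent's own equation numbering)

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\Big[r_{t+1} + \gamma \max_{a'}Q(s_{t+1},a') - Q(s_t,a_t)\Big].$$

Because the same estimate $Q$ is used both to select the maximizing action and to evaluate it, estimation noise biases the target $\max_{a'}Q(s_{t+1},a')$ upward; this is the overestimation problem mentioned above, and it is what the double-Q variant of the invention addresses.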

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art and to propose a decision-optimization method for energy storage in the trading market based on a double-Q learning algorithm.

The present invention is achieved through the following technical solutions.

The method according to the present invention for optimizing energy-storage decisions in the trading market based on a double-Q learning algorithm comprises the following steps:

Step 1: establish a mathematical model of energy-storage decision-making in the trading market;

Step 2: describe the energy-storage operation as a Markov decision process;

Step 3: using real historical market transaction price data, iteratively train on the two data sets with the Double-Q learning algorithm to obtain the trained Q table;

Step 4: the energy storage executes, from the trained Q table, the action that maximizes the decision objective, obtaining the cumulative reward under joint arbitrage.

Further, step 1 comprises the following steps:

Step 1-1: determine the objective function of the energy-storage decision;

Step 1-2: determine the stored-electricity constraint of the energy storage system;

Step 1-3: determine the charging and discharging power constraints of the energy storage system.

Further, step 2 comprises the following steps:

Step 2-1: set the energy-storage action as a function of price;

Step 2-2: determine the energy-storage state space;

Step 2-3: determine the energy-storage action space;

Step 2-4: determine the action reward function.

Further, step 3 comprises the following steps:

Step 3-1: determine the state of the energy storage system;

Step 3-2: select an energy-storage action according to the ε-greedy strategy;

Step 3-3: randomly select one of the two functions to update the Q-table values; after 3000 iterations the Q-value table is obtained.

Compared with the prior art, the beneficial effects of the present invention are: (1) the Double-Q learning algorithm uses two functions to update the Q table iteratively, which reduces the impact of the overestimation problem of the Q-learning algorithm and makes the designed arbitrage strategy more stable, so that the long-term decision revenue of the energy storage is higher; (2) the price data are not limited to the electricity market: the carbon market is added, which significantly increases the cumulative reward.

Description of the Drawings

Figure 1 is a flow chart of the decision-making method for energy storage in the trading market.

Figure 2 is a block diagram of the Markov decision process.

Figure 3 is a decision flow chart of the Double-Q learning algorithm.

Detailed Description of the Embodiments

The specific embodiments are described below in conjunction with the accompanying drawings and the working principle.

The decision-optimization method for energy storage in the trading market based on a double-Q learning algorithm proposed by the present invention uses the double-Q learning algorithm to make decisions in the electricity and carbon market transactions of a given region so as to maximize the cumulative reward. The flow chart of the method is shown in Figure 1, and the method specifically comprises the following steps:

Step 1: establish a mathematical model of energy-storage decision-making in the electricity market and the carbon market;

Step 2: describe the energy-storage operation as a Markov decision process;

Step 3: using real historical carbon-price and electricity-price data from a regional trading market, iteratively train on the two data sets with the Double-Q learning algorithm to obtain the trained Q table;

Step 4: the energy storage executes, from the trained Q table, the action that maximizes the decision objective, obtaining the cumulative reward under the decisions of this method.

Further, step 1 comprises the following steps:

Step 1-1: determine the objective function of joint energy-storage arbitrage as:

Step 1-2: determine the stored-electricity constraint of the energy storage system as:

Step 1-3: determine the charging and discharging power constraints of the energy storage system:
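The objective and constraint expressions themselves are not reproduced in this text. Purely as an illustrative assumption (all symbols below are introduced here for illustration and are not the patent's own formulas), a joint electricity-and-carbon arbitrage model of this kind can be sketched as

$$\max_{P^{ch}_t,\,P^{dis}_t}\ \sum_{t=1}^{T}\big(p_t + q_t\big)\big(P^{dis}_t - P^{ch}_t\big)\,\Delta t$$

subject to a stored-electricity constraint

$$E_{t+1} = E_t + \Big(\eta_c P^{ch}_t - \tfrac{P^{dis}_t}{\eta_d}\Big)\Delta t,\qquad E_{\min} \le E_t \le E_{\max},$$

and charging and discharging power constraints

$$0 \le P^{ch}_t \le P^{ch}_{\max},\qquad 0 \le P^{dis}_t \le P^{dis}_{\max},$$

where $p_t$ is the electricity price, $q_t$ the carbon-related price signal, $E_t$ the stored electricity, and $\eta_c,\ \eta_d$ the charging and discharging efficiencies.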

Further, in order to describe the energy-storage operation as a Markov decision process:

Step 2-1: set the energy-storage action as a function of price:

Step 2-2: determine the energy-storage state-space function;

S = (P, Q) × E    (5)

Step 2-3: determine the energy-storage action-space function;

Step 2-4: determine the action reward function.
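Read together with equation (5), the elements of the Markov decision process can be summarized as follows; the reward function (7) itself is not reproduced in this text, so the reward shown here is only a generic illustration of price-driven arbitrage:

$$S = \{(p, q, e)\ :\ p \in P,\ q \in Q,\ e \in E\},$$

i.e. the state space is the product of the electricity-price level $p$, the carbon-price level $q$, and the stored-energy level $e$, as in $S = (P, Q) \times E$; the action space is $A = \{\text{charge},\ \text{idle},\ \text{discharge}\}$; and an illustrative reward is $r_t = (p_t + q_t)\,d_t$, where $d_t > 0$ is energy sold by discharging and $d_t < 0$ is energy bought for charging.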

Further, according to the algorithm flow chart of Figure 3, step 3 comprises the following steps:

Step 3-1: obtain historical electricity-market and carbon-market price data for a given region and determine the state S of the energy storage system according to the state-space function (5). Unlike previous work, a second kind of price information is added to the decision state space, so that decisions are made over two price distributions;

Step 3-2: calculate the reward value of every action in each state according to the reward function (7); these values are used to guide the selection of subsequent actions, as shown in Table 1:

Table 1

Reward value table    Action A1    Action A2
State S1              R(S1,A1)     R(S1,A2)
State S2              R(S2,A1)     R(S2,A2)
...                   ...          ...
State Sn              R(Sn,A1)     R(Sn,A2)

Step 3-3: according to the ε-greedy strategy, the algorithm selects an action from the action function (6) at random with probability ε ∈ [0,1], and with probability (1 − ε) selects the action with the largest reward value in the reward value table; this prevents the iteration from becoming trapped in a local optimum;

Step 3-4: Q-learning is a model-free reinforcement learning technique that can find an optimal action-selection policy in an MDP problem. It learns through an action-value function and can ultimately give the desired action from the current state under the optimal policy. One of its advantages is that it can compare the expected values of actions without requiring a model of the environment. However, the max operation in standard Q-learning uses the same values both to select and to evaluate an action, which makes it more likely to pick overestimated values and leads to overly optimistic value estimates. To avoid this, selection and evaluation are decoupled: at every state update, one of the functions (8) and (9) is chosen at random to update the Q-table values, preventing action values from being overestimated. After 3000 iterations the Q-value table is obtained, as shown in Table 2; the action with the largest Q value in the table is then selected, yielding the cumulative reward.
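The two randomly alternated update rules described in step 3-4 follow the standard Double Q-learning scheme from the reinforcement-learning literature; the patent's own equations (8) and (9) are not reproduced in this text, so only the generic form is shown here. With probability 1/2 the first table is updated,

$$Q^{A}(s,a) \leftarrow Q^{A}(s,a) + \alpha\Big[r + \gamma\,Q^{B}\big(s',\,\arg\max_{a'}Q^{A}(s',a')\big) - Q^{A}(s,a)\Big],$$

and otherwise the second table is updated symmetrically,

$$Q^{B}(s,a) \leftarrow Q^{B}(s,a) + \alpha\Big[r + \gamma\,Q^{A}\big(s',\,\arg\max_{a'}Q^{B}(s',a')\big) - Q^{B}(s,a)\Big].$$

One table chooses the greedy action at the next state while the other evaluates it, which is exactly the decoupling of selection and evaluation described above.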

Table 2

Q value table         Action A1    Action A2
State S1              Q(S1,A1)     Q(S1,A2)
State S2              Q(S2,A1)     Q(S2,A2)
...                   ...          ...
State Sn              Q(Sn,A1)     Q(Sn,A2)

Figure 2 is a block diagram of the Markov decision process. The aim of a Markov decision problem is to find an optimal policy, that is, the sequence of actions that maximizes the evaluation function. For the state S at each moment, the agent selects an appropriate action through the optimal policy. To maximize the cumulative reward of the decision objective, the energy-storage operation is described as a Markov decision process, and its elements (states, actions, policy, and rewards) are specified. The charging and discharging decision is defined as a function of the price information, and a double-Q learning strategy is designed to optimally control the real-time decisions of the energy storage system in the electricity market and the carbon market.

Figure 3 is a flow chart of the double-Q learning algorithm. When the price data are fed into the state space for training, the double-Q learning algorithm performs more stably than the Q-learning algorithm and reduces overestimation. When it is applied to joint energy-storage arbitrage, the algorithm is first initialized; the current state is then determined and an energy-storage action is selected using the ε-greedy action-selection strategy: with probability ε ∈ [0,1] a random action is chosen, and with probability 1 − ε the optimal action is chosen. The key lies in the two update functions adopted by double-Q learning: one is used to determine the value produced by the action, and the other is used to update the Q-value table.
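A minimal, self-contained sketch of how such a tabular Double Q-learning loop can be implemented is given below. The environment (the price discretization, the toy reward, and the randomly drawn next prices) and all names are illustrative assumptions made only to keep the sketch runnable; the patent instead trains on real historical electricity and carbon prices with its own reward function (7) and update functions (8) and (9).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative, assumed setup (not the patent's own formulas) ---
N_PRICE_BINS = 10           # discretized electricity-price levels
N_CARBON_BINS = 10          # discretized carbon-price levels
N_ENERGY_LEVELS = 5         # discretized state of charge
ACTIONS = (-1, 0, +1)       # discharge one unit, idle, charge one unit

n_states = N_PRICE_BINS * N_CARBON_BINS * N_ENERGY_LEVELS
n_actions = len(ACTIONS)

Q_a = np.zeros((n_states, n_actions))   # the two Q tables of Double Q-learning
Q_b = np.zeros((n_states, n_actions))

alpha, gamma, epsilon = 0.1, 0.95, 0.1
EPISODES = 3000                          # the patent trains for 3000 iterations

def state_index(p_bin, q_bin, energy):
    """Flatten (electricity-price bin, carbon-price bin, energy level) into one
    index, mirroring the state space S = (P, Q) x E of equation (5)."""
    return (p_bin * N_CARBON_BINS + q_bin) * N_ENERGY_LEVELS + energy

def epsilon_greedy(q_sum, s):
    """Explore with probability epsilon, otherwise act greedily on Q_a + Q_b."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_sum[s]))

def step(p_bin, q_bin, energy, a_idx):
    """Toy one-step transition: discharging earns the joint price, charging pays it.
    Next prices are drawn randomly only to keep the sketch self-contained."""
    action = ACTIONS[a_idx]
    new_energy = int(np.clip(energy + action, 0, N_ENERGY_LEVELS - 1))
    moved = new_energy - energy                    # respects the capacity limits
    reward = -moved * (p_bin + q_bin)              # moved < 0 (discharge) earns revenue
    return int(rng.integers(N_PRICE_BINS)), int(rng.integers(N_CARBON_BINS)), new_energy, reward

for episode in range(EPISODES):
    p, q, e = int(rng.integers(N_PRICE_BINS)), int(rng.integers(N_CARBON_BINS)), 0
    for t in range(24):                            # e.g. one trading day per episode
        s = state_index(p, q, e)
        a = epsilon_greedy(Q_a + Q_b, s)
        p2, q2, e2, r = step(p, q, e, a)
        s2 = state_index(p2, q2, e2)
        if rng.random() < 0.5:                     # randomly pick which table to update,
            a_star = int(np.argmax(Q_a[s2]))       # decoupling selection from evaluation
            Q_a[s, a] += alpha * (r + gamma * Q_b[s2, a_star] - Q_a[s, a])
        else:
            b_star = int(np.argmax(Q_b[s2]))
            Q_b[s, a] += alpha * (r + gamma * Q_a[s2, b_star] - Q_b[s, a])
        p, q, e = p2, q2, e2

# After training, act greedily on the combined tables to read off the decision policy.
greedy_policy = np.argmax(Q_a + Q_b, axis=1)
```

Training on an actual price series would replace the random draws in step() with successive historical electricity and carbon prices.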

Claims (4)

1. A decision optimization method for energy storage in a trading market based on a double-Q learning algorithm, characterized by comprising the following steps:
step 1: establishing a mathematical model for energy storage in a trading market decision;
step 2: describing the energy storage operation as a Markov decision process;
and step 3: performing iterative training on the two data sets by adopting real historical market trading price data and applying a Double-Q learning algorithm to obtain a trained Q table;
and step 4: the stored energy executes the action of decision-making goal maximization in the trained Q table, and the accumulated reward under the joint arbitrage is obtained.
2. The method for energy storage and market trading decision optimization based on the double-Q learning algorithm as claimed in claim 1, wherein the step 1 comprises the following steps:
step 1-1: determining an objective function of an energy storage decision;
step 1-2: determining a stored electricity constraint of the energy storage system;
step 1-3: and determining charge and discharge power constraints of the energy storage system.
3. The method for energy storage and market trading decision optimization based on the double-Q learning algorithm as claimed in claim 1, wherein the step 2 comprises the following steps:
step 2-1: setting the action of storing energy as a function of price;
step 2-2: determining an energy storage state space;
step 2-3: determining an energy storage action space;
step 2-4: an action reward function is determined.
4. The method for energy storage and market trading decision optimization based on the double-Q learning algorithm as claimed in claim 1, wherein the step 3 comprises the following steps:
step 3-1: determining the state of the energy storage system;
step 3-2: selecting an energy storage action according to an ε-greedy strategy;
step 3-3: one of the two functions is randomly selected to update the Q value table, and the Q value table is obtained after 3000 iterations.
CN201910832395.4A 2019-09-04 2019-09-04 Energy storage in-trading market decision optimization method based on double-Q learning algorithm Pending CN110598925A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910832395.4A CN110598925A (en) 2019-09-04 2019-09-04 Energy storage in-trading market decision optimization method based on double-Q learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910832395.4A CN110598925A (en) 2019-09-04 2019-09-04 Energy storage in-trading market decision optimization method based on double-Q learning algorithm

Publications (1)

Publication Number Publication Date
CN110598925A true CN110598925A (en) 2019-12-20

Family

ID=68857593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910832395.4A Pending CN110598925A (en) 2019-09-04 2019-09-04 Energy storage in-trading market decision optimization method based on double-Q learning algorithm

Country Status (1)

Country Link
CN (1) CN110598925A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529610A (en) * 2020-11-23 2021-03-19 天津大学 End-to-end electric energy trading market user decision method based on reinforcement learning
CN119494467A (en) * 2024-11-04 2025-02-21 北京瑞智德信息技术有限公司 A method for energy system fault prediction based on knowledge graph
CN119494467B (en) * 2024-11-04 2025-05-13 北京瑞智德信息技术有限公司 A knowledge graph-based method for energy system fault prediction


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20191220)