CN113469839A - Smart park optimization strategy based on deep reinforcement learning - Google Patents

Publication number
CN113469839A
Authority
CN
China
Prior art keywords
optimization
park
decision
intelligent
day
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110748404.9A
Other languages
Chinese (zh)
Inventor
崔勇
王伟红
肖飞
顾军
曹亮
王治华
章渊
金敏杰
艾芊
李昭昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
State Grid Shanghai Electric Power Co Ltd
Original Assignee
Shanghai Jiaotong University
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-06-30
Filing date
2021-06-30
Publication date
2021-10-01
Application filed by Shanghai Jiaotong University and State Grid Shanghai Electric Power Co Ltd
Priority to CN202110748404.9A
Publication of CN113469839A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/04 Power grid distribution networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a smart park optimization strategy based on deep reinforcement learning, which relates to the field of smart park optimization and comprises the following steps: constructing a model of a smart park, wherein the smart park comprises a park decision center, a micro gas turbine, a PV power generation system, an energy storage system and a park load, and the park load comprises a rigid load and a flexible load; and realizing the optimization decision of the smart park by applying deep reinforcement learning at both the day-ahead and intraday time scales. The invention combines the two time scales: for the day-ahead time scale, a deep reinforcement learning method based on the deep Q network (DQN) algorithm optimizes over a discrete action space; for the intraday time scale, a deep reinforcement learning method based on the advantage actor-critic (A2C) algorithm makes optimization decisions over a continuous action space. The intraday optimization takes the decisions of the day-ahead optimization into account, which accelerates algorithm convergence and improves training efficiency.

Description

Smart park optimization strategy based on deep reinforcement learning
Technical Field
The invention relates to the field of intelligent park optimization, in particular to an intelligent park optimization strategy based on deep reinforcement learning.
Background
Reinforcement learning (RL) is a class of learning problems formulated within the framework of the Markov decision process (MDP). It is a research hotspot in machine learning and is widely applied in industrial manufacturing, optimization and scheduling, game playing, and other fields. In RL, the agent observes the environment state through continual interaction with the environment and formulates an action policy based on the information acquired. The goal of the agent is to learn a policy, that is, a mapping from environment states to actions, that maximizes the long-term reward. Reinforcement learning has already seen some application in microgrid optimization: David Dominguez-Barbero et al. discuss the application of reinforcement learning to microgrid operation and analyze how different definitions of the system state affect the result by expanding the set of variables that define the state.
However, non-approximate methods generally cannot predict good actions in states that have not been explored before, and they must store the action-reward results for every explored state, which causes a heavy computational burden and memory overhead and makes them unsuitable for smart park operation. Deep reinforcement learning (DRL), which combines neural networks with RL, can instead be employed to solve optimal decision problems, handle unstructured environments, and predict actions in previously unvisited states in an end-to-end manner. DRL is highly general. Ismantel Samadai proposed a multi-agent-based distributed energy management method for a grid-connected microgrid comprising wind and photovoltaic resources, a diesel generator, electric energy storage, cogeneration and the like, in order to optimize behavior and operating cost. Nikta Tomin used deep reinforcement learning to solve the optimal activation problem of the flexible energy sources (short-term and long-term energy storage capacity) of a microgrid.
However, existing applications of deep reinforcement learning mainly target microgrid scenarios; applications to smart park optimization are few, multi-time-scale optimization based on deep reinforcement learning is still lacking, and current research mainly addresses single-time-scale optimization strategies.
Therefore, those skilled in the art are working to develop a multi-time-scale smart park optimization strategy based on deep reinforcement learning.
Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is how to apply a deep reinforcement learning optimization strategy that covers both the day-ahead and intraday time scales to smart park optimization.
In order to achieve this purpose, the invention provides a smart park optimization strategy based on deep reinforcement learning, which comprises the following steps:
step 1, constructing a model of a smart park, wherein the smart park comprises a park decision center, a micro gas turbine, a PV power generation system, an energy storage system and a park load, and the park load comprises a rigid load and a flexible load;
step 2, realizing the optimization decision of the smart park by applying a deep reinforcement learning method at the day-ahead time scale and the intraday time scale.
Further, in the optimization decision of the smart park, the external power grid and the smart park serve as the environment and the park decision center serves as the agent; the optimization decision process is completed through continual iteration of the interaction between the agent and the environment, in which the environment receives the action taken by the agent and returns the environment state and a reward to the agent.
Further, the decision process of the action taken by the agent is a Markov decision process: in the interaction between the agent and the environment, the state at the next moment depends only on the current state and the action taken by the agent.
Further, in the decision process of the optimization decision of the smart park, at both the day-ahead and intraday time scales, the park decision center obtains the reward feedback given by the environment for its action together with the updated environment state, carries out a new round of decision-making, and applies the resulting action to the environment; the intraday optimization decision adopts the start-stop result of the day-ahead optimization decision.
Further, the optimization decision at the day-ahead time scale uses the deep Q network (DQN) algorithm to determine the start-stop strategy of the micro gas turbine unit; the DQN algorithm builds on Q-learning by introducing a neural network to fit the value function nonlinearly, and continually learns the Q value corresponding to each action through interaction so as to obtain the optimal strategy; the optimization decision at the intraday time scale uses the advantage actor-critic (A2C) algorithm to select continuous action variables and thereby formulate the smart park optimization strategy; the A2C algorithm comprises an actor network, a critic network and an advantage function, wherein the actor network selects actions based on probability and the critic network evaluates actions and feeds back their value; the advantage function is the difference between the action value function and the state value function and reflects how much better the action value is than the state value.
Further, the optimization objective of the optimization decision is to minimize the total operating cost of the smart park, specifically:
Figure BDA0003141166690000021
where
Figure BDA0003141166690000022
and L_t respectively represent the change in energy storage charging and discharging, the output of the micro gas turbine, the electricity exchanged with the main power grid, and the load state, and p_t is the state transition probability.
Further, the constraint conditions of the optimization decision include the park supply-demand balance constraint and the energy storage constraints, specifically:
Figure BDA0003141166690000023
Figure BDA0003141166690000031
Figure BDA0003141166690000032
where
Figure BDA0003141166690000033
represents the power generation of the nth PV generator set at time t, and N_p is the number of PV generator sets in the park;
Figure BDA0003141166690000034
represents the power exchanged with the main power grid;
Figure BDA0003141166690000035
represents the charge/discharge power of the nth energy storage device at time t, and N_ES is the number of energy storage devices in the park;
Figure BDA0003141166690000036
respectively represent the power demand of the rigid load and the flexible load at time t, and N_rl and N_fl are respectively the numbers of rigid loads and flexible loads in the park; P_ES(t) and E_ES(t) are respectively the charge/discharge power and the capacity of the energy storage;
Figure BDA0003141166690000037
and
Figure BDA0003141166690000038
are respectively the upper and lower limits of the charge/discharge power of the energy storage, and
Figure BDA0003141166690000039
and
Figure BDA00031411666900000310
are respectively the upper and lower limits of the capacity of the energy storage.
Further, the environment state comprises the photovoltaic output, the time-of-use electricity price, the rigid load, the flexible load and the state of charge of the energy storage, and the action comprises the start-stop state and output level of the micro gas turbine and the energy changes of the energy storage and the flexible load.
Further, the optimization decision at the day-ahead time scale further includes discretizing the action space and processing based on the discretized result.
Further, the reward function comprises the optimization objective F and a penalty function Penalty, which is introduced to penalize violations of the constraints; the reward function is expressed as: Reward = -(F + Penalty).
The invention has the beneficial effects that:
1. The invention combines two time scales: optimization strategies based on deep reinforcement learning are designed separately for the day-ahead and intraday time scales, and the intraday optimization takes the decisions of the day-ahead optimization into account, which accelerates algorithm convergence and improves training efficiency.
2. The deep-reinforcement-learning-based optimization decision method is designed around the characteristics of the smart park and builds on the established smart park model, which accounts for the park decision center, micro gas turbine, photovoltaics, energy storage and adjustable loads.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
FIG. 1 is a diagram of the intelligent campus architecture in accordance with a preferred embodiment of the present invention;
FIG. 2 is a diagram of the environment and agent interaction process in accordance with a preferred embodiment of the present invention.
Detailed Description
The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.
In the drawings, structurally identical elements are represented by like reference numerals, and structurally or functionally similar elements are represented by like reference numerals throughout the several views. The size and thickness of each component shown in the drawings are arbitrarily illustrated, and the present invention is not limited to the size and thickness of each component. The thickness of the components may be exaggerated where appropriate in the figures to improve clarity.
FIG. 1 shows the architectural model of the smart park. The smart park mainly comprises a park decision center, a micro gas turbine, a photovoltaic (PV) power generation system, an energy storage system and a park load. Loads in the smart park can be divided into rigid loads and flexible loads according to how they are managed. A rigid load cannot be adjusted, and its power demand must be met first during scheduling. A flexible load can be adjusted and, together with the energy storage system, participates in the park optimization decision as a controllable demand-side resource.
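As an illustration only, the component list above could be held in a small data model such as the following sketch. The class names, field names, units (kW) and the sign convention for storage power are assumptions made for this example and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Load:
    demand_kw: float
    flexible: bool  # False = rigid load whose demand must be met first


@dataclass
class SmartPark:
    pv_output_kw: float       # PV power generation system
    gas_turbine_kw: float     # micro gas turbine output
    storage_power_kw: float   # assumed sign: positive = discharging
    loads: List[Load]

    def rigid_demand(self) -> float:
        # Rigid loads cannot be adjusted by the decision center.
        return sum(l.demand_kw for l in self.loads if not l.flexible)

    def flexible_demand(self) -> float:
        # Flexible loads are controllable demand-side resources.
        return sum(l.demand_kw for l in self.loads if l.flexible)


park = SmartPark(50.0, 30.0, 10.0,
                 [Load(40.0, False), Load(25.0, True), Load(15.0, False)])
```

Separating rigid from flexible demand in the model mirrors the scheduling priority described above: the rigid total is a fixed requirement, while the flexible total is something the agent may shift.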
When deep reinforcement learning is applied to park optimization, the external power grid is taken as the environment and the smart park decision center is the decision-making agent. The two interact through the environment state, the action taken by the agent and the reward given to the agent, and the optimization decision process is completed through continual iteration; the interaction is shown in FIG. 2. The decision process of the agent is assumed to be a Markov decision process, i.e. in the interaction between the agent and the environment, the state at the next moment depends only on the current state and the action taken by the agent. The optimization decision process considers both the day-ahead and the intraday time scales. At each time scale, the smart park decision center obtains the reward feedback given by the environment for its action together with the updated environment state, carries out a new round of decision-making, and applies the resulting action to the environment. The intraday optimization decision takes the start-stop result of the day-ahead decision into account, which accelerates algorithm convergence and improves training efficiency.
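The interaction loop described above can be sketched as follows. This is a toy example under stated assumptions: the environment class, its two-variable state (hour, state of charge), the dynamics and the placeholder reward are all hypothetical and stand in for the park model, which the patent does not give in code form.

```python
class ToyParkEnv:
    """Hypothetical environment with state (hour, state of charge)."""

    def __init__(self, horizon=24):
        self.horizon = horizon
        self.hour, self.soc = 0, 0.5

    def reset(self):
        self.hour, self.soc = 0, 0.5
        return (self.hour, self.soc)

    def step(self, action):
        # Markov property: the next state depends only on the current
        # state (hour, soc) and the action, not on earlier history.
        # Assumed convention: action > 0 charges the storage.
        self.soc = min(1.0, max(0.0, self.soc + action))
        self.hour += 1
        reward = -abs(action)  # placeholder cost, not the patent's objective
        done = self.hour >= self.horizon
        return (self.hour, self.soc), reward, done


def run_episode(env, policy):
    """Agent acts, environment returns new state and reward, repeat."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total


env = ToyParkEnv(horizon=3)
total_reward = run_episode(env, lambda state: 0.1)
```

The `reset`/`step` split follows a common reinforcement learning convention; any policy, learned or fixed, can be passed in as a function of the state.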
Then, a multi-time-scale smart park optimization decision method is designed based on the deep Q network (DQN) and advantage actor-critic (A2C) algorithms. Building on Q-learning, the deep Q network introduces a neural network to fit the value function nonlinearly and continually learns the Q value corresponding to each action through interaction, so as to obtain the optimal strategy. The DQN algorithm is only applicable to discrete action spaces, so in smart park optimization it is used to determine the start-stop strategy of the micro gas turbine unit at the longer time scale (e.g. the day-ahead time scale). The A2C algorithm comprises two networks: the actor network selects actions based on probability, and the critic network evaluates actions and feeds back their value. The algorithm also defines an advantage function, the difference between the action value function and the state value function, which reflects how much better the action value is than the state value. If the advantage is positive, the action taken is better than the average action; otherwise it is worse. The A2C algorithm is suitable for selecting continuous action variables and can therefore formulate the smart park optimization strategy at the short time scale (e.g. the intraday time scale).
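The two quantities at the heart of these algorithms can be written down in a few lines. The sketch below assumes the standard one-step forms: the Q-learning target r + γ·max Q(s', a') and a one-step estimate of the advantage Q(s, a) − V(s) in which Q(s, a) is approximated by r + γ·V(s'). The discount factor γ and the function names are assumptions for illustration; the patent does not specify them, and the neural networks that would produce the Q and V values are omitted.

```python
def dqn_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate: (r + gamma * V(s')) - V(s)."""
    q_estimate = reward if done else reward + gamma * value_next
    return q_estimate - value_s


target = dqn_target(1.0, [0.5, 2.0], gamma=0.9)
adv = advantage(1.0, value_s=2.0, value_next=3.0, gamma=0.5)
```

A positive `adv` corresponds to the case described above in which the action taken is better than the average action in that state.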
In addition, the optimization objective considers economic benefit: since the economic benefit changes as the state of the smart park transitions under different strategies, minimizing the total operating cost of the smart park is chosen as the objective.
Figure BDA0003141166690000041
where
Figure BDA0003141166690000042
and L_t respectively represent the change in energy storage charging and discharging, the output of the micro gas turbine, the electricity exchanged with the main power grid, and the load state, and p_t is the state transition probability.
The constraints on the optimized operation of the park are the park supply-demand balance constraint and the energy storage constraints.
Figure BDA0003141166690000043
Figure BDA0003141166690000051
Figure BDA0003141166690000052
where
Figure BDA0003141166690000053
represents the power generation of the nth PV generator set at time t, and N_p is the number of PV generator sets in the park;
Figure BDA0003141166690000054
represents the power exchanged with the main power grid;
Figure BDA0003141166690000055
represents the charge/discharge power of the nth energy storage device at time t, and N_ES is the number of energy storage devices in the park;
Figure BDA0003141166690000056
respectively represent the power demand of the rigid load and the flexible load at time t, and N_rl and N_fl are respectively the numbers of rigid loads and flexible loads in the park; P_ES(t) and E_ES(t) are respectively the charge/discharge power and the capacity of the energy storage;
Figure BDA0003141166690000057
and
Figure BDA0003141166690000058
are respectively the upper and lower limits of the charge/discharge power of the energy storage, and
Figure BDA0003141166690000059
and
Figure BDA00031411666900000510
are respectively the upper and lower limits of the capacity of the energy storage.
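Because the constraint formulas themselves are only available as images, the following is a hedged sketch of how such checks might look rather than a transcription. It assumes that supply (PV, micro gas turbine, grid exchange, storage) must equal demand (rigid plus flexible load), that positive grid power means import and positive storage power means discharge, and that the storage limits are simple box bounds. The function and parameter names are hypothetical.

```python
def balance_residual(pv_kw, grid_kw, storage_kw, rigid_kw, flexible_kw,
                     gas_turbine_kw=0.0):
    """Supply minus demand at one time step; zero means balanced.

    pv_kw, storage_kw, rigid_kw and flexible_kw are lists with one entry
    per device or load. Sign conventions are assumptions of this sketch.
    """
    supply = sum(pv_kw) + gas_turbine_kw + grid_kw + sum(storage_kw)
    demand = sum(rigid_kw) + sum(flexible_kw)
    return supply - demand


def storage_feasible(p_es, e_es, p_min, p_max, e_min, e_max):
    """Check the power limits and the capacity limits of the storage."""
    return p_min <= p_es <= p_max and e_min <= e_es <= e_max


residual = balance_residual([20.0, 30.0], 10.0, [5.0], [40.0], [25.0])
ok = storage_feasible(5.0, 50.0, -10.0, 10.0, 20.0, 90.0)
too_much_power = storage_feasible(15.0, 50.0, -10.0, 10.0, 20.0, 90.0)
```

Returning the residual instead of a boolean makes it straightforward to turn a violation into a penalty term proportional to its size.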
Finally, the state space, action space and reward function are defined for the deep reinforcement learning algorithm. For the smart park, the state information provided by the environment to the decision-making agent consists of the photovoltaic output P_pv(t), the time-of-use (TOU) electricity price TOU(t), the rigid load L_rl(t), the flexible load L_fl(t) and the state of charge (SOC) of the energy storage SOC(t), so the state space of the smart park is defined as: State = [P_pv(t), TOU(t), L_rl(t), L_fl(t), SOC(t)]. Based on the state information provided by the environment, the decision-making agent's actions are the start-stop state U_gt(t) and output level P_gt(t) of the micro gas turbine, the energy change ΔP_ES(t) of the energy storage, and the energy change ΔL_fl(t) of the flexible load, so the action space of the smart park is: Action = [U_gt(t), P_gt(t), ΔP_ES(t), ΔL_fl(t)]. Because the DQN method adopted for the long time scale cannot handle continuous variables, the action space must be discretized and processing is based on the discretized result. The reward function comprises the optimization objective F and a penalty function (Penalty), which is introduced to penalize violations of the constraints, so the reward function of the smart park can be expressed as: Reward = -(F + Penalty).
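The state vector, a discretized action set for DQN, and the reward can be sketched as follows. The evenly spaced grid, the number of levels, and the choice to force the turbine output to zero when the unit is off are assumptions of this example; the patent states only that the action space is discretized and that the reward is the negative of the cost plus penalty.

```python
from itertools import product


def make_state(p_pv, tou, l_rl, l_fl, soc):
    """State vector: PV output, TOU price, rigid load, flexible load, SOC."""
    return [p_pv, tou, l_rl, l_fl, soc]


def discretize(low, high, levels):
    """Evenly spaced grid of `levels` values on [low, high] (assumed scheme)."""
    if levels < 2:
        raise ValueError("need at least two levels")
    step = (high - low) / (levels - 1)
    return [low + i * step for i in range(levels)]


def discrete_actions(p_gt_levels, dp_es_levels, dl_fl_levels):
    """Enumerate (U_gt, P_gt, dP_ES, dL_fl) tuples for a discrete agent."""
    # Unit off: output is fixed at zero, only storage and load vary.
    off = [(0, 0.0, dp, dl) for dp, dl in product(dp_es_levels, dl_fl_levels)]
    # Unit on: every combination of the three discretized variables.
    on = [(1, p, dp, dl)
          for p, dp, dl in product(p_gt_levels, dp_es_levels, dl_fl_levels)]
    return off + on


def reward(cost_f, penalty):
    """Reward = -(F + Penalty)."""
    return -(cost_f + penalty)


grid = discretize(0.0, 10.0, 3)
actions = discrete_actions([5.0, 10.0], [-1.0, 0.0, 1.0], [0.0])
```

Each tuple in `actions` gets one output of the Q network, so the grid resolution directly sets the size of the network's output layer.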
Based on the model, the optimization decision of multiple time scales of the intelligent park can be realized.
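One way to picture how the two time scales fit together is sketched below: a day-ahead stage picks the on/off decision with the larger Q value for each period, and the intraday stage then masks its continuous turbine output with that schedule. The per-period Q values, the proposed outputs, the output limit and the assumption that both stages share the same time resolution are hypothetical inputs chosen for illustration, not values or interfaces from the patent.

```python
def day_ahead_schedule(q_values_per_period):
    """Choose start-stop (0 = off, 1 = on) from (Q_off, Q_on) pairs."""
    return [0 if q_off >= q_on else 1 for q_off, q_on in q_values_per_period]


def intraday_dispatch(schedule, proposed_output_kw, p_gt_max_kw):
    """Apply the day-ahead start-stop result to the intraday actions."""
    dispatch = []
    for u, p in zip(schedule, proposed_output_kw):
        p = min(max(p, 0.0), p_gt_max_kw)  # clip to assumed output limits
        dispatch.append(p if u == 1 else 0.0)
    return dispatch


schedule = day_ahead_schedule([(1.0, 0.2), (0.1, 0.9), (0.3, 0.4)])
dispatch = intraday_dispatch(schedule, [20.0, 50.0, -5.0], 40.0)
```

Fixing the start-stop variable before the intraday stage removes one dimension from the intraday action space, which is consistent with the stated aim of faster convergence.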
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. A smart park optimization strategy based on deep reinforcement learning, characterized by comprising the following steps:
step 1, constructing a model of a smart park, wherein the smart park comprises a park decision center, a micro gas turbine, a PV power generation system, an energy storage system and a park load, and the park load comprises a rigid load and a flexible load;
step 2, realizing the optimization decision of the smart park by applying a deep reinforcement learning method at the day-ahead time scale and the intraday time scale.
2. The smart park optimization strategy based on deep reinforcement learning as claimed in claim 1, wherein in the optimization decision of the smart park, the external power grid and the smart park serve as the environment and the park decision center serves as the agent; the optimization decision process is completed through continual iteration of the interaction between the agent and the environment, in which the environment receives the action taken by the agent and returns the environment state and a reward to the agent.
3. The smart park optimization strategy of claim 2, wherein the decision process of the action taken by the agent is a Markov decision process, in which the state at the next moment in the interaction between the agent and the environment depends only on the current state and the action taken by the agent.
4. The smart park optimization strategy based on deep reinforcement learning as claimed in claim 3, wherein in the decision process of the optimization decision of the smart park, at both the day-ahead and intraday time scales, the park decision center obtains the reward feedback given by the environment for its action together with the updated environment state, carries out a new round of decision-making, and applies the resulting action to the environment; and the intraday optimization decision adopts the start-stop result of the day-ahead optimization decision.
5. The smart park optimization strategy based on deep reinforcement learning of claim 4, wherein the optimization decision at the day-ahead time scale uses the deep Q network (DQN) algorithm to determine the start-stop strategy of the micro gas turbine unit; the DQN algorithm builds on Q-learning by introducing a neural network to fit the value function nonlinearly, and continually learns the Q value corresponding to each action through interaction so as to obtain the optimal strategy; the optimization decision at the intraday time scale uses the advantage actor-critic (A2C) algorithm to select continuous action variables and thereby formulate the smart park optimization strategy; the A2C algorithm comprises an actor network, a critic network and an advantage function, wherein the actor network selects actions based on probability and the critic network evaluates actions and feeds back their value; the advantage function is the difference between the action value function and the state value function and reflects how much better the action value is than the state value.
6. The smart park optimization strategy of claim 1, wherein the optimization objective of the optimization decision is to minimize the total operating cost of the smart park, specifically:
Figure FDA0003141166680000021
where
Figure FDA0003141166680000022
and L_t respectively represent the change in energy storage charging and discharging, the output of the micro gas turbine, the electricity exchanged with the main power grid, and the load state, and p_t is the state transition probability.
7. The smart park optimization strategy of claim 1, wherein the constraints of the optimization decision include the park supply-demand balance constraint and the energy storage constraints, specifically:
Figure FDA0003141166680000023
Figure FDA0003141166680000024
Figure FDA0003141166680000025
where
Figure FDA0003141166680000026
represents the power generation of the nth PV generator set at time t, and N_p is the number of PV generator sets in the park;
Figure FDA0003141166680000027
represents the power exchanged with the main power grid;
Figure FDA0003141166680000028
represents the charge/discharge power of the nth energy storage device at time t, and N_ES is the number of energy storage devices in the park;
Figure FDA0003141166680000029
respectively represent the power demand of the rigid load and the flexible load at time t, and N_rl and N_fl are respectively the numbers of rigid loads and flexible loads in the park; P_ES(t) and E_ES(t) are respectively the charge/discharge power and the capacity of the energy storage;
Figure FDA00031411666800000210
and
Figure FDA00031411666800000211
are respectively the upper and lower limits of the charge/discharge power of the energy storage, and
Figure FDA00031411666800000212
and
Figure FDA00031411666800000213
are respectively the upper and lower limits of the capacity of the energy storage.
8. The smart park optimization strategy of claim 2, wherein the environment state comprises the photovoltaic output, the time-of-use electricity price, the rigid load, the flexible load and the state of charge of the energy storage, and the action comprises the start-stop state and output level of the micro gas turbine and the energy changes of the energy storage and the flexible load.
9. The smart park optimization strategy of claim 5, wherein the optimization decision at the day-ahead time scale further comprises discretizing the action space and processing based on the discretized result.
10. The smart park optimization strategy based on deep reinforcement learning as claimed in claim 2, wherein the reward function comprises the optimization objective F and a penalty function Penalty, which is introduced to penalize violations of the constraints, and the reward function is expressed as: Reward = -(F + Penalty).
CN202110748404.9A 2021-06-30 2021-06-30 Smart park optimization strategy based on deep reinforcement learning Pending CN113469839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110748404.9A CN113469839A (en) 2021-06-30 2021-06-30 Smart park optimization strategy based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110748404.9A CN113469839A (en) 2021-06-30 2021-06-30 Smart park optimization strategy based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN113469839A true CN113469839A (en) 2021-10-01

Family

ID=77877384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110748404.9A Pending CN113469839A (en) 2021-06-30 2021-06-30 Smart park optimization strategy based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113469839A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108964042A (en) * 2018-07-24 2018-12-07 合肥工业大学 Regional power grid operating-point scheduling optimization method based on deep Q network
CN110518580A (en) * 2019-08-15 2019-11-29 上海电力大学 Active distribution network operation optimization method considering active optimization of microgrids
WO2020000399A1 (en) * 2018-06-29 2020-01-02 东莞理工学院 Multi-agent deep reinforcement learning proxy method based on intelligent grid
CN112186743A (en) * 2020-09-16 2021-01-05 北京交通大学 Dynamic power system economic dispatching method based on deep reinforcement learning
CN112580867A (en) * 2020-12-15 2021-03-30 国网福建省电力有限公司经济技术研究院 Park comprehensive energy system low-carbon operation method based on Q learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
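Li Yijin (李怡瑾): "Modeling and learning optimization of energy scheduling for a combined cooling, heating and power microgrid with source-load uncertainty" (源荷不确定冷热电联供微网能量调度的建模与学习优化), Control Theory & Applications (控制理论与应用), vol. 35, no. 1, pages 56-64 *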

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780688A (en) * 2021-11-10 2021-12-10 中国电力科学研究院有限公司 Optimized operation method, system, equipment and medium of electric heating combined system
CN113780688B (en) * 2021-11-10 2022-02-18 中国电力科学研究院有限公司 Optimized operation method, system, equipment and medium of electric heating combined system
CN115577647A (en) * 2022-12-09 2023-01-06 南方电网数字电网研究院有限公司 Power grid fault type identification method and intelligent agent construction method
CN117613919A (en) * 2023-11-24 2024-02-27 浙江大学 Intelligent control method for peak-valley difference of electricity consumption of industrial and commercial park
CN117613919B (en) * 2023-11-24 2024-05-24 浙江大学 Intelligent control method for peak-valley difference of electricity consumption of industrial and commercial park
CN118066651B (en) * 2024-04-25 2024-06-25 上海迪捷通数字科技有限公司 Intelligent park air conditioning method and system based on artificial intelligence

Similar Documents

Publication Publication Date Title
Yang et al. Reinforcement learning in sustainable energy and electric systems: A survey
Liu et al. Residential energy scheduling for variable weather solar energy based on adaptive dynamic programming
Moghaddam et al. Multi-objective operation management of a renewable MG (micro-grid) with back-up micro-turbine/fuel cell/battery hybrid power source
Huang et al. A self-learning scheme for residential energy system control and management
Motevasel et al. Expert energy management of a micro-grid considering wind energy uncertainty
Faisal et al. Particle swarm optimised fuzzy controller for charging–discharging and scheduling of battery energy storage system in MG applications
CN113469839A (en) Smart park optimization strategy based on deep reinforcement learning
Palma-Behnke et al. A microgrid energy management system based on the rolling horizon strategy
Moghaddam et al. Multi-operation management of a typical micro-grids using Particle Swarm Optimization: A comparative study
Boaro et al. Adaptive dynamic programming algorithm for renewable energy scheduling and battery management
Özkan et al. A hybrid multicriteria decision making methodology based on type-2 fuzzy sets for selection among energy storage alternatives
CN108347062A (en) Distributed multi-objective cooperative optimization algorithm for microgrid energy management based on potential game
CN105337310B (en) Economic operation system and method for cascaded-structure photovoltaic-storage multi-microgrids
CN110070292B (en) Micro-grid economic dispatching method based on cross variation whale optimization algorithm
Zhou et al. A new framework for peer-to-peer energy sharing and coordination in the energy internet
CN113435793A (en) Micro-grid optimization scheduling method based on reinforcement learning
Cheng et al. Adaptive robust method for dynamic economic emission dispatch incorporating renewable energy and energy storage
Ebell et al. Reinforcement learning control algorithm for a pv-battery-system providing frequency containment reserve power
Aiswariya et al. Optimal microgrid battery scheduling using simulated annealing
Leo et al. Multi agent reinforcement learning based distributed optimization of solar microgrid
El Zerk et al. Decentralised strategy for energy management of collaborative microgrids using multi‐agent system
TWI639962B (en) Particle Swarm Optimization Fuzzy Logic Control Charging Method Applied to Smart Grid
CN113780622A (en) Multi-micro-grid power distribution system distributed scheduling method based on multi-agent reinforcement learning
Welch et al. Comparison of two optimal control strategies for a grid independent photovoltaic system
Yu et al. A fuzzy Q-learning algorithm for storage optimization in islanding microgrid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination