CN113469839A - Smart park optimization strategy based on deep reinforcement learning - Google Patents
Smart park optimization strategy based on deep reinforcement learning
- Publication number
- CN113469839A
- Authority
- CN
- China
- Prior art keywords
- optimization
- park
- decision
- intelligent
- day
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/04—Constraint-based CAD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/04—Power grid distribution networks
Abstract
The invention discloses a smart park optimization strategy based on deep reinforcement learning, in the field of smart park optimization. The strategy comprises the following steps:

(1) constructing a model of a smart park, wherein the smart park comprises a park decision center, a micro gas turbine, a PV power generation system, an energy storage system and a park load, and the park load comprises a rigid load and a flexible load;

(2) realizing the optimization decision of the smart park with a deep reinforcement learning method at the day-ahead time scale and at the intraday time scale.

The invention combines two time scales. At the day-ahead time scale, a deep reinforcement learning method based on the deep Q network (DQN) algorithm optimizes over a discrete action space. At the intraday time scale, a deep reinforcement learning method based on the advantage actor-critic (A2C) algorithm makes optimization decisions over a continuous action space. Because the intraday optimization takes the day-ahead decisions into account, the algorithm converges faster and training efficiency improves.
Description
Technical Field
The invention relates to the field of smart park optimization, in particular to a smart park optimization strategy based on deep reinforcement learning.
Background
Reinforcement learning (RL) is a class of learning problems formulated within the framework of the Markov decision process (MDP). It is a research hotspot in machine learning and is widely applied in industrial manufacturing, optimization and scheduling, game playing and other fields.

In RL, an agent continually interacts with its environment, observes environmental state information, and formulates an action policy based on that information. The agent's goal is to learn a policy, that is, a mapping from environmental states to actions, that maximizes the long-term reward.

Reinforcement learning has already found some applications in microgrid optimization. David Dominguez-Barbero et al. discuss the application of reinforcement learning to microgrid operation. By extending the set of variables that define the system state, they analyze how different state definitions affect the results.
However, non-approximate (tabular) methods generally cannot predict good actions in states that have not been explored before. They must also store the action-reward results for every explored state. This imposes a heavy computational burden and memory overhead, which makes them unsuitable for the operation of a smart park.

Deep reinforcement learning (DRL), which combines neural networks with RL, can therefore be used to solve optimal decision problems. It can handle unstructured environments and predict actions in previously unvisited states in an end-to-end manner.

DRL is highly general. Ismantel Samadai proposes a multi-agent distributed energy management method for a grid-connected microgrid consisting of wind and photovoltaic resources, a diesel generator, electric energy storage, combined heat and power, and other units; the method optimizes the agents' behavior and the operating cost. Nikita Tomin uses deep reinforcement learning to solve the optimal activation problem of the microgrid's flexible energy resources (short-term and long-term energy storage capacity).
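A tabular Q-learning sketch makes the storage burden of non-approximate methods easy to see. Every visited state gets its own row of action values. A state that has never been visited has only the default zeros, so nothing learned elsewhere carries over to it. The string state keys and the `alpha` and `gamma` values below are illustrative assumptions, not part of the patent.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def make_q_table(n_actions: int) -> Dict[Hashable, List[float]]:
    # Each newly seen state automatically gets its own row of n_actions zeros,
    # so the table grows by one row per distinct state encountered.
    return defaultdict(lambda: [0.0] * n_actions)


def q_learning_update(
    q: Dict[Hashable, List[float]],
    s: Hashable,
    a: int,
    r: float,
    s_next: Hashable,
    alpha: float = 0.1,   # learning rate (illustrative value)
    gamma: float = 0.9,   # discount factor (illustrative value)
) -> None:
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
```

After a single update from `"s0"` to `"s1"`, the table already holds two rows. A park state built from continuous measurements would create a new row almost every step, which is the memory problem the text describes.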
However, existing applications of deep reinforcement learning mainly target microgrid scenarios, and applications to smart park optimization are few. Multi-time-scale optimization based on deep reinforcement learning is still lacking, and current research mainly addresses optimization strategies on a single time scale.
Therefore, those skilled in the art are working to develop a multi-time-scale smart park optimization strategy based on deep reinforcement learning.
Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is to provide a deep reinforcement learning optimization strategy for smart parks that considers two time scales: day-ahead and intraday.
To achieve this purpose, the invention provides a smart park optimization strategy based on deep reinforcement learning, comprising the following steps:
Step 1: constructing a model of a smart park, wherein the smart park comprises a park decision center, a micro gas turbine, a PV power generation system, an energy storage system and a park load, and the park load comprises a rigid load and a flexible load;
Step 2: realizing the optimization decision of the smart park by a deep reinforcement learning method at the day-ahead time scale and the intraday time scale.
Further, the optimization decision of the smart park takes the external power grid and the smart park as the environment and the park decision center as the agent. The optimization decision process is completed through continuous iteration of the interaction between the agent and the environment: the environment receives the action taken by the agent and returns the environmental state and the reward to the agent.
Further, the process by which the agent decides its actions is a Markov decision process: in the interaction between the agent and the environment, the agent's state at the next moment depends only on its current state and the action it takes.
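The agent-environment loop described above can be sketched in Python. Everything here is a hypothetical placeholder rather than the patent's park model:

- `ParkEnv` tracks only a storage state of charge.
- Its dynamics and reward are invented for illustration.
- `random_policy` stands in for the trained park decision center.

What the sketch does show is the Markov structure: `step` computes the next state from the current state and the action alone, and returns it together with a reward.

```python
import random
from typing import List, Tuple


class ParkEnv:
    """Hypothetical toy environment: the state is a single storage SOC value in [0, 1]."""

    def __init__(self, soc: float = 0.5) -> None:
        self.soc = soc

    def reset(self) -> List[float]:
        self.soc = 0.5
        return [self.soc]

    def step(self, action: float) -> Tuple[List[float], float]:
        # Markov property: the next state depends only on the current SOC and the action.
        self.soc = min(1.0, max(0.0, self.soc + action))
        # Placeholder reward (assumption): penalize deviation from a 50 % SOC target.
        reward = -abs(self.soc - 0.5)
        return [self.soc], reward


def random_policy(state: List[float]) -> float:
    """Hypothetical stand-in for the decision center's policy."""
    return random.uniform(-0.1, 0.1)


def run_episode(env: ParkEnv, steps: int = 24) -> float:
    state = env.reset()
    total = 0.0
    for _ in range(steps):
        action = random_policy(state)      # the agent acts on the environment
        state, reward = env.step(action)   # the environment returns the new state and a reward
        total += reward
    return total
```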
Further, at both the day-ahead and the intraday time scales, the decision process of the smart park optimization runs as follows. The park decision center obtains the reward fed back by the environment for its action together with the updated environmental state, carries out a new round of decision making, and applies the resulting action to the environment. The intraday optimization decision adopts the start-stop results of the day-ahead optimization decision.
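The patent does not specify how the intraday decision "adopts the start-stop result" of the day-ahead decision. One plausible coupling is to treat the day-ahead on/off schedule as fixed and constrain the intraday gas turbine output accordingly. The function name, the hourly resolution and the `p_min`/`p_max` output limits below are all assumptions for illustration.

```python
from typing import List


def apply_day_ahead_commitment(
    u_gt: List[int],        # day-ahead on/off schedule per interval (1 = on, 0 = off)
    p_gt_raw: List[float],  # intraday continuous output proposed by the agent (kW)
    p_min: float,           # hypothetical minimum output when the unit is on
    p_max: float,           # hypothetical maximum output
) -> List[float]:
    """Hypothetical coupling: output is forced to zero when the day-ahead schedule
    has the unit off, and clipped to [p_min, p_max] when it is on."""
    if len(u_gt) != len(p_gt_raw):
        raise ValueError("schedule and output must have the same length")
    out = []
    for on, p in zip(u_gt, p_gt_raw):
        out.append(min(p_max, max(p_min, p)) if on else 0.0)
    return out
```

For example, `apply_day_ahead_commitment([1, 0, 1], [50.0, 80.0, 300.0], 20.0, 200.0)` returns `[50.0, 0.0, 200.0]`. Fixing the binary start-stop decisions day-ahead leaves only continuous variables for the intraday agent, which is consistent with using A2C at that time scale.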
Further, the day-ahead optimization decision uses a deep Q network (DQN) algorithm to determine the start-stop strategy of the micro gas turbine unit. Building on Q-learning, the DQN algorithm introduces a neural network to fit the value function nonlinearly. It continually learns the Q value of each action through interaction and thereby obtains the optimal policy.

The intraday optimization decision uses an advantage actor-critic (A2C) algorithm to select continuous action variables, and in this way formulates the smart park optimization strategy. The A2C algorithm comprises an actor network, a critic network and an advantage function:

- the actor network selects actions probabilistically;
- the critic network evaluates actions and feeds back their value;
- the advantage function, defined as the difference between the action value function and the state value function, measures how much better an action is than the state's baseline value.
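The value-learning step of DQN can be sketched with NumPy. The patent does not spell out this formulation; the sketch follows the commonly used version:

- an epsilon-greedy rule chooses among the discrete actions;
- the online network is trained toward the target y = r + gamma * max_a' Q_target(s', a'), which is computed with a separate target network.

The networks themselves are omitted. `q_pred` and `next_q_target` stand in for their outputs, and `gamma` and `epsilon` are hyperparameters whose values the patent does not give.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """TD targets y = r + gamma * max_a' Q_target(s', a') for a batch of transitions.

    rewards:       shape (B,)
    next_q_target: shape (B, A), target-network Q values of the next states
    dones:         shape (B,), 1.0 where the episode ended (no bootstrapping)
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    max_next = np.max(np.asarray(next_q_target, dtype=float), axis=1)
    return rewards + gamma * (1.0 - dones) * max_next


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random discrete action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def dqn_loss(q_pred, actions, targets):
    """Mean squared TD error between Q(s, a) of the actions taken and the targets."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_taken = q_pred[np.arange(len(actions)), actions]
    return float(np.mean((np.asarray(targets, dtype=float) - q_taken) ** 2))
```

In a full agent, `dqn_loss` would be minimized by gradient descent on the online network's weights, and the target network would be refreshed from the online network periodically.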
Further, the optimization objective of the optimization decision is to minimize the total operating cost of the smart park, specifically:
where the variables represent, respectively, the change in energy storage charging and discharging, the output of the micro gas turbine, the electricity exchanged with the main grid, and the load state $L_t$; $p_t$ is the state transition probability.
Further, the constraints of the optimization decision include the park supply-demand balance constraint and the energy storage constraints, specifically:
where the terms denote, respectively:

- the generation of the n-th PV unit at time t, with $N_p$ the number of PV units in the park;
- the power exchanged with the main grid;
- the charging/discharging power of the n-th energy storage device at time t, with $N_{ES}$ the number of energy storage devices in the park;
- the power demand of the rigid and flexible loads at time t, with $N_{rl}$ and $N_{fl}$ the numbers of rigid and flexible loads in the park.

$P_{ES}(t)$ and $E_{ES}(t)$ are the charging/discharging power and the stored energy (capacity) of the energy storage, respectively. The remaining bounds are the upper and lower limits of the charging/discharging power and the upper and lower limits of the storage capacity.
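The constraints above can be checked numerically. Because the balance equation itself is not reproduced here, two points in this sketch are assumptions:

- the micro gas turbine output is included on the supply side;
- positive storage power means charging, so charging is counted as demand.

The function names and units are likewise hypothetical. The resulting residual and violation amounts are the kind of quantities a penalty term could be built from.

```python
from typing import Sequence


def balance_residual(
    p_pv: Sequence[float],  # output of each PV unit at time t (length N_p)
    p_gt: float,            # micro gas turbine output at time t (assumed to enter the balance)
    p_grid: float,          # power bought from (+) or sold to (-) the main grid
    p_es: Sequence[float],  # storage power per device, + = charging (assumption), length N_ES
    l_rl: Sequence[float],  # rigid load demands (length N_rl)
    l_fl: Sequence[float],  # flexible load demands (length N_fl)
) -> float:
    """Supply minus demand; zero when the assumed balance constraint holds."""
    supply = sum(p_pv) + p_gt + p_grid
    demand = sum(l_rl) + sum(l_fl) + sum(p_es)
    return supply - demand


def storage_violation(
    p_es: float, e_es: float,
    p_min: float, p_max: float,
    e_min: float, e_max: float,
) -> float:
    """Total amount by which storage power P_ES(t) and stored energy E_ES(t) exceed their bounds."""
    v = max(0.0, p_min - p_es) + max(0.0, p_es - p_max)
    v += max(0.0, e_min - e_es) + max(0.0, e_es - e_max)
    return v
```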
Further, the environmental state comprises the photovoltaic output, the time-of-use electricity price, the rigid load, the flexible load and the state of charge of the energy storage. The action comprises the start-stop status and output level of the micro gas turbine, and the energy changes of the energy storage and the flexible load.
Further, the day-ahead optimization decision also includes discretizing the action space and processing based on the discretized result.
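Because DQN outputs one Q value per discrete action, the continuous action vector must be mapped onto a finite list before DQN can be applied. A minimal sketch, assuming a hypothetical grid of levels for each variable (the patent does not state the resolution), enumerates every combination of turbine start-stop, output level, storage change and flexible-load change. When the turbine is off, its output is pinned to zero so that no redundant "off" actions are created.

```python
from itertools import product
from typing import List, Tuple


def discretize_actions(
    p_gt_levels: List[float],  # hypothetical turbine output levels when on
    d_es_levels: List[float],  # hypothetical storage energy-change levels
    d_fl_levels: List[float],  # hypothetical flexible-load change levels
) -> List[Tuple[int, float, float, float]]:
    """Enumerate discrete actions (U_gt, P_gt, dP_ES, dL_fl)."""
    actions = []
    for u_gt in (0, 1):
        # An 'off' unit has zero output, so it is paired with a single level only.
        levels = p_gt_levels if u_gt else [0.0]
        for p_gt, d_es, d_fl in product(levels, d_es_levels, d_fl_levels):
            actions.append((u_gt, p_gt, d_es, d_fl))
    return actions
```

With 3 output levels, 3 storage levels and 2 flexible-load levels, this yields 6 "off" actions and 18 "on" actions, 24 in total, so the DQN output layer would have 24 units. The count grows multiplicatively with finer grids. This is one reason DQN is paired with a discrete, coarser decision and A2C with the continuous intraday decision.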
Further, the reward function comprises the optimization objective F and a penalty function Penalty, which is introduced to penalize constraint violations. The reward function is expressed as: Reward = -(F + Penalty).
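The reward Reward = -(F + Penalty) can be sketched directly. The patent fixes only this outer form; the linear penalty below is an assumption. It sums positive constraint-violation amounts, such as those produced by a balance or storage check, and scales them by a hypothetical `weight`.

```python
from typing import Sequence


def park_reward(operating_cost: float, violations: Sequence[float], weight: float = 100.0) -> float:
    """Reward = -(F + Penalty), with an assumed linear penalty on constraint violations.

    operating_cost: the optimization objective F for the current step
    violations:     non-negative violation amounts (negative entries are ignored)
    weight:         hypothetical penalty coefficient
    """
    penalty = weight * sum(max(0.0, v) for v in violations)
    return -(operating_cost + penalty)
```

For example, `park_reward(50.0, [0.0, 2.0], 10.0)` gives -70.0. A large `weight` makes infeasible actions clearly worse than any feasible one, at the cost of a steeper reward landscape during training.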
The beneficial effects of the invention are as follows:
1. The invention combines two time scales, designing a deep-reinforcement-learning-based optimization strategy for each of the day-ahead and intraday time scales. Because the intraday optimization takes the day-ahead decisions into account, the algorithm converges faster and training efficiency improves.
2. The deep-reinforcement-learning-based optimization decision method is built on the characteristics of the smart park. It uses an established smart park model that considers the park decision center, the micro gas turbine, photovoltaics, energy storage and adjustable loads.
The concept, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features and effects can be fully understood.
Drawings
FIG. 1 is an architecture diagram of the smart park according to a preferred embodiment of the present invention;
FIG. 2 is a diagram of the interaction process between the environment and the agent according to a preferred embodiment of the present invention.
Detailed Description
The technical content of the preferred embodiments of the present invention will be clearer and easier to understand with reference to the accompanying drawings. The present invention may be embodied in many different forms, and its scope is not limited to the embodiments set forth herein.
In the drawings, structurally identical components are denoted by the same reference numerals, and components with similar structure or function are denoted by similar reference numerals. The size and thickness of each component shown in the drawings are arbitrary, and the present invention does not limit them. Where appropriate, the thickness of components is exaggerated in the drawings for clarity.
FIG. 1 shows the architecture model of a smart park. The smart park mainly comprises a park decision center, a micro gas turbine, a photovoltaic (PV) power generation system, an energy storage system and the park load.

According to how they are managed, loads in the smart park are divided into rigid loads and flexible loads. Rigid loads cannot be adjusted, and their power demand must be met first during scheduling. Flexible loads can be adjusted and, together with the energy storage system, participate in the park optimization decision as controllable demand-side resources.
When deep reinforcement learning is applied to park optimization, the external power grid is taken as the environment and the smart park decision center as the decision-making agent. Through their interaction, the agent obtains the environmental state, takes actions and receives rewards, and the optimization decision process is completed through continuous iteration. The interaction is shown in FIG. 2.

The agent's decision process is assumed to be a Markov decision process: in the interaction between the agent and the environment, the agent's state at the next moment depends only on its current state and the action it takes.

The optimization decision process considers both the day-ahead and the intraday time scales. At each time scale, the smart park decision center obtains the reward fed back by the environment for its action together with the updated environmental state, makes a new decision, and applies the action to the environment. The intraday optimization decision takes the start-stop results of the day-ahead decision into account, which accelerates algorithm convergence and improves training efficiency.
Next, a multi-time-scale smart park optimization decision method is designed based on the deep Q network (DQN) and advantage actor-critic (A2C) algorithms.

Building on Q-learning, DQN introduces a neural network to fit the value function nonlinearly and continually learns the Q value of each action through interaction, thereby obtaining the optimal policy. Because DQN is applicable only to discrete action spaces, it is used in smart park optimization to determine the start-stop strategy of the micro gas turbine units at the longer time scale (e.g., the day-ahead time scale).

The A2C algorithm comprises two networks: the actor network selects actions probabilistically, and the critic network evaluates actions and feeds back their value. The algorithm also defines an advantage function, the difference between the action value function and the state value function, which measures how much better an action is than the state's baseline value. If the advantage is positive, the action taken is better than the average action; otherwise, it is worse. The A2C algorithm is suited to selecting continuous action variables, so it can formulate the smart park optimization strategy at the shorter time scale (e.g., the intraday time scale).
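The advantage computation and the two A2C losses can be sketched with NumPy. Two assumptions go beyond the text:

- the advantage A(s, a) = Q(s, a) - V(s) is estimated with the common one-step form r + gamma * V(s') - V(s);
- continuous actions are drawn from a diagonal Gaussian policy, whose mean and log standard deviation the actor network would output.

The sketch only evaluates the losses. A real implementation would let an automatic differentiation framework compute their gradients with respect to the actor and critic parameters.

```python
import numpy as np


def a2c_quantities(rewards, values, next_values, dones, actions, means, log_stds, gamma=0.99):
    """Return (advantages, actor_loss, critic_loss) for one batch of transitions.

    rewards, values, next_values, dones: shape (B,); values are critic outputs V(s), V(s')
    actions, means: shape (B, D), continuous actions taken and actor-network means
    log_stds:       shape (D,), log standard deviations of the Gaussian policy
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    actions = np.asarray(actions, dtype=float)
    means = np.asarray(means, dtype=float)
    log_stds = np.asarray(log_stds, dtype=float)

    # One-step estimate of A(s, a) = Q(s, a) - V(s), with Q(s, a) ~ r + gamma * V(s').
    td_targets = rewards + gamma * (1.0 - dones) * next_values
    advantages = td_targets - values

    # Log-density of each continuous action under the diagonal Gaussian policy.
    z = (actions - means) / np.exp(log_stds)
    log_probs = np.sum(-0.5 * (z ** 2 + 2.0 * log_stds + np.log(2.0 * np.pi)), axis=-1)

    # Actor: advantage-weighted log-likelihood (advantage treated as a constant).
    actor_loss = float(-np.mean(log_probs * advantages))
    # Critic: squared TD error, pulling V(s) toward the TD target.
    critic_loss = float(np.mean(advantages ** 2))
    return advantages, actor_loss, critic_loss
```

A positive entry in `advantages` marks an action that did better than the critic expected. Minimizing `actor_loss` then raises that action's probability, which matches the interpretation of a positive advantage given in the text.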
In addition, the optimization objective reflects economic benefit. Because the park's economic benefit changes as its state transitions under different strategies, minimizing the total operating cost of the smart park is selected as the objective.
where the variables represent, respectively, the change in energy storage charging and discharging, the output of the micro gas turbine, the electricity exchanged with the main grid, and the load state $L_t$; $p_t$ is the state transition probability.
The constraints on the optimized operation of the park consider the park's supply-demand balance constraint and the energy storage constraints.
where the terms denote, respectively:

- the generation of the n-th PV unit at time t, with $N_p$ the number of PV units in the park;
- the power exchanged with the main grid;
- the charging/discharging power of the n-th energy storage device at time t, with $N_{ES}$ the number of energy storage devices in the park;
- the power demand of the rigid and flexible loads at time t, with $N_{rl}$ and $N_{fl}$ the numbers of rigid and flexible loads in the park.

$P_{ES}(t)$ and $E_{ES}(t)$ are the charging/discharging power and the stored energy (capacity) of the energy storage, respectively. The remaining bounds are the upper and lower limits of the charging/discharging power and the upper and lower limits of the storage capacity.
Finally, the state space, action space and reward function are defined for the deep reinforcement learning algorithm.

For the smart park, the state information that the environment provides to the decision-making agent is: the photovoltaic output $P_{pv}(t)$, the time-of-use (TOU) electricity price $TOU(t)$, the rigid load $L_{rl}(t)$, the flexible load $L_{fl}(t)$, and the state of charge (SOC) of the energy storage $SOC(t)$. The state space of the smart park is therefore defined as: $\text{State} = [P_{pv}(t), TOU(t), L_{rl}(t), L_{fl}(t), SOC(t)]$.

Based on the state information provided by the environment, the decision-making agent's actions are: the start-stop status $U_{gt}(t)$ and output level $P_{gt}(t)$ of the micro gas turbine, the energy change of the energy storage $\Delta P_{ES}(t)$, and the energy change of the flexible load $\Delta L_{fl}(t)$. The action space of the smart park is therefore: $\text{Action} = [U_{gt}(t), P_{gt}(t), \Delta P_{ES}(t), \Delta L_{fl}(t)]$.

The DQN method used at the long time scale cannot handle continuous variables, so the action space must be discretized and processed based on the discretized result.

The reward function comprises the optimization objective F and a penalty function (Penalty), introduced to penalize constraint violations. The reward function of the smart park can therefore be expressed as: Reward = -(F + Penalty).
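The state and action definitions above can be held in small Python dataclasses that flatten into the fixed-order lists a neural network would take as input or emit as output. The class and field names are hypothetical; the field order follows the State and Action definitions in the text, and the last action field is the flexible-load energy change.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class ParkState:
    p_pv: float  # photovoltaic output P_pv(t)
    tou: float   # time-of-use electricity price TOU(t)
    l_rl: float  # rigid load L_rl(t)
    l_fl: float  # flexible load L_fl(t)
    soc: float   # state of charge of the energy storage SOC(t)

    def to_vector(self) -> List[float]:
        # Fixed field order, suitable as the input vector of the Q, actor or critic network.
        return [float(x) for x in astuple(self)]


@dataclass
class ParkAction:
    u_gt: int      # micro gas turbine start-stop U_gt(t), 1 = on
    p_gt: float    # micro gas turbine output level P_gt(t)
    d_p_es: float  # energy change of the storage, Delta P_ES(t)
    d_l_fl: float  # energy change of the flexible load

    def to_vector(self) -> List[float]:
        return [float(x) for x in astuple(self)]
```

For example, `ParkState(120.0, 0.8, 300.0, 80.0, 0.6).to_vector()` gives `[120.0, 0.8, 300.0, 80.0, 0.6]`. In practice the entries would usually be normalized before being fed to a network, since they have very different scales.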
Based on this model, multi-time-scale optimization decisions for the smart park can be realized.
The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can devise numerous modifications and variations in light of the present teachings without departing from the inventive concept. Therefore, any technical solution that those skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art and in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.
Claims (10)
1. A smart park optimization strategy based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: constructing a model of a smart park, wherein the smart park comprises a park decision center, a micro gas turbine, a PV power generation system, an energy storage system and a park load, and the park load comprises a rigid load and a flexible load;
Step 2: realizing the optimization decision of the smart park by a deep reinforcement learning method at the day-ahead time scale and the intraday time scale.
2. The smart park optimization strategy based on deep reinforcement learning according to claim 1, wherein the optimization decision of the smart park takes the external power grid and the smart park as the environment and the park decision center as the agent, and the optimization decision process is completed through continuous iteration of the interaction between the agent and the environment; in this interaction, the environment receives the action taken by the agent and returns the environmental state and the reward to the agent.
3. The smart park optimization strategy according to claim 2, wherein the process by which the agent decides its actions is a Markov decision process, such that in the interaction between the agent and the environment the agent's state at the next moment depends only on its current state and the action it takes.
4. The smart park optimization strategy based on deep reinforcement learning according to claim 3, wherein the decision process of the smart park optimization comprises, at both the day-ahead and the intraday time scales, the park decision center obtaining the reward fed back by the environment for the action together with the updated environmental state, carrying out a new round of decision making, and applying the action to the environment; and the intraday optimization decision adopts the start-stop results of the day-ahead optimization decision.
5. The smart park optimization strategy based on deep reinforcement learning according to claim 4, wherein the day-ahead optimization decision comprises using a deep Q network algorithm to determine the start-stop strategy of the micro gas turbine unit; the deep Q network algorithm introduces, on the basis of Q-learning, a neural network that fits the value function nonlinearly, and continually learns the Q value of each action through the interaction process to obtain the optimal policy; the intraday optimization decision comprises using an advantage actor-critic algorithm to select continuous action variables, thereby formulating the smart park optimization strategy; the advantage actor-critic algorithm comprises an actor network, a critic network and an advantage function, wherein the actor network selects actions probabilistically, the critic network evaluates actions and feeds back their value, and the advantage function, defined as the difference between the action value function and the state value function, reflects the advantage of the action value over the state value.
6. The smart park optimization strategy according to claim 1, wherein the optimization objective of the optimization decision is to minimize the total operating cost of the smart park, specifically:
7. The smart park optimization strategy according to claim 1, wherein the constraints of the optimization decision include the park supply-demand balance constraint and the energy storage constraints, specifically:
where the terms denote, respectively: the generation of the n-th PV unit at time t, with $N_p$ the number of PV units in the park; the power exchanged with the main grid; the charging/discharging power of the n-th energy storage device at time t, with $N_{ES}$ the number of energy storage devices in the park; and the power demand of the rigid and flexible loads at time t, with $N_{rl}$ and $N_{fl}$ the numbers of rigid and flexible loads in the park. $P_{ES}(t)$ and $E_{ES}(t)$ are the charging/discharging power and the stored energy (capacity) of the energy storage, respectively; the remaining bounds are the upper and lower limits of the charging/discharging power and the upper and lower limits of the storage capacity.
8. The smart park optimization strategy according to claim 2, wherein the environmental state comprises the photovoltaic output, the time-of-use electricity price, the rigid load, the flexible load and the state of charge of the energy storage, and the actions comprise the start-stop status and output level of the micro gas turbine and the energy changes of the energy storage and the flexible load.
9. The smart park optimization strategy according to claim 5, wherein the day-ahead optimization decision further comprises discretizing the action space and processing based on the discretized result.
10. The smart park optimization strategy based on deep reinforcement learning according to claim 2, wherein the reward function comprises an optimization objective F and a penalty function Penalty, the penalty function is introduced to penalize constraint violations, and the reward function is expressed as: Reward = -(F + Penalty).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110748404.9A CN113469839A (en) | 2021-06-30 | 2021-06-30 | Smart park optimization strategy based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110748404.9A CN113469839A (en) | 2021-06-30 | 2021-06-30 | Smart park optimization strategy based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113469839A | 2021-10-01 |
Family
ID=77877384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110748404.9A Pending CN113469839A (en) | 2021-06-30 | 2021-06-30 | Smart park optimization strategy based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113469839A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020000399A1 (en) * | 2018-06-29 | 2020-01-02 | 东莞理工学院 | Multi-agent deep reinforcement learning proxy method based on intelligent grid |
CN108964042A (en) * | 2018-07-24 | 2018-12-07 | 合肥工业大学 | Regional power grid operating point scheduling optimization method based on deep Q-network |
CN110518580A (en) * | 2019-08-15 | 2019-11-29 | 上海电力大学 | Active distribution network operation optimization method considering active optimization of microgrids |
CN112186743A (en) * | 2020-09-16 | 2021-01-05 | 北京交通大学 | Dynamic power system economic dispatching method based on deep reinforcement learning |
CN112580867A (en) * | 2020-12-15 | 2021-03-30 | 国网福建省电力有限公司经济技术研究院 | Park comprehensive energy system low-carbon operation method based on Q learning |
Non-Patent Citations (1)
Title |
---|
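李怡瑾: "Modeling and learning-based optimization of energy scheduling for combined cooling, heating and power microgrids under source-load uncertainty" (源荷不确定冷热电联供微网能量调度的建模与学习优化), 控制理论与应用 (Control Theory & Applications), vol. 35, no. 1, pages 56 - 64 * |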
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780688A (en) * | 2021-11-10 | 2021-12-10 | 中国电力科学研究院有限公司 | Optimized operation method, system, equipment and medium of electric heating combined system |
CN113780688B (en) * | 2021-11-10 | 2022-02-18 | 中国电力科学研究院有限公司 | Optimized operation method, system, equipment and medium of electric heating combined system |
CN115577647A (en) * | 2022-12-09 | 2023-01-06 | 南方电网数字电网研究院有限公司 | Power grid fault type identification method and intelligent agent construction method |
CN117613919A (en) * | 2023-11-24 | 2024-02-27 | 浙江大学 | Intelligent control method for peak-valley difference of electricity consumption of industrial and commercial park |
CN117613919B (en) * | 2023-11-24 | 2024-05-24 | 浙江大学 | Intelligent control method for peak-valley difference of electricity consumption of industrial and commercial park |
CN118066651B (en) * | 2024-04-25 | 2024-06-25 | 上海迪捷通数字科技有限公司 | Intelligent park air conditioning method and system based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | Reinforcement learning in sustainable energy and electric systems: A survey | |
Liu et al. | Residential energy scheduling for variable weather solar energy based on adaptive dynamic programming | |
Moghaddam et al. | Multi-objective operation management of a renewable MG (micro-grid) with back-up micro-turbine/fuel cell/battery hybrid power source | |
Huang et al. | A self-learning scheme for residential energy system control and management | |
Motevasel et al. | Expert energy management of a micro-grid considering wind energy uncertainty | |
Faisal et al. | Particle swarm optimised fuzzy controller for charging–discharging and scheduling of battery energy storage system in MG applications | |
CN113469839A (en) | Smart park optimization strategy based on deep reinforcement learning | |
Palma-Behnke et al. | A microgrid energy management system based on the rolling horizon strategy | |
Moghaddam et al. | Multi-operation management of a typical micro-grids using Particle Swarm Optimization: A comparative study | |
Boaro et al. | Adaptive dynamic programming algorithm for renewable energy scheduling and battery management | |
Özkan et al. | A hybrid multicriteria decision making methodology based on type-2 fuzzy sets for selection among energy storage alternatives | |
CN108347062A (en) | Distributed multi-objective cooperative optimization algorithm for microgrid energy management based on potential game |
CN105337310B (en) | Economic operation system and method for series-structure PV-storage multi-microgrids |
CN110070292B (en) | Micro-grid economic dispatching method based on crossover-mutation whale optimization algorithm |
Zhou et al. | A new framework for peer-to-peer energy sharing and coordination in the energy internet | |
CN113435793A (en) | Micro-grid optimization scheduling method based on reinforcement learning | |
Cheng et al. | Adaptive robust method for dynamic economic emission dispatch incorporating renewable energy and energy storage | |
Ebell et al. | Reinforcement learning control algorithm for a pv-battery-system providing frequency containment reserve power | |
Aiswariya et al. | Optimal microgrid battery scheduling using simulated annealing | |
Leo et al. | Multi agent reinforcement learning based distributed optimization of solar microgrid | |
El Zerk et al. | Decentralised strategy for energy management of collaborative microgrids using multi‐agent system | |
TWI639962B (en) | Particle Swarm Optimization Fuzzy Logic Control Charging Method Applied to Smart Grid | |
CN113780622A (en) | Multi-micro-grid power distribution system distributed scheduling method based on multi-agent reinforcement learning | |
Welch et al. | Comparison of two optimal control strategies for a grid independent photovoltaic system | |
Yu et al. | A fuzzy Q-learning algorithm for storage optimization in islanding microgrid | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||