CN117374937A - Multi-micro-grid collaborative optimization operation method, device, equipment and medium - Google Patents
- Publication number
- CN117374937A CN117374937A CN202311315801.2A CN202311315801A CN117374937A CN 117374937 A CN117374937 A CN 117374937A CN 202311315801 A CN202311315801 A CN 202311315801A CN 117374937 A CN117374937 A CN 117374937A
- Authority
- CN
- China
- Prior art keywords
- agent
- action
- micro
- space
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/004—Generation forecast, e.g. methods or systems for forecasting future energy generation
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/008—Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/28—Arrangements for balancing of the load in a network by storage of energy
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/22—The renewable source being solar energy
- H02J2300/24—The renewable source being solar energy of photovoltaic origin
Abstract
The invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium. A state space S and an action space A are constructed for the multiple micro-grids, where each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A. A multi-agent deep reinforcement learning reward function is then constructed, and the agents are guided through rewards to cooperatively meet regional electricity demand: the reward function of each agent comprises both its own outsourced electricity and a component containing information about the other agents, namely the overall electricity consumption of the region. Because the disclosed reward function covers each agent's own electricity consumption as well as the regional consumption, electricity-consumption information is shared among the agents, and rewards guide them to cooperatively meet regional electricity demand, thereby reducing energy cost.
Description
Technical Field
The invention belongs to the field of micro-grid collaborative optimization operation, and particularly relates to a multi-micro-grid collaborative optimization operation method, device, equipment and medium.
Background
A micro-grid comprises multiple distributed energy sources and energy storage systems. With a proper system configuration and operation strategy, and on the premise of meeting load requirements, it can realize cascaded utilization of energy and ultimately efficient, flexible energy use. When the micro-grids cooperate and assist one another with energy across regions, the operating cost of the system is reduced and energy utilization efficiency is improved.
In the prior art, each micro-grid of a multi-micro-grid group belongs to a different benefit subject. How to cooperatively meet regional electricity demand during collaborative optimization operation, and thereby reduce energy cost, is the central problem of multi-micro-grid cooperative operation.
Disclosure of Invention
The invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium, which aim to cooperatively meet regional electricity demand and reduce energy cost.
In a first aspect, the present invention provides a multi-micro grid collaborative optimization operation method, where the collaborative optimization operation method includes:
constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A;
and constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing information of other agents.
Preferably, the action space A is represented by the following formula:

A = {a_elec, a_cooling, a_heating}

where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric, cold and heat energy storage devices respectively; a ∈ [-1, 1], indicating charging (+) or discharging (-) of 100 × a percent of the current capacity.
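The charge/discharge semantics of an action a ∈ [-1, 1] can be sketched as below; this is an illustrative interpretation only, with the function name, clipping policy and kWh units being assumptions rather than the patented implementation.

```python
def apply_storage_action(soc_kwh: float, capacity_kwh: float, a: float) -> float:
    """Return the new state of charge after applying action a in [-1, 1].

    a > 0 charges and a < 0 discharges by 100*a percent of the capacity,
    with the result clipped to the physical limits [0, capacity].
    """
    a = max(-1.0, min(1.0, a))        # keep the action inside [-1, 1]
    delta = a * capacity_kwh          # 100*a percent of the current capacity
    return max(0.0, min(capacity_kwh, soc_kwh + delta))
```

For example, a = 0.3 on a half-full 10 kWh store charges it by 3 kWh, while a = -1 empties it (the lower clip prevents a negative state of charge).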
Preferably, the multi-agent deep reinforcement learning reward function is as follows:

r_i = -(e_i + Σ_j e_j)

where e_i represents the outsourced electricity of the i-th agent and n represents the number of agents; the reward function of each agent comprises its own outsourced electricity e_i and a component Σ_j e_j containing the other agents' information.
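A minimal sketch of the cooperative reward described above, assuming a simple negative-cost form with equal weights (the original does not give the exact weighting): each agent is penalized by its own outsourced electricity plus the regional total, so improving its own reward requires lowering regional consumption too.

```python
def cooperative_reward(outsourced, i, w_self=1.0, w_region=1.0):
    """Reward of agent i: penalize its own outsourced electricity e_i
    plus the regional total sum of all e_j, so that electricity-consumption
    information is effectively shared among the agents through the reward."""
    e_i = outsourced[i]
    regional = sum(outsourced)            # the Σ_j e_j component
    return -(w_self * e_i + w_region * regional)
```

With three agents outsourcing 2, 3 and 5 units, agent 0 receives -(2 + 10) = -12; reducing any agent's outsourcing raises every agent's reward.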
Preferably, the collaborative optimization operation method further comprises a sequential-iteration-based multi-agent deep reinforcement learning action selection flow:

randomly ordering all agents;

the first agent selects an action a_1^m and inputs it into a gradient boosting decision tree (GBDT), which predicts the electricity e_1^m to be consumed by the first agent under this action, where i denotes the agent index, i ∈ {0, 1, ..., n}, n represents the number of agents, and m represents the sequential iteration index;

the predicted electricity consumption e_1^m of the first agent is shared with the next agent;

the next agent selects an action a_i^m and inputs it, together with the consumed-electricity information shared by the previous agent, into the GBDT, which predicts the electricity to be consumed under this action;

when m = k, the multiple agents finally output a group of action sequences.
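One pass of the sequential selection flow above can be sketched as follows; `policy` and `predict_consumption` are hypothetical stand-ins for the agents' policies and the GBDT predictor, not the patented components.

```python
import random

def select_actions(agents, policy, predict_consumption):
    """One sequential-iteration pass: shuffle the agents, then let each
    choose an action given the consumption estimate shared by its
    predecessor, updating the shared estimate after every choice."""
    order = list(agents)
    random.shuffle(order)                  # 1) random ordering of agents
    shared = None                          # consumption info shared so far
    actions = {}
    for agent in order:
        a = policy(agent, shared)          # agent picks an action
        # GBDT-style predictor estimates the electricity this action costs
        shared = predict_consumption(agent, a, shared)
        actions[agent] = a
    return actions
```

Repeating this pass k times, with each pass seeded by the previous one's estimates, yields the final action sequence described in the flow.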
Preferably, the multi-agent deep reinforcement learning training process includes:

constructing the neural networks, comprising the state-value neural networks q1 and q2, the policy neural network π, and their corresponding target neural networks q1_target, q2_target and π_target;

setting, for the constructed networks, the number of training rounds episode, the number of training iterations per round num_steps, the replay pool (buffer) capacity, the batch size, the learning rate and the discount factor;

while the number of environment steps is smaller than the num_steps setting, selecting an action a_t according to the current state s_t, executing a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, and executing this step in a loop;

when the number of environment steps is greater than the num_steps setting, likewise selecting a_t according to s_t, executing it to obtain r_t and s_{t+1}, storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, then starting to train the networks, and executing this step in a loop;

wherein the network training comprises:

sampling batch-size samples (s_t, a_t, r_t, s_{t+1}) from the buffer and converting them into tensors;

inputting the state s_{t+1} of the samples into the policy network π to obtain a_{t+1} and the entropy log π(a_{t+1}|s_{t+1});

calculating the target values from the target networks based on the obtained quantities, calculating the loss functions of q1, q2 and π, and performing gradient updates with the optimizer selected for each network;

transferring the parameters of q1, q2 and π to the target networks at intervals based on Polyak averaging;

stopping network training when the set number of training rounds episode is reached.

Given a state of the multiple micro-grids, the trained multi-agent deep reinforcement learning directly outputs the cooperative operation actions, generating a strategy.
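The experience-replay bookkeeping in the two looped steps above can be sketched as follows; the class and parameter names (capacity, warm-up threshold) are illustrative assumptions, not values taken from the patent.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions (s_t, a_t, r_t, s_{t+1}); oldest entries are
    discarded automatically once the capacity is reached."""

    def __init__(self, capacity=10000):
        self.storage = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.storage.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform minibatch, as used by the training step
        return random.sample(list(self.storage), batch_size)

    def ready(self, warmup_steps):
        # training only begins once enough environment steps accumulated
        return len(self.storage) > warmup_steps
```

In the training loop, transitions are pushed every environment step, and `sample` is only called once `ready` reports that the warm-up phase is over.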
In a second aspect, an embodiment of the present invention provides a device for collaborative optimization operation of multiple micro-grids, where the device includes:
the space construction module is used for constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
and the reward function construction module is used for constructing a multi-agent deep reinforcement learning reward function and guiding the multiple agents through rewards to cooperatively meet regional electricity demand, wherein the reward function of each agent comprises its own outsourced electricity and a component containing information of the other agents.
In a third aspect, an embodiment of the invention provides an electronic device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method according to any of the first aspects when executing the program.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the first aspects.
Advantageous effects
The embodiment of the invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium. A state space S and an action space A of the multiple micro-grids are constructed, where each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A. A multi-agent deep reinforcement learning reward function is further constructed, and the agents are guided through rewards to cooperatively meet regional electricity demand, where the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information, i.e. the overall electricity consumption of the region. The disclosed reward function, which includes each agent's electricity consumption and the regional consumption, shares electricity-consumption information among the agents and guides them through rewards to cooperatively meet regional demand, reducing energy cost.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
fig. 1 is a flowchart of a multi-micro grid collaborative optimization operation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method of collaborative optimization operation for multiple micro-grids according to an embodiment of the present invention;
FIG. 3 is a flow chart of a multi-agent deep reinforcement learning training process according to an embodiment of the present invention;
FIG. 4 is a flow chart of a multi-agent deep reinforcement learning action selection process based on sequential iteration in accordance with an embodiment of the present invention;
FIG. 5 is a graph showing convergence of the reward function of the algorithm training result according to the embodiment of the present invention;
fig. 6 is a graph of the 24-hour energy-storage charging and discharging operation of a system in which 3 multi-energy micro-grids operate cooperatively according to an embodiment of the present invention;
fig. 7 is a block diagram of a multi-micro grid collaborative optimization operation device according to an embodiment of the present invention;
fig. 8 is a block diagram of another multi-micro grid collaborative optimization operation device according to an embodiment of the present invention;
fig. 9 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings in connection with embodiments. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The following detailed description is exemplary and is intended to provide further details of the invention. Unless defined otherwise, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the invention.
Multi-agent deep reinforcement learning is an optimization method based on artificial intelligence. It regards each micro-grid as an agent and designs the information interaction and cooperative operation mechanism among agents, so that the benefit requirements of each micro-grid are met while data privacy is protected, finally realizing cooperative, optimal operation of the multiple micro-grids. Moreover, multi-agent deep reinforcement learning is a data-driven method: no accurate system model needs to be constructed from physical formulas, and the optimal control strategy is obtained through interaction with the environment under the guidance of a reward-and-punishment mechanism, overcoming the high dimensionality, many parameters and nonlinearity that burden traditional optimization methods.
Referring to fig. 1, an embodiment of the present invention provides a method for collaborative optimization operation of multiple micro-grids, which specifically includes:
s20, constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
s40, constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power consumption requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing other agent information, namely the electric quantity consumption of the whole region.
By constructing the state space S and action space A of the multiple micro-grids, with each micro-grid an agent, the state space S comprising a plurality of state variables and each agent selecting electric energy storage as the action variable of the action space A, and by further constructing the multi-agent deep reinforcement learning reward function, the agents are guided through rewards to cooperatively meet regional electricity demand; the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information, i.e. the overall electricity consumption of the region. Sharing electricity-consumption information among the agents in this way guides them to cooperatively meet regional demand and reduces energy cost.
The following describes the advantageous effects of the present invention by way of a specific example:
please refer to fig. 2:
s1, constructing a multi-micro-grid collaborative optimization operation environment, wherein the multi-micro-grid collaborative optimization operation environment comprises operation parameters such as renewable energy sources, energy storage devices, energy conversion devices and the like contained in the micro-grid, electricity prices, carbon emission prices and the like.
S2, constructing a multi-agent deep reinforcement learning state space S, wherein variables contained in the S are shown in the following table:
TABLE 1 State space variable constitution

No. | State variable | No. | State variable
1 | Date | 6 | Carbon emission unit price
2 | Micro-grid external electricity purchase/sale | 7 | Current electricity price and 24-hour forecast
3 | Energy storage state of charge (SOC) | 8 | Temperature and 24-hour forecast
4 | Electric load | 9 | Humidity and 24-hour forecast
5 | Photovoltaic output | 10 | Solar radiation and 24-hour forecast
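The 10 state quantities of Table 1 can be packed into a flat per-agent observation vector as sketched below; the key names are illustrative assumptions, and in practice each 24-hour forecast would contribute multiple entries rather than the single scalar used in this simplification.

```python
# Assumed field names for the 10 state quantities of Table 1.
STATE_KEYS = [
    "date", "external_trade", "soc", "electric_load", "pv_output",
    "carbon_price", "electricity_price_24h", "temperature_24h",
    "humidity_24h", "solar_radiation_24h",
]

def build_state(obs: dict) -> list:
    """Flatten one agent's observation dict into the fixed-order
    10-element state vector fed to its networks."""
    return [obs[k] for k in STATE_KEYS]
```

Concatenating the vectors of 3 such agents yields the 30 state quantities of the cooperative system described in the embodiment.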
S3, constructing the multi-agent deep reinforcement learning action space A, as shown in the following formula:

A = {a_elec, a_cooling, a_heating} (1)

where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric, cold and heat energy storage devices; a is a decimal in [-1, 1], indicating charging (+) or discharging (-) of 100 × a percent of the current capacity.
S4, constructing the multi-agent deep reinforcement learning reward function, as shown in the following formula:

r_i = -(e_i + Σ_j e_j) (2)

where e_i represents the outsourced electricity of the i-th agent and n represents the number of agents. For each agent, the reward function contains its own outsourced electricity e_i and a component Σ_j e_j containing the other agents' information, i.e. the electricity consumption of the whole region. The introduction of Σ_j e_j into the reward function shares electricity-consumption information among the agents, so that rewards guide them to cooperatively meet regional electricity demand and reduce energy cost.
S5, designing the sequential-iteration-based multi-agent deep reinforcement learning action selection flow, as shown in FIG. 4:

1) randomly order all agents;

2) the first agent selects an action a_1^m and inputs it into the gradient boosting decision tree (GBDT), which predicts the electricity e_1^m to be consumed by the first agent under this action; i denotes the agent index, i ∈ {0, 1, ..., n}, n represents the number of agents, and m represents the sequential iteration index;

3) the predicted electricity consumption e_1^m of the first agent is shared with the next agent;

4) the next agent selects an action a_i^m and inputs it, together with the consumed-electricity information shared by the previous agent, into the GBDT, which predicts the electricity to be consumed under this action;

5) when m = k, i.e. after k iterations, the multiple agents finally output a group of action sequences;

6) the action sequence is output.
S6, the multi-agent deep reinforcement learning training process, as shown in FIG. 3:

1) Build 6 neural networks: the state-value neural networks q1 and q2 with parameters θ1 and θ2, the policy neural network π with its own parameters, and their corresponding target neural networks q1_target, q2_target and π_target.

2) Initialize the neural networks, including parameter-value settings such as the network weights and the optimizers used for updating the networks.

3) Set training parameters such as the number of training rounds episode, the iterations per round num_steps, the replay pool (buffer) capacity, the batch size, the learning rate and the discount factor.

4) While the number of environment steps is less than the num_steps setting, select an action a_t according to the current state s_t, execute a_t to obtain the reward r_t and the next environment state s_{t+1}, store the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, and execute step 4) in a loop.

5) When the number of environment steps is greater than the num_steps setting, likewise select a_t according to s_t, execute it to obtain r_t and s_{t+1}, store the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, then begin training the networks, and execute step 5) in a loop. The network training process is as follows:

(1) at step t, sample batch-size samples (s_t, a_t, r_t, s_{t+1}) from the buffer and convert them into tensors;

(2) input the state s_{t+1} of the samples into the policy network π to obtain a_{t+1} and the entropy log π(a_{t+1}|s_{t+1});

(3) based on the values obtained in (2), calculate the target values from q1_target, q2_target and π_target, calculate the loss functions of q1, q2 and π, and perform gradient updates with the optimizer selected for each network;

(4) based on Polyak averaging, transfer the parameters of q1, q2 and π to the target networks at intervals.

6) Stop network training when the set number of training rounds episode is reached.

Given a state of the multiple micro-grids, the trained multi-agent deep reinforcement learning can directly output the cooperative operation actions, generating a strategy.
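The Polyak (soft) target update used in step (4) can be sketched as below for a flat list of weights; the step size `tau` is an assumed value, not one specified in the patent.

```python
def polyak_update(online, target, tau=0.005):
    """Move each target parameter a small step tau toward the
    corresponding online parameter: w_target <- tau*w + (1-tau)*w_target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]
```

A small tau makes the target networks track the online networks slowly, which stabilizes the bootstrapped value targets during training.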
Compared with other reinforcement learning methods, the invention adopts a multi-agent soft actor-critic (MASAC) algorithm. An entropy term is added to the objective function of micro-grid operation, so that while pursuing maximum benefit the micro-grid agents simultaneously explore more possible operation-optimization strategies; this overcomes problems such as the operation-optimization problem easily falling into a local optimum and weak robustness to environmental change. Secondly, in the multi-agent soft actor-critic algorithm's selection of the agents' actions, a sequential iteration method is used to obtain the multi-micro-grid optimized operation action strategy, so that each agent's action selection considers both its own operating-state information and the operating conditions of the other agents; on the premise of satisfying its own benefit, the regional micro-grid operating cost is reduced and energy utilization efficiency improved.
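The entropy term that MASAC adds to each agent's objective takes the standard soft actor-critic form; the temperature coefficient α and this exact notation are assumptions here, since the patent does not print the formula:

```latex
J(\pi_i) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi_i}}
\left[ r_i(s_t, a_t) + \alpha \, \mathcal{H}\!\left(\pi_i(\cdot \mid s_t)\right) \right]
```

The entropy H(π_i(·|s_t)) rewards stochastic policies, which is what drives the broader exploration of operation strategies described above.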
The advantageous effects of the invention are further illustrated in the following preferred embodiments:
the algorithm flow chart is shown in fig. 2.
S1: and constructing a multi-micro-grid collaborative optimization operation system containing 3 micro-grids. The micro-grid selects a California intelligent building system, and specifically comprises photovoltaic power generation, cold-hot electricity energy storage, an electric heater, an electric heat pump and other devices. The data selects 3 intelligent building systems 24 hours of information.
S2: the state space of the multiple micro-grids is constructed, as shown by the state quantity in table 1, each micro-grid comprises 10 state quantities of 1 intelligent building system, and the multi-micro-grid cooperative operation system comprises 30 state quantities in total.
S3: the method comprises the steps of constructing an action space of a plurality of micro-grids, wherein each intelligent agent selects electric energy storage as an action variable, so that the multi-micro-grid cooperative operation system totally comprises 3 action amounts, and the action space is represented by the following formula:
in the formula, subscripts 1-3 represent what number of agents, and the upper elec represents an electric energy storage action.
S4: and constructing a multi-agent reward function. As shown in formula (2), where n=2.
S5: action selection based on sequential iterations.
1) And selecting a GBDT-based neural network estimated power consumption value.
2) The number of iterations is selected to be 100.
S6: network and parameter setting for multi-micro-grid collaborative operation model training based on multi-agent deep reinforcement learning.
1) The state-value neural networks q1, q2, the policy neural network π and their corresponding target networks all adopt 4-layer fully connected neural networks.
For the state-value networks, the input layer has 11 neurons, corresponding to the 10 environment state variables plus 1 action variable, and the output layer has 1 neuron, corresponding to the output state-action value. For the policy network, the input layer has 10 neurons, corresponding to the 10 input environment state variables, and the output layer has 2 neurons, corresponding to the output action mean and the entropy term log_prob.
The two hidden layers of each network are set to 256 and 256 neurons, respectively.
2) The neural networks are updated with the Adam optimizer, and the root-mean-square error is selected as the loss function.
3) The number of training episodes is set to 300, with 8000 steps per episode;
4) The replay pool buffer capacity is set to 10000; the batch size is set to 256;
5) The learning rate is 0.003 and the discount factor is 0.99.
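The 4-layer fully connected architecture described in item 1) above can be sketched as follows, shapes only — the weights here are random and purely illustrative, not the trained parameters:

```python
import numpy as np

def mlp_forward(x: np.ndarray, layer_sizes=(10, 256, 256, 2)) -> np.ndarray:
    """Forward pass of a 4-layer fully connected network with the policy
    network's shape (10 state inputs -> 256 -> 256 -> 2 outputs: action
    mean and the log_prob head). Random weights, shape illustration only."""
    rng = np.random.default_rng(seed=0)
    h = x
    for idx, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        w = rng.normal(scale=0.01, size=(n_in, n_out))
        b = np.zeros(n_out)
        h = h @ w + b
        if idx < len(layer_sizes) - 2:   # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return h
```

The state-value networks differ only in their sizes: 11 inputs (state plus action) and 1 output (the state-action value).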
After model training is completed, the model is verified. The calculation results are shown in figs. 5-6.
The invention discloses a multi-microgrid collaborative optimization operation algorithm based on the multi-agent soft actor-critic (MASAC) algorithm, which adds entropy to the objective function, improves the algorithm's ability to explore better strategies, and enhances the robustness of the collaborative optimization algorithm.
The invention discloses a sequential-iteration-based action selection method for multi-agent deep reinforcement learning, which both protects the privacy of each agent and satisfies the benefit requirements of each agent, realizing multi-agent collaborative optimization operation.
The invention discloses a reinforcement learning reward function that contains both each agent's own electricity consumption and the regional electricity consumption. The introduction of Σe_i in the reward function ensures that electricity consumption information is shared among the multiple agents, so that the rewards guide the agents to cooperatively meet regional power demand and reduce energy consumption cost.
Based on the same inventive concept, the embodiment of the present invention further provides a multi-micro-grid collaborative optimization operation device, which can be used to implement a multi-micro-grid collaborative optimization operation method described in the above embodiment, as described in the following embodiments: because the principle of solving the problem of the multi-micro-grid collaborative optimization operation device is similar to that of a multi-micro-grid collaborative optimization operation method, the implementation of the multi-micro-grid collaborative optimization operation device can be referred to the implementation of the multi-micro-grid collaborative optimization operation method, and the repetition is omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Referring to fig. 7, the apparatus includes:
the space construction module 200 is configured to construct a state space S and an action space a of a plurality of micro-grids, where each micro-grid is an agent, the state space S includes a plurality of state variables, and each agent selects an electric energy storage as an action variable of the action space a;
the reward function construction module 400 is configured to construct a multi-agent deep reinforcement learning reward function, and guide the multi-agents to cooperatively meet the regional power consumption requirement through rewards, wherein the reward function of each agent includes the outsourcing power of the agent and the component containing information of other agents, namely the power consumption of the whole region.
The multi-micro-grid collaborative optimization operation device constructs a state space S and an action space A of the multi-micro-grid through a space construction module, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A; the rewarding function construction module further constructs a multi-agent deep reinforcement learning rewarding function, and the multi-agent deep reinforcement learning rewarding function is guided by rewards to cooperatively meet regional power consumption requirements, wherein the rewarding function of each agent comprises self outsourcing electric quantity and components containing other agent information, namely the electric quantity consumption of the whole region. The reinforcement learning reward function comprising the electricity consumption of each intelligent agent and the regional electricity consumption is disclosed, so that the electricity consumption information is shared among the multiple intelligent agents, the multiple intelligent agents are guided to cooperatively meet the regional electricity consumption requirement through rewards, and the energy consumption cost is reduced.
The multi-micro-grid collaborative optimization operation device of the invention is described in a preferred embodiment as follows:
referring to fig. 8, the operation environment construction module 100 is specifically configured to construct a multi-micro grid collaborative optimization operation environment, where the operation environment includes renewable energy sources, energy storage devices, energy conversion devices, and other operation parameters contained in the micro grid, electricity prices, carbon emission prices, and the like.
The state space construction module 201 is configured to construct a multi-agent deep reinforcement learning state space S, where variables included in S are as follows:
TABLE 1 State space variable constitution
The action space construction module 202 is configured to construct a multi-agent deep reinforcement learning action space a, as shown in the following formula:
A={a elec ,a cooling ,a heating } (1)
where a_elec, a_cooling, a_heating are the charging and discharging actions of the electric, cooling and heating energy storage devices respectively; a ∈ [−1, 1], and the decimal value indicates charging (+) / discharging (−) of 100 × a percent of the current capacity.
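A minimal sketch of this action convention — reading "percent of the current capacity" as a fraction of the device's rated capacity, and clipping to physical limits, are assumptions not spelled out in the text:

```python
def apply_storage_action(stored: float, action: float, capacity: float) -> float:
    """Apply an action a in [-1, 1] to an energy storage device:
    charge (+) / discharge (-) by 100*|a| percent of the rated capacity
    (assumed reading of "current capacity"), clipped to [0, capacity]."""
    if not -1.0 <= action <= 1.0:
        raise ValueError("action must lie in [-1, 1]")
    delta = action * capacity          # energy moved this step
    return min(max(stored + delta, 0.0), capacity)
```

For example, with a half-full 100 kWh store, an action of 0.25 charges 25 kWh, while −0.25 discharges 25 kWh.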
The reward function construction module 400 is configured to construct a multi-agent deep reinforcement learning reward function, as shown in the following formula:
in the formula e i Represents the outsourcing electricity quantity of the ith agent, and n represents the number of agents. For each agent, the reward function contains its own outsourcing power e i And a component Σe containing other agent information i I.e. the power consumption of the whole area. Sigma e in the bonus function i The introduction of the system ensures that the electric quantity consumption information is shared among the multiple intelligent agents, so that the multiple intelligent agents are guided to cooperatively meet regional power consumption requirements through rewards, and the energy consumption cost is reduced.
The flow design module 500 is configured to design a multi-agent deep reinforcement learning action selection flow based on sequential iteration, as shown in fig. 2:
1) All agents are randomly ordered;
2) The first agent selects action a_1^m and inputs a_1^m into the gradient-boosted decision tree GBDT, which predicts the electric energy e_1^m to be consumed by the first agent under this action; i denotes the agent index, i ∈ {0, 1, ..., n}, with n the number of agents, and m denotes the sequential iteration index;
3) The estimated electric energy e_1^m consumed by the first agent under its action is shared with the next agent;
4) The next agent selects action a_2^m and inputs a_2^m, together with the consumption information e_1^m shared by the previous agent, into the GBDT, which predicts the electric energy e_2^m to be consumed under this action;
5) When m = k, i.e. after k iterations, the multi-agent system finally outputs a set of action sequences (a_1^k, a_2^k, ..., a_n^k);
6) The action sequence (a_1^k, a_2^k, ..., a_n^k) is output.
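Steps 1)–6) can be sketched as follows — the per-agent policy and the GBDT consumption estimator are stand-in callables, and the way the consumption estimate is accumulated and passed to the next agent is an assumption:

```python
import random
from typing import Callable, List

def sequential_action_selection(
    policies: List[Callable[[float], float]],
    predict_energy: Callable[[float, float], float],
    k: int = 100,
) -> List[float]:
    """Sequential-iteration action selection (sketch). policies[i] maps the
    shared consumption estimate to agent i's action; predict_energy(action,
    shared) stands in for the GBDT estimator. Both are placeholders."""
    order = list(range(len(policies)))
    random.shuffle(order)                      # 1) random agent ordering
    actions = [0.0] * len(policies)
    for _ in range(k):                         # m = 1 .. k iterations
        shared = 0.0                           # consumption info passed on
        for i in order:                        # 2)-4) agents act in turn
            actions[i] = policies[i](shared)
            shared = predict_energy(actions[i], shared)  # share estimate
    return actions                             # 5)-6) final sequence a^k
```

Only the predicted consumption estimate is passed between agents, which is how the sequential scheme preserves each agent's internal privacy while still letting later agents account for earlier agents' load.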
Training module 600 specifically performs the multi-agent deep reinforcement learning training process, as shown in FIG. 3:
1) 6 neural networks are built: the state-value neural networks q1, q2 and the policy neural network π, with parameters θ1, θ2 and the policy parameters respectively; their corresponding target networks are q1_target, q2_target and π_target.
2) The neural networks are initialized, including parameter value settings such as the network weights and the settings of the optimizer used for network updates.
3) Training parameters are set, such as the number of training episodes, the number of training iterations num_steps per episode, the replay pool buffer capacity, the batch size, the learning rate and the discount factor.
4) When the number of environmental steps is smaller than the set num_steps value, action a_t is selected according to the current state s_t; executing a_t yields the reward r_t and the next environmental state s_{t+1}; the sample (s_t, a_t, r_t, s_{t+1}) is then stored in the buffer, and step 4) is executed cyclically.
5) When the number of environmental steps is greater than the set num_steps value, action a_t is selected according to the current state s_t; executing a_t yields the reward r_t and the next environmental state s_{t+1}; the sample (s_t, a_t, r_t, s_{t+1}) is stored in the buffer, network training then begins, and step 5) is executed cyclically.
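The buffer interaction in steps 4)–5) can be sketched with a minimal replay pool (the capacity of 10000 and batch size of 256 match the embodiment described earlier; the class itself is an illustrative sketch):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal replay pool storing (s_t, a_t, r_t, s_t+1) transitions."""
    def __init__(self, capacity: int = 10000):
        self.buf = deque(maxlen=capacity)   # oldest samples evicted first

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size: int = 256):
        # uniform random minibatch, as used by the network training step
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)
```

The bounded deque gives the fixed-capacity behavior described above: once the pool is full, new samples overwrite the oldest ones.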
The network training process is as follows:
(1) At step t, batch_size samples (s_t, a_t, r_t, s_{t+1}) are drawn from the buffer and converted into tensors for training;
(2) The sampled next state s_{t+1} is input to the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1});
(3) Based on the results of (2), the target values are computed from the target networks q1_target, q2_target and π_target, the loss functions of networks q1, q2 and π are calculated, and gradient updates are performed with the optimizer selected for each network;
(4) Based on the Polyak averaging method, the parameters of networks q1, q2 and π are periodically transferred to the target networks q1_target, q2_target and π_target.
6) Network training stops when the set number of training episodes is reached.
Once trained, the multi-agent deep reinforcement learning model can directly output cooperative operation actions for a given multi-microgrid state, thereby generating an operation strategy.
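The Polyak averaging in step (4) can be sketched as a soft parameter update; the smoothing coefficient τ is an assumed hyperparameter, not specified in the text:

```python
from typing import List

def polyak_update(target: List[float], source: List[float],
                  tau: float = 0.005) -> List[float]:
    """Soft target-network update: target <- tau*source + (1-tau)*target.
    Applied periodically to move the q1, q2 and pi target-network
    parameters toward the online networks. tau is an illustrative value."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]
```

A small τ makes the target networks change slowly, which stabilizes the target values computed in step (3).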
The embodiment of the present invention also provides a computer electronic device. Fig. 9 shows a schematic structural diagram of an electronic device to which the embodiment of the present invention can be applied. As shown in fig. 9, the computer electronic device includes a central processing unit (CPU) 901 which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for system operation. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 910 as needed, so that a computer program read out therefrom is installed into the storage section 908 as needed.
As another aspect, the present invention further provides a computer readable storage medium, which may be a computer readable storage medium included in the multi-microgrid collaborative optimization operation device in the above embodiment; or may be a computer-readable storage medium, alone, that is not incorporated into an electronic device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the multi-microgrid co-optimization operation methods described in the present invention.
It will be appreciated by those skilled in the art that the present invention can be carried out in other embodiments without departing from the spirit or essential characteristics thereof. Accordingly, the above disclosed embodiments are illustrative in all respects, and not exclusive. All changes that come within the scope of the invention or equivalents thereto are intended to be embraced therein.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.
Claims (10)
1. The multi-micro-grid collaborative optimization operation method is characterized by comprising the following steps of:
constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A;
and constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing information of other agents.
2. The collaborative optimization operation method of claim 1, wherein the action space a is represented by the following formula:
A={a elec ,a cooling ,a heating }
wherein a_elec, a_cooling, a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices; a ∈ [−1, 1] indicates charging (+) / discharging (−) of 100 × a percent of the current capacity.
3. The collaborative optimization operation method of claim 2, wherein the multi-agent deep reinforcement learning reward function is represented by the following formula:
wherein e_i represents the outsourced electricity of the i-th agent and n represents the number of agents; the reward function of each agent contains its own outsourced electricity e_i and a component Σe_i containing the other agents' information.
4. The collaborative optimization operation method according to claim 3, further comprising: multiple agent deep reinforcement learning action selection flow based on sequential iteration:
randomly sequencing all agents;
the first agent selects action a_1^m and inputs a_1^m into the gradient-boosted tree GBDT, which predicts the electric energy e_1^m to be consumed by the first agent under this action, where i denotes the agent index, i ∈ {0, 1, ..., n}, n denotes the number of agents, and m denotes the sequential iteration index;
the estimated electric energy e_1^m consumed by the first agent under its action is shared with the next agent;
the next agent selects action a_2^m and inputs a_2^m, together with the consumption information e_1^m shared by the previous agent, into the GBDT, which predicts the electric energy e_2^m to be consumed under this action;
when m = k, the agents finally output a set of action sequences (a_1^k, a_2^k, ..., a_n^k).
5. The collaborative optimization operation method according to claim 4, wherein the multi-agent deep reinforcement learning training process comprises:
constructing neural networks, comprising state-value neural networks q1, q2 and a policy neural network π, with corresponding target networks q1_target, q2_target and π_target;
setting, based on the constructed neural networks, the number of training episodes, the number of training iterations num_steps per episode, the replay pool buffer capacity, the batch size, the learning rate and the discount factor;
when the number of environmental steps is smaller than the set num_steps value, selecting action a_t according to the current state s_t, executing a_t to obtain the reward r_t and the next environmental state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, and executing this step cyclically;
when the number of environmental steps is greater than the set num_steps value, selecting action a_t according to the current state s_t, executing a_t to obtain the reward r_t and the next environmental state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, starting to train the network, and executing this step cyclically.
6. The collaborative optimization operation method according to claim 5, wherein,
batch_size samples (s_t, a_t, r_t, s_{t+1}) are sampled from the buffer and converted into tensors for training;
the sampled next state s_{t+1} is input to the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1});
based on the obtained values, the target values are computed from the target networks q1_target, q2_target and π_target, the loss functions of networks q1, q2 and π are calculated, and gradient updates are performed based on the optimizer selected for each network;
based on the Polyak averaging method, the parameters of networks q1, q2 and π are periodically transferred to the target networks q1_target, q2_target and π_target;
network training stops when the set number of training episodes is reached;
the trained multi-agent deep reinforcement learning model directly outputs cooperative operation actions for a given multi-microgrid state, generating an operation strategy.
7. A multi-microgrid collaborative optimization operation device, characterized in that the device comprises:
the space construction module is used for constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
and the rewarding function construction module is used for constructing a multi-agent deep reinforcement learning rewarding function, and the multi-agent deep reinforcement learning rewarding function is guided by rewards to cooperatively meet regional power consumption requirements, wherein the rewarding function of each agent comprises self outsourcing electric quantity and components containing information of other agents.
8. The multi-microgrid co-optimal operation device according to claim 7, wherein said space construction module comprises:
the action space construction module is used for constructing a multi-agent deep reinforcement learning action space A, and the action space A is shown as the following formula:
A={a elec ,a cooling ,a heating } (1)
wherein a_elec, a_cooling, a_heating are the charging and discharging actions of the electric, cold and heat energy storage devices; a ∈ [−1, 1], and the decimal value indicates charging (+) / discharging (−) of 100 × a percent of the current capacity.
9. An apparatus comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the computer program, implements the method of any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311315801.2A CN117374937A (en) | 2023-10-11 | 2023-10-11 | Multi-micro-grid collaborative optimization operation method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311315801.2A CN117374937A (en) | 2023-10-11 | 2023-10-11 | Multi-micro-grid collaborative optimization operation method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117374937A true CN117374937A (en) | 2024-01-09 |
Family
ID=89405262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311315801.2A Pending CN117374937A (en) | 2023-10-11 | 2023-10-11 | Multi-micro-grid collaborative optimization operation method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117374937A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117808174A (en) * | 2024-03-01 | 2024-04-02 | 山东大学 | Micro-grid operation optimization method and system based on reinforcement learning under network attack |
CN117808174B (en) * | 2024-03-01 | 2024-05-28 | 山东大学 | Micro-grid operation optimization method and system based on reinforcement learning under network attack |
CN117997152A (en) * | 2024-04-03 | 2024-05-07 | 深圳市德兰明海新能源股份有限公司 | Bottom layer control method of modularized multi-level converter based on reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112186799B (en) | Distributed energy system autonomous control method and system based on deep reinforcement learning | |
Wang et al. | Deep reinforcement learning method for demand response management of interruptible load | |
Zheng et al. | Optimal chiller loading by improved invasive weed optimization algorithm for reducing energy consumption | |
Guo et al. | Optimal energy management of multi-microgrids connected to distribution system based on deep reinforcement learning | |
CN112614009B (en) | Power grid energy management method and system based on deep expectation Q-learning | |
Wang et al. | Virtual power plant containing electric vehicles scheduling strategies based on deep reinforcement learning | |
Wan et al. | Residential energy management with deep reinforcement learning | |
Bai et al. | Double-layer staged training echo-state networks for wind speed prediction using variational mode decomposition | |
Andervazh et al. | Emission‐economic dispatch of thermal power generation units in the presence of hybrid electric vehicles and correlated wind power plants | |
CN117374937A (en) | Multi-micro-grid collaborative optimization operation method, device, equipment and medium | |
CN104636985A (en) | Method for predicting radio disturbance of electric transmission line by using improved BP (back propagation) neural network | |
CN112491094B (en) | Hybrid-driven micro-grid energy management method, system and device | |
CN106779177A (en) | Multiresolution wavelet neutral net electricity demand forecasting method based on particle group optimizing | |
Wu et al. | Strategic bidding in a competitive electricity market: An intelligent method using Multi-Agent Transfer Learning based on reinforcement learning | |
Dinh et al. | Supervised-learning-based hour-ahead demand response for a behavior-based home energy management system approximating MILP optimization | |
CN117057553A (en) | Deep reinforcement learning-based household energy demand response optimization method and system | |
Zhang et al. | A cooperative EV charging scheduling strategy based on double deep Q-network and Prioritized experience replay | |
CN115714382A (en) | Active power distribution network real-time scheduling method and device based on security reinforcement learning | |
CN115409645A (en) | Comprehensive energy system energy management method based on improved deep reinforcement learning | |
Zhang et al. | Physical-model-free intelligent energy management for a grid-connected hybrid wind-microturbine-PV-EV energy system via deep reinforcement learning approach | |
Nourianfar et al. | Economic emission dispatch considering electric vehicles and wind power using enhanced multi-objective exchange market algorithm | |
Dou et al. | Double‐deck optimal schedule of micro‐grid based on demand‐side response | |
Zhang et al. | A double-deck deep reinforcement learning-based energy dispatch strategy for an integrated electricity and district heating system embedded with thermal inertial and operational flexibility | |
Pandian et al. | Solving Economic Load Dispatch ProblemConsidering Transmission Losses by a HybridEP-EPSO Algorithm for Solving both Smoothand Non-Smooth Cost Function | |
CN113555888B (en) | Micro-grid energy storage coordination control method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |