CN117374937A - Multi-micro-grid collaborative optimization operation method, device, equipment and medium - Google Patents

Multi-micro-grid collaborative optimization operation method, device, equipment and medium

Info

Publication number
CN117374937A
Authority
CN
China
Prior art keywords
agent
action
micro
space
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311315801.2A
Other languages
Chinese (zh)
Inventor
赵琦
乔骥
陈予尧
陈盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
China Electric Power Research Institute Co Ltd CEPRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electric Power Research Institute Co Ltd CEPRI filed Critical China Electric Power Research Institute Co Ltd CEPRI
Priority to CN202311315801.2A priority Critical patent/CN117374937A/en
Publication of CN117374937A publication Critical patent/CN117374937A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003Load forecast, e.g. methods or systems for forecasting future load demand
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/004Generation forecast, e.g. methods or systems for forecasting future energy generation
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Power Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Biomedical Technology (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium. A state space S and an action space A of the multiple micro-grids are constructed, where each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects its electric energy storage as the action variable of the action space A. A multi-agent deep reinforcement learning reward function is then constructed, and the multiple agents are guided through rewards to cooperatively meet the regional electricity demand; the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information, namely the overall regional electricity consumption. Because the disclosed reward function includes both each agent's electricity consumption and the regional electricity consumption, the consumption information is shared among the multiple agents, the agents are guided through rewards to cooperatively meet the regional electricity demand, and the energy cost is reduced.

Description

Multi-micro-grid collaborative optimization operation method, device, equipment and medium
Technical Field
The invention belongs to the field of micro-grid collaborative optimization operation, and particularly relates to a multi-micro-grid collaborative optimization operation method, device, equipment and medium.
Background
The micro-grid comprises a plurality of distributed energy sources and energy storage systems. With a proper system configuration and a suitable operation strategy, it can realize cascaded utilization of energy and ultimately achieve efficient and flexible energy use while meeting load demand. Through inter-regional energy cooperation and mutual support among the micro-grids, the operating cost of the system is reduced and the energy utilization efficiency is improved.
In the prior art, the micro-grids in a multi-micro-grid group belong to different stakeholders. In collaborative optimization operation, how to cooperatively meet the regional electricity demand and thereby reduce the energy cost is the key problem of multi-micro-grid cooperative operation.
Disclosure of Invention
The invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium, which aim to cooperatively meet the regional electricity demand and reduce the energy cost.
In a first aspect, the present invention provides a multi-micro grid collaborative optimization operation method, where the collaborative optimization operation method includes:
constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A;
and constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing information of other agents.
Preferably, the motion space a is represented by the following formula:
A = {a_elec, a_cooling, a_heating}
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
Preferably, the multi-agent deep reinforcement learning reward function is as follows:
where e_i denotes the outsourced (externally purchased) electricity of the i-th agent and n denotes the number of agents; the reward function of each agent comprises its own outsourced electricity e_i and a component Σe_i containing the other agents' information.
Preferably, the collaborative optimization operation method further comprises: multiple agent deep reinforcement learning action selection flow based on sequential iteration:
randomly sequencing all agents;
first agent selection actionAnd will->Input gradient-lifting tree GBDT predicts the electric energy to be consumed by the first agent under this action +.>i represents an agent subscript, i e {0, 1..n }, n representing the number of agents; m represents the number of sequential iterations;
estimating the consumed electric energy of the first agent under actionInformation is shared to the next agent;
next agent selection actionAnd will->And consumption power information shared by the last agent +.>The common input gradient-lifting tree GBDT predicts the electric energy to be consumed in this action>
When m=k, the multi-agent finally outputs a group of action sequences
Preferably, the multi-agent deep reinforcement learning training process includes:
constructing neural networks, comprising state-value neural networks q_1 and q_2, a policy neural network π, and their corresponding target neural networks q_1_target, q_2_target and π_target;
setting, based on the constructed neural networks, the number of training episodes, the number of training iterations per episode num_steps, the replay-pool buffer capacity, the batch size, the learning rate and the discount factor;
when the number of environment steps is smaller than the set value num_steps, selecting an action a_t according to the current state s_t, executing the action a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, and executing this step cyclically;
when the number of environment steps is greater than the set value num_steps, selecting an action a_t according to the current state s_t, executing the action a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, starting to train the networks, and executing this step cyclically.
Wherein,
sampling batch_size samples (s, a, r, s_{t+1}) from the buffer and converting them into tensors for training;
inputting the state s_{t+1} of the sampled transitions into the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1});
based on the obtained values, computing the target values of q_1_target, q_2_target and π_target, computing the loss functions of the neural networks q_1, q_2 and π, and performing gradient updates with the optimizer selected for each network;
based on the Polyak averaging method, periodically transferring the parameters of the neural networks q_1, q_2 and π to the target networks q_1_target, q_2_target and π_target;
stopping network training when the set number of training episodes is reached;
given the state of the multi-micro-grid system, the trained multi-agent deep reinforcement learning model directly outputs cooperative operation actions, thereby generating the operation strategy.
In a second aspect, an embodiment of the present invention provides a device for collaborative optimization operation of multiple micro-grids, where the device includes:
the space construction module is used for constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
and the reward function construction module is used for constructing a multi-agent deep reinforcement learning reward function and guiding the multiple agents through rewards to cooperatively meet the regional electricity demand, wherein the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information.
In a third aspect, an embodiment of the invention provides an electronic device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method according to any of the first aspects when executing the program.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the first aspects.
Advantageous effects
The embodiment of the invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium. A state space S and an action space A of the multiple micro-grids are constructed, where each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects its electric energy storage as the action variable of the action space A. A multi-agent deep reinforcement learning reward function is then constructed, and the multiple agents are guided through rewards to cooperatively meet the regional electricity demand; the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information, namely the overall regional electricity consumption. Because the disclosed reward function includes both each agent's electricity consumption and the regional electricity consumption, the consumption information is shared among the multiple agents, the agents are guided through rewards to cooperatively meet the regional electricity demand, and the energy cost is reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
fig. 1 is a flowchart of a multi-micro grid collaborative optimization operation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method of collaborative optimization operation for multiple micro-grids according to an embodiment of the present invention;
FIG. 3 is a flow chart of a multi-agent deep reinforcement learning training process according to an embodiment of the present invention;
FIG. 4 is a flow chart of a multi-agent deep reinforcement learning action selection process based on sequential iteration in accordance with an embodiment of the present invention;
FIG. 5 is a graph showing convergence of the reward function of the algorithm training result according to the embodiment of the present invention;
fig. 6 is a graph showing the 24-hour energy storage charging and discharging operation of a system of 3 multi-energy micro-grids operating cooperatively according to an embodiment of the present invention;
fig. 7 is a block diagram of a multi-micro grid collaborative optimization operation device according to an embodiment of the present invention;
fig. 8 is a block diagram of another multi-micro grid collaborative optimization operation device according to an embodiment of the present invention;
fig. 9 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings in connection with embodiments. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The following detailed description is exemplary and is intended to provide further details of the invention. Unless defined otherwise, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the invention.
Multi-agent deep reinforcement learning is an optimization method based on artificial intelligence technology. According to the method, each micro-grid is regarded as an agent, and the information interaction and cooperative operation mechanism among the agents is designed, so that the benefit requirements of each micro-grid are met on the premise of protecting the data privacy, and the cooperative and optimal operation of multiple micro-grids is finally realized. Meanwhile, the multi-agent deep reinforcement learning is a data driving method, an accurate system model is not required to be constructed based on a physical formula, and an optimal control strategy is obtained under the guidance of a reward and punishment mechanism through interaction with the environment, so that the problems of high dimension, multiple parameters, nonlinearity and the like in the traditional optimization method are solved.
Referring to fig. 1, an embodiment of the present invention provides a method for collaborative optimization operation of multiple micro-grids, which specifically includes:
s20, constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
s40, constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power consumption requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing other agent information, namely the electric quantity consumption of the whole region.
By constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A; the multi-agent deep reinforcement learning reward function is further constructed, the multi-agent is guided to cooperatively meet regional power consumption requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing other agent information, namely regional overall electric quantity consumption. The reinforcement learning reward function comprising the electricity consumption of each intelligent agent and the regional electricity consumption is disclosed, so that the electricity consumption information is shared among the multiple intelligent agents, the multiple intelligent agents are guided to cooperatively meet the regional electricity consumption requirement through rewards, and the energy consumption cost is reduced.
The following describes the advantageous effects of the present invention by way of a specific example:
please refer to fig. 2:
s1, constructing a multi-micro-grid collaborative optimization operation environment, wherein the multi-micro-grid collaborative optimization operation environment comprises operation parameters such as renewable energy sources, energy storage devices, energy conversion devices and the like contained in the micro-grid, electricity prices, carbon emission prices and the like.
S2, constructing a multi-agent deep reinforcement learning state space S, wherein variables contained in the S are shown in the following table:
TABLE 1 State space variables
1. Date
2. External electricity purchased/sold by the micro-grid
3. Energy storage state of charge (SOC)
4. Electric load
5. Photovoltaic output
6. Carbon emission unit price
7. Current electricity price and its 24-hour forecast
8. Temperature and its 24-hour forecast
9. Humidity and its 24-hour forecast
10. Solar radiation and its 24-hour forecast
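As a purely illustrative sketch (not part of the patent text), the ten state variables of Table 1 for one micro-grid can be assembled into a flat observation vector; the dictionary keys below are assumed names rather than identifiers from the patent.

```python
import numpy as np

def build_state(obs: dict) -> np.ndarray:
    """Assemble the 10 state variables of Table 1 for one micro-grid.

    `obs` is a hypothetical dictionary produced by the simulation
    environment; the keys are illustrative assumptions.
    """
    return np.array([
        obs["day_of_year"],          # 1  date
        obs["net_purchased_power"],  # 2  electricity bought from / sold to the outside
        obs["storage_soc"],          # 3  energy-storage state of charge
        obs["electric_load"],        # 4  electric load
        obs["pv_output"],            # 5  photovoltaic output
        obs["carbon_price"],         # 6  carbon emission unit price
        obs["electricity_price"],    # 7  current electricity price (24 h forecasts handled analogously)
        obs["temperature"],          # 8  temperature
        obs["humidity"],             # 9  humidity
        obs["solar_radiation"],      # 10 solar radiation
    ], dtype=np.float32)
```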
S3, constructing a multi-agent deep reinforcement learning action space A, wherein the action space A is shown in the following formula:
A = {a_elec, a_cooling, a_heating}    (1)
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the decimal value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
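For illustration of the action definition only, the fractional action a ∈ [-1, 1] can be mapped to a charge/discharge energy command as sketched below; the capacity argument and the clipping against the current state of charge are assumed safeguards, not details specified in the patent.

```python
def storage_command(a: float, capacity_kwh: float, soc: float) -> float:
    """Map an action a in [-1, 1] to a charge (+) / discharge (-) energy in kWh.

    The action requests 100*|a| percent of the storage capacity; clipping the
    request so the state of charge stays within [0, 1] is an assumption.
    """
    a = max(-1.0, min(1.0, a))
    requested = a * capacity_kwh              # positive: charge, negative: discharge
    headroom = (1.0 - soc) * capacity_kwh     # energy that can still be stored
    available = soc * capacity_kwh            # energy that can still be released
    return min(requested, headroom) if a >= 0 else max(requested, -available)
```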
S4, constructing a multi-agent deep reinforcement learning reward function, wherein the multi-agent deep reinforcement learning reward function is shown in the following formula:
where e_i denotes the outsourced (externally purchased) electricity of the i-th agent and n denotes the number of agents. For each agent, the reward function contains its own outsourced electricity e_i and a component Σe_i containing the other agents' information, i.e. the electricity consumption of the whole region. The introduction of Σe_i into the reward function makes the electricity consumption information shared among the multiple agents, so that the agents are guided through rewards to cooperatively meet the regional electricity demand and the energy cost is reduced.
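The reward formula itself (formula (2)) is not reproduced in the source text. Purely as an illustration of the described structure, the sketch below assumes a negative-cost form in which each agent is penalised for its own outsourced electricity e_i plus the regional total Σe_i; the equal weighting of the two terms is an assumption.

```python
from typing import Sequence

def agent_rewards(purchased: Sequence[float]) -> list:
    """Assumed reward shape: penalise own outsourced electricity plus the regional sum.

    `purchased[i]` is the outsourced electricity e_i of agent i; the exact
    weighting in the patent's formula (2) is not given in the source text.
    """
    regional_total = sum(purchased)
    return [-(e_i + regional_total) for e_i in purchased]
```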
S5, designing a multi-agent deep reinforcement learning action selection flow based on sequential iteration, as shown in FIG. 4:
1) All agents are randomly ordered;
2) the first agent selects an action a_1^m and inputs a_1^m into the gradient boosting decision tree (GBDT), which predicts the electric energy e_1^m to be consumed by the first agent under this action; i denotes the agent index, i ∈ {0, 1, ..., n}, n denotes the number of agents, and m denotes the number of sequential iterations;
3) the predicted energy consumption e_1^m of the first agent under its action is shared with the next agent;
4) the next agent selects an action a_2^m and inputs a_2^m together with the consumption information e_1^m shared by the previous agent into the GBDT, which predicts the electric energy e_2^m to be consumed under this action;
5) when m = k, i.e. after k iterations, the multiple agents finally output a group of action sequences;
6) the action sequence is output (an illustrative code sketch of this flow is given below).
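A minimal sketch of the sequential-iteration selection flow above, assuming a fitted scikit-learn GradientBoostingRegressor as the GBDT consumption predictor, per-agent policies exposing a select_action method, and the agent's own state as an additional GBDT input; these interfaces are assumptions, not details given in the patent, and how the k iterations refine one another is left unspecified in the text and is therefore simplified here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def sequential_action_selection(agents, states, gbdt: GradientBoostingRegressor,
                                k: int, rng: np.random.Generator):
    """One run of the sequential-iteration selection flow (steps 1)-6) above)."""
    actions = [0.0] * len(agents)
    for m in range(k):                                # m = 1..k sequential iterations
        order = rng.permutation(len(agents))          # 1) random ordering of agents
        shared_energy = 0.0                           # consumption info passed along the order
        for i in order:
            # 2)/4) the agent selects an action given its state and the shared info
            a_i = agents[i].select_action(states[i], shared_energy)
            features = np.concatenate([states[i], [a_i, shared_energy]]).reshape(1, -1)
            e_i = float(gbdt.predict(features)[0])    # GBDT predicts the consumed energy
            shared_energy += e_i                      # 3) share the prediction with the next agent
            actions[i] = a_i
    return actions                                    # 5)/6) final action sequence
```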
S6, the multi-agent deep reinforcement learning training process, as shown in FIG. 3:
1) Six neural networks are built: state-value neural networks q_1 and q_2 and a policy neural network π, whose parameters are θ_1, θ_2 and the policy parameters respectively; their corresponding target neural networks are q_1_target, q_2_target and π_target.
2) The neural networks are initialized, including parameter values such as the network weights and the settings of the optimizers used to update the networks.
3) The training parameters are set, such as the number of training episodes, the number of training iterations per episode num_steps, the replay-pool buffer capacity, the batch size, the learning rate and the discount factor.
4) When the number of environment steps is smaller than the set value num_steps, an action a_t is selected according to the current state s_t, the action a_t is executed to obtain the reward r_t and the next environment state s_{t+1}, the sample (s_t, a_t, r_t, s_{t+1}) is stored into the buffer, and step 4) is executed cyclically.
5) When the number of environment steps is greater than the set value num_steps, an action a_t is selected according to the current state s_t, the action a_t is executed to obtain the reward r_t and the next environment state s_{t+1}, the sample (s_t, a_t, r_t, s_{t+1}) is stored into the buffer, the networks then begin training, and step 5) is executed cyclically.
The network training process is as follows:
(1) At step t, batch_size samples (s, a, r, s_{t+1}) are drawn from the buffer and converted into tensors for training.
(2) The state s_{t+1} of the sampled transitions is input into the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1}).
(3) Based on the values obtained in (2), the target values of q_1_target, q_2_target and π_target are computed, the loss functions of the neural networks q_1, q_2 and π are computed, and gradient updates are performed with the optimizer selected for each network.
(4) Based on the Polyak averaging method, the parameters of the neural networks q_1, q_2 and π are periodically transferred to the target networks q_1_target, q_2_target and π_target.
6) Network training is stopped when the set number of training episodes is reached.
Given the state of the multi-micro-grid system, the trained multi-agent deep reinforcement learning model can directly output cooperative operation actions, thereby generating the operation strategy.
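The interaction loop of steps 4)-5) and the network updates of sub-steps (1)-(4) can be sketched together as below. This is a minimal PyTorch sketch in the standard soft actor-critic form; the temperature alpha, the policy.sample interface, a single optimizer shared by q_1 and q_2, the Gym-style env.reset/env.step interface, and the omission of the terminal-state mask and of tensor-shape handling are all assumptions, since the patent does not spell these details out.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F

def soft_update(target_net, net, tau: float = 0.005):
    """Sub-step (4): Polyak averaging, theta_target <- tau*theta + (1-tau)*theta_target."""
    for p_t, p in zip(target_net.parameters(), net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)

def update_networks(batch, policy, q1, q2, q1_t, q2_t, q_opt, pi_opt,
                    gamma: float = 0.99, alpha: float = 0.2):
    """Sub-steps (1)-(4), in an assumed SAC-style form."""
    # (1) convert the sampled batch to tensors; shapes assumed to be (batch, dim)
    s, a, r, s_next = [torch.as_tensor(np.asarray(x), dtype=torch.float32) for x in zip(*batch)]
    with torch.no_grad():
        a_next, log_prob = policy.sample(s_next)                       # (2) a_{t+1}, log pi
        q_next = torch.min(q1_t(s_next, a_next), q2_t(s_next, a_next))
        y = r.unsqueeze(-1) + gamma * (q_next - alpha * log_prob)      # (3) target value
    q_loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)         # critic losses
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    a_new, log_prob_new = policy.sample(s)
    pi_loss = (alpha * log_prob_new - torch.min(q1(s, a_new), q2(s, a_new))).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    soft_update(q1_t, q1); soft_update(q2_t, q2)                       # (4) Polyak update

def run_training(env, act, total_steps, num_steps, buffer_capacity, batch_size, update_fn):
    """Steps 4)-5): collect (s_t, a_t, r_t, s_{t+1}) and start training after num_steps."""
    buffer = deque(maxlen=buffer_capacity)
    s_t = env.reset()
    for step in range(total_steps):
        a_t = act(s_t)                                # `act` is an assumed state-to-action callable
        s_next, r_t, done, _ = env.step(a_t)          # execute a_t, observe r_t and s_{t+1}
        buffer.append((s_t, a_t, r_t, s_next))        # store the sample in the buffer
        if step >= num_steps:                         # step 5): begin network training
            update_fn(random.sample(buffer, min(batch_size, len(buffer))))
        s_t = env.reset() if done else s_next
```

Here `update_fn` would be a closure such as `lambda b: update_networks(b, policy, q1, q2, q1_t, q2_t, q_opt, pi_opt)`.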
Compared with other reinforcement learning methods, the method adopts a multi-agent soft actor-critic (MASAC) algorithm and adds an entropy term to the objective function of micro-grid operation, so that the micro-grid agents explore more possible operation optimization strategies while pursuing maximization of their own benefits, overcoming problems such as the operation optimization easily falling into a local optimum and weak robustness to environmental changes. Secondly, in the multi-agent soft actor-critic algorithm's selection of the agents' actions, a sequential iteration method is used to obtain the multi-micro-grid optimized operation action strategy, so that each agent's action selection considers both its own operating state and the operating conditions of the other agents; on the premise of satisfying each agent's own benefit, the regional micro-grid operating cost is reduced and the energy utilization efficiency is improved.
The advantageous effects of the invention are further illustrated in the following preferred embodiments:
the algorithm flow chart is shown in fig. 2.
S1: and constructing a multi-micro-grid collaborative optimization operation system containing 3 micro-grids. The micro-grid selects a California intelligent building system, and specifically comprises photovoltaic power generation, cold-hot electricity energy storage, an electric heater, an electric heat pump and other devices. The data selects 3 intelligent building systems 24 hours of information.
S2: the state space of the multiple micro-grids is constructed, as shown by the state quantity in table 1, each micro-grid comprises 10 state quantities of 1 intelligent building system, and the multi-micro-grid cooperative operation system comprises 30 state quantities in total.
S3: the method comprises the steps of constructing an action space of a plurality of micro-grids, wherein each intelligent agent selects electric energy storage as an action variable, so that the multi-micro-grid cooperative operation system totally comprises 3 action amounts, and the action space is represented by the following formula:
in the formula, the subscripts 1-3 denote the agent index, and the superscript elec denotes the electric energy storage action.
S4: and constructing a multi-agent reward function. As shown in formula (2), where n=2.
S5: action selection based on sequential iterations.
1) A GBDT-based model is selected to estimate the electric energy consumption value.
2) The number of iterations is selected to be 100.
S6: network and parameter setting for multi-micro-grid collaborative operation model training based on multi-agent deep reinforcement learning.
1) The state-value neural networks q_1 and q_2, the policy neural network π and their corresponding target networks all adopt 4-layer fully connected neural networks (an illustrative code sketch of these networks is given after this list).
For the state-value networks, the 11 input-layer neurons correspond to the 10 input environment state variables plus 1 action variable, and the 1 output-layer neuron corresponds to the output state-action value; for the policy network, the 10 input-layer neurons correspond to the 10 input environment state variables, and the 2 output-layer neurons correspond to the output action mean and the entropy term log_prob.
The numbers of neurons in the hidden layers are set to 256 and 256, respectively.
2) The neural networks are updated with the Adam optimizer, and the root mean square error is selected as the loss function.
3) The number of training episodes is set to 300, with 8000 steps per episode;
4) The replay-pool buffer capacity is set to 10000 and the batch size is set to 256;
5) The learning rate is 0.003 and the discount factor is 0.99.
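An illustrative PyTorch sketch of the networks and training settings listed in items 1)-5). The input/output dimensions (11→1 for the state-value networks, 10→2 for the policy) and the hidden sizes 256/256 follow the text; interpreting the two policy outputs as an action mean and a log standard deviation (from which log_prob/entropy is computed) is an assumption, as is the way the optimizers are grouped.

```python
import torch
import torch.nn as nn

class StateValueQ(nn.Module):
    """State-value (critic) network q(s, a): 10 state variables + 1 action -> 1 value."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(11, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Policy(nn.Module):
    """Policy network pi(s): 10 state variables -> action mean and log-std (2 outputs)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, state):
        mean, log_std = self.net(state).chunk(2, dim=-1)
        return mean, log_std

# Training settings from items 2)-5) above.
q1, q2, policy = StateValueQ(), StateValueQ(), Policy()
q_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=0.003)
pi_opt = torch.optim.Adam(policy.parameters(), lr=0.003)
NUM_EPISODES, STEPS_PER_EPISODE = 300, 8000
BUFFER_CAPACITY, BATCH_SIZE, GAMMA = 10_000, 256, 0.99  # replay pool, batch size, discount factor
```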
After model training is completed, the model is validated. The calculation results are shown in figs. 5-6.
The invention discloses a multi-micro-grid collaborative optimization operation algorithm based on the multi-agent soft actor-critic (MASAC) algorithm, which adds an entropy term to the objective function, improving the algorithm's ability to explore better strategies and enhancing the robustness of the collaborative optimization algorithm.
The invention discloses a multi-agent deep reinforcement learning action selection method based on sequential iteration, which not only can protect privacy of each agent, but also can meet benefit requirements of each agent, and realizes multi-agent collaborative optimization operation.
The invention discloses a reinforcement learning reward function containing each agent's electricity consumption and the regional electricity consumption. The introduction of Σe_i into the reward function makes the electricity consumption information shared among the multiple agents, so that the agents are guided through rewards to cooperatively meet the regional electricity demand and the energy cost is reduced.
Based on the same inventive concept, the embodiment of the present invention further provides a multi-micro-grid collaborative optimization operation device, which can be used to implement a multi-micro-grid collaborative optimization operation method described in the above embodiment, as described in the following embodiments: because the principle of solving the problem of the multi-micro-grid collaborative optimization operation device is similar to that of a multi-micro-grid collaborative optimization operation method, the implementation of the multi-micro-grid collaborative optimization operation device can be referred to the implementation of the multi-micro-grid collaborative optimization operation method, and the repetition is omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Referring to fig. 7, the apparatus includes:
the space construction module 200 is configured to construct a state space S and an action space a of a plurality of micro-grids, where each micro-grid is an agent, the state space S includes a plurality of state variables, and each agent selects an electric energy storage as an action variable of the action space a;
the reward function construction module 400 is configured to construct a multi-agent deep reinforcement learning reward function, and guide the multi-agents to cooperatively meet the regional power consumption requirement through rewards, wherein the reward function of each agent includes the outsourcing power of the agent and the component containing information of other agents, namely the power consumption of the whole region.
The multi-micro-grid collaborative optimization operation device constructs a state space S and an action space A of the multi-micro-grid through a space construction module, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A; the rewarding function construction module further constructs a multi-agent deep reinforcement learning rewarding function, and the multi-agent deep reinforcement learning rewarding function is guided by rewards to cooperatively meet regional power consumption requirements, wherein the rewarding function of each agent comprises self outsourcing electric quantity and components containing other agent information, namely the electric quantity consumption of the whole region. The reinforcement learning reward function comprising the electricity consumption of each intelligent agent and the regional electricity consumption is disclosed, so that the electricity consumption information is shared among the multiple intelligent agents, the multiple intelligent agents are guided to cooperatively meet the regional electricity consumption requirement through rewards, and the energy consumption cost is reduced.
The multi-micro-grid collaborative optimization operation device of the invention is described in a preferred embodiment as follows:
Referring to fig. 8, the operation environment construction module 100 is specifically configured to construct a multi-micro-grid collaborative optimization operation environment, which comprises the operating parameters of the renewable energy sources, energy storage devices, energy conversion devices and the like contained in the micro-grids, as well as the electricity price, the carbon emission price and the like.
The state space construction module 201 is configured to construct a multi-agent deep reinforcement learning state space S, where variables included in S are as follows:
TABLE 1 State space variable constitution
The action space construction module 202 is configured to construct a multi-agent deep reinforcement learning action space a, as shown in the following formula:
A = {a_elec, a_cooling, a_heating}    (1)
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the decimal value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
The reward function construction module 400 is configured to construct a multi-agent deep reinforcement learning reward function, as shown in the following formula:
where e_i denotes the outsourced (externally purchased) electricity of the i-th agent and n denotes the number of agents. For each agent, the reward function contains its own outsourced electricity e_i and a component Σe_i containing the other agents' information, i.e. the electricity consumption of the whole region. The introduction of Σe_i into the reward function makes the electricity consumption information shared among the multiple agents, so that the agents are guided through rewards to cooperatively meet the regional electricity demand and the energy cost is reduced.
The flow design module 500 is configured to design a multi-agent deep reinforcement learning action selection flow based on sequential iteration, as shown in FIG. 4:
1) All agents are randomly ordered;
2) the first agent selects an action a_1^m and inputs a_1^m into the gradient boosting decision tree (GBDT), which predicts the electric energy e_1^m to be consumed by the first agent under this action; i denotes the agent index, i ∈ {0, 1, ..., n}, n denotes the number of agents, and m denotes the number of sequential iterations;
3) the predicted energy consumption e_1^m of the first agent under its action is shared with the next agent;
4) the next agent selects an action a_2^m and inputs a_2^m together with the consumption information e_1^m shared by the previous agent into the GBDT, which predicts the electric energy e_2^m to be consumed under this action;
5) when m = k, i.e. after k iterations, the multiple agents finally output a group of action sequences;
6) the action sequence is output.
Training module 600, specifically a multi-agent deep reinforcement learning training process, as shown in FIG. 3
1) Six neural networks are built: state-value neural networks q_1 and q_2 and a policy neural network π, whose parameters are θ_1, θ_2 and the policy parameters respectively; their corresponding target neural networks are q_1_target, q_2_target and π_target.
2) The neural networks are initialized, including parameter values such as the network weights and the settings of the optimizers used to update the networks.
3) The training parameters are set, such as the number of training episodes, the number of training iterations per episode num_steps, the replay-pool buffer capacity, the batch size, the learning rate and the discount factor.
4) When the number of environment steps is smaller than the set value num_steps, an action a_t is selected according to the current state s_t, the action a_t is executed to obtain the reward r_t and the next environment state s_{t+1}, the sample (s_t, a_t, r_t, s_{t+1}) is stored into the buffer, and step 4) is executed cyclically.
5) When the number of environment steps is greater than the set value num_steps, an action a_t is selected according to the current state s_t, the action a_t is executed to obtain the reward r_t and the next environment state s_{t+1}, the sample (s_t, a_t, r_t, s_{t+1}) is stored into the buffer, the networks then begin training, and step 5) is executed cyclically.
The network training process is as follows:
(1) At step t, batch_size samples (s, a, r, s_{t+1}) are drawn from the buffer and converted into tensors for training.
(2) The state s_{t+1} of the sampled transitions is input into the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1}).
(3) Based on the values obtained in (2), the target values of q_1_target, q_2_target and π_target are computed, the loss functions of the neural networks q_1, q_2 and π are computed, and gradient updates are performed with the optimizer selected for each network.
(4) Based on the Polyak averaging method, the parameters of the neural networks q_1, q_2 and π are periodically transferred to the target networks q_1_target, q_2_target and π_target.
6) Network training is stopped when the set number of training episodes is reached.
Given the state of the multi-micro-grid system, the trained multi-agent deep reinforcement learning model can directly output cooperative operation actions, thereby generating the operation strategy.
The embodiment of the present invention also provides a computer electronic device. Fig. 9 shows a schematic structural diagram of an electronic device to which the embodiment of the present invention can be applied. As shown in fig. 9, the computer electronic device includes a central processing unit (CPU) 901 which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. In the RAM 903, various programs and data required for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 910 as needed, so that a computer program read out therefrom is installed into the storage section 908 as needed.
As another aspect, the present invention further provides a computer readable storage medium, which may be a computer readable storage medium included in the multi-microgrid collaborative optimization operation device in the above embodiment; or may be a computer-readable storage medium, alone, that is not incorporated into an electronic device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the multi-microgrid co-optimization operation methods described in the present invention.
It will be appreciated by those skilled in the art that the present invention can be carried out in other embodiments without departing from the spirit or essential characteristics thereof. Accordingly, the above disclosed embodiments are illustrative in all respects, and not exclusive. All changes that come within the scope of the invention or equivalents thereto are intended to be embraced therein.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. The multi-micro-grid collaborative optimization operation method is characterized by comprising the following steps of:
constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A;
and constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing information of other agents.
2. The collaborative optimization operation method of claim 1, wherein the action space a is represented by the following formula:
A = {a_elec, a_cooling, a_heating}
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
3. The collaborative optimization operation method of claim 2, wherein the multi-agent deep reinforcement learning reward function is represented by the following formula:
where e_i denotes the outsourced (externally purchased) electricity of the i-th agent and n denotes the number of agents; the reward function of each agent comprises its own outsourced electricity e_i and a component Σe_i containing the other agents' information.
4. The collaborative optimization operation method according to claim 3, further comprising: multiple agent deep reinforcement learning action selection flow based on sequential iteration:
randomly sequencing all agents;
the first agent selects an action a_1^m and inputs a_1^m into the gradient boosting decision tree (GBDT), which predicts the electric energy e_1^m to be consumed by the first agent under this action, where i denotes the agent index, i ∈ {0, 1, ..., n}, n denotes the number of agents, and m denotes the number of sequential iterations;
the predicted energy consumption e_1^m of the first agent under its action is shared with the next agent;
the next agent selects an action a_2^m and inputs a_2^m together with the consumption information e_1^m shared by the previous agent into the GBDT, which predicts the electric energy e_2^m to be consumed under this action;
when m = k, the agents finally output a group of action sequences.
5. The collaborative optimization operation method according to claim 4, wherein the multi-agent deep reinforcement learning training process comprises:
constructing neural networks, comprising state-value neural networks q_1 and q_2, a policy neural network π, and their corresponding target neural networks q_1_target, q_2_target and π_target;
setting, based on the constructed neural networks, the number of training episodes, the number of training iterations per episode num_steps, the replay-pool buffer capacity, the batch size, the learning rate and the discount factor;
when the number of environment steps is smaller than the set value num_steps, selecting an action a_t according to the current state s_t, executing the action a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, and executing this step cyclically;
when the number of environment steps is greater than the set value num_steps, selecting an action a_t according to the current state s_t, executing the action a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, starting to train the networks, and executing this step cyclically.
6. The collaborative optimization operation method according to claim 5, wherein,
sampling batch_size samples (s, a, r, s_{t+1}) from the buffer and converting them into tensors for training;
inputting the state s_{t+1} of the sampled transitions into the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1});
based on the obtained values, computing the target values of q_1_target, q_2_target and π_target, computing the loss functions of the neural networks q_1, q_2 and π, and performing gradient updates with the optimizer selected for each network;
based on the Polyak averaging method, periodically transferring the parameters of the neural networks q_1, q_2 and π to the target networks q_1_target, q_2_target and π_target;
stopping network training when the set number of training episodes is reached;
given the state of the multi-micro-grid system, the trained multi-agent deep reinforcement learning model directly outputs cooperative operation actions, thereby generating the operation strategy.
7. A multi-microgrid collaborative optimization operation device, characterized in that the device comprises:
the space construction module is used for constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
and the reward function construction module is used for constructing a multi-agent deep reinforcement learning reward function and guiding the multiple agents through rewards to cooperatively meet the regional electricity demand, wherein the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information.
8. The multi-microgrid co-optimal operation device according to claim 7, wherein said space construction module comprises:
the action space construction module is used for constructing a multi-agent deep reinforcement learning action space A, and the action space A is shown as the following formula:
A = {a_elec, a_cooling, a_heating}    (1)
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the decimal value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
9. An apparatus comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the computer program, implements the method of any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any one of claims 1 to 7.
CN202311315801.2A 2023-10-11 2023-10-11 Multi-micro-grid collaborative optimization operation method, device, equipment and medium Pending CN117374937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311315801.2A CN117374937A (en) 2023-10-11 2023-10-11 Multi-micro-grid collaborative optimization operation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311315801.2A CN117374937A (en) 2023-10-11 2023-10-11 Multi-micro-grid collaborative optimization operation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117374937A true CN117374937A (en) 2024-01-09

Family

ID=89405262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311315801.2A Pending CN117374937A (en) 2023-10-11 2023-10-11 Multi-micro-grid collaborative optimization operation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117374937A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808174A (en) * 2024-03-01 2024-04-02 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack
CN117808174B (en) * 2024-03-01 2024-05-28 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack
CN117997152A (en) * 2024-04-03 2024-05-07 深圳市德兰明海新能源股份有限公司 Bottom layer control method of modularized multi-level converter based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN112186799B (en) Distributed energy system autonomous control method and system based on deep reinforcement learning
Wang et al. Deep reinforcement learning method for demand response management of interruptible load
Zheng et al. Optimal chiller loading by improved invasive weed optimization algorithm for reducing energy consumption
Guo et al. Optimal energy management of multi-microgrids connected to distribution system based on deep reinforcement learning
CN112614009B (en) Power grid energy management method and system based on deep expectation Q-learning
Wang et al. Virtual power plant containing electric vehicles scheduling strategies based on deep reinforcement learning
Wan et al. Residential energy management with deep reinforcement learning
Bai et al. Double-layer staged training echo-state networks for wind speed prediction using variational mode decomposition
Andervazh et al. Emission‐economic dispatch of thermal power generation units in the presence of hybrid electric vehicles and correlated wind power plants
CN117374937A (en) Multi-micro-grid collaborative optimization operation method, device, equipment and medium
CN104636985A (en) Method for predicting radio disturbance of electric transmission line by using improved BP (back propagation) neural network
CN112491094B (en) Hybrid-driven micro-grid energy management method, system and device
CN106779177A (en) Multiresolution wavelet neutral net electricity demand forecasting method based on particle group optimizing
Wu et al. Strategic bidding in a competitive electricity market: An intelligent method using Multi-Agent Transfer Learning based on reinforcement learning
Dinh et al. Supervised-learning-based hour-ahead demand response for a behavior-based home energy management system approximating MILP optimization
CN117057553A (en) Deep reinforcement learning-based household energy demand response optimization method and system
Zhang et al. A cooperative EV charging scheduling strategy based on double deep Q-network and Prioritized experience replay
CN115714382A (en) Active power distribution network real-time scheduling method and device based on security reinforcement learning
CN115409645A (en) Comprehensive energy system energy management method based on improved deep reinforcement learning
Zhang et al. Physical-model-free intelligent energy management for a grid-connected hybrid wind-microturbine-PV-EV energy system via deep reinforcement learning approach
Nourianfar et al. Economic emission dispatch considering electric vehicles and wind power using enhanced multi-objective exchange market algorithm
Dou et al. Double‐deck optimal schedule of micro‐grid based on demand‐side response
Zhang et al. A double-deck deep reinforcement learning-based energy dispatch strategy for an integrated electricity and district heating system embedded with thermal inertial and operational flexibility
Pandian et al. Solving Economic Load Dispatch ProblemConsidering Transmission Losses by a HybridEP-EPSO Algorithm for Solving both Smoothand Non-Smooth Cost Function
CN113555888B (en) Micro-grid energy storage coordination control method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination