CN117374937A - Multi-micro-grid collaborative optimization operation method, device, equipment and medium - Google Patents

Multi-micro-grid collaborative optimization operation method, device, equipment and medium

Info

Publication number
CN117374937A
Authority
CN
China
Prior art keywords
agent
action
micro
space
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311315801.2A
Other languages
Chinese (zh)
Inventor
赵琦
乔骥
陈予尧
陈盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
China Electric Power Research Institute Co Ltd CEPRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electric Power Research Institute Co Ltd CEPRI filed Critical China Electric Power Research Institute Co Ltd CEPRI
Priority to CN202311315801.2A priority Critical patent/CN117374937A/en
Publication of CN117374937A publication Critical patent/CN117374937A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003Load forecast, e.g. methods or systems for forecasting future load demand
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/004Generation forecast, e.g. methods or systems for forecasting future energy generation
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Power Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Biomedical Technology (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium. A state space S and an action space A of the multiple micro-grids are constructed, where each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects its electric energy storage as the action variable of the action space A. A multi-agent deep reinforcement learning reward function is then constructed, and the multiple agents are guided through rewards to cooperatively meet the regional electricity demand; the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information, namely the overall regional electricity consumption. Because the disclosed reward function includes both each agent's electricity consumption and the regional electricity consumption, the consumption information is shared among the multiple agents, the agents are guided through rewards to cooperatively meet the regional electricity demand, and the energy cost is reduced.

Description

Multi-micro-grid collaborative optimization operation method, device, equipment and medium
Technical Field
The invention belongs to the field of micro-grid collaborative optimization operation, and particularly relates to a multi-micro-grid collaborative optimization operation method, device, equipment and medium.
Background
The micro-grid comprises a plurality of distributed energy sources and energy storage systems. With a proper system configuration and a suitable operation strategy, it can realize cascaded utilization of energy and ultimately achieve efficient and flexible energy use while meeting load demand. Through inter-regional energy cooperation and mutual support among the micro-grids, the operating cost of the system is reduced and the energy utilization efficiency is improved.
In the prior art, the micro-grids in a multi-micro-grid group belong to different stakeholders. In collaborative optimization operation, how to cooperatively meet the regional electricity demand and thereby reduce the energy cost is the key problem of multi-micro-grid cooperative operation.
Disclosure of Invention
The invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium, which aim to cooperatively meet the regional electricity demand and reduce the energy cost.
In a first aspect, the present invention provides a multi-micro grid collaborative optimization operation method, where the collaborative optimization operation method includes:
constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A;
and constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing information of other agents.
Preferably, the motion space a is represented by the following formula:
A = {a_elec, a_cooling, a_heating}
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
Preferably, the multi-agent deep reinforcement learning reward function is as follows:
where e_i denotes the outsourced (externally purchased) electricity of the i-th agent and n denotes the number of agents; the reward function of each agent comprises its own outsourced electricity e_i and a component Σe_i containing the other agents' information.
Preferably, the collaborative optimization operation method further comprises: multiple agent deep reinforcement learning action selection flow based on sequential iteration:
randomly sequencing all agents;
first agent selection actionAnd will->Input gradient-lifting tree GBDT predicts the electric energy to be consumed by the first agent under this action +.>i represents an agent subscript, i e {0, 1..n }, n representing the number of agents; m represents the number of sequential iterations;
estimating the consumed electric energy of the first agent under actionInformation is shared to the next agent;
next agent selection actionAnd will->And consumption power information shared by the last agent +.>The common input gradient-lifting tree GBDT predicts the electric energy to be consumed in this action>
When m=k, the multi-agent finally outputs a group of action sequences
Preferably, the multi-agent deep reinforcement learning training process includes:
constructing neural networks, comprising state-value neural networks q_1 and q_2, a policy neural network π, and their corresponding target neural networks q_1_target, q_2_target and π_target;
setting, based on the constructed neural networks, the number of training episodes, the number of training iterations per episode num_steps, the replay-pool buffer capacity, the batch size, the learning rate and the discount factor;
when the number of environment steps is smaller than the set value num_steps, selecting an action a_t according to the current state s_t, executing the action a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, and executing this step cyclically;
when the number of environment steps is greater than the set value num_steps, selecting an action a_t according to the current state s_t, executing the action a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, starting to train the networks, and executing this step cyclically.
Wherein,
sampling batch_size samples (s, a, r, s_{t+1}) from the buffer and converting them into tensors for training;
inputting the state s_{t+1} of the sampled transitions into the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1});
based on the obtained values, computing the target values of q_1_target, q_2_target and π_target, computing the loss functions of the neural networks q_1, q_2 and π, and performing gradient updates with the optimizer selected for each network;
based on the Polyak averaging method, periodically transferring the parameters of the neural networks q_1, q_2 and π to the target networks q_1_target, q_2_target and π_target;
stopping network training when the set number of training episodes is reached;
given the state of the multi-micro-grid system, the trained multi-agent deep reinforcement learning model directly outputs cooperative operation actions, thereby generating the operation strategy.
In a second aspect, an embodiment of the present invention provides a device for collaborative optimization operation of multiple micro-grids, where the device includes:
the space construction module is used for constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
and the reward function construction module is used for constructing a multi-agent deep reinforcement learning reward function and guiding the multiple agents through rewards to cooperatively meet the regional electricity demand, wherein the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information.
In a third aspect, an embodiment of the invention provides an electronic device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method according to any of the first aspects when executing the program.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the first aspects.
Advantageous effects
The embodiment of the invention provides a multi-micro-grid collaborative optimization operation method, device, equipment and medium. A state space S and an action space A of the multiple micro-grids are constructed, where each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects its electric energy storage as the action variable of the action space A. A multi-agent deep reinforcement learning reward function is then constructed, and the multiple agents are guided through rewards to cooperatively meet the regional electricity demand; the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information, namely the overall regional electricity consumption. Because the disclosed reward function includes both each agent's electricity consumption and the regional electricity consumption, the consumption information is shared among the multiple agents, the agents are guided through rewards to cooperatively meet the regional electricity demand, and the energy cost is reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
fig. 1 is a flowchart of a multi-micro grid collaborative optimization operation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method of collaborative optimization operation for multiple micro-grids according to an embodiment of the present invention;
FIG. 3 is a flow chart of a multi-agent deep reinforcement learning training process according to an embodiment of the present invention;
FIG. 4 is a flow chart of a multi-agent deep reinforcement learning action selection process based on sequential iteration in accordance with an embodiment of the present invention;
FIG. 5 is a graph showing convergence of the reward function of the algorithm training result according to the embodiment of the present invention;
fig. 6 is a graph showing the 24-hour energy storage charging and discharging operation of a system of 3 multi-energy micro-grids operating cooperatively according to an embodiment of the present invention;
fig. 7 is a block diagram of a multi-micro grid collaborative optimization operation device according to an embodiment of the present invention;
fig. 8 is a block diagram of another multi-micro grid collaborative optimization operation device according to an embodiment of the present invention;
fig. 9 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings in connection with embodiments. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The following detailed description is exemplary and is intended to provide further details of the invention. Unless defined otherwise, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the invention.
Multi-agent deep reinforcement learning is an optimization method based on artificial intelligence technology. According to the method, each micro-grid is regarded as an agent, and the information interaction and cooperative operation mechanism among the agents is designed, so that the benefit requirements of each micro-grid are met on the premise of protecting the data privacy, and the cooperative and optimal operation of multiple micro-grids is finally realized. Meanwhile, the multi-agent deep reinforcement learning is a data driving method, an accurate system model is not required to be constructed based on a physical formula, and an optimal control strategy is obtained under the guidance of a reward and punishment mechanism through interaction with the environment, so that the problems of high dimension, multiple parameters, nonlinearity and the like in the traditional optimization method are solved.
Referring to fig. 1, an embodiment of the present invention provides a method for collaborative optimization operation of multiple micro-grids, which specifically includes:
s20, constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
s40, constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power consumption requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing other agent information, namely the electric quantity consumption of the whole region.
By constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A; the multi-agent deep reinforcement learning reward function is further constructed, the multi-agent is guided to cooperatively meet regional power consumption requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing other agent information, namely regional overall electric quantity consumption. The reinforcement learning reward function comprising the electricity consumption of each intelligent agent and the regional electricity consumption is disclosed, so that the electricity consumption information is shared among the multiple intelligent agents, the multiple intelligent agents are guided to cooperatively meet the regional electricity consumption requirement through rewards, and the energy consumption cost is reduced.
The following describes the advantageous effects of the present invention by way of a specific example:
please refer to fig. 2:
s1, constructing a multi-micro-grid collaborative optimization operation environment, wherein the multi-micro-grid collaborative optimization operation environment comprises operation parameters such as renewable energy sources, energy storage devices, energy conversion devices and the like contained in the micro-grid, electricity prices, carbon emission prices and the like.
S2, constructing a multi-agent deep reinforcement learning state space S, wherein variables contained in the S are shown in the following table:
TABLE 1 State space variables
1. Date
2. External electricity purchased/sold by the micro-grid
3. Energy storage state of charge (SOC)
4. Electric load
5. Photovoltaic output
6. Carbon emission unit price
7. Current electricity price and its 24-hour forecast
8. Temperature and its 24-hour forecast
9. Humidity and its 24-hour forecast
10. Solar radiation and its 24-hour forecast
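As a purely illustrative sketch (not part of the patent text), the ten state variables of Table 1 for one micro-grid can be assembled into a flat observation vector; the dictionary keys below are assumed names rather than identifiers from the patent.

```python
import numpy as np

def build_state(obs: dict) -> np.ndarray:
    """Assemble the 10 state variables of Table 1 for one micro-grid.

    `obs` is a hypothetical dictionary produced by the simulation
    environment; the keys are illustrative assumptions.
    """
    return np.array([
        obs["day_of_year"],          # 1  date
        obs["net_purchased_power"],  # 2  electricity bought from / sold to the outside
        obs["storage_soc"],          # 3  energy-storage state of charge
        obs["electric_load"],        # 4  electric load
        obs["pv_output"],            # 5  photovoltaic output
        obs["carbon_price"],         # 6  carbon emission unit price
        obs["electricity_price"],    # 7  current electricity price (24 h forecasts handled analogously)
        obs["temperature"],          # 8  temperature
        obs["humidity"],             # 9  humidity
        obs["solar_radiation"],      # 10 solar radiation
    ], dtype=np.float32)
```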
S3, constructing a multi-agent deep reinforcement learning action space A, wherein the action space A is shown in the following formula:
A = {a_elec, a_cooling, a_heating}    (1)
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the decimal value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
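For illustration of the action definition only, the fractional action a ∈ [-1, 1] can be mapped to a charge/discharge energy command as sketched below; the capacity argument and the clipping against the current state of charge are assumed safeguards, not details specified in the patent.

```python
def storage_command(a: float, capacity_kwh: float, soc: float) -> float:
    """Map an action a in [-1, 1] to a charge (+) / discharge (-) energy in kWh.

    The action requests 100*|a| percent of the storage capacity; clipping the
    request so the state of charge stays within [0, 1] is an assumption.
    """
    a = max(-1.0, min(1.0, a))
    requested = a * capacity_kwh              # positive: charge, negative: discharge
    headroom = (1.0 - soc) * capacity_kwh     # energy that can still be stored
    available = soc * capacity_kwh            # energy that can still be released
    return min(requested, headroom) if a >= 0 else max(requested, -available)
```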
S4, constructing a multi-agent deep reinforcement learning reward function, wherein the multi-agent deep reinforcement learning reward function is shown in the following formula:
where e_i denotes the outsourced (externally purchased) electricity of the i-th agent and n denotes the number of agents. For each agent, the reward function contains its own outsourced electricity e_i and a component Σe_i containing the other agents' information, i.e. the electricity consumption of the whole region. The introduction of Σe_i into the reward function makes the electricity consumption information shared among the multiple agents, so that the agents are guided through rewards to cooperatively meet the regional electricity demand and the energy cost is reduced.
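The reward formula itself (formula (2)) is not reproduced in the source text. Purely as an illustration of the described structure, the sketch below assumes a negative-cost form in which each agent is penalised for its own outsourced electricity e_i plus the regional total Σe_i; the equal weighting of the two terms is an assumption.

```python
from typing import Sequence

def agent_rewards(purchased: Sequence[float]) -> list:
    """Assumed reward shape: penalise own outsourced electricity plus the regional sum.

    `purchased[i]` is the outsourced electricity e_i of agent i; the exact
    weighting in the patent's formula (2) is not given in the source text.
    """
    regional_total = sum(purchased)
    return [-(e_i + regional_total) for e_i in purchased]
```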
S5, designing a multi-agent deep reinforcement learning action selection flow based on sequential iteration, as shown in FIG. 4:
1) All agents are randomly ordered;
2) the first agent selects an action a_1^m and inputs a_1^m into the gradient boosting decision tree (GBDT), which predicts the electric energy e_1^m to be consumed by the first agent under this action; i denotes the agent index, i ∈ {0, 1, ..., n}, n denotes the number of agents, and m denotes the number of sequential iterations;
3) the predicted energy consumption e_1^m of the first agent under its action is shared with the next agent;
4) the next agent selects an action a_2^m and inputs a_2^m together with the consumption information e_1^m shared by the previous agent into the GBDT, which predicts the electric energy e_2^m to be consumed under this action;
5) when m = k, i.e. after k iterations, the multiple agents finally output a group of action sequences;
6) the action sequence is output (an illustrative code sketch of this flow is given below).
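A minimal sketch of the sequential-iteration selection flow above, assuming a fitted scikit-learn GradientBoostingRegressor as the GBDT consumption predictor, per-agent policies exposing a select_action method, and the agent's own state as an additional GBDT input; these interfaces are assumptions, not details given in the patent, and how the k iterations refine one another is left unspecified in the text and is therefore simplified here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def sequential_action_selection(agents, states, gbdt: GradientBoostingRegressor,
                                k: int, rng: np.random.Generator):
    """One run of the sequential-iteration selection flow (steps 1)-6) above)."""
    actions = [0.0] * len(agents)
    for m in range(k):                                # m = 1..k sequential iterations
        order = rng.permutation(len(agents))          # 1) random ordering of agents
        shared_energy = 0.0                           # consumption info passed along the order
        for i in order:
            # 2)/4) the agent selects an action given its state and the shared info
            a_i = agents[i].select_action(states[i], shared_energy)
            features = np.concatenate([states[i], [a_i, shared_energy]]).reshape(1, -1)
            e_i = float(gbdt.predict(features)[0])    # GBDT predicts the consumed energy
            shared_energy += e_i                      # 3) share the prediction with the next agent
            actions[i] = a_i
    return actions                                    # 5)/6) final action sequence
```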
S6, the multi-agent deep reinforcement learning training process, as shown in FIG. 3:
1) Six neural networks are built: state-value neural networks q_1 and q_2 and a policy neural network π, whose parameters are θ_1, θ_2 and the policy parameters respectively; their corresponding target neural networks are q_1_target, q_2_target and π_target.
2) The neural networks are initialized, including parameter values such as the network weights and the settings of the optimizers used to update the networks.
3) The training parameters are set, such as the number of training episodes, the number of training iterations per episode num_steps, the replay-pool buffer capacity, the batch size, the learning rate and the discount factor.
4) When the number of environment steps is smaller than the set value num_steps, an action a_t is selected according to the current state s_t, the action a_t is executed to obtain the reward r_t and the next environment state s_{t+1}, the sample (s_t, a_t, r_t, s_{t+1}) is stored into the buffer, and step 4) is executed cyclically.
5) When the number of environment steps is greater than the set value num_steps, an action a_t is selected according to the current state s_t, the action a_t is executed to obtain the reward r_t and the next environment state s_{t+1}, the sample (s_t, a_t, r_t, s_{t+1}) is stored into the buffer, the networks then begin training, and step 5) is executed cyclically.
The network training process is as follows:
(1) At step t, batch_size samples (s, a, r, s_{t+1}) are drawn from the buffer and converted into tensors for training.
(2) The state s_{t+1} of the sampled transitions is input into the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1}).
(3) Based on the values obtained in (2), the target values of q_1_target, q_2_target and π_target are computed, the loss functions of the neural networks q_1, q_2 and π are computed, and gradient updates are performed with the optimizer selected for each network.
(4) Based on the Polyak averaging method, the parameters of the neural networks q_1, q_2 and π are periodically transferred to the target networks q_1_target, q_2_target and π_target.
6) Network training is stopped when the set number of training episodes is reached.
Given the state of the multi-micro-grid system, the trained multi-agent deep reinforcement learning model can directly output cooperative operation actions, thereby generating the operation strategy.
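The interaction loop of steps 4)-5) and the network updates of sub-steps (1)-(4) can be sketched together as below. This is a minimal PyTorch sketch in the standard soft actor-critic form; the temperature alpha, the policy.sample interface, a single optimizer shared by q_1 and q_2, the Gym-style env.reset/env.step interface, and the omission of the terminal-state mask and of tensor-shape handling are all assumptions, since the patent does not spell these details out.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F

def soft_update(target_net, net, tau: float = 0.005):
    """Sub-step (4): Polyak averaging, theta_target <- tau*theta + (1-tau)*theta_target."""
    for p_t, p in zip(target_net.parameters(), net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)

def update_networks(batch, policy, q1, q2, q1_t, q2_t, q_opt, pi_opt,
                    gamma: float = 0.99, alpha: float = 0.2):
    """Sub-steps (1)-(4), in an assumed SAC-style form."""
    # (1) convert the sampled batch to tensors; shapes assumed to be (batch, dim)
    s, a, r, s_next = [torch.as_tensor(np.asarray(x), dtype=torch.float32) for x in zip(*batch)]
    with torch.no_grad():
        a_next, log_prob = policy.sample(s_next)                       # (2) a_{t+1}, log pi
        q_next = torch.min(q1_t(s_next, a_next), q2_t(s_next, a_next))
        y = r.unsqueeze(-1) + gamma * (q_next - alpha * log_prob)      # (3) target value
    q_loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)         # critic losses
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    a_new, log_prob_new = policy.sample(s)
    pi_loss = (alpha * log_prob_new - torch.min(q1(s, a_new), q2(s, a_new))).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    soft_update(q1_t, q1); soft_update(q2_t, q2)                       # (4) Polyak update

def run_training(env, act, total_steps, num_steps, buffer_capacity, batch_size, update_fn):
    """Steps 4)-5): collect (s_t, a_t, r_t, s_{t+1}) and start training after num_steps."""
    buffer = deque(maxlen=buffer_capacity)
    s_t = env.reset()
    for step in range(total_steps):
        a_t = act(s_t)                                # `act` is an assumed state-to-action callable
        s_next, r_t, done, _ = env.step(a_t)          # execute a_t, observe r_t and s_{t+1}
        buffer.append((s_t, a_t, r_t, s_next))        # store the sample in the buffer
        if step >= num_steps:                         # step 5): begin network training
            update_fn(random.sample(buffer, min(batch_size, len(buffer))))
        s_t = env.reset() if done else s_next
```

Here `update_fn` would be a closure such as `lambda b: update_networks(b, policy, q1, q2, q1_t, q2_t, q_opt, pi_opt)`.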
Compared with other reinforcement learning methods, the method adopts a multi-agent soft actor-critic (MASAC) algorithm and adds an entropy term to the objective function of micro-grid operation, so that the micro-grid agents explore more possible operation optimization strategies while pursuing maximization of their own benefits, overcoming problems such as the operation optimization easily falling into a local optimum and weak robustness to environmental changes. Secondly, in the multi-agent soft actor-critic algorithm's selection of the agents' actions, a sequential iteration method is used to obtain the multi-micro-grid optimized operation action strategy, so that each agent's action selection considers both its own operating state and the operating conditions of the other agents; on the premise of satisfying each agent's own benefit, the regional micro-grid operating cost is reduced and the energy utilization efficiency is improved.
The advantageous effects of the invention are further illustrated in the following preferred embodiments:
the algorithm flow chart is shown in fig. 2.
S1: and constructing a multi-micro-grid collaborative optimization operation system containing 3 micro-grids. The micro-grid selects a California intelligent building system, and specifically comprises photovoltaic power generation, cold-hot electricity energy storage, an electric heater, an electric heat pump and other devices. The data selects 3 intelligent building systems 24 hours of information.
S2: the state space of the multiple micro-grids is constructed, as shown by the state quantity in table 1, each micro-grid comprises 10 state quantities of 1 intelligent building system, and the multi-micro-grid cooperative operation system comprises 30 state quantities in total.
S3: the method comprises the steps of constructing an action space of a plurality of micro-grids, wherein each intelligent agent selects electric energy storage as an action variable, so that the multi-micro-grid cooperative operation system totally comprises 3 action amounts, and the action space is represented by the following formula:
in the formula, the subscripts 1-3 denote the agent index, and the superscript elec denotes the electric energy storage action.
S4: and constructing a multi-agent reward function. As shown in formula (2), where n=2.
S5: action selection based on sequential iterations.
1) A GBDT-based model is selected to estimate the electric energy consumption value.
2) The number of iterations is selected to be 100.
S6: network and parameter setting for multi-micro-grid collaborative operation model training based on multi-agent deep reinforcement learning.
1) The state-value neural networks q_1 and q_2, the policy neural network π and their corresponding target networks all adopt 4-layer fully connected neural networks (an illustrative code sketch of these networks is given after this list).
For the state-value networks, the 11 input-layer neurons correspond to the 10 input environment state variables plus 1 action variable, and the 1 output-layer neuron corresponds to the output state-action value; for the policy network, the 10 input-layer neurons correspond to the 10 input environment state variables, and the 2 output-layer neurons correspond to the output action mean and the entropy term log_prob.
The numbers of neurons in the hidden layers are set to 256 and 256, respectively.
2) The neural networks are updated with the Adam optimizer, and the root mean square error is selected as the loss function.
3) The number of training episodes is set to 300, with 8000 steps per episode;
4) The replay-pool buffer capacity is set to 10000 and the batch size is set to 256;
5) The learning rate is 0.003 and the discount factor is 0.99.
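An illustrative PyTorch sketch of the networks and training settings listed in items 1)-5). The input/output dimensions (11→1 for the state-value networks, 10→2 for the policy) and the hidden sizes 256/256 follow the text; interpreting the two policy outputs as an action mean and a log standard deviation (from which log_prob/entropy is computed) is an assumption, as is the way the optimizers are grouped.

```python
import torch
import torch.nn as nn

class StateValueQ(nn.Module):
    """State-value (critic) network q(s, a): 10 state variables + 1 action -> 1 value."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(11, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Policy(nn.Module):
    """Policy network pi(s): 10 state variables -> action mean and log-std (2 outputs)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, state):
        mean, log_std = self.net(state).chunk(2, dim=-1)
        return mean, log_std

# Training settings from items 2)-5) above.
q1, q2, policy = StateValueQ(), StateValueQ(), Policy()
q_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=0.003)
pi_opt = torch.optim.Adam(policy.parameters(), lr=0.003)
NUM_EPISODES, STEPS_PER_EPISODE = 300, 8000
BUFFER_CAPACITY, BATCH_SIZE, GAMMA = 10_000, 256, 0.99  # replay pool, batch size, discount factor
```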
After model training is completed, the model is validated. The calculation results are shown in figs. 5-6.
The invention discloses a multi-micro-grid collaborative optimization operation algorithm based on the multi-agent soft actor-critic (MASAC) algorithm, which adds an entropy term to the objective function, improving the algorithm's ability to explore better strategies and enhancing the robustness of the collaborative optimization algorithm.
The invention discloses a multi-agent deep reinforcement learning action selection method based on sequential iteration, which not only can protect privacy of each agent, but also can meet benefit requirements of each agent, and realizes multi-agent collaborative optimization operation.
The invention discloses a reinforcement learning reward function containing each agent's electricity consumption and the regional electricity consumption. The introduction of Σe_i into the reward function makes the electricity consumption information shared among the multiple agents, so that the agents are guided through rewards to cooperatively meet the regional electricity demand and the energy cost is reduced.
Based on the same inventive concept, the embodiment of the present invention further provides a multi-micro-grid collaborative optimization operation device, which can be used to implement a multi-micro-grid collaborative optimization operation method described in the above embodiment, as described in the following embodiments: because the principle of solving the problem of the multi-micro-grid collaborative optimization operation device is similar to that of a multi-micro-grid collaborative optimization operation method, the implementation of the multi-micro-grid collaborative optimization operation device can be referred to the implementation of the multi-micro-grid collaborative optimization operation method, and the repetition is omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Referring to fig. 7, the apparatus includes:
the space construction module 200 is configured to construct a state space S and an action space a of a plurality of micro-grids, where each micro-grid is an agent, the state space S includes a plurality of state variables, and each agent selects an electric energy storage as an action variable of the action space a;
the reward function construction module 400 is configured to construct a multi-agent deep reinforcement learning reward function, and guide the multi-agents to cooperatively meet the regional power consumption requirement through rewards, wherein the reward function of each agent includes the outsourcing power of the agent and the component containing information of other agents, namely the power consumption of the whole region.
The multi-micro-grid collaborative optimization operation device constructs a state space S and an action space A of the multi-micro-grid through a space construction module, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A; the rewarding function construction module further constructs a multi-agent deep reinforcement learning rewarding function, and the multi-agent deep reinforcement learning rewarding function is guided by rewards to cooperatively meet regional power consumption requirements, wherein the rewarding function of each agent comprises self outsourcing electric quantity and components containing other agent information, namely the electric quantity consumption of the whole region. The reinforcement learning reward function comprising the electricity consumption of each intelligent agent and the regional electricity consumption is disclosed, so that the electricity consumption information is shared among the multiple intelligent agents, the multiple intelligent agents are guided to cooperatively meet the regional electricity consumption requirement through rewards, and the energy consumption cost is reduced.
The multi-micro-grid collaborative optimization operation device of the invention is described in a preferred embodiment as follows:
Referring to fig. 8, the operation environment construction module 100 is specifically configured to construct a multi-micro-grid collaborative optimization operation environment, which comprises the operating parameters of the renewable energy sources, energy storage devices, energy conversion devices and the like contained in the micro-grids, as well as the electricity price, the carbon emission price and the like.
The state space construction module 201 is configured to construct a multi-agent deep reinforcement learning state space S, where variables included in S are as follows:
TABLE 1 State space variable constitution
The action space construction module 202 is configured to construct a multi-agent deep reinforcement learning action space a, as shown in the following formula:
A = {a_elec, a_cooling, a_heating}    (1)
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the decimal value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
The reward function construction module 400 is configured to construct a multi-agent deep reinforcement learning reward function, as shown in the following formula:
where e_i denotes the outsourced (externally purchased) electricity of the i-th agent and n denotes the number of agents. For each agent, the reward function contains its own outsourced electricity e_i and a component Σe_i containing the other agents' information, i.e. the electricity consumption of the whole region. The introduction of Σe_i into the reward function makes the electricity consumption information shared among the multiple agents, so that the agents are guided through rewards to cooperatively meet the regional electricity demand and the energy cost is reduced.
The flow design module 500 is configured to design a multi-agent deep reinforcement learning action selection flow based on sequential iteration, as shown in FIG. 4:
1) All agents are randomly ordered;
2) the first agent selects an action a_1^m and inputs a_1^m into the gradient boosting decision tree (GBDT), which predicts the electric energy e_1^m to be consumed by the first agent under this action; i denotes the agent index, i ∈ {0, 1, ..., n}, n denotes the number of agents, and m denotes the number of sequential iterations;
3) the predicted energy consumption e_1^m of the first agent under its action is shared with the next agent;
4) the next agent selects an action a_2^m and inputs a_2^m together with the consumption information e_1^m shared by the previous agent into the GBDT, which predicts the electric energy e_2^m to be consumed under this action;
5) when m = k, i.e. after k iterations, the multiple agents finally output a group of action sequences;
6) the action sequence is output.
Training module 600, specifically a multi-agent deep reinforcement learning training process, as shown in FIG. 3
1) Six neural networks are built: state-value neural networks q_1 and q_2 and a policy neural network π, whose parameters are θ_1, θ_2 and the policy parameters respectively; their corresponding target neural networks are q_1_target, q_2_target and π_target.
2) The neural networks are initialized, including parameter values such as the network weights and the settings of the optimizers used to update the networks.
3) The training parameters are set, such as the number of training episodes, the number of training iterations per episode num_steps, the replay-pool buffer capacity, the batch size, the learning rate and the discount factor.
4) When the number of environment steps is smaller than the set value num_steps, an action a_t is selected according to the current state s_t, the action a_t is executed to obtain the reward r_t and the next environment state s_{t+1}, the sample (s_t, a_t, r_t, s_{t+1}) is stored into the buffer, and step 4) is executed cyclically.
5) When the number of environment steps is greater than the set value num_steps, an action a_t is selected according to the current state s_t, the action a_t is executed to obtain the reward r_t and the next environment state s_{t+1}, the sample (s_t, a_t, r_t, s_{t+1}) is stored into the buffer, the networks then begin training, and step 5) is executed cyclically.
The network training process is as follows:
(1) At step t, batch_size samples (s, a, r, s_{t+1}) are drawn from the buffer and converted into tensors for training.
(2) The state s_{t+1} of the sampled transitions is input into the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1}).
(3) Based on the values obtained in (2), the target values of q_1_target, q_2_target and π_target are computed, the loss functions of the neural networks q_1, q_2 and π are computed, and gradient updates are performed with the optimizer selected for each network.
(4) Based on the Polyak averaging method, the parameters of the neural networks q_1, q_2 and π are periodically transferred to the target networks q_1_target, q_2_target and π_target.
6) Network training is stopped when the set number of training episodes is reached.
Given the state of the multi-micro-grid system, the trained multi-agent deep reinforcement learning model can directly output cooperative operation actions, thereby generating the operation strategy.
The embodiment of the present invention also provides a computer electronic device. Fig. 9 shows a schematic structural diagram of an electronic device to which the embodiment of the present invention can be applied. As shown in fig. 9, the computer electronic device includes a central processing unit (CPU) 901 which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. In the RAM 903, various programs and data required for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 910 as needed, so that a computer program read out therefrom is installed into the storage section 908 as needed.
As another aspect, the present invention further provides a computer readable storage medium, which may be a computer readable storage medium included in the multi-microgrid collaborative optimization operation device in the above embodiment; or may be a computer-readable storage medium, alone, that is not incorporated into an electronic device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the multi-microgrid co-optimization operation methods described in the present invention.
It will be appreciated by those skilled in the art that the present invention can be carried out in other embodiments without departing from the spirit or essential characteristics thereof. Accordingly, the above disclosed embodiments are illustrative in all respects, and not exclusive. All changes that come within the scope of the invention or equivalents thereto are intended to be embraced therein.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. The multi-micro-grid collaborative optimization operation method is characterized by comprising the following steps of:
constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as an action variable of the action space A;
and constructing a multi-agent deep reinforcement learning reward function, and guiding the multi-agent to cooperatively meet regional power requirements through rewards, wherein the reward function of each agent comprises self outsourcing electric quantity and components containing information of other agents.
2. The collaborative optimization operation method of claim 1, wherein the action space a is represented by the following formula:
A = {a_elec, a_cooling, a_heating}
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
3. The collaborative optimization operation method of claim 2, wherein the multi-agent deep reinforcement learning reward function is represented by the following formula:
where e_i denotes the outsourced (externally purchased) electricity of the i-th agent and n denotes the number of agents; the reward function of each agent comprises its own outsourced electricity e_i and a component Σe_i containing the other agents' information.
4. The collaborative optimization operation method according to claim 3, further comprising: multiple agent deep reinforcement learning action selection flow based on sequential iteration:
randomly sequencing all agents;
the first agent selects an action a_1^m and inputs a_1^m into the gradient boosting decision tree (GBDT), which predicts the electric energy e_1^m to be consumed by the first agent under this action, where i denotes the agent index, i ∈ {0, 1, ..., n}, n denotes the number of agents, and m denotes the number of sequential iterations;
the predicted energy consumption e_1^m of the first agent under its action is shared with the next agent;
the next agent selects an action a_2^m and inputs a_2^m together with the consumption information e_1^m shared by the previous agent into the GBDT, which predicts the electric energy e_2^m to be consumed under this action;
when m = k, the agents finally output a group of action sequences.
5. The collaborative optimization operation method according to claim 4, wherein the multi-agent deep reinforcement learning training process comprises:
constructing neural networks, comprising state-value neural networks q_1 and q_2, a policy neural network π, and their corresponding target neural networks q_1_target, q_2_target and π_target;
setting, based on the constructed neural networks, the number of training episodes, the number of training iterations per episode num_steps, the replay-pool buffer capacity, the batch size, the learning rate and the discount factor;
when the number of environment steps is smaller than the set value num_steps, selecting an action a_t according to the current state s_t, executing the action a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, and executing this step cyclically;
when the number of environment steps is greater than the set value num_steps, selecting an action a_t according to the current state s_t, executing the action a_t to obtain the reward r_t and the next environment state s_{t+1}, then storing the sample (s_t, a_t, r_t, s_{t+1}) into the buffer, starting to train the networks, and executing this step cyclically.
6. The collaborative optimization operation method according to claim 5, wherein,
sampling batch_size samples (s, a, r, s_{t+1}) from the buffer and converting them into tensors for training;
inputting the state s_{t+1} of the sampled transitions into the policy neural network π to obtain a_{t+1} and the entropy term log π(a_{t+1}|s_{t+1});
based on the obtained values, computing the target values of q_1_target, q_2_target and π_target, computing the loss functions of the neural networks q_1, q_2 and π, and performing gradient updates with the optimizer selected for each network;
based on the Polyak averaging method, periodically transferring the parameters of the neural networks q_1, q_2 and π to the target networks q_1_target, q_2_target and π_target;
stopping network training when the set number of training episodes is reached;
given the state of the multi-micro-grid system, the trained multi-agent deep reinforcement learning model directly outputs cooperative operation actions, thereby generating the operation strategy.
7. A multi-microgrid collaborative optimization operation device, characterized in that the device comprises:
the space construction module is used for constructing a state space S and an action space A of a plurality of micro-grids, wherein each micro-grid is an agent, the state space S comprises a plurality of state variables, and each agent selects electric energy storage as the action variable of the action space A;
and the reward function construction module is used for constructing a multi-agent deep reinforcement learning reward function and guiding the multiple agents through rewards to cooperatively meet the regional electricity demand, wherein the reward function of each agent comprises its own outsourced electricity and a component containing the other agents' information.
8. The multi-microgrid co-optimal operation device according to claim 7, wherein said space construction module comprises:
the action space construction module is used for constructing a multi-agent deep reinforcement learning action space A, and the action space A is shown as the following formula:
A = {a_elec, a_cooling, a_heating}    (1)
where a_elec, a_cooling and a_heating are the charging and discharging actions of the electric energy storage, cold energy storage and heat energy storage devices, respectively; a ∈ [-1, 1], and the decimal value indicates charging (+) or discharging (-) of 100×a percent of the current capacity.
9. An apparatus comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the computer program, implements the method of any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any one of claims 1 to 7.
CN202311315801.2A 2023-10-11 2023-10-11 Multi-micro-grid collaborative optimization operation method, device, equipment and medium Pending CN117374937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311315801.2A CN117374937A (en) 2023-10-11 2023-10-11 Multi-micro-grid collaborative optimization operation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311315801.2A CN117374937A (en) 2023-10-11 2023-10-11 Multi-micro-grid collaborative optimization operation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117374937A true CN117374937A (en) 2024-01-09

Family

ID=89405262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311315801.2A Pending CN117374937A (en) 2023-10-11 2023-10-11 Multi-micro-grid collaborative optimization operation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117374937A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808174A (en) * 2024-03-01 2024-04-02 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack
CN117808174B (en) * 2024-03-01 2024-05-28 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack
CN117997152A (en) * 2024-04-03 2024-05-07 深圳市德兰明海新能源股份有限公司 Bottom layer control method of modularized multi-level converter based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN112186799B (en) Distributed energy system autonomous control method and system based on deep reinforcement learning
Wang et al. Deep reinforcement learning method for demand response management of interruptible load
Zheng et al. Optimal chiller loading by improved invasive weed optimization algorithm for reducing energy consumption
Guo et al. Optimal energy management of multi-microgrids connected to distribution system based on deep reinforcement learning
CN112614009B (en) Power grid energy management method and system based on deep expectation Q-learning
Wang et al. Virtual power plant containing electric vehicles scheduling strategies based on deep reinforcement learning
Wan et al. Residential energy management with deep reinforcement learning
Bai et al. Double-layer staged training echo-state networks for wind speed prediction using variational mode decomposition
Andervazh et al. Emission‐economic dispatch of thermal power generation units in the presence of hybrid electric vehicles and correlated wind power plants
CN117374937A (en) Multi-micro-grid collaborative optimization operation method, device, equipment and medium
CN104636985A (en) Method for predicting radio disturbance of electric transmission line by using improved BP (back propagation) neural network
CN112491094B (en) Hybrid-driven micro-grid energy management method, system and device
CN106779177A (en) Multiresolution wavelet neutral net electricity demand forecasting method based on particle group optimizing
Wu et al. Strategic bidding in a competitive electricity market: An intelligent method using Multi-Agent Transfer Learning based on reinforcement learning
Dinh et al. Supervised-learning-based hour-ahead demand response for a behavior-based home energy management system approximating MILP optimization
CN117057553A (en) Deep reinforcement learning-based household energy demand response optimization method and system
Zhang et al. A cooperative EV charging scheduling strategy based on double deep Q-network and Prioritized experience replay
CN115714382A (en) Active power distribution network real-time scheduling method and device based on security reinforcement learning
CN115409645A (en) Comprehensive energy system energy management method based on improved deep reinforcement learning
Zhang et al. Physical-model-free intelligent energy management for a grid-connected hybrid wind-microturbine-PV-EV energy system via deep reinforcement learning approach
Nourianfar et al. Economic emission dispatch considering electric vehicles and wind power using enhanced multi-objective exchange market algorithm
Dou et al. Double‐deck optimal schedule of micro‐grid based on demand‐side response
Zhang et al. A double-deck deep reinforcement learning-based energy dispatch strategy for an integrated electricity and district heating system embedded with thermal inertial and operational flexibility
Pandian et al. Solving Economic Load Dispatch ProblemConsidering Transmission Losses by a HybridEP-EPSO Algorithm for Solving both Smoothand Non-Smooth Cost Function
CN113555888B (en) Micro-grid energy storage coordination control method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination