CN117613848A

CN117613848A - Load scheduling method and device for resident user aggregate

Info

Publication number: CN117613848A
Application number: CN202310928127.9A
Authority: CN
Inventors: 潘廷哲; 李超; 金鑫; 孟子杰; 徐迪; 喻振帆; 罗鸿轩; 蔡新雷
Original assignee: CSG Electric Power Research Institute; Guangdong Power Grid Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Current assignee: CSG Electric Power Research Institute; Guangdong Power Grid Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date: 2023-07-26
Filing date: 2023-07-26
Publication date: 2024-02-27

Abstract

The invention relates to the technical field of load scheduling and discloses a load scheduling method and device for a resident user aggregate. The method comprises: constructing a resident user aggregate simulation model and an agent model for distributed load scheduling based on multi-agent reinforcement learning, wherein the agent simulation environment is modeled as a Markov game and an agent structure comprising a centralized evaluation network and a distributed policy network is built from multi-layer perceptrons; training the agent model with the resident user aggregate simulation model to obtain a trained agent model; and executing distributed load scheduling tasks of the resident user aggregate with the trained agent model. The corresponding device comprises a first construction module, a second construction module, a training module and a scheduling module. The invention takes into account both the requirement of user privacy protection and the randomness of users' electricity consumption behavior, and therefore has broad application prospects.

Description

Load scheduling method and device for resident user aggregate

Technical Field

The invention relates to the technical field of load scheduling, and in particular to a load scheduling method and device for a resident user aggregate.

Background

Frequent energy shortages are accelerating the replacement of traditional fossil energy with renewable energy, mainly wind and solar, so the installed capacity of renewable generation keeps growing. Renewable generation is intermittent and random, and the generation side has weak regulation capability; when renewable generation is integrated into the power grid on a large scale, the regulation capability of the power system is weakened and its stability is reduced. Active scheduling on the load side can match power generation to power demand and compensate for the insufficient scheduling capability of the generation side. Residential users are an important part of the load side, and their scheduling potential remains largely untapped. However, residential users are mostly geographically dispersed and their privacy must be protected, both of which make load scheduling for residential users more difficult.

A resident user aggregate (hereinafter "aggregate") is formed by a load aggregator that links geographically dispersed residential users through network communication, and it can take part in load-side dispatching tasks of the power grid. Existing load scheduling methods for aggregates are mainly centralized: the aggregate performs centralized optimization and decision-making to schedule the load of each user. However, such methods do not address the privacy of users' electricity consumption behavior, and because the consumption behavior of residential users is highly random, the centralized optimization problem becomes even more complex.

Disclosure of Invention

The invention provides a load scheduling method and device for a resident user aggregate, which solve the technical problems that existing centralized load scheduling methods for resident user aggregates cannot meet the requirement of user privacy protection and are easily affected by the randomness of users' electricity consumption behavior.

The first aspect of the present invention provides a load scheduling method for a residential user aggregate, including:

constructing a resident user aggregate simulation model, which comprises a load scheduling model of the resident users and a local transaction settlement price model within the aggregate;

constructing an agent model for distributed load scheduling based on multi-agent reinforcement learning, wherein the agent simulation environment is modeled as a Markov game, and an agent structure comprising a centralized evaluation network and a distributed policy network is built from multi-layer perceptrons; the evaluation network takes the observations of all agents as input and outputs a state value, and the policy network takes the observation of a single agent as input and outputs the parameters of a normal distribution from which the agent's action is generated;

training the agent model with the resident user aggregate simulation model to obtain a trained agent model;

and executing the distributed load scheduling task of the resident user aggregate based on the trained agent model.

In an implementation of the first aspect of the present invention, constructing the residential user aggregate simulation model includes:

constructing an objective function that minimizes the electricity cost of the residential users, setting constraints on the objective function based on the power balance constraint and the energy storage system constraints, and building the load scheduling model of the residential users from the objective function and the constraints;

and constructing a local transaction settlement price model combined with the inclining block rate (IBR) scheme.

In an implementation of the first aspect of the present invention, building the load scheduling model of the residential users from the objective function and the constraints includes:

the load scheduling model of the residential users is built as follows:

where the model uses the following quantities: $P_t^{ch,i}$ and $P_t^{dis,i}$, the charging and discharging power of the energy storage of resident user $i$ in period $t$; the electricity purchase price and the electricity sale price inside the aggregate in period $t$; $P_t^{b,i}$ and $P_t^{s,i}$, the power bought and sold by resident user $i$ in period $t$; $\Delta t$, the duration of period $t$; $P_t^{fixed,i}$, the power of the uncontrollable load of resident user $i$ in period $t$; $P_t^{pv,i}$, the photovoltaic output power of resident user $i$ in period $t$; the state of charge of the ESS (energy storage battery system) of resident user $i$ in periods $t$ and $t+1$; $\eta^{ch,i}$ and $\eta^{dis,i}$, the charging and discharging efficiencies of the ESS of resident user $i$; the maximum storage capacity, minimum state of charge, maximum state of charge and rated charging power of the ESS of resident user $i$; and $\Gamma$, the set of scheduling periods.

In an implementation of the first aspect of the present invention, constructing the local transaction settlement price model combined with the inclining block rate scheme includes:

for any period $t \in \Gamma$, defining the generated power, the demanded power and the power exchanged with the main grid within the aggregate as:

where $P_t^{md}$ is the power demand in the aggregate in period $t$, $P_t^{mg}$ is the power generated in the aggregate in period $t$, $P_t^{mu}$ is the power exchanged between the aggregate and the main grid in period $t$, $\Gamma$ is the set of scheduling periods, $N_r$ is the set of resident users in the aggregate, $P_t^{b,i}$ is the consumption (purchased) power of resident user $i$ in period $t$, and $P_t^{s,i}$ is the output (sold) power of resident user $i$ in period $t$;

constructing the local transaction settlement price model so that it satisfies the following conditions:

when the demanded power and the generated power in the aggregate are equal, the electricity purchase price and the electricity sale price of the local transaction are, respectively:

where the quantities involved are the local purchase price for period $t$, the local sale price for period $t$, the price at which the aggregate buys electricity from the power grid in period $t$, and the trade discount coefficient $\eta$;

when the aggregate is short of power, i.e. when $P_t^{mu} > 0$, the aggregate manager automatically purchases electricity from the main grid to balance the load demand in the aggregate, and the purchase price of the local transaction is:

where $\xi \in [0,1]$ is the inclining block rate parameter and $P^{lim}$ is the power transmission limit between the aggregate and the main grid;

when the aggregate has a power surplus, i.e. when $P_t^{mu} < 0$, the aggregate manager earns additional income by selling electricity to the main grid and feeds it back to each user participating in the transaction through the sale price of the local transaction.
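The aggregate-level quantities that drive these pricing conditions can be sketched in Python. This is a minimal illustration, not the patent's formulas: it assumes $P_t^{md} = \sum_{i \in N_r} P_t^{b,i}$, $P_t^{mg} = \sum_{i \in N_r} P_t^{s,i}$ and $P_t^{mu} = P_t^{md} - P_t^{mg}$, which matches the sign convention above (shortage when $P_t^{mu} > 0$, surplus when $P_t^{mu} < 0$). The function name `aggregate_power` is hypothetical.

```python
from typing import Dict, List, Union


def aggregate_power(p_buy: List[float], p_sell: List[float],
                    p_lim: float) -> Dict[str, Union[float, str, bool]]:
    """Aggregate-level balance for one period t (hypothetical helper).

    p_buy[i]:  P_t^{b,i}, power bought (consumed) by resident user i
    p_sell[i]: P_t^{s,i}, power sold (output) by resident user i
    p_lim:     P^{lim}, transmission limit between aggregate and main grid
    """
    p_md = sum(p_buy)    # P_t^{md}: total demand inside the aggregate (assumed sum)
    p_mg = sum(p_sell)   # P_t^{mg}: total generation inside the aggregate (assumed sum)
    p_mu = p_md - p_mg   # P_t^{mu}: exchange with the main grid
    if p_mu > 0:
        regime = "shortage"   # manager buys p_mu from the main grid
    elif p_mu < 0:
        regime = "surplus"    # manager sells -p_mu to the main grid
    else:
        regime = "balanced"   # local supply exactly meets local demand
    return {"P_md": p_md, "P_mg": p_mg, "P_mu": p_mu,
            "regime": regime, "over_limit": abs(p_mu) > p_lim}
```

For example, `aggregate_power([2.0, 1.5, 0.0], [0.0, 0.0, 1.0], p_lim=2.0)` reports a shortage of 2.5 that exceeds the assumed limit, which is exactly the situation the inclining block rate is meant to discourage.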

In an implementation of the first aspect of the present invention, constructing the agent model for distributed load scheduling based on multi-agent reinforcement learning includes:

constructing a Markov game model to simulate the agent simulation environment; the Markov game model is described by the tuple $\langle n, S, A_1, \dots, A_n, O_1, \dots, O_n, P, r^1, \dots, r^n \rangle$, where $n$ is the number of agents and each agent represents one resident user, $S$ is the set of states of the environment in which all agents are located, $A_1, \dots, A_n$ are the action sets of the agents, $O_1, \dots, O_n$ are the observation sets of the agents, $P: S \times A \to [0,1]$ is the state transition function of the agent simulation environment, and $r^1, \dots, r^n$ are the reward functions of the agents;

and constructing a centralized evaluation network and a distributed policy network based on multi-layer perceptrons to obtain the agent structure.

In an implementation of the first aspect of the present invention, constructing the Markov game model to simulate the agent simulation environment includes:

setting the observation, the action and the reward function of any agent $i$ in the agent simulation environment as follows:

where the quantities used are: the state of charge of the ESS of resident user $i$ in period $t$; $P_t^{fixed,i}$, the power of the uncontrollable load of resident user $i$ in period $t$; $P_t^{b,i}$ and $P_t^{s,i}$, the power bought and sold by resident user $i$ in period $t$; $T_t^{out,i}$, the outdoor temperature measured for resident user $i$ in period $t$; the local purchase and sale prices for period $t-1$; the power demand and the generated power in the aggregate for period $t-1$; the electricity purchase and sale prices inside the aggregate for period $t$; $\alpha$, a preset reward function coefficient; $P_t^{mu}$, the power exchanged between the aggregate and the main grid in period $t$; $P^{lim}$, the power transmission limit between the aggregate and the main grid; and $P_t^{ess,i}$, the working power of the energy storage device of resident user $i$, where $P_t^{ess,i} > 0$ means the storage is charging and $P_t^{ess,i} < 0$ means it is discharging.
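A minimal Python sketch of how a single agent's observation, action and reward might be assembled. The grouping of quantities (own ESS state of charge, own load and trades, outdoor temperature, and the aggregate's period $t-1$ prices and totals as the observation; $P_t^{ess,i}$ as the action) and the reward shape (negative electricity cost minus an $\alpha$-weighted penalty on the part of $|P_t^{mu}|$ that exceeds $P^{lim}$) are assumptions for illustration, not the patent's exact formulas; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalState:
    """Private, per-household quantities for period t (hypothetical container)."""
    soc: float        # ESS state of charge of resident user i
    p_fixed: float    # P_t^{fixed,i}, uncontrollable load
    p_buy: float      # P_t^{b,i}, power bought
    p_sell: float     # P_t^{s,i}, power sold
    t_out: float      # T_t^{out,i}, outdoor temperature


def build_observation(local: LocalState, prev_price_buy: float, prev_price_sell: float,
                      prev_p_md: float, prev_p_mg: float) -> List[float]:
    # Own private state plus aggregate-level signals from period t-1 (assumed grouping).
    return [local.soc, local.p_fixed, local.p_buy, local.p_sell, local.t_out,
            prev_price_buy, prev_price_sell, prev_p_md, prev_p_mg]


def split_action(p_ess: float) -> Tuple[float, float]:
    # P_t^{ess,i} > 0 means charging, < 0 means discharging; returns (P_ch, P_dis).
    return max(p_ess, 0.0), max(-p_ess, 0.0)


def reward(p_buy: float, p_sell: float, price_buy: float, price_sell: float,
           p_mu: float, p_lim: float, alpha: float, dt: float = 1.0) -> float:
    # Hypothetical shape: negative electricity cost of user i in period t,
    # minus an alpha-weighted penalty when |P_t^{mu}| exceeds P^{lim}.
    cost = (price_buy * p_buy - price_sell * p_sell) * dt
    overload = max(abs(p_mu) - p_lim, 0.0)
    return -cost - alpha * overload
```

In this sketch only aggregate-level signals (prices and totals) cross household boundaries, which is what allows each policy to act on local data.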

In an implementation of the first aspect of the present invention, training the agent model with the resident user aggregate simulation model to obtain a trained agent model includes:

training the agent model by back-propagating the loss functions; the loss functions of the evaluation network and the policy network in the agent model are as follows:

where $L^{\pi,i}(\theta)$ is the loss function of the policy network; $\mathbb{E}$ denotes the expectation; the policy network after the update and the policy network before the update both appear in the loss; $r$ is the reward value; $\gamma$ is the discount factor; $o_{t+1}$ and $o_t$ are the agent's observations in periods $t+1$ and $t$, each with its corresponding state value; $\beta$ and $c_1$ are hyperparameters; the KL divergence between the updated and pre-update policy networks is also used; $L^{v,i}(\omega)$ is the loss function of the evaluation network; the reward in period $t+l$ is weighted by $\gamma^l$, the discount factor for the $l$-th period; $T$ is the last period in the set of scheduling periods; and $\omega$ and $\theta$ are the parameters of the multi-layer perceptrons.
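The loss terms described above resemble a PPO-style objective with a KL penalty. The following Python sketch assumes Gaussian policies, a one-step advantage $r + \gamma V(o_{t+1}) - V(o_t)$, a penalty $\beta\,\mathrm{KL}(\pi_{old}\,\|\,\pi_{new})$ and a squared-error value loss against the discounted return $\sum_l \gamma^l r_{t+l}$. The role of $c_1$ is not assumed, so it is left out, and all names are hypothetical. It computes loss values only; gradients would come from back-propagation in an autodiff framework.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_log_prob(a: float, mu: float, sigma: float) -> float:
    # Log-density of action a under N(mu, sigma^2).
    return -0.5 * ((a - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def gaussian_kl(mu_old: float, sigma_old: float, mu_new: float, sigma_new: float) -> float:
    # Closed-form KL(N_old || N_new) for 1-D Gaussians.
    return (math.log(sigma_new / sigma_old)
            + (sigma_old ** 2 + (mu_old - mu_new) ** 2) / (2 * sigma_new ** 2) - 0.5)


def td_advantage(r: float, v_t: float, v_next: float, gamma: float) -> float:
    # One-step advantage: r + gamma * V(o_{t+1}) - V(o_t).
    return r + gamma * v_next - v_t


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    # sum_l gamma^l * r_{t+l}, with rewards = [r_t, r_{t+1}, ..., r_T].
    return sum(gamma ** l * r for l, r in enumerate(rewards))


def policy_loss(batch: List[Tuple[float, float, float, float, float, float]],
                beta: float) -> float:
    # batch items: (action, mu_old, sigma_old, mu_new, sigma_new, advantage).
    terms = []
    for a, mo, so, mn, sn, adv in batch:
        ratio = math.exp(gaussian_log_prob(a, mn, sn) - gaussian_log_prob(a, mo, so))
        terms.append(ratio * adv - beta * gaussian_kl(mo, so, mn, sn))
    return -sum(terms) / len(terms)  # minimize the negative surrogate


def value_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    # Mean squared error between V(o_t) and the discounted return.
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

When the updated policy equals the old one, the ratio is 1 and the KL term vanishes, so the policy loss reduces to the negative mean advantage; the $\beta$ term only matters once the policy starts to move.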

A second aspect of the present invention provides a load scheduling apparatus for a residential user aggregate, including:

the first construction module is used to construct a resident user aggregate simulation model, which comprises a load scheduling model of the resident users and a local transaction settlement price model within the aggregate;

the second construction module is used to construct an agent model for distributed load scheduling based on multi-agent reinforcement learning, wherein the agent simulation environment is modeled as a Markov game, and an agent structure comprising a centralized evaluation network and a distributed policy network is built from multi-layer perceptrons; the evaluation network takes the observations of all agents as input and outputs a state value, and the policy network takes the observation of a single agent as input and outputs the parameters of a normal distribution from which the agent's action is generated;

the training module is used to train the agent model with the resident user aggregate simulation model to obtain a trained agent model;

and the scheduling module is used to execute the distributed load scheduling task of the resident user aggregate based on the trained agent model.

In an implementation of the second aspect of the present invention, the first construction module includes:

a first construction unit, configured to construct an objective function that minimizes the electricity cost of the residential users, set constraints on the objective function based on the power balance constraint and the energy storage system constraints, and build the load scheduling model of the residential users from the objective function and the constraints;

and a second construction unit, configured to construct a local transaction settlement price model combined with the inclining block rate scheme.

In one possible implementation of the second aspect of the present invention, the first construction unit is specifically configured to:

build the load scheduling model of the residential users as follows:

where the model uses the following quantities: $P_t^{ch,i}$ and $P_t^{dis,i}$, the charging and discharging power of the energy storage of resident user $i$ in period $t$; the electricity purchase price and the electricity sale price inside the aggregate in period $t$; $P_t^{b,i}$ and $P_t^{s,i}$, the power bought and sold by resident user $i$ in period $t$; $\Delta t$, the duration of period $t$; $P_t^{fixed,i}$, the power of the uncontrollable load of resident user $i$ in period $t$; $P_t^{pv,i}$, the photovoltaic output power of resident user $i$ in period $t$; the state of charge of the ESS of resident user $i$ in periods $t$ and $t+1$; $\eta^{ch,i}$ and $\eta^{dis,i}$, the charging and discharging efficiencies of the ESS of resident user $i$; the maximum storage capacity, minimum state of charge, maximum state of charge and rated charging power of the ESS of resident user $i$; and $\Gamma$, the set of scheduling periods.

In one possible implementation of the second aspect of the present invention, the second construction unit is specifically configured to:

In an implementation of the second aspect of the present invention, the second construction module includes:

a third construction unit, configured to construct a Markov game model to simulate the agent simulation environment; the Markov game model is described by the tuple $\langle n, S, A_1, \dots, A_n, O_1, \dots, O_n, P, r^1, \dots, r^n \rangle$, where $n$ is the number of agents and each agent represents one resident user, $S$ is the set of states of the environment in which all agents are located, $A_1, \dots, A_n$ are the action sets of the agents, $O_1, \dots, O_n$ are the observation sets of the agents, $P: S \times A \to [0,1]$ is the state transition function of the agent simulation environment, and $r^1, \dots, r^n$ are the reward functions of the agents;

and a fourth construction unit, configured to construct a centralized evaluation network and a distributed policy network based on multi-layer perceptrons to obtain the agent structure.

In one possible implementation of the second aspect of the present invention, the third construction unit is specifically configured to:

where the quantities used are: the state of charge of the ESS of resident user $i$ in period $t$; $P_t^{fixed,i}$, the power of the uncontrollable load of resident user $i$ in period $t$; $P_t^{b,i}$ and $P_t^{s,i}$, the power bought and sold by resident user $i$ in period $t$; $T_t^{out,i}$, the outdoor temperature measured for resident user $i$ in period $t$; the local purchase and sale prices for period $t-1$; the power demand and the generated power in the aggregate for period $t-1$; the electricity purchase and sale prices inside the aggregate for period $t$; $\alpha$, a preset reward function coefficient; $P_t^{mu}$, the power exchanged between the aggregate and the main grid in period $t$; $P^{lim}$, the power transmission limit between the aggregate and the main grid; and $P_t^{ess,i}$, the working power of the energy storage device of resident user $i$, where $P_t^{ess,i} > 0$ means the storage is charging and $P_t^{ess,i} < 0$ means it is discharging.

In an implementation of the second aspect of the present invention, the training module includes:

a training unit, configured to train the agent model by back-propagating the loss functions; the loss functions of the evaluation network and the policy network in the agent model are as follows:

where $L^{\pi,i}(\theta)$ is the loss function of the policy network; $\mathbb{E}$ denotes the expectation; the policy network after the update and the policy network before the update both appear in the loss; $r$ is the reward value; $\gamma$ is the discount factor; $o_{t+1}$ and $o_t$ are the agent's observations in periods $t+1$ and $t$, each with its corresponding state value; $\beta$ and $c_1$ are hyperparameters; the KL divergence between the updated and pre-update policy networks is also used; $L^{v,i}(\omega)$ is the loss function of the evaluation network; the reward in period $t+l$ is weighted by $\gamma^l$, the discount factor for the $l$-th period; $T$ is the last period in the set of scheduling periods; and $\omega$ and $\theta$ are the parameters of the multi-layer perceptrons.

A third aspect of the present invention provides a load scheduling apparatus for a residential user aggregate, comprising:

a memory for storing instructions, the instructions being used to implement the load scheduling method for a residential user aggregate according to any one of the implementations described above;

and a processor for executing the instructions in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the load scheduling method for a residential user aggregate according to any one of the implementations described above.

From the above technical scheme, the invention has the following advantages:

The invention constructs a resident user aggregate simulation model comprising a load scheduling model of the resident users and a local transaction settlement price model within the aggregate; constructs an agent model for distributed load scheduling based on multi-agent reinforcement learning, in which the agent simulation environment is modeled as a Markov game and an agent structure comprising a centralized evaluation network and a distributed policy network is built from multi-layer perceptrons (the evaluation network takes the observations of all agents as input and outputs a state value, and the policy network takes the observation of a single agent as input and outputs the parameters of a normal distribution from which the agent's action is generated); trains the agent model with the resident user aggregate simulation model to obtain a trained agent model; and executes distributed load scheduling of the resident user aggregate with the trained agent model. Each agent schedules the energy storage equipment of its own residential user independently and needs no communication with other agents, so the requirement of user privacy protection is met. Because the distributed load scheduling is performed by agents trained with multi-agent reinforcement learning, the policy can be updated adaptively from system feedback; this lets it better cope with changes and uncertainty in the system, maintains system stability and efficiency, and effectively reduces the impact of the randomness of users' electricity consumption on load scheduling.

Drawings

To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art may derive other drawings from them without inventive effort.

FIG. 1 is a flow chart of a load scheduling method for a residential user aggregate according to an alternative embodiment of the present invention;

FIG. 2 is a schematic view of a residential user aggregate structure provided in an alternative embodiment of the present invention;

FIG. 3 is a schematic diagram of a load scheduling agent according to an alternative embodiment of the present invention;

FIG. 4 is a schematic diagram of the centralized-training, decentralized-execution framework adopted in an alternative embodiment of the present invention;

FIG. 5 is a schematic diagram of an example of a user's uncontrollable load in an embodiment of the invention;

FIG. 6 is a schematic diagram of an example of the outdoor temperature in an embodiment of the present invention;

FIG. 7 is a schematic diagram of the load scheduling effect on a residential user aggregate in an embodiment of the present invention;

FIG. 8 is a structural block diagram of a load scheduling device for a residential user aggregate according to an alternative embodiment of the present invention.

Reference numerals:

1 - first construction module; 2 - second construction module; 3 - training module; 4 - scheduling module.

Detailed Description

The embodiments of the invention provide a load scheduling method and device for a resident user aggregate, which solve the technical problems that existing centralized load scheduling methods for resident user aggregates cannot meet the requirement of user privacy protection and are easily affected by the randomness of users' electricity consumption behavior.

To make the objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.

The invention provides a load scheduling method for a resident user aggregate.

Referring to FIG. 1, FIG. 1 shows a flowchart of the load scheduling method for a residential user aggregate according to an embodiment of the present invention.

The load scheduling method for the resident user aggregate provided by the embodiment of the invention comprises steps S1 to S4.

Step S1: construct a resident user aggregate simulation model, which comprises a load scheduling model of the resident users and a local transaction settlement price model within the aggregate.

FIG. 2 shows the structure of the resident user aggregate represented by the constructed simulation model. The aggregate contains an aggregate manager and a number of residential users, each equipped with a Residential Energy Management System (REMS) that automatically schedules the user's controllable loads. Residential users buy and sell electricity within the aggregate to meet their energy demand; the aggregate exchanges power with the grid as a whole and settles in real time, and the final cost is allocated to each residential user through the local transaction settlement price. The aggregate thus settles local energy consumption at the local transaction settlement prices, while each user minimizes its energy cost by scheduling its controllable loads. In summary, the aggregate simulation model comprises a load scheduling optimization model of the residential users and a local transaction settlement price model within the aggregate.

Load scheduling models for residential users typically consider the user's uncontrollable load and the stored energy load.

In one implementation, constructing the residential user aggregate simulation model includes:

In one implementation, building the load scheduling model of the residential users from the objective function and the constraints includes:

the load scheduling model of the residential users is built as follows:

where the model uses the following quantities: $P_t^{ch,i}$ and $P_t^{dis,i}$, the charging and discharging power of the energy storage of resident user $i$ in period $t$; the electricity purchase price and the electricity sale price inside the aggregate in period $t$; $P_t^{b,i}$ and $P_t^{s,i}$, the power bought and sold by resident user $i$ in period $t$; $\Delta t$, the duration of period $t$; $P_t^{fixed,i}$, the power of the uncontrollable load of resident user $i$ in period $t$; $P_t^{pv,i}$, the photovoltaic output power of resident user $i$ in period $t$; the state of charge of the ESS of resident user $i$ in periods $t$ and $t+1$; $\eta^{ch,i}$ and $\eta^{dis,i}$, the charging and discharging efficiencies of the ESS of resident user $i$; the maximum storage capacity, minimum state of charge, maximum state of charge and rated charging power of the ESS of resident user $i$; and $\Gamma$, the set of scheduling periods.

The second constraint ensures the power balance of the user, where $P_t^{fixed,i}$ is the power of the uncontrollable load and $P_t^{pv,i}$ is the photovoltaic output power, which is unpredictable. Both $P_t^{fixed,i}$ and $P_t^{pv,i}$ are taken from the user's historical load data. The third and fourth constraints require that a resident can only buy or only sell electricity within a period. The fifth constraint is the energy conversion dynamic model of the ESS, the sixth constraint bounds the state of charge, the seventh constraint requires that the ESS can only charge or only discharge within a period, and the eighth constraint bounds the charging and discharging power.
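The constraint set can be illustrated with a Python sketch that checks a candidate ESS schedule for one household and returns its electricity cost. It assumes the following standard forms, which are not taken from the patent: power balance $P_t^{b,i} - P_t^{s,i} = P_t^{fixed,i} - P_t^{pv,i} + P_t^{ch,i} - P_t^{dis,i}$; state-of-charge dynamics $SOC_{t+1}^i = SOC_t^i + (\eta^{ch,i} P_t^{ch,i} - P_t^{dis,i}/\eta^{dis,i})\,\Delta t / E^{max,i}$; and cost $\sum_t (\lambda_t^{b} P_t^{b,i} - \lambda_t^{s} P_t^{s,i})\,\Delta t$. The symbols $SOC$, $E^{max,i}$, $\lambda_t^b$, $\lambda_t^s$ and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ESS:
    capacity: float        # E^{max,i}: maximum storage capacity (kWh)
    soc_min: float         # minimum state of charge (fraction)
    soc_max: float         # maximum state of charge (fraction)
    p_rated: float         # rated power; assumed to bound charging and discharging
    eta_ch: float = 0.95   # charging efficiency (assumed default)
    eta_dis: float = 0.95  # discharging efficiency (assumed default)


def evaluate_schedule(p_ch: List[float], p_dis: List[float], p_fixed: List[float],
                      p_pv: List[float], price_buy: List[float], price_sell: List[float],
                      ess: ESS, soc0: float, dt: float = 1.0) -> Tuple[float, float]:
    """Return (total cost, final SOC); raise ValueError if a constraint is violated."""
    soc, cost, tol = soc0, 0.0, 1e-9
    for t in range(len(p_fixed)):
        # Seventh constraint: charge or discharge, not both, in one period.
        if p_ch[t] > 0 and p_dis[t] > 0:
            raise ValueError(f"period {t}: simultaneous charge and discharge")
        # Eighth constraint: ESS power within [0, rated power].
        if not (0.0 <= p_ch[t] <= ess.p_rated and 0.0 <= p_dis[t] <= ess.p_rated):
            raise ValueError(f"period {t}: ESS power out of range")
        # Power balance; splitting net demand this way also enforces buy-or-sell.
        net = p_fixed[t] - p_pv[t] + p_ch[t] - p_dis[t]
        p_b, p_s = max(net, 0.0), max(-net, 0.0)
        cost += (price_buy[t] * p_b - price_sell[t] * p_s) * dt
        # Fifth constraint: SOC dynamics; sixth constraint: SOC bounds.
        soc += (ess.eta_ch * p_ch[t] - p_dis[t] / ess.eta_dis) * dt / ess.capacity
        if not (ess.soc_min - tol <= soc <= ess.soc_max + tol):
            raise ValueError(f"period {t}: SOC {soc:.3f} outside limits")
    return cost, soc
```

Deriving $P_t^{b,i}$ and $P_t^{s,i}$ from the net demand means the buy-or-sell constraints hold automatically; a solver-based model would instead keep them as explicit variables, typically with binary indicators.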

In the embodiment of the invention, the local transaction settlement price (LTSP) model within the aggregate is combined with the inclining block rate (IBR) scheme, which avoids transmission congestion caused by excessive power exchange between the aggregate and the grid within a short time. The main purpose of the LTSP model is to compute the electricity purchase price and the electricity sale price inside the aggregate.

In one implementation, building the local transaction settlement price model combined with the inclining block rate scheme includes:

where the price at which the aggregate buys electricity from the grid is, for example, a time-of-use (TOU) price; setting the trade discount coefficient makes local energy consumption within the aggregate more profitable than trading directly with the grid.

where the price at which the aggregate sells electricity to the main grid is the feed-in tariff (FIT). In this case, the purchase price of the local transaction is:

Note that in the LTSP model considered by the invention, for local consumption the local purchase price is lower than the TOU price and the local sale price is higher than the FIT. This encourages users to participate in local energy consumption to some extent and reduces dependence on the grid; TOU and FIT in effect serve as the grid's macroscopic means of regulating the aggregate's load.
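The pricing logic can be sketched in Python under a hypothetical rule, which is not the patent's formula: locally matched energy is priced at $\eta$ times the TOU price; in a shortage, imports up to $P^{lim}$ cost the TOU price and imports beyond it cost $(1+\xi)$ times TOU (the inclining block); in a surplus, exports up to $P^{lim}$ earn the FIT and exports beyond it earn $(1-\xi)$ times FIT; the resulting grid cost or revenue is spread over local demand or local supply. The function name `ltsp_prices` and this blending scheme are assumptions, chosen only to respect the ordering FIT ≤ local price ≤ TOU described above for local consumption.

```python
from typing import Tuple


def ltsp_prices(p_md: float, p_mg: float, tou: float, fit: float,
                eta: float, xi: float, p_lim: float) -> Tuple[float, float]:
    """Hypothetical LTSP with a two-block IBR rule; returns (local_buy, local_sell)."""
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    base = eta * tou  # assumed price for energy matched inside the aggregate
    if not fit <= base <= tou:
        raise ValueError("expected FIT <= eta * TOU <= TOU")
    p_mu = p_md - p_mg
    if p_mu > 0:  # shortage: the manager imports p_mu from the main grid
        block1 = min(p_mu, p_lim) * tou                     # first block at TOU
        block2 = max(p_mu - p_lim, 0.0) * (1.0 + xi) * tou  # inclining block above P^lim
        local_buy = (p_mg * base + block1 + block2) / p_md  # spread total cost over demand
        return local_buy, base
    if p_mu < 0:  # surplus: the manager exports -p_mu to the main grid
        export = -p_mu
        rev1 = min(export, p_lim) * fit                     # first block at FIT
        rev2 = max(export - p_lim, 0.0) * (1.0 - xi) * fit  # reduced block above P^lim
        local_sell = (p_md * base + rev1 + rev2) / p_mg     # spread total revenue over supply
        return base, local_sell
    return base, base  # balanced: one local price for both sides
```

With TOU = 1.0, FIT = 0.25, $\eta$ = 0.8, $\xi$ = 0.5 and $P^{lim}$ = 2.0, a modest shortage keeps the local purchase price below TOU, while a shortage well beyond $P^{lim}$ pushes it above TOU; that price signal is what discourages large exchanges with the grid.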

S2, constructing an agent model of distributed load scheduling based on multi-agent reinforcement learning; the intelligent agent model adopts a Markov game model to simulate an intelligent agent simulation environment, and constructs an intelligent agent structure comprising a centralized evaluation network and a distributed strategy network based on a multi-layer perceptron, wherein the evaluation network takes the observed values of all intelligent agents as input and takes state values as output, and the strategy network takes the observed values of a single intelligent agent as input and takes normal distribution parameters for generating action output of the intelligent agent as output.

The agent is built on multi-layer perceptrons (Multi-Layer Perceptron, MLP); its basic structure is shown in Fig. 3. The agent comprises a centralized evaluation network and a decentralized policy network, both MLPs, whose parameters are denoted ω and θ respectively. The input of the centralized evaluation network is the set of observations of all agents, whereas the input of the decentralized policy network is only the agent's local observation. The output of the centralized evaluation network is the state value, which characterizes the quality of the agent's current state and guides the agent in learning a new policy. The outputs of the decentralized policy network are the parameters of a normal distribution, namely the mean μ and the variance σ²; the agent's action is then obtained by sampling from this normal distribution. Given this structure, the agent's policy learning is the learning of its policy network and evaluation network. Because both networks are MLPs, they can be trained by back-propagation (Back Propagation, BP) of their loss functions.
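The actor-critic structure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the layer widths and tanh activations are hypothetical (the actual design is given in Table 3, which is not reproduced here), and the policy head outputs log σ rather than σ² so that the standard deviation is always positive. The class and function names are illustrative, not taken from the invention.

```python
import math
import random


def init_mlp(sizes, rng):
    """Random weights for a fully connected MLP with layer sizes `sizes`."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
             for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """tanh on hidden layers, linear output layer."""
    for k, (w, b) in enumerate(layers):
        x = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]
        if k < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


class Agent:
    """One residential user's agent (hypothetical sizes)."""

    def __init__(self, obs_dim, n_agents, hidden=16, seed=0):
        self.rng = random.Random(seed)
        # Decentralized policy network (theta): local obs -> (mu, log_sigma).
        self.policy = init_mlp([obs_dim, hidden, hidden, 2], self.rng)
        # Centralized evaluation network (omega): all agents' obs -> state value.
        self.critic = init_mlp([obs_dim * n_agents, hidden, hidden, 1], self.rng)

    def act(self, local_obs):
        mu, log_sigma = mlp_forward(self.policy, local_obs)
        sigma = math.exp(log_sigma)
        return self.rng.gauss(mu, sigma), mu, sigma  # sampled action, mean, std

    def value(self, all_obs):
        joint = [v for obs in all_obs for v in obs]  # concatenate all observations
        return mlp_forward(self.critic, joint)[0]


agent = Agent(obs_dim=4, n_agents=5)
print(agent.act([0.5, 1.0, 0.2, 0.1]))
print(agent.value([[0.1, 0.2, 0.3, 0.4]] * 5))
```

In a full implementation the sampled action would also be clipped to the ESS power limits, and the weights would be updated by BP on the loss functions rather than left at their random initialization.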

In one implementation, constructing the agent model for decentralized load scheduling based on multi-agent reinforcement learning includes:

constructing a Markov game model to simulate the agent environment. The Markov game model is described by the tuple $\langle n, S, A_1, \dots, A_n, O_1, \dots, O_n, P, r^1, \dots, r^n \rangle$, where $n$ is the number of agents, each agent representing one residential user; $S$ is the set of environment states of all agents; $A_1, \dots, A_n$ are the action sets of the individual agents; $O_1, \dots, O_n$ are their observation sets; $P: S \times A \times S \to [0, 1]$ is the state transition function of the agent simulation environment, with $A = A_1 \times \dots \times A_n$ the joint action set; and $r^1, \dots, r^n$ are the reward functions, $r^i$ being the reward function of agent $i$;
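To make the roles of the tuple elements concrete, the following minimal sketch wraps them in a small container: a state from $S$, a transition function standing in for $P$ (deterministic here, i.e. a special case with probability 1), per-agent observation functions for $O_i$, and one reward function $r^i$ per agent. The class name and the toy dynamics in the usage lines are hypothetical and serve only to show that each agent receives its own observation and reward after a joint action.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class MarkovGame:
    """Illustrative container for <n, S, A_1..A_n, O_1..O_n, P, r^1..r^n>."""
    n: int                                                  # number of agents
    transition: Callable[[Any, Sequence[float]], Any]       # P (deterministic here)
    observe: Callable[[Any, int], List[float]]              # O_i: state -> local obs
    rewards: List[Callable[[Any, Sequence[float]], float]]  # r^1..r^n
    state: Any = None                                       # current element of S

    def step(self, joint_action: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
        if len(joint_action) != self.n:
            raise ValueError("expected one action per agent")
        r = [self.rewards[i](self.state, joint_action) for i in range(self.n)]
        self.state = self.transition(self.state, joint_action)
        obs = [self.observe(self.state, i) for i in range(self.n)]
        return obs, r


# Toy usage: the state is each user's ESS state of charge, actions nudge it.
game = MarkovGame(
    n=2,
    transition=lambda s, a: [min(1.0, max(0.0, si + 0.1 * ai)) for si, ai in zip(s, a)],
    observe=lambda s, i: [s[i]],
    rewards=[lambda s, a, i=i: -abs(a[i]) for i in range(2)],
    state=[0.5, 0.5],
)
print(game.step([1.0, -1.0]))
```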

In one implementation, constructing the Markov game model to simulate the agent environment includes:

where $P_t^{fixed,i}$ is the power of the uncontrollable load of residential user $i$ in period $t$; $P_t^{b,i}$ and $P_t^{s,i}$ are the power bought and sold by user $i$ in period $t$; $T_t^{out,i}$ is the outdoor temperature measured for user $i$ in period $t$; $P_{t-1}^{md}$ and $P_{t-1}^{mg}$ are the power demand and the generated power within the aggregate in period $t-1$; $\alpha$ is a preset reward-function coefficient; $P_t^{mu}$ is the power exchanged between the aggregate and the main grid in period $t$; $P^{lim}$ is the transmission power limit between the aggregate and the main grid; and $P_t^{ess,i}$ is the operating power of the energy storage device of user $i$, with $P_t^{ess,i} > 0$ indicating that the storage is charging and $P_t^{ess,i} < 0$ that it is discharging. The remaining symbols denote the ESS state of charge of user $i$ in period $t$, the local transaction purchase and selling prices in period $t-1$, and the electricity purchase and selling prices within the aggregate in period $t$.
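The quantities listed above can be assembled into an observation, an action, and a reward as in the following sketch. The exact observation vector and reward formula are not recoverable from this text, so this is a hypothetical form: the observation simply stacks the listed local and aggregate quantities, the action $P_t^{ess,i}$ is clipped to a rated power, and the reward is the negative electricity cost at the aggregate's internal prices minus an $\alpha$-weighted penalty on exchange power above $P^{lim}$. The period length `dt=0.25` h matches the example later in this description.

```python
def build_observation(soc, p_fixed, p_buy, p_sell, t_out,
                      prev_buy_price, prev_sell_price, prev_p_md, prev_p_mg):
    """Hypothetical local observation of agent i for period t."""
    return [soc, p_fixed, p_buy, p_sell, t_out,
            prev_buy_price, prev_sell_price, prev_p_md, prev_p_mg]


def clip_action(p_ess, p_rated):
    """ESS power P_t^{ess,i}: >0 charging, <0 discharging, within rated power."""
    return max(-p_rated, min(p_rated, p_ess))


def reward(p_buy, p_sell, buy_price, sell_price, p_mu, p_lim, alpha, dt=0.25):
    """Hypothetical reward: negative energy cost minus an overload penalty."""
    cost = (buy_price * p_buy - sell_price * p_sell) * dt  # cost in period t
    overload = max(0.0, abs(p_mu) - p_lim)                 # exchange above P^lim
    return -cost - alpha * overload


# Buying 2 kW at 0.8 while the aggregate exchange exceeds the limit by 2 kW.
print(reward(2.0, 0.0, 0.8, 0.4, p_mu=12.0, p_lim=10.0, alpha=0.5))
```

The penalty term is the kind of mechanism through which the coefficient $\alpha$ and the limit $P^{lim}$ could discourage the transmission congestion mentioned for the IBR method; its exact form in the invention may differ.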

Step S3: train the agent model using the residential user aggregate simulation model to obtain a trained agent model.

In one implementation, training the agent model using the residential user aggregate simulation model to obtain a trained agent model includes:

In one embodiment, β and c₁ may be set to 0.1 and 0.05, respectively.

The invention adopts a centralized-training, decentralized-execution framework, as shown in Fig. 4. In the training stage, each agent's centralized evaluation network participates in learning that agent's policy. The centralized evaluation networks are maintained by the aggregate manager, which thereby provides a limited information-exchange service for training each residential user's agent. By training its policy network and evaluation network, each agent continually optimizes its own policy, gradually improving the efficiency and accuracy with which it performs its task. Because the agent structure uses MLPs, the network parameters can be updated by the BP method, which realizes the agent's learning process.
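The loss functions used in this training stage are defined in claim 7, but their formulas are not reproduced in this text. The sketch below computes the quantities that claim 7 names, under stated assumptions: a one-step TD advantage $r + \gamma V(o_{t+1}) - V(o_t)$, the probability ratio between the updated and pre-update Gaussian policies, a KL penalty weighted by β (taken here as KL(old ‖ new), as in the KL-penalty variant of PPO), and a critic target equal to the discounted return $\sum_l \gamma^l r_{t+l}$ up to the last period $T$. The role of c₁ cannot be determined from the text, so it is not used. Gradients (the BP step) are left to an autodiff framework.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """log N(x; mu, sigma^2) for a 1-D normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def gaussian_kl(mu_old, s_old, mu_new, s_new):
    """KL(pi_old || pi_new) between two 1-D normal distributions."""
    return (math.log(s_new / s_old)
            + (s_old ** 2 + (mu_old - mu_new) ** 2) / (2 * s_new ** 2) - 0.5)


def policy_loss(batch, gamma=0.99, beta=0.1):
    """KL-penalized surrogate, negated so that minimizing it maximizes the objective.

    batch items: (action, reward, V(o_t), V(o_{t+1}), (mu_old, s_old), (mu_new, s_new)).
    """
    total = 0.0
    for a, r, v_t, v_next, (mu_o, s_o), (mu_n, s_n) in batch:
        adv = r + gamma * v_next - v_t  # one-step TD advantage
        ratio = math.exp(gaussian_logpdf(a, mu_n, s_n) - gaussian_logpdf(a, mu_o, s_o))
        total += ratio * adv - beta * gaussian_kl(mu_o, s_o, mu_n, s_n)
    return -total / len(batch)


def discounted_returns(rewards, gamma=0.99):
    """G_t = sum_{l=0}^{T-t} gamma^l * r_{t+l}, computed backwards over one episode."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def value_loss(values, rewards, gamma=0.99):
    """Mean squared error between V(o_t) and the discounted return."""
    returns = discounted_returns(rewards, gamma)
    return sum((g - v) ** 2 for g, v in zip(returns, values)) / len(values)


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

The default β = 0.1 follows the embodiment above; the discount factor value is an assumption.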

Step S4: execute the decentralized load scheduling task of the residential aggregate based on the trained agent model.

In the embodiment of the invention, the trained agent model is applied to the aggregate's decentralized load scheduling task. After all agents have been trained, the execution stage begins: each agent schedules the energy storage device in its own residential user environment in a decentralized manner, without communicating with other agents, which protects the users' electricity-consumption privacy.
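A minimal sketch of the decentralized execution step: each agent passes only its own local observation through its policy and no information is exchanged between agents. Two details are assumptions rather than statements of the invention: at execution time the mean μ is used as a deterministic action instead of a random sample, and the action is clipped to each user's rated ESS power. The policies are represented as plain callables returning (μ, σ).

```python
from typing import Callable, List, Sequence, Tuple

# A trained policy network: local observation -> (mu, sigma).
Policy = Callable[[Sequence[float]], Tuple[float, float]]


def execute_step(policies: List[Policy],
                 local_obs: List[Sequence[float]],
                 p_rated: List[float]) -> List[float]:
    """Each agent acts on its own observation only; no inter-agent messages."""
    actions = []
    for policy, obs, limit in zip(policies, local_obs, p_rated):
        mu, _sigma = policy(obs)                       # deterministic mean action
        actions.append(max(-limit, min(limit, mu)))    # clip to rated ESS power
    return actions


# Two hypothetical trained policies for two users.
print(execute_step([lambda o: (2.0 * o[0], 0.1), lambda o: (-5.0, 0.1)],
                   [[1.0], [0.0]], [3.0, 3.0]))
```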

Compared with the prior art, the embodiment of the invention applies reinforcement learning to load scheduling of the residential user aggregate, which offers the following advantages:

1) Strong adaptability: reinforcement learning learns and adapts through continual interaction with the environment, so it can cope with different environments and with changes in the system. In residential load scheduling, it can learn from and adjust to differing user behavior, electricity-use habits, weather changes, and other factors, and thereby adaptively provide an optimized load scheduling policy. The policy can also be updated from system feedback, such as real-time load conditions and user feedback, so that it better accommodates system changes and uncertainty while maintaining the stability and efficiency of the system;

2) Centralized learning, decentralized scheduling: decentralized decision-making avoids the single point of failure and the instability associated with a centralized controller. In residential load scheduling, decisions are distributed to each user, which realizes decentralized load scheduling and reduces system risk and uncertainty;

3) Privacy protection: the load scheduling model is trained centrally. In this stage, the electricity consumption data of all users may be collected and transmitted to a centralized server for load prediction and for training the load scheduling model; during this process, the users' private data can be encrypted and anonymized to protect their privacy. In the decentralized execution stage, the users' private data need not be transmitted to a centralized server and can be processed and protected locally.

The method of the above embodiment is described in detail below with reference to a specific example.

Consider an aggregate of 5 typical residential users, each of whose load comprises rooftop photovoltaics, energy storage, and other random user loads. The uncontrollable load data of the residential users are shown in Fig. 5. Each residential user has a bidirectional power and information connection with the aggregate manager, which is used to balance the user's electricity demand and to upload information to, and download information issued by, the aggregate manager. The load scheduling task in this example is performed over discrete periods of one day, t ∈ {1, ..., 96}, with each period lasting Δt = 0.25 h. The parameters of the energy storage device installed by each user are listed in Table 1. The operation of each stage is described below.

Table 1 energy storage device parameter settings for each user


First stage: collect historical data and construct the residential user aggregate simulation model considered by the invention. The required data include: the uncontrollable load (i.e., the random load) of each user; outdoor temperature data; and the grid's time-of-use and feed-in tariff schemes. The uncontrollable loads of the residential users in this example are shown in Fig. 5, the outdoor temperature in Fig. 6, and the time-of-use and feed-in tariff schemes in Table 2.

Table 2 Grid time-of-use and feed-in tariff schemes

Second stage: construct the agent model for decentralized load scheduling based on multi-agent reinforcement learning. The observation, action, and reward functions of the multi-agent Markov game model in this example are defined using the optional forms of the invention, and the agents are constructed likewise, with the MLP structure shown in Table 3.

Table 3 Design of the agent MLP structure

Third stage: train the agents constructed in the second stage using the simulation model constructed in the first stage. Training is centralized, i.e., each agent has a centralized evaluation network maintained by the aggregate manager. The agent training process is shown in Table 4, where $m_{ep}$ is the maximum number of training episodes, $M_{tr}$ is the number of mini-batch updates, $B$ is the mini-batch size, and $D$ is the buffer size.

Table 4 Training process of the decentralized load scheduling agents of the aggregate
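Because the contents of Table 4 are not reproduced here, the following skeleton only illustrates how the four hyperparameters named above typically structure such a loop: up to $m_{ep}$ episodes of simulation, a buffer holding at most $D$ transitions, and $M_{tr}$ updates per episode, each on a random mini-batch of size $B$. The callables `env_rollout` and `update` are hypothetical placeholders for one simulated day and one BP update of the policy and evaluation networks, and the default values are arbitrary. Whether the buffer is cleared after each episode (common for on-policy methods such as PPO) is not stated; here it behaves as a first-in, first-out queue.

```python
import random
from collections import deque


def train(env_rollout, update, m_ep=100, m_tr=10, batch_size=32,
          buffer_size=960, seed=0):
    """Skeleton of a centralized training loop (hypothetical interfaces).

    env_rollout(): simulates one day and returns a list of transitions.
    update(minibatch): one BP update of the policy and evaluation networks.
    Returns the number of updates performed.
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_size)   # D: oldest transitions are dropped first
    n_updates = 0
    for _ in range(m_ep):                # m_ep: maximum number of training episodes
        buffer.extend(env_rollout())
        if len(buffer) < batch_size:
            continue
        for _ in range(m_tr):            # M_tr: mini-batch updates per episode
            minibatch = rng.sample(list(buffer), batch_size)  # B transitions
            update(minibatch)
            n_updates += 1
    return n_updates


# 96 transitions per simulated day, matching the 96 periods of the example.
print(train(lambda: list(range(96)), lambda mb: None, m_ep=3, m_tr=2, batch_size=8))
```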

Fourth stage: the trained agent model is applied to the aggregate's decentralized load scheduling task. In this stage, each agent obtains its control action from its policy network based on its observations and executes it, so the users within the aggregate do not need to communicate with one another. The specific implementation of executing the control commands is outside the scope of the invention and is not described here. The energy scheduling results of this stage for this example are shown in Fig. 7.

The invention also provides a load scheduling device for a residential user aggregate, which can be used to execute the load scheduling method for a residential user aggregate described above.

Referring to Fig. 8, Fig. 8 shows a structural block diagram of the load scheduling device for a residential user aggregate according to an embodiment of the invention.

The load scheduling device for a residential user aggregate provided by the embodiment of the invention comprises:

a first construction module 1, configured to construct a residential user aggregate simulation model, the simulation model comprising a load scheduling model of the residential users and a local transaction settlement price model within the aggregate;

a second construction module 2, configured to construct an agent model for decentralized load scheduling based on multi-agent reinforcement learning, wherein the agent model uses a Markov game model to simulate the agent environment and builds, on multi-layer perceptrons, an agent structure comprising a centralized evaluation network and a decentralized policy network; the evaluation network takes the observations of all agents as input and outputs a state value, and the policy network takes the observation of a single agent as input and outputs the normal-distribution parameters used to generate the agent's action;

a training module 3, configured to train the agent model using the residential user aggregate simulation model to obtain a trained agent model;

and a scheduling module 4, configured to execute the decentralized load scheduling task of the residential aggregate based on the trained agent model.

In one possible implementation, the first construction module 1 comprises:

In one possible implementation, the first construction unit is specifically configured to:

construct the load scheduling model of the residential users as follows:

where, for residential user $i$: $P_t^{ch,i}$ and $P_t^{dis,i}$ are the ESS charging and discharging power in period $t$; $P_t^{b,i}$ and $P_t^{s,i}$ are the power bought and sold in period $t$; $P_t^{fixed,i}$ is the power of the uncontrollable load in period $t$; $P_t^{pv,i}$ is the photovoltaic output power in period $t$; $\eta^{ch,i}$ and $\eta^{dis,i}$ are the ESS charging and discharging efficiencies; and the remaining user-level symbols denote the ESS state of charge in periods $t$ and $t+1$, the maximum ESS storage capacity, the minimum and maximum ESS state of charge, and the rated ESS charging power. In addition, the model uses the electricity purchase and selling prices within the aggregate in period $t$, the duration $\Delta t$ of period $t$, and the set $\Gamma$ of scheduling periods.
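The equations of this load scheduling model are not reproduced here, but the variables listed above match a conventional ESS formulation. The sketch below assumes that conventional form: the state of charge evolves as $SoC_{t+1} = SoC_t + (\eta^{ch} P_t^{ch} - P_t^{dis}/\eta^{dis})\Delta t / E^{max}$, charging and discharging are limited by the rated power and by the state-of-charge bounds, and the household balance is $P_t^{b} - P_t^{s} = P_t^{fixed} + P_t^{ch} - P_t^{dis} - P_t^{pv}$. Applying the rated charging power as the discharge limit too, and the default efficiencies, are assumptions; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ESS:
    capacity_kwh: float   # maximum storage capacity
    soc_min: float        # minimum state of charge
    soc_max: float        # maximum state of charge
    p_rated_kw: float     # rated charging power (also used as discharge limit here)
    eta_ch: float = 0.95  # charging efficiency (assumed default)
    eta_dis: float = 0.95 # discharging efficiency (assumed default)


def ess_step(ess, soc, p_ess, dt=0.25):
    """Apply ESS power p_ess (>0 charge, <0 discharge); return (p_ch, p_dis, next_soc)."""
    p_ess = max(-ess.p_rated_kw, min(ess.p_rated_kw, p_ess))
    p_ch, p_dis = max(p_ess, 0.0), max(-p_ess, 0.0)
    # Further limit the power so that the SoC stays within [soc_min, soc_max].
    room = (ess.soc_max - soc) * ess.capacity_kwh / (ess.eta_ch * dt)
    avail = (soc - ess.soc_min) * ess.capacity_kwh * ess.eta_dis / dt
    p_ch, p_dis = min(p_ch, max(room, 0.0)), min(p_dis, max(avail, 0.0))
    next_soc = soc + (ess.eta_ch * p_ch - p_dis / ess.eta_dis) * dt / ess.capacity_kwh
    return p_ch, p_dis, next_soc


def grid_trade(p_fixed, p_pv, p_ch, p_dis):
    """Household balance P_b - P_s = P_fixed + P_ch - P_dis - P_pv; returns (P_b, P_s)."""
    net = p_fixed + p_ch - p_dis - p_pv
    return max(net, 0.0), max(-net, 0.0)


ess = ESS(capacity_kwh=10.0, soc_min=0.1, soc_max=0.9, p_rated_kw=5.0)
print(ess_step(ess, soc=0.5, p_ess=4.0))
print(grid_trade(p_fixed=1.0, p_pv=3.0, p_ch=0.5, p_dis=0.0))
```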

In one possible implementation, the second construction unit is specifically configured to:

where the symbols denote, respectively, the local transaction purchase price in period $t$, the local transaction selling price in period $t$, and the price at which the aggregate purchases electricity from the grid in period $t$; $\eta$ is the trade discount coefficient;

where $\xi$ is the inclining block rate parameter, satisfying $\xi \in [0, 1]$, and $P^{lim}$ is the transmission power limit between the aggregate and the main grid;
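An inclining block rate charges a higher unit price for consumption in higher blocks. The exact IBR expression of the invention is not reproduced here, so the following is only a hypothetical two-block sketch: imports up to $\xi P^{lim}$ are charged at the base price, and imports above that threshold at a base price multiplied by a hypothetical factor `k`. Exports could be handled analogously with a declining price for the excess portion; that case is omitted.

```python
def ibr_import_cost(p_mu, price, p_lim, xi, k=1.5, dt=0.25):
    """Hypothetical two-block inclining block rate for grid imports in one period.

    p_mu: power exchanged with the main grid (>0 import), kW.
    price: base grid purchase price; k: hypothetical surcharge factor (> 1).
    xi in [0, 1] sets the block threshold xi * p_lim.
    """
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    p = max(p_mu, 0.0)               # only imports are charged here
    threshold = xi * p_lim
    base = min(p, threshold)         # first block at the base price
    upper = max(p - threshold, 0.0)  # second block at the higher price
    return (price * base + k * price * upper) * dt


# 12 kW import with an 8 kW threshold: 8 kW at 0.8 and 4 kW at 1.2, for 0.25 h.
print(ibr_import_cost(12.0, price=0.8, p_lim=10.0, xi=0.8))
```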

In one possible implementation, the second construction module 2 comprises:

a third construction unit, configured to construct a Markov game model to simulate the agent environment. The Markov game model is described by the tuple $\langle n, S, A_1, \dots, A_n, O_1, \dots, O_n, P, r^1, \dots, r^n \rangle$, where $n$ is the number of agents, each agent representing one residential user; $S$ is the set of environment states of all agents; $A_1, \dots, A_n$ are the action sets of the individual agents; $O_1, \dots, O_n$ are their observation sets; $P: S \times A \times S \to [0, 1]$ is the state transition function of the agent simulation environment, with $A = A_1 \times \dots \times A_n$ the joint action set; and $r^1, \dots, r^n$ are the reward functions, $r^i$ being the reward function of agent $i$;

In one possible implementation, the third construction unit is specifically configured to:

set the observation, action, and reward function of any agent $i$ in the agent simulation environment as follows:

In one possible implementation, the training module 3 comprises:

The invention also provides a load scheduling device for a residential user aggregate, comprising:

a memory for storing instructions, the instructions being used to implement the load scheduling method for a residential user aggregate according to any one of the above embodiments;

and the processor is used for executing the instructions in the memory.

The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the load scheduling method for a residential user aggregate according to any one of the above embodiments.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working processes of the above-described apparatus, modules and units may refer to corresponding processes in the foregoing method embodiments, and specific beneficial effects of the above-described apparatus, modules and units may refer to corresponding beneficial effects in the foregoing method embodiments, which are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.

The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A load scheduling method for a residential user aggregate, characterized by comprising the following steps:

2. The load scheduling method for a residential user aggregate according to claim 1, wherein constructing the residential user aggregate simulation model comprises:

3. The load scheduling method for a residential user aggregate according to claim 2, wherein

the load scheduling model of the residential users is as follows:

4. The load scheduling method for a residential user aggregate according to claim 2, wherein constructing the local transaction settlement price model in combination with the inclining block rate model comprises:

where $P_t^{md}$ is the power demand within the aggregate in period $t$; $P_t^{mg}$ is the power generated within the aggregate in period $t$; $P_t^{mu}$ is the power exchanged between the aggregate and the main grid in period $t$; $\Gamma$ is the set of scheduling periods; $P_t^{b,i}$ is the consumption power of residential user $i$ in period $t$; $P_t^{s,i}$ is the output power of residential user $i$ in period $t$; and the remaining symbol denotes the set of residential users in the aggregate;

when power within the aggregate is in shortage, the aggregate manager automatically purchases power from the main grid to balance the load demand within the aggregate, and the purchase price of the local transaction is:

when power within the aggregate is in surplus, the aggregate manager earns additional income by selling power to the main grid and feeds back the selling price of the local transaction to each user participating in the transaction, namely:

5. The load scheduling method for a residential user aggregate according to claim 1, wherein constructing the agent model for decentralized load scheduling based on multi-agent reinforcement learning comprises:

6. The load scheduling method for a residential user aggregate according to claim 5, wherein constructing the Markov game model to simulate the agent environment comprises:

where $P_t^{fixed,i}$ is the power of the uncontrollable load of residential user $i$ in period $t$; $P_t^{b,i}$ and $P_t^{s,i}$ are the power bought and sold by user $i$ in period $t$; $T_t^{out,i}$ is the outdoor temperature measured for user $i$ in period $t$; $P_{t-1}^{md}$ and $P_{t-1}^{mg}$ are the power demand and the generated power within the aggregate in period $t-1$; $\alpha$ is a preset reward-function coefficient; $P_t^{mu}$ is the power exchanged between the aggregate and the main grid in period $t$; $P^{lim}$ is the transmission power limit between the aggregate and the main grid; and $P_t^{ess,i}$ is the operating power of the energy storage device of user $i$, with $P_t^{ess,i} > 0$ indicating that the storage is charging and $P_t^{ess,i} < 0$ that it is discharging. The remaining symbols denote the ESS state of charge of user $i$ in period $t$, the local transaction purchase and selling prices in period $t-1$, and the electricity purchase and selling prices within the aggregate in period $t$.

7. The load scheduling method for a residential user aggregate according to claim 1, wherein training the agent model using the residential user aggregate simulation model to obtain a trained agent model comprises:

where $L^{\pi,i}(\theta)$ is the loss function of the policy network; $r$ is the reward value; $\gamma$ is the discount factor; $o_{t+1}$ and $o_t$ are the agent's observations in periods $t+1$ and $t$; $\beta$ and $c_1$ are hyperparameters; $L^{v,i}(\omega)$ is the loss function of the evaluation network; $\gamma^l$ is the discount factor applied to the $l$-th period; $T$ is the last period in the set of scheduling periods; and $\omega$ and $\theta$ are the parameters of the multi-layer perceptrons. The remaining symbols denote the expectation operator, the updated policy network, the policy network before the update, the state values corresponding to $o_{t+1}$ and $o_t$, the KL divergence between the updated and pre-update policy networks, and the reward in period $t+l$.

8. A load scheduling device for a residential user aggregate, comprising:

9. A load scheduling device for a residential user aggregate, comprising:

a memory for storing instructions, wherein the instructions are used to implement the load scheduling method for a residential user aggregate according to any one of claims 1 to 7;

and the processor is used for executing the instructions in the memory.

10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the load scheduling method for a residential user aggregate according to any one of claims 1 to 7.