CN115392766A

CN115392766A - Demand side resource collaborative optimization scheduling method based on local power market

Info

Publication number: CN115392766A
Application number: CN202211110254.XA
Authority: CN
Inventors: 赵博超; 许彪; 栾文鹏; 刘博�
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2022-09-13
Filing date: 2022-09-13
Publication date: 2022-11-25

Abstract

The invention discloses a demand-side resource collaborative optimization scheduling method based on a local power market, comprising the following steps: establishing demand-side resource aggregation models for electric vehicle charging stations, heating, ventilation and air-conditioning (HVAC) aggregators and microgrids; on the basis of the intermediate market price, apportioning the network loss balancing cost according to network loss sensitivity, thereby providing a local power market pricing strategy that satisfies the revenue-expenditure balance requirement, and establishing a local power market that guides the cooperative operation of demand-side resources; constructing the basic elements of a multi-agent reinforcement learning method from the demand-side resource aggregation models and the local power market; and training agents that represent the different demand-side resource aggregators with the MACSAC algorithm, each trained agent making scheduling decisions for the aggregator it represents, so as to achieve distributed cooperative optimal scheduling of demand-side resources. The method solves the coordination of demand-side resources owned by multiple stakeholders, and effectively safeguards the revenue-expenditure balance of the local power market, the privacy of users, and the safe and stable operation of the distribution network.

Description

Demand side resource collaborative optimization scheduling method based on local power market

Technical Field

The invention relates to the field of power systems and automation thereof, in particular to a demand side resource collaborative optimization scheduling method based on a local power market.

Background

Demand-side resources, including renewable generation, flexible demand and energy storage, are growing rapidly in today's power distribution networks and have great potential for voltage regulation, congestion management and carbon reduction. Unlocking this potential requires coordinated management of demand-side resources. As a market-based instrument, the local power market uses price signals to encourage users to trade power directly with one another, promoting the local balance of supply and demand and thereby coordinating the operation of demand-side resources. Specifically, according to local supply and demand conditions, the local power market sets a selling price higher than the feed-in tariff (the price paid for electricity exported to the grid) and a purchase price lower than the time-of-use electricity price, which encourages users to adjust their consumption profiles and to trade their power surplus or shortage on the local power market, reducing their dependence on the grid. Compared with centralized control, the local power market guides users indirectly: control decisions over equipment remain with the users, which preserves their autonomy and avoids privacy concerns. However, existing work still has several shortcomings, as follows:

First, the influence of local power trading on network loss is neglected, so the pricing mechanism of the local power market fails to balance revenue and expenditure. Existing pricing mechanisms for local power markets usually determine the local trading price directly from the quantities reported by users and ignore the changes in net load and net generation caused by network loss. The local power market therefore cannot recover, through pricing, the cost of balancing the network loss, which leaves its revenue and expenditure unbalanced. In addition, how to apportion the network loss balancing cost fairly among users according to their contribution to the network loss is a key problem that must be solved.

Second, users' trading decisions in the local power market do not consider network security constraints and may threaten the safe and stable operation of the grid. Most studies assume that the generation and consumption of users in the local power market are small in scale and that their influence on safe and stable grid operation is negligible. However, because users act in their own interest, consumption plans may concentrate in the lowest-price periods and generation plans in the highest-price periods, producing demand rebound or new generation peaks that may in turn cause node voltage violations, line congestion and similar problems.

Third, existing distributed optimization algorithms struggle to meet the solution-efficiency requirements of users' optimal trading decisions in the local power market. The alternating direction method of multipliers (ADMM), a commonly used distributed optimization algorithm, lets users make trading decisions independently, but it needs a coordination center to coordinate all users. Consensus-based methods improve on ADMM and achieve fully decentralized optimization without a coordination center. However, both ADMM and its consensus-based variants involve an iterative solution process, and their convergence on large-scale optimization problems remains to be verified.

Disclosure of Invention

Aiming at the prior art, the invention provides a demand side resource collaborative optimization scheduling method based on a local power market, which mainly comprises the following steps:

S1, modeling three types of demand-side resource aggregators, namely electric vehicle charging stations, heating, ventilation and air-conditioning (HVAC) aggregators and microgrids, to obtain demand-side resource aggregation models;

S2, on the basis of the intermediate market price, apportioning the network loss balancing cost according to network loss sensitivity, thereby providing a local power market pricing strategy that satisfies the revenue-expenditure balance requirement, and establishing a local power market that guides the cooperative operation of demand-side resources;

S3, constructing the basic elements of the multi-agent reinforcement learning method from the demand-side resource aggregation models and the local power market, the basic elements comprising agents, environment, observations, actions, reward function and cost function;

S4, training the agents that represent the different demand-side resource aggregators with the multi-agent reinforcement learning algorithm MACSAC (multi-agent constrained soft actor-critic), and using each trained agent to make scheduling decisions for the aggregator it represents, thereby achieving distributed cooperative optimal scheduling of demand-side resources.

Further, the specific steps of step S1 include:

S1-1) The electric vehicle charging station is modeled with a virtual energy storage model, which is expressed as follows:

In formula (1), the quantities are: the total power at time t of virtual energy storage i, which represents the electric vehicle charging station, together with its upper and lower power limits; the stored energy of that virtual energy storage, together with its upper and lower energy limits; the time interval Δt; and the change in virtual stored energy caused by electric vehicles leaving or entering the charging station;

S1-2) The heating, ventilation and air-conditioning (HVAC) aggregator is modeled with a virtual energy storage model, which is expressed as follows:

In formula (2), the quantities are: the total power at time t of virtual energy storage i, which represents the HVAC aggregator, together with its upper and lower power limits; the stored energy of that virtual energy storage, together with its upper and lower energy limits; and α, the energy decay rate of the virtual energy storage. In addition, the virtual energy storage representing the HVAC aggregator has a reference load parameter;

S1-3) The microgrid comprises a photovoltaic system, an energy storage system and an inflexible load, wherein the photovoltaic system is modeled as:

the energy storage system is modeled as:

and the inflexible load is modeled as:

In formulas (4) to (8), the quantities are: the forecast active power output of the photovoltaic system; the reactive power output of the photovoltaic inverter; the active power output of the photovoltaic system; the power factor σ; the output power of energy storage i at time t, together with its upper and lower power limits; the stored energy of energy storage i at time t, together with its upper and lower energy limits; the charging efficiency η₊ and the discharging efficiency η₋ of the energy storage; the active power of the inflexible load and its forecast value; and the reactive power of the inflexible load and its forecast value.
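The virtual energy storage idea in S1-1) and S1-2) can be illustrated with a short Python sketch that advances a generic virtual storage by one interval. The class name `VirtualStorage`, its field names and the update rule (stored energy decays at rate alpha, then changes by power times Δt plus an external change term) are assumptions chosen to match the quantities described above; they are not the patent's formulas (1) and (2).

```python
from dataclasses import dataclass


@dataclass
class VirtualStorage:
    """Hypothetical virtual energy storage for an EV station or HVAC aggregator."""
    p_min: float        # lower power limit (kW)
    p_max: float        # upper power limit (kW)
    e_min: float        # lower energy limit (kWh)
    e_max: float        # upper energy limit (kWh)
    energy: float       # current stored energy (kWh)
    alpha: float = 0.0  # assumed per-step energy decay rate (0 for an EV station)

    def step(self, power: float, dt: float, delta_e: float = 0.0) -> float:
        """Advance one interval and return the power actually applied.

        Assumed update: E' = (1 - alpha) * E + P * dt + delta_e, where
        delta_e models EVs arriving at or leaving the station.
        The limits are assumed to be jointly feasible.
        """
        # Clip the requested power to the power limits.
        p = min(max(power, self.p_min), self.p_max)
        base = (1.0 - self.alpha) * self.energy + delta_e
        # Reduce the power further if an energy limit would be violated.
        p = min(p, (self.e_max - base) / dt)
        p = max(p, (self.e_min - base) / dt)
        self.energy = base + p * dt
        return p
```

For example, a station with 90 kWh stored, a 100 kWh ceiling and a 50 kW power limit that is asked to charge at 80 kW for one hour would be limited by its energy ceiling rather than by its power limit.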

In step S2 of the method, the local power market consists of the demand-side resource aggregators, acting as market members, and a trading platform. The local power market pricing strategy that satisfies the revenue-expenditure balance requirement comprises: determining the basic electricity price; determining the network loss apportionment price; and determining the local purchase price and the local selling price by combining the two. The local power market that guides the cooperative operation of demand-side resources is built on this pricing strategy: the local purchase and selling prices adapt to the supply-demand balance on the local power market, and each demand-side resource aggregator makes its optimal scheduling decision according to those prices, so that the demand-side resources operate cooperatively. The specific content of step S2 is as follows:

S2-1) Determining the basic electricity price: first, each market member determines the quantity of electricity it buys or sells on the local power market from the scheduling decisions for the demand-side resources it manages:

In formula (9), the defined quantity is the purchase or sale quantity submitted by demand-side resource aggregator i to the local power market at time t; Ω_EV, Ω_HVAC and Ω_MG denote the sets of electric vehicle charging stations, heating, ventilation and air-conditioning (HVAC) aggregators and microgrids in the market, respectively. The local power market then calculates the total demand, the total generation and the net demand on the market:

In formula (10), the defined quantity is the increment in network loss caused by local power market trading, and Ω denotes the set of all market members. The basic purchase price and the basic selling price are then calculated according to the definition of the intermediate market price:

In formula (12), the two quantities are the time-of-use price and the feed-in tariff of the grid electricity price, respectively;

S2-2) Determining the network loss apportionment price: the network loss apportionment price of each node is calculated from the network loss sensitivity:

In formula (13), the defined quantity is the network loss sensitivity coefficient of node i;

S2-3) Determining the local purchase price and the local selling price by combining the basic electricity price with the network loss apportionment price:

In formulas (14) and (15), the defined quantities are the local purchase price for a market member located at node i and the local selling price for a market member located at node j;

S2-4) The operating mechanism of the local power market is as follows: at the start of each trading round, the trading platform publishes to the market members the current grid electricity price together with the local purchase and selling prices of the previous round; each market member makes its demand-side resource scheduling decision, determines the quantities it will buy and sell accordingly, and submits them to the trading platform; the trading platform clears the local power market with the local power market pricing strategy to obtain the local purchase and selling prices; and the next trading round begins. Cooperative operation of the demand-side resources is thereby achieved.
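To make the basic-price step concrete, the sketch below implements a mid-market-rate rule of the kind commonly used in peer-to-peer energy trading studies. Treating the "intermediate market price" as this rule is an assumption, as are the function name `base_prices` and its arguments; the network loss apportionment terms of formulas (13) to (15) are deliberately left out, so this is not the patent's full pricing strategy.

```python
def base_prices(total_demand, total_generation, tou_price, feed_in_price):
    """Return (buy_price, sell_price) under an assumed mid-market-rate rule.

    buy_price is what local buyers pay; sell_price is what local sellers receive.
    """
    mid = 0.5 * (tou_price + feed_in_price)
    d, g = total_demand, total_generation
    if d <= 0.0 or g <= 0.0:
        # No local counterparty: members effectively trade with the grid only.
        return tou_price, feed_in_price
    if g > d:
        # Surplus generation is exported to the grid at the feed-in tariff.
        sell = (d * mid + (g - d) * feed_in_price) / g
        return mid, sell
    # Deficit or exact balance: any shortfall is imported at the time-of-use price.
    buy = (g * mid + (d - g) * tou_price) / d
    return buy, mid
```

With 100 kWh of demand, 60 kWh of local generation, a time-of-use price of 0.6 and a feed-in tariff of 0.2, buyers would pay 0.48 and sellers would receive 0.4. The buyers' total payment then equals the sellers' receipts plus the cost of the 40 kWh imported from the grid, which is the kind of revenue-expenditure balance the pricing strategy is meant to preserve once losses are also accounted for.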

In step S3 of the method of the present invention, the basic elements for constructing the multi-agent reinforcement learning method include:

S3-1) Agent: an agent is initialized for each demand-side resource aggregator. The agent comprises a reward evaluator (critic) network, a cost evaluator (critic) network and an actor network; the two evaluator networks guide the training of the actor network, and the actor network outputs the aggregator's scheduling decision for the demand-side resources it manages;

S3-2) Environment: the local power market and the distribution network together form the environment with which the agents interact. The interaction proceeds as follows: after receiving the purchase and sale quantities submitted by the demand-side resource aggregators, the local power market clears using the proposed pricing strategy to obtain the local purchase and selling prices, and publishes them to every aggregator; at the same time, the distribution network operation platform feeds back to every aggregator the current operating state of the distribution network, including node voltages and branch power flows;

S3-3) Observation: the observation of each agent is defined from the monitored operating states of the devices in the demand-side resource aggregation model established in step S1 and the price information that the environment sends to the agent, which comprises the grid electricity price, the local purchase price and the local selling price;

For agent i representing a microgrid, the set of observations at time t is:

For agent j representing an electric vehicle charging station, the set of observations at time t is:

For agent k representing a heating, ventilation and air-conditioning (HVAC) aggregator, the set of observations at time t is:

S3-4) Action: the action of each agent is defined from the control variables of the devices in the demand-side resource aggregation model established in step S1; an agent's actions constitute the scheduling decision of the demand-side resource aggregator for the demand-side resources it manages. For agent i representing a microgrid, the set of actions is:

For agent j representing an electric vehicle charging station, the action is:

For agent k representing an HVAC aggregator, the action is:

S3-5) Reward function: the reward of each agent is defined as its electricity sales revenue minus its electricity purchase expenditure on the local power market; it is calculated by the local power market in the environment and fed back to the agent:

The operators in formula (16) are defined as [x]₊ = max(0, x) and [x]₋ = min(0, x);

S3-6) Cost function: the cost function is defined from the safe-operation constraints of the distribution network; its value is calculated by the distribution network operation platform in the environment and fed back to the agent:

In formula (17), V_n is the voltage of node n; V̄ and V̲ are the upper and lower limits of the node voltage, respectively; and N_i is the set of nodes in the distribution network partition to which agent i belongs.
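The reward and cost definitions in S3-5) and S3-6) can be sketched in Python as follows. The positive-part and negative-part operators follow the definitions given for formula (16); the sign convention for the traded quantity, the form of the voltage cost and the default limits of 0.95 and 1.05 per unit are assumptions made for illustration, since formulas (16) and (17) themselves are not reproduced in the text.

```python
def pos(x: float) -> float:
    """[x]+ = max(0, x)."""
    return max(0.0, x)


def neg(x: float) -> float:
    """[x]- = min(0, x)."""
    return min(0.0, x)


def trading_reward(q, buy_price, sell_price, dt=1.0):
    """Sales revenue minus purchase expenditure for one interval.

    Assumed sign convention: q > 0 is power bought, q < 0 is power sold.
    """
    return -(buy_price * pos(q) + sell_price * neg(q)) * dt


def voltage_cost(voltages, v_min=0.95, v_max=1.05):
    """Assumed cost: total voltage-limit violation (per unit) over the nodes in N_i."""
    return sum(pos(v - v_max) + pos(v_min - v) for v in voltages)
```

Under these assumptions, buying 10 kWh at a local purchase price of 0.5 gives a reward of -5, selling 10 kWh at a local selling price of 0.3 gives a reward of 3, and the cost is zero whenever every node voltage in the agent's partition stays within its limits.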

In step S4 of the method, an agent representing resource aggregators on different demand sides is trained by using a multi-agent reinforcement learning algorithm MACSAC, and the method specifically comprises the following steps:

S4-1) For each agent, initializing the parameters of the actor network, the reward evaluator network and the cost evaluator network, together with an empty experience replay buffer;

S4-2) Feeding each agent's observation of the environment into its actor network to obtain an action;

S4-3) Each market member schedules the demand-side resources it manages according to the action given by its agent, calculates its purchase and sale quantities and submits them to the local power market; the local power market clears according to the pricing mechanism of step S2 and feeds back the reward value and price information to the market members, and the distribution network operation platform feeds back the cost value to the market members;

S4-4) Storing the action, observation, reward value and cost value generated by each interaction between an agent and the environment as a sample in the experience replay buffer;

S4-5) Randomly drawing a batch of samples from the experience replay buffer and updating the parameters of each agent's reward evaluator and cost evaluator networks;

S4-6) Using the reward evaluator and cost evaluator networks to estimate each agent's expected future cumulative reward and expected cumulative cost, which guide the update of that agent's actor network;

S4-7) Repeating steps S4-2) to S4-6) until the number of parameter updates reaches a set value, which yields the trained agents.

When the method of the invention is used for collaborative optimization scheduling of demand-side resources, each agent receives the observation at the current time and outputs an action through its actor network; the corresponding demand-side resource aggregator then schedules the demand-side resources it manages according to that action, thereby achieving distributed collaborative optimization scheduling of demand-side resources.
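The control flow of steps S4-1) to S4-7) can be sketched with placeholder components. The names `ReplayBuffer`, `StubAgent`, `stub_env_step` and `train` are hypothetical, the agent performs no real learning, and the environment is a toy stand-in for market clearing and power flow; the sketch only shows how observation, action, reward, cost and replay sampling fit together, not the MACSAC update rules.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience replay buffer (S4-1, S4-4, S4-5)."""

    def __init__(self, capacity=10000):
        self.data = deque(maxlen=capacity)

    def add(self, obs, action, reward, cost, next_obs):
        self.data.append((obs, action, reward, cost, next_obs))

    def sample(self, batch_size):
        return random.sample(list(self.data), min(batch_size, len(self.data)))


class StubAgent:
    """Placeholder for one aggregator's actor and two evaluator networks."""

    def __init__(self):
        self.updates = 0

    def act(self, obs):
        # S4-2: a real actor network would map the observation to an action.
        return random.uniform(-1.0, 1.0)

    def update(self, batch):
        # S4-5, S4-6: a real agent would update both evaluators, then the actor.
        self.updates += 1


def stub_env_step(actions):
    """Toy environment returning per-agent (reward, cost, next_obs)."""
    # S4-3: a real environment would clear the market and run a power flow.
    return [(-abs(a), max(0.0, abs(a) - 0.9), [a]) for a in actions]


def train(n_agents=3, max_updates=20, batch_size=8, seed=0):
    random.seed(seed)
    agents = [StubAgent() for _ in range(n_agents)]
    buffers = [ReplayBuffer() for _ in range(n_agents)]  # one buffer per agent
    obs = [[0.0] for _ in range(n_agents)]
    while agents[0].updates < max_updates:  # S4-7: stop after a set number of updates
        actions = [ag.act(o) for ag, o in zip(agents, obs)]
        feedback = stub_env_step(actions)
        for i, (reward, cost, next_obs) in enumerate(feedback):
            buffers[i].add(obs[i], actions[i], reward, cost, next_obs)  # S4-4
            agents[i].update(buffers[i].sample(batch_size))
            obs[i] = next_obs
    return agents, buffers
```

Keeping one buffer per agent mirrors the text's statement that each agent has its own experience replay buffer, so no aggregator has to share its raw operating data with the others.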

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a demand-side resource collaborative optimization scheduling method based on a local power market. A market-based instrument guides demand-side resources to adjust their power profiles actively so as to balance local supply and demand, which solves the problem of coordinating demand-side resources owned by multiple stakeholders; a local power market pricing mechanism that apportions network loss fairly is designed, which effectively ensures the revenue-expenditure balance of the local power market; and the demand-side resource collaborative optimization scheduling problem is modeled as a constrained Markov game and solved in a decentralized manner with the MACSAC algorithm, which protects user privacy and ensures the safe and stable operation of the distribution network.

Drawings

FIG. 1 is a flow chart of the scheduling method of the present invention;

FIG. 2 is the 33-node power distribution network used in an exemplary embodiment of the present invention;

FIG. 3 shows the reward convergence curves and the voltage-violation cost curves of the MACSAC and MASAC algorithms;

FIG. 4 is a box plot of node voltages in an embodiment of the present invention.

Detailed Description

The invention will be further described with reference to the following figures and specific examples, which are not intended to limit the invention in any way.

The design concept of the invention is as follows: constructing a demand side resource collaborative optimization scheduling framework based on a local power market: the demand side resource aggregator and the micro-grid are used as market members and form a local electric power market together with the trading platform; before each transaction, the transaction platform firstly releases the power grid price to market members; the market member makes a scheduling plan for the demand side resources managed by the market member according to the price of the power grid, and submits the power purchasing and selling plan to a trading platform; the trading platform clears the local electric power market by using a pricing mechanism, and feeds back the obtained local electricity purchasing and selling prices to market members, and the market members continuously improve a trading decision model according to the local electric power supply and demand conditions reflected by the local electricity purchasing and selling prices, and optimize future electricity purchasing and selling plans, so that coordinated operation among resources on demand sides is realized. Based on the framework, the original problem of demand side resource collaborative optimization scheduling is converted into the optimal transaction decision problem of the demand side resource aggregator in the local power market. The method of the invention, as shown in fig. 1, comprises the following steps:

S1, first constructing a demand-side resource collaborative optimization scheduling framework based on a local power market, so that the collaborative optimization scheduling problem is converted into the optimal trading decision problem of each demand-side resource aggregator, and modeling three types of demand-side resource aggregators, namely electric vehicle charging stations, heating, ventilation and air-conditioning (HVAC) aggregators and microgrids, to obtain demand-side resource aggregation models;

S2, on the basis of the intermediate market price, apportioning the network loss balancing cost according to network loss sensitivity, thereby providing a local power market pricing strategy that satisfies the revenue-expenditure balance requirement, and establishing a local power market that guides the cooperative operation of demand-side resources through price signals;

S3, constructing the basic elements of the multi-agent reinforcement learning method from the demand-side resource aggregation models and the local power market, the basic elements comprising agents, environment, observations, actions, reward function and cost function. Each demand-side resource aggregator in the market is modeled as an agent. Specifically, the state variables, action variables and reward function of each agent are defined from the operating characteristics, control characteristics and economic parameters of the demand-side resources, the cost function is defined from the safe-operation constraints of the distribution network, and the trading process of the aggregators on the local power market is described as the interaction between multiple agents and the environment, so that the optimal trading decision problem of the aggregators in the local power market is modeled as a constrained Markov game;

S4, training the agents that represent the different demand-side resource aggregators with the multi-agent reinforcement learning algorithm MACSAC (multi-agent constrained soft actor-critic), and using each trained agent to make scheduling decisions for the aggregator it represents, each aggregator making its trading decisions independently through its own agent, thereby achieving distributed cooperative optimal scheduling of demand-side resources.

In the invention, three types of demand side resource aggregators, namely an electric vehicle charging station, a heating ventilation air conditioner aggregator and a microgrid, are modeled, wherein the electric vehicle charging station and the heating ventilation air conditioner aggregator are modeled by adopting a virtual energy storage model, and the specific formula is as follows:

(1) The electric vehicle charging station is modeled with a virtual energy storage model, which can be expressed as:

In the formula, the quantities are: the total power at time t of virtual energy storage i, which represents the electric vehicle charging station, together with its upper and lower power limits; the stored energy of that virtual energy storage, together with its upper and lower energy limits; the time interval Δt; and the change in virtual stored energy caused by electric vehicles leaving or entering the charging station.

(2) The heating, ventilation and air-conditioning (HVAC) aggregator is modeled with a virtual energy storage model, which can be expressed as:

In the formula, the quantities are: the total power at time t of virtual energy storage i, which represents the HVAC aggregator, together with its upper and lower power limits; the stored energy of that virtual energy storage, together with its upper and lower energy limits; and α, the energy decay rate of the virtual energy storage. In addition, the virtual energy storage representing the HVAC aggregator has a reference load parameter.

(3) The microgrid comprises a photovoltaic system, an energy storage system and a non-flexible load; wherein: the photovoltaic system can be modeled as:

the energy storage system can be modeled as:

the inflexible load can be modeled as:

In the formulas, the quantities are: the forecast active power output of the photovoltaic system; the reactive power output of the photovoltaic inverter; the active power output of the photovoltaic system; the power factor σ; the output power of energy storage i at time t, together with its upper and lower power limits; the stored energy of energy storage i at time t, together with its upper and lower energy limits; the charging efficiency η₊ and the discharging efficiency η₋ of the energy storage; the active power of the inflexible load and its forecast value; and the reactive power of the inflexible load and its forecast value.
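Two of the microgrid relations described above can be sketched in Python: an energy storage update with separate charging and discharging efficiencies, and the reactive power available from a photovoltaic inverter at a given power factor. The function names, the sign convention (positive power charges the storage), the default efficiencies of 0.95 and the exact form of both relations are assumptions for illustration, since the microgrid formulas themselves are not reproduced in the text.

```python
import math


def battery_step(energy, power, dt, eta_charge=0.95, eta_discharge=0.95):
    """Assumed stored-energy update; power > 0 charges, power < 0 discharges."""
    charge = max(0.0, power)
    discharge = min(0.0, power)
    # Charging stores less than is drawn; discharging drains more than is delivered.
    return energy + eta_charge * charge * dt + discharge * dt / eta_discharge


def pv_reactive_limit(p_active, power_factor):
    """Largest reactive power magnitude at the given power factor (assumed relation)."""
    return p_active * math.tan(math.acos(power_factor))
```

With an efficiency of 0.9 in both directions, charging at 10 kW for one hour would add 9 kWh to the storage, while delivering 9 kW for one hour would remove 10 kWh. A unity power factor leaves no room for reactive power, and a power factor of 0.8 allows a reactive power of up to 75% of the active output.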

In the invention, the local power market consists of the demand-side resource aggregators, acting as market members, and a trading platform. The local power market pricing strategy that satisfies the revenue-expenditure balance requirement comprises: determining the basic electricity price; determining the network loss apportionment price; and determining the local purchase price and the local selling price by combining the two. The local power market that guides the cooperative operation of demand-side resources is built on this pricing strategy: the local purchase and selling prices adapt to the supply-demand balance on the local power market, and each demand-side resource aggregator makes its optimal scheduling decision according to those prices, so that the demand-side resources operate cooperatively.

The invention designs a local electric power market price mechanism meeting the balance of revenue and expenditure requirements, wherein the price comprises two parts of basic electricity price and network loss share price, and the specific calculation formula is as follows:

(1) Determining the basic electricity price: first, the bid quantity of each type of market member is defined. Each market member determines the quantity of electricity it buys or sells on the local power market from the scheduling decisions for the demand-side resources it manages:

In the formula, the defined quantity is the purchase or sale quantity submitted by demand-side resource aggregator i to the local power market at time t; Ω_EV, Ω_HVAC and Ω_MG denote the sets of electric vehicle charging stations, heating, ventilation and air-conditioning (HVAC) aggregators and microgrids in the market, respectively.

Then, the total demand, the total generation and the net demand on the market are calculated:

In the formula, the defined quantity is the increment in network loss caused by local power market trading, and Ω denotes the set of all market members.

The basic purchase price and the basic selling price are calculated according to the definition of the intermediate market price:

In formula (12), the two quantities are the time-of-use price and the feed-in tariff of the grid electricity price, respectively.

(2) Determining the network loss apportionment price:

The network loss apportionment price of each node is calculated from the network loss sensitivity:

In the formula, the defined quantity is the network loss sensitivity coefficient of node i.

The local purchase price and the local selling price are then determined by combining the basic electricity price with the network loss apportionment price:

In the formula, the defined quantities are the local purchase price for a market member located at node i and the local selling price for a market member located at node j.

The operating mechanism of the local power market is: when each transaction is started, the transaction platform firstly releases the current power grid price and the local electricity purchasing price and the local electricity selling price of the last transaction to market members; the market members make a demand side resource scheduling decision, determine electricity purchasing electric quantity and electricity selling electric quantity according to the scheduling decision, and submit the electricity purchasing electric quantity and the electricity selling electric quantity to the transaction platform; the trading platform clears the local electric power market by using a local electric power market pricing strategy to obtain a local electricity purchasing price and an electricity selling price; entering the next transaction; therefore, the cooperative operation of the resources on the demand side is realized.
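One trading round of this operating mechanism can be sketched in Python. The helper names `aggregate_bids` and `run_round`, the signed-quantity convention and the way the loss increment enters the net demand are assumptions for illustration; the pricing rule is passed in as a function so that the sketch does not commit to any particular clearing formula.

```python
def aggregate_bids(bids, loss_increment=0.0):
    """Return (total_demand, total_generation, net_demand) from signed bids.

    Assumed convention: bids[name] > 0 is a purchase, bids[name] < 0 is a sale.
    """
    demand = sum(q for q in bids.values() if q > 0)
    generation = sum(-q for q in bids.values() if q < 0)
    return demand, generation, demand - generation + loss_increment


def run_round(members, last_prices, grid_prices, clear):
    """One trading round: publish prices, collect bids, clear the market."""
    # 1. The platform publishes grid prices and the previous round's local prices.
    published = {"grid": grid_prices, "local": last_prices}
    # 2. Each member decides its schedule and submits a signed quantity.
    bids = {name: decide(published) for name, decide in members.items()}
    # 3. The platform clears the market with the supplied pricing rule.
    return bids, clear(*aggregate_bids(bids))
```

Here `members` maps a member name to a decision function that takes the published prices and returns that member's bid, and `clear` receives the total demand, total generation and net demand and returns the new local prices, which the platform would publish at the start of the next round.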

In the invention, the optimal trading decision problem of the demand side resource aggregators in the local power market is modeled as a constrained Markov game. In this game, each agent corresponds to one demand side resource aggregator, and the environment interacting with the agents consists of the power distribution network and the local power market. The complete process is as follows: at each time step t, each agent receives its own observation $o_{i,t}$ and selects an action $a_{i,t}$ according to its policy function $\pi_i(\cdot \mid o_{i,t})$. After all agents take their actions, each agent receives the reward $r_{i,t}$ and the cost fed back by the environment. The environment then transitions to the next state, each agent receives a new observation $o_{i,t+1}$, and the process repeats. The basic elements of the multi-agent reinforcement learning method are defined as follows:

(1) Agent:

An agent is initialized for each demand side resource aggregator. The agent consists of a reward evaluator (critic) network, a cost evaluator (critic) network and an actor network. The reward evaluator network and the cost evaluator network guide the training of the actor network, and the actor network outputs the scheduling decision of the demand side resource aggregator for the demand side resources it manages.

(2) Environment:

The local power market and the power distribution network jointly form the environment that interacts with the agents. The interaction proceeds as follows: after receiving the electricity purchasing and selling quantities submitted by the demand side resource aggregators, the local power market clears using the proposed local power market pricing strategy, obtains the local electricity purchasing price and electricity selling price, and distributes them to each demand side resource aggregator. Meanwhile, the power distribution network operation platform feeds back the operating state of the power distribution network at the current moment, including node voltages and branch powers, to each demand side resource aggregator.

(3) Observation variables:

The observation of each agent is defined from the monitored operating states of the equipment in the established demand side resource aggregation model and from the price information that the environment transmits to the agent, where the price information comprises the power grid electricity price, the local electricity purchasing price and the local electricity selling price.

For agent i representing the microgrid, the set of observations at time t is:

For agent j representing the electric vehicle charging station, the set of observations at time t is:

(4) Action variables:

The action of each agent is defined from the control variables of the equipment in the established demand side resource aggregation model. The actions of an agent form the scheduling decision of the demand side resource aggregator for the demand side resources it manages.

For agent i representing the microgrid, the set of action variables is

For agent j representing the electric vehicle charging station, the action variable is

For agent k representing the heating, ventilation and air conditioning aggregator, the action variable is

(5) Reward function:

The reward function of each agent is defined as its electricity sales revenue minus its electricity purchase expenditure on the local power market. The reward function value is calculated by the local power market in the environment and fed back to the agent. The reward function expression is:

The operators in formula (16) are defined as $[x]_+ = \max(0, x)$ and $[x]_- = \min(0, x)$.
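The reward definition and the two operators can be illustrated with a short sketch. Formula (16) itself is not reproduced in this text, so the sign convention (positive net power means purchase, negative means sale), the time step `dt` and the function names are assumptions made for illustration only.

```python
def pos(x: float) -> float:
    """[x]_+ = max(0, x)."""
    return max(0.0, x)


def neg(x: float) -> float:
    """[x]_- = min(0, x)."""
    return min(0.0, x)


def reward(net_power: float, buy_price: float, sell_price: float,
           dt: float = 1.0) -> float:
    """Sales revenue minus purchase expenditure for one time step.

    Assumed convention: net_power > 0 is a purchase from the local
    market, net_power < 0 is a sale to the local market.
    """
    revenue = -neg(net_power) * sell_price * dt
    expenditure = pos(net_power) * buy_price * dt
    return revenue - expenditure
```

Splitting the net power with the two operators lets a single signed quantity be priced at the selling price when it is negative and at the purchasing price when it is positive.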

(6) Cost function:

The cost function is defined according to the safe operation constraints of the power distribution network. The cost function value is calculated by the power distribution network operation platform in the environment and fed back to the agents. The cost function of each agent is defined as the voltage out-of-limit penalty within the power distribution network partition to which the agent belongs, and its expression is:

In formula (17), $V_n$ is the voltage of node n; $\overline{V}$ and $\underline{V}$ are the upper limit and the lower limit of the node voltage, respectively; $N_i$ is the set of nodes in the power distribution network partition to which agent i belongs.
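A voltage out-of-limit penalty of this kind can be sketched as follows. The exact form of formula (17) is not reproduced in this text, so the sketch assumes a simple sum of the violations above the upper limit and below the lower limit over the nodes of one partition; the function name is hypothetical, and the default limits use the [0.95, 1.05] p.u. range quoted in the case study.

```python
from typing import Iterable


def voltage_violation_cost(voltages: Iterable[float],
                           v_min: float = 0.95,
                           v_max: float = 1.05) -> float:
    """Sum of voltage limit violations (p.u.) over the nodes of a partition.

    Assumption: linear penalty max(0, V - v_max) + max(0, v_min - V)
    per node; the patent's formula (17) may weight terms differently.
    """
    return sum(max(0.0, v - v_max) + max(0.0, v_min - v) for v in voltages)
```

The cost is zero whenever every node voltage in the partition stays inside the limits, which is the condition the constrained training aims to reach.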

In the invention, the multi-agent constrained soft actor-critic (MACSAC) algorithm is used to train the agents representing the different demand side resource aggregators. The specific solving steps are as follows:

1) For each agent, initialize the parameters of the actor network, the reward evaluator network and the cost evaluator network, together with an empty experience replay pool.

2) Input the observation of each agent in the local power market into its actor network to obtain an action value.

3) Each market member schedules the demand side resources it manages according to the action value given by its agent, calculates its electricity purchasing and selling quantities, and submits them to the local power market (the environment). The local power market clears according to the proposed pricing mechanism and feeds back the reward function values, price information and new observation information to the market members, and the power distribution network operation platform feeds back the cost function values to the market members.

4) Store the action values, observations, reward function values and cost function values generated by the interaction between the agents and the environment in the experience replay pool as samples.

5) Randomly draw a batch of samples from the experience replay pool and update the parameters of the reward evaluator network and the cost evaluator network of each agent.

6) Use the reward evaluator network and the cost evaluator network to estimate the expected future cumulative reward and the expected cumulative cost of each agent, and use these estimates to guide the update of each agent's actor network.

7) Repeat steps 2) to 6) until the number of parameter updates reaches a set value, which yields the trained agents.
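Steps 1), 4) and 5) rely on an experience replay pool that stores transitions and returns random batches. A minimal sketch is given below; the names `Transition` and `ReplayBuffer`, the fixed capacity and the field layout are illustrative assumptions, and the network updates of steps 5) and 6) are deliberately left out.

```python
import random
from collections import deque, namedtuple

# One interaction sample: observation, action, reward, cost, next observation.
Transition = namedtuple("Transition", "obs action reward cost next_obs")


class ReplayBuffer:
    """Fixed-capacity experience replay pool with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A bounded deque silently drops the oldest sample when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, obs, action, reward, cost, next_obs):
        self._buffer.append(Transition(obs, action, reward, cost, next_obs))

    def sample(self, batch_size):
        # Never request more samples than are currently stored.
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)
```

Storing the cost next to the reward in each sample is what allows both the reward evaluator and the cost evaluator to be updated from the same batch.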

In the application stage of demand side resource collaborative optimization scheduling, each agent receives the observation at the current moment and uses its actor network to output an action value. The corresponding demand side resource aggregator schedules the demand side resources it manages according to the action value and trades power in the local power market, so that distributed collaborative optimization scheduling of the demand side resources is realized.

Case study:

The effectiveness of the method is verified on a 33-node power distribution system. As shown in fig. 2, the system comprises two micro-grids (MG1, MG2), two electric vehicle charging stations (EV1, EV2) and two heating, ventilation and air conditioning aggregators (HVAC1, HVAC2); the demand side resource aggregators are located at nodes 7, 15, 11, 23, 18 and 27, respectively. The photovoltaic power data, electric vehicle battery parameters and other test data used in the experiment all come from open-source real-world data. The methods compared in the experiment are:

MACSAC: the multi-agent reinforcement learning method adopted by the invention, which considers network security constraints;

MASAC: a multi-agent reinforcement learning method that does not consider network security constraints;

LMMR: the pricing method proposed by the invention, which satisfies the balance of revenue and expenditure;

MMR: the intermediate market price (mid-market rate) method.

Fig. 3 shows the reward (average daily electricity cost) convergence curves and the voltage out-of-limit cost curves of the MACSAC and MASAC algorithms. As can be seen from fig. 3 (a), both algorithms converge after a finite number of training iterations. Although MASAC reaches a higher reward value, fig. 3 (b) shows that MACSAC reduces the voltage out-of-limit cost to nearly 0, which is significantly better than MASAC.

To further illustrate the advantages of the MACSAC algorithm in terms of satisfying the grid safety constraints, fig. 4 shows the voltage statistics of some nodes. It can be seen from FIG. 4 that, when the MASAC algorithm is adopted, the voltages of the nodes where MG1, MG2 and EV1 are located exceed the safety constraint range [0.95,1.05] p.u. in some situations, which is not acceptable in practical operation. However, when the MACSAC algorithm is employed, all node voltages are within safety constraints.

Table 1 lists the economic indicators obtained under the two pricing strategies, LMMR and MMR. The total income is the daily average total amount paid by the local power market members to the local power market operation platform, and the total expenditure is the daily average total amount that the local power market operation platform needs to pay to the power grid.

Table 1. Comparison of economic indicators

As can be seen from table 1, under the MMR method there is a difference of 28.9 dollars between the total income and the total expenditure of the local power market operation platform, whereas no such difference exists under the LMMR method. In addition, the electric vehicle aggregators pay less under the LMMR method, because LMMR provides a price incentive when the trading decisions made by market members help to reduce network loss.
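The income and expenditure comparison can be reproduced in miniature with the following sketch. It is an illustration under assumptions, not the computation behind table 1: the function name, the tuple layout and the way network loss is added to the net import are hypothetical, and the numbers in the usage note are made up for the example.

```python
def platform_balance(members, grid_buy_price, grid_sell_price,
                     network_loss=0.0):
    """Return (income, expenditure, income - expenditure) of the platform.

    members: iterable of (net_energy, local_buy_price, local_sell_price),
             with net_energy > 0 for a purchase and < 0 for a sale.
    Assumption: the platform settles the members' net energy plus the
    network loss with the grid at the grid prices.
    """
    income = 0.0
    net = 0.0
    for energy, buy_price, sell_price in members:
        # Buyers pay the platform; sellers are paid by the platform.
        income += max(0.0, energy) * buy_price + min(0.0, energy) * sell_price
        net += energy
    net_import = net + network_loss
    expenditure = (max(0.0, net_import) * grid_buy_price
                   + min(0.0, net_import) * grid_sell_price)
    return income, expenditure, income - expenditure
```

For example, with one buyer of 20 units and one seller of 10 units at local prices 0.875 and 0.75, and grid prices 1.0 and 0.5, income and expenditure both equal 10 when losses are ignored. Adding one unit of network loss leaves the income unchanged but raises the expenditure to 11, which is the kind of gap that a loss apportionment price is meant to close.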

While the present invention has been described with reference to the accompanying drawings, it is not limited to the above embodiments, which are illustrative rather than restrictive. Those skilled in the art may make various modifications without departing from the spirit of the present invention, and such modifications fall within the protection scope of the claims of the present invention.

Claims

1. A demand side resource collaborative optimization scheduling method based on a local power market is characterized by comprising the following steps:

S1. Modeling three types of demand side resource aggregators, namely an electric vehicle charging station, a heating, ventilation and air conditioning aggregator and a micro-grid, so as to obtain a demand side resource aggregation model;

2. The demand side resource cooperative optimization scheduling method of claim 1, wherein the specific step of step S1 includes:

In formula (1), the power term represents the total power at time t of virtual energy storage i representing the electric vehicle charging station, bounded by an upper power limit and a lower power limit;

In formula (2), the power term is bounded by an upper power limit and a lower power limit, and the stored energy is bounded by an upper limit and a lower limit of electric quantity; α is the electric quantity attenuation rate of the virtual energy storage; in addition, the virtual energy storage representing the heating, ventilation and air conditioning aggregator also has a reference load parameter;

S1-3) The microgrid comprises a photovoltaic system, an energy storage system and a non-flexible load; wherein:

the photovoltaic system is modeled as:

the energy storage system is modeled as follows:

the inflexible load modeling is as follows:

In formulae (4) to (8), the quantities are: the predicted active power output of the photovoltaic system; the reactive power output of the photovoltaic inverter; the active power output of the photovoltaic system; the power factor σ; the output power of energy storage i at time t, with its upper limit and lower limit; the electric quantity of energy storage i at time t; the active power of the inflexible load and its predicted value; and the reactive power of the inflexible load and its predicted value.

3. The demand side resource collaborative optimization scheduling method according to claim 1, wherein in step S2, the local power market is formed by a demand side resource aggregator as a market member and a trading platform together; the local electricity market pricing strategy that meets the balance of revenue and expenditure requirements comprises: determining a basic electricity price; determining a network loss allocation price; determining local electricity purchasing price and local electricity selling price by combining the basic electricity price and the network loss allocation price; the local power market for guiding the demand side resources to run cooperatively is established based on a local power market pricing strategy, the local electricity purchasing price and the electricity selling price are determined in a self-adaptive mode according to the supply and demand balance condition on the local power market, and the demand side resource aggregator makes an optimal demand side resource scheduling decision according to the local electricity purchasing price and the electricity selling price, so that the demand side resources run cooperatively.

4. The demand side resource collaborative optimization scheduling method according to claim 1, wherein the specific content of step S2 is as follows:

S2-1) Determining the basic electricity price:

First, each market member determines its electricity purchasing quantity or electricity selling quantity on the local power market according to the scheduling decision for the demand side resources it manages:

In formula (9), the quantity term represents the electricity purchasing quantity or electricity selling quantity submitted by demand side resource aggregator i on the local power market at time t; $\Omega_{EV}$, $\Omega_{HVAC}$ and $\Omega_{MG}$ respectively represent the set of electric vehicle charging stations, the set of heating, ventilation and air conditioning aggregators and the set of micro-grids in the market;

the local power market then calculates the total demand, the total power generation and the net demand on the market:

In formula (10), the loss term represents the increment in network loss caused by local power market trading; Ω represents the set of all market members;

the basic electricity purchase price and the basic electricity selling price are calculated according to the definition of the intermediate market price:

In formula (12), the two grid price terms represent the time-of-use electricity price and the feed-in electricity price of the power grid, respectively;

S2-2) Determining the network loss apportionment price:

In formula (13), the sensitivity term is the network loss sensitivity coefficient corresponding to node i;

In formulae (14) and (15):

5. The demand side resource collaborative optimization scheduling method according to claim 1, wherein in step S3, the constructing basic elements of the multi-agent reinforcement learning method includes:

S3-2) Environment: the local power market and the power distribution network jointly form the environment that interacts with the agents;

the interaction process of the environment and the intelligent agent is as follows: after receiving the electricity purchasing quantity and the electricity selling quantity submitted by the demand side resource aggregator, the local electricity market clears the market by using the proposed local electricity market pricing strategy to obtain a local electricity purchasing price and an electricity selling price, and distributes the local electricity purchasing price and the electricity selling price to each demand side resource aggregator; meanwhile, the power distribution network operation platform feeds back the running state information of the power distribution network at the current moment, including node voltage and branch power, to each demand side resource aggregator;

for agent i representing the microgrid, the set of observations at time t is:

S3-4) action value: defining an action value of each agent according to the control variable of the equipment in the demand side resource aggregation model established in the step S1, wherein the action values of the agents form a scheduling decision of a demand side resource aggregator on the demand side resources managed by the demand side resource aggregator;

for agent i representing the microgrid, the set of action values is

the operators in formula (16) are defined as $[x]_+ = \max(0, x)$ and $[x]_- = \min(0, x)$;

in formula (17), $V_n$ is the voltage of node n;

6. The demand side resource collaborative optimization scheduling method of claim 1, wherein in step S4, an agent representing different demand side resource aggregators is trained by using a multi-agent reinforcement learning algorithm MACSAC, and the specific steps are as follows:

S4-3) Each market member schedules the demand side resources it manages according to the action value given by its agent, calculates its electricity purchasing and selling quantities and submits them to the local power market; the local power market clears according to the pricing mechanism provided in step S2 and feeds reward function values and price information back to the market members, and the power distribution network operation platform feeds cost function values back to the market members;

S4-7) Repeating steps S4-2) to S4-6) until the number of parameter updates reaches a set value, so as to obtain the trained agents; when demand side resource collaborative optimization scheduling is carried out, each agent receives the observation at the current moment and outputs an action value using its actor network, and the corresponding demand side resource aggregator schedules the demand side resources it manages according to the action value, so that distributed collaborative optimization scheduling of the demand side resources is realized.