CN112001752A

CN112001752A - Multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality

Info

Publication number: CN112001752A
Application number: CN202010832963.3A
Authority: CN
Inventors: 高红均; 张凡; 刘友波; 刘俊勇
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2020-08-18
Filing date: 2020-08-18
Publication date: 2020-11-27

Abstract

The invention discloses a method, based on limited rationality, for analyzing the dynamic game transaction behavior of multiple virtual power plants. First, each virtual power plant (VPP) bidding entity takes full account of the target requirements of its own development stage and studies the dynamic pricing behavior of the upper-layer operator. Second, the transaction behavior of the multiple virtual power plants is analyzed by modeling different transaction targets; a particle swarm algorithm is used for evolutionary learning on the limited transaction behavior information, so that each entity gradually optimizes its own target by learning its competitors' strategies. Finally, the dynamic game computation process of the multiple virtual power plants is studied, and a dynamic game particle swarm optimization algorithm combined with an optimization toolbox is proposed to solve the game model. The solution of the dynamic game model shows good convergence, and the method offers a new idea and reference for virtual power plants that formulate different transaction targets when participating in market transactions.

Description

Multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality

Technical Field

The invention belongs to the technical field of transaction behavior analysis for multiple virtual power plants, and particularly relates to a multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality.

Background

Under the development strategy of renewable energy and electric energy substitution, distributed power sources, energy storage and responsive loads are being widely deployed at the distribution network level, so that traditional large-scale power production and long-distance transmission are gradually shifting towards distributed energy. However, connecting these resources to the grid on a large scale poses a great challenge to the safe operation of the power grid. Scholars at home and abroad, starting from a single profit-seeking entity, have carried out much research on the generation scheduling and bidding modes of virtual power plants (VPPs). In recent years, however, the national grid development strategy has changed and allows more diverse social capital to enter the electricity market, so each VPP is likely to be operated by a different stakeholder. Driven by its own rationality, each stakeholder pursues the maximization of its own benefit, and the traditional single-entity optimal scheduling method is difficult to apply. Most existing research focuses on the game relationship between VPPs on the premise of a given transaction price, and the following problems remain: first, the influence of a VPP on the market trading price is not considered, and the VPP participates in market operation only as a price taker, which is essentially still optimization within a single VPP; second, the decision environment is over-idealized: most studies are game research under complete rationality, which requires not only that each entity be perfectly rational but also that the entities trust one another's rationality, and this is difficult to achieve in practice.
Meanwhile, most of the existing literature formulates an optimal bidding scheme with the traditional goal of maximizing profit or minimizing operating cost. In a future diversified market, however, each stakeholder has different development requirements at different development stages, and trading behavior aimed only at maximizing profit or minimizing operating cost cannot secure a bidding advantage in a fully competitive market.

Disclosure of Invention

The invention aims to provide a multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality, which is used to solve at least one of the following technical problems in the prior art: first, the influence of a VPP on the market trading price is not considered, and the VPP participates in market operation only as a price taker, which is essentially still optimization within a single VPP; second, the decision environment is over-idealized: most studies are game research under complete rationality, which requires not only that each entity be perfectly rational but also that the entities trust one another's rationality, and this is difficult to achieve in practice. Meanwhile, most of the existing literature formulates an optimal bidding scheme with the traditional goal of maximizing profit or minimizing operating cost; in a future diversified market, however, each stakeholder has different development requirements at different development stages, and trading behavior aimed only at maximizing profit or minimizing operating cost cannot secure a bidding advantage in a fully competitive market.

To achieve this purpose, the technical solution of the invention is as follows:

A multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality comprises the following steps:

S1: each virtual power plant bidding entity studies the dynamic pricing behavior of the upper-layer operator according to the target requirements of its own development stage;

S2: the transaction behavior of the multiple virtual power plants is analyzed by modeling different transaction targets;

S3: evolutionary learning is carried out on the limited-rationality transaction behavior information based on a particle swarm algorithm, and each entity gradually optimizes its own target by learning its competitors' strategies;

S4: the dynamic game computation process of the multiple virtual power plants is studied, and a dynamic game particle swarm optimization algorithm combined with an optimization toolbox is proposed to solve the game model.

Further, decision schemes with different targets are established for the multiple VPPs according to the influence that the different stages of VPP development and the decision behaviors of competitors have on each VPP's own optimal decision; taking the purchased and sold electricity quantities and the distributed generation output within a VPP as strategies, several VPP bidding decision schemes with different targets are designed: first, a VPP may select different transaction targets and decision schemes in view of the differences in users' electricity consumption in different areas and the different targets pursued by VPP alliances; second, during the game, each entity may change its own behavior rules and switch targets after taking into account the market transaction behavior of its competitors; during the game, information about competitors is continuously acquired through evolutionary learning; as the information an entity has acquired keeps growing, the strategy it formulates is gradually optimized; eventually, no entity can benefit by unilaterally changing its bidding strategy, i.e., a stable strategy is reached.

Further, the stable equilibrium strategy of multiple stakeholders under the limited rationality condition can be defined as follows: in a dynamic game with $N$ stakeholders, there is a stable strategy combination $X = (X_1, X_2, \ldots, X_N)$ such that, for any other strategy combination $Y = (Y_1, Y_2, \ldots, Y_N)$ with $Y \neq X$, there exist an entity $i$ and an $\varepsilon_0 \in (0,1)$ such that:

in the formula: $I(Y_i, S_{-i})$ and $I(X_i, S_{-i})$ are the target utilities of entity $i$ when it adopts strategy $Y_i$ and strategy $X_i$, respectively; $S_{-i}$ is the strategy combination of the entities in the system other than entity $i$.

Furthermore, the VPP game strategy covers the output plan of each distributed power source in the VPP, the charging and discharging power of the energy storage, the response power of the responsive load, and the output power of clean energy such as wind and photovoltaic power; based on the limited market information it currently holds and the historical bidding data of the other VPPs, the VPP estimates the system load demand and the possible bidding strategies of its competitors, so as to maximize its own utility.

Further, the different transaction targets in step S2 include:

the game objective of a conservative VPP is to maximize its own profit: it fully integrates its internal resources, accounting for the distributed generation cost, the energy storage cost and the responsive load compensation cost, and decides the optimal bidding strategy; the VPP bidding objective function is:

in the formula: $\lambda^{D-V,sell}$ and $\lambda^{D-V,buy}$ are, respectively, the electricity selling price and the electricity purchase price between the DSO and the VPP; the traded quantities are the electricity purchased from and sold to the DSO by the VPP; $N$ is the set of all VPPs, with VPP $i \in N$; $N_i$ is the set of DERs contained in the $i$-th VPP; the MT term is the generation cost of the micro gas turbine (MT) in a DER, where $a_j$, $b_j$, $c_j$ are the cost coefficients of the MT unit in DER $j$ and the power variable is the MT output power; the ESS term is the ESS operating cost, with its scheduling cost coefficient and the ESS charging and discharging power, where a value greater than 0 denotes discharging and a value less than 0 denotes charging; the responsive load term is the responsive load cost, with the responsive load compensation price and the response power of the responsive load; $\theta$ is the government subsidy price for new energy generation; the last two variables are the photovoltaic output and the wind power output, respectively;

the game objective of an aggressive VPP is to maximize its market share: on the premise that its internal resources are integrated and the output constraints of every unit are met, it bids the largest possible electricity quantity in the market, so as to obtain as much exchanged electricity as possible at the DSO level, i.e., the maximum traded electricity quantity; the VPP bidding objective function is:

bidding constraint:

in the formula: the variables are the power purchased from the DSO by the $i$-th VPP and the power sold to the DSO by the $i$-th VPP; the bounds are the maximum purchase quantity and the maximum sale quantity that VPP $i$ may bid, respectively;

after the DSO's unified purchase and sale prices and the transaction quantities are obtained, and on the premise that the aggressive VPP objective function is satisfied, the other schedulable resources in the VPP are coordinated to cover the remaining load shortfall, and the shortfall is dispatched according to the lowest-cost principle;

the game objective of a stable VPP is to maximize the utilization of renewable energy, where renewable energy means wind power and photovoltaic power; the uncertainty of renewable generation is fully considered, and the ESS is used to smooth the effect of renewable output fluctuations on the stability of the system's power supply; the outputs of the other units in the VPP and the electricity quantity bid to the upper-layer DSO are then considered, and the VPP bidding objective function is:

in the formula: $\alpha$ and $\beta$ are the weight coefficients of the objective function; $\rho_{j,t}$ is the selling price of the combined wind-photovoltaic-storage electricity; the power variable is the combined wind-photovoltaic-storage output; $F$ is the fluctuation variance of the combined wind-photovoltaic-storage output power; $f_0$ is the minimum fluctuation variance value allowed by the system.
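As a rough illustration of how the ESS can smooth the combined output, the following Python sketch computes a fluctuation variance for the wind-photovoltaic-storage output. The objective function itself is not reproduced in this text, so the exact definition of $F$ is unknown; the sketch assumes it is the population variance of the combined output over the scheduling periods. The function names and the sample numbers are hypothetical.

```python
from statistics import pvariance


def combined_output(p_wt, p_pv, p_ess):
    """Per-period wind + PV + ESS power (ESS > 0 means discharging, as in the text)."""
    return [w + s + e for w, s, e in zip(p_wt, p_pv, p_ess)]


def fluctuation_variance(p_wt, p_pv, p_ess):
    """Assumed reading of F: population variance of the combined output."""
    return pvariance(combined_output(p_wt, p_pv, p_ess))


# Hypothetical three-period example: PV rises while wind stays flat.
wind = [2.0, 2.0, 2.0]
pv = [1.0, 2.0, 3.0]
no_storage = fluctuation_variance(wind, pv, [0.0, 0.0, 0.0])
# The ESS discharges in the low-PV period and charges in the high-PV period.
with_storage = fluctuation_variance(wind, pv, [1.0, 0.0, -1.0])
```

In this toy case the storage schedule flattens the combined output completely, so the variance drops from 2/3 to 0; a real schedule would also have to respect the ESS power and capacity limits.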

Further, whichever transaction target the VPP pursues when making its decision response, the following constraints must be satisfied:

power balance constraint:

in the formula: the terms are the electricity consumption of the non-responsive load and the output power of VPP $i$;

gas turbine unit constraint:

in the formula: the bounds are the upper and lower limits of the MT output power; $r_{d,i}$ and $r_{u,i}$ are the ramp-down and ramp-up rates of unit $j$, respectively; $\Delta T$ is an operating period;

responsive load constraint:

in the formula: the bound is the maximum response quantity of the responsive load;

renewable energy output constraint:

in the formula: the bounds are the maximum wind and photovoltaic outputs predicted from historical data, respectively;

energy storage constraint:

in the formula: the 0-1 variables indicate the discharging and charging states of the ESS; $N_{ESS}$ is the upper limit on the number of transitions between the discharging and charging states of the ESS; the power bounds are the upper and lower limits of the discharging power and of the charging power; the capacity variable is the capacity state of the ESS in period $t$, with upper and lower capacity limits that account for ESS lifetime and other factors; the efficiency terms are the charging and discharging efficiency coefficients, respectively, the value range of which is

Further, the transaction at the DSO level is modeled as follows:

the DSO sets the electricity selling price $\lambda^{D-V,sell}$ and the electricity purchase price $\lambda^{D-V,buy}$ for the VPPs; the DSO takes the maximization of its own profit as its game objective, which covers the costs and revenues of electricity purchases and sales between the DSO and both the superior power grid and the multiple lower-level VPPs;

$\max u^{DSO} = F_{VPP} - F_{DSO}$

in the formula: $F_{VPP}$ is the revenue the DSO obtains from transactions with the VPPs; $F_{DSO}$ is the cost the DSO incurs in transactions with the superior power grid; the specific expressions of the terms are as follows:

in the formula: $T$ is the total scheduling horizon; $N_{VPP}$ is the number of VPPs; the price term is the real-time electricity price set by the DSO; the power term is the power exchanged between the $i$-th VPP and the DSO in the $t$-th scheduling period; the boundary price is the boundary node electricity price between the DSO and the upper power grid; the transmission term is the power transmitted between the DSO and the upper power grid, where a value greater than 0 indicates that electricity is purchased from the upper power grid and a value less than 0 indicates that electricity is sold to the upper power grid.
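The DSO objective can be illustrated with a short Python sketch. The expressions for $F_{VPP}$ and $F_{DSO}$ are not reproduced in this text, so the sketch assumes the simplest consistent reading: each term is a sum of price times power over the scheduling periods. The sign convention for VPP power (positive when the VPP buys from the DSO), the use of the selling price for purchases and the purchase price for sales, and all names and numbers are assumptions.

```python
def dso_utility(sell_price, buy_price, vpp_power, node_price, grid_power):
    """u_DSO = F_VPP - F_DSO under the assumptions stated above.

    vpp_power[t][i] > 0: VPP i buys from the DSO in period t (assumed sign).
    grid_power[t] > 0: the DSO buys from the upper grid (sign as in the text).
    """
    f_vpp = 0.0
    for t, row in enumerate(vpp_power):
        for p in row:
            # The DSO earns its selling price on sales and pays its
            # purchase price on electricity taken from a VPP (p < 0).
            price = sell_price[t] if p >= 0 else buy_price[t]
            f_vpp += price * p
    f_dso = sum(node_price[t] * grid_power[t] for t in range(len(grid_power)))
    return f_vpp - f_dso


# Hypothetical two-period, two-VPP example.
u = dso_utility(
    sell_price=[0.6, 0.6],
    buy_price=[0.4, 0.4],
    vpp_power=[[10.0, -5.0], [0.0, 5.0]],
    node_price=[0.5, 0.5],
    grid_power=[5.0, 5.0],
)
```

With these numbers the DSO collects 7.0 from the VPPs and pays 5.0 to the upper grid, giving a utility of 2.0.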

Further, after the competitors' bidding data are obtained, the competitor strategy judgment index coefficients are:

in the formula: $\xi_1$, $\xi_2$, $\xi_3$ are the index coefficients for judging a competitor's strategy: the competitor is judged to be "aggressive" when $\xi_1 \geq \xi_{lim1}$, "stable" when $\xi_2 \leq \xi_{lim2}$, and "conservative" when $\xi_3 \leq \xi_{lim3}$, where $\xi_{lim1}$, $\xi_{lim2}$, $\xi_{lim3}$ are the judgment constants of the index coefficients, obtained from the historical bidding data of each VPP; if, at the end of a game round, no index falls within its judgment constant limit, no strategy result is output and the next iteration round continues;

through evolutionary learning, each VPP determines its total benefit by selecting a game strategy; each strategy corresponds to a transaction behavior pattern, and $\rho_{mn}$ denotes the conditional switching state of the transaction behavior pattern $x(u)$ of VPP $i$ from strategy $m$ to strategy $n$; after obtaining its competitors' bidding information, VPP $i$ learns from and evolves against their strategies; if strategy $n$ has greater utility than strategy $m$, strategy $m$ learns from strategy $n$ and changes; this is expressed as:

in the formula: $x_m(u)$ and $x_n(u)$ are the transaction behavior patterns of VPP $i$ under strategies $m$ and $n$, respectively; when

is greater than 0, the VPP strategy is updated according to the conditional switching state, and when

is always less than 0, a stable equilibrium strategy is reached; $u^m$ and $u^n$ are the utilities of the transaction behavior patterns corresponding to strategies $m$ and $n$, respectively, and $[u^n - u^m]_+$ denotes $\max\{0, u^n - u^m\}$.
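The switching rule can be sketched in Python as follows. The expression for $\rho_{mn}$ is not reproduced in this text, so the sketch keeps only the part that is stated: the positive part $[u^n - u^m]_+$ decides whether a switch from strategy $m$ to strategy $n$ is worthwhile. Choosing the strategy with the largest positive gain is an assumption, and the strategy names and utilities are hypothetical.

```python
def positive_part(x):
    """[x]_+ = max{0, x}, as defined in the text."""
    return max(0.0, x)


def next_strategy(current, utilities):
    """Return the strategy to use in the next round.

    utilities maps each strategy name to the utility of its transaction
    behavior pattern. The VPP switches only if some other strategy has a
    strictly positive gain [u^n - u^m]_+; otherwise it keeps its strategy.
    """
    gains = {
        name: positive_part(u - utilities[current])
        for name, u in utilities.items()
        if name != current
    }
    if not gains:
        return current
    best = max(gains, key=gains.get)
    return best if gains[best] > 0 else current


# Hypothetical utilities for one VPP in one round.
utilities = {"conservative": 1.0, "aggressive": 1.5, "stable": 1.2}
switched = next_strategy("conservative", utilities)
kept = next_strategy("aggressive", utilities)
```

When every gain is zero for every VPP, no strategy changes any more, which corresponds to the stable equilibrium described above.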

Compared with the prior art, the invention has the beneficial effects that:

The innovations of the invention are as follows: first, each virtual power plant bidding entity takes full account of the target requirements of its own development stage and studies the dynamic pricing behavior of the upper-layer operator; second, the transaction behavior of the multiple virtual power plants is analyzed by modeling different transaction targets, a particle swarm algorithm is used for evolutionary learning on the limited transaction behavior information, and each entity gradually optimizes its own target by learning its competitors' strategies; finally, the dynamic game computation process of the multiple virtual power plants is studied, and a dynamic game particle swarm optimization algorithm combined with an optimization toolbox is proposed to solve the game model; the solution of the dynamic game model shows good convergence, and the method offers a new idea and reference for virtual power plants that formulate different transaction targets when participating in market transactions.

Drawings

Fig. 1 is a flow chart of the multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality.

Fig. 2 is a system structure diagram of multiple virtual power plants.

Fig. 3 is a schematic diagram of the different transaction targets of multiple virtual power plants.

Fig. 4 is a flow chart of the multi-virtual power plant dynamic game solving process.

Fig. 5 is a schematic diagram of the algorithm convergence process.

Fig. 6 is a schematic diagram of transaction electricity price optimization.

Fig. 7 is a schematic diagram of the transaction result of the virtual power plant 1.

Fig. 8 is a schematic diagram of the transaction result of the virtual power plant 2.

Fig. 9 is a schematic diagram of the transaction result of the virtual power plant 3.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to fig. 1 to 9 of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example:

A multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality is characterized by comprising the following steps:

each virtual power plant bidding entity takes full account of the target requirements of its own development stage and studies the dynamic pricing behavior of the upper-layer operator;

the transaction behavior of the multiple virtual power plants is analyzed by modeling different transaction targets;

evolutionary learning is carried out on the limited-rationality transaction behavior information based on a particle swarm algorithm, and each entity gradually optimizes its own target by learning its competitors' strategies;

the dynamic game computation process of the multiple virtual power plants is studied, and a dynamic game particle swarm optimization algorithm combined with an optimization toolbox is proposed to solve the game model.

Decision schemes with different targets are established for the multiple VPPs, considering the influence that the different stages of VPP development and the decision behaviors of competitors have on each VPP's own optimal decision. Taking the purchased and sold electricity quantities and the output of distributed power sources and other units within a VPP as strategies, several VPP bidding decision schemes with different targets are designed. First, a VPP may select different transaction targets and decision schemes in view of the differences in users' electricity consumption in different areas and the different targets pursued by VPP alliances. Second, during the game, each entity may change its own behavior rules and switch targets after taking into account the market transaction behavior of its competitors. During the game, information about competitors is continuously acquired through evolutionary learning, and as the information an entity has acquired keeps growing, the strategy it formulates is gradually optimized. Eventually, no entity can benefit by unilaterally changing its bidding strategy, i.e., a stable strategy is reached.

The stable equilibrium strategy of multiple stakeholders under the limited rationality condition can be defined as follows: in a dynamic game with $N$ stakeholders, there is a stable strategy combination $X = (X_1, X_2, \ldots, X_N)$ such that, for any other strategy combination $Y = (Y_1, Y_2, \ldots, Y_N)$ with $Y \neq X$, there exist an entity $i$ and an $\varepsilon_0 \in (0,1)$ such that:

The VPP game strategy covers the output plan of each distributed power source in the VPP, the charging and discharging power of the energy storage, the response power of the responsive load, and the output power of clean energy such as wind and photovoltaic power. Based on the limited market information it currently holds and the historical bidding data of the other VPPs, the VPP estimates the system load demand and the possible bidding strategies of its competitors, so as to maximize its own utility. It should be noted that "utility" here differs from the utility function of conventional optimal scheduling: it refers to formulating the optimal strategy according to the entity's own development.

The game objective of a conservative VPP is to maximize its own profit: it fully integrates its internal resources, accounting for costs such as distributed generation, energy storage and responsive load compensation, and decides the optimal bidding strategy. The VPP bidding objective function is:

in the formula, the terms include the MT output power; the ESS operating cost; the ESS scheduling cost coefficient; the responsive load cost; the responsive load compensation price; and the photovoltaic output and the wind power output, respectively. The entities considered here focus on operating profit, and the relatively low operation and maintenance costs of new energy generation are neglected.
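A minimal Python sketch of a conservative VPP's single-period profit is given below. The objective function is not reproduced in this text, so several points are assumptions: the MT cost is taken as the quadratic $a_j P^2 + b_j P + c_j$, the ESS cost as the scheduling cost coefficient times the absolute ESS power, the responsive load cost as compensation price times response power, and the subsidy as $\theta$ times the renewable output. All function and parameter names are hypothetical.

```python
def mt_cost(a, b, c, p_mt):
    """Assumed quadratic generation cost a*P^2 + b*P + c for one MT unit."""
    return a * p_mt ** 2 + b * p_mt + c


def conservative_profit(price_received, price_paid, q_sold, q_bought,
                        mt_units, ess_coef, p_ess, rl_price, p_rl,
                        theta, p_pv, p_wt):
    """Single-period profit of one VPP under the assumptions listed above.

    price_received: price the DSO pays the VPP for electricity it sells.
    price_paid: price the DSO charges the VPP for electricity it buys.
    mt_units: list of (a, b, c, p_mt) tuples, one per MT unit.
    """
    trade = price_received * q_sold - price_paid * q_bought
    c_mt = sum(mt_cost(a, b, c, p) for a, b, c, p in mt_units)
    c_ess = ess_coef * abs(p_ess)      # cost on throughput in either direction
    c_rl = rl_price * p_rl             # compensation paid to the responsive load
    subsidy = theta * (p_pv + p_wt)    # government subsidy on renewable output
    return trade - c_mt - c_ess - c_rl + subsidy


# Hypothetical numbers for one period.
profit = conservative_profit(
    price_received=0.4, price_paid=0.6, q_sold=10.0, q_bought=0.0,
    mt_units=[(0.01, 0.2, 1.0, 5.0)], ess_coef=0.05, p_ess=-4.0,
    rl_price=0.3, p_rl=2.0, theta=0.1, p_pv=3.0, p_wt=2.0,
)
```

Here the VPP earns 4.0 from sales and 0.5 in subsidy, and pays 2.25 for MT generation, 0.2 for ESS scheduling and 0.6 in load compensation, for a profit of 1.45.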

The game objective of an aggressive VPP is to maximize its market share: on the premise that its internal resources are integrated and the output constraints of every unit are met, it bids the largest possible electricity quantity in the market, so as to obtain as much exchanged electricity as possible at the DSO level, i.e., the maximum traded electricity quantity. The VPP bidding objective function is:

bidding constraint:

in the formula, the variables are the power purchased from the DSO by the $i$-th VPP and the power sold to the DSO by the $i$-th VPP, and the bounds are the maximum purchase quantity and the maximum sale quantity that VPP $i$ may bid, respectively.

After the DSO's unified purchase and sale prices and the transaction quantities are obtained, and on the premise that the objective function of formula (5) is satisfied, the other schedulable resources in the VPP are coordinated to cover the remaining load shortfall, and the shortfall is dispatched according to the lowest-cost principle.
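The lowest-cost principle for covering the remaining shortfall can be illustrated with a simple merit-order dispatch in Python. This is a sketch under a strong simplification: each schedulable resource is given a constant marginal cost and a capacity, whereas the MT cost in the model depends on output. The resource names, costs and capacities are hypothetical.

```python
def merit_order_dispatch(shortfall, resources):
    """Cover a shortfall from the cheapest resources first.

    resources: list of (name, marginal_cost, capacity) tuples.
    Returns (plan, unmet), where plan maps a resource name to its dispatched
    quantity and unmet is whatever could not be covered.
    """
    plan = {}
    remaining = shortfall
    for name, _cost, capacity in sorted(resources, key=lambda r: r[1]):
        if remaining <= 0:
            break
        take = min(capacity, remaining)
        if take > 0:
            plan[name] = take
            remaining -= take
    return plan, max(remaining, 0.0)


# Hypothetical shortfall of 7 units after the market transaction is fixed.
plan, unmet = merit_order_dispatch(
    7.0,
    [("MT", 0.5, 5.0), ("ESS", 0.2, 3.0), ("responsive load", 0.8, 4.0)],
)
```

The storage is used up first, the micro gas turbine covers the rest, and the more expensive responsive load is not called.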

The game objective of a stable VPP is to maximize the utilization of renewable energy, where renewable energy means wind power and photovoltaic power. The uncertainty of renewable generation is fully considered, and the ESS is used to smooth the effect of renewable output fluctuations on the stability of the system's power supply. The outputs of the other units in the VPP and the electricity quantity bid to the upper-layer DSO are then considered, and the VPP bidding objective function is as follows:

1) Power balance constraint:

in the formula, the terms are the electricity consumption of the non-responsive load and the output power of VPP $i$.

2) Gas turbine unit constraint:

in the formula: the bounds are the upper and lower limits of the MT output power; $r_{d,i}$ and $r_{u,i}$ are the ramp-down and ramp-up rates of unit $j$, respectively; $\Delta T$ is an operating period.

3) Responsive load constraint:

in the formula, the bound is the maximum response quantity of the responsive load.

4) Renewable energy output constraint:

in the formula, the bounds are the maximum wind and photovoltaic outputs predicted from historical data, respectively.

5) Energy storage constraint:

in the formula, the terms are the upper and lower limits of the discharging power, the upper and lower limits of the charging power, and the capacity state of the ESS in period $t$.
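Two of the constraints above can be illustrated with a short Python sketch: the MT output and ramp limits, and the update of the ESS capacity state between periods. The constraint formulas are not reproduced in this text, so the usual textbook forms are assumed: the output change between consecutive periods is bounded by the ramp rates times $\Delta T$, and the stored energy changes by the charging power times the charging efficiency, or by the discharging power divided by the discharging efficiency. Function and parameter names are hypothetical.

```python
def mt_output_feasible(p_prev, p_now, p_min, p_max, r_down, r_up, dt=1.0):
    """Check output limits and ramp limits between two consecutive periods."""
    within_limits = p_min <= p_now <= p_max
    within_ramp = -r_down * dt <= p_now - p_prev <= r_up * dt
    return within_limits and within_ramp


def ess_next_capacity(e_now, p_ess, eta_c, eta_d, dt=1.0):
    """Stored energy after one period.

    p_ess > 0 means discharging and p_ess < 0 means charging, following the
    sign convention of the text; eta_c and eta_d are efficiencies in (0, 1].
    """
    if p_ess >= 0:
        return e_now - p_ess * dt / eta_d   # discharging drains more than delivered
    return e_now + (-p_ess) * eta_c * dt    # charging stores less than drawn


def ess_capacity_feasible(e_next, e_min, e_max):
    """Capacity state must stay inside its lower and upper limits."""
    return e_min <= e_next <= e_max


# Hypothetical values: charge 2 units for one period at 90 % efficiency.
e_after_charge = ess_next_capacity(10.0, -2.0, eta_c=0.9, eta_d=0.9)
```

A candidate VPP strategy would be rejected, or repaired, whenever one of these checks fails for any unit or period.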

The DSO sets the electricity selling price $\lambda^{D-V,sell}$ and the electricity purchase price $\lambda^{D-V,buy}$ for the VPPs. The DSO takes the maximization of its own profit as its game objective, which covers the costs and revenues of electricity purchases and sales between the DSO and both the upper-level power grid and the multiple lower-level VPPs.

$\max u^{DSO} = F_{VPP} - F_{DSO}$

In the formula: $F_{VPP}$ is the revenue the DSO obtains from transactions with the VPPs; $F_{DSO}$ is the cost the DSO incurs in transactions with the upper-level power grid. The specific expressions of the terms are as follows:

in the formula, the terms include the real-time electricity price set by the DSO and the boundary node electricity price between the DSO and the upper power grid.

After the competitors' bidding data are obtained, the competitor strategy judgment index coefficients are:

in the formula: $\xi_1$, $\xi_2$, $\xi_3$ are the index coefficients for judging a competitor's strategy: the competitor is judged to be "aggressive" when $\xi_1 \geq \xi_{lim1}$, "stable" when $\xi_2 \leq \xi_{lim2}$, and "conservative" when $\xi_3 \leq \xi_{lim3}$, where $\xi_{lim1}$, $\xi_{lim2}$, $\xi_{lim3}$ are the judgment constants of the index coefficients, obtained from the historical bidding data of each VPP. If, at the end of a game round, no index falls within its judgment constant limit, no strategy result is output and the next iteration round continues.
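The threshold test can be written as a small Python function. The formulas that compute the three index coefficients from bidding data are not reproduced in this text, so the sketch takes the coefficients as given inputs. The order in which the three conditions are checked when more than one holds is an assumption (the order in which the text lists them), and the numeric thresholds are hypothetical.

```python
def classify_competitor(xi, xi_lim):
    """Judge a competitor's strategy from its three index coefficients.

    xi = (xi_1, xi_2, xi_3); xi_lim = (xi_lim1, xi_lim2, xi_lim3).
    Returns None when no condition holds, in which case no strategy result
    is output and the next game round continues.
    """
    xi_1, xi_2, xi_3 = xi
    lim_1, lim_2, lim_3 = xi_lim
    if xi_1 >= lim_1:
        return "aggressive"
    if xi_2 <= lim_2:
        return "stable"
    if xi_3 <= lim_3:
        return "conservative"
    return None


# Hypothetical judgment constants, e.g. fitted from historical bids.
limits = (0.8, 0.2, 0.1)
label = classify_competitor((0.9, 0.5, 0.5), limits)
```

Returning `None` keeps the undecided case explicit, so the caller can simply carry on to the next round.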

Through evolutionary learning, each VPP determines its total benefit by selecting a game strategy. Each strategy corresponds to a transaction behavior pattern, and $\rho_{mn}$ denotes the conditional switching state of the transaction behavior pattern $x(u)$ of VPP $i$ from strategy $m$ to strategy $n$. After obtaining its competitors' bidding information, VPP $i$ learns from and evolves against their strategies. If strategy $n$ has a higher utility than strategy $m$, strategy $m$ learns from strategy $n$ and changes. This is expressed as:

Referring to Fig. 1, Fig. 1 is a flow chart of the multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality; the flow chart includes steps 101 to 104.

In step 101, each virtual power plant bidding entity takes full account of the target requirements of its own development stage and studies the dynamic pricing behavior of the upper-layer operator.

In step 102, the transaction behavior of the multiple virtual power plants is analyzed by modeling different transaction targets.

In one embodiment of the invention, each VPP in the alliance may experience different electricity purchase and sale states over the whole operation of the power distribution system:

there are three types: electricity surplus (electricity-selling VPP), electricity shortage (electricity-purchasing VPP) and electricity self-sufficiency. Therefore, there is demand for electricity trading within a multi-VPP system.

On the premise of meeting its internal power supply, the VPP comprehensively considers the output of each unit, the renewable generation, the electricity price information, the strategy information of other operators and so on, coordinates its internal resources, sets the target that best suits its own development, and plays the game with other competitors subject to all constraints.

The DSO-layer dispatch center comprehensively considers the electricity quantity information uploaded by each lower-layer VPP, aims at maximizing social benefit, and sets the unified market purchase and sale prices and the transaction quantity of each VPP; each VPP then reasonably adjusts the output of its internal resources according to the prices and the transaction quantity set by the DSO, and settles its transaction quantity with the DSO.

In step 103, evolutionary learning is carried out on the limited-rationality transaction behavior information based on the particle swarm algorithm, and each entity gradually optimizes its own target by learning its competitors' strategies.

After the game starts, the participants learn and improve according to the strategies that all sides adopt in each round; the game is a continuous, repeated transaction process.

Each VPP in the market participates in the game as an individual and formulates a strategy that maximizes its own utility objective while taking the decision information of the other competitors into account. After obtaining the competitors' bidding data, VPP $i$ solves for its own optimal decision; repeating this for every $i \in N$ yields the first-round game strategy of each entity.

Each entity then updates its strategy from the previous round, until no entity can benefit by unilaterally changing its bidding strategy and the final stable equilibrium state is reached.
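The round-by-round update can be sketched in Python as a plain best-response iteration over a finite strategy set. This is a simplified stand-in, not the method's evolutionary learning model: every entity is assumed to know the others' current strategies exactly, and the utility used in the example is a toy quantity-bidding payoff chosen only so that the loop can run. All names and numbers are hypothetical.

```python
def iterate_to_equilibrium(strategy_sets, utility, start, max_rounds=100):
    """Update each player's strategy in turn until nobody can gain unilaterally.

    strategy_sets[i] lists the candidate strategies of player i.
    utility(i, profile) returns the utility of player i for a strategy profile.
    Returns (profile, rounds_used).
    """
    profile = list(start)
    for round_no in range(1, max_rounds + 1):
        changed = False
        for i, options in enumerate(strategy_sets):
            best, best_u = profile[i], utility(i, profile)
            for s in options:
                trial = profile[:i] + [s] + profile[i + 1:]
                u = utility(i, trial)
                if u > best_u:            # switch only on a strict improvement
                    best, best_u = s, u
            if best != profile[i]:
                profile[i] = best
                changed = True
        if not changed:                   # no unilateral improvement is left
            return profile, round_no
    return profile, max_rounds


def toy_utility(i, quantities):
    """Toy payoff: the price falls with the total bid; unit cost is 2."""
    price = 10.0 - sum(quantities)
    return quantities[i] * price - 2.0 * quantities[i]


profile, rounds = iterate_to_equilibrium(
    [[1, 2, 3, 4], [1, 2, 3, 4]], toy_utility, start=[1, 1]
)
```

In this toy game the loop stops in the second round with bids of 3 and 2, at which point neither player can raise its payoff by changing its own bid alone.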

In step 104, the dynamic game computation process of the multiple virtual power plants is studied, and a dynamic game particle swarm optimization algorithm combined with an optimization toolbox is proposed to solve the game model.

In one embodiment of the invention, Fig. 4 illustrates the solving process of the multi-virtual power plant dynamic game.

In the model solution, the DSO gives the electricity price. During the game, each VPP learns from the continuously acquired electricity price information and from the strategy information formulated by its competitors, and selects the optimal solution that best suits its own transaction target.

In the DSO layer, an electricity price strategy is treated as a particle, the revenue under that price is treated as the fitness of the particle, and an intelligent optimization algorithm searches for the optimal price strategy. The whole solving process can be regarded as two parts: DSO price optimization and multi-VPP strategy learning. An improved particle swarm algorithm is combined with the YALMIP platform, which calls the CPLEX solver, to solve the dynamic game evolutionary learning model of multiple VPPs with different transaction targets.
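The price-as-particle idea can be sketched with a textbook particle swarm optimizer in Python. This is not the improved algorithm of the method, whose details are not given here, and the lower-level YALMIP/CPLEX solve is replaced by a hypothetical one-line demand response so that the example is self-contained. The evenly spaced initial swarm, the parameter values and the revenue function are all assumptions.

```python
import random


def toy_dso_revenue(price):
    """Hypothetical stand-in for the VPP response: demand falls as price rises."""
    demand = max(0.0, 100.0 - 80.0 * price)
    return (price - 0.3) * demand          # 0.3 is an assumed unit supply cost


def pso_price_search(fitness, lo, hi, n_particles=10, n_iter=60,
                     w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain 1-D PSO: each particle is a price, its fitness is the DSO revenue."""
    rng = random.Random(seed)
    step = (hi - lo) / (n_particles - 1)
    pos = [lo + k * step for k in range(n_particles)]   # evenly spaced start
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[k] = (w * vel[k]
                      + c1 * r1 * (pbest[k] - pos[k])
                      + c2 * r2 * (gbest - pos[k]))
            pos[k] = min(hi, max(lo, pos[k] + vel[k]))   # keep price in bounds
            f = fitness(pos[k])
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k], f
    return gbest, gbest_fit


best_price, best_revenue = pso_price_search(toy_dso_revenue, 0.3, 1.2)
```

In the full method, evaluating the fitness of one price particle would itself require solving the multi-VPP game at that price, which is why the price search and the VPP strategy learning are treated as two nested parts.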

Referring to FIG. 2, FIG. 2 is a system configuration diagram of the multiple virtual power plants. A VPP mainly comprises a micro gas turbine set, a wind turbine set, a photovoltaic system, an energy storage system, responsive loads and the like. The micro gas turbine set has a stable power output, responds quickly and can flexibly meet the system's needs as the load changes, so it serves as the schedulable distributed power supply inside the VPP.

Externally, the VPP participates as a single entity in DSO dispatch and in the electricity market; internally, it mainly coordinates the operation and control of its multiple DERs and negotiates with multiple parties such as user loads and equipment maintenance managers. Through the energy management platform, the VPP regulates its internal resources and exchanges energy and information with the upper-level DSO.

The energy management platform provides day-ahead prediction, decision optimization, real-time monitoring and similar functions. It stores and processes, in a database, the historical electricity consumption of the internal loads, the output of each power source, user information, weather data and upper-level DSO price information, so that the VPP can make more accurate decisions.

Referring to FIG. 3, first, because electricity consumption differs among users in different regions and the goals pursued by VPP alliances differ, a VPP may select different transaction targets and decision schemes; second, during the game, each participant may change its own behavior rules and switch targets after taking into account the market transaction behavior of the other competitors. Objective functions of different transaction target types are therefore formulated for the VPPs.

In one embodiment of the present invention, the internal constituent elements of each VPP are shown in Table 1. The corresponding cost functions, constraint conditions and initial objective functions are selected to form a limited-rationality game model containing 3 VPPs. In this example system, after the DSO receives the bidding strategies submitted by the VPP operators, it determines the market trading electricity prices so as to maximize social benefit. After 19 rounds of the game, the strategy selected by each VPP stabilizes through learning, and the revenue obtained stabilizes as well. No VPP can obtain more benefit by unilaterally changing its own strategy; at this point, the strategy target of each participant has reached the optimal equilibrium solution.

TABLE 1 VPP internal resources

In one embodiment of the invention, the following two operating strategies are set:

Case 1: The grid electricity price and the feed-in electricity price are used as the purchase and sale prices; the VPPs play a non-cooperative game and optimize independently, each selecting its corresponding target strategy.

Case 2: The VPPs learn the competitors' bidding strategies through evolutionary learning to improve their degree of rationality, and the electricity price is optimized at the same time; that is, the game model proposed by the invention is adopted.

As can be seen from the simulation results in FIG. 5, the algorithm of the invention solves for the dynamic game equilibrium of multiple virtual power plants with different transaction targets effectively and convergently, with good solution accuracy and computational efficiency.

The purchase and sale electricity prices set by the DSO in the two scenarios are shown in FIG. 6. At each moment a VPP is in one of three states: if the transaction quantity is greater than 0, the VPP sells electricity to the DSO and is an "electricity-selling VPP"; if the transaction quantity is less than 0, the VPP purchases electricity from the DSO and is an "electricity-purchasing VPP"; if the transaction quantity is equal to 0, supply and demand within the VPP are balanced and it is a "self-balancing VPP" (in FIG. 8, for ease of plotting, a positive transaction quantity denotes electricity purchasing and a negative one denotes electricity selling). If a change in the other competitors' strategies changes the purchase and sale prices, the VPP's own purchase and sale strategy may change as a result.
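The three-state rule above can be written as a small helper. This is only a sketch: the function name `classify_vpp_state` and the numerical tolerance `tol` are assumptions added for the example, and the sign convention is the one used in the text (positive means selling to the DSO), not the plotting convention of FIG. 8.

```python
def classify_vpp_state(transaction_quantity, tol=1e-9):
    """Map a VPP's transaction quantity with the DSO to one of three states.

    tol is an assumed tolerance so that tiny numerical residues from a
    solver still count as a balanced VPP.
    """
    if transaction_quantity > tol:
        return "electricity-selling VPP"
    if transaction_quantity < -tol:
        return "electricity-purchasing VPP"
    return "self-balancing VPP"


if __name__ == "__main__":
    for quantity in (12.5, -3.0, 0.0):
        print(quantity, classify_vpp_state(quantity))
```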

FIGS. 7-9 show the results for the three VPPs. VPP1 adopts the "conservative" type and aims to maximize its own revenue; its supply exceeds its demand in periods 4-19, so it sells electricity to the upper-level operator, while in the remaining periods it has a shortfall and must purchase electricity from the upper-level operator.

VPP2 adopts the "aggressive" type, maximizing its electricity purchases from the upper-level operator in order to expand the operating scale of the VPP; except in periods 8-10 and 16, it purchases electricity from the upper-level operator in every period. Compared with VPP1 and VPP3, VPP2 has the greatest demand during peak load periods. Meanwhile, through RL regulation, it sets the load increases and reductions most favorable to its own scale expansion, increasing its share of the electricity traded with the upper-level operator.

VPP3 adopts the "stable" type, maximizing renewable generation while ensuring the stability of its power supply, and therefore establishes a combined wind-photovoltaic-storage generation unit. As can be seen from FIG. 9, after the output of the combined unit is scheduled, the energy storage unit smooths the fluctuations of the renewable output and the stability of the resulting power curve is greatly improved. In periods 15-18, because of the capacity of the energy storage unit and its per-period charge and discharge power limits, a certain amount of wind and photovoltaic curtailment occurs in order to keep the fluctuation variance within its limit. In periods 9-12 and 16-18, electricity must be purchased from the upper-level operator to maintain the power balance because of the VPP's own shortfall.

Table 2 compares the revenue of the DSO under the Case 1 and Case 2 electricity pricing strategies, and Table 3 compares the revenue of each VPP under the Case 1 and Case 2 electricity pricing strategies (unit: dollars).

TABLE 2 Revenue comparison of the DSO under different electricity price strategies

TABLE 3 Revenue comparison of each VPP under different electricity price strategies

The comparison shows that, after the DSO optimizes the electricity price, the electricity purchase and sale quantities of each VPP increase and become more balanced. Compared with trading at the grid price, the trading revenue between the DSO and the VPPs improves by 7.8%. When operating at the game-optimal target, the VPPs can reduce their operating costs and reach a new transaction equilibrium.

A dynamic game model of multiple virtual power plants with different transaction targets under limited rationality has been established. Example analysis on a multi-virtual power plant system integrating different distributed energy sources leads to the following conclusions: 1) compared with the traditional single objective function of minimum operating cost, the method can effectively meet the diversified development needs of each virtual power plant at different stages of its future development; 2) the established limited-rationality transaction behavior learning allows a VPP to raise its degree of rationality by integrating its internal resources and learning the competitors' bidding strategies, which in turn has a greater influence on the bidding results and enables each VPP to gain greater utility from trading in market operation; 3) the equilibrium solution is obtained by combining the intelligent algorithm with the lower-layer mixed integer programming problem, which, compared with using a traditional intelligent algorithm alone, greatly reduces the solution time and converges quickly.

The above are preferred embodiments of the present invention. All changes made according to the technical solution of the present invention whose functional effects do not go beyond the scope of that technical solution fall within the protection scope of the present invention.

Claims

1. A multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality is characterized by comprising the following steps:

2. The limited rationality-based multi-virtual power plant dynamic game transaction behavior analysis method is characterized in that multi-VPP decision schemes with different targets are established according to the different stages of VPP development and the influence of competitors' decision behaviors on a VPP's own optimization decisions; taking the electricity purchase and sale quantities and the distributed generation output within a VPP as strategies, a plurality of VPP bidding decision schemes with different targets are designed, wherein, first, a VPP can select different transaction targets and decision schemes in view of the differences in electricity consumption among users in different regions and the different goals pursued by VPP alliances; second, during the game, each participant can change its own behavior rules and switch targets after taking into account the market transaction behaviors of the other competitors; during the game, information on the competitors is continuously acquired through evolutionary learning; as the information acquired increases, the strategies formulated are gradually optimized; eventually, no participant can benefit from unilaterally changing its bidding strategy, i.e., a stable strategy is reached.

3. The limited rationality-based multi-virtual power plant dynamic game transaction behavior analysis method according to claim 2, wherein the stable equilibrium strategy of multiple stakeholders under limited rationality is defined as follows: in a dynamic game played by $N$ stakeholders, there exists a stable strategy combination $X = (X_1, X_2, \ldots, X_N)$ such that, for any other strategy combination $Y = (Y_1, Y_2, \ldots, Y_N)$ with $Y \neq X$, there exist a participant $i$ and $\varepsilon_0 \in (0,1)$ such that:

4. The limited rationality-based multi-virtual power plant dynamic game transaction behavior analysis method according to claim 3, wherein the game strategy of a VPP considers the output plan of each distributed power supply in the VPP, the charging and discharging power of the energy storage, the response power of the responsive load, and the output power of clean energy such as wind and photovoltaic power; based on the limited market information currently available and the historical bidding data of the other VPPs, the VPP estimates the system load demand and the competitors' possible bidding strategies so as to maximize its own utility.

5. The limited rationality-based multi-virtual power plant dynamic game transaction behavior analysis method according to claim 1, wherein the different transaction objectives in step S2 include:

in the formula, the symbols denote, in order: the output power of the MT; the operating cost of the ESS; the scheduling cost coefficient of the ESS; the responsive load cost; the compensation electricity price for the responsive load; and the photovoltaic power output and the wind power output, respectively;

bidding constraint:

in the formula, the symbols denote, respectively, the power purchased from the DSO by the i-th VPP and the power sold to the DSO by the i-th VPP;

6. The limited rationality-based multi-virtual power plant dynamic game transaction behavior analysis method according to claim 5, wherein when the VPP performs decision response, no matter which transaction target is executed, the constraint conditions required to be met include:

power balance constraint:

in the formula, the symbols denote, respectively, the electricity consumption of the non-responsive load and the output power of VPP $i$;

gas turbine set constraint:

in the formula, the first two symbols denote the upper and lower limits of the MT output power, respectively; $r_{d,i}$ and $r_{u,i}$ are the ramp-down rate and the ramp-up rate of unit j, respectively; $\Delta T$ is one operating period;

responsive load constraint:

in the formula, the symbol denotes the maximum response quantity of the responsive load;

renewable energy output constraint:

in the formula:

energy storage constraint:

in the formula, the symbols denote, in order: the upper and lower limits of the discharge power; the upper and lower limits of the charging power; and the capacity state of the ESS in period t;

7. The limited rationality-based multi-virtual power plant dynamic gaming transaction behavior analysis method of claim 6, wherein the transaction is considered at the DSO level as:

$\max u^{DSO} = F_{VPP} - F_{DSO}$

in the formula: $F_{VPP}$ is the revenue the DSO obtains from its transactions with the VPPs; $F_{DSO}$ is the cost the DSO incurs in trading with the upper-level grid; the specific expressions of the terms are as follows:

where the symbols denote, respectively, the real-time electricity price set by the DSO and the power exchanged between the i-th VPP and the DSO in the t-th scheduling period; $\lambda_t^{DSO}$ is the price at the boundary node between the DSO and the upper-level grid; $P_t^{DSO}$ is the power transmitted between the DSO and the upper-level grid, where a value greater than 0 indicates that power is purchased from the upper-level grid and a value less than 0 indicates that power is sold to the upper-level grid.

8. The limited rationality-based multi-virtual power plant dynamic game transaction behavior analysis method according to claim 7, wherein after competitor bidding data is obtained, the competitor strategy judgment index coefficients are as follows:

through evolutionary learning, each VPP determines its total benefit by selecting a game strategy; each strategy corresponds to a transaction behavior mode under that strategy, and $\rho_{mn}$ denotes the conditional switching state of the transaction behavior mode $x(u)$ of VPP $i$ from strategy m to strategy n; after obtaining the competitors' bidding information, VPP $i$ learns from and evolves toward the competitors' strategies; if strategy n has a higher utility than strategy m, a participant using strategy m will learn strategy n and change accordingly; this is expressed as: