CN110048461B

CN110048461B - Multi-virtual power plant decentralized self-discipline optimization method

Info

Publication number: CN110048461B
Application number: CN201910409906.1A
Authority: CN
Inventors: 赵瑞锋; 王彬; 郭文鑫; 卢建刚; 刘文涛; 李世明; 李波; 徐展强
Original assignee: Guangdong Power Grid Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Current assignee: Guangdong Power Grid Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date: 2019-05-16
Filing date: 2019-05-16
Publication date: 2021-07-02
Anticipated expiration: 2039-05-16
Also published as: CN110048461A

Abstract

The application discloses many virtual power plant dispersion autonomous optimization method includes: s1, initializing parameters; s2, classifying the tasks and forming an initial knowledge matrix; s3, acquiring information; s4, determining the action of the optimizing individual; s5, calculating objective function values of all agents; s6, calculating a reward function; s7, updating the knowledge matrix; s8, information feedback: each agent returns the current optimal solution to the information center; s9, judging whether the maximum iteration times are reached, and if so, outputting an optimal knowledge matrix of the corresponding task; otherwise, return to S3. The application discloses a distributed autonomous optimization method for multiple virtual power plants, which solves the technical problems that the existing distribution network regulation and control is difficult to meet the condition that multiple virtual power plants participate in the power market in real time for profit, and the grid-connected behavior of distributed equipment is effectively controlled to support the safe and effective operation of the distribution network.

Description

Multi-virtual power plant decentralized self-discipline optimization method

Technical Field

The application relates to the technical field of distribution network scheduling, in particular to a distributed autonomous optimization method for multiple virtual power plants.

Background

With the development of energy utilization technology, a large number of distributed power supplies are connected to a distribution network, and the number of distributed energy storage systems and controllable loads is increased day by day. On one hand, due to the huge amount, the influence of the comprehensive effect on the power grid cannot be ignored; on the other hand, the huge number of the devices makes it impossible for the power grid to directly and independently schedule each distributed device, and meanwhile, the influence of a single device on the power grid is small, and the direct control significance of the power grid on the single device is not large. And the electric power market construction is promoted orderly. Due to the characteristics of small capacity, distributed layout and high output randomness, the massive distributed power supply mainly using clean energy is difficult to participate in electric power market competition as an independent individual in fact. This will affect the enthusiasm of the construction of the distributed renewable energy sources, and the market effect will not be played the best. A virtual power plant is an ideal choice to solve the above problems. The virtual power plant technology is that a controllable load, a distributed power supply and an energy storage system are organically combined through a virtual power plant control center, and the controllable load, the distributed power supply and the energy storage system are enabled to participate in operation in the power grid and the power market in the identity of an equivalent power plant.

However, renewable energy sources have large uncertainty, and the existing distribution network regulation and control is difficult to meet the demand that a plurality of virtual power plants participate in the power market in real time and make profit by profit, and effectively control the grid-connected behavior of distributed equipment to support the safe and effective operation of a distribution network.

Disclosure of Invention

The application provides a decentralized self-discipline optimization method for multiple virtual power plants, and solves the technical problems that the existing distribution network regulation and control is difficult to meet the condition that multiple virtual power plants participate in the power market in real time for profit, and the grid-connected behavior of distributed equipment is effectively controlled to support the safe and effective operation of the distribution network.

In view of this, the present application provides a decentralized autonomous optimization method for multiple virtual power plants, including:

s1, parameter initialization: presetting a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimizing individual number | J_hL, penalty factor η, and reward cooperation constant W₀；

S2, classifying the tasks and forming an initial knowledge matrix: if the current task is a source task, an initial knowledge matrix of the current task is formed randomly; if the current task is a new task, screening out a similar source task with the maximum similarity to the current task, and calculating an initial memory matrix of the current task according to the optimal memory matrix of the similar source task;

s3, information acquisition: the method comprises the following steps that the intelligent agents obtain current decision variables of other intelligent agents from an information center, one intelligent agent corresponds to a virtual power plant, and one intelligent agent comprises a plurality of optimizing individuals; the decision variables comprise the output power of micro fuel power generation equipment in the virtual power plant at the moment t, the output power of fan equipment at the moment t, the adjustable power of a controllable load at the moment t, the charge and discharge power of an energy storage system at the moment t and the electricity purchasing cost of multilateral transactions;

s4, determining the action of the optimizing individual: determining action values corresponding to the optimizing individuals according to the state-action chain;

s5, calculating the objective function value of each agent:

calculating the corresponding benefit of the intelligent agent through a preset benefit formula according to the acquired decision variable;

the preset benefit formula is as follows:

wherein the control variable vector x is a decision variable inside the virtual power plant h; s is a wind power and photovoltaic output scene obtained based on historical data; pi(s) is the probability of s scene occurrence;

as virtual electricityThe power purchasing and selling income of the factory i in the real-time market and the large power grid,

generating electricity purchasing and selling benefits for the virtual power plant i through the multilateral contract and other virtual power plants in electricity quantity transaction;

the power generation cost of the virtual power plant i;

network loss cost of the virtual power plant i;

the switching cost of the virtual power plant i; n is_sIs the total number of virtual power plants; n is_TThe total time of the preset time;

converting the calculated benefits into objective function values corresponding to the intelligent agents;

s6, calculating a reward function:

calculating the reward corresponding to the intelligent agent through a preset reward formula according to the objective function value obtained through conversion;

the preset reward formula is as follows:

F_i ^Bestrepresenting the minimum value of an objective function of the optimal individual in the population in the kth iteration of the ith agent; f_i ^kjRepresenting the objective function in the kth iteration of the ith agent; p is a radical of_mIs a positive multiple; c. C_fRepresents a correction factor for ensuring that the reward function is positive;

a set of state-action pairs representing the optimal individual in the kth iteration of the ith agent; superscript k denotes the kth iteration, superscript j denotes the jth individual, subscript i denotes the ith agent, and subscript m denotes the mth decision variable; r(s)^k，s^k+1，a^k) In an action a^kSlave state when occurring s^kTransition to state s^k+1The reward function of (2); (s, a) represents a state-action pair;

s7, updating the knowledge matrix:

updating the knowledge matrix through a preset formula group; the preset formula group is as follows:

selecting action a in the knowledge matrix representing the kth iteration_imArrival state

The corresponding knowledge value of the current knowledge value,

selecting an action in the k-th knowledge matrix

Arrival state

The corresponding knowledge value of the current knowledge value,

is a knowledge increment; j is the population size in one iteration; a is_imThe action is a preset optional action; m is the total number of decision variables;

representing the action value searched by the jth individual in the ith agent in the kth iteration for the mth decision variable;

representing the state of the jth individual in the ith agent in the kth iteration with respect to the mth decision variable; alpha is a learning factor; gamma is a discount factor; a. the_imIs a preset optional action a_imA set of (a); s8, information feedback: each agent returns the current optimal solution to the information center;

s9, judging whether the maximum iteration times are reached, and if so, outputting an optimal knowledge matrix of the corresponding task; otherwise, returning to the S3.

Preferably, the S4 specifically includes:

the action value corresponding to each optimizing individual is determined through a preset state-action chain formula;

the preset state-action chain formula is as follows:

wherein q is₀A random number between 0 and 1; epsilon is the probability of adopting a greedy optimization method strategy; a is_randIs the probability of adopting a random optimization strategy;

representing the action value found by the jth individual in the ith agent for the mth decision variable in the kth iteration.

Preferably, in S2, if the current task is a new task, screening out a similar source task with the largest similarity to the current task, and calculating the initial memory matrix of the current task according to the optimal memory matrix of the similar source task specifically includes:

if the current task is a new task, screening out a first similar source task and a second similar source task which have the maximum similarity with the current task;

calculating a first migration weight of the first similar source task relative to the current task and a second migration weight of the second similar source task relative to the current task through a migration weight calculation formula;

the migration weight calculation formula is as follows:

r_epsimilarity of the first similar source task and the current task is obtained; r is_eqSimilarity of the second similar source task and the current task is obtained; t is t_epIs the first migration weight; t is t_eqIs the second migration weight;

calculating an initial memory matrix of the current task through a preset memory matrix calculation formula according to the first migration weight and the second migration weight;

the preset memory matrix calculation formula is as follows:

an initial memory matrix for the current task with respect to a variable i;

an optimal memory matrix for the first similar source task with respect to a variable i;

an optimal memory matrix for the second similar source task with respect to variable i.

Preferably, the S8 specifically includes:

and each intelligent agent sends the current optimal solution to the information center to replace the decision variables corresponding to the intelligent agent in the archive set of the information center after the last iteration.

Preferably, before the S2, the method further includes:

acquiring virtual power plant parameters, scene parameters, electricity price parameters and distribution network operation parameters;

the virtual power plant parameters comprise wind power output, photovoltaic output, energy storage equipment capacity and electric storage quantity;

the scene parameters comprise wind speed and load size.

According to the technical scheme, the method has the following advantages:

the application provides a multi-virtual power plant decentralized self-discipline optimization method which comprises the steps that (1) the dimension of a memory matrix is effectively reduced by adopting a mutual-connection state-action chain, and a dimension disaster is avoided; meanwhile, a plurality of individuals are organized by means of the group intelligent technology to carry out autonomous optimization of the intelligent agent, and a memory matrix is introduced to enable the optimizing individuals to have memory self-learning capacity, so that the optimizing rate is greatly improved; (2) the memory migration based on the similarity can effectively avoid negative migration, so that the new task can be subjected to online learning on the basis of the learning and memory of the source task, and the optimization efficiency is obviously improved; (3) by establishing an information center in a multi-agent system, complete information dynamic non-cooperative game among a plurality of agents is realized, and the finally obtained Nash equilibrium solution enables the income of all the agents to be as large as possible. When the method is used for solving the decentralized self-discipline optimization of the multiple virtual power plants, a scheduling scheme which enables the net benefits of all the virtual power plants to be as large as possible can be found.

Drawings

Fig. 1 is a flowchart of an implementation manner of the distributed autonomous optimization method for multiple virtual power plants provided by the present application.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, fig. 1 is a flowchart of an implementation manner of a distributed autonomic optimization method for multiple virtual power plants, where the method includes:

step 101, parameter initialization: presetting a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimizing individual number | J_hL, penalty factor η, and reward cooperation constant W₀。

The optimization effect of the multi-virtual power plant decentralized self-discipline optimization method provided by the application is influenced by a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimization individual number | J_hL, penalty factor η, and reward cooperation constant W₀The influence of (c). The initial settings of the above parameters are provided as an example, see table 1 below.

TABLE 1 Algorithm parameter set

102, obtaining virtual power plant parameters, scene parameters, electricity price parameters and distribution network operation parameters. The virtual power plant parameters comprise wind power output, photovoltaic output, energy storage equipment capacity and electric storage quantity; the scene parameters include wind speed and load size. The embodiment that this application provided has contained 5 virtual power plants, 10 wind-powered electricity generation/photovoltaic output scenes, and the day-ahead price of electricity is regarded as the definite value.

Step 103, classifying the tasks and forming an initial knowledge matrix: if the current task is a source task, an initial knowledge matrix of the current task is randomly formed; and if the current task is a new task, screening out a similar source task with the maximum similarity to the current task, and calculating an initial memory matrix of the current task according to the optimal memory matrix of the similar source task.

After a task is determined, the task may be classified into a source task and a new task by classifying the task. If the source task is a source task, the initial knowledge matrix can be randomly generated.

If the task is a new task, such as a new task e, two first similar source tasks and two second similar source tasks, such as a similar source task p and a similar source task q, which have the greatest similarity to the new task e can be screened out.

The initial memory matrix of the new task e can be obtained by the optimal memory matrix of the similar source task p and the similar source task q. First, a first migration weight of the similar source task p relative to the new task e is calculated through a migration weight calculation formula, and a second migration weight of the similar source task q relative to the new task e is calculated.

The migration weight calculation formula is as follows:

r_epsimilarity of the first similar source task and the current task is obtained; r is_eqSimilarity of the second similar source task and the current task is obtained; t is t_epIs the first migration weight; t is t_eqIs the second migration weight.

The calculated migration weight, i.e., t, may be utilized_epAnd t_eqCalculating an initial memory matrix of the new task e through a preset memory matrix calculation formula;

the preset memory matrix calculation formula is as follows:

an initial memory matrix for the current task with respect to a variable i;

By introducing the memory matrix, the optimization individual can have the memory self-learning capability, and the optimization rate is greatly improved.

Step 104, information acquisition: the agents obtain the current decision variables of the remaining agents from the information center.

Wherein, an agent corresponds a virtual power plant, through regarding every virtual power plant as an agent, agrees multilateral electric power transaction price through Nash game, can realize distributed autonomous optimization. An agent also contains a plurality of optimizing individuals.

The decision variables comprise the output power of the micro fuel power generation equipment in the virtual power plant at the moment t, the output power of the fan equipment at the moment t, the adjustable power of the controllable load at the moment t, the charging and discharging power of the energy storage system at the moment t and the electricity purchasing cost of the multilateral transaction.

It should be noted that an information center exists in the multi-virtual power plant decentralized autonomous optimization model provided by the application. Each virtual power plant needs to feed back the optimized scheduling information of the region to the information center in real time, and meanwhile, the scheduling information of other subsystems can be known in real time through the information center. Considering that the whole network information center can realize the complete disclosure of the scheduling information of each subsystem, the complete information dynamic non-cooperative game among the virtual power plants can be carried out, and the direct purchase price on the multilateral contract can be obtained through the Nash equilibrium state of the non-cooperative game.

Step 105, determining the action of the optimizing individual: and determining action values corresponding to the optimized individuals according to the state-action chain.

Specifically, the action value corresponding to each optimization unit can be determined by a preset state-action chain formula;

the preset state-action chain formula is:

The fast optimization is developed by adopting a Q learning algorithm, and the original large-scale knowledge matrix can be decomposed into a plurality of small-scale knowledge matrices Q by adopting a state-action chain_imThe dimensionality of the memory matrix is effectively reduced, and dimension disaster is avoided.

Step 106, calculating the objective function value of each agent:

the preset benefit formula is as follows:

for the virtual power plant i to buy and sell power income in the real-time market and the large power grid,

the power generation cost of the virtual power plant i;

network loss cost of the virtual power plant i;

the switching cost of the virtual power plant i; n is_sIs the total number of virtual power plants; n is_TThe total time of the preset time; and converting the calculated benefits into objective function values corresponding to the intelligent agents.

The sub-targeting functions of the agent may be set as:

sub-targeting function f_hEqual to the reciprocal of the benefit formula, hence the objective function f of the agent_hThe smaller the gain, the greater the gain.

It should be noted that in the distributed autonomous optimization model of multiple virtual power plants, multiple virtual power plants are contracted by signing multilateral power trading contracts, and the direct purchase price agreed with the specified direct purchase price is agreed and obtained by all the virtual power plants; the output decision and the electricity price decision of each virtual power plant can influence the electricity purchasing cost, the network loss cost and the switch cost of other virtual power plants, and therefore benefit games exist among the virtual power plants.

The optimization goal of each virtual power plant is to consider the reconfiguration of a distribution network and the maximization of net profit participating in the competition of the power market, and the profit is derived from the power selling income of the large power grid and the multilateral contract income of the rest virtual power plants; the cost is derived from the cost of electricity purchase to the large power grid, the cost of electricity generation, the cost of grid loss, the cost of switching loss, and the cost of electricity purchase to the remaining virtual power plants.

Step 107, calculating a reward function:

calculating the reward corresponding to the intelligent agent through a preset reward formula according to the objective function value obtained by conversion;

the preset reward formula is as follows:

step 108, updating the knowledge matrix:

and updating the knowledge matrix through a preset formula group.

The preset formula group is as follows:

is a knowledge increment; j is the population size in one iteration; a is_imThe action is a preset optional action; m is the total number of decision variables; a. the_imIs a preset optional action a_imA collection of (a).

And realizing optimization by sharing cooperative individuals in the population and updating the corresponding knowledge matrix, wherein the knowledge updating is carried out in a local greedy manner so as to ensure the global convergence effect of the algorithm.

Step 109, information feedback: and each agent returns the current optimal solution to the information center.

Specifically, each agent sends the current optimal solution to the information center, and replaces the decision variables after the last iteration corresponding to the agent in the archive set of the information center.

Archive set A in an information center_rThe latest policy combination for the whole multi-agent system is preserved, namely: a. the_r＝[x₁ ^*,...,x_h ^*...,x_H ^*]H is the number of agents, x_h ^*Is the latest policy of agent h. At the end of the kth iteration, agent h solves its current optimal solution x_h ^k*Sending to information center by x_h ^k*Alternative archive set A_rThe corresponding variable in (1); after the next iteration starts, the agent h obtains the decision data of the other agents from the information center, and the sub-objective function f of the agent h is based on the decision data_hOptimizing to obtain a new round of optimal solution x_h ^k+1*And sending the information to the information center again to update A_r. Due to the objective function f of the agent_hThe smaller the gain, the greater the gain. When all agents cannot reduce their objective function values unilaterally, A_rThe policy combination stored in the system is not changed any more, and the algorithm is converged at the moment, so that the policy combination is the Nash equilibrium solution.

Step 110, judging whether the maximum iteration times are reached, and if the maximum iteration times are reached, outputting an optimal knowledge matrix of a corresponding task; otherwise, return to step 104.

The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A multi-virtual power plant decentralized self-discipline optimization method is characterized by comprising the following steps:

s1, parameter initialization: presetting a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimizing individual number | J_hL, penalty factor eta and prizeExcitation constant W₀；

s5, calculating the objective function value of each agent:

the preset benefit formula is as follows:

is deficiency ofThe simulated power plant i generates electricity purchasing and selling benefits through the multilateral contract and other virtual power plants in electricity quantity transaction;

the power generation cost of the virtual power plant i;

network loss cost of the virtual power plant i;

s6, calculating a reward function:

the preset reward formula is as follows:

representing the optimal one in the kth iteration of the ith agentA set of state-action pairs for a body; superscript k denotes the kth iteration, superscript j denotes the jth individual, subscript i denotes the ith agent, and subscript m denotes the mth decision variable; r(s)^k，s^k+1，a^k) In an action a^kSlave state when occurring s^kTransition to state s^k+1The reward function of (2); (s, a) represents a state-action pair;

s7, updating the knowledge matrix:

selecting action a from the k-th knowledge matrix_imArrival state

The corresponding knowledge value;

selecting an action in the k-th knowledge matrix

Arrival state

The corresponding knowledge value;

2. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, wherein the S4 specifically includes:

the preset state-action chain formula is as follows:

3. The distributed autonomous optimization method for multiple virtual power plants according to claim 1, wherein in S2, if the current task is a new task, screening out a similar source task with the greatest similarity to the current task, and calculating the initial memory matrix of the current task according to the optimal memory matrix of the similar source task specifically comprises:

the migration weight calculation formula is as follows:

the preset memory matrix calculation formula is as follows:

an initial memory matrix for the current task with respect to a variable i;

4. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, wherein the S8 specifically includes:

5. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, further comprising, before said S2:

the scene parameters comprise wind speed and load size.