CN110048461B - Multi-virtual power plant decentralized self-discipline optimization method - Google Patents

Multi-virtual power plant decentralized self-discipline optimization method Download PDF

Info

Publication number
CN110048461B
CN110048461B CN201910409906.1A CN201910409906A CN110048461B CN 110048461 B CN110048461 B CN 110048461B CN 201910409906 A CN201910409906 A CN 201910409906A CN 110048461 B CN110048461 B CN 110048461B
Authority
CN
China
Prior art keywords
task
virtual power
action
power plant
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910409906.1A
Other languages
Chinese (zh)
Other versions
CN110048461A (en
Inventor
赵瑞锋
王彬
郭文鑫
卢建刚
刘文涛
李世明
李波
徐展强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd, Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd filed Critical Guangdong Power Grid Co Ltd
Priority to CN201910409906.1A priority Critical patent/CN110048461B/en
Publication of CN110048461A publication Critical patent/CN110048461A/en
Application granted granted Critical
Publication of CN110048461B publication Critical patent/CN110048461B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • H02J3/382
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers

Abstract

The application discloses many virtual power plant dispersion autonomous optimization method includes: s1, initializing parameters; s2, classifying the tasks and forming an initial knowledge matrix; s3, acquiring information; s4, determining the action of the optimizing individual; s5, calculating objective function values of all agents; s6, calculating a reward function; s7, updating the knowledge matrix; s8, information feedback: each agent returns the current optimal solution to the information center; s9, judging whether the maximum iteration times are reached, and if so, outputting an optimal knowledge matrix of the corresponding task; otherwise, return to S3. The application discloses a distributed autonomous optimization method for multiple virtual power plants, which solves the technical problems that the existing distribution network regulation and control is difficult to meet the condition that multiple virtual power plants participate in the power market in real time for profit, and the grid-connected behavior of distributed equipment is effectively controlled to support the safe and effective operation of the distribution network.

Description

Multi-virtual power plant decentralized self-discipline optimization method
Technical Field
The application relates to the technical field of distribution network scheduling, in particular to a distributed autonomous optimization method for multiple virtual power plants.
Background
With the development of energy utilization technology, a large number of distributed power supplies are connected to a distribution network, and the number of distributed energy storage systems and controllable loads is increased day by day. On one hand, due to the huge amount, the influence of the comprehensive effect on the power grid cannot be ignored; on the other hand, the huge number of the devices makes it impossible for the power grid to directly and independently schedule each distributed device, and meanwhile, the influence of a single device on the power grid is small, and the direct control significance of the power grid on the single device is not large. And the electric power market construction is promoted orderly. Due to the characteristics of small capacity, distributed layout and high output randomness, the massive distributed power supply mainly using clean energy is difficult to participate in electric power market competition as an independent individual in fact. This will affect the enthusiasm of the construction of the distributed renewable energy sources, and the market effect will not be played the best. A virtual power plant is an ideal choice to solve the above problems. The virtual power plant technology is that a controllable load, a distributed power supply and an energy storage system are organically combined through a virtual power plant control center, and the controllable load, the distributed power supply and the energy storage system are enabled to participate in operation in the power grid and the power market in the identity of an equivalent power plant.
However, renewable energy sources have large uncertainty, and the existing distribution network regulation and control is difficult to meet the demand that a plurality of virtual power plants participate in the power market in real time and make profit by profit, and effectively control the grid-connected behavior of distributed equipment to support the safe and effective operation of a distribution network.
Disclosure of Invention
The application provides a decentralized self-discipline optimization method for multiple virtual power plants, and solves the technical problems that the existing distribution network regulation and control is difficult to meet the condition that multiple virtual power plants participate in the power market in real time for profit, and the grid-connected behavior of distributed equipment is effectively controlled to support the safe and effective operation of the distribution network.
In view of this, the present application provides a decentralized autonomous optimization method for multiple virtual power plants, including:
s1, parameter initialization: presetting a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimizing individual number | JhL, penalty factor η, and reward cooperation constant W0
S2, classifying the tasks and forming an initial knowledge matrix: if the current task is a source task, an initial knowledge matrix of the current task is formed randomly; if the current task is a new task, screening out a similar source task with the maximum similarity to the current task, and calculating an initial memory matrix of the current task according to the optimal memory matrix of the similar source task;
s3, information acquisition: the method comprises the following steps that the intelligent agents obtain current decision variables of other intelligent agents from an information center, one intelligent agent corresponds to a virtual power plant, and one intelligent agent comprises a plurality of optimizing individuals; the decision variables comprise the output power of micro fuel power generation equipment in the virtual power plant at the moment t, the output power of fan equipment at the moment t, the adjustable power of a controllable load at the moment t, the charge and discharge power of an energy storage system at the moment t and the electricity purchasing cost of multilateral transactions;
s4, determining the action of the optimizing individual: determining action values corresponding to the optimizing individuals according to the state-action chain;
s5, calculating the objective function value of each agent:
calculating the corresponding benefit of the intelligent agent through a preset benefit formula according to the acquired decision variable;
the preset benefit formula is as follows:
Figure GDA0003004010910000021
wherein the control variable vector x is a decision variable inside the virtual power plant h; s is a wind power and photovoltaic output scene obtained based on historical data; pi(s) is the probability of s scene occurrence;
Figure GDA0003004010910000022
as virtual electricityThe power purchasing and selling income of the factory i in the real-time market and the large power grid,
Figure GDA0003004010910000023
generating electricity purchasing and selling benefits for the virtual power plant i through the multilateral contract and other virtual power plants in electricity quantity transaction;
Figure GDA0003004010910000024
the power generation cost of the virtual power plant i;
Figure GDA0003004010910000025
network loss cost of the virtual power plant i;
Figure GDA0003004010910000026
the switching cost of the virtual power plant i; n issIs the total number of virtual power plants; n isTThe total time of the preset time;
converting the calculated benefits into objective function values corresponding to the intelligent agents;
s6, calculating a reward function:
calculating the reward corresponding to the intelligent agent through a preset reward formula according to the objective function value obtained through conversion;
the preset reward formula is as follows:
Figure GDA0003004010910000027
Figure GDA0003004010910000031
Fi Bestrepresenting the minimum value of an objective function of the optimal individual in the population in the kth iteration of the ith agent; fi kjRepresenting the objective function in the kth iteration of the ith agent; p is a radical ofmIs a positive multiple; c. CfRepresents a correction factor for ensuring that the reward function is positive;
Figure GDA00030040109100000312
a set of state-action pairs representing the optimal individual in the kth iteration of the ith agent; superscript k denotes the kth iteration, superscript j denotes the jth individual, subscript i denotes the ith agent, and subscript m denotes the mth decision variable; r(s)k,sk+1,ak) In an action akSlave state when occurring skTransition to state sk+1The reward function of (2); (s, a) represents a state-action pair;
s7, updating the knowledge matrix:
updating the knowledge matrix through a preset formula group; the preset formula group is as follows:
Figure GDA0003004010910000032
Figure GDA0003004010910000033
selecting action a in the knowledge matrix representing the kth iterationimArrival state
Figure GDA0003004010910000034
The corresponding knowledge value of the current knowledge value,
Figure GDA0003004010910000035
selecting an action in the k-th knowledge matrix
Figure GDA0003004010910000036
Arrival state
Figure GDA0003004010910000037
The corresponding knowledge value of the current knowledge value,
Figure GDA0003004010910000038
is a knowledge increment; j is the population size in one iteration; a isimThe action is a preset optional action; m is the total number of decision variables;
Figure GDA0003004010910000039
representing the action value searched by the jth individual in the ith agent in the kth iteration for the mth decision variable;
Figure GDA00030040109100000310
representing the state of the jth individual in the ith agent in the kth iteration with respect to the mth decision variable; alpha is a learning factor; gamma is a discount factor; a. theimIs a preset optional action aimA set of (a); s8, information feedback: each agent returns the current optimal solution to the information center;
s9, judging whether the maximum iteration times are reached, and if so, outputting an optimal knowledge matrix of the corresponding task; otherwise, returning to the S3.
Preferably, the S4 specifically includes:
the action value corresponding to each optimizing individual is determined through a preset state-action chain formula;
the preset state-action chain formula is as follows:
Figure GDA00030040109100000311
wherein q is0A random number between 0 and 1; epsilon is the probability of adopting a greedy optimization method strategy; a israndIs the probability of adopting a random optimization strategy;
Figure GDA0003004010910000041
representing the action value found by the jth individual in the ith agent for the mth decision variable in the kth iteration.
Preferably, in S2, if the current task is a new task, screening out a similar source task with the largest similarity to the current task, and calculating the initial memory matrix of the current task according to the optimal memory matrix of the similar source task specifically includes:
if the current task is a new task, screening out a first similar source task and a second similar source task which have the maximum similarity with the current task;
calculating a first migration weight of the first similar source task relative to the current task and a second migration weight of the second similar source task relative to the current task through a migration weight calculation formula;
the migration weight calculation formula is as follows:
Figure GDA0003004010910000042
repsimilarity of the first similar source task and the current task is obtained; r iseqSimilarity of the second similar source task and the current task is obtained; t is tepIs the first migration weight; t is teqIs the second migration weight;
calculating an initial memory matrix of the current task through a preset memory matrix calculation formula according to the first migration weight and the second migration weight;
the preset memory matrix calculation formula is as follows:
Figure GDA0003004010910000043
Figure GDA0003004010910000044
an initial memory matrix for the current task with respect to a variable i;
Figure GDA0003004010910000045
an optimal memory matrix for the first similar source task with respect to a variable i;
Figure GDA0003004010910000046
an optimal memory matrix for the second similar source task with respect to variable i.
Preferably, the S8 specifically includes:
and each intelligent agent sends the current optimal solution to the information center to replace the decision variables corresponding to the intelligent agent in the archive set of the information center after the last iteration.
Preferably, before the S2, the method further includes:
acquiring virtual power plant parameters, scene parameters, electricity price parameters and distribution network operation parameters;
the virtual power plant parameters comprise wind power output, photovoltaic output, energy storage equipment capacity and electric storage quantity;
the scene parameters comprise wind speed and load size.
According to the technical scheme, the method has the following advantages:
the application provides a multi-virtual power plant decentralized self-discipline optimization method which comprises the steps that (1) the dimension of a memory matrix is effectively reduced by adopting a mutual-connection state-action chain, and a dimension disaster is avoided; meanwhile, a plurality of individuals are organized by means of the group intelligent technology to carry out autonomous optimization of the intelligent agent, and a memory matrix is introduced to enable the optimizing individuals to have memory self-learning capacity, so that the optimizing rate is greatly improved; (2) the memory migration based on the similarity can effectively avoid negative migration, so that the new task can be subjected to online learning on the basis of the learning and memory of the source task, and the optimization efficiency is obviously improved; (3) by establishing an information center in a multi-agent system, complete information dynamic non-cooperative game among a plurality of agents is realized, and the finally obtained Nash equilibrium solution enables the income of all the agents to be as large as possible. When the method is used for solving the decentralized self-discipline optimization of the multiple virtual power plants, a scheduling scheme which enables the net benefits of all the virtual power plants to be as large as possible can be found.
Drawings
Fig. 1 is a flowchart of an implementation manner of the distributed autonomous optimization method for multiple virtual power plants provided by the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of an implementation manner of a distributed autonomic optimization method for multiple virtual power plants, where the method includes:
step 101, parameter initialization: presetting a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimizing individual number | JhL, penalty factor η, and reward cooperation constant W0
The optimization effect of the multi-virtual power plant decentralized self-discipline optimization method provided by the application is influenced by a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimization individual number | JhL, penalty factor η, and reward cooperation constant W0The influence of (c). The initial settings of the above parameters are provided as an example, see table 1 below.
TABLE 1 Algorithm parameter set
Figure GDA0003004010910000061
102, obtaining virtual power plant parameters, scene parameters, electricity price parameters and distribution network operation parameters. The virtual power plant parameters comprise wind power output, photovoltaic output, energy storage equipment capacity and electric storage quantity; the scene parameters include wind speed and load size. The embodiment that this application provided has contained 5 virtual power plants, 10 wind-powered electricity generation/photovoltaic output scenes, and the day-ahead price of electricity is regarded as the definite value.
Step 103, classifying the tasks and forming an initial knowledge matrix: if the current task is a source task, an initial knowledge matrix of the current task is randomly formed; and if the current task is a new task, screening out a similar source task with the maximum similarity to the current task, and calculating an initial memory matrix of the current task according to the optimal memory matrix of the similar source task.
After a task is determined, the task may be classified into a source task and a new task by classifying the task. If the source task is a source task, the initial knowledge matrix can be randomly generated.
If the task is a new task, such as a new task e, two first similar source tasks and two second similar source tasks, such as a similar source task p and a similar source task q, which have the greatest similarity to the new task e can be screened out.
The initial memory matrix of the new task e can be obtained by the optimal memory matrix of the similar source task p and the similar source task q. First, a first migration weight of the similar source task p relative to the new task e is calculated through a migration weight calculation formula, and a second migration weight of the similar source task q relative to the new task e is calculated.
The migration weight calculation formula is as follows:
Figure GDA0003004010910000062
repsimilarity of the first similar source task and the current task is obtained; r iseqSimilarity of the second similar source task and the current task is obtained; t is tepIs the first migration weight; t is teqIs the second migration weight.
The calculated migration weight, i.e., t, may be utilizedepAnd teqCalculating an initial memory matrix of the new task e through a preset memory matrix calculation formula;
the preset memory matrix calculation formula is as follows:
Figure GDA0003004010910000071
Figure GDA0003004010910000072
an initial memory matrix for the current task with respect to a variable i;
Figure GDA0003004010910000073
an optimal memory matrix for the first similar source task with respect to a variable i;
Figure GDA0003004010910000074
an optimal memory matrix for the second similar source task with respect to variable i.
By introducing the memory matrix, the optimization individual can have the memory self-learning capability, and the optimization rate is greatly improved.
Step 104, information acquisition: the agents obtain the current decision variables of the remaining agents from the information center.
Wherein, an agent corresponds a virtual power plant, through regarding every virtual power plant as an agent, agrees multilateral electric power transaction price through Nash game, can realize distributed autonomous optimization. An agent also contains a plurality of optimizing individuals.
The decision variables comprise the output power of the micro fuel power generation equipment in the virtual power plant at the moment t, the output power of the fan equipment at the moment t, the adjustable power of the controllable load at the moment t, the charging and discharging power of the energy storage system at the moment t and the electricity purchasing cost of the multilateral transaction.
It should be noted that an information center exists in the multi-virtual power plant decentralized autonomous optimization model provided by the application. Each virtual power plant needs to feed back the optimized scheduling information of the region to the information center in real time, and meanwhile, the scheduling information of other subsystems can be known in real time through the information center. Considering that the whole network information center can realize the complete disclosure of the scheduling information of each subsystem, the complete information dynamic non-cooperative game among the virtual power plants can be carried out, and the direct purchase price on the multilateral contract can be obtained through the Nash equilibrium state of the non-cooperative game.
Step 105, determining the action of the optimizing individual: and determining action values corresponding to the optimized individuals according to the state-action chain.
Specifically, the action value corresponding to each optimization unit can be determined by a preset state-action chain formula;
the preset state-action chain formula is:
Figure GDA0003004010910000081
wherein q is0A random number between 0 and 1; epsilon is the probability of adopting a greedy optimization method strategy; a israndIs the probability of adopting a random optimization strategy;
Figure GDA0003004010910000082
representing the action value found by the jth individual in the ith agent for the mth decision variable in the kth iteration.
The fast optimization is developed by adopting a Q learning algorithm, and the original large-scale knowledge matrix can be decomposed into a plurality of small-scale knowledge matrices Q by adopting a state-action chainimThe dimensionality of the memory matrix is effectively reduced, and dimension disaster is avoided.
Step 106, calculating the objective function value of each agent:
calculating the corresponding benefit of the intelligent agent through a preset benefit formula according to the acquired decision variable;
the preset benefit formula is as follows:
Figure GDA0003004010910000083
wherein the control variable vector x is a decision variable inside the virtual power plant h; s is a wind power and photovoltaic output scene obtained based on historical data; pi(s) is the probability of s scene occurrence;
Figure GDA0003004010910000084
for the virtual power plant i to buy and sell power income in the real-time market and the large power grid,
Figure GDA0003004010910000085
generating electricity purchasing and selling benefits for the virtual power plant i through the multilateral contract and other virtual power plants in electricity quantity transaction;
Figure GDA0003004010910000086
the power generation cost of the virtual power plant i;
Figure GDA0003004010910000087
network loss cost of the virtual power plant i;
Figure GDA0003004010910000088
the switching cost of the virtual power plant i; n issIs the total number of virtual power plants; n isTThe total time of the preset time; and converting the calculated benefits into objective function values corresponding to the intelligent agents.
The sub-targeting functions of the agent may be set as:
Figure GDA0003004010910000089
sub-targeting function fhEqual to the reciprocal of the benefit formula, hence the objective function f of the agenthThe smaller the gain, the greater the gain.
It should be noted that in the distributed autonomous optimization model of multiple virtual power plants, multiple virtual power plants are contracted by signing multilateral power trading contracts, and the direct purchase price agreed with the specified direct purchase price is agreed and obtained by all the virtual power plants; the output decision and the electricity price decision of each virtual power plant can influence the electricity purchasing cost, the network loss cost and the switch cost of other virtual power plants, and therefore benefit games exist among the virtual power plants.
The optimization goal of each virtual power plant is to consider the reconfiguration of a distribution network and the maximization of net profit participating in the competition of the power market, and the profit is derived from the power selling income of the large power grid and the multilateral contract income of the rest virtual power plants; the cost is derived from the cost of electricity purchase to the large power grid, the cost of electricity generation, the cost of grid loss, the cost of switching loss, and the cost of electricity purchase to the remaining virtual power plants.
Step 107, calculating a reward function:
calculating the reward corresponding to the intelligent agent through a preset reward formula according to the objective function value obtained by conversion;
the preset reward formula is as follows:
Figure GDA0003004010910000091
Figure GDA0003004010910000092
Fi Bestrepresenting the minimum value of an objective function of the optimal individual in the population in the kth iteration of the ith agent; fi kjRepresenting the objective function in the kth iteration of the ith agent; p is a radical ofmIs a positive multiple; c. CfRepresents a correction factor for ensuring that the reward function is positive;
Figure GDA0003004010910000095
a set of state-action pairs representing the optimal individual in the kth iteration of the ith agent; superscript k denotes the kth iteration, superscript j denotes the jth individual, subscript i denotes the ith agent, and subscript m denotes the mth decision variable; r(s)k,sk+1,ak) In an action akSlave state when occurring skTransition to state sk+1The reward function of (2); (s, a) represents a state-action pair;
step 108, updating the knowledge matrix:
and updating the knowledge matrix through a preset formula group.
The preset formula group is as follows:
Figure GDA0003004010910000093
Figure GDA0003004010910000094
is a knowledge increment; j is the population size in one iteration; a isimThe action is a preset optional action; m is the total number of decision variables; a. theimIs a preset optional action aimA collection of (a).
And realizing optimization by sharing cooperative individuals in the population and updating the corresponding knowledge matrix, wherein the knowledge updating is carried out in a local greedy manner so as to ensure the global convergence effect of the algorithm.
Step 109, information feedback: and each agent returns the current optimal solution to the information center.
Specifically, each agent sends the current optimal solution to the information center, and replaces the decision variables after the last iteration corresponding to the agent in the archive set of the information center.
Archive set A in an information centerrThe latest policy combination for the whole multi-agent system is preserved, namely: a. ther=[x1 *,...,xh *...,xH *]H is the number of agents, xh *Is the latest policy of agent h. At the end of the kth iteration, agent h solves its current optimal solution xh k*Sending to information center by xh k*Alternative archive set ArThe corresponding variable in (1); after the next iteration starts, the agent h obtains the decision data of the other agents from the information center, and the sub-objective function f of the agent h is based on the decision datahOptimizing to obtain a new round of optimal solution xh k+1*And sending the information to the information center again to update Ar. Due to the objective function f of the agenthThe smaller the gain, the greater the gain. When all agents cannot reduce their objective function values unilaterally, ArThe policy combination stored in the system is not changed any more, and the algorithm is converged at the moment, so that the policy combination is the Nash equilibrium solution.
Step 110, judging whether the maximum iteration times are reached, and if the maximum iteration times are reached, outputting an optimal knowledge matrix of a corresponding task; otherwise, return to step 104.
The application provides a multi-virtual power plant decentralized self-discipline optimization method which comprises the steps that (1) the dimension of a memory matrix is effectively reduced by adopting a mutual-connection state-action chain, and a dimension disaster is avoided; meanwhile, a plurality of individuals are organized by means of the group intelligent technology to carry out autonomous optimization of the intelligent agent, and a memory matrix is introduced to enable the optimizing individuals to have memory self-learning capacity, so that the optimizing rate is greatly improved; (2) the memory migration based on the similarity can effectively avoid negative migration, so that the new task can be subjected to online learning on the basis of the learning and memory of the source task, and the optimization efficiency is obviously improved; (3) by establishing an information center in a multi-agent system, complete information dynamic non-cooperative game among a plurality of agents is realized, and the finally obtained Nash equilibrium solution enables the income of all the agents to be as large as possible. When the method is used for solving the decentralized self-discipline optimization of the multiple virtual power plants, a scheduling scheme which enables the net benefits of all the virtual power plants to be as large as possible can be found.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (5)

1. A multi-virtual power plant decentralized self-discipline optimization method is characterized by comprising the following steps:
s1, parameter initialization: presetting a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimizing individual number | JhL, penalty factor eta and prizeExcitation constant W0
S2, classifying the tasks and forming an initial knowledge matrix: if the current task is a source task, an initial knowledge matrix of the current task is formed randomly; if the current task is a new task, screening out a similar source task with the maximum similarity to the current task, and calculating an initial memory matrix of the current task according to the optimal memory matrix of the similar source task;
s3, information acquisition: the method comprises the following steps that the intelligent agents obtain current decision variables of other intelligent agents from an information center, one intelligent agent corresponds to a virtual power plant, and one intelligent agent comprises a plurality of optimizing individuals; the decision variables comprise the output power of micro fuel power generation equipment in the virtual power plant at the moment t, the output power of fan equipment at the moment t, the adjustable power of a controllable load at the moment t, the charge and discharge power of an energy storage system at the moment t and the electricity purchasing cost of multilateral transactions;
s4, determining the action of the optimizing individual: determining action values corresponding to the optimizing individuals according to the state-action chain;
s5, calculating the objective function value of each agent:
calculating the corresponding benefit of the intelligent agent through a preset benefit formula according to the acquired decision variable;
the preset benefit formula is as follows:
Figure FDA0003004010900000011
wherein the control variable vector x is a decision variable inside the virtual power plant h; s is a wind power and photovoltaic output scene obtained based on historical data; pi(s) is the probability of s scene occurrence;
Figure FDA0003004010900000012
for the virtual power plant i to buy and sell power income in the real-time market and the large power grid,
Figure FDA0003004010900000013
is deficiency ofThe simulated power plant i generates electricity purchasing and selling benefits through the multilateral contract and other virtual power plants in electricity quantity transaction;
Figure FDA0003004010900000014
the power generation cost of the virtual power plant i;
Figure FDA0003004010900000015
network loss cost of the virtual power plant i;
Figure FDA0003004010900000016
the switching cost of the virtual power plant i; n issIs the total number of virtual power plants; n isTThe total time of the preset time;
converting the calculated benefits into objective function values corresponding to the intelligent agents;
s6, calculating a reward function:
calculating the reward corresponding to the intelligent agent through a preset reward formula according to the objective function value obtained through conversion;
the preset reward formula is as follows:
Figure FDA0003004010900000021
Figure FDA0003004010900000022
Fi Bestrepresenting the minimum value of an objective function of the optimal individual in the population in the kth iteration of the ith agent; fi kjRepresenting the objective function in the kth iteration of the ith agent; p is a radical ofmIs a positive multiple; c. CfRepresents a correction factor for ensuring that the reward function is positive;
Figure FDA00030040109000000212
representing the optimal one in the kth iteration of the ith agentA set of state-action pairs for a body; superscript k denotes the kth iteration, superscript j denotes the jth individual, subscript i denotes the ith agent, and subscript m denotes the mth decision variable; r(s)k,sk+1,ak) In an action akSlave state when occurring skTransition to state sk+1The reward function of (2); (s, a) represents a state-action pair;
s7, updating the knowledge matrix:
updating the knowledge matrix through a preset formula group; the preset formula group is as follows:
Figure FDA0003004010900000023
Figure FDA0003004010900000024
selecting action a from the k-th knowledge matriximArrival state
Figure FDA0003004010900000025
The corresponding knowledge value;
Figure FDA0003004010900000026
selecting an action in the k-th knowledge matrix
Figure FDA0003004010900000027
Arrival state
Figure FDA0003004010900000028
The corresponding knowledge value;
Figure FDA0003004010900000029
is a knowledge increment; j is the population size in one iteration; a isimThe action is a preset optional action; m is the total number of decision variables;
Figure FDA00030040109000000210
representing the action value searched by the jth individual in the ith agent in the kth iteration for the mth decision variable;
Figure FDA00030040109000000211
representing the state of the jth individual in the ith agent in the kth iteration with respect to the mth decision variable; alpha is a learning factor; gamma is a discount factor; a. theimIs a preset optional action aimA set of (a); s8, information feedback: each agent returns the current optimal solution to the information center;
s9, judging whether the maximum iteration times are reached, and if so, outputting an optimal knowledge matrix of the corresponding task; otherwise, returning to the S3.
2. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, wherein the S4 specifically includes:
the action value corresponding to each optimizing individual is determined through a preset state-action chain formula;
the preset state-action chain formula is as follows:
Figure FDA0003004010900000031
wherein q is0A random number between 0 and 1; epsilon is the probability of adopting a greedy optimization method strategy; a israndIs the probability of adopting a random optimization strategy;
Figure FDA0003004010900000032
representing the action value found by the jth individual in the ith agent for the mth decision variable in the kth iteration.
3. The distributed autonomous optimization method for multiple virtual power plants according to claim 1, wherein in S2, if the current task is a new task, screening out a similar source task with the greatest similarity to the current task, and calculating the initial memory matrix of the current task according to the optimal memory matrix of the similar source task specifically comprises:
if the current task is a new task, screening out a first similar source task and a second similar source task which have the maximum similarity with the current task;
calculating a first migration weight of the first similar source task relative to the current task and a second migration weight of the second similar source task relative to the current task through a migration weight calculation formula;
the migration weight calculation formula is as follows:
Figure FDA0003004010900000033
repsimilarity of the first similar source task and the current task is obtained; r iseqSimilarity of the second similar source task and the current task is obtained; t is tepIs the first migration weight; t is teqIs the second migration weight;
calculating an initial memory matrix of the current task through a preset memory matrix calculation formula according to the first migration weight and the second migration weight;
the preset memory matrix calculation formula is as follows:
Figure FDA0003004010900000034
Figure FDA0003004010900000035
an initial memory matrix for the current task with respect to a variable i;
Figure FDA0003004010900000036
an optimal memory matrix for the first similar source task with respect to a variable i;
Figure FDA0003004010900000037
an optimal memory matrix for the second similar source task with respect to variable i.
4. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, wherein the S8 specifically includes:
and each intelligent agent sends the current optimal solution to the information center to replace the decision variables corresponding to the intelligent agent in the archive set of the information center after the last iteration.
5. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, further comprising, before said S2:
acquiring virtual power plant parameters, scene parameters, electricity price parameters and distribution network operation parameters;
the virtual power plant parameters comprise wind power output, photovoltaic output, energy storage equipment capacity and electric storage quantity;
the scene parameters comprise wind speed and load size.
CN201910409906.1A 2019-05-16 2019-05-16 Multi-virtual power plant decentralized self-discipline optimization method Active CN110048461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910409906.1A CN110048461B (en) 2019-05-16 2019-05-16 Multi-virtual power plant decentralized self-discipline optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910409906.1A CN110048461B (en) 2019-05-16 2019-05-16 Multi-virtual power plant decentralized self-discipline optimization method

Publications (2)

Publication Number Publication Date
CN110048461A CN110048461A (en) 2019-07-23
CN110048461B true CN110048461B (en) 2021-07-02

Family

ID=67282341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910409906.1A Active CN110048461B (en) 2019-05-16 2019-05-16 Multi-virtual power plant decentralized self-discipline optimization method

Country Status (1)

Country Link
CN (1) CN110048461B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111047071B (en) * 2019-10-29 2022-06-24 国网江苏省电力有限公司盐城供电分公司 Power system real-time supply and demand interaction method based on deep migration learning and Stackelberg game
CN110994620A (en) * 2019-11-16 2020-04-10 国网浙江省电力有限公司台州供电公司 Q-Learning algorithm-based power grid power flow intelligent adjustment method
CN115879983A (en) * 2023-02-07 2023-03-31 长园飞轮物联网技术(杭州)有限公司 Virtual power plant scheduling method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105023056A (en) * 2015-06-26 2015-11-04 华南理工大学 Power grid optimal carbon energy composite flow obtaining method based on swarm intelligence reinforcement learning
CN106296044A (en) * 2016-10-08 2017-01-04 南方电网科学研究院有限责任公司 power system risk scheduling method and system
CN108921368A (en) * 2018-05-03 2018-11-30 东南大学 Balanced cooperative game controller based on virtual power plant
CN108960510A (en) * 2018-07-04 2018-12-07 四川大学 A kind of virtual plant optimization trading strategies model based on two stage stochastic programming
CN109712019A (en) * 2018-12-13 2019-05-03 深圳供电局有限公司 A kind of multipotency building real-time power management optimization method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105023056A (en) * 2015-06-26 2015-11-04 华南理工大学 Power grid optimal carbon energy composite flow obtaining method based on swarm intelligence reinforcement learning
CN106296044A (en) * 2016-10-08 2017-01-04 南方电网科学研究院有限责任公司 power system risk scheduling method and system
CN108921368A (en) * 2018-05-03 2018-11-30 东南大学 Balanced cooperative game controller based on virtual power plant
CN108960510A (en) * 2018-07-04 2018-12-07 四川大学 A kind of virtual plant optimization trading strategies model based on two stage stochastic programming
CN109712019A (en) * 2018-12-13 2019-05-03 深圳供电局有限公司 A kind of multipotency building real-time power management optimization method

Also Published As

Publication number Publication date
CN110048461A (en) 2019-07-23

Similar Documents

Publication Publication Date Title
Askarzadeh A memory-based genetic algorithm for optimization of power generation in a microgrid
CN107958300B (en) Multi-microgrid interconnection operation coordination scheduling optimization method considering interactive response
Abdullah et al. An effective power dispatch control strategy to improve generation schedulability and supply reliability of a wind farm using a battery energy storage system
CN107545325B (en) Multi-microgrid interconnection operation optimization method based on game theory
CN110048461B (en) Multi-virtual power plant decentralized self-discipline optimization method
CN110728406B (en) Multi-agent power generation optimal scheduling method based on reinforcement learning
Leo et al. Reinforcement learning for optimal energy management of a solar microgrid
Hropko et al. Optimal dispatch of renewable energy sources included in virtual power plant using accelerated particle swarm optimization
Rayati et al. Optimal generalized Bayesian Nash equilibrium of frequency-constrained electricity market in the presence of renewable energy sources
Lazaroiu et al. Virtual power plant with energy storage optimized in an electricity market approach
CN112821470A (en) Micro-grid group optimization scheduling strategy based on niche chaos particle swarm algorithm
Tabatabaee et al. Stochastic energy management of renewable micro-grids in the correlated environment using unscented transformation
Khan et al. Short-term daily peak load forecasting using fast learning neural network
CN112508325A (en) Multi-time-scale electric energy scheduling method for household micro-grid
CN110571795A (en) arrangement method of energy storage unit in high-wind-force penetration power system
Marinescu et al. A hybrid approach to very small scale electrical demand forecasting
Ali et al. Development and planning of a hybrid power system based on advance optimization approach
Hannan et al. ANN-Based Binary Backtracking Search Algorithm for VPP Optimal Scheduling and Cost-Effective Evaluation
Changsong et al. Energy trading model for optimal microgrid scheduling based on genetic algorithm
Khorram-Nia et al. Optimal switching in reconfigurable microgrids considering electric vehicles and renewable energy sources
Dey et al. Energy Management of Microgrids with Renewables using soft computing techniques
Pourghasem et al. Reliable economic dispatch of microgrids by exchange market algorithm
Bai et al. Optimal scheduling of distributed energy resources by modern heuristic optimization technique
CN113554219A (en) Renewable energy power station shared energy storage capacity planning method and device
Abdi et al. Optimal unit commitment of renewable energy sources in the micro-grids with storage devices

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant