CN110048461B - Multi-virtual power plant decentralized self-discipline optimization method - Google Patents
Multi-virtual power plant decentralized self-discipline optimization method Download PDFInfo
- Publication number
- CN110048461B CN110048461B CN201910409906.1A CN201910409906A CN110048461B CN 110048461 B CN110048461 B CN 110048461B CN 201910409906 A CN201910409906 A CN 201910409906A CN 110048461 B CN110048461 B CN 110048461B
- Authority
- CN
- China
- Prior art keywords
- task
- virtual power
- action
- power plant
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000005457 optimization Methods 0.000 title claims abstract description 40
- 238000000034 method Methods 0.000 title claims abstract description 31
- 239000011159 matrix material Substances 0.000 claims abstract description 61
- 230000009471 action Effects 0.000 claims abstract description 31
- 239000003795 chemical substances by application Substances 0.000 claims description 75
- 230000006870 function Effects 0.000 claims description 30
- 230000005012 migration Effects 0.000 claims description 27
- 238000013508 migration Methods 0.000 claims description 27
- 230000008901 benefit Effects 0.000 claims description 20
- 230000005611 electricity Effects 0.000 claims description 19
- 238000004364 calculation method Methods 0.000 claims description 12
- 238000004146 energy storage Methods 0.000 claims description 9
- 238000012216 screening Methods 0.000 claims description 7
- 238000010248 power generation Methods 0.000 claims description 6
- 238000006243 chemical reaction Methods 0.000 claims description 3
- 238000012937 correction Methods 0.000 claims description 3
- 239000000446 fuel Substances 0.000 claims description 3
- 230000007812 deficiency Effects 0.000 claims 1
- 239000006185 dispersion Substances 0.000 abstract 1
- 230000000694 effects Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 230000006399 behavior Effects 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 230000002567 autonomic effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000007599 discharging Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- H02J3/382—
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
Abstract
The application discloses many virtual power plant dispersion autonomous optimization method includes: s1, initializing parameters; s2, classifying the tasks and forming an initial knowledge matrix; s3, acquiring information; s4, determining the action of the optimizing individual; s5, calculating objective function values of all agents; s6, calculating a reward function; s7, updating the knowledge matrix; s8, information feedback: each agent returns the current optimal solution to the information center; s9, judging whether the maximum iteration times are reached, and if so, outputting an optimal knowledge matrix of the corresponding task; otherwise, return to S3. The application discloses a distributed autonomous optimization method for multiple virtual power plants, which solves the technical problems that the existing distribution network regulation and control is difficult to meet the condition that multiple virtual power plants participate in the power market in real time for profit, and the grid-connected behavior of distributed equipment is effectively controlled to support the safe and effective operation of the distribution network.
Description
Technical Field
The application relates to the technical field of distribution network scheduling, in particular to a distributed autonomous optimization method for multiple virtual power plants.
Background
With the development of energy utilization technology, a large number of distributed power supplies are connected to a distribution network, and the number of distributed energy storage systems and controllable loads is increased day by day. On one hand, due to the huge amount, the influence of the comprehensive effect on the power grid cannot be ignored; on the other hand, the huge number of the devices makes it impossible for the power grid to directly and independently schedule each distributed device, and meanwhile, the influence of a single device on the power grid is small, and the direct control significance of the power grid on the single device is not large. And the electric power market construction is promoted orderly. Due to the characteristics of small capacity, distributed layout and high output randomness, the massive distributed power supply mainly using clean energy is difficult to participate in electric power market competition as an independent individual in fact. This will affect the enthusiasm of the construction of the distributed renewable energy sources, and the market effect will not be played the best. A virtual power plant is an ideal choice to solve the above problems. The virtual power plant technology is that a controllable load, a distributed power supply and an energy storage system are organically combined through a virtual power plant control center, and the controllable load, the distributed power supply and the energy storage system are enabled to participate in operation in the power grid and the power market in the identity of an equivalent power plant.
However, renewable energy sources have large uncertainty, and the existing distribution network regulation and control is difficult to meet the demand that a plurality of virtual power plants participate in the power market in real time and make profit by profit, and effectively control the grid-connected behavior of distributed equipment to support the safe and effective operation of a distribution network.
Disclosure of Invention
The application provides a decentralized self-discipline optimization method for multiple virtual power plants, and solves the technical problems that the existing distribution network regulation and control is difficult to meet the condition that multiple virtual power plants participate in the power market in real time for profit, and the grid-connected behavior of distributed equipment is effectively controlled to support the safe and effective operation of the distribution network.
In view of this, the present application provides a decentralized autonomous optimization method for multiple virtual power plants, including:
s1, parameter initialization: presetting a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimizing individual number | JhL, penalty factor η, and reward cooperation constant W0;
S2, classifying the tasks and forming an initial knowledge matrix: if the current task is a source task, an initial knowledge matrix of the current task is formed randomly; if the current task is a new task, screening out a similar source task with the maximum similarity to the current task, and calculating an initial memory matrix of the current task according to the optimal memory matrix of the similar source task;
s3, information acquisition: the method comprises the following steps that the intelligent agents obtain current decision variables of other intelligent agents from an information center, one intelligent agent corresponds to a virtual power plant, and one intelligent agent comprises a plurality of optimizing individuals; the decision variables comprise the output power of micro fuel power generation equipment in the virtual power plant at the moment t, the output power of fan equipment at the moment t, the adjustable power of a controllable load at the moment t, the charge and discharge power of an energy storage system at the moment t and the electricity purchasing cost of multilateral transactions;
s4, determining the action of the optimizing individual: determining action values corresponding to the optimizing individuals according to the state-action chain;
s5, calculating the objective function value of each agent:
calculating the corresponding benefit of the intelligent agent through a preset benefit formula according to the acquired decision variable;
the preset benefit formula is as follows:
wherein the control variable vector x is a decision variable inside the virtual power plant h; s is a wind power and photovoltaic output scene obtained based on historical data; pi(s) is the probability of s scene occurrence;as virtual electricityThe power purchasing and selling income of the factory i in the real-time market and the large power grid,generating electricity purchasing and selling benefits for the virtual power plant i through the multilateral contract and other virtual power plants in electricity quantity transaction;the power generation cost of the virtual power plant i;network loss cost of the virtual power plant i;the switching cost of the virtual power plant i; n issIs the total number of virtual power plants; n isTThe total time of the preset time;
converting the calculated benefits into objective function values corresponding to the intelligent agents;
s6, calculating a reward function:
calculating the reward corresponding to the intelligent agent through a preset reward formula according to the objective function value obtained through conversion;
the preset reward formula is as follows:
Fi Bestrepresenting the minimum value of an objective function of the optimal individual in the population in the kth iteration of the ith agent; fi kjRepresenting the objective function in the kth iteration of the ith agent; p is a radical ofmIs a positive multiple; c. CfRepresents a correction factor for ensuring that the reward function is positive;a set of state-action pairs representing the optimal individual in the kth iteration of the ith agent; superscript k denotes the kth iteration, superscript j denotes the jth individual, subscript i denotes the ith agent, and subscript m denotes the mth decision variable; r(s)k,sk+1,ak) In an action akSlave state when occurring skTransition to state sk+1The reward function of (2); (s, a) represents a state-action pair;
s7, updating the knowledge matrix:
updating the knowledge matrix through a preset formula group; the preset formula group is as follows:
selecting action a in the knowledge matrix representing the kth iterationimArrival stateThe corresponding knowledge value of the current knowledge value,selecting an action in the k-th knowledge matrixArrival stateThe corresponding knowledge value of the current knowledge value,is a knowledge increment; j is the population size in one iteration; a isimThe action is a preset optional action; m is the total number of decision variables;representing the action value searched by the jth individual in the ith agent in the kth iteration for the mth decision variable;representing the state of the jth individual in the ith agent in the kth iteration with respect to the mth decision variable; alpha is a learning factor; gamma is a discount factor; a. theimIs a preset optional action aimA set of (a); s8, information feedback: each agent returns the current optimal solution to the information center;
s9, judging whether the maximum iteration times are reached, and if so, outputting an optimal knowledge matrix of the corresponding task; otherwise, returning to the S3.
Preferably, the S4 specifically includes:
the action value corresponding to each optimizing individual is determined through a preset state-action chain formula;
the preset state-action chain formula is as follows:
wherein q is0A random number between 0 and 1; epsilon is the probability of adopting a greedy optimization method strategy; a israndIs the probability of adopting a random optimization strategy;representing the action value found by the jth individual in the ith agent for the mth decision variable in the kth iteration.
Preferably, in S2, if the current task is a new task, screening out a similar source task with the largest similarity to the current task, and calculating the initial memory matrix of the current task according to the optimal memory matrix of the similar source task specifically includes:
if the current task is a new task, screening out a first similar source task and a second similar source task which have the maximum similarity with the current task;
calculating a first migration weight of the first similar source task relative to the current task and a second migration weight of the second similar source task relative to the current task through a migration weight calculation formula;
the migration weight calculation formula is as follows:
repsimilarity of the first similar source task and the current task is obtained; r iseqSimilarity of the second similar source task and the current task is obtained; t is tepIs the first migration weight; t is teqIs the second migration weight;
calculating an initial memory matrix of the current task through a preset memory matrix calculation formula according to the first migration weight and the second migration weight;
the preset memory matrix calculation formula is as follows:
an initial memory matrix for the current task with respect to a variable i;an optimal memory matrix for the first similar source task with respect to a variable i;an optimal memory matrix for the second similar source task with respect to variable i.
Preferably, the S8 specifically includes:
and each intelligent agent sends the current optimal solution to the information center to replace the decision variables corresponding to the intelligent agent in the archive set of the information center after the last iteration.
Preferably, before the S2, the method further includes:
acquiring virtual power plant parameters, scene parameters, electricity price parameters and distribution network operation parameters;
the virtual power plant parameters comprise wind power output, photovoltaic output, energy storage equipment capacity and electric storage quantity;
the scene parameters comprise wind speed and load size.
According to the technical scheme, the method has the following advantages:
the application provides a multi-virtual power plant decentralized self-discipline optimization method which comprises the steps that (1) the dimension of a memory matrix is effectively reduced by adopting a mutual-connection state-action chain, and a dimension disaster is avoided; meanwhile, a plurality of individuals are organized by means of the group intelligent technology to carry out autonomous optimization of the intelligent agent, and a memory matrix is introduced to enable the optimizing individuals to have memory self-learning capacity, so that the optimizing rate is greatly improved; (2) the memory migration based on the similarity can effectively avoid negative migration, so that the new task can be subjected to online learning on the basis of the learning and memory of the source task, and the optimization efficiency is obviously improved; (3) by establishing an information center in a multi-agent system, complete information dynamic non-cooperative game among a plurality of agents is realized, and the finally obtained Nash equilibrium solution enables the income of all the agents to be as large as possible. When the method is used for solving the decentralized self-discipline optimization of the multiple virtual power plants, a scheduling scheme which enables the net benefits of all the virtual power plants to be as large as possible can be found.
Drawings
Fig. 1 is a flowchart of an implementation manner of the distributed autonomous optimization method for multiple virtual power plants provided by the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of an implementation manner of a distributed autonomic optimization method for multiple virtual power plants, where the method includes:
The optimization effect of the multi-virtual power plant decentralized self-discipline optimization method provided by the application is influenced by a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimization individual number | JhL, penalty factor η, and reward cooperation constant W0The influence of (c). The initial settings of the above parameters are provided as an example, see table 1 below.
TABLE 1 Algorithm parameter set
102, obtaining virtual power plant parameters, scene parameters, electricity price parameters and distribution network operation parameters. The virtual power plant parameters comprise wind power output, photovoltaic output, energy storage equipment capacity and electric storage quantity; the scene parameters include wind speed and load size. The embodiment that this application provided has contained 5 virtual power plants, 10 wind-powered electricity generation/photovoltaic output scenes, and the day-ahead price of electricity is regarded as the definite value.
After a task is determined, the task may be classified into a source task and a new task by classifying the task. If the source task is a source task, the initial knowledge matrix can be randomly generated.
If the task is a new task, such as a new task e, two first similar source tasks and two second similar source tasks, such as a similar source task p and a similar source task q, which have the greatest similarity to the new task e can be screened out.
The initial memory matrix of the new task e can be obtained by the optimal memory matrix of the similar source task p and the similar source task q. First, a first migration weight of the similar source task p relative to the new task e is calculated through a migration weight calculation formula, and a second migration weight of the similar source task q relative to the new task e is calculated.
The migration weight calculation formula is as follows:
repsimilarity of the first similar source task and the current task is obtained; r iseqSimilarity of the second similar source task and the current task is obtained; t is tepIs the first migration weight; t is teqIs the second migration weight.
The calculated migration weight, i.e., t, may be utilizedepAnd teqCalculating an initial memory matrix of the new task e through a preset memory matrix calculation formula;
the preset memory matrix calculation formula is as follows:
an initial memory matrix for the current task with respect to a variable i;an optimal memory matrix for the first similar source task with respect to a variable i;an optimal memory matrix for the second similar source task with respect to variable i.
By introducing the memory matrix, the optimization individual can have the memory self-learning capability, and the optimization rate is greatly improved.
Wherein, an agent corresponds a virtual power plant, through regarding every virtual power plant as an agent, agrees multilateral electric power transaction price through Nash game, can realize distributed autonomous optimization. An agent also contains a plurality of optimizing individuals.
The decision variables comprise the output power of the micro fuel power generation equipment in the virtual power plant at the moment t, the output power of the fan equipment at the moment t, the adjustable power of the controllable load at the moment t, the charging and discharging power of the energy storage system at the moment t and the electricity purchasing cost of the multilateral transaction.
It should be noted that an information center exists in the multi-virtual power plant decentralized autonomous optimization model provided by the application. Each virtual power plant needs to feed back the optimized scheduling information of the region to the information center in real time, and meanwhile, the scheduling information of other subsystems can be known in real time through the information center. Considering that the whole network information center can realize the complete disclosure of the scheduling information of each subsystem, the complete information dynamic non-cooperative game among the virtual power plants can be carried out, and the direct purchase price on the multilateral contract can be obtained through the Nash equilibrium state of the non-cooperative game.
Specifically, the action value corresponding to each optimization unit can be determined by a preset state-action chain formula;
the preset state-action chain formula is:
wherein q is0A random number between 0 and 1; epsilon is the probability of adopting a greedy optimization method strategy; a israndIs the probability of adopting a random optimization strategy;representing the action value found by the jth individual in the ith agent for the mth decision variable in the kth iteration.
The fast optimization is developed by adopting a Q learning algorithm, and the original large-scale knowledge matrix can be decomposed into a plurality of small-scale knowledge matrices Q by adopting a state-action chainimThe dimensionality of the memory matrix is effectively reduced, and dimension disaster is avoided.
calculating the corresponding benefit of the intelligent agent through a preset benefit formula according to the acquired decision variable;
the preset benefit formula is as follows:
wherein the control variable vector x is a decision variable inside the virtual power plant h; s is a wind power and photovoltaic output scene obtained based on historical data; pi(s) is the probability of s scene occurrence;for the virtual power plant i to buy and sell power income in the real-time market and the large power grid,generating electricity purchasing and selling benefits for the virtual power plant i through the multilateral contract and other virtual power plants in electricity quantity transaction;the power generation cost of the virtual power plant i;network loss cost of the virtual power plant i;the switching cost of the virtual power plant i; n issIs the total number of virtual power plants; n isTThe total time of the preset time; and converting the calculated benefits into objective function values corresponding to the intelligent agents.
The sub-targeting functions of the agent may be set as:
sub-targeting function fhEqual to the reciprocal of the benefit formula, hence the objective function f of the agenthThe smaller the gain, the greater the gain.
It should be noted that in the distributed autonomous optimization model of multiple virtual power plants, multiple virtual power plants are contracted by signing multilateral power trading contracts, and the direct purchase price agreed with the specified direct purchase price is agreed and obtained by all the virtual power plants; the output decision and the electricity price decision of each virtual power plant can influence the electricity purchasing cost, the network loss cost and the switch cost of other virtual power plants, and therefore benefit games exist among the virtual power plants.
The optimization goal of each virtual power plant is to consider the reconfiguration of a distribution network and the maximization of net profit participating in the competition of the power market, and the profit is derived from the power selling income of the large power grid and the multilateral contract income of the rest virtual power plants; the cost is derived from the cost of electricity purchase to the large power grid, the cost of electricity generation, the cost of grid loss, the cost of switching loss, and the cost of electricity purchase to the remaining virtual power plants.
calculating the reward corresponding to the intelligent agent through a preset reward formula according to the objective function value obtained by conversion;
the preset reward formula is as follows:
Fi Bestrepresenting the minimum value of an objective function of the optimal individual in the population in the kth iteration of the ith agent; fi kjRepresenting the objective function in the kth iteration of the ith agent; p is a radical ofmIs a positive multiple; c. CfRepresents a correction factor for ensuring that the reward function is positive;a set of state-action pairs representing the optimal individual in the kth iteration of the ith agent; superscript k denotes the kth iteration, superscript j denotes the jth individual, subscript i denotes the ith agent, and subscript m denotes the mth decision variable; r(s)k,sk+1,ak) In an action akSlave state when occurring skTransition to state sk+1The reward function of (2); (s, a) represents a state-action pair;
and updating the knowledge matrix through a preset formula group.
The preset formula group is as follows:
is a knowledge increment; j is the population size in one iteration; a isimThe action is a preset optional action; m is the total number of decision variables; a. theimIs a preset optional action aimA collection of (a).
And realizing optimization by sharing cooperative individuals in the population and updating the corresponding knowledge matrix, wherein the knowledge updating is carried out in a local greedy manner so as to ensure the global convergence effect of the algorithm.
Specifically, each agent sends the current optimal solution to the information center, and replaces the decision variables after the last iteration corresponding to the agent in the archive set of the information center.
Archive set A in an information centerrThe latest policy combination for the whole multi-agent system is preserved, namely: a. ther=[x1 *,...,xh *...,xH *]H is the number of agents, xh *Is the latest policy of agent h. At the end of the kth iteration, agent h solves its current optimal solution xh k*Sending to information center by xh k*Alternative archive set ArThe corresponding variable in (1); after the next iteration starts, the agent h obtains the decision data of the other agents from the information center, and the sub-objective function f of the agent h is based on the decision datahOptimizing to obtain a new round of optimal solution xh k+1*And sending the information to the information center again to update Ar. Due to the objective function f of the agenthThe smaller the gain, the greater the gain. When all agents cannot reduce their objective function values unilaterally, ArThe policy combination stored in the system is not changed any more, and the algorithm is converged at the moment, so that the policy combination is the Nash equilibrium solution.
The application provides a multi-virtual power plant decentralized self-discipline optimization method which comprises the steps that (1) the dimension of a memory matrix is effectively reduced by adopting a mutual-connection state-action chain, and a dimension disaster is avoided; meanwhile, a plurality of individuals are organized by means of the group intelligent technology to carry out autonomous optimization of the intelligent agent, and a memory matrix is introduced to enable the optimizing individuals to have memory self-learning capacity, so that the optimizing rate is greatly improved; (2) the memory migration based on the similarity can effectively avoid negative migration, so that the new task can be subjected to online learning on the basis of the learning and memory of the source task, and the optimization efficiency is obviously improved; (3) by establishing an information center in a multi-agent system, complete information dynamic non-cooperative game among a plurality of agents is realized, and the finally obtained Nash equilibrium solution enables the income of all the agents to be as large as possible. When the method is used for solving the decentralized self-discipline optimization of the multiple virtual power plants, a scheduling scheme which enables the net benefits of all the virtual power plants to be as large as possible can be found.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (5)
1. A multi-virtual power plant decentralized self-discipline optimization method is characterized by comprising the following steps:
s1, parameter initialization: presetting a learning factor alpha, a discount factor gamma, a greedy utilization probability epsilon and an intelligent agent optimizing individual number | JhL, penalty factor eta and prizeExcitation constant W0;
S2, classifying the tasks and forming an initial knowledge matrix: if the current task is a source task, an initial knowledge matrix of the current task is formed randomly; if the current task is a new task, screening out a similar source task with the maximum similarity to the current task, and calculating an initial memory matrix of the current task according to the optimal memory matrix of the similar source task;
s3, information acquisition: the method comprises the following steps that the intelligent agents obtain current decision variables of other intelligent agents from an information center, one intelligent agent corresponds to a virtual power plant, and one intelligent agent comprises a plurality of optimizing individuals; the decision variables comprise the output power of micro fuel power generation equipment in the virtual power plant at the moment t, the output power of fan equipment at the moment t, the adjustable power of a controllable load at the moment t, the charge and discharge power of an energy storage system at the moment t and the electricity purchasing cost of multilateral transactions;
s4, determining the action of the optimizing individual: determining action values corresponding to the optimizing individuals according to the state-action chain;
s5, calculating the objective function value of each agent:
calculating the corresponding benefit of the intelligent agent through a preset benefit formula according to the acquired decision variable;
the preset benefit formula is as follows:
wherein the control variable vector x is a decision variable inside the virtual power plant h; s is a wind power and photovoltaic output scene obtained based on historical data; pi(s) is the probability of s scene occurrence;for the virtual power plant i to buy and sell power income in the real-time market and the large power grid,is deficiency ofThe simulated power plant i generates electricity purchasing and selling benefits through the multilateral contract and other virtual power plants in electricity quantity transaction;the power generation cost of the virtual power plant i;network loss cost of the virtual power plant i;the switching cost of the virtual power plant i; n issIs the total number of virtual power plants; n isTThe total time of the preset time;
converting the calculated benefits into objective function values corresponding to the intelligent agents;
s6, calculating a reward function:
calculating the reward corresponding to the intelligent agent through a preset reward formula according to the objective function value obtained through conversion;
the preset reward formula is as follows:
Fi Bestrepresenting the minimum value of an objective function of the optimal individual in the population in the kth iteration of the ith agent; fi kjRepresenting the objective function in the kth iteration of the ith agent; p is a radical ofmIs a positive multiple; c. CfRepresents a correction factor for ensuring that the reward function is positive;representing the optimal one in the kth iteration of the ith agentA set of state-action pairs for a body; superscript k denotes the kth iteration, superscript j denotes the jth individual, subscript i denotes the ith agent, and subscript m denotes the mth decision variable; r(s)k,sk+1,ak) In an action akSlave state when occurring skTransition to state sk+1The reward function of (2); (s, a) represents a state-action pair;
s7, updating the knowledge matrix:
updating the knowledge matrix through a preset formula group; the preset formula group is as follows:
selecting action a from the k-th knowledge matriximArrival stateThe corresponding knowledge value;selecting an action in the k-th knowledge matrixArrival stateThe corresponding knowledge value;is a knowledge increment; j is the population size in one iteration; a isimThe action is a preset optional action; m is the total number of decision variables;representing the action value searched by the jth individual in the ith agent in the kth iteration for the mth decision variable;representing the state of the jth individual in the ith agent in the kth iteration with respect to the mth decision variable; alpha is a learning factor; gamma is a discount factor; a. theimIs a preset optional action aimA set of (a); s8, information feedback: each agent returns the current optimal solution to the information center;
s9, judging whether the maximum iteration times are reached, and if so, outputting an optimal knowledge matrix of the corresponding task; otherwise, returning to the S3.
2. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, wherein the S4 specifically includes:
the action value corresponding to each optimizing individual is determined through a preset state-action chain formula;
the preset state-action chain formula is as follows:
wherein q is0A random number between 0 and 1; epsilon is the probability of adopting a greedy optimization method strategy; a israndIs the probability of adopting a random optimization strategy;representing the action value found by the jth individual in the ith agent for the mth decision variable in the kth iteration.
3. The distributed autonomous optimization method for multiple virtual power plants according to claim 1, wherein in S2, if the current task is a new task, screening out a similar source task with the greatest similarity to the current task, and calculating the initial memory matrix of the current task according to the optimal memory matrix of the similar source task specifically comprises:
if the current task is a new task, screening out a first similar source task and a second similar source task which have the maximum similarity with the current task;
calculating a first migration weight of the first similar source task relative to the current task and a second migration weight of the second similar source task relative to the current task through a migration weight calculation formula;
the migration weight calculation formula is as follows:
repsimilarity of the first similar source task and the current task is obtained; r iseqSimilarity of the second similar source task and the current task is obtained; t is tepIs the first migration weight; t is teqIs the second migration weight;
calculating an initial memory matrix of the current task through a preset memory matrix calculation formula according to the first migration weight and the second migration weight;
the preset memory matrix calculation formula is as follows:
4. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, wherein the S8 specifically includes:
and each intelligent agent sends the current optimal solution to the information center to replace the decision variables corresponding to the intelligent agent in the archive set of the information center after the last iteration.
5. The multi-virtual power plant decentralized autonomous optimization method according to claim 1, further comprising, before said S2:
acquiring virtual power plant parameters, scene parameters, electricity price parameters and distribution network operation parameters;
the virtual power plant parameters comprise wind power output, photovoltaic output, energy storage equipment capacity and electric storage quantity;
the scene parameters comprise wind speed and load size.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910409906.1A CN110048461B (en) | 2019-05-16 | 2019-05-16 | Multi-virtual power plant decentralized self-discipline optimization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910409906.1A CN110048461B (en) | 2019-05-16 | 2019-05-16 | Multi-virtual power plant decentralized self-discipline optimization method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110048461A CN110048461A (en) | 2019-07-23 |
CN110048461B true CN110048461B (en) | 2021-07-02 |
Family
ID=67282341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910409906.1A Active CN110048461B (en) | 2019-05-16 | 2019-05-16 | Multi-virtual power plant decentralized self-discipline optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110048461B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111047071B (en) * | 2019-10-29 | 2022-06-24 | 国网江苏省电力有限公司盐城供电分公司 | Power system real-time supply and demand interaction method based on deep migration learning and Stackelberg game |
CN110994620A (en) * | 2019-11-16 | 2020-04-10 | 国网浙江省电力有限公司台州供电公司 | Q-Learning algorithm-based power grid power flow intelligent adjustment method |
CN115879983A (en) * | 2023-02-07 | 2023-03-31 | 长园飞轮物联网技术(杭州)有限公司 | Virtual power plant scheduling method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105023056A (en) * | 2015-06-26 | 2015-11-04 | 华南理工大学 | Power grid optimal carbon energy composite flow obtaining method based on swarm intelligence reinforcement learning |
CN106296044A (en) * | 2016-10-08 | 2017-01-04 | 南方电网科学研究院有限责任公司 | power system risk scheduling method and system |
CN108921368A (en) * | 2018-05-03 | 2018-11-30 | 东南大学 | Balanced cooperative game controller based on virtual power plant |
CN108960510A (en) * | 2018-07-04 | 2018-12-07 | 四川大学 | A kind of virtual plant optimization trading strategies model based on two stage stochastic programming |
CN109712019A (en) * | 2018-12-13 | 2019-05-03 | 深圳供电局有限公司 | A kind of multipotency building real-time power management optimization method |
-
2019
- 2019-05-16 CN CN201910409906.1A patent/CN110048461B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105023056A (en) * | 2015-06-26 | 2015-11-04 | 华南理工大学 | Power grid optimal carbon energy composite flow obtaining method based on swarm intelligence reinforcement learning |
CN106296044A (en) * | 2016-10-08 | 2017-01-04 | 南方电网科学研究院有限责任公司 | power system risk scheduling method and system |
CN108921368A (en) * | 2018-05-03 | 2018-11-30 | 东南大学 | Balanced cooperative game controller based on virtual power plant |
CN108960510A (en) * | 2018-07-04 | 2018-12-07 | 四川大学 | A kind of virtual plant optimization trading strategies model based on two stage stochastic programming |
CN109712019A (en) * | 2018-12-13 | 2019-05-03 | 深圳供电局有限公司 | A kind of multipotency building real-time power management optimization method |
Also Published As
Publication number | Publication date |
---|---|
CN110048461A (en) | 2019-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Askarzadeh | A memory-based genetic algorithm for optimization of power generation in a microgrid | |
CN107958300B (en) | Multi-microgrid interconnection operation coordination scheduling optimization method considering interactive response | |
Abdullah et al. | An effective power dispatch control strategy to improve generation schedulability and supply reliability of a wind farm using a battery energy storage system | |
CN107545325B (en) | Multi-microgrid interconnection operation optimization method based on game theory | |
CN110048461B (en) | Multi-virtual power plant decentralized self-discipline optimization method | |
CN110728406B (en) | Multi-agent power generation optimal scheduling method based on reinforcement learning | |
Leo et al. | Reinforcement learning for optimal energy management of a solar microgrid | |
Hropko et al. | Optimal dispatch of renewable energy sources included in virtual power plant using accelerated particle swarm optimization | |
Rayati et al. | Optimal generalized Bayesian Nash equilibrium of frequency-constrained electricity market in the presence of renewable energy sources | |
Lazaroiu et al. | Virtual power plant with energy storage optimized in an electricity market approach | |
CN112821470A (en) | Micro-grid group optimization scheduling strategy based on niche chaos particle swarm algorithm | |
Tabatabaee et al. | Stochastic energy management of renewable micro-grids in the correlated environment using unscented transformation | |
Khan et al. | Short-term daily peak load forecasting using fast learning neural network | |
CN112508325A (en) | Multi-time-scale electric energy scheduling method for household micro-grid | |
CN110571795A (en) | arrangement method of energy storage unit in high-wind-force penetration power system | |
Marinescu et al. | A hybrid approach to very small scale electrical demand forecasting | |
Ali et al. | Development and planning of a hybrid power system based on advance optimization approach | |
Hannan et al. | ANN-Based Binary Backtracking Search Algorithm for VPP Optimal Scheduling and Cost-Effective Evaluation | |
Changsong et al. | Energy trading model for optimal microgrid scheduling based on genetic algorithm | |
Khorram-Nia et al. | Optimal switching in reconfigurable microgrids considering electric vehicles and renewable energy sources | |
Dey et al. | Energy Management of Microgrids with Renewables using soft computing techniques | |
Pourghasem et al. | Reliable economic dispatch of microgrids by exchange market algorithm | |
Bai et al. | Optimal scheduling of distributed energy resources by modern heuristic optimization technique | |
CN113554219A (en) | Renewable energy power station shared energy storage capacity planning method and device | |
Abdi et al. | Optimal unit commitment of renewable energy sources in the micro-grids with storage devices |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |