CN113177655A - Comprehensive energy system multi-main-body operation optimization method and device based on reinforcement learning - Google Patents


Info

Publication number: CN113177655A
Authority: CN (China)
Legal status: Pending
Application number: CN202110318894.9A
Other languages: Chinese (zh)
Inventors: 肖迁 (Xiao Qian), 穆云飞 (Mu Yunfei), 贾宏杰 (Jia Hongjie), 陆文标 (Lu Wenbiao), 李天翔 (Li Tianxiang), 余晓丹 (Yu Xiaodan)
Current assignee: Tianjin University
Original assignee: Tianjin University
Application filed by Tianjin University


Classifications

    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N20/00 Machine learning
    • G06Q50/06 Energy or water supply


Abstract

The invention discloses a reinforcement-learning-based method and device for multi-agent operation optimization of an integrated energy system. The method comprises: constructing an integrated energy system model; dividing the model into two layers, with a multi-agent game in the upper layer and equipment scheduling optimization in the lower layer; solving the upper-layer multi-agent game by screening Nash equilibrium points through permutation and combination based on the Stackelberg game definition, and combining this with the Nash-Q algorithm to obtain a system-wide optimal strategy combination; and, taking the minimum production cost of each agent as the objective function, solving the optimal operating state of each agent's lower-layer equipment with a CPLEX solver. The device comprises a construction module, a division and interaction module, a screening and solving module and an obtaining module. The method addresses shortcomings of prior algorithms used to guide optimized park operation: flexible resources are not fully exploited, multi-party interaction is not taken into account, and optimal power flow calculation is hindered.

Description

Comprehensive energy system multi-main-body operation optimization method and device based on reinforcement learning
Technical Field
The invention relates to the field of integrated energy system operation optimization, and in particular to a reinforcement-learning-based method and device for multi-agent operation optimization of an integrated energy system.
Background
Energy is the basis of human survival and development and a basic guarantee of social progress. In recent years, with the depletion of fossil energy and growing world energy demand, efficient energy use has become a very important research topic, and it is urgent to develop new energy sources and to improve the utilization efficiency of existing ones. An Integrated Energy System (IES) combines multiple energy sources and supplies energy through coordination and complementarity among them. It breaks with the existing mode in which each energy supply system is planned, designed and operated independently, performs integrated planning, design and operation optimization of the energy system as a whole, and can improve the utilization efficiency of every energy source. An integrated energy system often contains several stakeholders with their own interests. While meeting supply requirements, each can act flexibly according to its own objectives, which makes analyzing the behavior of each stakeholder difficult.
When analyzing the multi-agent game of an integrated energy system, most researchers currently use the particle swarm algorithm. However, this heuristic algorithm needs long computation time, solves the game slowly, easily converges to a local optimum, and rarely reaches the global optimum in a single optimization run. In engineering practice, the long computation time makes the control strategy formulated by the park operator lag behind, which hinders full use of flexible resources and optimized operation of the whole system. When the system operates at a local optimum, the interaction capability of each agent is not fully exploited, the actual benefit falls below the theoretical optimum, and network-level optimal power flow calculation is also hindered. To address these problems, many researchers have introduced artificial intelligence algorithms into multi-agent games with some success.
In the course of making the invention, the inventors found that the prior art has at least the following shortcomings:
1. Traditional heuristic algorithms need long computation time and solve the game slowly, so the control strategy made by the park operator lags behind, which hinders full exploitation of flexible resources and optimized operation of the whole system;
2. The prior art cannot fully account for the interaction among the multiple agents (operator, service provider and user), so the interaction capability of each agent is not fully exploited and the actual revenue is lower than the theoretical optimum;
3. The prior art easily converges to a local optimum, rarely obtains the global optimum in a single optimization run, and hinders network-level optimal power flow calculation.
Disclosure of Invention
To solve the problems that particle swarm and other traditional heuristic algorithms cause when solving park operation, such as under-exploited flexible resources, low benefits for the parties and hindered optimal power flow calculation, the invention provides a reinforcement-learning-based method and device for multi-agent operation optimization of an integrated energy system, described in detail as follows:
In a first aspect, a reinforcement-learning-based method for multi-agent operation optimization of an integrated energy system includes:
building a multi-agent model of a park integrated energy system, dividing the optimization process of the model into an upper-layer multi-agent game and lower-layer equipment scheduling optimization, and adopting "source-load" two-sided game interaction;
screening Nash equilibrium points by permutation and combination based on the Stackelberg game definition, and combining this with the Nash-Q algorithm to obtain the optimal combined action over the whole time horizon, i.e. the optimal strategy for the current typical day; and solving the optimal operating state of each agent's equipment with a CPLEX solver, taking each agent's minimum production cost as the objective function.
In one implementation, screening the Nash equilibrium points by permutation and combination based on the Stackelberg game definition specifically includes:
using the reinforcement signal of reinforcement learning to describe the physical meaning of a Nash equilibrium point in the multi-leader multi-follower game, and judging from the reinforcement signals whether a combined action satisfies the return constraint of every agent; if it does, the combined action is a Nash equilibrium solution.
In one implementation, obtaining the optimal combined action over the whole time horizon with the Nash-Q algorithm is specifically as follows:
1) discretize the action space;
2) each agent removes the action combinations that violate its return constraint and keeps those that satisfy it as the action set;
3) compute the revenue of each agent under every combined action in the action set and store the revenue data in a table;
4) select the agents one at a time, in order from agent 1 to agent n; for the selected agent, find its optimal action under each combination of actions of the remaining unselected agents, delete its other actions and keep only the optimal one;
5) store the combined actions remaining in the action set; these are the optimal strategy over the whole time horizon.
In a second aspect, a reinforcement-learning-based device for multi-agent operation optimization of an integrated energy system comprises:
a building module, configured to build a multi-agent model of the park integrated energy system;
a dividing and interacting module, configured to divide the optimization process of the multi-agent model into an upper-layer multi-agent game and lower-layer equipment scheduling optimization, using "source-load" two-sided game interaction;
a screening and solving module, configured to screen Nash equilibrium points by permutation and combination based on the Stackelberg game definition and to obtain, with the Nash-Q algorithm, the optimal combined action over the whole time horizon, i.e. the optimal strategy for the current typical day;
and a calculating module, configured to calculate the optimal operating state of each agent's equipment with a CPLEX solver, taking each agent's minimum production cost as the objective function.
In a third aspect, a reinforcement-learning-based device for multi-agent operation optimization of an integrated energy system comprises a processor and a memory storing program instructions; the processor calls the program instructions stored in the memory to cause the device to perform the method steps of the first aspect.
In a fourth aspect, a computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method steps of the first aspect.
The technical solution provided by the invention has the following beneficial effects:
1) compared with traditional heuristic algorithms, the proposed multi-agent operation optimization method solves the problem with an AI algorithm that can learn from historical data, which can reduce lag in engineering applications;
2) compared with traditional integrated energy system models, the method models the interaction mechanism among the agents in detail, which improves the revenue and the energy utilization efficiency of the system.
Drawings
FIG. 1 is a flow chart of the reinforcement-learning-based multi-agent operation optimization method for an integrated energy system;
FIG. 2 is a schematic diagram of the multi-agent model of the integrated energy system;
FIG. 3 is a schematic diagram of the upper-layer multi-agent game model;
FIG. 4 is a schematic diagram of the lower-layer scheduling optimization model of each device;
FIG. 5 is a flow chart for solving Nash equilibrium points by permutation and combination;
FIG. 6 is a schematic diagram of the initial load curves;
FIG. 7 is a schematic diagram of the initial renewable energy output;
FIG. 8 is a schematic diagram of the energy supplier's game results;
FIG. 9 is a schematic diagram of the service provider's game results;
FIG. 10 is a schematic diagram of the user's game results;
FIG. 11 is a schematic diagram of the electric power scheduling results;
FIG. 12 is a schematic diagram of the heat scheduling results;
FIG. 13 is a schematic diagram of the gas scheduling results;
FIG. 14 is a schematic structural diagram of the reinforcement-learning-based multi-agent operation optimization device for an integrated energy system.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, embodiments of the invention are described in further detail below.
To fully account for the interaction mechanisms among all agents in the park and to improve both the solving efficiency of the multi-agent game and the energy utilization efficiency of the system, an embodiment of the invention provides a reinforcement-learning-based multi-agent operation optimization method for an integrated energy system.
The scheme is further described below with specific formulas, drawings and an example. The method comprises the following steps:
Step 101: constructing a multi-agent model of the park integrated energy system;
(1) Park integrated energy system
A park integrated energy system usually includes several energy suppliers and energy conversion devices. In this embodiment, the park integrated energy system model shown in FIG. 2 is established. The energy suppliers are a power grid company, a heat source plant and an energy supplier: the power grid company provides only electricity, the heat source plant provides only heat, and the energy supplier can provide electricity, heat and gas. On the service side there is a park service provider, which purchases energy from the suppliers, dispatches the devices in the park and supplies energy to the users. The devices the park service provider can control include wind turbine generators (WTG), photovoltaic generators (PVG), power-to-gas equipment (P2G), combined heat and power units (CHP) and gas boilers (GB). The users consume energy in the form of electricity, heat and gas loads.
A user who needs energy can purchase it only from the park service provider, and decides its demand response quantity according to the service provider's prices, its own energy demand, its energy-use preferences and similar factors. The park service provider can adjust the prices it charges users and can flexibly choose among energy suppliers and energy conversion devices. The energy supplier decides the price of the energy it sells to the service provider, and the service provider orders energy based on that price.
(2) Energy supplier
As the supplier of park energy, the energy supplier's main tasks are dispatching its energy production units and playing the game with the park service provider over each type of energy. Its profit is the difference between its revenue from selling energy to the park service provider and its energy supply cost.
The objective function $I_{ES}$ of the energy supplier is:
[Eq. (1): equation image not available]
The terms of Eq. (1) are: the electricity selling price of the energy supplier's $i$-th generator unit at time $t$ (throughout, $t$ denotes the time period); the energy supplier's gas selling price; its heat selling price; the electric power sold by its $i$-th unit; and the gas and heat power it sells. $T$ is the total number of time periods and $N$ is the number of generator units; $c_{net}$ is the network charge payable by the energy supplier; $G_{e,t,i}$ is the operating cost of the $i$-th generator unit; and $G_{s,t}$ and $G_{h,t}$ are the operating costs of the energy supplier's gas supply and heat supply, respectively. Eq. (1) also includes a satisfaction function, expressed as:
[Eq. (2): equation image not available]
where $a$ and $b$ are positive satisfaction coefficients chosen for the specific case; $k \in \{e, h, s\}$, with $e$, $h$ and $s$ denoting electricity, heat and gas; the energy supplier's price for class-$k$ energy must satisfy upper and lower limits; and $\rho_{k,t}$ is the market price of class-$k$ energy, which can generally be taken as the marginal price of the electricity, heat and natural gas markets.
(3) Park service provider
The park service provider is the intermediary between the park's energy suppliers and its users. By choosing among suppliers, allocating the various energies and controlling each device in the park, it achieves the most efficient use of park energy. Its revenue is the difference between the income from selling energy to park users and its total cost, which consists of the demand response compensation paid to park users, the cost of purchasing energy from the energy suppliers, the environmental governance cost, and the satisfaction cost of its electricity, gas and heat supply. The objective function $I_{EH}$ of the park service provider is:
[Eq. (3): equation image not available]
where the set $K = \{e, h, s\}$, with $e$ denoting electricity, $h$ heat and $s$ natural gas. The revenue term uses the price at which the park service provider sells class-$k$ energy to users and the user's actual consumption of class-$k$ energy after demand response. The demand response mechanism considered here is interruptible load.
The compensation paid by the park service provider to users participating in demand response is calculated as:
[Eq. (4): equation image not available]
Eq. (4) is built from the unit compensation price for class-$k$ demand response and $L_{k,t}$, the initial load power of class-$k$ energy.
The energy purchase cost of the park service provider is:
[Eq. (5): equation image not available]
It is the sum of three parts: the total cost of energy bought from the energy supplier, i.e. the sum of the electricity, gas and heat costs paid to it; the total cost of electricity bought from the power grid, i.e. the grid purchase unit price multiplied by the purchased electricity $P_t^E$; and the total cost of heat bought from the heat source plant, i.e. the plant's heat unit price multiplied by the purchased heat $P_t^H$.
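The three-part purchase cost described above can be sketched as a per-period helper. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, and because the original equation image is not reproduced, summing the per-period cost over all periods $t$ (second helper) is an assumed aggregation.

```python
from typing import Iterable, Tuple


def purchase_cost(supplier_cost: float, grid_price: float, grid_power: float,
                  heat_price: float, heat_power: float) -> float:
    """Energy purchase cost of the park service provider in one period.

    supplier_cost: total paid to the energy supplier for electricity, gas and heat.
    grid_price * grid_power: electricity bought from the power grid (P_t^E).
    heat_price * heat_power: heat bought from the heat source plant (P_t^H).
    """
    grid_cost = grid_price * grid_power
    heat_cost = heat_price * heat_power
    return supplier_cost + grid_cost + heat_cost


def total_purchase_cost(periods: Iterable[Tuple[float, float, float, float, float]]) -> float:
    """Sum of the per-period cost over all periods t (assumed aggregation)."""
    return sum(purchase_cost(*p) for p in periods)


print(purchase_cost(100.0, 0.5, 200.0, 0.25, 120.0))  # 230.0
```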
The environmental governance cost is:
[Eq. (6): equation image not available]
where $r$ is the unit environmental governance cost, the user's actual electricity consumption enters the expression, and $P_t^{PW}$ and $P_t^{PV}$ are the wind and photovoltaic generation at time $t$. The satisfaction cost is calculated in the same way as Eq. (2), except that the satisfaction coefficients are changed and the supplier's price is replaced by the service provider's price.
In addition, the power of each device of the park service provider must satisfy:
[Eq. (7): equation image not available]
where $F = \{CHP, GB, P2G, WTG, PVG\}$ is the set of the park service provider's devices and $f \in F$ denotes the current device.
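A minimal sketch of checking the device power constraint, assuming it is the usual box constraint $P_f^{min} \le P_{f,t} \le P_f^{max}$ for each device $f \in F$ (the original equation image is not reproduced). The limit values and function names below are hypothetical placeholders, not data from the patent.

```python
from typing import Dict, Tuple

# Hypothetical (min, max) power limits in kW for each device f in F.
DEVICE_LIMITS: Dict[str, Tuple[float, float]] = {
    "CHP": (0.0, 500.0),
    "GB": (0.0, 300.0),
    "P2G": (0.0, 100.0),
    "WTG": (0.0, 200.0),
    "PVG": (0.0, 150.0),
}


def limit_violations(dispatch: Dict[str, float],
                     limits: Dict[str, Tuple[float, float]] = DEVICE_LIMITS) -> Dict[str, float]:
    """Return the signed excess for every device outside its [min, max] range."""
    violations = {}
    for device, power in dispatch.items():
        p_min, p_max = limits[device]
        if power < p_min:
            violations[device] = power - p_min  # negative: below minimum
        elif power > p_max:
            violations[device] = power - p_max  # positive: above maximum
    return violations


def clip_to_limits(dispatch: Dict[str, float],
                   limits: Dict[str, Tuple[float, float]] = DEVICE_LIMITS) -> Dict[str, float]:
    """Project each device setpoint onto its allowed range."""
    return {d: min(max(p, limits[d][0]), limits[d][1]) for d, p in dispatch.items()}


print(limit_violations({"CHP": 600.0, "GB": 100.0}))  # {'CHP': 100.0}
```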
The energy allocation relationships of the park service provider are:
[Eq. (8): equation image not available]
Eq. (8) involves the user's actual heat consumption and actual gas consumption, the total purchased electric power (the sum of power bought from the grid and from the energy supplier), the total purchased heat power (the sum of heat bought from the heat source plant and from the energy supplier) and the total purchased gas power (the park service provider buys gas only from the energy supplier). $\eta_{CHP,e}$, $\eta_{CHP,h}$, $\eta_{GB}$ and $\eta_{P2G}$ are the CHP electrical efficiency, the CHP thermal efficiency, the gas boiler efficiency and the P2G efficiency; $k_1$, $k_2$ and $\gamma_1$, $\gamma_2$, $\gamma_3$ are the electricity and gas purchase allocation coefficients, respectively, i.e. the share of each purchased energy routed to the corresponding unit.
(4) User
Park users have electricity, gas and heat loads. Users weigh energy purchase cost against a comfort function to decide how much load to interrupt. The user's objective function is:
[Eq. (9): equation image not available]
where $\omega_1$ and $\omega_2$ are the weight coefficients of the user's energy purchase cost and comfort cost, respectively; $C_t$ is the user's energy purchase cost and $D_t$ is the user's comfort cost, expressed as:
[Eq. (10): equation image not available]
where the terms are the user's actual consumption of class-$k$ energy ($k$ is electricity, gas or heat) and the unit compensation price for class-$k$ demand response, and
[Eq. (11): equation image not available]
where $y_k$ is the user's preference coefficient for class-$k$ energy and is positive. The smaller $y_k$ is, the less that energy affects the user's comfort and the larger the interruptible load.
Step 102: preprocessing the multi-agent model of the park integrated energy system based on hierarchical control;
(1) Upper-layer multi-agent game
The traditional source-load game of a park integrated energy system is the game between two agents in the park, the service provider and the user, i.e. a load-side game only: the service provider buys energy directly from the supply side, and no source-side game is involved. This invention additionally considers the game between the energy supplier and the service provider, i.e. the source-side game.
To fit the proposed algorithm, the multi-agent operation optimization of the whole park is divided into an upper-layer multi-agent game and lower-layer device scheduling optimization. The solving process of the upper-layer multi-agent game is shown in FIG. 3.
The benefits of the energy supplier, the park service provider and the park users are all taken into account. Compared with one-sided demand response on the "load" side only, "source-load" two-sided game interaction can effectively improve the economic benefit of every agent in the park integrated energy system, and its influence on the operating economy of the system can be analyzed. In the upper-layer game, the objective functions of the energy supplier and the park service provider are modified. The energy supplier's objective considers only maximizing the difference between its total energy sales revenue and its satisfaction cost; its control variables are the electricity, heat and natural gas prices it charges the service provider, and the dispatch cost of its own units is ignored. The park service provider's objective likewise considers only maximizing the difference between its total energy sales revenue and its satisfaction cost; its control variables are the energy prices it charges users, and the cost of the park's energy conversion devices is ignored. The user's objective function is unchanged, and the constraints of all three agents are unchanged.
Here the two-sided game (i.e. the upper-layer multi-agent game) consists of the source-side game (energy supplier and park service provider) and the load-side game (park service provider and users).
(2) Lower-layer device scheduling optimization
That is, a device scheduling strategy is determined from the Nash equilibrium solution of the game. Combining the game search method with the Nash-Q algorithm yields an optimal Nash equilibrium point for a group of T time periods, as follows:
once the Nash equilibrium point for the T time periods is obtained, the internal units of the energy supplier and the park service provider are scheduled according to the load and price conditions at that equilibrium point. The lower-layer scheduling optimization flow is shown in FIG. 4.
Only two agents take part in the lower-layer optimization: the energy supplier and the service provider. Each minimizes its own total production cost. The energy supplier controls the output of each of its units, the service provider controls the output of each of its devices and its energy purchase power, and the constraints are unchanged.
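The patent solves the lower layer with the CPLEX solver, which is not used here. As a stand-in that illustrates the same goal (minimum production cost subject to device limits), the sketch below assumes a single period, linear marginal costs, output bounds and one power-balance constraint. Under those assumptions, merit-order dispatch (fill the cheapest headroom first) is cost-optimal. The unit names, costs and the function `merit_order_dispatch` are hypothetical.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, float, float, float]  # (name, marginal cost, p_min, p_max)


def merit_order_dispatch(units: List[Unit], demand: float) -> Tuple[Dict[str, float], float]:
    """Minimum-cost single-period dispatch with linear costs and output bounds."""
    # Every unit must run at least at its minimum output.
    dispatch = {name: p_min for name, _, p_min, _ in units}
    remaining = demand - sum(dispatch.values())
    if remaining < 0:
        raise ValueError("demand is below the total minimum output")
    # Fill the remaining demand with the cheapest headroom first.
    for name, _, p_min, p_max in sorted(units, key=lambda u: u[1]):
        extra = min(p_max - p_min, remaining)
        dispatch[name] += extra
        remaining -= extra
    if remaining > 1e-9:
        raise ValueError("demand exceeds the total capacity")
    cost = sum(c * dispatch[name] for name, c, _, _ in units)
    return dispatch, cost


units = [("G1", 30.0, 10.0, 100.0), ("G2", 20.0, 0.0, 50.0), ("G3", 40.0, 0.0, 80.0)]
print(merit_order_dispatch(units, 120.0))  # ({'G1': 70.0, 'G2': 50.0, 'G3': 0.0}, 3100.0)
```

A multi-period, multi-energy version with CHP and P2G coupling would need a full LP/MILP model, which is where a solver such as CPLEX comes in.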
Step 103: solving the Nash equilibrium point based on the Stackelberg game;
(1) Game solving principle based on reinforcement learning
Reinforcement learning (RL) is a common machine learning method in which an agent interacts with its environment and uses the reward it obtains as a reinforcement signal to guide its behavior, with the goal of maximizing that reward.
Each stakeholder in the park is treated as an agent, and each agent's reward depends not only on its own strategy but also on the strategies of the other agents. In the invention, each agent adopts a greedy strategy that gives priority to its own benefit. If, for every agent $i$,
$$R_i(s, a_1^*, \dots, a_i^*, \dots, a_n^*) \ge R_i(s, a_1^*, \dots, a_i, \dots, a_n^*) \qquad (12)$$
then the combined action $(a_1^*, \dots, a_n^*)$ is a Nash equilibrium in state $s$. Here $s$ denotes the state and $a$ an action; $R_i(s, a_1^*, \dots, a_i^*, \dots, a_n^*)$ is the return agent $i$ obtains under the state-action combination $(s, a_1^*, \dots, a_i^*, \dots, a_n^*)$, and $R_i(s, a_1^*, \dots, a_i, \dots, a_n^*)$ is the return agent $i$ obtains in state $s$ by switching to any other action $a_i$ while the other agents keep theirs.
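The equilibrium test referred to as formula (12) translates directly into a unilateral-deviation check. The sketch below assumes a fixed state $s$, discrete action indices and a caller-supplied return function; the name `is_pure_nash` and the prisoner's-dilemma payoffs are hypothetical toy choices, not the park model.

```python
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def is_pure_nash(joint: JointAction,
                 action_sets: Sequence[Sequence[int]],
                 reward: Callable[[int, JointAction], float]) -> bool:
    """Check formula (12) in a fixed state: no agent gains by deviating alone."""
    for i, actions in enumerate(action_sets):
        current = reward(i, joint)
        for a in actions:
            deviated = joint[:i] + (a,) + joint[i + 1:]
            if reward(i, deviated) > current:
                return False
    return True


# Toy two-agent game (prisoner's dilemma): action 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}


def pd_reward(i: int, joint: JointAction) -> float:
    # PAYOFF is keyed by (own action, opponent action).
    return PAYOFF[(joint[i], joint[1 - i])]


actions = [(0, 1), (0, 1)]
print(is_pure_nash((1, 1), actions, pd_reward))  # True
print(is_pure_nash((0, 0), actions, pd_reward))  # False
```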
(2) Game search method based on the reinforcement signal
The reinforcement signal is the reward an agent obtains while interacting with the environment, i.e. the reward for performing a given action. To find Nash equilibrium solutions quickly, the invention proposes a game search method based on the reinforcement signal, which uses the reinforcement signal of reinforcement learning to describe the physical meaning of a Nash equilibrium point in the multi-leader multi-follower game. For every combined action under each condition, the method judges from the agents' reinforcement signals whether the combined action satisfies formula (12) (the return constraint); if it does, the combined action is a Nash equilibrium solution. The execution flow of the method is shown in FIG. 5.
The method quickly finds the Nash equilibrium point for a given state and comprises the following steps:
The first step: discretize the action space;
The second step: according to the constraint conditions, each agent removes the action combinations that violate the constraints and keeps the feasible ones as its action set;
The third step: compute each agent's payoff under every combined action in the action set and store the payoff data in a table, called the R table;
The fourth step: taking the agents in order from agent 1 to agent n, find the selected agent's optimal action under every combined action of all the other, not yet selected agents, delete the selected agent's other actions, and keep only its optimal action. The optimal action is the one with the largest return in the R table, so for the selected agent the action set contains only its optimal actions.
The fifth step: store the combined actions remaining in the action set; these stored combined actions are the Nash equilibrium points for that state.
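The patent gives no code for these five steps. The Python sketch below is one possible reading, under stated assumptions: it treats the fourth step as best-response filtering, i.e., for each agent and each fixed choice of the other agents only that agent's highest-reward actions are kept, so the joint actions that survive for every agent are exactly those satisfying the reward condition of formula (12) (pure-strategy equilibria only). The function names (`discretize`, `game_search`, `cournot_profit`), the constraint, and the two-firm Cournot toy game are hypothetical illustrations, not the park model.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Joint = Tuple[float, ...]


def discretize(low: float, high: float, levels: int) -> List[float]:
    """Step 1: replace a continuous action range with `levels` evenly spaced candidates."""
    if levels < 2:
        return [low]
    step = (high - low) / (levels - 1)
    return [low + k * step for k in range(levels)]


def game_search(
    action_sets: Sequence[Sequence[float]],
    feasible: Callable[[Joint], bool],
    reward: Callable[[int, Joint], float],
    tol: float = 1e-9,
) -> List[Joint]:
    """Return every feasible joint action in which no agent can raise its own reward alone."""
    n = len(action_sets)
    # Step 2: enumerate joint actions and drop those that violate the constraints.
    joints = [a for a in itertools.product(*action_sets) if feasible(a)]
    # Step 3: the R table, i.e. every agent's reward under every feasible joint action.
    r_table: Dict[Joint, List[float]] = {a: [reward(i, a) for i in range(n)] for a in joints}
    survivors = set(joints)
    for i in range(n):
        # Step 4: best reward agent i can reach for each fixed choice of the other agents.
        best: Dict[Joint, float] = {}
        for a in joints:
            others = a[:i] + a[i + 1:]
            best[others] = max(best.get(others, float("-inf")), r_table[a][i])
        # Keep only joint actions in which agent i is already playing a best response.
        survivors = {a for a in survivors if r_table[a][i] >= best[a[:i] + a[i + 1:]] - tol}
    # Step 5: the survivors satisfy the reward condition for every agent.
    return sorted(survivors)


def cournot_profit(i: int, a: Joint) -> float:
    """Toy reward (hypothetical): quantity times price (10 - total output), minus unit cost 1."""
    return a[i] * (10.0 - sum(a)) - a[i]


grid = discretize(0.0, 6.0, 7)
equilibria = game_search([grid, grid], lambda a: sum(a) <= 10.0, cournot_profit)
print(equilibria)
```

On this coarse 7-level grid the toy game has three equilibria, (2, 4), (3, 3) and (4, 2), although the continuous game has only (3, 3); this shows how the choice of discretization level can change the solution found.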
Step 104: obtaining the multi-subject optimal strategy over the whole time horizon based on Nash-Q learning;
The Nash-Q algorithm is a commonly used artificial intelligence algorithm for solving multi-agent games. Its iterative formula is:
$$Q_i^{k+1}(s,a_1,\dots,a_i,\dots,a_n) = (1-\alpha)\,Q_i^{k}(s,a_1,\dots,a_i,\dots,a_n) + \alpha\left[R_i(s,a_1,\dots,a_i,\dots,a_n) + \beta\,\mathrm{NashQ}_i^{k}(s')\right]$$
where $Q_i^{k}(s,a_1,\dots,a_i,\dots,a_n)$ is the k-th iterate of agent i's value for the state-action combination $(s,a_1,\dots,a_i,\dots,a_n)$; $R_i(s,a_1,\dots,a_i,\dots,a_n)$ is the immediate reward agent i obtains when the agents take the action combination $(a_1,\dots,a_i,\dots,a_n)$ in state s; α is the learning rate; β is the discount factor; s' is the next state; and $\mathrm{NashQ}_i^{k}(s')$ is agent i's value at a Nash equilibrium solution of the next state.
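A minimal Python sketch of one Nash-Q update is shown below, assuming tabular Q-values stored per state as a map from joint action to the list of all agents' values. In this sketch NashQ_i(s') is taken as agent i's Q-value at a pure-strategy Nash equilibrium of the stage game formed by the next state's Q-values; picking the first equilibrium when there are several, treating a missing next state as terminal (value 0), and the names `pure_nash` and `nash_q_update` are assumptions for illustration. The defaults α = 0.01 and β = 0.9 are the values used in the embodiment.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Joint = Tuple[int, ...]
StageGame = Dict[Joint, List[float]]  # joint action -> [Q_1, ..., Q_n] in one state


def pure_nash(stage: StageGame, n: int, tol: float = 1e-9) -> List[Joint]:
    """Joint actions in which every agent is already best-responding (pure strategies only)."""
    survivors = set(stage)
    for i in range(n):
        best: Dict[Joint, float] = {}
        for a, values in stage.items():
            others = a[:i] + a[i + 1:]
            best[others] = max(best.get(others, float("-inf")), values[i])
        survivors = {a for a in survivors if stage[a][i] >= best[a[:i] + a[i + 1:]] - tol}
    return sorted(survivors)


def nash_q_update(q: Dict[Hashable, StageGame], s: Hashable, joint: Joint,
                  rewards: List[float], s_next: Optional[Hashable],
                  alpha: float = 0.01, beta: float = 0.9) -> List[float]:
    """Q_i(s,a) <- (1 - alpha) Q_i(s,a) + alpha [R_i(s,a) + beta NashQ_i(s')] for every agent i."""
    n = len(rewards)
    if s_next is None:
        nash_values = [0.0] * n  # terminal state (e.g. end of the day): no future value
    else:
        equilibria = pure_nash(q[s_next], n)
        if not equilibria:
            raise ValueError("no pure-strategy Nash equilibrium in the next state's stage game")
        nash_values = q[s_next][equilibria[0]]  # assumption: take the first if several exist
    old = q[s][joint]
    q[s][joint] = [(1 - alpha) * old[i] + alpha * (rewards[i] + beta * nash_values[i])
                   for i in range(n)]
    return q[s][joint]


# Hypothetical two-agent toy: hour "t1" -> hour "t2", two discrete actions (0, 1) per agent.
q = {
    "t1": {(0, 0): [0.0, 0.0]},
    "t2": {(0, 0): [3.0, 3.0], (0, 1): [0.0, 5.0], (1, 0): [5.0, 0.0], (1, 1): [1.0, 1.0]},
}
updated = nash_q_update(q, "t1", (0, 0), rewards=[2.0, 4.0], s_next="t2", alpha=0.5)
print(updated)
```

The toy stage game at "t2" has (1, 1) as its only pure equilibrium, so both agents bootstrap from a value of 1.0; the larger α = 0.5 is used only to make the change visible.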
Step 105: formulating the overall scheme for solving the optimal operation of the park comprehensive energy system considering the multi-subject game, solving for the optimal device output of each subject, and achieving optimal scheduling of the whole system under Nash equilibrium.
Taking the Nash equilibrium points obtained in steps 103 and 104 as input, the device scheduling result is obtained with the CPLEX solver, with minimum production cost as the objective.
The interaction strategies among the subjects are adjusted according to the result of step 104, and the output of each subject's internal devices is coordinated according to the result of step 105, so that the park comprehensive energy system operates at the optimal point under Nash equilibrium.
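The patent solves the lower layer with CPLEX over its full multi-energy model. As a much-simplified stand-in, not the patent's formulation, a single-period, electricity-only dispatch with linear costs and capacity limits is solved exactly by merit order: demand is filled from the cheapest source upward. The sketch below assumes that setting; `Unit`, `merit_order_dispatch`, `total_cost`, the costs and the capacities are hypothetical (the costs only loosely echo the price levels in the embodiment). Coupled devices such as CHP and P2G break this simple ordering and need an LP/MILP solver such as CPLEX.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    name: str
    cost: float      # marginal cost in USD/MWh (hypothetical)
    capacity: float  # available output in this hour, MW (hypothetical)


def merit_order_dispatch(units: List[Unit], demand: float) -> Dict[str, float]:
    """Least-cost dispatch for one period with linear costs and capacity limits only."""
    schedule: Dict[str, float] = {}
    remaining = demand
    for unit in sorted(units, key=lambda u: u.cost):  # cheapest source first
        output = min(unit.capacity, max(remaining, 0.0))
        schedule[unit.name] = output
        remaining -= output
    if remaining > 1e-9:
        raise ValueError(f"demand exceeds available capacity by {remaining:.3f} MW")
    return schedule


def total_cost(units: List[Unit], schedule: Dict[str, float]) -> float:
    """Production cost of a schedule under the linear cost assumption."""
    return sum(u.cost * schedule[u.name] for u in units)


units = [
    Unit("wind", 0.0, 3.0),        # near-zero marginal cost, capped at the forecast
    Unit("pv", 0.0, 2.0),
    Unit("grid", 120.0, 100.0),    # hypothetical: purchase price plus network fee
    Unit("supplier", 115.0, 5.0),  # hypothetical: at the supplier's price cap
]
plan = merit_order_dispatch(units, demand=8.0)
print(plan, total_cost(units, plan))
```

With a demand of 8 MW, wind and PV run at their caps and the remaining 3 MW comes from the cheaper of the two purchase sources.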
Compared with conventional multi-subject game solving schemes for parks, the proposed scheme has two main advantages:
1. The proposed game search method significantly speeds up multi-subject game solving with little or no loss of benefit; its shortest computing time is as low as 0.075% of that of the conventional particle swarm optimization algorithm (see Table 2 in the embodiment for detailed data). The specific advantages are as follows:
(1) In an actual park, renewable generation and user loads are uncertain. Conventional algorithms take a long time to compute, so the strategies they provide tend to lag. With the game search method, the faster computation lets the park follow real-time forecasts of renewable generation and user load, which keeps the park's operation strategy timely and improves its economic benefit.
(2) Power systems face many emergencies, such as customer overload, line short circuits and generator failures, which must be handled within seconds at most. Conventional algorithms take too long to provide a control strategy in time when an emergency occurs, which can cause large economic losses for the whole park. With the game search method, the power system can quickly return to its optimal operating point in an emergency, improving the economic benefit of the whole park.
(3) The power system contains many inductive elements, which give it a considerable time lag, so the scheduling process of the dispatch center must be completed within an hour or even several hours. The game search method reduces the delay of the game solution in the power system and thus enables faster scheduling.
2. The proposed hierarchical control scheme reduces the dimensionality of the multi-subject game with little or no loss of benefit. The specific advantages are as follows:
(1) Combined with the Nash-Q algorithm and the game search method, it further speeds up multi-subject game solving, giving the power system more headroom in handling various situations.
(2) The reduced dimensionality of the multi-subject game saves computer memory and thus the computing resources of the power system.
(3) Less information needs to be transmitted in the power system, which saves network channel traffic.
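As a rough, hypothetical illustration of the dimensionality argument: if each of n decision variables in the game is discretized to L levels, the game search must enumerate up to L^n joint actions before constraint pruning, so removing variables from the game (for example by handing device outputs to the lower-layer dispatch) shrinks the R table exponentially. The helper names, the 8 bytes per stored reward and the choice of 3 agents are assumptions, not figures from the patent; 50 is one of the discretization levels listed for Table 2.

```python
def r_table_rows(levels: int, n_vars: int) -> int:
    """Joint actions to enumerate when each of n_vars decision variables has `levels` candidates."""
    return levels ** n_vars


def r_table_bytes(levels: int, n_vars: int, n_agents: int, bytes_per_value: int = 8) -> int:
    """Rough memory for one float reward per agent per joint action, before constraint pruning."""
    return r_table_rows(levels, n_vars) * n_agents * bytes_per_value


for n_vars in (3, 4, 6):
    rows = r_table_rows(50, n_vars)
    print(f"{n_vars} variables: {rows:,} joint actions, ~{r_table_bytes(50, n_vars, 3) / 1e9:.3g} GB")
```

Going from 3 to 6 game variables at 50 levels multiplies the table size by 50^3 = 125,000.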
A specific example is given below to verify the feasibility of the above method:
The embodiment of the invention sets up a park comprehensive energy multi-subject game model with T = 24 h and builds the park comprehensive energy system model shown in Fig. 2. The energy supply side comprises the power grid, the heat source plant and the energy supplier; the service side is the park service provider; and the users have three types of load: cooling, heat and gas. The park service provider owns a cogeneration (CHP) unit, P2G equipment, a gas boiler, a wind generating set and a photovoltaic generating set. The embodiment sets the grid electricity purchase price to 110 USD/MWh, the heat source plant's heat purchase price to 100 USD/MWh, the grid company's network transmission fee to 10 USD/MWh, and the load reduction compensation cost to 5 USD/MWh. The unit environmental pollution penalty cost is 3 USD/MWh; the transformer efficiency is set to 0.95 and the P2G efficiency to 0.7; the CHP unit's electrical and thermal production efficiencies are 0.25 and 0.65, respectively; and the gas boiler efficiency is 0.9. The initial load is given in Fig. 6.
The embodiment sets the energy supplier's electricity selling price to at most 115 USD/MWh and its heat and gas selling prices to at most 110 USD/MWh. For the park service provider, the prices of all three energy types lie between 85 USD/MWh and 90 USD/MWh. The forecast wind and photovoltaic outputs are taken as the maximum outputs of the wind and photovoltaic generating sets at each time; the values are shown in Fig. 7. The whole game process is solved with the game search method and the Nash-Q algorithm, with learning rate α = 0.01 and discount factor β = 0.9.
The upper- and lower-layer optimization results of the example are analyzed below to reflect the actual situation of the park.
The upper-layer game results are shown in Figs. 8, 9 and 10.
Comparing Fig. 6 with Fig. 10 shows that the user load is reduced at every time, a result of the trade-off between the energy purchase cost function and the comfort function in the user's objective function. The pricing curves of the energy supplier and the park service provider in Figs. 8 and 9 show that both tend to set higher prices when user load is high, because at those times the additional energy sales revenue from a higher price exceeds the loss in the satisfaction function. For example, in periods 8-12 and 18-21, when the electrical load is high, both the energy supplier and the park service provider raise their electricity prices. In periods 1-5 and 22-24, by contrast, the user's electrical load is low, and lowering the price, which reduces the satisfaction cost, brings the greater benefit.
The scheduling results of the electrical, thermal and gas devices in the lower-layer optimization are shown in Figs. 11, 12 and 13, respectively.
In Fig. 11, wind and photovoltaic power are dispatched almost at their forecast values, because in this embodiment their production cost is very low and they incur no environmental pollution abatement cost. When wind and photovoltaic output cannot meet the electrical load, the load is supplied preferentially by purchasing power from the power grid and from the supplier; the purchased power in the figure is the sum of these two purchases. The electrical load can also be supplied by the CHP unit, for example at times 6, 13 and 17: at these times there is both an electrical power shortfall and a thermal power shortfall, and only in that case is the CHP unit started.
In Fig. 12, the park service provider can purchase heat either from the heating network or from the supplier, and which source it uses depends on whether the supplier's heat price or the heating network's price is lower in the current period. The output of the gas boiler (GB) depends on the energy supplier's gas price: when producing heat from gas is profitable for the service provider, the service provider uses the gas boiler. The CHP unit has already been analyzed above and is not discussed again.
In Fig. 13, all gas loads are met by direct purchase from the supplier. With its efficiency set to 0.7, the P2G equipment is less economical, and since the gas loads in this example can be met in other ways, the P2G equipment is not used. If the gas load at hour 10 is changed to three times its original value, the P2G power becomes 2.06 MW.
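As a rough check on why P2G stays idle, assume (this is not stated in the embodiment) that the electricity feeding the P2G equipment is bought at the grid purchase price of 110 USD/MWh, ignoring the network fee and any renewable surplus. With the 0.7 conversion efficiency, each MWh of gas then costs about

$$c_{\mathrm{P2G}} = \frac{110\ \mathrm{USD/MWh}}{0.7} \approx 157\ \mathrm{USD/MWh},$$

which is well above the energy supplier's gas price cap of 110 USD/MWh, so under this assumption buying gas directly from the supplier is the cheaper option.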
To highlight the advantages of the game approach, the following scenarios are analyzed for the park:
Scenario 1: the energy supplier, the park service provider and the users play a multi-subject game over electricity, heat and gas, and the users consider demand response;
Scenario 2: the energy supplier, the park service provider and the users play only the electricity and heat game; all gas prices are fixed, and the users consider demand response;
Scenario 3: the energy supplier, the park service provider and the users play only the electricity game; all heat and gas prices are fixed, and the users consider demand response;
Scenario 4: no game is played; all prices are fixed, and the users consider demand response.
The revenue results under the different scenarios are shown in Table 1. As more energy types participate in the game, the revenues of the service provider and the supplier increase, which verifies the effectiveness of the multi-subject game.
Table 1 Revenue of the service provider and the supplier under different scenarios
To verify the speed and correctness of the game search method, it is compared with the particle swarm algorithm on scenario 1; the results are shown in Table 2. The numbers in the first column of the table are the discretization levels of the game search method: 50 means the game search method with a discretization level of 50, 80 means a discretization level of 80, and so on. The particle swarm algorithm uses a population size of 50 and a maximum of 80 iterations.
Table 2 Comparison of the game search method and the particle swarm algorithm in scenario 1
As Table 2 shows, the game search method computes significantly faster, while its results change little.
Based on the same inventive concept, as an implementation of the above method and referring to Fig. 14, an embodiment of the present invention further provides a reinforcement-learning-based multi-subject operation optimization device for a comprehensive energy system, the device comprising:
a building module 1, used to build a multi-subject model of the park comprehensive energy system;
a dividing and interaction module 2, used to divide the optimization process of the multi-subject model into an upper-layer multi-subject game and a lower-layer device scheduling optimization, with source-load two-sided game interaction;
a screening and solving module 3, used to screen Nash equilibrium points by permutation and combination based on the Stackelberg game definition and, combined with the Nash-Q algorithm, obtain the stored optimal combined actions, i.e., the Nash equilibrium points in the current state;
and a calculating module 4, used to calculate the optimal operating state of each subject's devices with the CPLEX solver, taking the minimum production cost of each subject as the objective function.
It should be noted that the device description in the above embodiment corresponds to the description of the method embodiment and is not repeated here.
The modules and units may be executed by any device with computing capability, such as a computer, a single-chip microcomputer or a microcontroller; the embodiment of the invention does not limit the executing device, which is selected according to the needs of the practical application.
Based on the same inventive concept, an embodiment of the invention further provides a reinforcement-learning-based multi-subject operation optimization device for a comprehensive energy system, comprising a processor and a memory. The memory stores program instructions, and the processor calls the program instructions stored in the memory to make the device perform the following method steps of the embodiment:
building a multi-subject model of the park comprehensive energy system, dividing the optimization process of the multi-subject model into an upper-layer multi-subject game and a lower-layer device scheduling optimization, and adopting source-load two-sided game interaction;
screening Nash equilibrium points by permutation and combination based on the Stackelberg game definition and, combined with the Nash-Q algorithm, obtaining the stored optimal combined actions, i.e., the Nash equilibrium points in the current state;
and solving the optimal operating state of each subject's devices with the CPLEX solver, taking the minimum production cost of each subject as the objective function.
Obtaining the stored optimal combined action, i.e., the Nash equilibrium point in the current state, in combination with the Nash-Q algorithm comprises the following steps:
1) discretize the action space;
2) according to the reward constraint, each agent removes the action combinations that do not satisfy the constraint and keeps the actions that satisfy it as the action set;
3) compute each agent's payoff under every combined action in the action set and store the payoff data in a table;
4) taking the agents in order from agent 1 to agent n, find the selected agent's optimal action under every combined action of all the other, not yet selected agents, delete the selected agent's other actions, and keep only its optimal action;
5) store the combined actions remaining in the action set; the stored combined actions are the optimal strategy over the whole time period.
It should be noted that the device description in the above embodiment corresponds to the method description and is not repeated here.
The processor and the memory may be implemented by any device with computing capability, such as a computer, a single-chip microcomputer or a microcontroller; the embodiment of the invention does not limit the implementation, which is selected according to the needs of the practical application.
Data signals are transmitted between the memory and the processor over a bus, which is not described in detail in the embodiments of the present invention.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus on which the storage medium is located is controlled to execute the method steps in the foregoing embodiments.
The computer readable storage medium includes, but is not limited to, flash memory, hard disk, solid state disk, and the like.
It should be noted that the description of the readable storage medium in the above embodiment corresponds to the method description and is not repeated here.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the invention are produced in whole or in part.
The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium or a semiconductor medium, etc.
In the embodiments of the present invention, unless the model of a device is specifically stated, the models of other devices are not limited, as long as they can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and that the serial numbers of the above embodiments of the present invention are for description only and do not indicate their relative merits.
The above descriptions are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (7)

1. A comprehensive energy system multi-main-body operation optimization method based on reinforcement learning is characterized by comprising the following steps:
building a multi-subject model of a park comprehensive energy system, dividing the optimization process of the multi-subject model into an upper-layer multi-subject game and a lower-layer equipment scheduling optimization, and adopting a source-load double-side game interaction;
screening Nash equilibrium points in a permutation and combination mode based on the Stackelberg game definition, and obtaining the optimal combination action in the whole time period by combining a Nash-Q algorithm, namely the optimal strategy of the current typical day; and solving the optimal running state of each main body device by using a CPLEX solver with the minimum production cost of each main body as a target function.
2. The comprehensive energy system multi-subject operation optimization method based on reinforcement learning as claimed in claim 1, wherein screening the Nash equilibrium points by permutation and combination based on the Stackelberg game definition specifically comprises:
applying the reinforcement signal of reinforcement learning to describe the actual physical meaning of the Nash equilibrium point in the multi-leader-follower game, judging according to the reinforcement signal whether a combined action satisfies the reward constraint condition of every agent, and if so, determining that the combined action is a Nash equilibrium solution.
3. The comprehensive energy system multi-subject operation optimization method based on reinforcement learning according to claim 1 or 2, wherein obtaining the optimal combined action over the whole time period in combination with the Nash-Q algorithm, i.e., the optimal strategy of the current typical day, specifically comprises:
1) discretizing the action space;
2) each agent removing, according to the reward constraint condition, the action combinations that do not satisfy the constraint and keeping the actions that satisfy it as the action set;
3) calculating each agent's payoff under every combined action in the action set and storing the payoff data in a table;
4) taking the agents in order from agent 1 to agent n, searching for the selected agent's optimal action under every combined action of all the other, not yet selected agents, deleting the selected agent's other actions, and keeping only its optimal action;
5) storing the combined actions remaining in the action set, the stored combined actions being the optimal strategy over the whole time period.
4. The comprehensive energy system multi-subject operation optimization method based on reinforcement learning according to claim 3, wherein searching for the optimal action of the selected agent specifically comprises:
selecting the action with the maximum return value in the table, so that the action set of the selected agent contains only the optimal action.
5. An integrated energy system multi-subject operation optimization device based on reinforcement learning, the device comprising:
a building module, used to build a multi-subject model of the park comprehensive energy system;
a dividing and interaction module, used to divide the optimization process of the multi-subject model into an upper-layer multi-subject game and a lower-layer device scheduling optimization, with source-load two-sided game interaction;
a screening and solving module, used to screen Nash equilibrium points by permutation and combination based on the Stackelberg game definition and, combined with the Nash-Q algorithm, obtain the stored optimal combined actions, i.e., the Nash equilibrium points in the current state;
and a calculating module, used to calculate the optimal operating state of each subject's devices with the CPLEX solver, taking the minimum production cost of each subject as the objective function.
6. An integrated energy system multi-subject operation optimization device based on reinforcement learning, the device comprising: a processor and a memory, the memory having stored therein program instructions, the processor calling upon the program instructions stored in the memory to cause the apparatus to perform the method steps of any of claims 1-4.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method steps of any of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110318894.9A CN113177655A (en) 2021-03-25 2021-03-25 Comprehensive energy system multi-main-body operation optimization method and device based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN113177655A true CN113177655A (en) 2021-07-27

Family

ID=76922700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110318894.9A Pending CN113177655A (en) 2021-03-25 2021-03-25 Comprehensive energy system multi-main-body operation optimization method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113177655A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081758A (en) * 2022-08-22 2022-09-20 广东电网有限责任公司肇庆供电局 Calculation transfer demand response system oriented to coordination data center and power grid
CN115081758B (en) * 2022-08-22 2023-01-03 广东电网有限责任公司肇庆供电局 Calculation transfer demand response system oriented to coordination data center and power grid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210727