CN113809780A - An optimal scheduling method for microgrid based on improved Q-learning penalty selection - Google Patents

An optimal scheduling method for microgrid based on improved Q-learning penalty selection

Info

Publication number
CN113809780A
Authority
CN
China
Prior art keywords
cost
microgrid
power
grid
wind
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111115317.6A
Other languages
Chinese (zh)
Other versions
CN113809780B (en)
Inventor
姜河
周航
安琦
叶瀚文
李兆滢
赵琰
林盛
赵涛
胡宸嘉
白金禹
辛长庆
何雨桐
王亚茹
姜铭坤
魏莫杋
孙笑雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Engineering
Original Assignee
Shenyang Institute of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Engineering filed Critical Shenyang Institute of Engineering
Priority to CN202111115317.6A priority Critical patent/CN113809780B/en
Publication of CN113809780A publication Critical patent/CN113809780A/en
Application granted granted Critical
Publication of CN113809780B publication Critical patent/CN113809780B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for AC mains or AC distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315Needs-based resource requirements planning or analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for AC mains or AC distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/28The renewable source being wind energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/40Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00Energy generation through renewable energy sources
    • Y02E10/50Photovoltaic [PV] energy
    • Y02E10/56Power conversion systems, e.g. maximum power point trackers
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70Smart grids as climate change mitigation technology in the energy generation sector
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E70/00Other energy conversion or management systems reducing GHG emissions
    • Y02E70/30Systems combining energy storage with energy generation of non-fossil origin
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Power Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Feedback Control In General (AREA)

Abstract

本发明涉及一种基于改进Q学习惩罚选择的微电网优化调度方法,包括如下步骤:步骤1:以微电网内部常规机组运行成本、环境效益成本、大电网功率交互成本构建目标函数;步骤2:建立微电网运行的约束条件;步骤3:构造以最高弃风弃光成本与风光完全消纳成本的为最高与最低阈值的惩罚回报函数;步骤4:采用多元宇宙优化算法改进传统Q学习算法;步骤5:将步骤1所得目标函数进行马尔科夫决策描述处理,并以改进的Q学习算法对所得状态与空间描述进行规划求解。本发明降低了微电网运行调度中可再生能源的弃用率,减少了微电网与大电网能量交互的波动性,解决了传统优化方法响应慢、不收敛的问题,提升了微电网运行的稳定性与经济性。

Figure 202111115317

The invention relates to a microgrid optimal scheduling method based on improved Q-learning penalty selection, comprising the following steps. Step 1: construct an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power exchange with the large power grid. Step 2: establish the constraints on microgrid operation. Step 3: construct a penalty return function whose highest and lowest thresholds are the maximum wind and solar curtailment cost and the cost of complete wind and solar accommodation. Step 4: improve the traditional Q-learning algorithm with the multi-verse optimization algorithm. Step 5: give the objective function obtained in step 1 a Markov decision description, and solve the resulting state and space descriptions with the improved Q-learning algorithm. The invention reduces the curtailment rate of renewable energy in microgrid operation scheduling, reduces the volatility of the energy exchange between the microgrid and the large grid, addresses the slow response and non-convergence of traditional optimization methods, and improves the stability and economy of microgrid operation.

Description

Microgrid optimization scheduling method based on improved Q learning penalty selection
Technical Field
The invention relates to a microgrid economic dispatching method, in particular to a microgrid optimal dispatching method based on improved Q learning penalty selection.
Background
With the continuing adjustment of the energy structure, microgrid systems, which are composed of many types of energy equipment and are widely dispersed, have come into wide use thanks to advantages such as independent power transmission and distribution, rapid scheduling, a high share of renewable energy, and islanded operation. A microgrid system can improve the quality of power supply in remote areas and can effectively prevent problems such as supply interruptions caused by natural disasters.
With continued national policy support for the new energy industry, the scale of grid-connected wind and solar generation keeps growing. However, because wind and photovoltaic output is fluctuating and uncertain, connecting it to the microgrid on a large scale causes problems such as power imbalance inside the system and reduced power quality. How to raise the share of new energy generation while keeping the microgrid system stable and safe is a problem that urgently needs solving.
A microgrid contains conventional units, new energy generating units, energy storage units, and various load demands. Traditional scheduling considers only the generation cost of a single unit, which cannot meet the fast, economic, environmentally friendly, and safe scheduling that a microgrid system pursues. It is therefore important to study multi-objective comprehensive scheduling of the microgrid system, the new operating conditions of the various units, and the optimized coordination between the units and the load demands.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a microgrid optimal scheduling method based on improved Q-learning penalty selection. A reward-and-penalty stepped penalty return function for wind and solar curtailment is introduced into the traditional microgrid scheduling method, in which conventional units, wind and solar units, and energy storage units operate in coordination. The states and actions of the microgrid scheduling problem are described by a Q-learning algorithm improved with the multi-verse optimization algorithm, so that the lowest overall scheduling cost is achieved while the penalty return function is satisfied. This reduces the curtailment rate of renewable energy and the volatility of energy exchange between the microgrid and the large power grid, addresses the slow response and non-convergence of traditional optimization methods, and improves the stability and economy of microgrid operation.
To solve the problems in the prior art, the invention adopts the following technical scheme:
A microgrid optimal scheduling method based on improved Q-learning penalty selection, comprising the following steps:
Step 1: construct an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power exchange with the large power grid;
Step 2: establish the constraints on microgrid operation;
Step 3: construct a penalty return function whose highest and lowest thresholds are the maximum wind and solar curtailment cost and the cost of complete wind and solar accommodation;
Step 4: improve the traditional Q-learning algorithm with the multi-verse optimization algorithm;
The state-action function of the improved Q-learning algorithm is expressed as follows:
Figure BDA0003275050220000021
where $F_s$ is the state feature of traditional Q-learning;
Figure BDA0003275050220000022
is the action feature optimized by the multi-verse optimization algorithm;
Figure BDA0003275050220000023
are the initial values of the state feature and the action feature, respectively; $E_{\text{mvo-p}}$ is the expected value under the MVO-Q strategy; $T$ is the iteration number;
Figure BDA0003275050220000024
and $\gamma_T$ are the reward value and the discount coefficient at iteration $T$, respectively;
Step 5: give the objective function obtained in step 1 a Markov decision description, and solve the resulting state and action descriptions with the improved Q-learning algorithm.
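For orientation, the baseline that step 4 modifies is the standard tabular Q-learning update. The sketch below shows only that textbook rule; it is not the MVO-Q state-action function of the invention, whose formula is given only as a figure. The table layout, the learning rate `alpha`, the discount `gamma`, and the convention of feeding a cost in as a negative reward are all assumptions made for illustration.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One textbook Q-learning step on a dict-of-lists table (assumed layout)."""
    best_next = max(q[next_state])              # greedy value of the next state
    td_target = reward + gamma * best_next      # bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical toy table: 4 states, 3 discretized dispatch actions.
q_table = {s: [0.0, 0.0, 0.0] for s in range(4)}
# A scheduling cost of 5 is passed as reward -5 (assumed convention).
new_value = q_update(q_table, state=0, action=1, reward=-5.0, next_state=1)
```

In the method described here, the multi-verse optimization algorithm takes over the action selection that `epsilon_greedy` performs in this baseline.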
Step 1 comprises the following sub-steps:
Step 1.1: with a high proportion of grid-connected wind and solar power, a conventional unit operates in either a normal operating state or a low-load operating state, and the conventional generation cost inside the microgrid is expressed as follows:
Figure BDA0003275050220000031
where $a$, $b$ and $c$ are the cost factors of a conventional unit in the normal operating state; $P_i$ is the output power of the $i$-th conventional unit; $g$, $h$, $l$ and $p$ are the cost factors in the low-load operating state; $kP_{i,\max}$ is the critical power between the normal operating state and the low-load operating state of the $i$-th conventional unit;
Step 1.2: given the uncertainty of wind and solar output, the start-stop cost of the conventional units is expressed as follows:
Figure BDA0003275050220000032
where $F_{\text{on-off}}$ is the start-stop cost of the conventional units; $C$ is the number of start-stops of a unit; $K(t_{i,r})$ is the cost of the $r$-th start of the $i$-th unit; $t_{i,r}$ is the continuous shutdown time of the $i$-th unit before its $r$-th start; $C(t_{i,r})$ is the operating cost of the associated auxiliary systems for a cold start of the unit; $t_{\text{cold-hot}}$ is the critical shutdown time separating cold starts from hot starts;
Step 1.3: the pollutants emitted by conventional units during generation are mainly nitrogen oxides, sulfur oxides, carbon dioxide and the like, and the treatment cost is expressed as follows:
Figure BDA0003275050220000033
$E_m(P_i) = (\alpha_{i,m} + \beta_{i,m} P_i + \gamma_{i,m} P_i^2) + \zeta_{i,m} \exp(\delta_{i,m} P_i)$
where $F_g$ is the pollution treatment cost of the conventional units; $m$ is the type of pollutant emitted; $E_m(P_i)$ is the pollutant emission of the $i$-th unit; $\eta_m$ is the treatment cost coefficient of the $m$-th pollutant;
$\alpha_{i,m}$, $\beta_{i,m}$, $\gamma_{i,m}$, $\zeta_{i,m}$ and $\delta_{i,m}$ are the emission coefficients of the $m$-th pollutant emitted by the $i$-th unit;
Step 1.4: the power exchange cost between the microgrid and the large grid is expressed as follows:
Figure BDA0003275050220000034
where $\lambda_p$ is the electricity selling or purchasing state of the microgrid, taking the value 1 when selling and -1 when purchasing; $P_{\text{su/sh}}$ is the surplus or shortage of power inside the microgrid;
Figure BDA0003275050220000041
is the price at which electricity is sold to or purchased from the large power grid;
Step 1.5: the objective function is constructed from the operating cost of the conventional units in the microgrid, the environmental benefit cost, and the cost of power exchange with the main grid, and is expressed as follows:
$\min F = F_{cf} + F_{\text{on-off}} + F_g + F_{grid}$
where $F$ is the objective function value of microgrid system operation; $F_{cf}$, $F_{\text{on-off}}$, $F_g$ and $F_{grid}$ are, respectively, the operating cost of the conventional units, the start-stop cost, the pollution treatment cost, and the cost of power exchange between the microgrid and the large grid.
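As a rough illustration of how the cost terms of step 1 combine, the sketch below evaluates the emission function $E_m(P_i)$ exactly as written in step 1.3 and sums the four components as in step 1.5. Everything else is an assumption, because the corresponding formulas are available only as figures: the quadratic (normal state) and cubic (low-load state) forms of the generation cost, the rule that the normal state applies when $P_i \ge kP_{i,\max}$, the double sum used for the treatment cost, and the sign convention of the grid cost. The argument `p0` stands in for the cost factor $p$ to avoid clashing with the power argument.

```python
import math

def generation_cost(p, p_max, k, normal, low_load):
    """Piecewise cost of one conventional unit (polynomial forms are assumed)."""
    a, b, c = normal
    g, h, l, p0 = low_load
    if p >= k * p_max:                      # assumed: normal state at or above k*P_max
        return a * p**2 + b * p + c
    return g * p**3 + h * p**2 + l * p + p0  # assumed low-load form

def emission(p, coeffs):
    """E_m(P_i) = (alpha + beta*P + gamma*P^2) + zeta*exp(delta*P), as in step 1.3."""
    alpha, beta, gamma, zeta, delta = coeffs
    return (alpha + beta * p + gamma * p**2) + zeta * math.exp(delta * p)

def treatment_cost(powers, coeffs, eta):
    """Assumed form F_g = sum_i sum_m eta[m] * E_m(P_i); coeffs[i][m] holds 5 values."""
    return sum(eta[m] * emission(p, coeffs[i][m])
               for i, p in enumerate(powers)
               for m in range(len(eta)))

def grid_cost(lambda_p, power, price):
    """Assumed sign: selling (lambda_p = 1) lowers cost, buying (-1) raises it."""
    return -lambda_p * power * price

def total_cost(f_cf, f_on_off, f_g, f_grid):
    """Objective of step 1.5: F = F_cf + F_on-off + F_g + F_grid."""
    return f_cf + f_on_off + f_g + f_grid
```

All numeric inputs a caller passes to these functions would be placeholders; the source gives no coefficient values.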
Step 2 comprises the following sub-steps:
Step 2.1: the power balance constraint is expressed as follows:
Figure BDA0003275050220000042
where
Figure BDA0003275050220000043
are the output power of the conventional units, wind power, and photovoltaics in period $t$, respectively;
Figure BDA0003275050220000044
are the power stored and released by the storage battery in period $t$; $P_t^{\text{grid}}$ is the power exchanged with the large power grid; $P_t^{L}$ is the total load power in period $t$; $T$ is the total operating horizon of the microgrid, taken as 24 hours;
Step 2.2: the storage battery state-of-charge constraint is expressed as follows:
$SOC_{\min} \le SOC(t) \le SOC_{\max}$
where $SOC(t)$ is the state of charge of the storage battery at time $t$; $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum states of charge of the battery, respectively;
Step 2.3: for a conventional unit, the accumulated start-up and shutdown times should exceed the corresponding minimum continuous start-up and shutdown times, and the constraint is expressed as follows:
Figure BDA0003275050220000045
where
Figure BDA0003275050220000046
is the minimum continuous shutdown time of the unit;
Figure BDA0003275050220000047
is the minimum continuous start-up time of the unit.
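The three constraints of step 2 can be checked for a candidate schedule along the following lines. This is a minimal sketch under stated assumptions: the sign convention of the power balance (battery discharge and grid import counted as positive supply) is assumed because the balance equation is available only as a figure, the minimum on/off test uses "at least" rather than a strict inequality, and runs touching either end of the horizon are skipped because their true length is unknown. Only the state-of-charge bound is taken directly from step 2.2.

```python
from itertools import groupby

def power_balanced(p_conv, p_wind, p_pv, p_batt, p_grid, p_load, tol=1e-6):
    """Assumed convention: p_batt > 0 when discharging, p_grid > 0 when importing."""
    return abs(p_conv + p_wind + p_pv + p_batt + p_grid - p_load) <= tol

def soc_ok(soc, soc_min, soc_max):
    """SOC_min <= SOC(t) <= SOC_max, as in step 2.2."""
    return soc_min <= soc <= soc_max

def min_up_down_ok(status, min_on, min_off):
    """status is an hourly 0/1 list; every interior run must last long enough."""
    runs = [(state, len(list(group))) for state, group in groupby(status)]
    interior = runs[1:-1]  # edge runs may extend beyond the scheduling horizon
    return all(length >= (min_on if state else min_off)
               for state, length in interior)

# Hypothetical 8-hour on/off pattern of one unit.
pattern = [1, 1, 1, 0, 0, 1, 1, 1]
feasible = min_up_down_ok(pattern, min_on=2, min_off=2)
```

A scheduler would typically reject, or penalize, any action whose resulting schedule fails one of these checks.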
Step 3 comprises the following sub-steps:
Step 3.1: the lowest and highest limits on the amount of wind and solar curtailment in the microgrid are specified, and the range from complete wind and solar accommodation up to the highest curtailment limit is divided into growth intervals $\chi_n$, as follows:
Figure BDA0003275050220000051
Figure BDA0003275050220000052
where
Figure BDA0003275050220000053
are the highest and lowest limits on the wind and solar curtailment amount specified in the system, respectively; $N$ is the number of intervals; $\lambda$ is the growth step of the specified curtailment amount;
Step 3.2: according to the quota intervals the system specifies for the wind and solar curtailment amount, the curtailment amount is linearized to obtain the reward-and-penalty stepped penalty return function for wind and solar curtailment, expressed as follows:
Figure BDA0003275050220000054
where $D_{ab}$ is the value of the curtailment penalty return function; $P_{ab,wp}$ is the wind and solar curtailment amount of the system; $c$ is the curtailment penalty coefficient; $k$ is the growth step of the penalty coefficient between intervals.
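One plausible reading of the stepped penalty in step 3 is a ladder tariff: the range between the lowest and highest curtailment limits is cut into $N$ equal intervals of width $\lambda$, and the penalty rate grows by $k$ from one interval to the next, starting at $c$. The sketch below implements that reading only. The exact piecewise formula is available only as a figure, so the equal interval width, the rate `c + n*k` in interval `n`, the zero penalty at or below the lowest limit, and the clamping at the highest limit are all assumptions.

```python
def curtailment_penalty(p_ab, p_min, p_max, n_intervals, c, k):
    """Ladder-style curtailment penalty D_ab (assumed form, not the patent's figure)."""
    step = (p_max - p_min) / n_intervals            # growth step lambda
    # Amount above the lowest limit, clamped to the highest limit (assumption).
    excess = min(max(p_ab - p_min, 0.0), p_max - p_min)
    penalty = 0.0
    for n in range(n_intervals):
        # Portion of the excess that falls inside interval n.
        part = min(max(excess - n * step, 0.0), step)
        penalty += (c + n * k) * part               # rate rises by k per interval
    return penalty

# Hypothetical numbers: limits 0 and 30, three intervals, base rate 1.0, step 0.5.
example_penalty = curtailment_penalty(25.0, 0.0, 30.0, 3, c=1.0, k=0.5)
```

With these hypothetical numbers the three intervals contribute 10 at rate 1.0, 10 at rate 1.5, and 5 at rate 2.0.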
Step 5 comprises the following sub-steps:
Step 5.1: the objective function in step 1 comprises the unit operating cost, the environmental benefit cost, and the cost of power exchange with the main grid; the state description of each entity in the system at iteration $T$ is expressed as:
$F_s = [F_{cf}, F_{\text{on-off}}, E_m(P_i), F_g, F_{grid}, F]$
Step 5.2: the constraints in step 2 involve the output power of the conventional units, the wind and photovoltaic output power, the power stored and released by the storage battery, the power exchanged with the large grid, and the total load power. Taking the reward-and-penalty principle for the curtailment amount into account as well, these quantities are discretized to obtain the action description of each entity in the system at iteration $T$, expressed as follows:
Figure BDA0003275050220000061
Step 5.3: the optimal value of the objective function is solved with the Q-learning algorithm improved by the multi-verse algorithm, as follows:
5.31) specify the lowest and highest limits on wind and solar curtailment in the microgrid, divide the curtailment penalty intervals, and initialize the parameters of the multi-verse algorithm: the number of universe individuals $N$, the dimension $n$, the maximum number of iterations MAX, and the initial wormhole position $X_{ij}$;
5.32) randomly select the initial state of the Q-learning algorithm
Figure BDA0003275050220000062
5.33) obtain the initial action of the Q-learning greedy strategy optimized by the multi-verse algorithm
Figure BDA0003275050220000063
5.34) output the initial state based on the greedy strategy
Figure BDA0003275050220000064
and make the initial optimization preparation;
5.35) solve for the optimal value $\min F$ of the objective function according to the optimized initial action;
5.36) judge whether the error precision is met;
5.37) if the error precision is met, select the action
Figure BDA0003275050220000065
and compute the optimal-value update and the wormhole distance of the multi-verse algorithm while proceeding to the next iteration; the optimal-value update formula is as follows:
Figure BDA0003275050220000066
in the formula: xjThe position of the optimal universe individual is determined; p is a radical of1/p2/p3∈[0,1]Is a random number; epsilon is the rate of cosmic expansion; u. ofj,ljThe upper and lower limits of x; eta is the proportion of wormholes in all individuals, is specified by the iteration number L and the maximum iteration number L, and is expressed as follows:
Figure BDA0003275050220000067
The optimization mechanism of the multi-verse algorithm is that black holes and swinging are selected by a roulette mechanism, and an individual moves within the current optimal universe through expansion and rotation; the optimal travelling distance during this movement depends on the iteration precision $p$ and is expressed as follows:
Figure BDA0003275050220000071
5.38) if the error precision is not met, discard the action of this iteration, select a new action, and return to step 5.35);
5.39) judge whether the objective function value is the global optimum; if not, return to step 5.38);
5.40) if it is the global optimum, output the final state and action;
5.41) calculate the final result.
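The update in step 5.37) resembles the wormhole step of the published multi-verse optimizer, so the sketch below follows that published form rather than the formulas of this document, which are available only as figures. Mapping $\eta$ to the wormhole existence probability and $\varepsilon$ to the travelling distance rate is an assumption, as are the default bounds `eta_min=0.2`, `eta_max=1.0` and the precision `p=6.0`, and the final clipping to the bounds.

```python
import random

def wormhole_rate(l, L, eta_min=0.2, eta_max=1.0):
    """Wormhole proportion, growing linearly with the iteration count l."""
    return eta_min + l * (eta_max - eta_min) / L

def travel_rate(l, L, p=6.0):
    """Travelling distance rate, shrinking toward 0 as l approaches L."""
    return 1.0 - l ** (1.0 / p) / L ** (1.0 / p)

def wormhole_update(x, best, lower, upper, l, L, rng=random):
    """Move each coordinate of x around the best universe with probability eta."""
    eta = wormhole_rate(l, L)
    eps = travel_rate(l, L)
    new_x = []
    for xj, bj, lj, uj in zip(x, best, lower, upper):
        if rng.random() < eta:                        # a wormhole opens
            jump = eps * ((uj - lj) * rng.random() + lj)
            xj = bj + jump if rng.random() < 0.5 else bj - jump
        new_x.append(min(max(xj, lj), uj))            # keep within [l_j, u_j]
    return new_x

# Hypothetical two-dimensional action vector at the final iteration.
final_x = wormhole_update([0.0, 5.0], [1.0, 2.0], [0.0, 0.0], [10.0, 10.0], 100, 100)
```

At the last iteration the travelling distance rate is zero and the wormhole proportion reaches its upper bound, so every coordinate collapses onto the best universe; earlier iterations explore more widely.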
Further, in step 3.2, the reward-and-penalty stepped curtailment penalty return function is used as the action value in the improved Q-learning method.
Further, in step 4, the multi-verse optimization algorithm is used to improve the optimal value of the state feature corresponding to the objective function in the traditional Q-learning algorithm.
Further, step 4 improves the traditional Q-learning algorithm with the multi-verse optimization algorithm, specifically as follows:
The multi-verse algorithm optimizes the multi-level greedy actions of Q-learning, which reduces redundant actions during optimization and further reduces the error precision $\gamma_T$ of the Q iteration result $Q_{\text{mvo-q}}$. When the iteration error precision is not met, the next state-action strategy is carried out and the next round of optimization is performed with the multi-verse algorithm; the optimization formula is expressed as follows:
Figure BDA0003275050220000072
Figure BDA0003275050220000073
The advantages and beneficial effects of the invention are as follows:
The method balances wind and solar accommodation, environmental benefit, and economic benefit. It builds a mathematical model of the objective function that covers the conventional units, wind and solar units, and energy storage units inside the microgrid, the interaction with the large power grid, and pollutant treatment, and it introduces a reward-and-penalty stepped curtailment penalty return function to further plan the grid connection of wind and solar generation. It also proposes a Q-learning algorithm improved by the multi-verse algorithm, in which the state and action parameters of traditional Q-learning are mapped to the objective function and constraints of microgrid scheduling and to the curtailment penalty, so that environmental benefit is maximized and wind and solar power is fully accommodated while stable power supply of the system is maintained. The improved Q-learning algorithm optimizes with a planning mechanism, which avoids the local convergence of the optimum that arises in traditional algorithms, takes the selection mechanism of the curtailment penalty return into account, and solves the multi-objective optimization problem in the microgrid scheduling model.
The method reduces the curtailment rate of renewable energy in microgrid operation scheduling, reduces the fluctuation of the energy exchange between the microgrid and the large grid, addresses the slow response and non-convergence of traditional optimization methods, and improves the stability and economy of microgrid operation.
Drawings
The invention is described in further detail below with reference to the figures and examples:
FIG. 1 is a flow chart of the Q-learning algorithm improved by the multi-verse optimization algorithm;
FIG. 2 is the simulated wind and solar accommodation curve;
FIG. 3 is the simulated comprehensive cost curve;
FIG. 4 is a flow chart of the microgrid optimal scheduling method based on improved Q-learning penalty selection according to the invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
As shown in FIG. 4, the microgrid optimal scheduling method based on improved Q-learning penalty selection of the invention comprises the following steps:
Step 1: construct an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power exchange with the main grid;
Step 1.1: with a high proportion of grid-connected wind and solar power, a conventional unit operates in either a normal operating state or a low-load operating state, and the conventional generation cost inside the microgrid is expressed as follows:
Figure BDA0003275050220000091
where $F_{cf}$ is the operating cost of the conventional units; $a$, $b$ and $c$ are the cost factors of a conventional unit in the normal operating state; $P_i$ is the output power of the $i$-th conventional unit; $g$, $h$, $l$ and $p$ are the cost factors in the low-load operating state; $kP_{i,\max}$ is the critical power between the normal operating state and the low-load operating state of the $i$-th conventional unit.
Step 1.2: given the uncertainty of wind and solar output, the start-stop cost of the conventional units is expressed as follows:
Figure BDA0003275050220000092
where $F_{\text{on-off}}$ is the start-stop cost of the conventional units; $C$ is the number of start-stops of a unit; $K(t_{i,r})$ is the cost of the $r$-th start of the $i$-th unit; $t_{i,r}$ is the continuous shutdown time of the $i$-th unit before its $r$-th start; $C(t_{i,r})$ is the operating cost of the associated auxiliary systems for a cold start of the unit; $t_{\text{cold-hot}}$ is the critical shutdown time separating cold starts from hot starts.
Step 1.3: the pollutants emitted by conventional units during generation are mainly nitrogen oxides, sulfur oxides, carbon dioxide and the like, and the treatment cost is expressed as follows:
Figure BDA0003275050220000093
$E_m(P_i) = (\alpha_{i,m} + \beta_{i,m} P_i + \gamma_{i,m} P_i^2) + \zeta_{i,m} \exp(\delta_{i,m} P_i)$
where $F_g$ is the pollution treatment cost of the conventional units; $m$ is the type of pollutant emitted; $E_m(P_i)$ is the pollutant emission of the $i$-th unit; $\eta_m$ is the treatment cost coefficient of the $m$-th pollutant; $\alpha_{i,m}$, $\beta_{i,m}$, $\gamma_{i,m}$, $\zeta_{i,m}$ and $\delta_{i,m}$ are the emission coefficients of the $m$-th pollutant emitted by the $i$-th unit;
Step 1.4: The cost of power exchange between the microgrid and the large grid is expressed as follows:
$$F_{grid} = \lambda_p P_{su/sh} C_t^{grid}$$
In the formula: $F_{grid}$ is the cost of power interaction between the microgrid and the large grid; $\lambda_p$ is the electricity selling or purchasing state of the microgrid, taking the value 1 when selling and -1 when purchasing; $P_{su/sh}$ is the power surplus or shortage inside the microgrid; $C_t^{grid}$ is the price at which electricity is sold to or purchased from the large grid.
Step 1.5: the method is characterized in that an objective function is constructed according to the running cost, the environmental benefit cost and the power exchange cost of a main power grid of a conventional unit in a microgrid, and is expressed as follows:
minF=Fcf+Fon-off+Fg+Fgrid
in the formula: f is an objective function value of the micro-grid system operation; fcf、Fon-off、Fg、FgridRespectively the running cost, the starting and stopping cost, the pollution treatment cost, the micro-grid and the large-scale power grid of the conventional unitGrid power interaction cost.
Step 2: Establish the constraints on microgrid operation.
Step 2.1: The power balance constraint is expressed as follows:
Figure BDA0003275050220000103
In the formula:
Figure BDA0003275050220000104
are the output powers of the conventional units, wind power and photovoltaics in period t, respectively;
Figure BDA0003275050220000105
is the storage and release power of the battery in period t; $P_t^{grid}$ is the power exchanged with the large grid; $P_t^{L}$ is the total load power in period t; T is the total operating period of the microgrid, taken as 24 h.
Step 2.2: The battery storage and release state constraint is expressed as follows:
$$SOC_{\min} \le SOC(t) \le SOC_{\max}$$
In the formula: SOC(t) is the state of charge of the battery at time t; $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum states of charge of the battery, respectively.
Step 2.3: For a conventional unit, the accumulated on or off time should be greater than the minimum continuous on or off time, and the constraint is expressed as follows:
Figure BDA0003275050220000111
In the formula:
Figure BDA0003275050220000112
is the minimum continuous shutdown time of the unit;
Figure BDA0003275050220000113
is the minimum continuous running time of the unit.
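The state-of-charge and minimum on/off time constraints of step 2 lend themselves to a simple feasibility check on a candidate schedule. The Python sketch below is one way to write it, under stated assumptions: the unit status is a list of 0/1 values per period, "greater than" is treated as "at least", and the first and last runs are skipped because their true length extends beyond the 24 h horizon. The power balance constraint is omitted because its formula is only available as an image. All names are hypothetical.

```python
from itertools import groupby


def soc_within_limits(soc, soc_min, soc_max):
    """True if every state-of-charge sample lies in [soc_min, soc_max]."""
    return all(soc_min <= s <= soc_max for s in soc)


def min_up_down_ok(status, min_on, min_off):
    """Check minimum continuous on/off times for a 0/1 status sequence.

    Only interior runs are checked; boundary runs are truncated by the
    scheduling horizon, so their real duration is unknown (assumption).
    """
    runs = [(state, len(list(group))) for state, group in groupby(status)]
    for state, length in runs[1:-1]:
        required = min_on if state == 1 else min_off
        if length < required:
            return False
    return True
```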
Step 3: Construct a penalty return function whose highest and lowest thresholds are the maximum wind and solar curtailment cost and the cost at full wind and solar absorption.
Step 3.1: Specify the minimum and maximum quotas for curtailed wind and solar energy inside the microgrid, and divide the range from full wind and solar absorption to the maximum curtailment quota into growth intervals $\chi_n$, expressed as follows:
Figure BDA0003275050220000114
Figure BDA0003275050220000115
In the formula:
Figure BDA0003275050220000116
are the highest and lowest curtailment quotas specified within the system, respectively; n is the number of intervals; $\lambda$ is the growth step of the specified quota.
Step 3.2: Based on the quota intervals that the system specifies for curtailed wind and solar energy, the curtailment is linearized to obtain a stepped reward-and-penalty curtailment return function, expressed as follows:
Figure BDA0003275050220000117
In the formula: $d_{ab}$ is the value of the curtailment penalty return function; $P_{ab,wp}$ is the amount of wind and solar energy curtailed by the system; c is the curtailment penalty coefficient; k is the amount by which the penalty coefficient grows from one interval to the next.
In step 3.2, the stepped reward-and-penalty curtailment return function is used as the action value in the improved Q-learning method.
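A stepped penalty of this kind can be sketched as a tiered tariff. The Python example below is an assumption-laden illustration, not the formula of the patent (which is only available as an image): curtailment below the lowest quota costs nothing, each of the n equal-width intervals between the lowest and highest quota is charged at a rate that grows by k per interval starting from c, and anything above the highest quota is charged at the rate c + n·k. The reward branch is not modelled, and the interval width is derived from n rather than taken from $\lambda$.

```python
def curtailment_penalty(p_ab, p_min, p_max, n, c, k):
    """Tiered curtailment penalty (illustrative form, see assumptions above)."""
    width = (p_max - p_min) / n          # width of each quota interval
    penalty = 0.0
    for j in range(n):
        lower = p_min + j * width
        # Portion of the curtailed energy that falls inside interval j.
        in_band = min(max(p_ab - lower, 0.0), width)
        penalty += (c + j * k) * in_band
    # Curtailment beyond the highest quota is charged at the top rate.
    penalty += (c + n * k) * max(p_ab - p_max, 0.0)
    return penalty
```

For example, with quotas 0 and 30, three intervals, c = 1 and k = 0.5, a curtailment of 25 is charged 10 at rate 1, 10 at rate 1.5 and 5 at rate 2.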
Step 4: Improve the traditional Q-learning algorithm with the multi-verse optimization (MVO) algorithm.
The multi-verse optimization algorithm is a heuristic search algorithm in which each universe represents a feasible solution to the problem and the search iterates through the interaction of black holes, white holes and wormholes. Here it iteratively optimizes the best choice made by the traditional Q-learning algorithm in its unsupervised state, yielding an enhanced target solution. The state-action function of the optimized, improved Q-learning algorithm is expressed as follows:
Figure BDA0003275050220000121
in the formula: fsAs the state characteristic of the traditional Q learning, the state characteristic corresponds to a target function F operated by the micro-grid system;
Figure BDA0003275050220000122
corresponding to the reward punishment step type wind and light abandoning punishment return function value d for the action characteristics optimized by the multi-universe optimization algorithmab
Figure BDA0003275050220000123
Respectively the initial values of the state characteristic and the action characteristic; emvo-pThe expected value under the MVO-Q strategy is obtained; t is the iteration number;
Figure BDA0003275050220000124
YTrespectively, the reward value and discount coefficient under iteration.
The multi-universe algorithm is used for optimizing the multi-level greedy action of Q learning, the occurrence of redundant action in optimization is reduced, and the Q iteration result is further reducedmvo-qError accuracy gamma ofT(initial error precision is γ)T0). And performing next state-action strategy under the condition that the iteration error precision is not satisfied, and performing next optimization processing by adopting a multi-universe algorithm, wherein an optimization formula is expressed as follows:
Figure BDA0003275050220000125
Figure BDA0003275050220000126
In the formula:
Figure BDA0003275050220000127
are the action feature and the state feature at iteration T-1;
Figure BDA0003275050220000128
is the state feature at iteration T;
Figure BDA0003275050220000129
is the reward value at iteration T-1.
In this way, the multi-verse optimization algorithm improves the optimal value of the objective function that corresponds to the state feature in the traditional Q-learning algorithm.
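For orientation, the traditional Q-learning step that step 4 sets out to improve can be sketched as follows. This is the textbook tabular update with an epsilon-greedy action choice, not the MVO-Q formula of the patent (which is only available as an image). The table layout, the learning rate `alpha` and the function names are assumptions for illustration; in a cost-minimization setting the reward would typically be the negative of the cost.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, s, a, reward, s_next, alpha, gamma):
    """Standard update: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

In the scheme described above, the multi-verse search would take over the action selection that `epsilon_greedy` performs here.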
Step 5: Describe the objective function obtained in step 1 as a Markov decision process, and solve the resulting state and action descriptions with the improved Q-learning algorithm.
Step 5.1: The objective function in step 1 comprises the unit operating cost, the environmental benefit cost and the main-grid power exchange cost, so the state description of each entity in the system during iteration T is expressed as:
$$F_s = [F_{cf}, F_{on\text{-}off}, E_m(P_i), F_g, F_{grid}, F]$$
Step 5.2: The constraints in step 2 involve the output power of the conventional units, the wind and photovoltaic output power, the storage and release power of the battery, the power exchanged with the large grid, and the total load power. Taking the curtailment reward-and-penalty principle into account as well, these quantities are discretized to obtain the action description of each entity in the system during iteration T, expressed as:
Figure BDA0003275050220000131
Step 5.3: As shown in Fig. 1, the Q-learning algorithm improved by the multi-verse algorithm solves for the optimal value of the objective function as follows:
5.31) Specify the minimum and maximum curtailment quotas inside the microgrid, divide the curtailment penalty intervals, and initialize the parameters of the multi-verse algorithm: the number of universes N, the dimension n, the maximum number of iterations MAX, and the initial wormhole position $X_{ij}$;
5.32) Randomly select the initial state of the Q-learning algorithm
Figure BDA0003275050220000132
5.33) Use the multi-verse algorithm to optimize the initial action of the Q-learning greedy strategy
Figure BDA0003275050220000133
5.34) Based on the greedy strategy, output the initial action for the initial state
Figure BDA0003275050220000134
and prepare for the initial optimization;
5.35) Solve for the optimal value min F of the objective function according to the optimized initial action;
5.36) Determine whether the error precision is satisfied;
5.37) If the error precision is satisfied, select the action
Figure BDA0003275050220000135
compute the optimal-value update and the wormhole distance of the multi-verse algorithm, and proceed to the next iteration. The optimal-value update formula is as follows:
Figure BDA0003275050220000136
in the formula: xjThe position of the optimal universe individual is determined; p is a radical of1/p2/p3∈[0,1]Is a random number; epsilon is the rate of cosmic expansion; u. ofj,ljThe upper and lower limits of x; eta is the proportion of wormholes in all individuals, is specified by the iteration number L and the maximum iteration number L, and is expressed as follows:
Figure BDA0003275050220000141
the multivariate cosmic algorithm optimizing mechanism is that black holes and swinging are selected according to a roulette mechanism, an individual moves in the current optimal cosmic through expansion and self-turning, and the optimal moving distance in the moving process is related to the iteration precision p and is expressed as follows:
Figure BDA0003275050220000142
5.38) If the error precision is not satisfied, discard the action of this iteration, select an action again, and return to step 5.35);
5.39) Determine whether the objective function value is the global optimum; if not, return to step 5.38);
5.40) If it is the global optimum, output the final state and action;
5.41) Compute the final result.
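The multi-verse search used inside steps 5.31) to 5.41) can be illustrated in isolation. The Python sketch below follows the widely published form of the Multi-Verse Optimizer, simplified: a wormhole existence probability that grows linearly with the iteration count (assumed here to play the role of $\eta$), a travelling distance rate $1 - l^{1/p} / L^{1/p}$ (assumed to play the role of $\varepsilon$), roulette-wheel selection of donor universes, and wormhole jumps around the best universe. The default values (`wep_min`, `wep_max`, `p`), the function name and the coupling to Q-learning are not taken from the patent; the formulas in the text are only available as images, so this is a sketch of the general technique rather than the claimed procedure.

```python
import random


def mvo_minimize(fitness, lower, upper, n_universes=20, max_iter=100,
                 wep_min=0.2, wep_max=1.0, p=6.0, seed=0):
    """Minimise `fitness` over a box with a simplified Multi-Verse Optimizer."""
    rng = random.Random(seed)
    dim = len(lower)
    universes = [[rng.uniform(lower[j], upper[j]) for j in range(dim)]
                 for _ in range(n_universes)]
    best_x, best_f = None, float("inf")

    for it in range(1, max_iter + 1):
        scores = [fitness(u) for u in universes]
        for u, f in zip(universes, scores):
            if f < best_f:               # remember the best universe seen so far
                best_x, best_f = list(u), f

        # Wormhole existence probability grows linearly with the iteration count.
        wep = wep_min + it * (wep_max - wep_min) / max_iter
        # Travelling distance rate shrinks to 0, narrowing the search near best_x.
        tdr = 1.0 - it ** (1.0 / p) / max_iter ** (1.0 / p)

        f_min, f_max = min(scores), max(scores)
        span = f_max - f_min
        # Roulette weights: cheaper universes are more likely to act as donors.
        weights = [f_max - f + 1e-12 for f in scores]

        updated = []
        for u, f in zip(universes, scores):
            x = list(u)
            receive_prob = (f - f_min) / span if span > 0 else 0.0
            for j in range(dim):
                # White/black-hole exchange: worse universes receive more objects.
                if rng.random() < receive_prob:
                    donor = rng.choices(universes, weights=weights)[0]
                    x[j] = donor[j]
                # Wormhole: jump to a random point around the best universe.
                if rng.random() < wep:
                    jump = tdr * ((upper[j] - lower[j]) * rng.random() + lower[j])
                    x[j] = best_x[j] + jump if rng.random() < 0.5 else best_x[j] - jump
                    x[j] = min(max(x[j], lower[j]), upper[j])
            updated.append(x)
        universes = updated

    for u in universes:                  # score the final population as well
        f = fitness(u)
        if f < best_f:
            best_x, best_f = list(u), f
    return best_x, best_f
```

Calling `mvo_minimize(lambda x: sum(v * v for v in x), [-5.0, -5.0], [5.0, 5.0])` returns the best position found and its cost. Because the best universe is tracked across all evaluations, the returned cost is never worse than the best member of the initial population.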
A simulation experiment was carried out using the typical electrical load demand of a conventional microgrid, with the experimental parameters set as follows:
Figure BDA0003275050220000143
The method provided by the invention is used to optimize the scheduling of a typical microgrid comprising a wind farm, a photovoltaic plant, a gas turbine unit and an energy storage unit, assuming that power is exchanged between the microgrid and the large grid. The objective function is solved with both a traditional particle swarm algorithm and the improved Q-learning algorithm of the invention to obtain a comprehensive system dispatch plan that maximizes wind and solar absorption. As shown in Figs. 2 and 3, comparative analysis of the simulation experiments shows that scheduling the microgrid with the proposed method raises total wind and solar absorption by 33.18% and lowers the comprehensive cost by 6.51%. The method can therefore substantially increase the share of wind and solar energy absorbed during microgrid dispatch planning, maximizing economic benefit while meeting the environmental benefit.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (8)

1. A microgrid optimal scheduling method based on improved Q-learning penalty selection, characterized by comprising the following steps:
Step 1: Construct an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power interaction with the large grid;
Step 2: Establish the constraints on microgrid operation;
Step 3: Construct a penalty return function whose highest and lowest thresholds are the maximum wind and solar curtailment cost and the cost at full wind and solar absorption;
Step 4: Improve the traditional Q-learning algorithm with the multi-verse optimization algorithm;
The state-action function of the optimized, improved Q-learning algorithm is expressed as follows:
Figure FDA0003275050210000011
In the formula: $F_s$ is the state feature of traditional Q-learning;
Figure FDA0003275050210000012
is the action feature optimized by the multi-verse optimization algorithm;
Figure FDA0003275050210000013
are the initial values of the state feature and the action feature, respectively; $E_{mvo-p}$ is the expected value under the MVO-Q strategy; T is the number of iterations;
Figure FDA0003275050210000014
$Y_T$ are the reward value and the discount coefficient under the iteration, respectively;
Step 5: Describe the objective function obtained in step 1 as a Markov decision process, and solve the resulting state and action descriptions with the improved Q-learning algorithm.
2. The microgrid optimal scheduling method based on improved Q-learning penalty selection according to claim 1, characterized in that step 1 comprises the following steps:
Step 1.1: With a high proportion of wind and solar power connected to the grid, the conventional units are divided into a normal operating state and a low-load operating state, and the conventional generation cost inside the microgrid is expressed as follows:
Figure FDA0003275050210000015
In the formula: a, b and c are cost factors in the normal operating state of a conventional unit; $P_i$ is the output power of the i-th conventional unit; g, h, l and p are cost factors in the low-load operating state; $kP_{i,\max}$ is the critical power between the normal operating state and the low-load operating state of the i-th conventional unit;
Step 1.2: With uncertain wind and solar output, the start-stop cost of the conventional units is expressed as follows:
Figure FDA0003275050210000021
In the formula: $F_{on\text{-}off}$ is the start-stop cost of the conventional units; C is the number of start-stop cycles of a unit; $K(t_{i,r})$ is the cost of the r-th start of the i-th unit; $t_{i,r}$ is the continuous shutdown time of the i-th unit before the C-th start; $C(t_{i,r})$ is the operating cost of the associated auxiliary systems when the unit starts cold; $t_{cold\text{-}hot}$ is the critical shutdown time separating a cold start from a hot start;
Step 1.3: The pollutants discharged by conventional units during power generation mainly comprise nitrogen oxides, sulfur oxides and carbon dioxide, and their treatment cost is expressed as follows:
Figure FDA0003275050210000022
$$E_m(P_i) = (\alpha_{i,m} + \beta_{i,m} P_i + \gamma_{i,m} P_i^2) + \zeta_{i,m} \exp(\delta_{i,m} P_i)$$
In the formula: $F_g$ is the pollution treatment cost of the conventional units; M is the number of pollutant types discharged; $E_m(P_i)$ is the amount of the m-th pollutant discharged by the i-th unit; $\eta_m$ is the treatment cost coefficient of the m-th pollutant; $\alpha_{i,m}$, $\beta_{i,m}$, $\gamma_{i,m}$, $\zeta_{i,m}$ and $\delta_{i,m}$ are the emission coefficients of the m-th pollutant discharged by the i-th unit;
Step 1.4: The cost of power exchange between the microgrid and the large grid is expressed as follows:
$$F_{grid} = \lambda_p P_{su/sh} C_t^{grid}$$
In the formula: $\lambda_p$ is the electricity selling or purchasing state of the microgrid, taking the value 1 when selling and -1 when purchasing; $P_{su/sh}$ is the power surplus or shortage inside the microgrid;
Figure FDA0003275050210000023
is the price at which electricity is sold to or purchased from the large grid;
Step 1.5: The objective function built from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power exchange with the main grid is expressed as follows:
$$\min F = F_{cf} + F_{on\text{-}off} + F_g + F_{grid}$$
In the formula: F is the objective function value of microgrid system operation; $F_{cf}$, $F_{on\text{-}off}$, $F_g$ and $F_{grid}$ are, respectively, the operating cost, the start-stop cost and the pollution treatment cost of the conventional units, and the cost of power interaction between the microgrid and the large grid.
3. The microgrid optimal scheduling method based on improved Q-learning penalty selection according to claim 1, characterized in that step 2 comprises the following steps:
Step 2.1: The power balance constraint is expressed as follows:
Figure FDA0003275050210000031
In the formula:
Figure FDA0003275050210000032
are the output powers of the conventional units, wind power and photovoltaics in period t, respectively;
Figure FDA0003275050210000033
is the storage and release power of the battery in period t; $P_t^{grid}$ is the power exchanged with the large grid; $P_t^{L}$ is the total load power in period t; T is the total operating period of the microgrid, taken as 24 h;
Step 2.2: The battery storage and release state constraint is expressed as follows:
$$SOC_{\min} \le SOC(t) \le SOC_{\max}$$
In the formula: SOC(t) is the state of charge of the battery at time t; $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum states of charge of the battery, respectively;
Step 2.3: For a conventional unit, the accumulated on or off time should be greater than the minimum continuous on or off time, and the constraint is expressed as follows:
Figure FDA0003275050210000034
In the formula:
Figure FDA0003275050210000035
is the minimum continuous shutdown time of the unit;
Figure FDA0003275050210000036
is the minimum continuous running time of the unit.
4. The microgrid optimal scheduling method based on improved Q-learning penalty selection according to claim 1, characterized in that step 3 comprises the following steps:
Step 3.1: Specify the minimum and maximum quotas for curtailed wind and solar energy inside the microgrid, and divide the range from full wind and solar absorption to the maximum curtailment quota into growth intervals $\chi_n$, expressed as follows:
Figure FDA0003275050210000041
Figure FDA0003275050210000042
In the formula:
Figure FDA0003275050210000043
are the highest and lowest curtailment quotas specified within the system, respectively; n is the number of intervals; $\lambda$ is the growth step of the specified quota;
Step 3.2: Based on the quota intervals that the system specifies for curtailed wind and solar energy, the curtailment is linearized to obtain a stepped reward-and-penalty curtailment return function, expressed as follows:
Figure FDA0003275050210000044
In the formula: $d_{ab}$ is the value of the curtailment penalty return function; $P_{ab,wp}$ is the amount of wind and solar energy curtailed by the system; c is the curtailment penalty coefficient; k is the amount by which the penalty coefficient grows from one interval to the next.
5. The microgrid optimal scheduling method based on improved Q-learning penalty selection according to claim 1, characterized in that step 5 comprises the following steps:
Step 5.1: The objective function in step 1 comprises the unit operating cost, the environmental benefit cost and the main-grid power exchange cost, and the state description of each entity in the system during iteration T is expressed as:
$$F_s = [F_{cf}, F_{on\text{-}off}, E_m(P_i), F_g, F_{grid}, F]$$
Step 5.2: The constraints in step 2 involve the output power of the conventional units, the wind and photovoltaic output power, the storage and release power of the battery, the power exchanged with the large grid, and the total load power; taking the curtailment reward-and-penalty principle into account as well, these quantities are discretized into N actions to obtain the action description of each entity in the system during iteration T, expressed as:
Figure FDA0003275050210000051
Step 5.3: The Q-learning algorithm improved by the multi-verse algorithm solves for the optimal value of the objective function as follows:
5.31) Specify the minimum and maximum curtailment quotas inside the microgrid, divide the curtailment penalty intervals, and initialize the parameters of the multi-verse algorithm: the number of universes N, the dimension n, the maximum number of iterations MAX, and the initial wormhole position $X_{ij}$;
5.32) Randomly select the initial state of the Q-learning algorithm
Figure FDA0003275050210000052
5.33) Use the multi-verse algorithm to optimize the initial action of the Q-learning greedy strategy
Figure FDA0003275050210000053
5.34) Based on the greedy strategy, output the initial action for the initial state
Figure FDA0003275050210000054
and prepare for the initial optimization;
5.35) Solve for the optimal value min F of the objective function according to the optimized initial action;
5.36) Determine whether the error precision is satisfied;
5.37) If the error precision is satisfied, select the action
Figure FDA0003275050210000055
compute the optimal-value update and the wormhole distance of the multi-verse algorithm, and proceed to the next iteration; the optimal-value update formula is as follows:
Figure FDA0003275050210000056
In the formula: $X_j$ is the position of the best universe; $p_1$, $p_2$, $p_3 \in [0, 1]$ are random numbers; $\varepsilon$ is the universe expansion rate; $u_j$ and $l_j$ are the upper and lower limits of x; $\eta$ is the proportion of wormholes among all individuals, determined by the current iteration number l and the maximum iteration number L, and is expressed as follows:
Figure FDA0003275050210000057
The search mechanism of the multi-verse algorithm is as follows: black holes and swings are selected by a roulette-wheel mechanism, and individuals move toward the current best universe through expansion and self-variation; the optimal moving distance during this movement depends on the iteration precision p and is expressed as follows:
Figure FDA0003275050210000061
5.38) If the error precision is not satisfied, discard the action of this iteration, select an action again, and return to step 5.35);
5.39) Determine whether the objective function value is the global optimum; if not, return to step 5.38);
5.40) If it is the global optimum, output the final state and action;
5.41) Compute the final result.
6. The microgrid optimal scheduling method based on improved Q-learning penalty selection according to claim 4, characterized in that in step 3.2 the stepped reward-and-penalty curtailment return function is used as the action value in the improved Q-learning method.
7. The microgrid optimal scheduling method based on improved Q-learning penalty selection according to claim 1, characterized in that in step 4 the multi-verse optimization algorithm is used to improve the optimal value of the objective function that corresponds to the state feature in the traditional Q-learning algorithm.
8. The microgrid optimal scheduling method based on improved Q-learning penalty selection according to claim 1, characterized in that the improvement of the traditional Q-learning algorithm by the multi-verse optimization algorithm in step 4 comprises the following steps:
The multi-verse algorithm is used to optimize the multi-level greedy actions of Q-learning, which reduces redundant actions during the search and thereby lowers the error precision $\gamma_T$ of the current iteration result $Q_{mvo-q}$; if the error precision of the current iteration is not satisfied, the next state-action strategy is carried out and the multi-verse algorithm performs the next round of optimization; the optimization formula is expressed as follows:
Figure FDA0003275050210000062
Figure FDA0003275050210000063
CN202111115317.6A 2021-09-23 2021-09-23 Micro-grid optimal scheduling method based on improved Q learning punishment selection Active CN113809780B (en)

Publications (2)

Publication Number Publication Date
CN113809780A true CN113809780A (en) 2021-12-17
CN113809780B CN113809780B (en) 2023-06-30

Family

ID=78940309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111115317.6A Active CN113809780B (en) 2021-09-23 2021-09-23 Micro-grid optimal scheduling method based on improved Q learning punishment selection

Country Status (1)

Country Link
CN (1) CN113809780B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418198A (en) * 2021-12-30 2022-04-29 国网辽宁省电力有限公司电力科学研究院 A piecewise functional calculation method for the penalty cost of abandoning new energy
CN114862048A (en) * 2022-05-30 2022-08-05 哈尔滨理工大学 Optimization method of permanent magnet synchronous motor based on improved multiverse optimization algorithm
CN117439190A (en) * 2023-10-26 2024-01-23 华中科技大学 Water, fire and wind system dispatching method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108964042A (en) * 2018-07-24 2018-12-07 合肥工业大学 Regional power grid operating point method for optimizing scheduling based on depth Q network
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Microgrid energy storage scheduling method and device based on deep Q-value network reinforcement learning
US20190244108A1 (en) * 2018-02-08 2019-08-08 Cognizant Technology Solutions U.S. Corporation System and Method For Pseudo-Task Augmentation in Deep Multitask Learning
JP6667785B1 (en) * 2019-01-09 2020-03-18 裕樹 有光 A program for learning by associating a three-dimensional model with a depth image
CN112084680A (en) * 2020-09-02 2020-12-15 沈阳工程学院 An energy internet optimization strategy method based on DQN algorithm
US20210194424A1 (en) * 2019-04-25 2021-06-24 Shandong University Method and system for power prediction of photovoltaic power station based on operating data of grid-connected inverters

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190244108A1 (en) * 2018-02-08 2019-08-08 Cognizant Technology Solutions U.S. Corporation System and Method For Pseudo-Task Augmentation in Deep Multitask Learning
CN108964042A (en) * 2018-07-24 2018-12-07 合肥工业大学 Regional power grid operating point method for optimizing scheduling based on depth Q network
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Microgrid energy storage scheduling method and device based on deep Q-value network reinforcement learning
JP6667785B1 (en) * 2019-01-09 2020-03-18 裕樹 有光 A program for learning by associating a three-dimensional model with a depth image
US20210194424A1 (en) * 2019-04-25 2021-06-24 Shandong University Method and system for power prediction of photovoltaic power station based on operating data of grid-connected inverters
CN112084680A (en) * 2020-09-02 2020-12-15 沈阳工程学院 An energy internet optimization strategy method based on DQN algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
叶亮;吕智林;王蒙;杨啸;: "基于最优潮流的含多微网的主动配电网双层优化调度", 电力系统保护与控制 *
马留洋;孟安波;葛佳菲;: "基于纵横交叉算法优化BP神经网络的风机齿轮箱故障诊断方法", 广东工业大学学报 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418198A (en) * 2021-12-30 2022-04-29 国网辽宁省电力有限公司电力科学研究院 A piecewise functional calculation method for the penalty cost of abandoning new energy
CN114862048A (en) * 2022-05-30 2022-08-05 Harbin University of Science and Technology Optimization method of permanent magnet synchronous motor based on improved multiverse optimization algorithm
CN114862048B (en) * 2022-05-30 2024-09-17 Harbin University of Science and Technology Permanent magnet synchronous motor optimization method based on improved multiverse optimization algorithm
CN117439190A (en) * 2023-10-26 2024-01-23 Huazhong University of Science and Technology Hydro-thermal-wind system dispatching method, device, equipment and storage medium
CN117439190B (en) * 2023-10-26 2024-06-11 Huazhong University of Science and Technology A method, device, equipment and storage medium for dispatching hydro-thermal-wind systems

Also Published As

Publication number Publication date
CN113809780B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
CN109193636B (en) Environmental-economic robust scheduling method for power systems based on classified uncertainty sets
CN113809780A (en) An optimal scheduling method for microgrid based on improved Q-learning penalty selection
Zhu et al. Multi-objective optimal scheduling of an integrated energy system under the multi-time scale ladder-type carbon trading mechanism
Umeozor et al. Operational scheduling of microgrids via parametric programming
CN107370188A (en) A multi-objective power system scheduling method considering wind power output
Li et al. A hybrid dynamic economic environmental dispatch model for balancing operating costs and pollutant emissions in renewable energy: A novel improved mayfly algorithm
CN116187601A (en) Comprehensive energy system operation optimization method based on load prediction
CN114221338B (en) Optimal dispatch method for multi-energy power systems considering power source flexibility and complementarity
CN108985524B (en) Coordination control method of multi-energy complementary system
Yao et al. Multi-level model predictive control based multi-objective optimal energy management of integrated energy systems considering uncertainty
CN104299173B (en) A robust day-ahead optimal scheduling method suitable for the integration of multiple energy sources
CN117833285A (en) A microgrid energy storage optimization scheduling method based on deep reinforcement learning
CN116667325B (en) Micro-grid-connected operation optimization scheduling method based on improved cuckoo algorithm
CN111293718A (en) AC/DC hybrid microgrid partition two-layer optimized operation method based on scene analysis
Zhu et al. Optimal scheduling of a wind energy dominated distribution network via a deep reinforcement learning approach
Yang et al. Data-driven optimal dynamic dispatch for hydro-PV-PHS integrated power systems using deep reinforcement learning approach
CN116418001A (en) Reservoir group multi-energy complementary dispatching method and system to deal with uncertainty of new energy sources
Yang et al. Distributionally robust optimal dispatch modelling of renewable-dominated power system and implementation path for carbon peak
Lv et al. Data-based optimal microgrid management for energy trading with integral Q-learning scheme
Fan et al. Multi-agent deep reinforced co-dispatch of energy and hydrogen storage in low-carbon building clusters
CN116468215A (en) Comprehensive energy system scheduling method and device considering uncertainty of source load
CN115795992A (en) An Online Scheduling Method of Park Energy Internet Based on Virtual Deduction of Operating Situation
CN114239372A (en) Multi-target unit maintenance double-layer optimization method and system considering unit combination
Zhao et al. Research on Multiobjective Optimal Operation Strategy for Wind‐Photovoltaic‐Hydro Complementary Power System
CN116191421A (en) Novel power system multi-objective optimized scheduling method based on improved NSGA-II algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant