CN113809780A - Microgrid optimization scheduling method based on improved Q learning penalty selection - Google Patents

Publication number
CN113809780A
Authority
CN
China
Prior art keywords
grid
cost
power
wind
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111115317.6A
Other languages
Chinese (zh)
Other versions
CN113809780B (en)
Inventor
姜河
周航
安琦
叶瀚文
李兆滢
赵琰
林盛
赵涛
胡宸嘉
白金禹
辛长庆
何雨桐
王亚茹
姜铭坤
魏莫杋
孙笑雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Engineering
Original Assignee
Shenyang Institute of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Engineering
Priority to CN202111115317.6A
Publication of CN113809780A
Application granted
Publication of CN113809780B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315Needs-based resource requirements planning or analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/28The renewable source being wind energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/40Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00Energy generation through renewable energy sources
    • Y02E10/50Photovoltaic [PV] energy
    • Y02E10/56Power conversion systems, e.g. maximum power point trackers
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70Smart grids as climate change mitigation technology in the energy generation sector
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E70/00Other energy conversion or management systems reducing GHG emissions
    • Y02E70/30Systems combining energy storage with energy generation of non-fossil origin
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Power Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention relates to a microgrid optimization scheduling method based on improved Q-learning penalty selection, comprising the following steps. Step 1: construct an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power interaction with the large power grid. Step 2: establish the constraints on microgrid operation. Step 3: construct a penalty return function whose highest and lowest thresholds are the maximum wind curtailment cost and the cost of complete wind and solar absorption, respectively. Step 4: improve the traditional Q-learning algorithm with a multi-universe optimization algorithm. Step 5: describe the objective function obtained in Step 1 as a Markov decision process, and solve the resulting state and action description with the improved Q-learning algorithm. The method reduces the curtailment rate of renewable energy in microgrid operation scheduling, reduces the fluctuation of energy interaction between the microgrid and the large grid, addresses the slow response and non-convergence of traditional optimization methods, and improves the stability and economy of microgrid operation.

Description

Microgrid optimization scheduling method based on improved Q learning penalty selection
Technical Field
The invention relates to a microgrid economic dispatching method, in particular to a microgrid optimal dispatching method based on improved Q learning penalty selection.
Background
As energy structures continue to be adjusted, microgrid systems, which are composed of many types of widely dispersed energy equipment, have come into wide use by virtue of independent power transmission and distribution, rapid scheduling, a large share of renewable energy, and the ability to operate as an island. A microgrid system can improve the power supply quality of remote areas and can effectively prevent power supply interruptions caused by natural disasters.
With continued national policy support for the new energy industry, the scale of grid-connected wind and solar generation keeps growing. However, because wind and photovoltaic output is fluctuating and uncertain, large-scale connection of wind and photovoltaic generation to the microgrid causes problems such as power imbalance inside the system and reduced power quality. How to raise the share of new energy generation while keeping the microgrid system stable and safe is a problem that urgently needs to be solved.
A microgrid contains conventional units, new energy generator sets, energy storage units and various load demands. Traditional scheduling considers only the generation cost of a single unit and therefore cannot meet the fast, economic, environmentally friendly and safe scheduling that a microgrid system pursues. Multi-objective comprehensive scheduling of the microgrid system, which accounts for the new operating conditions of the various units and coordinates the units with the load demand, is therefore of great significance.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a microgrid optimization scheduling method based on improved Q-learning penalty selection. A reward-and-penalty stepped wind and solar curtailment penalty return function is introduced into the traditional microgrid scheduling method, in which conventional units, wind and solar units and energy storage units run in coordination, and the states and actions of the microgrid scheduling problem are described by a Q-learning algorithm improved with a multi-universe optimization algorithm. The lowest overall scheduling cost is thereby achieved while satisfying the penalty return function. The method reduces the curtailment rate of renewable energy, reduces the volatility of energy interaction between the microgrid and the large power grid, addresses the slow response and non-convergence of traditional optimization methods, and improves the stability and economy of microgrid operation.
To solve the problems in the prior art, the invention adopts the following technical scheme:
A microgrid optimization scheduling method based on improved Q-learning penalty selection comprises the following steps:
Step 1: construct an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power interaction with the large power grid;
Step 2: establish the constraints on microgrid operation;
Step 3: construct a penalty return function whose highest and lowest thresholds are the maximum wind curtailment cost and the cost of complete wind and solar absorption, respectively;
Step 4: improve the traditional Q-learning algorithm with a multi-universe optimization (MVO) algorithm;
the state-action function of the improved Q-learning algorithm is expressed as follows:
Figure BDA0003275050220000021
where $F_s$ is the state feature of traditional Q-learning;
Figure BDA0003275050220000022
is the action feature optimized by the multi-universe optimization algorithm;
Figure BDA0003275050220000023
are the initial values of the state feature and the action feature, respectively; $E_{mvo\text{-}p}$ is the expected value under the MVO-Q strategy; t is the iteration number;
Figure BDA0003275050220000024
and $\gamma_T$ are, respectively, the reward value and the discount coefficient at the current iteration;
Step 5: describe the objective function obtained in Step 1 as a Markov decision process, and solve the resulting state and action description with the improved Q-learning algorithm.
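The MVO-improved state-action function is available only as an image, so it is not reproduced here. As a point of reference, the following minimal sketch shows the standard tabular Q-learning update that Step 4 sets out to improve. The learning rate `alpha`, the discount `gamma`, the two-state toy table and the use of a negative scheduling cost as the reward are illustrative assumptions, not values from the patent.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (not the MVO-improved variant)."""
    best_next = max(q[next_state])          # greedy estimate of future value
    td_target = reward + gamma * best_next  # one-step bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))


# Toy example: 2 states x 2 actions; the reward is assumed to be the
# negative of a scheduling cost, so lower cost means higher reward.
q_table = [[0.0, 0.0], [0.0, 0.0]]
new_value = q_update(q_table, state=0, action=1, reward=-5.0, next_state=1)
greedy_action = epsilon_greedy(q_table, state=0, epsilon=0.0)
```

In the patented method, the action selection step is the part that the multi-universe optimization algorithm is said to replace or refine; the sketch above uses plain epsilon-greedy selection instead.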
Step 1 comprises the following sub-steps:
Step 1.1: With a high proportion of grid-connected wind and solar generation, a conventional unit operates in either a normal operating state or a low-load operating state, and the conventional generation cost inside the microgrid is expressed as follows:
Figure BDA0003275050220000031
where a, b and c are the cost factors in the normal operating state of the conventional unit; $P_i$ is the output power of the i-th conventional unit; g, h, l and p are the cost factors in the low-load operating state; $KP_{i,\max}$ is the critical power separating the normal operating state and the low-load operating state of the i-th conventional unit;
Step 1.2: With uncertain wind and solar output, the start-stop cost of the conventional unit is expressed as follows:
Figure BDA0003275050220000032
where $F_{on\text{-}off}$ is the start-stop cost of the conventional unit; C is the number of start-stops of the unit; $K(t_{i,r})$ is the cost of the r-th start-up of the i-th unit; $t_{i,r}$ is the continuous shutdown time of the i-th unit before its r-th start-up; $C(t_{i,r})$ is the operating cost of the associated auxiliary systems for a cold start of the unit; $t_{cold\text{-}hot}$ is the critical shutdown time separating cold start-up from hot start-up;
Step 1.3: The pollutants discharged by conventional units during power generation mainly comprise nitrogen oxides, sulfur oxides, carbon dioxide and the like, and the treatment cost is expressed as follows:
Figure BDA0003275050220000033
$E_m(P_i) = (\alpha_{i,m} + \beta_{i,m} P_i + \gamma_{i,m} P_i^2) + \zeta_{i,m} \exp(\delta_{i,m} P_i)$
where $F_g$ is the pollution treatment cost of the conventional unit; m is the type of discharged pollutant; $E_m(P_i)$ is the pollutant discharge amount of the i-th unit; $\eta_m$ is the treatment cost coefficient of the m-th pollutant;
$\alpha_{i,m}$, $\beta_{i,m}$, $\gamma_{i,m}$, $\zeta_{i,m}$, $\delta_{i,m}$ are the emission coefficients of the m-th pollutant discharged by the i-th unit;
Step 1.4: The cost of power exchange between the microgrid and the large grid is expressed as follows:
Figure BDA0003275050220000034
where $\lambda_p$ is the electricity trading state of the microgrid, taking the value 1 when selling electricity and -1 when purchasing electricity; $P_{su/sh}$ is the surplus or shortage of power inside the microgrid;
Figure BDA0003275050220000041
are the electricity selling and purchasing prices of the large power grid;
Step 1.5: An objective function is constructed from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power exchange with the main grid, and is expressed as follows:
$\min F = F_{cf} + F_{on\text{-}off} + F_g + F_{grid}$
where F is the objective function value of microgrid system operation; $F_{cf}$, $F_{on\text{-}off}$, $F_g$ and $F_{grid}$ are, respectively, the conventional unit operating cost, the start-stop cost, the pollution treatment cost, and the cost of power interaction between the microgrid and the large grid.
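The emission function of Step 1.3 and the objective of Step 1.5 can be sketched as follows. The emission expression follows the text directly. The treatment-cost aggregation (summing $\eta_m E_m(P_i)$ over units and pollutants) is an assumption, because that formula survives only as an image, and all numeric values are illustrative placeholders.

```python
import math


def emission(p, alpha, beta, gamma, zeta, delta):
    """E_m(P_i) = (alpha + beta*P + gamma*P^2) + zeta*exp(delta*P)."""
    return alpha + beta * p + gamma * p ** 2 + zeta * math.exp(delta * p)


def treatment_cost(unit_powers, coeffs, eta):
    """Assumed form of F_g: sum over units i and pollutants m of eta_m * E_m(P_i).

    coeffs[i][m] is the tuple (alpha, beta, gamma, zeta, delta) for unit i
    and pollutant m; eta[m] is the treatment cost coefficient of pollutant m.
    """
    total = 0.0
    for i, p in enumerate(unit_powers):
        for m, eta_m in enumerate(eta):
            total += eta_m * emission(p, *coeffs[i][m])
    return total


def total_cost(f_cf, f_on_off, f_g, f_grid):
    """Objective value F = F_cf + F_on-off + F_g + F_grid (to be minimized)."""
    return f_cf + f_on_off + f_g + f_grid


# Illustrative numbers only: one unit, one pollutant, no exponential term.
coeffs = [[(1.0, 0.5, 0.01, 0.0, 0.0)]]
f_g = treatment_cost([10.0], coeffs, eta=[2.0])
f_total = total_cost(100.0, 20.0, f_g, -4.0)
```

A negative `f_grid` in the example stands for net revenue from selling power; the sign convention is likewise an assumption, since the exchange-cost formula is not recoverable from the text.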
Step 2 comprises the following sub-steps:
Step 2.1: The power balance constraint is expressed as follows:
Figure BDA0003275050220000042
where
Figure BDA0003275050220000043
are, respectively, the conventional unit, wind power and photovoltaic output power in period t;
Figure BDA0003275050220000044
are the storage (charging) and release (discharging) power of the storage battery in period t; $P_t^{grid}$ is the power exchanged with the large power grid; $P_t^{L}$ is the total load power in period t; T is the total operating period of the microgrid, taken as 24 h;
Step 2.2: The battery state-of-charge constraint is expressed as follows:
$SOC_{\min} \le SOC(t) \le SOC_{\max}$
where $SOC(t)$ is the state of charge of the storage battery at time t; $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum states of charge of the battery, respectively;
Step 2.3: For a conventional unit, the accumulated on or off time should be greater than the minimum continuous on or off time, and the constraint is expressed as follows:
Figure BDA0003275050220000045
where
Figure BDA0003275050220000046
is the minimum continuous shutdown time of the unit;
Figure BDA0003275050220000047
is the minimum continuous running time of the unit.
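The three constraints of Step 2 can be checked with a short sketch such as the one below. The state-of-charge check follows the inequality in Step 2.2. The power balance form (generation plus discharge minus charge plus grid import equals load) and the run-length treatment of the minimum on/off time are assumptions, because both formulas are only available as images; the function and parameter names are hypothetical.

```python
def power_balanced(p_conv, p_wind, p_pv, p_dis, p_ch, p_grid, p_load, tol=1e-6):
    """Assumed balance: all supply terms net of charging must equal the load.

    p_grid is taken as positive when importing from the large grid.
    """
    supply = p_conv + p_wind + p_pv + p_dis - p_ch + p_grid
    return abs(supply - p_load) <= tol


def soc_within_limits(soc, soc_min, soc_max):
    """SOC_min <= SOC(t) <= SOC_max."""
    return soc_min <= soc <= soc_max


def run_lengths(status):
    """Collapse an on/off sequence (1 = on, 0 = off) into [state, length] runs."""
    runs = []
    for s in status:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs


def min_up_down_ok(status, min_on, min_off):
    """Check minimum continuous on/off durations over a schedule.

    The first and last runs are truncated by the scheduling horizon, so only
    interior runs are checked in this sketch.
    """
    for state, length in run_lengths(status)[1:-1]:
        if length < (min_on if state else min_off):
            return False
    return True


balanced = power_balanced(50.0, 20.0, 10.0, 5.0, 0.0, 15.0, 100.0)
schedule_ok = min_up_down_ok([1, 1, 1, 0, 0, 1, 1, 1], min_on=2, min_off=2)
```

In a scheduling loop, an action that fails any of these checks would be rejected or penalized before the objective value is evaluated.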
Step 3 comprises the following sub-steps:
Step 3.1: The minimum and maximum limits of wind and solar curtailment in the microgrid are specified, and the range from complete wind and solar absorption up to the maximum curtailment limit is divided into growth intervals $\chi_n$ as follows:
Figure BDA0003275050220000051
Figure BDA0003275050220000052
where
Figure BDA0003275050220000053
are, respectively, the highest and lowest limits of wind and solar curtailment specified in the system; N is the number of divided intervals; $\lambda$ is the growth step of the specified curtailment quota;
Step 3.2: According to the quota intervals that the system specifies for wind and solar curtailment, the curtailment amount is linearized piecewise to obtain the reward-and-penalty stepped wind and solar curtailment penalty return function, expressed as follows:
Figure BDA0003275050220000054
where $D_{ab}$ is the value of the wind and solar curtailment penalty return function; $P_{ab,wp}$ is the wind and solar curtailment amount of the system; c is the wind and solar curtailment penalty coefficient; k is the growth step of the penalty coefficient between intervals.
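The exact stepped penalty formula is only available as an image. The sketch below therefore uses an assumed ladder form, in which the marginal penalty rate grows by a factor $k$ per interval above the lower limit, to illustrate how the quantities $c$, $k$, $\lambda$ and $N$ could interact. The function name and the numeric values are hypothetical, and any reward term for curtailment below the lower limit is omitted.

```python
def stepped_curtailment_penalty(p_ab, lower, step, n_intervals, c, k):
    """Assumed ladder penalty for curtailed wind/solar energy.

    Curtailment at or below `lower` costs nothing. Above it, the j-th
    interval of width `step` is charged at the marginal rate c * (1 + k * j);
    the last interval absorbs any curtailment beyond the upper limit.
    """
    if p_ab <= lower:
        return 0.0
    excess = p_ab - lower
    penalty = 0.0
    for j in range(n_intervals):
        in_this = min(excess, step)
        if j == n_intervals - 1:
            in_this = excess  # last interval takes whatever remains
        penalty += c * (1 + k * j) * in_this
        excess -= in_this
        if excess <= 0:
            break
    return penalty


# Illustrative values: lower limit 0, step 10, three intervals, c = 2, k = 0.5.
d_small = stepped_curtailment_penalty(5.0, 0.0, 10.0, 3, 2.0, 0.5)
d_mid = stepped_curtailment_penalty(15.0, 0.0, 10.0, 3, 2.0, 0.5)
d_large = stepped_curtailment_penalty(35.0, 0.0, 10.0, 3, 2.0, 0.5)
```

Because the marginal rate rises with each interval, the penalty is convex in the curtailment amount, which pushes the scheduler toward absorbing wind and solar output before curtailing it.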
Step 5 comprises the following sub-steps:
Step 5.1: The objective function in Step 1 comprises the unit operating cost, the environmental benefit cost and the cost of power exchange with the main grid, and the state description of each entity in the system in iteration T is expressed as:
$F_s = [F_{cf}, F_{on\text{-}off}, E_m(P_i), F_g, F_{grid}, F]$
Step 5.2: The constraints in Step 2 involve the conventional unit output power, the wind and photovoltaic output power, the storage and release power of the storage battery, the power exchanged with the large grid, and the total load power. Taking the reward-and-penalty principle for wind and solar curtailment into account as well, these quantities are discretized to obtain the action description of each entity in the system in iteration T, expressed as follows:
Figure BDA0003275050220000061
Step 5.3: The Q-learning algorithm improved by the multi-universe algorithm solves for the optimal value of the objective function through the following steps:
5.31) specify the minimum and maximum limits of wind and solar curtailment in the microgrid, divide the curtailment penalty intervals, and initialize the parameters of the multi-universe algorithm, namely the number of universe individuals N, the dimension n, the maximum iteration number MAX and the initial wormhole positions $X_{ij}$;
5.32) randomly select the initial state of the Q-learning algorithm
Figure BDA0003275050220000062
5.33) use the multi-universe algorithm to optimize the initial action of the Q-learning greedy strategy
Figure BDA0003275050220000063
5.34) output the initial state based on the greedy strategy
Figure BDA0003275050220000064
and prepare for the initial optimization;
5.35) solve for the optimal value min F of the objective function according to the optimized initial action;
5.36) judge whether the error precision is met;
5.37) if the error precision is met, select the action
Figure BDA0003275050220000065
and compute the optimal-value update and the wormhole distance of the multi-universe algorithm while proceeding to the next iteration, where the optimal-value update formula is as follows:
Figure BDA0003275050220000066
where $X_j$ is the position of the optimal universe individual; $p_1, p_2, p_3 \in [0,1]$ are random numbers; $\varepsilon$ is the cosmic expansion rate; $u_j$ and $l_j$ are the upper and lower limits of x; $\eta$ is the proportion of wormholes among all individuals, determined by the current iteration number l and the maximum iteration number L, and expressed as follows:
Figure BDA0003275050220000067
In the optimization mechanism of the multi-universe algorithm, black holes and white holes are selected by roulette-wheel selection, and an individual moves toward the current optimal universe through expansion and self-rotation. The optimal travelling distance during this movement is related to the iteration precision p and is expressed as follows:
Figure BDA0003275050220000071
5.38) if the error precision is not met, discard the action of this iteration, reselect an action, and return to step 5.35);
5.39) judge whether the objective function value is the global optimum; if not, return to step 5.38);
5.40) if it is the global optimum, output the final state and action;
5.41) calculate the final result.
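The update formulas in step 5.37) survive only as images. For orientation, the sketch below implements the wormhole step of the multi-verse optimizer as it is commonly published: a wormhole existence probability that grows linearly with the iteration count, and a travelling distance rate that shrinks with it. Whether the patent's $\eta$ and travelling distance take exactly these forms is an assumption, and the defaults `wep_min=0.2`, `wep_max=1.0` and `p=6.0` are commonly used values, not values from the patent.

```python
import random


def wormhole_existence_probability(l, L, wep_min=0.2, wep_max=1.0):
    """Linearly increasing wormhole probability over iterations l = 0..L."""
    return wep_min + l * (wep_max - wep_min) / L


def travelling_distance_rate(l, L, p=6.0):
    """Shrinking step size: 1 - l^(1/p) / L^(1/p); p sets exploitation accuracy."""
    return 1.0 - l ** (1.0 / p) / L ** (1.0 / p)


def wormhole_update(x, best, lower, upper, l, L, rng=random):
    """Move coordinates of universe x toward/around the best universe."""
    wep = wormhole_existence_probability(l, L)
    tdr = travelling_distance_rate(l, L)
    new_x = list(x)
    for j in range(len(x)):
        if rng.random() < wep:  # a wormhole opens for this coordinate
            jump = tdr * ((upper[j] - lower[j]) * rng.random() + lower[j])
            candidate = best[j] + jump if rng.random() < 0.5 else best[j] - jump
            # keep the coordinate inside its bounds
            new_x[j] = min(max(candidate, lower[j]), upper[j])
    return new_x


# At the final iteration the step size is zero, so updated coordinates
# collapse onto the best universe found so far.
final = wormhole_update([0.0, 0.0], [1.0, 2.0], [-5.0, -5.0], [5.0, 5.0],
                        l=100, L=100, rng=random.Random(0))
```

In the improved Q-learning loop described above, a candidate action vector would play the role of `x`, and the objective value min F from step 5.35) would rank the universes to determine `best`.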
Further, in Step 3.2, the reward-and-penalty stepped wind and solar curtailment penalty return function is used as an action value in the improved Q-learning method.
Further, in Step 4, the multi-universe optimization algorithm is used to improve the optimal value of the state feature that corresponds to the objective function in the traditional Q-learning algorithm.
Further, Step 4 improves the traditional Q-learning algorithm with the multi-universe optimization algorithm, specifically as follows:
the multi-universe algorithm is used to optimize the multi-level greedy actions of Q-learning, which reduces redundant actions during optimization and thereby reduces the error precision $\gamma_T$ of the iteration result $Q_{mvo\text{-}q}$. If the iteration error precision is not satisfied, the next state-action strategy is executed and the multi-universe algorithm performs the next round of optimization, with the optimization formula expressed as follows:
Figure BDA0003275050220000072
Figure BDA0003275050220000073
The invention has the following advantages and beneficial effects:
The method balances wind and solar absorption, environmental benefit and economic benefit. It builds a mathematical model of the objective function that accounts for the conventional units, wind and solar units, energy storage units, interaction with the large power grid, and pollutant treatment inside the microgrid, and it introduces a reward-and-penalty stepped wind and solar curtailment penalty return function to further plan the grid connection of wind and solar generation. It also provides a Q-learning algorithm improved by the multi-universe algorithm, in which the states and action parameters of traditional Q-learning are mapped to the objective function, the constraints of microgrid scheduling, and the curtailment penalty, so that the environmental benefit and the wind and solar absorption are maximized while the system maintains a stable power supply. The improved Q-learning algorithm optimizes through a planning mechanism, which avoids the convergence to local optima that arises in traditional algorithms, and its selection mechanism based on the curtailment penalty return addresses the multi-objective optimization problem in the microgrid scheduling model.
The method reduces the curtailment rate of renewable energy in microgrid operation scheduling, reduces the fluctuation of energy interaction between the microgrid and the large grid, addresses the slow response and non-convergence of traditional optimization methods, and improves the stability and economy of microgrid operation.
Drawings
The invention is described in further detail below with reference to the following figures and examples:
FIG. 1 is a flow chart of the Q-learning algorithm improved by the multi-universe optimization algorithm;
FIG. 2 is the simulated wind and solar absorption curve;
FIG. 3 is the simulated comprehensive cost curve;
FIG. 4 is a flow chart of the microgrid optimization scheduling method based on improved Q-learning penalty selection according to the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
As shown in fig. 4, the method for optimizing and scheduling a microgrid based on improved Q learning penalty selection of the present invention includes the following steps:
step 1: constructing an objective function according to the running cost, the environmental benefit cost and the power exchange cost of a main power grid of a conventional unit inside a microgrid;
step 1.1: under the condition of wind-solar high-proportion grid connection, the conventional unit is divided into a conventional operation state and a low-load operation state, namely the conventional power generation cost inside the microgrid is expressed as follows:
Figure BDA0003275050220000091
in the formula: fcfThe running cost of the conventional unit is reduced; a. b and c are cost factors in the normal running state of the conventional unit; piOutputting power for the ith conventional unit; g. h, l and p are cost factors in a low-load operation state; kPi,maxThe critical power of the normal operation state and the low-power operation state of the ith conventional unit.
Step 1.2: under the condition of uncertain wind and light processing, the start-stop cost of the conventional unit is expressed as follows:
Figure BDA0003275050220000092
in the formula: fon-offThe start-stop cost of the conventional unit is reduced; c is the number of start-stop times of the unit; k (t)i,r) The cost of the ith unit for the starting for the r time; t is ti,rThe continuous shutdown time of the ith unit before C times of starting; c (t)i,r) It is the operating cost of the associated auxiliary system for the unit cold start; t is tcold-hotThe unit is the shutdown critical time of cold-state start and hot-state start.
Step 1.3: the pollutants discharged by the conventional unit for power generation mainly contain nitrogen oxides, sulfur oxides, carbon dioxide and the like, and the treatment cost is expressed as follows:
Figure BDA0003275050220000093
Em(Pi)=(αi,mi,mPii,mPi 2)+ζi,mexp(δi,mPi)
in the formula: fgThe cost is reduced for the pollution treatment of the conventional unit; m is the type of the discharged pollutant; em(Pi) The discharge amount of pollutants of the ith unit is calculated; etamThe treatment cost coefficient of the m-th pollutants; alpha is alphai,m、βi,m、γi,m、ζi,m、δi,mThe discharge coefficient of the mth pollutant discharged by the ith unit;
Step 1.4: The power exchange cost between the microgrid and the main grid is expressed as follows:
$$F_{grid} = \lambda_p P_{su/sh} C_t^{grid}$$
In the formula: $F_{grid}$ is the cost of power interaction between the microgrid and the main grid; $\lambda_p$ is the trading state of the microgrid, taking the value 1 when selling electricity and -1 when purchasing electricity; $P_{su/sh}$ is the surplus or shortage of power inside the microgrid; $C_t^{grid}$ is the price at which electricity is sold to or purchased from the main grid.
Step 1.5: the method is characterized in that an objective function is constructed according to the running cost, the environmental benefit cost and the power exchange cost of a main power grid of a conventional unit in a microgrid, and is expressed as follows:
minF=Fcf+Fon-off+Fg+Fgrid
in the formula: f is an objective function value of the micro-grid system operation; fcf、Fon-off、Fg、FgridRespectively the running cost, the starting and stopping cost, the pollution treatment cost, the micro-grid and the large-scale power grid of the conventional unitGrid power interaction cost.
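The objective of Step 1.5 is a plain sum of four cost terms. The following minimal sketch shows the grid exchange term and the total; it applies the sign convention for $\lambda_p$ literally as stated in the text, and the function names and numeric values are hypothetical.

```python
def grid_exchange_cost(lambda_p, p_su_sh, price):
    """F_grid = lambda_p * P_su/sh * C_t^grid, with lambda_p = 1 (selling) or -1 (purchasing)."""
    if lambda_p not in (1, -1):
        raise ValueError("lambda_p must be 1 (selling) or -1 (purchasing)")
    return lambda_p * p_su_sh * price


def total_cost(f_cf, f_on_off, f_g, f_grid):
    """Objective value F = F_cf + F_on-off + F_g + F_grid."""
    return f_cf + f_on_off + f_g + f_grid


# Hypothetical values for a single scheduling period.
f_grid = grid_exchange_cost(-1, 50.0, 0.5)
print(total_cost(100.0, 20.0, 23.0, f_grid))
```

Whether $P_{su/sh}$ is signed is not stated in the text, so the sign of the resulting term should be checked against the intended convention before reuse.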
Step 2: Establish the constraint conditions for microgrid operation.
Step 2.1: The power balance constraint is expressed as follows:
Figure BDA0003275050220000103
In the formula:
Figure BDA0003275050220000104
are, respectively, the output power of the conventional units, the wind power and the photovoltaic power in period t;
Figure BDA0003275050220000105
are the charging and discharging power of the storage battery in period t; $P_t^{grid}$ is the power exchanged with the main grid; $P_t^{L}$ is the total load power in period t; $T$ is the total operating horizon of the microgrid, taken as 24 h.
Step 2.2: the battery storage state constraint is expressed as follows:
SOCmin≤SOC(t)≤SOCmax
in the formula: SOC (t) is the state of charge of the storage battery at the t moment; SOCminAnd SOCmaxRepresenting the maximum and minimum states of charge of the battery, respectively.
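One way to enforce the state-of-charge bound in a scheduler is to clip the requested battery power before applying it. The sketch below assumes a lossless battery whose state of charge is a fraction of a fixed capacity, with positive power meaning charging; these modelling choices and all names are assumptions, not part of the source.

```python
def soc_within_limits(soc_series, soc_min, soc_max):
    """True if every state of charge satisfies SOC_min <= SOC(t) <= SOC_max."""
    return all(soc_min <= s <= soc_max for s in soc_series)


def clip_battery_power(soc, p_request, capacity, dt, soc_min, soc_max):
    """Limit the requested power so the next state of charge stays within bounds.

    Assumes a lossless battery; p_request > 0 charges, p_request < 0 discharges.
    """
    p_max_charge = (soc_max - soc) * capacity / dt
    p_max_discharge = (soc - soc_min) * capacity / dt
    return max(-p_max_discharge, min(p_request, p_max_charge))


# Hypothetical battery: 100 kWh, one-hour periods, SOC limited to [0.2, 0.9].
p = clip_battery_power(0.8, 30.0, 100.0, 1.0, 0.2, 0.9)
next_soc = 0.8 + p * 1.0 / 100.0
print(p, soc_within_limits([0.8, next_soc], 0.2, 0.9 + 1e-12))
```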
Step 2.3: for a conventional unit, the accumulated start-stop time should be greater than the minimum continuous start-stop time, and the constraint is expressed as follows:
Figure BDA0003275050220000111
In the formula:
Figure BDA0003275050220000112
is the minimum continuous shutdown time of the unit;
Figure BDA0003275050220000113
is the minimum continuous running time of the unit.
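The minimum running and shutdown time constraint can be checked on a candidate on/off schedule. The sketch below assumes an hourly binary status vector and checks only interior runs, since the first and last runs may extend beyond the scheduling horizon; this representation is an assumption, because the constraint formula itself is only available as a figure.

```python
from itertools import groupby


def satisfies_min_up_down(status, min_on, min_off):
    """Check that every interior on/off run lasts at least its minimum duration.

    status is a sequence of 1 (running) and 0 (shut down), one entry per period.
    """
    runs = [(state, len(list(group))) for state, group in groupby(status)]
    for state, length in runs[1:-1]:
        required = min_on if state else min_off
        if length < required:
            return False
    return True


# Hypothetical 8-period schedule with a 2-period shutdown in the middle.
print(satisfies_min_up_down([1, 1, 1, 0, 0, 1, 1, 1], min_on=2, min_off=2))
```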
Step 3: Construct a penalty return function whose highest and lowest thresholds are the maximum wind and solar curtailment cost and the cost at full wind and solar accommodation.
Step 3.1: Specify the minimum and maximum limits of wind and solar curtailment in the microgrid, and divide the growth range from full wind and solar accommodation to the maximum curtailment limit into intervals $\chi_n$ as follows:
Figure BDA0003275050220000114
Figure BDA0003275050220000115
In the formula:
Figure BDA0003275050220000116
are, respectively, the highest and lowest limits of wind and solar curtailment specified in the system; $N$ is the number of intervals; $\lambda$ is the growth step of the specified quota.
Step 3.2: According to the quota intervals specified by the system for wind and solar curtailment, the curtailed amount is linearized to obtain a stepped reward-penalty curtailment penalty return function, expressed as follows:
Figure BDA0003275050220000117
In the formula: $d_{ab}$ is the value of the wind and solar curtailment penalty return function; $P_{ab,wp}$ is the amount of wind and solar power curtailed by the system; $c$ is the curtailment penalty coefficient; $k$ is the increment of the penalty coefficient per interval.
In Step 3.2, the stepped reward-penalty curtailment penalty return function is used as the action value in the improved Q learning method.
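A stepped penalty of this kind can be sketched as a piecewise-linear function whose marginal rate rises by $k$ in each successive interval. The patent's exact expression is only available as a figure, so the version below is one plausible reading under stated assumptions: $N$ equal intervals of width $\lambda = (P_{\max} - P_{\min})/N$, a rate of $c + nk$ in the n-th interval (counting from zero), no penalty below the lower limit, and the highest rate applied above the upper limit.

```python
def curtailment_penalty(p_ab, p_min, p_max, n_intervals, c, k):
    """Stepped curtailment penalty d_ab for a curtailed amount p_ab (assumed form)."""
    if p_ab <= p_min:
        return 0.0
    step = (p_max - p_min) / n_intervals  # interval width, lambda in the text
    remaining = min(p_ab, p_max) - p_min
    penalty = 0.0
    for n in range(n_intervals):
        segment = min(remaining, step)
        if segment <= 0:
            break
        penalty += (c + n * k) * segment  # rate grows by k per interval
        remaining -= segment
    if p_ab > p_max:
        # Beyond the upper limit, charge the excess at the next higher rate.
        penalty += (c + n_intervals * k) * (p_ab - p_max)
    return penalty


# Hypothetical settings: limits 0 and 30, three intervals, c = 1.0, k = 0.5.
for amount in (0.0, 25.0, 40.0):
    print(amount, curtailment_penalty(amount, 0.0, 30.0, 3, 1.0, 0.5))
```

Because the rate is non-decreasing, the function is convex in the curtailed amount, which is what makes larger curtailment progressively less attractive to the learner.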
Step 4: Improve the traditional Q learning algorithm with the multi-verse optimization (MVO) algorithm.
The multi-verse optimization algorithm is a heuristic search algorithm that treats each universe as a feasible solution of the problem and iterates cyclically through the interaction of black holes, white holes and wormholes. Here it iteratively optimizes the best choice made by the traditional Q learning algorithm in its unsupervised state, yielding an enhanced target solution. The state-action function of the optimized improved Q learning algorithm is expressed as follows:
Figure BDA0003275050220000121
In the formula: $F_s$ is the state feature of traditional Q learning and corresponds to the objective function $F$ of microgrid system operation;
Figure BDA0003275050220000122
is the action feature optimized by the multi-verse optimization algorithm and corresponds to the stepped reward-penalty curtailment penalty return function value $d_{ab}$;
Figure BDA0003275050220000123
are, respectively, the initial values of the state feature and the action feature; $E_{mvo-p}$ is the expected value under the MVO-Q strategy; $T$ is the iteration number;
Figure BDA0003275050220000124
and $Y_T$ are, respectively, the reward value and the discount coefficient at iteration $T$.
The multi-verse algorithm optimizes the multi-level greedy actions of Q learning and reduces redundant actions during optimization, thereby further reducing the error precision $\gamma_T$ of the Q iteration result $Q_{mvo-q}$ (the initial error precision is $\gamma_{T0}$). When the iteration error precision is not satisfied, the next state-action strategy is executed and the multi-verse algorithm performs the next round of optimization, with the optimization formulas expressed as follows:
Figure BDA0003275050220000125
Figure BDA0003275050220000126
In the formula:
Figure BDA0003275050220000127
are the action feature and the state feature at iteration $T-1$;
Figure BDA0003275050220000128
is the state feature at iteration $T$;
Figure BDA0003275050220000129
is the reward value at iteration $T-1$.
The multi-verse optimization algorithm thus improves the optimal value of the state feature corresponding to the objective function in the traditional Q learning algorithm.
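For orientation, the textbook tabular Q learning update that Step 4 builds on is sketched below. This is the standard rule, not the MVO-Q formula of the invention, which is only available as a figure. The learning rate, the discount value, the state and action labels and the use of a negative cost as reward are all assumptions for illustration.

```python
def q_update(q, state, action, reward, next_state, actions,
             learning_rate=0.1, discount=0.9):
    """Standard tabular update: Q <- Q + lr * (r + discount * max_a' Q(s', a') - Q)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + learning_rate * (reward + discount * best_next - old)
    return q[(state, action)]


# Hypothetical usage: the reward is the negative of a cost, so lower cost is better.
q_table = {}
actions = ["a0", "a1"]
first = q_update(q_table, "s0", "a0", -5.0, "s0", actions)
second = q_update(q_table, "s0", "a0", -5.0, "s0", actions)
print(first, second)
```

In the method described here, the action selection step would be guided by the multi-verse search rather than by a plain greedy or epsilon-greedy choice.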
Step 5: Perform Markov decision description processing on the objective function obtained in Step 1, and solve the resulting state and action descriptions with the improved Q learning algorithm.
Step 5.1: The objective function in Step 1 comprises the unit operating cost, the environmental benefit cost and the cost of power exchange with the main grid, so the state description of each entity in the system at iteration $T$ is expressed as follows:
$$F_s = \left[F_{cf},\ F_{\text{on-off}},\ E_m(P_i),\ F_g,\ F_{grid},\ F\right]$$
Step 5.2: The constraint conditions in Step 2 comprise the output power of the conventional units, the wind and photovoltaic output power, the charging and discharging power of the storage battery, the power exchanged with the main grid and the total load power. Taking the wind and solar curtailment reward-penalty principle into account and discretizing it gives the action description of each entity in the system at iteration $T$, expressed as follows:
Figure BDA0003275050220000131
Step 5.3: As shown in FIG. 1, the Q learning algorithm improved by the multi-verse algorithm solves for the optimal value of the objective function in the following steps:
5.31) specifying the minimum and maximum limits of wind and solar curtailment in the microgrid, dividing the curtailment penalty intervals, and initializing the parameters of the multi-verse algorithm: the number of universe individuals N, the dimension N, the maximum number of iterations MAX and the initial wormhole position $X_{ij}$;
5.32) randomly selecting the initial state of the Q learning algorithm
Figure BDA0003275050220000132
5.33) optimizing, with the multi-verse algorithm, the initial action of the Q learning greedy strategy
Figure BDA0003275050220000133
5.34) outputting an initial state based on a greedy strategy
Figure BDA0003275050220000134
and performing initial optimization preparation;
5.35) solving for the optimal value $\min F$ of the objective function according to the optimized initial action;
5.36) judging whether the error precision is satisfied;
5.37) if the error precision is satisfied, selecting the action
Figure BDA0003275050220000135
and calculating the optimal value update and the wormhole distance of the multi-verse algorithm while proceeding to the next iteration, wherein the optimal value update formula is as follows:
Figure BDA0003275050220000136
In the formula: $X_j$ is the position of the best universe individual; $p_1$, $p_2$, $p_3 \in [0,1]$ are random numbers; $\varepsilon$ is the cosmic expansion rate; $u_j$ and $l_j$ are the upper and lower limits of $x$; $\eta$ is the proportion of wormholes among all individuals, determined by the current iteration number $l$ and the maximum iteration number $L$, and is expressed as follows:
Figure BDA0003275050220000141
The optimization mechanism of the multi-verse algorithm is that black holes and white holes are selected by a roulette-wheel mechanism, and individuals move toward the current best universe through expansion and self-rotation. The optimal travelling distance during this movement depends on the iteration precision $p$ and is expressed as follows:
Figure BDA0003275050220000142
5.38) if the error precision is not satisfied, discarding the action of this iteration, selecting an action again and returning to step 5.35);
5.39) judging whether the objective function value is the global optimum; if not, returning to step 5.38);
5.40) if it is the global optimum, outputting the final state and action;
5.41) calculating the final result.
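The wormhole step used in 5.37) can be sketched with the formulas commonly published for the multi-verse optimizer. The patent's own expressions are only available as figures, so the mapping below is an assumption: $\eta$ is read as the wormhole existence probability, which grows linearly with the iteration count, and $\varepsilon$ as the travelling distance rate, which shrinks with it. The default bounds 0.2 and 1.0 and the exponent `p = 6` are commonly used settings, not values taken from this document.

```python
import random


def wormhole_existence_probability(l, L, wep_min=0.2, wep_max=1.0):
    """eta: grows linearly from wep_min to wep_max over L iterations."""
    return wep_min + l * (wep_max - wep_min) / L


def travelling_distance_rate(l, L, p=6.0):
    """epsilon: shrinks from 1 towards 0 as iteration l approaches L."""
    return 1.0 - l ** (1.0 / p) / L ** (1.0 / p)


def wormhole_update(x, best, lower, upper, l, L, rng=random):
    """Move each coordinate of x around the best universe with probability eta."""
    wep = wormhole_existence_probability(l, L)
    tdr = travelling_distance_rate(l, L)
    new_x = []
    for xj, bj, lj, uj in zip(x, best, lower, upper):
        if rng.random() < wep:
            jump = tdr * ((uj - lj) * rng.random() + lj)
            xj = bj + jump if rng.random() < 0.5 else bj - jump
        new_x.append(min(max(xj, lj), uj))  # keep the coordinate within bounds
    return new_x


# Hypothetical two-dimensional search at iteration 10 of 100.
rng = random.Random(0)
candidate = wormhole_update([1.0, 9.0], [4.0, 5.0], [0.0, 0.0], [10.0, 10.0], 10, 100, rng)
print(candidate)
```

Early iterations make large, infrequent jumps around the best universe, while late iterations make small, frequent ones, which matches the exploration-then-exploitation behaviour described in the text.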
A simulation experiment is carried out using a typical electrical load demand of a conventional microgrid, with the experimental parameters set as follows:
Figure BDA0003275050220000143
The method provided by the invention is applied to the optimal scheduling of a typical microgrid comprising a wind farm, a photovoltaic plant, gas turbine units and an energy storage unit, assuming that power is exchanged between the microgrid and the main grid. The objective function is solved both with a traditional particle swarm algorithm and with the improved Q learning algorithm, giving a comprehensive system scheduling plan that maximizes wind and solar accommodation. As shown in FIGS. 2 and 3, comparative analysis of the simulation results shows that microgrid scheduling with the proposed method increases total wind and solar accommodation by 33.18% and reduces the comprehensive cost by 6.51%. The method can therefore substantially raise the share of wind and solar energy accommodated in microgrid scheduling, maximizing economic benefit while meeting environmental requirements.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (8)

1. A microgrid optimization scheduling method based on improved Q learning penalty selection, characterized in that the method comprises the following steps:
Step 1: constructing an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost and the cost of power interaction with the main grid;
Step 2: establishing the constraint conditions of microgrid operation;
Step 3: constructing a penalty return function whose highest and lowest thresholds are the maximum wind and solar curtailment cost and the cost at full wind and solar accommodation;
Step 4: improving the traditional Q learning algorithm with a multi-verse optimization algorithm;
the state-action function of the optimized improved Q learning algorithm is expressed as follows:
Figure FDA0003275050210000011
in the formula: $F_s$ is the state feature of traditional Q learning;
Figure FDA0003275050210000012
is the action feature optimized by the multi-verse optimization algorithm;
Figure FDA0003275050210000013
are, respectively, the initial values of the state feature and the action feature; $E_{mvo-p}$ is the expected value under the MVO-Q strategy; $T$ is the iteration number;
Figure FDA0003275050210000014
and $Y_T$ are, respectively, the reward value and the discount coefficient at iteration $T$;
Step 5: performing Markov decision description processing on the objective function obtained in Step 1, and solving the resulting state and action descriptions with the improved Q learning algorithm.
2. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that Step 1 comprises the following steps:
Step 1.1: under high-proportion grid connection of wind and solar power, the conventional units are divided into a normal operating state and a low-load operating state, and the conventional power generation cost inside the microgrid is expressed as follows:
Figure FDA0003275050210000015
in the formula: $a$, $b$ and $c$ are cost factors in the normal operating state of a conventional unit; $P_i$ is the output power of the i-th conventional unit; $g$, $h$, $l$ and $p$ are cost factors in the low-load operating state; $kP_{i,\max}$ is the critical power separating the normal operating state from the low-load operating state of the i-th conventional unit;
Step 1.2: with uncertain wind and solar output, the start-stop cost of the conventional units is expressed as follows:
Figure FDA0003275050210000021
in the formula: $F_{\text{on-off}}$ is the start-stop cost of the conventional units; $C$ is the number of start-stop cycles of a unit; $K(t_{i,r})$ is the cost of the r-th start of the i-th unit; $t_{i,r}$ is the continuous shutdown time of the i-th unit before its r-th start; $C(t_{i,r})$ is the operating cost of the associated auxiliary systems for a cold start of the unit; $t_{\text{cold-hot}}$ is the critical shutdown time separating a cold start from a hot start;
Step 1.3: the pollutants discharged by the conventional units during power generation mainly comprise nitrogen oxides, sulfur oxides, carbon dioxide and the like, and the treatment cost is expressed as follows:
Figure FDA0003275050210000022
$$E_m(P_i) = \left(\alpha_{i,m} + \beta_{i,m} P_i + \gamma_{i,m} P_i^2\right) + \zeta_{i,m} \exp\left(\delta_{i,m} P_i\right)$$
in the formula: $F_g$ is the pollution treatment cost of the conventional units; $m$ is the type of pollutant discharged; $E_m(P_i)$ is the amount of pollutant $m$ discharged by the i-th unit; $\eta_m$ is the treatment cost coefficient of the m-th pollutant; $\alpha_{i,m}$, $\beta_{i,m}$, $\gamma_{i,m}$, $\zeta_{i,m}$ and $\delta_{i,m}$ are the emission coefficients of the m-th pollutant discharged by the i-th unit;
Step 1.4: the power exchange cost between the microgrid and the main grid is expressed as follows:
$$F_{grid} = \lambda_p P_{su/sh} C_t^{grid}$$
in the formula: $\lambda_p$ is the trading state of the microgrid, taking the value 1 when selling electricity and -1 when purchasing electricity; $P_{su/sh}$ is the surplus or shortage of power inside the microgrid;
Figure FDA0003275050210000023
is the price at which electricity is sold to or purchased from the main grid;
Step 1.5: an objective function is constructed from the operating cost of the conventional units in the microgrid, the environmental benefit cost and the cost of power exchange with the main grid, expressed as follows:
$$\min F = F_{cf} + F_{\text{on-off}} + F_g + F_{grid}$$
in the formula: $F$ is the objective function value of microgrid system operation; $F_{cf}$, $F_{\text{on-off}}$, $F_g$ and $F_{grid}$ are, respectively, the operating cost, the start-stop cost and the pollution treatment cost of the conventional units, and the cost of power interaction between the microgrid and the main grid.
3. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that Step 2 comprises the following steps:
Step 2.1: the power balance constraint is expressed as follows:
Figure FDA0003275050210000031
in the formula:
Figure FDA0003275050210000032
are, respectively, the output power of the conventional units, the wind power and the photovoltaic power in period t;
Figure FDA0003275050210000033
are the charging and discharging power of the storage battery in period t; $P_t^{grid}$ is the power exchanged with the main grid; $P_t^{L}$ is the total load power in period t; $T$ is the total operating horizon of the microgrid, taken as 24 h;
Step 2.2: the state-of-charge constraint of the storage battery is expressed as follows:
$$SOC_{\min} \le SOC(t) \le SOC_{\max}$$
in the formula: $SOC(t)$ is the state of charge of the storage battery at time t; $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum states of charge of the battery, respectively;
Step 2.3: for a conventional unit, the accumulated running time and the accumulated shutdown time must each be greater than the corresponding minimum continuous running or shutdown time, and the constraint is expressed as follows:
Figure FDA0003275050210000034
in the formula:
Figure FDA0003275050210000035
is the minimum continuous shutdown time of the unit;
Figure FDA0003275050210000036
is the minimum continuous running time of the unit.
4. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that Step 3 comprises the following steps:
Step 3.1: specifying the minimum and maximum limits of wind and solar curtailment in the microgrid, and dividing the growth range from full wind and solar accommodation to the maximum curtailment limit into intervals $\chi_n$ as follows:
Figure FDA0003275050210000041
Figure FDA0003275050210000042
in the formula:
Figure FDA0003275050210000043
are, respectively, the highest and lowest limits of wind and solar curtailment specified in the system; $N$ is the number of intervals; $\lambda$ is the growth step of the specified quota;
Step 3.2: according to the quota intervals specified by the system for wind and solar curtailment, the curtailed amount is linearized to obtain a stepped reward-penalty curtailment penalty return function, expressed as follows:
Figure FDA0003275050210000044
in the formula: $d_{ab}$ is the value of the wind and solar curtailment penalty return function; $P_{ab,wp}$ is the amount of wind and solar power curtailed by the system; $c$ is the curtailment penalty coefficient; $k$ is the increment of the penalty coefficient per interval.
5. The method according to claim 1, characterized in that Step 5 comprises the following steps:
Step 5.1: the objective function in Step 1 comprises the unit operating cost, the environmental benefit cost and the cost of power exchange with the main grid, and the state description of each entity in the system at iteration $T$ is expressed as:
$$F_s = \left[F_{cf},\ F_{\text{on-off}},\ E_m(P_i),\ F_g,\ F_{grid},\ F\right]$$
Step 5.2: the constraint conditions in Step 2 comprise the output power of the conventional units, the wind and photovoltaic output power, the charging and discharging power of the storage battery, the power exchanged with the main grid and the total load power; taking the wind and solar curtailment reward-penalty principle into account and discretizing it gives the action description of each entity in the system at iteration $T$, expressed as follows:
Figure FDA0003275050210000051
Step 5.3: the Q learning algorithm improved by the multi-verse algorithm solves for the optimal value of the objective function in the following steps:
5.31) specifying the minimum and maximum limits of wind and solar curtailment in the microgrid, dividing the curtailment penalty intervals, and initializing the parameters of the multi-verse algorithm: the number of universe individuals N, the dimension N, the maximum number of iterations MAX and the initial wormhole position $X_{ij}$;
5.32) randomly selecting the initial state of the Q learning algorithm
Figure FDA0003275050210000052
5.33) optimizing, with the multi-verse algorithm, the initial action of the Q learning greedy strategy
Figure FDA0003275050210000053
5.34) outputting the initial state based on the greedy strategy
Figure FDA0003275050210000054
and performing initial optimization preparation;
5.35) solving for the optimal value $\min F$ of the objective function according to the optimized initial action;
5.36) judging whether the error precision is satisfied;
5.37) if the error precision is satisfied, selecting the action
Figure FDA0003275050210000055
and calculating the optimal value update and the wormhole distance of the multi-verse algorithm while proceeding to the next iteration, wherein the optimal value update formula is as follows:
Figure FDA0003275050210000056
in the formula: $X_j$ is the position of the best universe individual; $p_1$, $p_2$, $p_3 \in [0,1]$ are random numbers; $\varepsilon$ is the cosmic expansion rate; $u_j$ and $l_j$ are the upper and lower limits of $x$; $\eta$ is the proportion of wormholes among all individuals, determined by the current iteration number $l$ and the maximum iteration number $L$, and is expressed as follows:
Figure FDA0003275050210000057
the optimization mechanism of the multi-verse algorithm is that black holes and white holes are selected by a roulette-wheel mechanism, and individuals move toward the current best universe through expansion and self-rotation; the optimal travelling distance during this movement depends on the iteration precision $p$ and is expressed as follows:
Figure FDA0003275050210000061
5.38) if the error precision is not satisfied, discarding the action of this iteration, selecting an action again and returning to step 5.35);
5.39) judging whether the objective function value is the global optimum; if not, returning to step 5.38);
5.40) if it is the global optimum, outputting the final state and action;
5.41) calculating the final result.
6. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 4, characterized in that in Step 3.2 the stepped reward-penalty curtailment penalty return function is used as the action value in the improved Q learning method.
7. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that in Step 4 the multi-verse optimization algorithm is used to improve the optimal value of the state feature corresponding to the objective function in the traditional Q learning algorithm.
8. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that the method of improving the traditional Q learning algorithm with the multi-verse optimization algorithm in Step 4 comprises the following:
the multi-verse algorithm optimizes the multi-level greedy actions of Q learning and reduces redundant actions during optimization, thereby further reducing the error precision $\gamma_T$ of the Q iteration result $Q_{mvo-q}$; when the iteration error precision is not satisfied, the next state-action strategy is executed and the multi-verse algorithm performs the next round of optimization, with the optimization formulas expressed as follows:
Figure FDA0003275050210000062
Figure FDA0003275050210000063
CN202111115317.6A 2021-09-23 2021-09-23 Micro-grid optimal scheduling method based on improved Q learning punishment selection Active CN113809780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111115317.6A CN113809780B (en) 2021-09-23 2021-09-23 Micro-grid optimal scheduling method based on improved Q learning punishment selection

Publications (2)

Publication Number Publication Date
CN113809780A true CN113809780A (en) 2021-12-17
CN113809780B CN113809780B (en) 2023-06-30

Family

ID=78940309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111115317.6A Active CN113809780B (en) 2021-09-23 2021-09-23 Micro-grid optimal scheduling method based on improved Q learning punishment selection

Country Status (1)

Country Link
CN (1) CN113809780B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190244108A1 (en) * 2018-02-08 2019-08-08 Cognizant Technology Solutions U.S. Corporation System and Method For Pseudo-Task Augmentation in Deep Multitask Learning
CN108964042A (en) * 2018-07-24 2018-12-07 合肥工业大学 Regional power grid operating point method for optimizing scheduling based on depth Q network
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
JP6667785B1 (en) * 2019-01-09 2020-03-18 裕樹 有光 A program for learning by associating a three-dimensional model with a depth image
US20210194424A1 (en) * 2019-04-25 2021-06-24 Shandong University Method and system for power prediction of photovoltaic power station based on operating data of grid-connected inverters
CN112084680A (en) * 2020-09-02 2020-12-15 沈阳工程学院 Energy Internet optimization strategy method based on DQN algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
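叶亮;吕智林;王蒙;杨啸: "Bi-level optimal dispatch of an active distribution network with multiple microgrids based on optimal power flow", Power System Protection and Control *
马留洋;孟安波;葛佳菲: "Fault diagnosis method for wind turbine gearboxes based on a BP neural network optimized by the crisscross optimization algorithm", Journal of Guangdong University of Technology *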

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418198A (en) * 2021-12-30 2022-04-29 国网辽宁省电力有限公司电力科学研究院 Segmented function type calculation method for punishment cost of abandoned new energy
CN114862048A (en) * 2022-05-30 2022-08-05 哈尔滨理工大学 Permanent magnet synchronous motor optimization method based on improved multivariate universe optimization algorithm
CN114862048B (en) * 2022-05-30 2024-09-17 哈尔滨理工大学 Permanent magnet synchronous motor optimization method based on improved multi-element universe optimization algorithm
CN117439190A (en) * 2023-10-26 2024-01-23 华中科技大学 Water, fire and wind system dispatching method, device, equipment and storage medium
CN117439190B (en) * 2023-10-26 2024-06-11 华中科技大学 Water, fire and wind system dispatching method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113809780B (en) 2023-06-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant