CN113809780A - Microgrid optimization scheduling method based on improved Q learning penalty selection - Google Patents
- Publication number
- CN113809780A (application CN202111115317.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H02J3/46: Controlling of the sharing of output between the generators, converters, or transformers
- G06Q10/06312: Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
- G06Q10/06315: Needs-based resource requirements planning or analysis
- G06Q50/06: Energy or water supply
- H02J3/32: Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
- H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
- H02J2300/24: The renewable source being solar energy of photovoltaic origin
- H02J2300/28: The renewable source being wind energy
- H02J2300/40: Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously
- Y02E10/56: Power conversion systems, e.g. maximum power point trackers
- Y02E40/70: Smart grids as climate change mitigation technology in the energy generation sector
- Y02E70/30: Systems combining energy storage with energy generation of non-fossil origin
- Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention relates to a microgrid optimal scheduling method based on improved Q-learning penalty selection, which comprises the following steps: step 1: constructing an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power interaction with the large power grid; step 2: establishing the constraint conditions of microgrid operation; step 3: constructing a penalty return function that takes the highest wind curtailment cost and the cost of complete wind and solar consumption as its highest and lowest thresholds; step 4: improving the traditional Q-learning algorithm with a multi-verse optimization (MVO) algorithm; step 5: applying a Markov decision description to the objective function obtained in step 1, and solving the resulting state and action description with the improved Q-learning algorithm. The method reduces the curtailment rate of renewable energy in microgrid operation scheduling, reduces the fluctuation of energy interaction between the microgrid and the large power grid, addresses the slow response and non-convergence of traditional optimization methods, and improves the stability and economy of microgrid operation.
Description
Technical Field
The invention relates to a microgrid economic dispatching method, in particular to a microgrid optimal dispatching method based on improved Q learning penalty selection.
Background
With the continuous adjustment of the energy structure, microgrid systems, which are composed of many types of energy equipment and are widely dispersed, have come into wide use thanks to advantages such as independent power transmission and distribution, rapid scheduling, a large share of renewable energy, and the ability to operate as an island. A microgrid system can improve the quality of power supply in remote areas and can effectively prevent problems such as supply interruptions caused by natural disasters.
With continued national policy support for the new energy industry, the scale of grid-connected wind and solar generation keeps increasing. However, because wind and photovoltaic output is fluctuating and uncertain, large-scale connection of wind and photovoltaic generation to the microgrid causes problems such as power imbalance inside the system and reduced power quality. How to raise the share of new energy generation while keeping the microgrid system stable and safe is a problem that urgently needs to be solved.
A microgrid contains traditional units, new energy generator sets, energy storage units and various load demands. The traditional scheduling problem, which considers only the generation cost of a single unit, cannot meet the fast, economical, environmentally friendly and safe scheduling that a microgrid system pursues. Multi-objective comprehensive scheduling of the microgrid system, covering the new operating conditions of the various units and the coordinated optimization of the units and the load demands, is therefore of great significance.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a microgrid optimal scheduling method based on improved Q-learning penalty selection. A reward-and-penalty stepped wind and solar curtailment penalty return function is introduced into the traditional microgrid scheduling method, in which conventional units, wind and solar units and energy storage units run in coordination, and the state and action of the microgrid scheduling problem are described through a Q-learning algorithm improved by a multi-verse optimization (MVO) algorithm. The lowest overall scheduling cost is thereby achieved while the penalty return function is satisfied, the curtailment rate of renewable energy is reduced, the volatility of energy interaction between the microgrid and the large power grid is reduced, the slow response and non-convergence of traditional optimization methods are addressed, and the stability and economy of microgrid operation are improved.
In order to solve the problems in the prior art, the technical scheme adopted by the invention is as follows:
A microgrid optimal scheduling method based on improved Q-learning penalty selection comprises the following steps:
step 1: constructing an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power interaction with the large power grid;
step 2: establishing the constraint conditions of microgrid operation;
step 3: constructing a penalty return function that takes the highest wind curtailment cost and the cost of complete wind and solar consumption as its highest and lowest thresholds;
step 4: improving the traditional Q-learning algorithm with a multi-verse optimization (MVO) algorithm;
the state-action function of the improved Q-learning algorithm is represented as follows:
in the formula: $F_s$ is the state feature of traditional Q-learning; the remaining symbols are, in order, the action feature optimized by the multi-verse optimization algorithm; the initial values of the state feature and of the action feature; $E_{mvo-p}$, the expected value under the MVO-Q strategy; $T$, the number of iterations; and the reward value and the discount coefficient $Y_T$ under iteration $T$;
step 5: applying a Markov decision description to the objective function obtained in step 1, and solving the resulting state and action description with the improved Q-learning algorithm.
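The state-action function of the improved algorithm is not reproduced in this text. For orientation only, the sketch below shows the standard tabular Q-learning update that step 4 sets out to improve, not the MVO-Q formula itself. The learning rate `alpha`, the discount `gamma`, the table size, and the choice of reward as a negative cost (so that minimizing F corresponds to maximizing return) are all illustrative assumptions.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (assumed baseline, not the patent's formula).

    q is a list of rows, one row of action values per state.
    """
    # Bootstrapped target: immediate reward plus discounted best next value.
    td_target = reward + gamma * max(q[next_state])
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q


# Hypothetical toy problem: 3 states, 2 actions, all values start at zero.
q_table = [[0.0, 0.0] for _ in range(3)]
# A scheduling cost of 5.0 is fed back as a reward of -5.0.
q_update(q_table, state=0, action=1, reward=-5.0, next_state=2)
```

Treating cost as negative reward is one common way to map a cost-minimization objective onto Q-learning; the source does not state which sign convention it uses.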
Step 1 comprises the following steps:
step 1.1: under a high proportion of grid-connected wind and solar generation, the operation of a conventional unit is divided into a normal operating state and a low-load operating state, and the conventional generation cost inside the microgrid is represented as follows:
in the formula: a, b and c are cost factors in the normal operating state of the conventional unit; $P_i$ is the output power of the i-th conventional unit; g, h, l and p are cost factors in the low-load operating state; $KP_{i,max}$ is the critical power separating the normal and low-load operating states of the i-th conventional unit;
step 1.2: under uncertain wind and solar output, the start-stop cost of the conventional units is expressed as follows:
in the formula: $F_{on-off}$ is the start-stop cost of the conventional units; $C$ is the number of start-stops of a unit; $K(t_{i,r})$ is the cost of the r-th start of the i-th unit; $t_{i,r}$ is the continuous shutdown time of the i-th unit before its r-th start; $C(t_{i,r})$ is the operating cost of the associated auxiliary systems for a cold start of the unit; $t_{cold-hot}$ is the critical shutdown time separating cold starts from hot starts;
step 1.3: the pollutants discharged by conventional units during power generation mainly comprise nitrogen oxides, sulfur oxides and carbon dioxide, and the treatment cost is expressed as follows:
$E_m(P_i) = (\alpha_{i,m} + \beta_{i,m} P_i + \gamma_{i,m} P_i^2) + \zeta_{i,m} \exp(\delta_{i,m} P_i)$
in the formula: $F_g$ is the pollution treatment cost of the conventional units; $m$ is the type of discharged pollutant; $E_m(P_i)$ is the pollutant emission of the i-th unit; $\eta_m$ is the treatment cost coefficient of the m-th pollutant;
$\alpha_{i,m}$, $\beta_{i,m}$, $\gamma_{i,m}$, $\zeta_{i,m}$ and $\delta_{i,m}$ are the emission coefficients of the m-th pollutant discharged by the i-th unit;
step 1.4: the cost of power exchange between the microgrid and the large power grid is expressed as follows:
in the formula: $\lambda_p$ is the selling or purchasing state of the microgrid, equal to 1 when selling electricity and -1 when purchasing electricity; $P_{su/sh}$ is the surplus or shortage of power inside the microgrid; the price term is the price at which electricity is sold to or purchased from the large power grid;
step 1.5: an objective function is constructed from the operating cost of the conventional units in the microgrid, the environmental benefit cost, and the cost of power exchange with the main grid, and is expressed as follows:
$\min F = F_{cf} + F_{on-off} + F_g + F_{grid}$
in the formula: $F$ is the objective function value of microgrid system operation; $F_{cf}$, $F_{on-off}$, $F_g$ and $F_{grid}$ are respectively the operating cost of the conventional units, the start-stop cost, the pollution treatment cost, and the cost of power interaction between the microgrid and the large power grid.
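The emission formula of step 1.3 and the objective of step 1.5 can be evaluated directly, as in the following minimal sketch. The emission expression follows the formula given in the text. The aggregation of the treatment cost as a sum of `eta_m * E_m(P_i)` over units and pollutants is an assumption, since that formula is not reproduced here, and all numeric values are hypothetical.

```python
import math


def emission(p, alpha, beta, gamma, zeta, delta):
    """E_m(P_i) = (alpha + beta*P + gamma*P^2) + zeta*exp(delta*P), as in step 1.3."""
    return (alpha + beta * p + gamma * p ** 2) + zeta * math.exp(delta * p)


def treatment_cost(unit_powers, coeffs, eta):
    """Assumed form: F_g = sum over units i and pollutants m of eta_m * E_m(P_i).

    coeffs[i][m] holds the five emission coefficients of pollutant m for unit i.
    """
    total = 0.0
    for i, p in enumerate(unit_powers):
        for m, eta_m in enumerate(eta):
            total += eta_m * emission(p, *coeffs[i][m])
    return total


def objective(f_cf, f_on_off, f_g, f_grid):
    """F = F_cf + F_on-off + F_g + F_grid, the quantity minimized in step 1.5."""
    return f_cf + f_on_off + f_g + f_grid


# Hypothetical single unit at 10 kW with one pollutant type.
coeffs = [[(1.0, 0.5, 0.01, 0.0, 0.0)]]
f_g = treatment_cost([10.0], coeffs, eta=[2.0])
total_cost = objective(f_cf=100.0, f_on_off=20.0, f_g=f_g, f_grid=-4.0)
```

A negative `f_grid` here stands for net revenue from selling electricity; the sign convention of the grid term is likewise an assumption.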
Step 2 comprises the following steps:
step 2.1: the power balance constraint is expressed as follows:
in the formula: the generation terms are the output powers of the conventional units, wind power and photovoltaics in period t; the storage terms are the power stored and released by the storage battery in period t; $P_t^{grid}$ is the power exchanged with the large power grid; $P_t^{L}$ is the total load power in period t; $T$ is the total operating period of the microgrid, taken as 24 h;
step 2.2: the battery storage state constraint is expressed as follows:
$SOC_{min} \le SOC(t) \le SOC_{max}$
in the formula: $SOC(t)$ is the state of charge of the storage battery at time t; $SOC_{min}$ and $SOC_{max}$ are the minimum and maximum states of charge of the battery, respectively;
step 2.3: for a conventional unit, the accumulated running or shutdown time should be greater than the minimum continuous running or shutdown time, and the constraint is expressed as follows:
in the formula: the two symbols are, respectively, the minimum continuous shutdown time and the minimum continuous running time of the unit.
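The three constraints of step 2 can be checked per time period as in the sketch below. Only the state-of-charge inequality is given explicitly in the text. The power balance form (generation plus battery discharge plus grid import equals load plus battery charge) and the comparison used for the minimum running and shutdown times are assumptions, as are the default limits.

```python
def power_balanced(p_conv, p_wind, p_pv, p_discharge, p_charge, p_grid, p_load,
                   tol=1e-6):
    """Assumed balance: supply (including grid import) equals load plus charging."""
    supply = p_conv + p_wind + p_pv + p_discharge + p_grid
    return abs(supply - p_charge - p_load) <= tol


def soc_within_limits(soc, soc_min=0.2, soc_max=0.9):
    """SOC_min <= SOC(t) <= SOC_max; the default limits are hypothetical."""
    return soc_min <= soc <= soc_max


def min_time_satisfied(elapsed_hours, min_hours):
    """A unit may switch only after its minimum continuous on (or off) time."""
    return elapsed_hours >= min_hours


# Hypothetical period: 80 kW generated, 5 kW discharged, 15 kW imported.
ok = (
    power_balanced(50.0, 20.0, 10.0, 5.0, 0.0, 15.0, 100.0)
    and soc_within_limits(0.55)
    and min_time_satisfied(elapsed_hours=3, min_hours=2)
)
```

In a scheduler these checks would typically filter candidate actions before the reward is computed, so that infeasible dispatch decisions are never evaluated.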
Step 3 comprises the following steps:
step 3.1: the minimum and maximum limits of the curtailed wind and solar power in the microgrid are specified, and the range from complete wind and solar consumption up to the maximum curtailment limit is divided into growth intervals $\chi_n$, as follows:
in the formula: the two limit symbols are, respectively, the highest and lowest limits of wind and solar curtailment specified in the system; $n$ is the number of intervals; $\lambda$ is the growth step of the specified amount;
step 3.2: according to the quota intervals specified by the system for the curtailed wind and solar power, the curtailed amount is linearized to obtain a reward-and-penalty stepped wind and solar curtailment penalty return function, which is expressed as follows:
in the formula: $D_{ab}$ is the value of the curtailment penalty return function; $P_{ab,wp}$ is the amount of wind and solar power curtailed by the system; $c$ is the curtailment penalty coefficient; $k$ is the interval growth step of the penalty coefficient.
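Because the interval and penalty formulas are not reproduced in this text, the following is only one plausible reading of the stepped penalty in step 3, written to make the idea concrete. It assumes equal-width intervals between the lower and upper curtailment limits, a penalty rate that starts at `c` and grows by `k` per interval, no penalty at or below the lower limit, and a cap at the top tier; none of these details are confirmed by the source.

```python
import math


def curtailment_penalty(p_ab, low, high, n_intervals, c, k):
    """Stepped curtailment penalty (assumed form, not the patent's formula).

    p_ab: curtailed wind and solar power; low/high: curtailment limits;
    c: base penalty coefficient; k: growth of the coefficient per interval.
    """
    step = (high - low) / n_intervals  # plays the role of the growth step lambda
    if p_ab <= low:
        return 0.0  # assumption: no penalty at or below the lower limit
    # Interval index, 1-based, capped at the highest tier.
    tier = min(math.ceil((p_ab - low) / step), n_intervals)
    rate = c + k * (tier - 1)
    return rate * p_ab


# Hypothetical limits of 0 and 100 kW split into 4 tiers.
penalty_mid = curtailment_penalty(30.0, 0.0, 100.0, 4, c=1.0, k=0.5)
penalty_over = curtailment_penalty(120.0, 0.0, 100.0, 4, c=1.0, k=0.5)
```

The stepped shape makes each extra tier of curtailment more expensive per unit than the last, which is what pushes the learned policy toward consuming wind and solar output.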
Step 5 comprises the following steps:
step 5.1: the objective function in step 1 comprises the unit operating cost, the environmental benefit cost and the cost of power exchange with the main grid, and the state description of each entity in the system in iteration $T$ is represented as:
$F_s = [F_{cf}, F_{on-off}, E_m(P_i), F_g, F_{grid}, F]$
step 5.2: the constraint conditions in step 2 comprise the output power of the conventional units, the wind and photovoltaic output power, the power stored and released by the storage battery, the power exchanged with the large power grid, and the total load power; the reward-and-penalty principle for wind and solar curtailment is considered at the same time and discretized to obtain the action description of each entity in the system in iteration $T$, which is expressed as follows:
step 5.3: solving for the optimal value of the objective function with the Q-learning algorithm improved by the multi-verse algorithm comprises the following steps:
5.31) specifying the minimum and maximum limits of curtailed wind and solar power in the microgrid, dividing the curtailment penalty intervals, and initializing the parameters of the multi-verse algorithm: the number of universe individuals $N$, the dimension $n$, the maximum number of iterations MAX, and the initial wormhole positions $X_{ij}$;
5.34) outputting the initial state based on a greedy strategy and carrying out the initial optimization preparation;
5.35) solving for the optimal value $\min F$ of the objective function according to the optimized initial action;
5.36) judging whether the error precision is met;
5.37) if the error precision is met, selecting the action, calculating the optimal value update and the wormhole distance of the multi-verse algorithm, and carrying out the next iteration, wherein the optimal value update formula is as follows:
in the formula: $X_j$ is the position of the optimal universe individual; $p_1, p_2, p_3 \in [0,1]$ are random numbers; $\varepsilon$ is the rate of cosmic expansion; $u_j$ and $l_j$ are the upper and lower limits of $x$; $\eta$ is the proportion of wormholes among all individuals, determined by the current iteration number $l$ and the maximum iteration number $L$, and is expressed as follows:
the optimization mechanism of the multi-verse algorithm selects black holes and swinging according to a roulette mechanism; an individual moves within the current optimal universe through expansion and rotation, and the optimal travel distance during this movement is related to the iteration precision $p$ and is expressed as follows:
5.38) if the error precision is not met, abandoning the action of this iteration, selecting an action again, and returning to step 5.35);
5.39) judging whether the objective function value is the global optimum, and if not, returning to step 5.38);
5.40) if the value is the global optimum, outputting the final state and action;
5.41) calculating the final result.
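The update formulas referenced in step 5.37 are not reproduced in this text. The sketch below uses the commonly published form of the multi-verse optimizer on the assumption that the method follows it: the wormhole proportion $\eta$ is taken to be the wormhole existence probability (WEP), and the expansion rate $\varepsilon$ the travelling distance rate (TDR). The default constants (`wep_min=0.2`, `wep_max=1.0`, `p=6`) and the clipping to the bounds are illustrative choices, not values from the source.

```python
import random


def wormhole_existence_probability(l, max_iter, wep_min=0.2, wep_max=1.0):
    """WEP grows linearly with the iteration count l (assumed to match eta)."""
    return wep_min + l * (wep_max - wep_min) / max_iter


def travelling_distance_rate(l, max_iter, p=6.0):
    """TDR shrinks toward 0 as l approaches max_iter; p sets the precision."""
    return 1.0 - (l ** (1.0 / p)) / (max_iter ** (1.0 / p))


def wormhole_update(x, best, lower, upper, wep, tdr, rng=random):
    """Move each dimension near the best universe with probability wep."""
    new_x = []
    for xj, bj, lj, uj in zip(x, best, lower, upper):
        if rng.random() < wep:
            jump = tdr * ((uj - lj) * rng.random() + lj)
            candidate = bj + jump if rng.random() < 0.5 else bj - jump
            # Clipping to [lj, uj] is an added safeguard, not part of the formula.
            new_x.append(min(max(candidate, lj), uj))
        else:
            new_x.append(xj)  # dimension left unchanged this iteration
    return new_x


# Hypothetical 2-dimensional action vector at iteration 10 of 100.
rng = random.Random(0)
wep = wormhole_existence_probability(10, 100)
tdr = travelling_distance_rate(10, 100)
moved = wormhole_update([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [10.0, 10.0],
                        wep, tdr, rng)
```

Early iterations have a low WEP and a large TDR, so few dimensions jump but they jump far; late iterations reverse this, which concentrates the search around the best universe found so far.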
Further, in step 3.2, the reward-and-penalty stepped wind and solar curtailment penalty return function is used as the action value in the improved Q-learning method.
Further, in step 4, a multi-verse optimization algorithm is adopted to improve the optimal value of the state feature corresponding to the objective function in the traditional Q-learning algorithm.
Further, step 4 adopts a multi-verse optimization algorithm to improve the traditional Q-learning algorithm, specifically as follows:
the multi-verse algorithm is used to optimize the multi-level greedy actions of Q-learning, which reduces redundant actions during optimization and further reduces the error precision $\gamma_T$ of the iteration result $Q_{mvo-q}$; when the iteration error precision is not satisfied, the next state-action strategy is carried out and the next optimization is performed with the multi-verse algorithm, wherein the optimization formula is expressed as follows:
The invention has the following advantages and beneficial effects:
The method balances wind and solar consumption, environmental benefit and economic benefit. It builds a mathematical model of the objective function that accounts for the conventional units, wind and solar units, energy storage units, interaction with the large power grid, and pollutant treatment inside the microgrid, and introduces a reward-and-penalty stepped wind and solar curtailment penalty return function to further plan the grid connection of wind and solar generation. It also provides a Q-learning algorithm improved by the multi-verse algorithm, in which the state and action parameters of traditional Q-learning correspond to the objective function of microgrid scheduling and to its constraint conditions and curtailment penalty, so that maximum environmental benefit and complete wind and solar consumption are achieved while stable power supply of the system is maintained. The improved Q-learning algorithm optimizes through a planning mechanism, avoids the local convergence of the optimal value that arises during optimization with traditional algorithms, takes into account a selection mechanism for the curtailment penalty return, and solves the multi-objective optimization problem in the microgrid scheduling model.
The method reduces the curtailment rate of renewable energy in microgrid operation scheduling, reduces the fluctuation of energy interaction between the microgrid and the large power grid, addresses the slow response and non-convergence of traditional optimization methods, and improves the stability and economy of microgrid operation.
Drawings
The invention is described in further detail below with reference to the following figures and examples:
FIG. 1 is a flow chart of optimization with the Q-learning algorithm improved by the multi-verse optimization algorithm;
FIG. 2 is the simulated wind and solar consumption curve;
FIG. 3 is the simulated comprehensive cost curve;
FIG. 4 is a flowchart of the microgrid optimal scheduling method based on improved Q-learning penalty selection according to the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
As shown in fig. 4, the method for optimizing and scheduling a microgrid based on improved Q learning penalty selection of the present invention includes the following steps:
step 1: constructing an objective function according to the running cost, the environmental benefit cost and the power exchange cost of a main power grid of a conventional unit inside a microgrid;
step 1.1: under the condition of wind-solar high-proportion grid connection, the conventional unit is divided into a conventional operation state and a low-load operation state, namely the conventional power generation cost inside the microgrid is expressed as follows:
in the formula: fcfThe running cost of the conventional unit is reduced; a. b and c are cost factors in the normal running state of the conventional unit; piOutputting power for the ith conventional unit; g. h, l and p are cost factors in a low-load operation state; kPi,maxThe critical power of the normal operation state and the low-power operation state of the ith conventional unit.
Step 1.2: under the condition of uncertain wind and light processing, the start-stop cost of the conventional unit is expressed as follows:
in the formula: fon-offThe start-stop cost of the conventional unit is reduced; c is the number of start-stop times of the unit; k (t)i,r) The cost of the ith unit for the starting for the r time; t is ti,rThe continuous shutdown time of the ith unit before C times of starting; c (t)i,r) It is the operating cost of the associated auxiliary system for the unit cold start; t is tcold-hotThe unit is the shutdown critical time of cold-state start and hot-state start.
Step 1.3: the pollutants discharged by the conventional units during power generation mainly comprise nitrogen oxides, sulfur oxides, carbon dioxide and the like, and the treatment cost is expressed as follows:
$E_m(P_i) = (\alpha_{i,m} + \beta_{i,m} P_i + \gamma_{i,m} P_i^2) + \zeta_{i,m} \exp(\delta_{i,m} P_i)$
In the formula: $F_g$ is the pollution treatment cost of the conventional units; $m$ is the type of discharged pollutant; $E_m(P_i)$ is the pollutant discharge amount of the i-th unit; $\eta_m$ is the treatment cost coefficient of the m-th pollutant; $\alpha_{i,m}$, $\beta_{i,m}$, $\gamma_{i,m}$, $\zeta_{i,m}$ and $\delta_{i,m}$ are the discharge coefficients of the m-th pollutant discharged by the i-th unit.
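The emission model above can be evaluated directly. The following is a minimal Python sketch: `emission` implements $E_m(P_i)$ exactly as written, while `treatment_cost` assumes that the treatment cost is the sum of $\eta_m E_m(P_i)$ over all units and pollutant types, since the cost formula itself is not reproduced in the text. The function names and the data layout are hypothetical.

```python
import math


def emission(P, alpha, beta, gamma, zeta, delta):
    """Emission of one pollutant from one unit at output power P:
    E_m(P_i) = (alpha + beta*P + gamma*P^2) + zeta*exp(delta*P)."""
    return (alpha + beta * P + gamma * P ** 2) + zeta * math.exp(delta * P)


def treatment_cost(powers, coeffs, eta):
    """ASSUMED form: F_g = sum over units i and pollutants m of eta[m] * E_m(P_i).

    powers    -- list of unit output powers P_i
    coeffs    -- coeffs[i][m] = (alpha, beta, gamma, zeta, delta) for unit i, pollutant m
    eta       -- eta[m] = treatment cost coefficient of pollutant m
    """
    total = 0.0
    for i, P in enumerate(powers):
        for m, eta_m in enumerate(eta):
            total += eta_m * emission(P, *coeffs[i][m])
    return total
```

For example, `treatment_cost([2.0], [[(1.0, 1.0, 1.0, 0.0, 0.0)]], [0.5])` evaluates one unit and one pollutant with purely illustrative coefficients.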
Step 1.4: the cost of power exchange between the microgrid and the large power grid is expressed as follows:
In the formula: $F_{grid}$ is the power interaction cost between the microgrid and the large power grid; $\lambda_p$ indicates whether the microgrid is selling or purchasing electricity, taking the value 1 when selling and -1 when purchasing; $P_{su/sh}$ is the power surplus or shortage inside the microgrid; $C_t^{grid}$ is the price at which electricity is sold to or purchased from the large power grid.
Step 1.5: the objective function is constructed from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power exchange with the main power grid, and is expressed as follows:
$\min F = F_{cf} + F_{on-off} + F_g + F_{grid}$
In the formula: $F$ is the objective function value of microgrid system operation; $F_{cf}$, $F_{on-off}$, $F_g$ and $F_{grid}$ are, respectively, the operating cost of the conventional units, their start-stop cost, the pollution treatment cost, and the power interaction cost between the microgrid and the large power grid.
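As an illustration of how the four cost terms combine, the sketch below sums them into the objective value $F$. The grid exchange term uses the product form $F_{grid} = \lambda_p P_{su/sh} C_t^{grid}$ given in claim 2 of this document; the function names and the boolean `selling` flag are hypothetical conveniences, not part of the patent.

```python
def grid_exchange_cost(selling, p_su_sh, price):
    """F_grid = lambda_p * P_su/sh * C_t^grid, with lambda_p = 1 when the
    microgrid sells electricity and -1 when it purchases electricity."""
    lambda_p = 1 if selling else -1
    return lambda_p * p_su_sh * price


def objective(f_cf, f_on_off, f_g, f_grid):
    """Objective value F = F_cf + F_on-off + F_g + F_grid (to be minimized)."""
    return f_cf + f_on_off + f_g + f_grid
```

A scheduler would evaluate `objective` for each candidate schedule and keep the smallest value.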
Step 2: establishing the constraint conditions of microgrid operation;
Step 2.1: the power balance constraint is expressed as follows:
In the formula: the generation terms are the output powers of the conventional units, wind power and photovoltaics in period t; the storage terms are the power stored and released by the storage battery in period t; $P_t^{grid}$ is the power exchanged with the large power grid; $P_t^L$ is the total load power in period t; $T$ is the total operating period of the microgrid, taken as 24 h.
Step 2.2: the storage battery state-of-charge constraint is expressed as follows:
$SOC_{min} \le SOC(t) \le SOC_{max}$
In the formula: $SOC(t)$ is the state of charge of the storage battery at time t; $SOC_{min}$ and $SOC_{max}$ are the minimum and maximum states of charge of the storage battery, respectively.
Step 2.3: for a conventional unit, the accumulated start-up or shutdown time should be greater than the corresponding minimum continuous start-up or shutdown time, and the constraint is expressed as follows:
In the formula: the two bounds are the minimum continuous shutdown time and the minimum continuous start-up time of the unit.
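The three constraints of Step 2 can be checked programmatically. The sketch below is only illustrative: the power balance and minimum start-stop formulas are not reproduced in the text, so the sign convention (discharge and grid import counted as supply, charging as demand) and the run-length test are assumptions, and all function names are hypothetical.

```python
from itertools import groupby


def power_balanced(p_conv, p_wind, p_pv, p_discharge, p_charge, p_grid, p_load, tol=1e-6):
    """ASSUMED balance: conventional + wind + PV + discharge - charge + grid = load."""
    supply = p_conv + p_wind + p_pv + p_discharge - p_charge + p_grid
    return abs(supply - p_load) <= tol


def soc_within_limits(soc_series, soc_min, soc_max):
    """SOC_min <= SOC(t) <= SOC_max for every period t."""
    return all(soc_min <= s <= soc_max for s in soc_series)


def min_up_down_ok(status, min_up, min_down):
    """status is a list of 0/1 flags per period (1 = running).

    Every interior run of identical flags must last at least min_up periods
    (running) or min_down periods (shut down). The first and last runs are
    skipped because their true duration extends beyond the horizon.
    """
    runs = [(state, len(list(group))) for state, group in groupby(status)]
    for state, length in runs[1:-1]:
        required = min_up if state == 1 else min_down
        if length < required:
            return False
    return True
```

A candidate schedule would be rejected, or penalized, whenever any of the three checks returns `False`.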
Step 3: constructing a penalty return function whose highest and lowest thresholds are the maximum wind and solar curtailment cost and the cost of full wind and solar absorption, respectively;
Step 3.1: the minimum and maximum limits on the amount of curtailed wind and solar power in the microgrid are specified, and the growth range $\chi$ from full wind and solar consumption up to the maximum curtailment limit is divided into $n$ intervals, as follows:
In the formula: the two limits are the highest and lowest limits on the curtailed wind and solar amount specified in the system; $n$ is the number of divided intervals; $\lambda$ is the growth step of the specified limit.
Step 3.2: according to the limit intervals that the system specifies for the curtailed wind and solar amount, the curtailed amount is linearized to obtain a stepped reward-and-penalty return function for wind and solar curtailment, expressed as follows:
In the formula: $d_{ab}$ is the value of the curtailment penalty return function; $P_{ab,wp}$ is the amount of wind and solar power curtailed by the system; $c$ is the curtailment penalty coefficient; $k$ is the per-interval growth step of the penalty coefficient.
In Step 3.2, the stepped reward-and-penalty curtailment return function is used as the action value in the improved Q learning method.
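The stepped return function itself is not reproduced in the text, so the following Python sketch only illustrates the idea under stated assumptions: the range between the lower and upper curtailment limits is split into `n` equal intervals, nothing is charged at or below the lower limit, the coefficient on the j-th interval is `c + j*k`, and any curtailment beyond the upper limit is charged at the top interval's coefficient. The function name and argument names are hypothetical.

```python
def curtailment_penalty(p_ab, p_min, p_max, n, c, k):
    """Stepped (piecewise-linear) penalty for a curtailed amount p_ab.

    ASSUMPTIONS: equal-width intervals between p_min and p_max, coefficient
    c + j*k on interval j (j = 0 .. n-1), zero penalty at or below p_min, and
    the top coefficient applied to anything above p_max.
    """
    if p_ab <= p_min:
        return 0.0
    width = (p_max - p_min) / n  # growth step of the limit
    penalty = 0.0
    lower = p_min
    for j in range(n):
        # The last interval is open-ended so excess curtailment is still charged.
        upper = lower + width if j < n - 1 else float("inf")
        portion = min(p_ab, upper) - lower
        if portion <= 0:
            break
        penalty += (c + j * k) * portion
        lower += width
    return penalty
```

In a Q learning setting, the negative of this penalty could serve as the reward signal attached to an action, so that schedules with less curtailment score higher.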
Step 4: improving the traditional Q learning algorithm by means of the multi-verse optimization (MVO) algorithm;
The MVO algorithm is a heuristic search algorithm that treats each universe as a feasible solution of the problem and iterates cyclically through the interaction of black holes, white holes and wormholes. Here it iteratively optimizes the choices that the traditional Q learning algorithm makes in the unsupervised state, yielding an enhanced target solution. The state-action function of the improved Q learning algorithm after optimization is expressed as follows:
In the formula: $F_s$ is the state feature of traditional Q learning and corresponds to the objective function $F$ of microgrid system operation; the action feature optimized by the MVO algorithm corresponds to the value $d_{ab}$ of the stepped reward-and-penalty curtailment return function; the state feature and the action feature each have an initial value; $E_{mvo-p}$ is the expected value under the MVO-Q strategy; $T$ is the number of iterations; the reward value and the discount coefficient $Y_T$ are those at iteration $T$.
The MVO algorithm optimizes the multi-level greedy actions of Q learning, which reduces the occurrence of redundant actions during optimization and in turn reduces the error precision $\gamma_T$ of the iteration result $Q_{mvo-q}$ (the initial error precision is $\gamma_{T0}$). When the iteration error precision is not satisfied, the next state-action strategy is executed and the next round of optimization is carried out by the MVO algorithm, with the optimization formula expressed as follows:
In the formula: the first two quantities are the action feature and the state feature at time T-1; the third is the state feature at time T; the fourth is the reward value at time T-1.
The MVO algorithm thus improves the optimal value of the state feature corresponding to the objective function in the traditional Q learning algorithm.
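For orientation, the sketch below shows the classical tabular Q learning update and an epsilon-greedy action choice, which is the baseline that Step 4 sets out to improve. It is not the MVO-Q formula of this patent, which is not reproduced in the text; the learning rate `alpha`, the discount `gamma`, and the function names are assumptions chosen for illustration.

```python
import random


def greedy_action(q_row, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else take the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Classical update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    q is a list of lists indexed as q[state][action]; it is updated in place.
    """
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

In the scheme described above, the reward would come from the curtailment penalty return function, and the MVO search would replace or refine the plain greedy action choice.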
Step 5: describing the objective function obtained in Step 1 as a Markov decision process, and solving the resulting state and action descriptions with the improved Q learning algorithm.
Step 5.1: the objective function in Step 1 comprises the unit operating cost, the environmental benefit cost and the cost of power exchange with the main power grid, so the state description of each entity in the system at iteration $T$ is expressed as:
$F_s = [F_{cf}, F_{on-off}, E_m(P_i), F_g, F_{grid}, F]$
Step 5.2: the constraint conditions in Step 2 involve the output power of the conventional units, the wind power and photovoltaic output power, the power stored and released by the storage battery, the power exchanged with the large power grid, and the total load power. Taking the reward-and-penalty principle for curtailed wind and solar power into account as well, these quantities are discretized to obtain the action description of each entity in the system at iteration $T$, expressed as follows:
Step 5.3: as shown in Fig. 1, the Q learning algorithm improved by the MVO algorithm solves for the optimal value of the objective function in the following steps:
5.31) specifying the minimum and maximum limits on the amount of curtailed wind and solar power in the microgrid, dividing the curtailment penalty intervals, and initializing the parameters of the MVO algorithm, namely the number of universe individuals N, the dimension N, the maximum number of iterations MAX, and the initial wormhole position $X_{ij}$;
5.34) outputting the initial state based on the greedy strategy and preparing the initial optimization;
5.35) solving for the optimal value $\min F$ of the objective function according to the optimized initial action;
5.36) judging whether the error precision is satisfied;
5.37) if the error precision is satisfied, selecting the action, computing the optimal-value update and the wormhole distance of the MVO algorithm, and proceeding to the next iteration, where the optimal-value update formula is as follows:
In the formula: $X_j$ is the position of the optimal universe individual; $p_1$, $p_2$, $p_3 \in [0,1]$ are random numbers; $\varepsilon$ is the expansion rate of the universe; $u_j$ and $l_j$ are the upper and lower limits of $x$; $\eta$ is the proportion of wormholes among all individuals, determined by the current iteration number $l$ and the maximum iteration number $L$, and expressed as follows:
The optimization mechanism of the MVO algorithm is that black holes and white holes are selected by a roulette-wheel mechanism, and an individual moves within the current optimal universe through expansion and self-rotation; the optimal moving distance during this movement depends on the iteration precision $p$ and is expressed as follows:
5.38) if the error precision is not satisfied, abandoning the action of this iteration, selecting an action again, and returning to step 5.35);
5.39) judging whether the objective function value is the global optimum, and if not, returning to step 5.38);
5.40) if it is the global optimum, outputting the final state and action;
5.41) calculating the final result.
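The wormhole update used in step 5.37) can be sketched with the formulas of the commonly published multi-verse optimizer, which are assumed here because the patent's own formulas are not reproduced in the text. In this sketch the wormhole existence probability plays the role of the wormhole proportion, and the travelling distance rate plays the role of the moving distance that depends on the precision `p`. The default values 0.2, 1.0 and 6.0 and all function names are illustrative assumptions.

```python
import random


def wormhole_existence_probability(l, L, wep_min=0.2, wep_max=1.0):
    """Grows linearly from wep_min to wep_max as iteration l approaches L."""
    return wep_min + l * (wep_max - wep_min) / L


def travelling_distance_rate(l, L, p=6.0):
    """Shrinks from 1 to 0 as iteration l approaches L; p sets the precision."""
    return 1.0 - (l ** (1.0 / p)) / (L ** (1.0 / p))


def wormhole_update(x, best, lower, upper, l, L, rng=random):
    """Move coordinates of universe x toward the best universe found so far.

    With probability WEP, coordinate j is reset to best[j] plus or minus a
    random step scaled by TDR, then clamped to [lower[j], upper[j]].
    """
    wep = wormhole_existence_probability(l, L)
    tdr = travelling_distance_rate(l, L)
    new_x = list(x)
    for j in range(len(x)):
        if rng.random() < wep:
            step = tdr * ((upper[j] - lower[j]) * rng.random() + lower[j])
            candidate = best[j] + step if rng.random() < 0.5 else best[j] - step
            new_x[j] = min(max(candidate, lower[j]), upper[j])
    return new_x
```

Early iterations take large steps around the best universe and later iterations take small ones, which matches the explore-then-refine behaviour that the loop in steps 5.35) to 5.39) relies on.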
A simulation experiment is carried out using the classic electric load demand of a conventional microgrid, with the experimental parameters set as follows:
The method provided by the invention is applied to the optimal scheduling of a typical microgrid comprising a wind farm, a photovoltaic power plant, a gas turbine unit and an energy storage unit, assuming that power is exchanged between the microgrid and the large power grid. The objective function is solved with both a traditional particle swarm algorithm and the improved Q learning algorithm to obtain a comprehensive system scheduling plan that maximizes wind and solar consumption. As shown in Figs. 2 and 3, comparative analysis of the simulation results shows that microgrid scheduling with the method provided by the invention increases total wind and solar consumption by 33.18% and reduces the comprehensive cost by 6.51%. The method can therefore substantially increase the share of wind and solar power consumed in microgrid scheduling, maximizing economic benefit while meeting environmental benefit requirements.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.
Claims (8)
1. A microgrid optimization scheduling method based on improved Q learning penalty selection, characterized in that the method comprises the following steps:
Step 1: constructing an objective function from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power interaction with the large power grid;
Step 2: establishing the constraint conditions of microgrid operation;
Step 3: constructing a penalty return function whose highest and lowest thresholds are the maximum wind and solar curtailment cost and the cost of full wind and solar absorption, respectively;
Step 4: improving the traditional Q learning algorithm by means of the multi-verse optimization (MVO) algorithm;
the state-action function of the improved Q learning algorithm after optimization is expressed as follows:
In the formula: $F_s$ is the state feature of traditional Q learning; the action feature is the one optimized by the MVO algorithm; the state feature and the action feature each have an initial value; $E_{mvo-p}$ is the expected value under the MVO-Q strategy; $T$ is the number of iterations; the reward value and the discount coefficient $Y_T$ are those at iteration $T$;
Step 5: describing the objective function obtained in Step 1 as a Markov decision process, and solving the resulting state and action descriptions with the improved Q learning algorithm.
2. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that the step 1 comprises the following steps:
Step 1.1: with a high proportion of wind and solar power connected to the grid, a conventional unit operates in either a normal operating state or a low-load operating state, and the conventional power generation cost inside the microgrid is expressed as follows:
In the formula: $a$, $b$ and $c$ are cost factors in the normal operating state of a conventional unit; $P_i$ is the output power of the i-th conventional unit; $g$, $h$, $l$ and $p$ are cost factors in the low-load operating state; $kP_{i,max}$ is the critical power separating the normal operating state from the low-load operating state of the i-th conventional unit;
Step 1.2: with uncertain wind and solar output, the start-stop cost of the conventional units is expressed as follows:
In the formula: $F_{on-off}$ is the start-stop cost of the conventional units; $C$ is the number of start-stop cycles of a unit; $K(t_{i,r})$ is the cost of the r-th start of the i-th unit; $t_{i,r}$ is the continuous shutdown time of the i-th unit before its r-th start; $C(t_{i,r})$ is the operating cost of the associated auxiliary systems during a cold start of the unit; $t_{cold-hot}$ is the critical shutdown time that separates a cold start from a hot start;
Step 1.3: the pollutants discharged by the conventional units during power generation mainly comprise nitrogen oxides, sulfur oxides, carbon dioxide and the like, and the treatment cost is expressed as follows:
$E_m(P_i) = (\alpha_{i,m} + \beta_{i,m} P_i + \gamma_{i,m} P_i^2) + \zeta_{i,m} \exp(\delta_{i,m} P_i)$
In the formula: $F_g$ is the pollution treatment cost of the conventional units; $m$ is the type of discharged pollutant; $E_m(P_i)$ is the pollutant discharge amount of the i-th unit; $\eta_m$ is the treatment cost coefficient of the m-th pollutant; $\alpha_{i,m}$, $\beta_{i,m}$, $\gamma_{i,m}$, $\zeta_{i,m}$ and $\delta_{i,m}$ are the discharge coefficients of the m-th pollutant discharged by the i-th unit;
Step 1.4: the cost of power exchange between the microgrid and the large power grid is expressed as follows:
$F_{grid} = \lambda_p P_{su/sh} C_t^{grid}$
In the formula: $\lambda_p$ indicates whether the microgrid is selling or purchasing electricity, taking the value 1 when selling and -1 when purchasing; $P_{su/sh}$ is the power surplus or shortage inside the microgrid; $C_t^{grid}$ is the price at which electricity is sold to or purchased from the large power grid;
Step 1.5: the objective function is constructed from the operating cost of the conventional units inside the microgrid, the environmental benefit cost, and the cost of power exchange with the main power grid, and is expressed as follows:
$\min F = F_{cf} + F_{on-off} + F_g + F_{grid}$
In the formula: $F$ is the objective function value of microgrid system operation; $F_{cf}$, $F_{on-off}$, $F_g$ and $F_{grid}$ are, respectively, the operating cost of the conventional units, their start-stop cost, the pollution treatment cost, and the power interaction cost between the microgrid and the large power grid.
3. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that the step 2 comprises the following steps:
Step 2.1: the power balance constraint is expressed as follows:
In the formula: the generation terms are the output powers of the conventional units, wind power and photovoltaics in period t; the storage terms are the power stored and released by the storage battery in period t; $P_t^{grid}$ is the power exchanged with the large power grid; $P_t^L$ is the total load power in period t; $T$ is the total operating period of the microgrid, taken as 24 h;
Step 2.2: the storage battery state-of-charge constraint is expressed as follows:
$SOC_{min} \le SOC(t) \le SOC_{max}$
In the formula: $SOC(t)$ is the state of charge of the storage battery at time t; $SOC_{min}$ and $SOC_{max}$ are the minimum and maximum states of charge of the storage battery, respectively;
Step 2.3: for a conventional unit, the accumulated start-up or shutdown time should be greater than the corresponding minimum continuous start-up or shutdown time, and the constraint is expressed as follows:
4. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that the step 3 comprises the following steps:
Step 3.1: the minimum and maximum limits on the amount of curtailed wind and solar power in the microgrid are specified, and the growth range $\chi$ from full wind and solar consumption up to the maximum curtailment limit is divided into $n$ intervals, as follows:
In the formula: the two limits are the highest and lowest limits on the curtailed wind and solar amount specified in the system; $n$ is the number of divided intervals; $\lambda$ is the growth step of the specified limit;
Step 3.2: according to the limit intervals that the system specifies for the curtailed wind and solar amount, the curtailed amount is linearized to obtain a stepped reward-and-penalty return function for wind and solar curtailment, expressed as follows:
In the formula: $d_{ab}$ is the value of the curtailment penalty return function; $P_{ab,wp}$ is the amount of wind and solar power curtailed by the system; $c$ is the curtailment penalty coefficient; $k$ is the per-interval growth step of the penalty coefficient.
5. The method according to claim 1, wherein the step 5 comprises the following steps:
Step 5.1: the objective function in Step 1 comprises the unit operating cost, the environmental benefit cost and the cost of power exchange with the main power grid, and the state description of each entity in the system at iteration $T$ is expressed as:
$F_s = [F_{cf}, F_{on-off}, E_m(P_i), F_g, F_{grid}, F]$
Step 5.2: the constraint conditions in Step 2 involve the output power of the conventional units, the wind power and photovoltaic output power, the power stored and released by the storage battery, the power exchanged with the large power grid, and the total load power; taking the reward-and-penalty principle for curtailed wind and solar power into account as well, these quantities are discretized to obtain the action description of each entity in the system at iteration $T$, expressed as follows:
Step 5.3: the Q learning algorithm improved by the MVO algorithm solves for the optimal value of the objective function in the following steps:
5.31) specifying the minimum and maximum limits on the amount of curtailed wind and solar power in the microgrid, dividing the curtailment penalty intervals, and initializing the parameters of the MVO algorithm, namely the number of universe individuals N, the dimension N, the maximum number of iterations MAX, and the initial wormhole position $X_{ij}$;
5.34) outputting the initial state based on the greedy strategy and preparing the initial optimization;
5.35) solving for the optimal value $\min F$ of the objective function according to the optimized initial action;
5.36) judging whether the error precision is satisfied;
5.37) if the error precision is satisfied, selecting the action, computing the optimal-value update and the wormhole distance of the MVO algorithm, and proceeding to the next iteration, where the optimal-value update formula is as follows:
In the formula: $X_j$ is the position of the optimal universe individual; $p_1$, $p_2$, $p_3 \in [0,1]$ are random numbers; $\varepsilon$ is the expansion rate of the universe; $u_j$ and $l_j$ are the upper and lower limits of $x$; $\eta$ is the proportion of wormholes among all individuals, determined by the current iteration number $l$ and the maximum iteration number $L$, and expressed as follows:
the optimization mechanism of the MVO algorithm is that black holes and white holes are selected by a roulette-wheel mechanism, and an individual moves within the current optimal universe through expansion and self-rotation; the optimal moving distance during this movement depends on the iteration precision $p$ and is expressed as follows:
5.38) if the error precision is not satisfied, abandoning the action of this iteration, selecting an action again, and returning to step 5.35);
5.39) judging whether the objective function value is the global optimum, and if not, returning to step 5.38);
5.40) if it is the global optimum, outputting the final state and action;
5.41) calculating the final result.
6. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 4, characterized in that in the step 3.2, the stepped reward-and-penalty curtailment return function is used as the action value in the improved Q learning method.
7. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that in the step 4, the multi-verse optimization (MVO) algorithm is used to improve the optimal value of the state feature corresponding to the objective function in the traditional Q learning algorithm.
8. The microgrid optimization scheduling method based on improved Q learning penalty selection according to claim 1, characterized in that the improvement of the traditional Q learning algorithm by the multi-verse optimization (MVO) algorithm in the step 4 comprises the following:
the MVO algorithm optimizes the multi-level greedy actions of Q learning, which reduces the occurrence of redundant actions during optimization and in turn reduces the error precision $\gamma_T$ of the iteration result $Q_{mvo-q}$; when the iteration error precision is not satisfied, the next state-action strategy is executed and the next round of optimization is carried out by the MVO algorithm, with the optimization formula expressed as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111115317.6A CN113809780B (en) | 2021-09-23 | 2021-09-23 | Micro-grid optimal scheduling method based on improved Q learning punishment selection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113809780A true CN113809780A (en) | 2021-12-17 |
CN113809780B CN113809780B (en) | 2023-06-30 |
Family
ID=78940309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111115317.6A Active CN113809780B (en) | 2021-09-23 | 2021-09-23 | Micro-grid optimal scheduling method based on improved Q learning punishment selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113809780B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN113809780B (en) | 2023-06-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||