
Multi-load resource joint scheduling method and system

Info

Publication number
CN117335439A
Authority
CN
China
Prior art keywords
scheduling
model
environment
reinforcement learning
charging pile
Prior art date
Legal status
Granted
Application number
CN202311616942.8A
Other languages
Chinese (zh)
Other versions
CN117335439B (en)
Inventor
叶吉超
章寒冰
徐永海
胡鑫威
黄慧
张程翔
丁宁
吴新华
季奥颖
娄冰
汪华
陈冰恽
尹峰
朱利锋
潘昭光
Current Assignee
State Grid Zhejiang Electric Power Co Ltd
Lishui Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Zhejiang Electric Power Co Ltd
Lishui Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Zhejiang Electric Power Co Ltd and Lishui Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202311616942.8A
Publication of CN117335439A
Application granted
Publication of CN117335439B
Legal status: Active

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • H02J3/14Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by switching loads on to, or off from, network, e.g. progressively balanced loading
    • H02J3/144Demand-response operation of the power transmission or distribution network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003Load forecast, e.g. methods or systems for forecasting future load demand
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/40Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously

Abstract

The invention provides a multi-load resource joint scheduling method and system. When a first scheduling period starts, an initial two-stage reinforcement learning first scheduling model is established from the demand models obtained for the multiple loads. According to a first environment observed for the multiple loads, the discrete first-stage reinforcement learning of the first scheduling model is executed in a distributed manner to decide whether each load is connected to resource scheduling, yielding a first scheduling decision. The first scheduling decision is then taken as at least one environment of the second-stage reinforcement learning, which decides the resource scheduling of the connected loads. Experience data corresponding to the first scheduling model are uploaded to an experience replay pool, sample data are selected from the pool for centralized training, and real-time distributed resource scheduling is performed on the charging piles, air conditioners and micro-grid according to the trained second scheduling model. The method improves the accuracy of jointly scheduling multiple types of load.

Description

Multi-load resource joint scheduling method and system
Technical Field
The invention relates to the technical field of power grid joint resource scheduling, in particular to a multi-load resource joint scheduling method and system.
Background
With the coordinated development of smart grid construction, demand response and energy efficiency management, multiple loads such as air conditioners, new energy vehicles and distributed energy storage are rapidly becoming new schedulable resources. By applying reinforcement learning to these loads, the prior art can schedule them simultaneously, and their potential to participate comprehensively in interactive grid regulation is expected to be explored.
Existing control strategies assume that all of the multiple loads are connected to the scheduling system and always use the same scheduling period for every load. In practical applications, however, the energy consumption requirements of air conditioners, new energy vehicles and distributed energy storage differ from region to region, and the willingness of the loads to participate in grid dispatching varies with region and weather. Moreover, the periods with which the different loads participate in grid dispatching are not the same; if a single common period is always used, the control strategy generated for the multiple loads has low accuracy and precise control of multi-load resources is difficult to achieve.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a multi-load resource joint scheduling method and system that improve the precision of controlling multi-load resources.
In a first aspect, the present invention provides a method for jointly scheduling multiple load resources, including:
when a first scheduling period starts, establishing a first scheduling model of initial two-stage reinforcement learning from the demand models obtained according to the requirements of the multiple loads; the multiple loads include charging piles, air conditioners and micro-grids;
according to a first environment observed for the multiple loads, executing the discrete first-stage reinforcement learning of the first scheduling model in a distributed manner, and deciding whether each load is connected to resource scheduling, to obtain a first scheduling decision;
taking the first scheduling decision as at least one environment of the second-stage reinforcement learning, and deciding the resource scheduling of the connected loads; the second-stage reinforcement learning takes the second scheduling period it outputs as the third scheduling period of the next moment, so that resource scheduling is decided again after the environment is observed anew in the third scheduling period;
uploading the experience data corresponding to the first scheduling model to an experience replay pool, selecting sample data from the experience replay pool for centralized training, and performing real-time distributed resource scheduling on the charging piles, air conditioners and micro-grid according to the trained second scheduling model.
According to the invention, scheduling models are established for the charging piles, air conditioners and micro-grid respectively, and their experience data are collected for centralized training. The same resource scheduling model can therefore be built for different loads, the potential coupling relations between different loads can be captured to the greatest extent, the scheduling periods of the charging piles, air conditioners and micro-grid under the same scheduling model can be explored, and the reinforcement learning adapts to the dynamic scheduling of different types of resources, thereby improving the precision of multi-load resource scheduling.
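For illustration only, the overall two-stage flow described above can be sketched as follows; the function names, the list-based environments and the stand-in random policies are the editor's assumptions, not part of the claimed method.

```python
import random

def run_scheduling_period(agents, replay_pool, observe_env):
    """One scheduling period of the two-stage scheme (illustrative only).

    agents      : dict of name -> (stage1_policy, stage2_policy)
    replay_pool : list collecting experience tuples for centralized training
    observe_env : callable(name) -> list of floats, the observed first environment
    """
    proposed_periods = {}
    for name, (stage1, stage2) in agents.items():
        env1 = observe_env(name)               # first environment
        decision = stage1(env1)                # discrete: connect to scheduling or not
        env2 = env1 + [decision]               # first decision becomes part of the environment
        allocation, sub_period = stage2(env2)  # resource allocation and proposed sub-period
        proposed_periods[name] = sub_period
        replay_pool.append((name, env1, decision, env2, allocation, sub_period))
    # the shortest proposed sub-period becomes the next (third) scheduling period
    return min(proposed_periods.values())

rng = random.Random(0)
dummy_agents = {
    "charging_pile": (lambda e: rng.randint(0, 1), lambda e: (rng.uniform(-1, 1), 15)),
    "air_conditioner": (lambda e: rng.randint(0, 1), lambda e: (rng.uniform(-1, 1), 30)),
    "micro_grid": (lambda e: rng.randint(0, 1), lambda e: (rng.uniform(-1, 1), 10)),
}
pool = []
print("next scheduling period:", run_scheduling_period(dummy_agents, pool, lambda name: [0.0] * 6))
```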
With reference to the first aspect, in one possible implementation manner, establishing, at the beginning of the first scheduling period, the first scheduling model of initial two-stage reinforcement learning from the demand models obtained according to the requirements of the multiple loads includes:
observing the requirements of the multiple loads at the beginning of the first scheduling period and establishing corresponding demand models; according to each demand model, observing the current time period, the weather and location of the area where the corresponding load is located, the duration of the first scheduling period, the current electricity price and the corresponding load power, to obtain a first sub-environment of the corresponding load, and obtaining a second sub-environment of the corresponding load at the end of the third scheduling period of the previous moment;
establishing, according to the first load sub-environment and the second load sub-environment, a first scheduling model of two-stage cascaded reinforcement learning with the corresponding load as an initial agent.
The method comprehensively considers the influence of the current time period, the weather and location of the area where the corresponding load is located, the duration of the first scheduling period, the current electricity price and the corresponding load power on resource scheduling; compared with prior art that ignores the geographical location and weather of the load, it can improve the accuracy of dynamic scheduling of multiple types of resources.
With reference to the first aspect, in one possible implementation manner, establishing, according to the first load sub-environment and the second load sub-environment, the first scheduling model of two-stage cascaded reinforcement learning with the corresponding load as an initial agent specifically includes:
taking the charging pile, the air conditioner and the micro-grid respectively as initial agents, and correspondingly obtaining a charging pile first scheduling model, an air conditioner first scheduling model and a micro-grid first scheduling model in which the first-stage reinforcement learning and the second-stage reinforcement learning are cascaded; the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model have identical action spaces and identical state spaces.
The invention adopts the same state space and action space for the resource scheduling of the three loads, which facilitates centralized training of the scheduling models of the different load types, helps balance exploration and exploitation of the optimal scheduling strategy across the load types, and improves the accuracy of scheduling different types of loads simultaneously.
With reference to the first aspect, in one possible implementation manner, the demand models include a charging pile demand model, an air conditioner demand model and a micro-grid demand model.
The charging pile demand model is expressed in terms of: the power per unit time allocated to each charging pile in the first scheduling period; a first discrete variable controlling whether the corresponding charging pile is connected to resource scheduling; the maximum power threshold per unit time of the charging pile; and the first power duty cycle of the charging pile.
The air conditioner demand model is expressed in terms of: the power per unit time of each air conditioner in the first scheduling period; a second discrete variable controlling whether the corresponding air conditioner is connected to resource scheduling; the rated power per unit time of the air conditioner; the virtual-energy-storage air-conditioning temperature of each air conditioner in the first scheduling period; the temperature duty cycle of the air conditioner; and the maximum air-conditioning threshold.
The micro-grid demand model is expressed in terms of: a third discrete variable controlling the working-mode decision with which the micro-grid participates in resource scheduling; the maximum power per unit time of the energy storage; and the third power duty cycle of the energy storage.
With reference to the first aspect, in one possible implementation manner, according to the first environment observed for the multiple loads, the discrete first-stage reinforcement learning of the first scheduling model is executed in a distributed manner to decide whether each load is connected to resource scheduling, obtaining a first scheduling decision; the first environment includes the observed charging pile first environment, air conditioner first environment and micro-grid first environment. Specifically:
the charging pile first environment, the air conditioner first environment and the micro-grid first environment are respectively taken as the input of the first-stage reinforcement learning of the charging pile first scheduling model with the charging pile as initial agent, of the air conditioner first scheduling model with the air conditioner as initial agent, and of the micro-grid first scheduling model with the micro-grid as initial agent; correspondingly, a first discrete variable indicating whether the corresponding charging pile is connected to resource scheduling is output and taken as a first sub-scheduling decision, a second discrete variable indicating whether the corresponding air conditioner is connected to resource scheduling is output and taken as a second sub-scheduling decision, and a third discrete variable with which the micro-grid participates in resource scheduling is output and taken as a working-mode decision;
the first scheduling decision includes the first sub-scheduling decision, the second sub-scheduling decision and the working-mode decision.
With reference to the first aspect, in one possible implementation manner, taking the first scheduling decision as at least one environment of the second-stage reinforcement learning and deciding the resource scheduling of the connected loads, where the second-stage reinforcement learning takes the second scheduling period it outputs as the third scheduling period of the next moment so that resource scheduling is decided again after the environment is observed anew in the third scheduling period, includes:
taking the first scheduling decision as at least one environment of the second-stage reinforcement learning, taking the first scheduling decision together with the first environment as the second environment of the second-stage reinforcement learning of the first scheduling model, and, according to the second environment, taking the second scheduling period decided in a distributed manner by the second-stage reinforcement learning of the first scheduling model as the third scheduling period of the next moment.
With reference to the first aspect, in one possible implementation manner, the second scheduling period decided in a distributed manner by the second-stage reinforcement learning of the first scheduling model according to the second environment is obtained as follows:
taking the first sub-scheduling decision and the charging pile first environment as the charging pile second environment of the second-stage reinforcement learning of the charging pile first scheduling model, and outputting a first sub-scheduling period with which the charging pile participates in resource scheduling;
taking the second sub-scheduling decision and the air conditioner first environment as the air conditioner second environment of the second-stage reinforcement learning of the air conditioner first scheduling model, and outputting a second sub-scheduling period with which the air conditioner participates in resource scheduling;
taking the working-mode decision and the micro-grid first environment as the micro-grid second environment of the second-stage reinforcement learning of the micro-grid first scheduling model, and outputting a third sub-scheduling period with which the micro-grid participates in resource scheduling;
taking the minimum of the first sub-scheduling period, the second sub-scheduling period and the third sub-scheduling period as the second scheduling period.
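A minimal sketch of this period-agreement step, assuming each agent simply broadcasts its proposed sub-scheduling period and all agents adopt the minimum (all names are illustrative):

```python
def agree_on_second_period(sub_periods):
    """Pick the common second scheduling period from the broadcast proposals.

    sub_periods : dict mapping agent name -> proposed sub-scheduling period (e.g. minutes)
    Returns the minimum proposal, which every agent adopts as the next period.
    """
    if not sub_periods:
        raise ValueError("no sub-scheduling periods were proposed")
    return min(sub_periods.values())

# e.g. charging pile proposes 15 min, air conditioner 30 min, micro-grid 10 min
print(agree_on_second_period({"charging_pile": 15, "air_conditioner": 30, "micro_grid": 10}))  # 10
```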
With reference to the first aspect, in one possible implementation manner, performing real-time distributed resource scheduling on the charging pile, the air conditioner and the micro-grid respectively according to the trained second scheduling model includes:
taking the charging pile third environment, the air conditioner third environment and the micro-grid third environment observed in real time respectively as the input of the first-stage reinforcement learning of the trained charging pile second scheduling model, air conditioner second scheduling model and micro-grid second scheduling model, and correspondingly outputting a first real-time sub-scheduling decision on whether the charging pile is connected to resource allocation, a second real-time sub-scheduling decision on whether the air conditioner is connected to resource allocation, and a third real-time working mode with which the micro-grid participates in resource allocation;
taking the first real-time sub-scheduling decision and the charging pile third environment as the input of the second-stage reinforcement learning of the charging pile second scheduling model, taking the second real-time sub-scheduling decision and the air conditioner third environment as the input of the second-stage reinforcement learning of the air conditioner second scheduling model, and taking the third real-time working mode and the micro-grid third environment as the input of the second-stage reinforcement learning of the micro-grid second scheduling model, to respectively obtain the first real-time power allocation of the charging pile, the second real-time power allocation of the air conditioner and the real-time output power of the micro-grid.
With reference to the first aspect, in one possible implementation manner, uploading the experience data corresponding to the first scheduling models to the experience replay pool and selecting sample data from the experience replay pool for centralized training, where the first scheduling models include the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model, is specifically:
the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model respectively upload their corresponding first environments, first scheduling decisions, inputs of the second-stage reinforcement learning, allocated resource-allocation results output by the second-stage reinforcement learning, and the second scheduling period into the experience replay pool;
sample data are randomly selected from the experience replay pool according to the step length for centralized training, and the parameters obtained by training are simultaneously used to update the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model.
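A compact sketch of the shared experience replay pool and the centralized-training step, assuming a generic update routine and a fixed sampling step; the parameter sharing across the three models follows the description above, everything else (class, function and parameter names) is illustrative.

```python
import random

class ReplayPool:
    """Shared experience pool for the three first scheduling models (sketch)."""

    def __init__(self):
        self.buffer = []

    def push(self, experience):
        # experience: (first_env, first_decision, second_env, allocation, second_period)
        self.buffer.append(experience)

    def sample(self, step):
        """Randomly pick a subset of experiences; the step length here is illustrative."""
        k = max(1, len(self.buffer) // step)
        return random.sample(self.buffer, k)

def centralized_training(pool, shared_model, update, step=4):
    """Train once on sampled data, then give every agent the same updated parameters."""
    batch = pool.sample(step)
    new_params = update(shared_model, batch)   # e.g. one gradient step on the shared model
    return {name: new_params for name in ("charging_pile", "air_conditioner", "micro_grid")}

pool = ReplayPool()
for i in range(8):
    pool.push(([0.0] * 6, i % 2, [0.0] * 7, 0.5, 15))
print(centralized_training(pool, {"w": 0.0}, lambda m, b: {"w": m["w"] + 0.1 * len(b)}))
```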
In a second aspect, the present invention provides a multi-load resource joint scheduling system based on the multi-load resource joint scheduling method of the first aspect, including a model building module, a first-stage resource allocation module, a second-stage resource allocation module and a real-time scheduling module, wherein:
the model building module is configured to establish, when a first scheduling period starts, a first scheduling model of initial two-stage reinforcement learning from the demand models obtained according to the requirements of the multiple loads; the multiple loads include charging piles, air conditioners and micro-grids;
the first-stage resource allocation module is configured to execute the discrete first-stage reinforcement learning of the first scheduling model in a distributed manner according to a first environment observed for the multiple loads, and decide whether each load is connected to resource scheduling, to obtain a first scheduling decision;
the second-stage resource allocation module is configured to take the first scheduling decision as at least one environment of the second-stage reinforcement learning and decide the resource scheduling of the connected loads; the second-stage reinforcement learning takes the second scheduling period it outputs as the third scheduling period of the next moment, so that resource scheduling is decided again after the environment is observed anew in the third scheduling period;
the real-time scheduling module is configured to upload the experience data corresponding to the first scheduling model to an experience replay pool, select sample data from the experience replay pool for centralized training, and perform real-time distributed resource scheduling on the charging piles, air conditioners and micro-grid according to the trained second scheduling model.
Drawings
FIG. 1 is a schematic flow chart of a multi-load resource joint scheduling method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a first scheduling model setup provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of centralized training provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of a multi-load resource joint scheduling system according to an embodiment of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Example 1
Referring to fig. 1, a flow chart of a multi-load resource joint scheduling method provided by an embodiment of the present application includes steps S11 to S14, specifically:
step S11, when a first scheduling period starts, establishing an initial two-stage reinforcement learning first scheduling model for the obtained demand model according to the requirements of multiple loads; the multiple load includes: charging piles, air conditioners and micro-grids.
In one embodiment of the application, the requirements of multiple loads at the beginning of the first scheduling period are observed respectively, a corresponding requirement model is built, a current period, the weather and the position of an area where the corresponding load is located, the duration of the first scheduling period, the current electricity price and the corresponding load power are observed according to the requirement model, a first sub-environment of the corresponding load is obtained, and a second sub-environment of the corresponding load at the end of a third scheduling period at the last moment is obtained; and establishing a first scheduling model of two-stage cascade reinforcement learning by taking the corresponding load as an initial agent according to the first load sub-environment and the second load sub-environment.
Illustratively, the multiple load comprises: charging pile, air conditioner and micro-grid, so when building demand model, different demand models are built for charging pile, air conditioner and micro-grid respectively.
In one embodiment of the present application, the charging pile first environment of the first-stage reinforcement learning of the charging pile first scheduling model consists of the current time period, the weather and location of the area where the charging pile is located, the duration of the first scheduling period, the electricity price in the time period of the first scheduling period, and the rated power of the charging pile.
In another embodiment of the present application, the charging pile first environment of the first-stage reinforcement learning of the charging pile first scheduling model further includes the charging pile electric quantity at the end of the scheduling period of the previous moment.
In another embodiment of the present application, the charging pile first environment of the first-stage reinforcement learning of the charging pile first scheduling model further includes: the time period of the previous scheduling period of the charging pile, the weather and location of the area where the charging pile is located, the duration of the previous scheduling period, the electricity price in the time period of the previous scheduling period, and the charging power or discharging power of the charging pile.
In one embodiment of the present application, the charging pile first environment of the first-stage reinforcement learning of the charging pile first scheduling model includes: the current time period of the first scheduling period, the weather and location of the area where the charging pile is located, the duration of the first scheduling period, the current electricity price and the rated power, the charging pile electric quantity at the end of the scheduling period of the previous moment, the time period of the previous scheduling period, the weather and location of the area where the charging pile is located, the duration of the previous scheduling period, and the electricity price and the charging power or discharging power in the time period of the previous scheduling period.
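For illustration, the observation above can be packed into a flat state vector of fixed dimension; the encoding below is a sketch whose field names and numeric weather/location codes are chosen by the editor rather than prescribed by the patent, and it is kept at the same dimension for every load type.

```python
from dataclasses import dataclass, fields

@dataclass
class ChargingPileFirstEnv:
    """One possible flat encoding of the charging pile first environment (sketch)."""
    period_of_day: float        # current time period of the first scheduling period
    weather_code: float         # numeric weather encoding for the pile's area (assumed)
    location_code: float        # numeric location encoding for the pile's area (assumed)
    period_length: float        # duration of the first scheduling period
    price: float                # electricity price in the current time period
    rated_power: float          # rated power of this charging pile
    charge_prev_end: float      # electric quantity at the end of the previous period
    prev_period_of_day: float   # time period of the previous scheduling period
    prev_weather_code: float
    prev_location_code: float
    prev_period_length: float
    prev_price: float
    prev_power: float           # charging or discharging power in the previous period

    def to_vector(self):
        return [float(getattr(self, f.name)) for f in fields(self)]

env = ChargingPileFirstEnv(3, 1, 12, 0.25, 0.8, 7.0, 35.0, 2, 1, 12, 0.25, 0.7, 5.5)
print(len(env.to_vector()))  # 13, kept identical for the air conditioner and micro-grid agents
```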
In one embodiment of the present application, the air conditioner first environment of the first-stage reinforcement learning of the air conditioner first scheduling model consists of the current time period, the weather and location of the area where the air conditioner is located, the duration of the first scheduling period, the electricity price in the time period of the first scheduling period, and the rated power of the air conditioner.
In another embodiment of the present application, the air conditioner first environment of the first-stage reinforcement learning of the air conditioner first scheduling model further includes: the time period of the previous scheduling period of the air conditioner, the weather and location of the area where the air conditioner is located, the duration of the previous scheduling period, the electricity price in the time period of the previous scheduling period and the rated power of the air conditioner.
In one embodiment of the present application, the air conditioner first environment of the first-stage reinforcement learning of the air conditioner first scheduling model includes: the current time period of the first scheduling period, the weather and location of the area where the air conditioner is located, the duration of the first scheduling period, the current electricity price and the corresponding load power, the time period of the previous scheduling period, the weather and location of the area where the air conditioner is located, the duration of the previous scheduling period, and the electricity price in the time period of the previous scheduling period.
In one embodiment of the present application, the micro-grid first environment of the first-stage reinforcement learning of the micro-grid first scheduling model consists of the current time period, the weather and location of the area where the energy storage is located, the duration of the first scheduling period, the electricity price in the time period of the first scheduling period, and the rated power of the energy storage.
In another embodiment of the present application, the micro-grid first environment of the first-stage reinforcement learning of the micro-grid first scheduling model further includes: the time period of the previous scheduling period of the energy storage, the weather and location of the area where the energy storage is located, the duration of the previous scheduling period, and the electricity price and the charging power or discharging power in the time period of the previous scheduling period.
In one embodiment of the present application, the micro-grid first environment of the first-stage reinforcement learning of the micro-grid first scheduling model includes: the current time period of the first scheduling period, the weather and location of the area where the energy storage is located, the duration of the first scheduling period, the current electricity price and the rated power, the energy storage electric quantity at the end of the scheduling period of the previous moment, the time period of the previous scheduling period, the weather and location of the area where the energy storage is located, the duration of the previous scheduling period, and the electricity price and the charging power or discharging power in the time period of the previous scheduling period.
It is worth noting that the quantities observed by the first-stage reinforcement learning of the charging pile, the air conditioner and the micro-grid always keep the same state-space dimension, and actions of the same dimension are output.
The method comprehensively considers the influence of the current time period, the weather and location of the area where the corresponding load is located, the duration of the first scheduling period, the current electricity price and the corresponding load power on resource scheduling; compared with prior art that ignores the geographical location and weather of the load, it can improve the accuracy of dynamic scheduling of multiple types of resources.
In one embodiment of the present application, when a charging pile is connected to resource scheduling, the charging pile model is expressed in terms of: the power and the electric quantity of each charging pile in the first scheduling period; the upper and lower limits of the charging pile power; the upper and lower limits of the charging pile electric quantity; the electric quantity of each charging pile in the second scheduling period; the duration of the first scheduling period; and the change in the charging pile electric quantity caused by new energy vehicles leaving or arriving at the charging station at the beginning of the first scheduling period.
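Read as an update rule, the quantities above suggest clipping the power and electric quantity to their limits and rolling the electric quantity forward by one period; the sketch below assumes exactly that reading, and the function name and explicit clipping are illustrative.

```python
def step_charging_pile(energy, power, dt, delta_ev,
                       p_min, p_max, e_min, e_max):
    """Advance one charging pile by one scheduling period (assumed reading).

    energy   : electric quantity at the start of the period
    power    : allocated power (positive = charging, negative = discharging)
    dt       : duration of the first scheduling period
    delta_ev : electric-quantity change caused by vehicles arriving or leaving
    """
    power = max(p_min, min(p_max, power))              # power limits
    next_energy = energy + power * dt + delta_ev       # roll forward one period
    next_energy = max(e_min, min(e_max, next_energy))  # electric-quantity limits
    return next_energy

print(step_charging_pile(20.0, 7.0, 0.25, -1.5, -7.0, 7.0, 0.0, 60.0))
```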
In one embodiment of the present application, the charging pile demand model is expressed in terms of the first discrete variable controlling whether the corresponding charging pile is connected to resource scheduling, the maximum power threshold of the charging pile, and the first power duty cycle of the charging pile.
In one embodiment of the present application, if the first discrete variable equals 1, the charging pile participates in resource scheduling; if it equals 0, the charging pile does not participate in resource scheduling.
In one embodiment of the present application, the first power duty cycle takes values in [-1, 0) and (0, 1]. If it is greater than 0, the corresponding charging pile is charged at that proportion of the maximum power; if it is less than 0, the corresponding charging pile is discharged at that proportion of the maximum power.
It is worth noting that the first discrete variable and the first power duty cycle are output respectively by the first-stage reinforcement learning and the second-stage reinforcement learning of the charging pile first scheduling model, and the second-stage reinforcement learning also outputs a first sub-scheduling period; that is, the first discrete variable, the first power duty cycle and the first sub-scheduling period are all unknowns to be solved.
In another embodiment of the present application, the charging pile is both charged and discharged within the same scheduling period according to the first power duty cycle.
In one embodiment of the present application, the first power duty cycle takes values in [-1, 0) and (0, 1]. If it is greater than 0, the corresponding charging pile is charged at that proportion of the maximum power, and the same value is used as the charging-time fraction of the first scheduling period, so that charging is performed for that fraction of the period and discharging for the remaining fraction. If it is less than 0, the corresponding charging pile is discharged at the absolute value of that proportion of the maximum power, and the absolute value is used as the discharging-time fraction of the first scheduling period, so that discharging is performed for that fraction of the period and charging for the remaining fraction.
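The sign convention of the first power duty cycle can be made concrete as in the sketch below, which follows this second variant (charging and discharging within the same period); the returned power refers to the dominant operation, and all names are illustrative.

```python
def split_period(alpha, p_max, period):
    """Turn the signed first power duty cycle into a charge/discharge plan (sketch).

    alpha  : in [-1, 0) or (0, 1]; > 0 means charging dominates, < 0 discharging
    p_max  : maximum power threshold of the charging pile
    period : duration of the first scheduling period
    Returns (power_of_dominant_operation, charge_time, discharge_time).
    """
    if alpha > 0:
        return alpha * p_max, alpha * period, (1 - alpha) * period
    return abs(alpha) * p_max, (1 - abs(alpha)) * period, abs(alpha) * period

print(split_period(0.6, 7.0, 1.0))    # charge 60% of the period at 4.2 kW
print(split_period(-0.3, 7.0, 1.0))   # discharge 30% of the period at 2.1 kW
```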
It should be noted that, owing to the area where the charging pile is located, the weather, the time period, the electricity price and the charging pile electric quantity, not all charging piles participate in resource scheduling; for the same reasons, the charging piles participate in resource scheduling in different manners and with different required power or output power.
In one embodiment of the present application, the air conditioner demand model is expressed in terms of: the power per unit time of each air conditioner in the first scheduling period; the second discrete variable controlling whether the corresponding air conditioner is connected to resource scheduling; the rated power per unit time of the air conditioner; the virtual-energy-storage air-conditioning temperature of each air conditioner in the first scheduling period; the temperature duty cycle of the air conditioner; and the maximum air-conditioning threshold.
In one embodiment of the present application, if the second discrete variable equals 1, the corresponding air conditioner is connected to resource scheduling; if it equals 0, the air conditioner is not connected to resource scheduling.
It is worth noting that the second discrete variable is output by the first-stage reinforcement learning of the air conditioner first scheduling model, while the second-stage reinforcement learning outputs the temperature duty cycle and the corresponding sub-scheduling period; that is, the second discrete variable, the temperature duty cycle and the sub-scheduling period are all unknowns to be solved.
In one embodiment of the present application, the temperature duty cycle takes values in [-1, 0) and (0, 1]. If it is greater than 0, the temperature of the corresponding air conditioner is adjusted upwards in that proportion; if it is less than 0, the temperature of the corresponding air conditioner is adjusted downwards in that proportion.
It should be noted that the maximum air-conditioning threshold should be positive; through the temperature duty cycle, not only is the magnitude of the temperature adjustment controlled, but the adjustment can also be made zero or negative, so as to adapt to different requirements.
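A one-line illustration of this convention, assuming the adjustment is simply the temperature duty cycle times the maximum air-conditioning threshold (names are illustrative):

```python
def temperature_adjustment(gamma, t_max):
    """Signed temperature adjustment of the virtual energy storage (sketch).

    gamma : temperature duty cycle in [-1, 0) or (0, 1]
    t_max : maximum air-conditioning threshold, assumed positive
    """
    if t_max <= 0:
        raise ValueError("the maximum air-conditioning threshold should be positive")
    return gamma * t_max

print(temperature_adjustment(0.5, 4.0))    # +2.0: raise the set point
print(temperature_adjustment(-0.25, 4.0))  # -1.0: lower the set point
```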
In one embodiment of the application, the micro-grid comprises a photovoltaic system, an energy storage system and inflexible load, and a micro-grid model is established from the photovoltaic system and the energy storage system. The model is expressed in terms of: the reactive power output by the photovoltaic inverter and the active power output by the photovoltaic system, together with the power factor; the power and the electric quantity of each energy storage in the first scheduling period, with the upper and lower limits of the energy storage power and the upper and lower limits of the energy storage electric quantity; the electric quantity of each energy storage in the second scheduling period; the duration of the first scheduling period; the indicators of charging or discharging; and the charging and discharging efficiency.
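The storage update itself is not reproduced as a formula in this text; the sketch below assumes the usual efficiency-weighted state-of-charge update with clipping to the electric-quantity limits, with all names chosen for illustration.

```python
def step_storage(energy, p_charge, p_discharge, dt,
                 eta_charge, eta_discharge, e_min, e_max):
    """One-period energy-storage update with charge/discharge efficiencies (sketch).

    Uses the common convention: charging energy enters scaled by eta_charge,
    discharging energy leaves scaled by 1 / eta_discharge.
    """
    next_energy = energy + (eta_charge * p_charge - p_discharge / eta_discharge) * dt
    return max(e_min, min(e_max, next_energy))

print(step_storage(100.0, 20.0, 0.0, 0.5, 0.95, 0.95, 10.0, 200.0))
```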
In one embodiment of the present application, the micro-grid demand model is expressed in terms of the third discrete variable that controls the working-mode decision with which the micro-grid participates in resource scheduling, the maximum power per unit time of the energy storage, and the third power duty cycle of the energy storage.
It should be noted that, to keep the agents' first scheduling models unified, the action spaces and the state spaces of the environments are unified: the dimension of the environment space is made the same according to the observed environment, and the action space of the first scheduling model is made the same. To improve resource utilisation and maintain grid stability, the micro-grid always participates in resource scheduling; the output of the first-stage reinforcement learning of the micro-grid first scheduling model is therefore not defined as whether to connect to resource scheduling, but as the mode in which the micro-grid participates in resource scheduling, which comprises three working modes within the same scheduling period: simultaneous energy storage charging and discharging, charging only, or discharging only.
In one embodiment of the present application, if the third discrete variable equals 1, the micro-grid participates in resource scheduling in a charging-only or discharging-only working mode; otherwise, the variable is adjusted to the value -1 required by the demand model, and the micro-grid participates in resource scheduling in a working mode of simultaneous charging and discharging.
It should be noted that the working-mode strategy output by the first-stage reinforcement learning of the micro-grid first scheduling model is 1 or 0, but a value of 0 is meaningless for the demand model of the micro-grid; therefore, after the first-stage reinforcement learning output, the value 0 is changed to -1 according to the judgment of the working-mode strategy. The raw output of the micro-grid model is still 1 or 0, so as to stay consistent with the action spaces of the first-stage reinforcement learning of the charging pile first scheduling model and the air conditioner first scheduling model. Because the micro-grid first scheduling model has the same first-stage reinforcement learning network as the charging pile first scheduling model and the air conditioner first scheduling model, the three scheduling models with different functions can be trained centrally to obtain the same model parameters. That is, since the action spaces and the state-space dimensions of the three first scheduling models are the same, the centralized controller can learn centrally from the collected experience data of different types; this learning process is essentially transfer learning. The same scheduling model can thus be trained on different data and control various kinds of resource scheduling in real time, the coupling relations among different types of resources can be fully explored, and precise resource control of the multiple loads can then be performed.
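A minimal sketch of this value adjustment, keeping the raw first-stage output in {0, 1} so that the three action spaces coincide (names are illustrative):

```python
def microgrid_mode(z_raw):
    """Map the raw 0/1 first-stage output to the micro-grid working-mode decision.

    1  -> charge-only or discharge-only within the scheduling period
    -1 -> simultaneous charging and discharging within the scheduling period
    Keeping the raw output in {0, 1} keeps the action space identical to the
    charging pile and air conditioner first scheduling models.
    """
    return 1 if z_raw == 1 else -1

print(microgrid_mode(1), microgrid_mode(0))   # 1 -1
```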
In one embodiment of the present application, the values of the third discrete variable and the third power duty cycle are constrained as follows: the third power duty cycle takes values in [-1, 0) and (0, 1].
It is worth noting that the energy storage electric quantity after the first scheduling period is the electric quantity stored in the micro-grid at the beginning of the second scheduling period.
It is worth noting that the third discrete variable and the third power duty cycle are output respectively by the first-stage reinforcement learning and the second-stage reinforcement learning of the micro-grid first scheduling model, and the second-stage reinforcement learning also outputs a third sub-scheduling period; that is, the third discrete variable, the third power duty cycle and the third sub-scheduling period are all unknowns to be solved.
It is noted that the value range of the temperature duty cycle is the same as the ranges of the first power duty cycle and the third power duty cycle, namely [-1, 0) and (0, 1].
According to the first load sub-environment and the second load sub-environment, a first scheduling model of two-stage cascaded reinforcement learning with the corresponding load as initial agent is established, specifically: the charging pile, the air conditioner and the micro-grid are respectively taken as initial agents, and a charging pile first scheduling model, an air conditioner first scheduling model and a micro-grid first scheduling model in which the first-stage reinforcement learning and the second-stage reinforcement learning are cascaded are correspondingly obtained; the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model have identical action spaces and identical state spaces.
The invention adopts the same state space and action space for the resource scheduling of the three loads, which facilitates centralized training of the scheduling models of the different load types, helps balance exploration and exploitation of the optimal scheduling strategy across the load types, and improves the accuracy of scheduling different types of loads simultaneously.
Referring to fig. 2, a schematic diagram of establishing the first scheduling model provided by an embodiment of the present invention is shown. In the figure, the charging pile, the air conditioner and the micro-grid each serve as an agent, and each agent has two-stage reinforcement learning: the output of the first-stage reinforcement learning serves as at least one input of the second-stage reinforcement learning, that is, the output of the first stage is an environment that the second stage needs to observe. The second-stage reinforcement learning of the charging pile, the air conditioner and the micro-grid each has two outputs. In the first scheduling model, the computed sub-scheduling period is transmitted by broadcast; after receiving the broadcast information, the other agents compare it with the sub-scheduling period they output themselves and select the second scheduling period with the minimum duration. The observed environments and the output actions are stored in the experience replay pool of the centralized controller for centralized training (that is, transfer learning).
It should be noted that, by modelling the requirements of the charging pile, the air conditioner and the micro-grid, state spaces (first environments) of the same dimension are established for the first-stage reinforcement learning of the first scheduling models, and the same action space is set; each agent knows which of the three scheduling models it is, so that, according to the observed environment, the scheduling decision output by the first-stage reinforcement learning is a discrete value.
It should be noted that fig. 2 only shows one charging pile first scheduling model, one air conditioner first scheduling model and one micro-grid first scheduling model; there may also be a plurality of charging pile first scheduling models, a plurality of air conditioner first scheduling models and a plurality of micro-grid first scheduling models, or any combination of one or more of each.
In one embodiment of the present application, the first-stage reinforcement learning includes: a deep Q-learning network (DQN), REINFORCE based on policy gradients, an actor-critic network whose output is a discrete value, and proximal policy optimization (PPO) with discrete actions.
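For concreteness, a tiny discrete policy head of the kind any of these algorithm families could train is sketched below in PyTorch; the framework, network width and state dimension are editorial choices, not prescribed by the patent.

```python
import torch
import torch.nn as nn

class DiscreteStageOnePolicy(nn.Module):
    """Tiny categorical policy: state vector in, discrete access decision (0/1) out."""

    def __init__(self, state_dim: int, hidden: int = 64, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)                      # unnormalised action scores

    def act(self, state: torch.Tensor) -> int:
        logits = self.forward(state)
        dist = torch.distributions.Categorical(logits=logits)
        return int(dist.sample().item())            # 0 = stay out, 1 = join scheduling

policy = DiscreteStageOnePolicy(state_dim=13)
print(policy.act(torch.zeros(13)))
```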
Step S12, according to the first environment observed for the multiple loads, executing the discrete first-stage reinforcement learning of the first scheduling model in a distributed manner, and deciding whether each load is connected to resource scheduling, to obtain a first scheduling decision.
The first environment includes the observed charging pile first environment, air conditioner first environment and micro-grid first environment. Specifically, the charging pile first environment, the air conditioner first environment and the micro-grid first environment are respectively taken as the input of the first-stage reinforcement learning of the charging pile first scheduling model with the charging pile as initial agent, of the air conditioner first scheduling model with the air conditioner as initial agent, and of the micro-grid first scheduling model with the micro-grid as initial agent; correspondingly, a first discrete variable indicating whether the corresponding charging pile is connected to resource scheduling is output as a first sub-scheduling decision, a second discrete variable indicating whether the corresponding air conditioner is connected to resource scheduling is output as a second sub-scheduling decision, and a third discrete variable with which the micro-grid participates in resource scheduling is output as a working-mode decision. The first scheduling decision includes the first sub-scheduling decision, the second sub-scheduling decision and the working-mode decision.
In one embodiment of the present application, the action space of the first-stage reinforcement learning of the charging pile first scheduling model, the action space of the first-stage reinforcement learning of the air conditioner first scheduling model and the action space of the first-stage reinforcement learning of the micro-grid first scheduling model are the same discrete set, whose elements are the action of the first-stage reinforcement learning of the charging pile first scheduling model, the action of the first-stage reinforcement learning of the air conditioner first scheduling model and the action of the first-stage reinforcement learning of the micro-grid first scheduling model, each taking the value 0 or 1.
In one embodiment of the present application, in fig. 2, the first-stage reinforcement learning of the charging pile first scheduling model decides, according to the observed environment, whether to connect the charging pile to resource scheduling; the first-stage reinforcement learning of the air conditioner first scheduling model decides, according to the observed environment, whether to connect the air conditioner to resource scheduling; and the first-stage reinforcement learning of the micro-grid first scheduling model decides, according to the observed environment, the working mode with which the micro-grid takes part in resource scheduling, namely charging only, discharging only, or both charging and discharging within one scheduling period.
Step S13, taking the first scheduling decision as at least one environment of the second-stage reinforcement learning, and deciding the resource scheduling of the connected loads; the second-stage reinforcement learning takes the second scheduling period it outputs as the third scheduling period of the next moment, so that resource scheduling is decided again after the environment is observed anew in the third scheduling period.
Specifically, the first scheduling decision is taken as at least one environment of the second-stage reinforcement learning; the first scheduling decision together with the first environment is taken as the second environment of the second-stage reinforcement learning of the first scheduling model, and, according to the second environment, the second scheduling period decided in a distributed manner by the second-stage reinforcement learning of the first scheduling model is taken as the third scheduling period of the next moment.
The second scheduling period decided in a distributed manner by the second-stage reinforcement learning of the first scheduling model according to the second environment is obtained as follows: the first sub-scheduling decision and the charging pile first environment are taken as the charging pile second environment of the second-stage reinforcement learning of the charging pile first scheduling model, and a first sub-scheduling period with which the charging pile participates in resource scheduling is output; the second sub-scheduling decision and the air conditioner first environment are taken as the air conditioner second environment of the second-stage reinforcement learning of the air conditioner first scheduling model, and a second sub-scheduling period with which the air conditioner participates in resource scheduling is output; the working-mode decision and the micro-grid first environment are taken as the micro-grid second environment of the second-stage reinforcement learning of the micro-grid first scheduling model, and a third sub-scheduling period with which the micro-grid participates in resource scheduling is output; and the minimum of the first sub-scheduling period, the second sub-scheduling period and the third sub-scheduling period is taken as the second scheduling period.
In another embodiment of the present application, the charging pile second environment of the second-stage reinforcement learning of the charging pile first scheduling model may consist of the charging pile first environment together with the first sub-scheduling decision output by the first-stage reinforcement learning.
In another embodiment of the present application, the charging pile second environment of the second-stage reinforcement learning of the charging pile first scheduling model may consist only of the output of the first-stage reinforcement learning.
In another embodiment of the present application, the charging pile second environment of the second-stage reinforcement learning of the charging pile first scheduling model further includes the charging pile electric quantity at the end of the scheduling period of the previous moment.
In another embodiment of the present application, the charging pile second environment of the second-stage reinforcement learning of the charging pile first scheduling model further includes: the time period of the previous scheduling period of the charging pile, the weather and location of the area where the charging pile is located, the duration of the previous scheduling period, the electricity price in the time period of the previous scheduling period, and the charging power or discharging power of the charging pile.
In one embodiment of the present application, the charging pile second environment of the second-stage reinforcement learning of the charging pile first scheduling model includes: the current time period of the first scheduling period, the weather and location of the area where the charging pile is located, the duration of the first scheduling period, the current electricity price and the rated power, the charging pile electric quantity at the end of the scheduling period of the previous moment, the time period of the previous scheduling period, the weather and location of the area where the charging pile is located, the duration of the previous scheduling period, and the electricity price and the charging power or discharging power in the time period of the previous scheduling period.
In one embodiment of the present application, the air conditioner second environment of the second-stage reinforcement learning of the air conditioner first scheduling model may be expressed as a set of the quantities described below. In another embodiment of the present application, this second environment consists of only the output of the first-stage reinforcement learning (the second sub-scheduling decision).
In another embodiment of the present application, the air conditioner second environment of the second-stage reinforcement learning of the air conditioner first scheduling model further includes: the time period in which the last scheduling period of the air conditioner is located, the weather and location of the area where the air conditioner is located, the duration of the last scheduling period, the electricity price in the time period of the last scheduling period, and the rated power of the air conditioner.
In one embodiment of the present application, the air conditioner second environment of the second-stage reinforcement learning of the air conditioner first scheduling model includes: the current time period of the first scheduling period, the weather and location of the area where the air conditioner is located, the duration of the first scheduling period, the current electricity price and the corresponding load power, the time period in which the last scheduling period is located, the weather and location of the area where the air conditioner was located, the duration of the last scheduling period, and the electricity price in the time period of the last scheduling period.
In one embodiment of the present application, the micro-grid second environment of the second-stage reinforcement learning of the micro-grid first scheduling model may be expressed as a set of the quantities described below. In another embodiment of the present application, this second environment consists of only the output of the first-stage reinforcement learning (the working mode decision).
In another embodiment of the present application, the micro-grid second environment of the second-stage reinforcement learning of the micro-grid first scheduling model further includes: the time period in which the last scheduling period of the energy storage is located, the weather and location of the area where the energy storage is located, the duration of the last scheduling period, and the electricity price and the charging power or discharging power in the time period of the last scheduling period.
In one embodiment of the present application, the micro-grid second environment of the second-stage reinforcement learning of the micro-grid first scheduling model includes: the current time period of the first scheduling period, the weather and location of the area where the energy storage is located, the duration of the first scheduling period, the current electricity price and the rated power, the electric quantity of the energy storage at the end of the scheduling period at the previous moment, the time period in which the last scheduling period is located, the weather and location of the area where the energy storage was located, the duration of the last scheduling period, and the electricity price and the charging power or discharging power in the time period of the last scheduling period.
In one embodiment of the present application, the action space of the second-stage reinforcement learning of the charging pile first scheduling model, the action space of the second-stage reinforcement learning of the air conditioner first scheduling model, and the action space of the second-stage reinforcement learning of the micro-grid first scheduling model respectively comprise: the first sub-scheduling period of the charging pile, the second sub-scheduling period of the air conditioner, and the third sub-scheduling period of the energy storage, each decided at the first scheduling instant and constrained between the lower limit and the upper limit of the scheduling period duration, together with, respectively, the first power ratio of the charging pile, the temperature ratio of the air conditioner, and the third power ratio of the energy storage decided at the first scheduling instant.
In this embodiment, as shown in fig. 2, if only the output of the first-stage reinforcement learning were used as the input of the second-stage reinforcement learning, the information content would be too small, so the convergence effect and generalization capability of the reinforcement learning would be poor; the first environment is therefore also included in the second environment.
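A minimal sketch of how an agent's second environment could be assembled from its first environment and its first-stage decision, in line with the paragraph above, is given below; the field ordering and example values are assumptions for illustration only:

import numpy as np

def build_second_environment(first_env, first_stage_decision):
    # Concatenate the observed first environment with the discrete first-stage
    # decision (for example, 0 = not accessed into resource scheduling,
    # 1 = accessed), so that the second-stage reinforcement learning sees both.
    return np.concatenate([np.asarray(first_env, dtype=float),
                           np.array([float(first_stage_decision)])])

# Example for a charging pile agent: [time period, weather code, location code,
# first scheduling period duration, current electricity price, rated power].
pile_first_env = [10.0, 1.0, 3.0, 0.25, 0.62, 7.0]
pile_second_env = build_second_environment(pile_first_env, first_stage_decision=1)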
And S14, uploading experience data corresponding to the first scheduling model to an experience replay pool, selecting sampling data from the experience replay pool for centralized training, and respectively carrying out real-time distributed resource scheduling on the charging pile, the air conditioner and the micro-grid according to a trained second scheduling model.
Uploading the experience data corresponding to the first scheduling model to the experience replay pool and selecting sampling data from the experience replay pool for centralized training, wherein the first scheduling model comprises the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model, is specifically: the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model respectively upload their corresponding first environments, first scheduling decisions, inputs of the second-stage reinforcement learning, allocated resource allocation results output by the second-stage reinforcement learning, and the second scheduling period into the experience replay pool; and sampling data are randomly selected from the experience replay pool according to the step length for centralized training, while the parameters obtained by training are simultaneously updated onto the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model.
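The experience replay pool and the centralized training step described above can be sketched roughly as follows; the class and method names, and the presence of a shared centralized model object, are assumptions, and the actual gradient update depends on the reinforcement learning algorithm chosen:

import random
from collections import deque

class ExperienceReplayPool:
    def __init__(self, capacity=100000):
        self.pool = deque(maxlen=capacity)

    def upload(self, first_env, first_decision, second_stage_input, allocation, second_period):
        # Each agent (charging pile, air conditioner, micro-grid) uploads the tuple
        # listed in the text: first environment, first scheduling decision, input of
        # the second-stage reinforcement learning, allocated resource result, and
        # the second scheduling period.
        self.pool.append((first_env, first_decision, second_stage_input, allocation, second_period))

    def sample(self, batch_size, step=1):
        # Randomly select sampling data from the pool according to the step length.
        candidates = list(self.pool)[::step]
        return random.sample(candidates, min(batch_size, len(candidates)))

def centralized_training_step(pool, central_model, agent_models, batch_size=64):
    batch = pool.sample(batch_size)
    central_model.update(batch)                             # algorithm-specific centralized update
    for agent in agent_models:                              # broadcast the trained parameters to the
        agent.load_parameters(central_model.parameters())   # charging pile, AC and micro-grid models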
And respectively carrying out real-time distributed resource scheduling on the charging pile, the air conditioner and the micro-grid according to the trained second scheduling model comprises the following steps: taking the charging pile third environment, the air conditioner third environment and the micro-grid third environment observed in real time as the respective inputs of the first-stage reinforcement learning of the trained charging pile second scheduling model, air conditioner second scheduling model and micro-grid second scheduling model, and respectively outputting a first real-time sub-scheduling decision on whether the charging pile is accessed into resource allocation, a second real-time sub-scheduling decision on whether the air conditioner is accessed into resource allocation, and a third real-time working mode of the micro-grid participating in resource allocation; and taking the first real-time sub-scheduling decision and the charging pile third environment as the input of the second-stage reinforcement learning of the charging pile second scheduling model, taking the second real-time sub-scheduling decision and the air conditioner third environment as the input of the second-stage reinforcement learning of the air conditioner second scheduling model, and taking the third real-time working mode and the micro-grid third environment as the input of the second-stage reinforcement learning of the micro-grid second scheduling model, so as to respectively obtain the first real-time power distribution of the charging pile, the second real-time power distribution of the air conditioner and the real-time output power of the micro-grid.
Referring to fig. 3, a schematic diagram of centralized training provided in an embodiment of the present application is shown. In the figure, the scheduling model is trained by centralized training based on the first scheduling model of fig. 2. The charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model respectively transmit, to the centralized controller, the environments observed by the first-stage reinforcement learning, the first scheduling strategies executed by the first-stage reinforcement learning, the environments observed by the second-stage reinforcement learning and the resource scheduling executed by the second-stage reinforcement learning; the centralized controller performs centralized training according to the uploaded experience data to obtain parameter models suitable for the scheduling of the charging pile, the air conditioner and the micro-grid, and simultaneously transmits the obtained parameter models back to the charging pile, the air conditioner and the micro-grid for parameter updating.
In one embodiment of the application, according to the resource scheduling strategies of the two-stage reinforcement learning of the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model, a total electricity cost is obtained in which electricity purchased from the grid is priced at the maximum basic purchase price and electricity sold to the grid is priced at the minimum basic sale price; minimizing this total electricity cost is taken as the optimization target, and the reciprocal of the total electricity cost is used as the reward function of the two-stage reinforcement learning.
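A hedged sketch of this reward is shown below; the sign convention (purchases priced at the maximum basic purchase price, sales at the minimum basic sale price) and the small epsilon guarding against division by zero are assumptions made for illustration:

def total_electricity_cost(energy_bought, energy_sold, max_purchase_price, min_sale_price):
    # Total electricity cost over the scheduling period: purchased energy is priced
    # at the maximum basic purchase price, sold energy at the minimum basic sale price.
    return energy_bought * max_purchase_price - energy_sold * min_sale_price

def reward(total_cost, eps=1e-6):
    # Minimizing the total cost is the optimization target, so its reciprocal is
    # used as the reward of the two-stage reinforcement learning.
    return 1.0 / (total_cost + eps)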
In one embodiment of the present application, the second-stage reinforcement learning includes: REINFORCE based on policy gradients, Actor-Critic (AC) networks whose policy and action outputs are continuous values, the deep deterministic policy gradient algorithm (Deep Deterministic Policy Gradient, DDPG), and deep reinforcement learning networks with continuous-valued outputs such as proximal policy optimization (Proximal Policy Optimization, PPO).
It should be noted that, because the first-stage reinforcement learning and the second-stage reinforcement learning have different requirements on their outputs, their state transition probabilities are computed differently: in the first-stage reinforcement learning, the probability of transitioning to the next state is computed from the probabilities of the discrete actions at each moment, whereas in the second-stage reinforcement learning it is computed by integrating the continuous probability density at each moment; the specific form of the state transition probability depends on the reinforcement learning network selected.
In one embodiment of the present application, the first-stage reinforcement learning uses a discrete-action algorithm such as AC, DQN, REINFORCE or PPO, and the second-stage reinforcement learning uses a continuous-action algorithm such as AC, REINFORCE, DDPG or PPO.
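The practical difference between the two stages can be sketched as follows: the first stage samples a discrete action from a categorical (softmax) distribution, so probabilities come from a probability mass function, while the second stage samples a continuous action from, for example, a Gaussian, so probabilities come from a probability density. The code below is only an illustrative sketch; network architectures and distribution choices depend on the algorithm selected (AC, DQN, REINFORCE, DDPG or PPO):

import numpy as np

def sample_discrete_action(logits, rng=None):
    # First stage: categorical (softmax) policy over discrete actions,
    # e.g. 0 = not accessed into resource scheduling, 1 = accessed.
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    action = rng.choice(len(probs), p=probs)
    return action, np.log(probs[action])          # log of the pmf value

def sample_continuous_action(mean, std, low, high, rng=None):
    # Second stage: Gaussian policy over a continuous action, e.g. the scheduling
    # period clipped into [T_min, T_max] or a power ratio clipped into [0, 1].
    rng = rng or np.random.default_rng()
    raw = rng.normal(mean, std)
    log_pdf = -0.5 * ((raw - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))
    return float(np.clip(raw, low, high)), log_pdf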
According to the invention, the scheduling models are respectively built for the charging pile, the air conditioner and the micro-grid, and experience data of the charging pile, the air conditioner and the micro-grid are collected for centralized training, so that the same resource scheduling model can be built for different loads, potential coupling relations of different loads can be obtained to the greatest extent, the scheduling periods of the charging pile, the air conditioner and the micro-grid under the same scheduling model are explored, and the dynamic scheduling of different types of resources is adapted for reinforcement learning, so that the precision of multi-element resource scheduling is improved.
Example 2
Referring to fig. 4, a schematic structural diagram of a multi-load resource joint scheduling system provided by an embodiment of the present invention is shown. The system is based on the multi-load resource joint scheduling method of the first aspect and includes: the model building module 21, the first-stage resource allocation module 22, the second-stage resource allocation module 23 and the real-time scheduling module 24.
The model building module 21 mainly builds a corresponding scheduling model according to different requirements, and transmits the initial scheduling model to the first-stage resource allocation module 22; the first-stage resource allocation module 22 performs first-stage resource scheduling according to the received scheduling model to decide whether to access the corresponding load into resource scheduling, and issues the obtained scheduling decision to the second-stage resource allocation module 23; after receiving the scheduling decision, the second-stage resource allocation module 23 allocates resources for the loads accessed into resource scheduling, uploads the obtained resource scheduling results to the experience replay pool for training, obtains a trained scheduling model, and transmits the trained scheduling model to the real-time scheduling module 24; the real-time scheduling module 24 causes the charging piles, the air conditioner and the micro-grid, after observing the environment, to output real-time resource scheduling in a distributed manner according to the received scheduling model.
The model building module 21 is configured to build an initial two-stage reinforcement learning first scheduling model for the obtained demand model according to the requirement of the multiple loads when the first scheduling period starts; the multiple load includes: charging piles, air conditioners and micro-grids.
When a first scheduling period starts, according to the requirements of multiple loads, an initial two-stage reinforcement learning first scheduling model is established for the obtained requirement model, and the method comprises the following steps: respectively observing the requirements of multiple loads at the beginning of the first scheduling period, establishing a corresponding requirement model, observing the current period, the weather and the position of the area where the corresponding load is located, the duration of the first scheduling period, the current electricity price and the corresponding load power according to the requirement model to obtain a first sub-environment of the corresponding load, and obtaining a second sub-environment of the corresponding load at the end of a third scheduling period at the last moment; and establishing a first scheduling model of two-stage cascade reinforcement learning by taking the corresponding load as an initial agent according to the first load sub-environment and the second load sub-environment.
According to the first load sub-environment and the second load sub-environment, a first scheduling model of two-stage cascade reinforcement learning with a corresponding load as an initial agent is established, specifically: respectively taking a charging pile, an air conditioner and a micro-grid as initial intelligent agents, and correspondingly obtaining a first charging pile scheduling model, a first air conditioner scheduling model and a first micro-grid scheduling model of first-stage reinforcement learning and second-stage reinforcement learning cascading; the first charging pile scheduling model is identical to the first air conditioner scheduling model and the first micro-grid scheduling model in action space, and the first charging pile scheduling model and the first micro-grid scheduling model are identical in state space.
In one embodiment of the present application, the demand model includes: a charging pile demand model, an air conditioning demand model and a micro-grid demand model.
In one embodiment of the present application, the charging pile demand model may be expressed in terms of the following quantities: the power per unit time distributed to each charging pile in the first scheduling period; the first discrete variable controlling whether the corresponding charging pile is connected into the resource scheduling; the maximum power threshold per unit time of the charging pile; and the first power duty cycle of the charging pile.
In one embodiment of the present application, the air conditioning demand model may be expressed in terms of the following quantities: the power per unit time of each air conditioner in the first scheduling period; the second discrete variable controlling whether the corresponding air conditioner is connected into the resource scheduling; the rated power per unit time of the air conditioner; the air-conditioning temperature of the virtual energy storage of each air conditioner in the first scheduling period; the temperature ratio of the air conditioner; and the maximum air-conditioning threshold.
In one embodiment of the present application, the micro-grid demand model may be expressed in terms of the following quantities: the third discrete variable controlling the working mode decision of the micro-grid participating in the resource scheduling; the maximum power threshold of the energy storage per unit time; and the third power duty cycle of the energy storage.
The first-stage resource allocation module 22 is configured to perform distributed execution of first-stage reinforcement learning with discrete first scheduling models according to a first environment for observing the multiple loads, and determine whether to access the multiple loads to resource scheduling, so as to obtain a first scheduling decision.
The first environment includes: the observed charging pile first environment, air conditioner first environment and micro-grid first environment, specifically: the charging pile first environment, the air conditioner first environment and the micro-grid first environment are respectively taken as the input of the first-stage reinforcement learning of the charging pile first scheduling model with the charging pile as an initial agent, the input of the first-stage reinforcement learning of the air conditioner first scheduling model with the air conditioner as an initial agent, and the input of the first-stage reinforcement learning of the micro-grid first scheduling model with the micro-grid as an initial agent; correspondingly output are a first discrete variable indicating whether the corresponding charging pile is accessed into resource scheduling, which is taken as the first sub-scheduling decision, a second discrete variable indicating whether the corresponding air conditioner is accessed into resource scheduling, which is taken as the second sub-scheduling decision, and a third discrete variable indicating how the micro-grid participates in resource scheduling, which is taken as the working mode decision; wherein the first scheduling decision comprises: the first sub-scheduling decision, the second sub-scheduling decision, and the working mode decision.
The second-stage resource allocation module 23 is configured to use the first scheduling decision as at least one environment for second-stage reinforcement learning, and decide to access the resource scheduling of the load; the second stage reinforcement learning takes the output second scheduling period as a third scheduling period of the next moment, so that the resource scheduling is decided after the environment is observed again in the third scheduling period.
Specifically, the first scheduling decision is taken as at least one environment of the second-stage reinforcement learning: the first scheduling decision and the first environment are together taken as the second environment of the second-stage reinforcement learning of the first scheduling model, and, according to the second environment, the second scheduling period decided in a distributed manner by the second-stage reinforcement learning of the first scheduling model is taken as the third scheduling period at the next moment.
Wherein, deciding, according to the second environment, the second scheduling period by the distributed second-stage reinforcement learning of the first scheduling model is specifically: taking the first sub-scheduling decision and the charging pile first environment as the charging pile second environment of the second-stage reinforcement learning of the charging pile first scheduling model, and outputting a first sub-scheduling period of the charging pile participating in resource scheduling; taking the second sub-scheduling decision and the air conditioner first environment as the air conditioner second environment of the second-stage reinforcement learning of the air conditioner first scheduling model, and outputting a second sub-scheduling period of the air conditioner participating in resource scheduling; taking the third sub-scheduling decision (the working mode decision) and the micro-grid first environment as the micro-grid second environment of the second-stage reinforcement learning of the micro-grid first scheduling model, and outputting a third sub-scheduling period of the micro-grid participating in resource scheduling; and taking the minimum value among the first sub-scheduling period, the second sub-scheduling period and the third sub-scheduling period as the second scheduling period.
The real-time scheduling module 24 is configured to upload experience data corresponding to the first scheduling model to the experience replay pool, select sample data from the experience replay pool for centralized training, and perform real-time distributed resource scheduling on the charging pile, the air conditioner and the micro-grid according to the trained second scheduling model.
Uploading the experience data corresponding to the first scheduling model to the experience replay pool and selecting sampling data from the experience replay pool for centralized training, wherein the first scheduling model comprises the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model, is specifically: the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model respectively upload their corresponding first environments, first scheduling decisions, inputs of the second-stage reinforcement learning, allocated resource allocation results output by the second-stage reinforcement learning, and the second scheduling period into the experience replay pool; and sampling data are randomly selected from the experience replay pool according to the step length for centralized training, while the parameters obtained by training are simultaneously updated onto the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model.
Specifically, the charging pile third environment, the air conditioner third environment and the micro-grid third environment observed in real time are respectively taken as the inputs of the first-stage reinforcement learning of the trained charging pile second scheduling model, air conditioner second scheduling model and micro-grid second scheduling model, and a first real-time sub-scheduling decision on whether the charging pile is accessed into resource allocation, a second real-time sub-scheduling decision on whether the air conditioner is accessed into resource allocation, and a third real-time working mode of the micro-grid participating in resource allocation are respectively output; the first real-time sub-scheduling decision and the charging pile third environment are then taken as the input of the second-stage reinforcement learning of the charging pile second scheduling model, the second real-time sub-scheduling decision and the air conditioner third environment as the input of the second-stage reinforcement learning of the air conditioner second scheduling model, and the third real-time working mode and the micro-grid third environment as the input of the second-stage reinforcement learning of the micro-grid second scheduling model, so as to respectively obtain the first real-time power distribution of the charging pile, the second real-time power distribution of the air conditioner and the real-time output power of the micro-grid.
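The real-time distributed execution performed by the real-time scheduling module 24 can be sketched as below; the agent keys and the first_stage / second_stage model interface are assumptions for illustration:

def real_time_dispatch(third_environments, trained_models):
    # third_environments and trained_models are dictionaries keyed by
    # 'charging_pile', 'air_conditioner' and 'micro_grid'.
    allocations = {}
    for name, env in third_environments.items():
        model = trained_models[name]
        decision = model.first_stage(env)                       # access / working-mode decision
        allocations[name] = model.second_stage(env, decision)   # real-time power allocation or output power
    return allocations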
According to the invention, the scheduling models are respectively built for the charging pile, the air conditioner and the micro-grid, and experience data of the charging pile, the air conditioner and the micro-grid are collected for centralized training, so that the same resource scheduling model can be built for different loads, potential coupling relations of different loads can be obtained to the greatest extent, the scheduling periods of the charging pile, the air conditioner and the micro-grid under the same scheduling model are explored, and the dynamic scheduling of different types of resources is adapted for reinforcement learning, so that the precision of multi-element resource scheduling is improved.
It will be appreciated by those skilled in the art that embodiments of the present application may also provide a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (10)

1. The multi-load resource joint scheduling method is characterized by comprising the following steps of:
when a first scheduling period starts, establishing an initial two-stage reinforcement learning first scheduling model for the obtained demand model according to the requirements of multiple loads; the multiple load includes: charging piles, air conditioners and micro-grids;
according to a first environment for observing the multiple loads, performing distributed execution on first-stage reinforcement learning with discrete first scheduling models, and deciding whether to access the multiple loads into resource scheduling or not to obtain a first scheduling decision;
taking the first scheduling decision as at least one environment of second-stage reinforcement learning, and deciding the resource scheduling of the access load; the second-stage reinforcement learning takes the output second scheduling period as a third scheduling period of the next moment, so that the resource scheduling is decided after the environment is observed again in the third scheduling period;
and uploading experience data corresponding to the first scheduling model to an experience replay pool, selecting sampling data from the experience replay pool for centralized training, and respectively carrying out real-time distributed resource scheduling on the charging pile, the air conditioner and the micro-grid according to a trained second scheduling model.
2. The method for jointly scheduling multiple load resources according to claim 1, wherein the step of establishing an initial two-stage reinforcement learning first scheduling model for the obtained demand model according to the demand of multiple loads at the beginning of the first scheduling period comprises:
respectively observing the requirements of multiple loads at the beginning of the first scheduling period, establishing a corresponding requirement model, observing the current period, the weather and the position of the area where the corresponding load is located, the duration of the first scheduling period, the current electricity price and the corresponding load power according to the requirement model to obtain a first sub-environment of the corresponding load, and obtaining a second sub-environment of the corresponding load at the end of a third scheduling period at the last moment;
and establishing a first scheduling model of two-stage cascade reinforcement learning by taking the corresponding load as an initial agent according to the first load sub-environment and the second load sub-environment.
3. The method for jointly scheduling multiple load resources according to claim 2, wherein the establishing a first scheduling model of two-stage cascade reinforcement learning with the corresponding load as an initial agent according to the first load sub-environment and the second load sub-environment is specifically as follows:
Respectively taking a charging pile, an air conditioner and a micro-grid as initial intelligent agents, and correspondingly obtaining a first charging pile scheduling model, a first air conditioner scheduling model and a first micro-grid scheduling model of first-stage reinforcement learning and second-stage reinforcement learning cascading; the first charging pile scheduling model is identical to the first air conditioner scheduling model and the first micro-grid scheduling model in action space, and the first charging pile scheduling model and the first micro-grid scheduling model are identical in state space.
4. The multi-load resource joint scheduling method of claim 2, wherein the demand model comprises: a charging pile demand model, an air conditioner demand model and a micro-grid demand model; wherein the charging pile demand model may be expressed in terms of the power per unit time distributed to each charging pile in the first scheduling period, a first discrete variable controlling whether the corresponding charging pile is connected into the resource scheduling, the maximum power threshold per unit time of the charging pile, and the first power duty cycle of the charging pile;
the air conditioning demand model may be expressed in terms of the power per unit time of each air conditioner in the first scheduling period, a second discrete variable controlling whether the corresponding air conditioner is connected into the resource scheduling, the rated power per unit time of the air conditioner, the air-conditioning temperature of the virtual energy storage of each air conditioner in the first scheduling period, the temperature ratio of the air conditioner, and the maximum air-conditioning threshold;
the micro-grid demand model may be expressed in terms of a third discrete variable controlling the working mode decision of the micro-grid participating in the resource scheduling, the maximum power threshold of the energy storage per unit time, and the third power duty cycle of the energy storage.
5. The multi-load resource joint scheduling method of claim 1, wherein the first stage reinforcement learning of the first scheduling model discrete is performed in a distributed manner according to a first environment observed for the multi-load, and a decision is made as to whether to access the multi-load into resource scheduling to obtain a first scheduling decision; wherein the first environment comprises: the observed first environment of the charging pile, the first environment of the air conditioner and the first environment of the micro-grid are specifically:
taking the first environment of the charging pile, the first environment of the air conditioner and the first environment of the micro-grid respectively as the input of the first-stage reinforcement learning of the charging pile first scheduling model with the charging pile as an initial agent, the input of the first-stage reinforcement learning of the air conditioner first scheduling model with the air conditioner as an initial agent, and the input of the first-stage reinforcement learning of the micro-grid first scheduling model with the micro-grid as an initial agent, and correspondingly outputting a first discrete variable indicating whether the corresponding charging pile is accessed into resource scheduling, which is taken as a first sub-scheduling decision, a second discrete variable indicating whether the corresponding air conditioner is accessed into resource scheduling, which is taken as a second sub-scheduling decision, and a third discrete variable indicating the participation of the micro-grid in resource scheduling, which is taken as a working mode decision;
Wherein the first scheduling decision comprises: the first sub-scheduling decision, the second sub-scheduling decision, and the working mode decision.
6. The method for jointly scheduling multiple load resources according to claim 1, wherein the first scheduling decision is used as at least one environment of second-stage reinforcement learning to decide the resource scheduling of the access load; the second stage reinforcement learning takes the output second scheduling period as a third scheduling period of the next moment, so that the resource scheduling is decided after the environment is observed again in the third scheduling period, and the method comprises the following steps:
and taking the first scheduling decision as at least one environment of the second-stage reinforcement learning, taking the first scheduling decision and the first environment as the second environment of the second-stage reinforcement learning of the first scheduling model, and taking, according to the second environment, the second scheduling period decided in a distributed manner by the second-stage reinforcement learning of the first scheduling model as the third scheduling period at the next moment.
7. The multi-load resource joint scheduling method according to claim 6, wherein deciding, according to the second environment, the second scheduling period by the distributed second-stage reinforcement learning of the first scheduling model is specifically:
Taking the first sub-scheduling decision and the first charging pile environment as a second charging pile environment for second-stage reinforcement learning of a first scheduling model of the charging pile, and outputting a first sub-scheduling period of the charging pile participating in resource scheduling;
taking the second sub-scheduling decision and the air conditioner first environment as a second-stage reinforcement learning air conditioner second environment of the air conditioner first scheduling model, and outputting a second sub-scheduling period of the air conditioner participating in resource scheduling;
taking the third sub-scheduling decision and the first environment of the micro-grid as the micro-grid second environment of the second-stage reinforcement learning of the micro-grid first scheduling model, and outputting a third sub-scheduling period of the micro-grid participating in resource scheduling;
and taking the minimum value among the first sub-scheduling period, the second sub-scheduling period and the third sub-scheduling period as the second scheduling period.
8. The multi-load resource joint scheduling method of claim 1, wherein the performing real-time distributed resource scheduling on the charging pile, the air conditioner and the micro grid according to the trained second scheduling model includes:
respectively outputting a first real-time sub-scheduling decision of whether the charging pile is connected to resource allocation, a second real-time sub-scheduling decision of whether the air conditioner is connected to resource allocation and a third real-time working mode of the micro-grid participating in resource allocation according to a real-time observed charging pile third environment, an air conditioner third environment and a micro-grid third environment which are respectively used as inputs of the first-stage reinforcement learning of a trained charging pile second scheduling model, an air conditioner second scheduling model and a micro-grid second scheduling model;
And taking the first real-time sub-scheduling decision and the third environment of the charging pile as the input of the second-stage reinforcement learning of the second scheduling model of the charging pile, taking the second real-time sub-scheduling decision and the third environment of the air conditioner as the input of the second-stage reinforcement learning of the second scheduling model of the air conditioner, and taking the third real-time working mode and the third environment of the micro grid as the input of the second-stage reinforcement learning of the second scheduling model of the micro grid to respectively obtain the first real-time power distribution of the charging pile, the second real-time power distribution of the air conditioner and the real-time output power of the micro grid.
9. The multi-load resource joint scheduling method according to claim 1, wherein uploading the experience data corresponding to the first scheduling model to the experience replay pool and selecting sampling data from the experience replay pool for centralized training, the first scheduling model comprising a charging pile first scheduling model, an air conditioner first scheduling model and a micro-grid first scheduling model, is specifically:
the charging pile first scheduling model, the air conditioner first scheduling model and the micro-grid first scheduling model respectively upload corresponding first environments, first scheduling decisions, inputs of the second-stage reinforcement learning, allocated resource allocation results output by the second-stage reinforcement learning, and the second scheduling period into an experience replay pool;
and randomly selecting sampling data from the experience replay pool according to the step length to perform centralized training, and simultaneously updating the parameters obtained by training onto the first scheduling model of the charging pile, the first scheduling model of the air conditioner and the first scheduling model of the micro-grid.
10. A multiple load resource joint scheduling system, based on the multiple load resource joint scheduling method according to any one of claims 1 to 9, comprising: the system comprises a model building module, a first-stage resource allocation module, a second-stage resource allocation module and a real-time scheduling module; wherein,
the model building module is used for building an initial two-stage reinforcement learning first scheduling model for the obtained demand model according to the requirements of multiple loads when a first scheduling period starts; the multiple load includes: charging piles, air conditioners and micro-grids;
the first-stage resource allocation module is used for performing distributed execution on first-stage reinforcement learning with discrete first scheduling models according to a first environment for observing the multiple loads, and deciding whether the multiple loads are accessed into resource scheduling or not to obtain a first scheduling decision;
the second-stage resource allocation module is used for taking the first scheduling decision as at least one environment of second-stage reinforcement learning to decide the resource scheduling of the access load; the second-stage reinforcement learning takes the output second scheduling period as a third scheduling period of the next moment, so that the resource scheduling is decided after the environment is observed again in the third scheduling period;
And the real-time scheduling module is used for uploading experience data corresponding to the first scheduling model to an experience replay pool, selecting sampling data from the experience replay pool for centralized training, and respectively carrying out real-time distributed resource scheduling on the charging pile, the air conditioner and the micro-grid according to a trained second scheduling model.
CN202311616942.8A 2023-11-30 2023-11-30 Multi-load resource joint scheduling method and system Active CN117335439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311616942.8A CN117335439B (en) 2023-11-30 2023-11-30 Multi-load resource joint scheduling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311616942.8A CN117335439B (en) 2023-11-30 2023-11-30 Multi-load resource joint scheduling method and system

Publications (2)

Publication Number Publication Date
CN117335439A true CN117335439A (en) 2024-01-02
CN117335439B CN117335439B (en) 2024-02-27

Family

ID=89283382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311616942.8A Active CN117335439B (en) 2023-11-30 2023-11-30 Multi-load resource joint scheduling method and system

Country Status (1)

Country Link
CN (1) CN117335439B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012171147A1 (en) * 2011-06-17 2012-12-20 辽宁省电力有限公司 Coordination and control system for regulated charging and discharging of pure electric vehicle in combination with wind power generation
US20210356923A1 (en) * 2020-05-15 2021-11-18 Tsinghua University Power grid reactive voltage control method based on two-stage deep reinforcement learning
CN114091879A (en) * 2021-11-15 2022-02-25 浙江华云电力工程设计咨询有限公司 Multi-park energy scheduling method and system based on deep reinforcement learning
CN114362218A (en) * 2021-12-30 2022-04-15 中国电子科技南湖研究院 Deep Q learning-based multi-type energy storage scheduling method and device in microgrid
US20220126725A1 (en) * 2020-10-22 2022-04-28 Harbin Engineering University Method for scheduling multi agent and unmanned electric vehicle battery swap based on internet of vehicles
CN115360768A (en) * 2022-08-17 2022-11-18 广东电网有限责任公司 Power scheduling method and device based on muzero and deep reinforcement learning and storage medium
CN115358783A (en) * 2022-08-23 2022-11-18 湖南工业大学 Multi-electric vehicle and multi-micro-grid multi-party game energy trading system based on reinforcement learning and multiple constraints
CN115456287A (en) * 2022-09-21 2022-12-09 国电南瑞科技股份有限公司 Long-and-short-term memory network-based multi-element load prediction method for comprehensive energy system
CN116247648A (en) * 2022-12-12 2023-06-09 国网浙江省电力有限公司经济技术研究院 Deep reinforcement learning method for micro-grid energy scheduling under consideration of source load uncertainty
CN116542137A (en) * 2023-04-14 2023-08-04 贵州电网有限责任公司 Multi-agent reinforcement learning method for distributed resource cooperative scheduling
CN116780627A (en) * 2023-06-27 2023-09-19 中国电建集团华东勘测设计研究院有限公司 Micro-grid regulation and control method in building park

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FU X: "Etimating the failure probability in an integrated energy system considering correlations among failure pattems", 《ENERGY》, pages 656 - 666 *
莫静山 et al.: "A review of research on optimal dispatch of electricity-heat integrated energy systems based on demand response", 《工程科学与技术》 (Engineering Science and Technology), pages 1 - 16 *

Also Published As

Publication number Publication date
CN117335439B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
Vázquez-Canteli et al. Reinforcement learning for demand response: A review of algorithms and modeling techniques
CN109599856B (en) Electric vehicle charging and discharging management optimization method and device in micro-grid multi-building
JP2024503017A (en) Method and device for optimizing charging and energy supply in a charging management system
CN107579518B (en) Power system environment economic load dispatching method and apparatus based on MHBA
CN109636056A (en) A kind of multiple-energy-source microgrid decentralization Optimization Scheduling based on multi-agent Technology
CN113098007B (en) Distributed online micro-grid scheduling method and system based on layered reinforcement learning
CN104037761B (en) AGC power multi-objective random optimization distribution method
CN114256836B (en) Capacity optimization configuration method for shared energy storage of new energy power station
Zhang et al. Fast stackelberg equilibrium learning for real-time coordinated energy control of a multi-area integrated energy system
CN111799822A (en) Energy utilization coordination control method of comprehensive energy system based on virtual energy storage
CN111682536B (en) Random-robust optimization operation method for virtual power plant participating in dual market before day
CN106712111A (en) Multi-objective fuzzy optimization multi-energy economic dispatching method under active distribution network environment
Pinzon et al. An MILP model for optimal management of energy consumption and comfort in smart buildings
CN111833205A (en) Mobile charging pile group intelligent scheduling method in big data scene
CN113110056B (en) Heat supply intelligent decision-making method and intelligent decision-making machine based on artificial intelligence
CN117335439B (en) Multi-load resource joint scheduling method and system
CN110992206B (en) Optimal scheduling method and system for multi-source electric field
CN117134380A (en) Hierarchical optimization operation method and system based on Yun Bian collaborative distributed energy storage
CN116799838A (en) On-line energy storage charge and discharge control method and medium for demand prediction and electronic equipment
CN109038672A (en) A kind of Multi-objective Robust Optimal Configuration Method for stabilizing renewable energy fluctuation
CN112600256B (en) Micro-grid power control method
CN114239930A (en) Demand response participation degree model construction method for smart power grid scene
CN113988440A (en) Secondary frequency modulation method for regional power distribution network based on virtual power plant
CN114221341A (en) Bidirectional interaction power demand response method and system based on all-Internet-of-things link
Fang et al. Energy scheduling and decision learning of combined cooling, heating and power microgrid based on deep deterministic policy gradient

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant