CN112488531B - Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning - Google Patents
- Publication number: CN112488531B (application number CN202011389959.0A)
- Authority: CN (China)
- Prior art keywords: load, time, real, regulation, function
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06Q10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06313: Resource planning in a project environment
- G06Q10/0637: Strategic management or analysis, e.g. setting a goal or target of an organisation
- G06Q50/06: Energy or water supply
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks
- G06N3/08: Neural networks; learning methods
- H02J3/00: Circuit arrangements for AC mains or AC distribution networks
- H02J3/14: Adjusting voltage in AC networks by switching loads on to, or off from, the network, e.g. progressively balanced loading
- H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer-assisted design [CAD]
- Y02B70/3225: Demand response systems, e.g. load shedding, peak shaving
- Y04S20/222: Demand response systems, e.g. load shedding, peak shaving
Abstract
The application discloses a heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning, which solve the technical problems that, under existing load regulation and control approaches, the response capability of user-side flexible loads is low and the demand-response potential of the user side is difficult to tap.
Description
Technical Field
The application relates to the technical field of load regulation and control of power systems, in particular to a heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning.
Background
As large numbers of diverse flexible loads on the demand side are connected to the grid and participate in its regulation, the heterogeneous characteristics of flexible loads have become increasingly prominent, and handling this heterogeneity has become a key problem for practical regulation and application. Heterogeneous loads exhibit two modes of heterogeneity: type and parameter. In general, loads of different types are type-heterogeneous, while loads of the same type but with different inherent parameters are parameter-heterogeneous. Modeling heterogeneous flexible loads is the foundation of flexible load regulation.
Conventional load regulation and control models heterogeneous loads with fixed physical parameters and then clusters them, by parameter similarity, into homogeneous or equivalent groups for target optimization and unified scheduling, but it is difficult to avoid the complexity of the heterogeneous equipment's physical parameters. For example, for a temperature-controlled load, the conventional method establishes a first-order thermodynamic model based mainly on the load's dynamic temperature characteristics and periodic operating mode. However, because load types are numerous, parameters differ sharply, and regulation depends on many kinds of sensing and interaction information, the response capability of user-side flexible loads is reduced and the demand-response potential of the user side is difficult to tap.
Disclosure of Invention
The application provides a heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning, which are used to solve the technical problems that, under existing load regulation and control approaches, the response capability of user-side flexible loads is low and the demand-response potential of the user side is difficult to tap.
In view of this, the first aspect of the present application provides a method for real-time regulation and control of heterogeneous flexible loads based on deep reinforcement learning, including:
respectively establishing a single flexible load dynamic model for each type of heterogeneous flexible load of the power system, to obtain the state variable, action variable, environment variable and reward function of each single flexible load;
establishing a heterogeneous flexible load aggregation model according to the state variables, action variables, environment variables and reward functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variable, state space, action variable, action space and state-transition function of the aggregated load;
applying the aggregation model to the real-time regulation and control environment of the power system to obtain the reward function of the aggregated load participating in real-time response;
establishing an aggregated-load real-time regulation and control deep reinforcement learning model, and training it according to the state variable, action variable, state-transition function and real-time-response reward function of the aggregated load, to obtain a real-time optimization regulation and control decision model for flexible load aggregation;
and inputting the state variable of the target aggregated load into the real-time optimization regulation and control decision model for flexible load aggregation, to obtain the optimal strategy for real-time regulation and control of the aggregated load.
Optionally, the single flexible load dynamics model includes a load temperature control dynamics function, a user discomfort function, and a reward function.
Optionally, the heterogeneous flexible load aggregation model is:

s(t+1) = F_transition(s(t), a(t), w(t))

where s(t+1) is the state variable of the aggregated load at time t+1, s(t) is the state variable of the aggregated load at time t, a(t) is the action variable of the aggregated load at time t, and w(t) is the environment variable at time t. In the associated reward function, R_agg(t) is the reward of the aggregated load at time t, r_DR(t) is the total revenue from the aggregated load's participation in demand response at time t, the summed discomfort terms give the total user discomfort, and lambda(t)(P_agg(t) - P_base(t))*dt is the electricity-fee expenditure measured against the baseline load.
Optionally, the aggregated-load real-time regulation and control deep reinforcement learning model is trained with a deep Q-network (DQN) algorithm.
Optionally, the loss function of the deep reinforcement learning model is:

L(theta) = (1/m) * sum_{j=1}^{m} [ y_j - Q(s_j, a_j; theta) ]^2

where y_j is the target value of the Q-network function, m is the number of samples, theta is the weight coefficient of the Q-network function, s_j is the state variable of the j-th sample, and a_j is the action variable of the j-th sample.
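As a minimal numeric sketch of this mean-squared Q-learning loss (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def dqn_loss(q_values, actions, targets):
    """Mean-squared error between target values y_j and Q(s_j, a_j; theta).

    q_values: (m, n_actions) array of predicted Q-values for the m sampled states
    actions:  (m,) integer array of sampled actions a_j
    targets:  (m,) array of target values y_j
    """
    m = len(targets)
    q_taken = q_values[np.arange(m), actions]   # Q(s_j, a_j; theta) per sample
    return float(np.mean((targets - q_taken) ** 2))

# Two-sample example: per-sample errors are 0.5 and 0.0, so the loss is 0.125
q = np.array([[1.0, 2.0], [0.5, 0.0]])
loss = dqn_loss(q, np.array([1, 0]), np.array([2.5, 0.5]))
```

The same quantity is what the gradient-descent step of the training procedure below minimizes with respect to theta.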
Optionally, the training of the aggregated load real-time regulation and control deep reinforcement learning model to obtain a real-time optimization regulation and control decision model for flexible load aggregation includes:
initializing the predicted Q-network function and the target Q-network function, and setting the number of iteration rounds EP, the learning rate alpha, the exploration rate epsilon, and the maximum size M of the experience replay pool;
collecting training samples and storing them in the experience replay pool;
randomly sampling a batch of n samples from the experience replay pool and calculating the loss function of the Q-network function;
updating the weight coefficient theta of the Q-network function by gradient descent;
continuously generating new samples, replacing the oldest samples in the experience replay pool with them, and recalculating the loss function and the weight coefficient theta of the Q-network function;
updating the weight coefficient theta' of the target Q-network function;
checking whether the state variable s of the aggregated load has reached a terminal state; if so, emptying the experience replay pool, sampling again, and placing the new samples into the pool;
and judging whether the number of iteration rounds has reached EP; if so, ending the training, otherwise continuing the iteration.
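As a compact illustration of the training procedure above, the following sketch wires the same steps together. The toy environment, the linear Q-function, and every hyperparameter value are stand-ins assumed for illustration only; the patent's model uses a neural Q-network over the aggregated-load state:

```python
import random
import numpy as np

random.seed(0); np.random.seed(0)

# Toy stand-ins for the aggregated-load environment (illustrative only).
N_STATE, N_ACTION = 4, 2

def env_reset():
    return np.random.rand(N_STATE)

def env_step(s, a):
    s2 = np.roll(s, 1)                       # dummy state transition
    r = float(a) - s[0]                      # dummy reward
    return s2, r, random.random() < 0.05     # small chance of a terminal state

# Linear Q-function Q(s, a; theta) = theta[a] . s (a neural net in the patent)
theta = np.zeros((N_ACTION, N_STATE))        # predicted Q-network weights
theta_target = theta.copy()                  # target Q-network weights theta'
EP, alpha, eps, M, n, gamma = 20, 0.01, 0.1, 500, 16, 0.95
pool = []                                    # experience replay pool

s = env_reset()
for ep in range(EP):
    for _ in range(100):
        # epsilon-greedy exploration
        a = random.randrange(N_ACTION) if random.random() < eps \
            else int(np.argmax(theta @ s))
        s2, r, done = env_step(s, a)
        pool.append((s, a, r, s2, done))
        if len(pool) > M:
            pool.pop(0)                      # replace the oldest sample
        if len(pool) >= n:
            # random mini-batch, then one gradient step on the MSE loss
            for sj, aj, rj, sj2, dj in random.sample(pool, n):
                yj = rj if dj else rj + gamma * np.max(theta_target @ sj2)
                theta[aj] += alpha * (yj - theta[aj] @ sj) * sj
        s = env_reset() if done else s2
    theta_target = theta.copy()              # periodic target-network update
```

The periodic copy into `theta_target` mirrors the separate update of theta', which stabilizes the bootstrapped target values y_j.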
Optionally, the method further comprises:
and testing the real-time optimization regulation and control decision model of flexible load aggregation.
The second aspect of the present application provides a heterogeneous flexible load real-time regulation and control device based on deep reinforcement learning, comprising:
the single flexible load modeling module, which is used for respectively establishing a single flexible load dynamic model for each type of heterogeneous flexible load of the power system, to obtain the state variable, action variable, environment variable and reward function of each single flexible load;
the aggregated load modeling module, which is used for establishing a heterogeneous flexible load aggregation model according to the state variables, action variables, environment variables and reward functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variable, state space, action variable, action space and state-transition function of the aggregated load;
the application module, which is used for applying the aggregation model to the real-time regulation and control environment of the power system to obtain the reward function of the aggregated load participating in real-time response;
the deep reinforcement learning module, which is used for establishing an aggregated-load real-time regulation and control deep reinforcement learning model and training it according to the state variable, action variable, state-transition function and real-time-response reward function of the aggregated load, to obtain a real-time optimization regulation and control decision model for flexible load aggregation;
and the strategy output module, which is used for inputting the state variable of the target aggregated load into the real-time optimization regulation and control decision model for flexible load aggregation, to obtain the optimal strategy for real-time regulation and control of the aggregated load.
Optionally, the single flexible load dynamics model includes a load temperature control dynamics function, a user discomfort function, and a reward function.
Optionally, the method further comprises:
and the model testing module is used for testing the real-time optimization regulation and control decision model of flexible load aggregation.
According to the technical scheme, the embodiment of the application has the following advantages:
the application provides a heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning, which comprises the following steps: respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load; according to the state variables, the action state variables, the environment variables and the return functions of all the single flexible loads, a heterogeneous flexible load aggregation model is established, and comprises the state variables, the state spaces, the action variables, the action spaces and the state transfer functions of aggregated loads; applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response; establishing a polymerization load real-time regulation and control deep reinforcement learning model, and training the polymerization load real-time regulation and control deep reinforcement learning model according to a state variable, an action variable, a state transfer function and a return function participating in real-time response of the polymerization load; and inputting the state variable of the target aggregated load into a real-time optimization regulation and control decision model for flexible load aggregation to obtain an optimal strategy for real-time regulation and control of the aggregated load.
According to the heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning, firstly, single flexible load models are respectively established for heterogeneous flexible loads of different types, then a polymerization load model is established for a plurality of flexible loads with different parameters and different structures, so that a Markov decision process when the heterogeneous flexible loads participate in demand response is obtained, a decision function of a polymer is trained through a machine learning framework of the deep reinforcement learning based on historical data, a real-time optimization regulation and control decision model of the heterogeneous flexible load polymer is obtained, an optimal strategy of real-time regulation and control of the polymerization load is obtained, and the flexible load response capability of a user side is improved. The technical problems that the response capability of the flexible load on the user side is low and the demand response potential on the user side is difficult to excite in the existing load regulation and control mode are solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 shows the neural-network form of the Q-network function;
fig. 2 is a block diagram of a flow structure of a heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning provided in an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application provides an embodiment of a heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning, which comprises the following steps:
step 101, respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load.
It should be noted that a single flexible load dynamic model is first established for each type of heterogeneous flexible load, yielding the value ranges and variation trends of the single load's state variables, action variables, environment variables, reward functions, and related quantities. For ease of understanding, the embodiment of the present application is illustrated with two heterogeneous flexible loads, an electric heating load and an electric hot water load; other single flexible loads can achieve the same effects as this embodiment by making the corresponding parameter changes.
For electric heating load:
the electric heating load is typically a temperature-controlled load, the purpose of which is to maintain the room temperature within a certain comfort range. Electric heating simulation by adopting equivalent model similar to first-order circuitSetting the rated power of the electric heating equipment i as P in the running process of the load i rate The equivalent thermal resistance of the room isEquivalent heat capacity ofIndoor temperature at time T is T i (T) outdoor temperature T ex (t), the dynamic equation of the indoor temperature can be expressed as:
in the formula, K i (t) is an action variable for controlling the on-off of the electric heating equipment i, and K i (t)∈{0,1}。
When the time granularity is Δ t, the formulat∈[0,T]Approximation translates into a state-transfer equation at discrete time:
let the comfortable temperature range of the user be [ T i L ,T i U ]The temperature of the electric heating equipment can be changed within the range ofThe user's discomfort function may be defined as:
wherein, the first and the second end of the pipe are connected with each other,andand respectively, discomfort penalty factors of the indoor temperature exceeding the upper limit and the lower limit.
If the electricity price at time t is lambda(t), the user's electricity-fee expenditure is:

f_i^elec(t) = lambda(t) * P_i^rate * K_i(t) * dt

Since the objective is to maximize the reward, the reward function of electric heating device i can be expressed as:

r_i(t) = -f_i^unc(t) - f_i^elec(t)
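These three pieces (thermal dynamics, discomfort, and reward) can be sketched in a few lines of code. All numeric parameter values below (R, C, P_rate, the comfort band, the penalty factors) are hypothetical placeholders, not values from the patent:

```python
def heater_step(T_in, T_out, K, P_rate=2.0, R=2.0, C=1.5, dt=0.25):
    """One discrete step of the first-order thermal model:
    T(t+dt) = T(t) + dt * ((T_out - T_in)/(R*C) + P_rate*K/C)."""
    return T_in + dt * ((T_out - T_in) / (R * C) + P_rate * K / C)

def discomfort(T, T_low=20.0, T_high=24.0, alpha_low=1.0, alpha_high=1.0):
    """Penalty f_unc for leaving the comfort band [T_low, T_high]."""
    return alpha_low * max(T_low - T, 0.0) + alpha_high * max(T - T_high, 0.0)

def reward(T, K, price, P_rate=2.0, dt=0.25):
    """r_i(t) = -f_i^unc - f_i^elec, with f_i^elec = price * P_rate * K * dt."""
    return -discomfort(T) - price * P_rate * K * dt
```

With these helpers, one simulated day is simply a loop over `heater_step` calls driven by an outdoor-temperature trace, a price signal, and an on/off policy K.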
for electric water heating loads:
the electric hot water load can maintain the domestic water stored in the water tank within a comfortable range. In addition to dissipating heat in the environment, hot water in the water tank may also dissipate heat by flowing in cold water due to domestic hot water flowing out. Thus, the water temperature dynamic equation in tank i can be defined as:
wherein, the first and the second end of the pipe are connected with each other,represents the equivalent thermal resistance of the water tank i,represents the equivalent heat capacity, T, of the tank i i (t) Water temperature in tank i, P at time t i rate Indicating rated power, Q, of electric water heater of water tank i i (t) represents the amount of heat taken away by domestic water at time t, Q i (t) is related to the water usage habits of the user, K i (t) is a control variable for controlling the on-off of the electric hot water tank i, and K i (t)∈{0,1}。
General formulat∈[0,T]Converting into a discrete form to obtain a state transition equation as follows:
Analogously, the comfortable temperature range of the electric hot water load can be defined as [T_i^L, T_i^U] and its controllable temperature range as [T_i^min, T_i^max]; the user's discomfort function, electricity cost, and reward function are obtained in the same way as above and are not repeated here.
Step 102, establishing a heterogeneous flexible load aggregation model according to the state variables, action variables, environment variables and reward functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variable, state space, action variable, action space and state-transition function of the aggregated load.
The flexible loads, heterogeneous in both type and parameters, are aggregated to obtain an aggregation model of the heterogeneous flexible loads; the model comprises the state variable, state space, action variable, action space and state-transition function of the aggregated load. For ease of explanation, the aggregation model is established for N electric heating loads and L electric hot water loads with heterogeneous parameters; the electric heating loads are indexed 1 to N and the electric hot water loads N+1 to N+L.
Let the state variable of the aggregated load be:

s(t) = [T_1(t), T_2(t), ..., T_{N+L}(t)]

and the action variable be:

a(t) = [K_1(t), K_2(t), ..., K_{N+L}(t)]

The state space of the aggregated load is:

S = prod_{i=1}^{N+L} [T_i^min, T_i^max]

where T_i^min and T_i^max are the lower and upper limits of the temperature control range of load i (for the electric hot water loads, of the water-temperature control range).

The action space is:

A = {0, 1}^{N+L}
since each load in the aggregate load has heterogeneous parameters, a single dynamic equation cannot be directly established. The state transition equations of the respective loads can be simultaneously established to obtain:
the above state transition equation can be simplified as:
s(t+1) = F_transition(s(t), a(t), w(t))
where w (t) represents an environmental variable.
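As a concrete illustration of the aggregated transition function above, the per-load dynamics can be stacked into one vectorized update. This is a minimal sketch only: the first-order equivalent-thermal-parameter model and the parameter names `R`, `C`, `P_rate` are assumptions for illustration, not the patent's exact load model.

```python
import numpy as np

def f_transition(s, a, w, R, C, P_rate, dt=0.25):
    """One step of the aggregated transition s(t+1) = F_transition(s(t), a(t), w(t)).

    Each load follows an assumed first-order thermal model with its own
    heterogeneous parameters R (resistance), C (capacitance), P_rate (power);
    w is the ambient-temperature environment variable.
    """
    s = np.asarray(s, dtype=float)
    a = np.asarray(a, dtype=float)  # on/off actions in {0, 1}
    # Discretized dynamics: dT/dt = (w - T)/(R*C) + a*P_rate/C
    return s + dt * ((w - s) / (R * C) + a * P_rate / C)

# Example: N + L = 3 loads with heterogeneous (illustrative) parameters
R = np.array([2.0, 2.5, 3.0])
C = np.array([1.5, 2.0, 1.0])
P = np.array([4.0, 3.0, 5.0])
s0 = np.array([20.0, 21.0, 55.0])   # current temperatures
s1 = f_transition(s0, np.array([1, 0, 1]), w=10.0, R=R, C=C, P_rate=P)
```

Switching a load on always raises its next temperature relative to leaving it off, which is the monotonicity the regulation actions rely on.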
Let the demand response regulation power to be executed by the aggregated load at time t be ΔP_DR(t), and the baseline load of the aggregated load at time t be P_base(t). The aggregated power P_agg(t) at time t is then:
P_agg(t) = P_base(t) + ΔP_DR(t)
The aggregated load earns revenue in the real-time market by responding to system regulation instructions. Let the unit revenue of demand response at time t be μ(t). The full revenue is obtained only when the user's actual response power lies within an error band [1−ε, 1+ε] of the requested regulation power, where ε is typically 20%; outside this band no revenue is obtained. The total revenue of the aggregated load participating in demand response is therefore:
r_DR(t) = μ(t)·|ΔP_DR(t)|·Δt, if (P_agg(t) − P_base(t)) / ΔP_DR(t) ∈ [1−ε, 1+ε]; otherwise r_DR(t) = 0
Thus, the reward function of the aggregated load at time t can be expressed as:
R_agg(t) = r_DR(t) − Σ_{i=1}^{N+L} f_i^unc(t) − λ(t)·(P_agg(t) − P_base(t))·Δt
where R_agg(t) is the reward function of the aggregated load at time t, r_DR(t) is the total revenue of the aggregated load participating in demand response at time t, Σ f_i^unc(t) is the total user discomfort, and −λ(t)·(P_agg(t) − P_base(t))·Δt is the reduction in electricity cost.
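The three reward terms can be sketched in a few lines. This is an illustrative implementation under the assumptions stated in the text (the [1−ε, 1+ε] revenue band taken as the ratio of actual to requested response); the parameter names are hypothetical.

```python
import numpy as np

def reward_agg(p_agg, p_base, dp_dr, mu, lam, discomfort, dt=0.25, eps=0.2):
    """R_agg(t) = r_DR(t) - total discomfort - lambda(t)*(P_agg - P_base)*dt.

    r_DR pays mu(t) per unit of requested regulation power, but only when the
    actual response falls within [1-eps, 1+eps] of the requested amount.
    """
    response_ratio = (p_agg - p_base) / dp_dr
    r_dr = mu * abs(dp_dr) * dt if (1 - eps) <= response_ratio <= (1 + eps) else 0.0
    # Negative (P_agg - P_base) during curtailment makes the cost term a gain.
    return r_dr - float(np.sum(discomfort)) - lam * (p_agg - p_base) * dt
```

For a requested curtailment of 5 units met exactly (ratio 1), both the demand-response revenue and the electricity-cost reduction are earned; with no response at all, the reward collapses to zero (absent discomfort).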
The purpose of real-time optimal regulation of the aggregated load at time t_0 is to find the optimal policy at t_0 that maximizes the user's cumulative expected reward from t_0 to T. To account for the uncertainty of future periods, future rewards are multiplied by a discount factor γ. Assuming the initial state variable s_0 is known, the real-time regulation optimization problem can be expressed as:
max E_{s(t),a(t)} [ Σ_{t=t_0}^{T} γ^{t−t_0} R_agg(t) ]
s.t. s(t_0) = s_0, s(t) ∈ S, a(t) ∈ A
where E_{s(t),a(t)} denotes the expectation over the feasible state space S and action space A. This real-time regulation optimization problem is a mixed-integer nonlinear program whose optimization variable is a(t_0).
Step 103, applying the aggregation model to the real-time regulation environment of the power system to obtain the reward function of the aggregated load participating in the real-time response.
It should be noted that after the aggregation model is obtained, it needs to be placed in the real-time regulation environment of the power system, where it interacts with the environment and continuously evolves to obtain the reward function of the aggregated load participating in the real-time response.
Step 104, establishing a deep reinforcement learning model for real-time regulation of the aggregated load, and training it according to the state variable, action variable, state transition function, and reward function of the aggregated load participating in the real-time response, to obtain a real-time optimal regulation decision model for flexible load aggregation.
It should be noted that, as can be seen from the real-time regulation optimization problem, the dimension of the constraints reaches (N+L)·(T−t_0). When the number of aggregated loads is large, solving the optimization directly is too complex to meet the real-time requirement of real-time optimal regulation. Therefore, an approximation of a(t_0) is obtained by the following deep reinforcement learning method. From steps 102 to 103, the process of the aggregated load participating in real-time regulation is a Markov decision process: the decision at time t_0 and the subsequent states depend only on the current state s(t_0) and not on historical information. The quadruple <s, a, r, s'> of the Markov decision process is thus obtained, namely:
the state variable s ∈ S;
the action variable a ∈ A;
the state transition function s' = F_transition(s, a);
the reward function r = R_agg(s, a).
Let π be the policy of the aggregated load, i.e., the probability of taking action variable a in state variable s in the Markov decision process, written π(a|s):
π(a|s)=Pr[a(t)=a|s(t)=s]
Thus, the goal of deep reinforcement learning is to find the optimal policy π* that maximizes the expected value of the cumulative reward function.
Define the Q network function of the aggregated load and represent it by a neural network; define the loss function for learning the Q network function; and initialize the prediction Q network and the target Q network. Initialize the state of the aggregated load and collect samples into an experience replay pool. Train the prediction Q network offline using mini-batches sampled from the replay pool and the values of the target Q network, updating the prediction-network parameters by gradient descent; repeat this step, periodically updating the target-network parameters, until the iteration count reaches its maximum.
And 105, inputting the state variable of the target aggregated load into a real-time optimization regulation and control decision model for flexible load aggregation to obtain an optimal strategy for real-time regulation and control of the aggregated load.
It should be noted that after the real-time optimal regulation decision model for flexible load aggregation is obtained, the state variable s of the target aggregated load is input into the model to obtain the optimal policy for real-time regulation of the aggregated load, expressed as:
a* = argmax_{a∈A} Q(s, a | θ)
where Q(s, a | θ) is the Q network function.
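Extracting the decision from a trained Q network is a plain argmax over candidate actions. The sketch below assumes a callable `q_network(s, a)`; the toy Q function is invented purely for the example.

```python
import numpy as np

def optimal_action(q_network, s, actions):
    """Greedy policy extraction: a* = argmax_a Q(s, a | theta)."""
    q_values = [q_network(s, a) for a in actions]
    return actions[int(np.argmax(q_values))]

# Toy Q function peaking at a = 2 (illustration only, not a trained network)
toy_q = lambda s, a: -(a - 2) ** 2
best = optimal_action(toy_q, s=0.0, actions=[0, 1, 2, 3])
```

For the aggregated load, `actions` would range over the feasible subset of {0,1}^{N+L}; in practice the action space is restricted or factorized to keep this enumeration tractable.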
According to the heterogeneous flexible load real-time regulation method based on deep reinforcement learning, single flexible load models are first established for the different types of heterogeneous flexible loads, and an aggregated load model is then established for multiple heterogeneous flexible loads with different parameters. This yields the Markov decision process of the heterogeneous flexible loads participating in demand response. Based on historical data, the decision function of the aggregate is trained within a deep reinforcement learning framework, producing a real-time optimal regulation decision model for the heterogeneous flexible load aggregate and the optimal policy for real-time regulation of the aggregated load, thereby improving the flexible load response capability on the user side. This solves the technical problems that, under existing load regulation methods, the response capability of user-side flexible loads is low and the demand response potential on the user side is difficult to tap.
Specifically, the deep reinforcement learning model for real-time regulation of the aggregated load is trained with a deep Q-network (DQN) algorithm. The algorithm introduces two functions to search for the optimal policy: a value function and a Q network function. The value function is the expected cumulative reward obtained by the individual following policy π from state s:
V_π(s) = E_π[ Σ_{k=0}^{T−t} γ^k R_agg(t+k) | s(t) = s ]
The Q network function is the expected cumulative reward obtained by selecting action variable a in state variable s and following policy π thereafter:
Q_π(s, a) = E_π[ Σ_{k=0}^{T−t} γ^k R_agg(t+k) | s(t) = s, a(t) = a ]
Under the optimal policy π*, for any other policy π and any state variable s, the individual's value function satisfies V_{π*}(s) ≥ V_π(s).
Based on the Bellman optimality equation, under the optimal policy π* the relationship between the value function and the Q network function is:
V_{π*}(s) = max_{a∈A} Q_{π*}(s, a)
That is, the Q network function can be decomposed into two parts: the reward in the current state plus the value of the next state multiplied by the discount factor:
Q_{π*}(s, a) = R_agg(s, a) + γ·V_{π*}(s')
The value function under the optimal policy therefore satisfies:
V_{π*}(s) = max_{a∈A} [ R_agg(s, a) + γ·V_{π*}(s') ]
Substituting V_{π*}(s') = max_{a'} Q_{π*}(s', a') into the above relationship shows that the Q network function satisfies
Q_{π*}(s, a) = R_agg(s, a) + γ·max_{a'} Q_{π*}(s', a')
which is applied to the training of the neural network: the left-hand side is regarded as the predicted value Q of the Q network function, and the right-hand side as the target value Q'.
The Q network function is then parameterized by a neural network. The mapping from the input (s, a) to Q is represented by a typical fully connected neural network, as shown in fig. 1, where the inputs are s and a, the output is Q, and the weight coefficients are denoted θ. The purpose of deep reinforcement learning is to train the weight coefficients θ so that the predicted value Q approaches the target value Q' as closely as possible.
If the Q network functions on both sides of this equation were trained with the same parameters, the two would be too strongly coupled, which hinders convergence of the algorithm. Therefore, the two sides are represented by two separate neural networks Q and Q', called the prediction Q network and the target Q network; the two networks have identical structures, with weight coefficients θ and θ', respectively.
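The prediction/target pair described above can be sketched with a tiny fully connected network in plain NumPy. The layer sizes, initialization, and tanh activation are illustrative assumptions, not the patent's architecture; the point is that θ' starts as (and is periodically reset to) a copy of θ.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_theta(n_in, n_hidden):
    """Weights theta of a small fully connected network mapping (s, a) -> Q."""
    return {"W1": rng.normal(0, 0.1, (n_hidden, n_in)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, 0.1, (1, n_hidden)),
            "b2": np.zeros(1)}

def q_forward(theta, s, a):
    """Q(s, a | theta): forward pass through one hidden tanh layer."""
    x = np.concatenate([np.atleast_1d(s), np.atleast_1d(a)]).astype(float)
    h = np.tanh(theta["W1"] @ x + theta["b1"])
    return (theta["W2"] @ h + theta["b2"]).item()

# Prediction network Q and target network Q' share one architecture;
# the target weights theta' are a periodic copy of theta (theta' <- theta).
theta = init_theta(n_in=3, n_hidden=8)
theta_target = {k: v.copy() for k, v in theta.items()}
```

Immediately after a copy, the two networks agree on every (s, a); they then drift apart as θ is trained between synchronizations.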
Deep reinforcement learning trains the neural network parameters on the available data so that the network's output approaches the target value as closely as possible. Let the current data consist of m samples (s_j, a_j, s'_j, r_j), j = 1, 2, …, m. The mean-square-error loss function of the neural network can then be expressed as:
Loss(θ) = (1/m) Σ_{j=1}^{m} (y_j − Q(s_j, a_j | θ))²
where y_j is the target value of the Q network function:
y_j = r_j + γ·max_{a'} Q'(s'_j, a' | θ')
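The target values y_j and the mean-square-error loss translate directly into code. A minimal sketch, assuming `next_q_max` already holds max_{a'} Q'(s'_j, a' | θ') for each sample; the optional `done` mask (terminal states keep only the immediate reward) is a common DQN refinement, not stated in the text.

```python
import numpy as np

def td_targets(rewards, next_q_max, gamma=0.99, done=None):
    """Target values y_j = r_j + gamma * max_a' Q'(s'_j, a' | theta')."""
    rewards = np.asarray(rewards, dtype=float)
    next_q_max = np.asarray(next_q_max, dtype=float)
    y = rewards + gamma * next_q_max
    if done is not None:  # terminal samples keep only the immediate reward
        y = np.where(done, rewards, y)
    return y

def mse_loss(q_pred, y):
    """Loss(theta) = (1/m) * sum_j (y_j - Q(s_j, a_j | theta))^2."""
    q_pred = np.asarray(q_pred, dtype=float)
    return float(np.mean((y - q_pred) ** 2))
```

Note that y_j is computed with the target-network weights θ' and is treated as a constant during the gradient step on θ.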
as shown in fig. 2, the process of training the aggregation load real-time regulation deep reinforcement learning model is as follows:
(1) the neural network functions Q and Q' are initialized. The iteration round number is set to be EP, the learning rate is alpha, the exploration rate is epsilon, and the maximum size of the experience playback pool is M. Iterative training then begins.
(2) And sampling to obtain an experience playback pool.
Data samples for training the neural network are obtained through offline sampling, and the collected samples are stored in the experience replay pool. First, the state variable of the aggregated load is randomly initialized to obtain s = s_1. An action a = a_1 is then chosen with the greedy strategy (ε-greedy).
The ε-greedy strategy is:
a = a random action in A, if δ < ε; a = argmax_{a∈A} Q(s, a | θ), otherwise
where the exploration rate ε is a constant between 0 and 1, and δ is a random sample drawn uniformly from [0, 1]. This strategy explores as much of the action space as possible and avoids getting stuck in a local optimum.
Then s_1 and a_1 are substituted into the transition function and the reward function to obtain the next state s'_1 and the reward r_1, giving the sample quadruple (s_1, a_1, s'_1, r_1). Let s_2 = s'_1. The above steps are repeated to obtain (s_2, a_2, s'_2, r_2), …, (s_M, a_M, s'_M, r_M), where M is the maximum number of samples in the experience replay pool. The initial state variable is reset each time t reaches its maximum value.
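A fixed-capacity pool of (s, a, s', r) quadruples with automatic eviction of the oldest samples captures the replay-pool behavior described above. This is a generic sketch; the class name and interface are illustrative.

```python
from collections import deque
import random

class ReplayPool:
    """Fixed-capacity experience replay pool of (s, a, s', r) quadruples."""

    def __init__(self, capacity):
        self.pool = deque(maxlen=capacity)  # oldest samples evicted automatically

    def add(self, s, a, s_next, r):
        self.pool.append((s, a, s_next, r))

    def sample(self, n):
        """Random mini-batch of n quadruples for step (3)."""
        return random.sample(list(self.pool), n)

    def __len__(self):
        return len(self.pool)
```

Random sampling from the pool breaks the temporal correlation between consecutive transitions, which is what makes the mini-batch gradient steps well-behaved.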
(3) Randomly extract a batch of n samples from the experience replay pool and compute the corresponding loss function Loss(θ).
(4) Update the parameter θ of the Q network function by gradient descent:
θ ← θ − α·∇_θ Loss(θ)
(5) Continue to generate new samples (s_j, a_j, s'_j, r_j), j = M+1, M+2, …, and replace the oldest samples in the experience replay pool with them. Repeat steps (3) and (4) each time n new data samples are added.
(6) The parameter θ' of the target Q network function Q' is updated by copying the prediction-network parameters, namely:
θ′←θ
(7) checking whether the state s is a final state, and if so, emptying the experience playback pool and jumping to the step (2) to restart.
(8) Repeat steps (2) to (7) until the number of iteration rounds reaches EP.
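The steps above can be tied together in a compact training-loop skeleton. This is a structural sketch only: the environment, Q-update, and target-sync are passed in as callables, and the action-selection line is a placeholder standing in for the ε-greedy rule of step (2); none of the names are from the patent.

```python
import random

def train_dqn(env_step, init_state, q_update, sync_target,
              episodes=3, horizon=5, pool_capacity=32, batch=4, epsilon=0.2):
    """Skeleton of steps (1)-(8): fill the replay pool, update the prediction
    network on random mini-batches, and periodically sync the target network."""
    pool = []
    for ep in range(episodes):
        s = init_state()                                  # step (2): reset state
        for t in range(horizon):
            # Placeholder policy standing in for epsilon-greedy over Q(s, a | theta)
            a = random.randrange(2) if random.random() < epsilon else 0
            s_next, r = env_step(s, a)
            pool.append((s, a, s_next, r))
            if len(pool) > pool_capacity:                 # evict oldest sample
                pool.pop(0)
            if len(pool) >= batch:
                q_update(random.sample(pool, batch))      # steps (3)-(4)
            s = s_next
        sync_target()                                     # step (6): theta' <- theta
    return pool
```

In a real run, `q_update` would compute Loss(θ) against target values from θ' and take a gradient step, and `sync_target` would copy θ into θ' every fixed number of updates rather than once per episode.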
After neural network training is completed, test set data may be generated to verify the validity of the strategy.
In any state, given the state s of the aggregated load, the optimal decision for real-time regulation is obtained as a* = argmax_{a∈A} Q(s, a | θ).
The optimized regulation of the aggregated load is then tested and the results recorded.
The application also provides an embodiment of the heterogeneous flexible load real-time regulation and control device based on deep reinforcement learning, which comprises the following steps:
the single flexible load modeling module is used for respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load;
the aggregated load modeling module is used for establishing a heterogeneous flexible load aggregation model according to the state variables, action variables, environment variables, and reward functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variable, state space, action variable, action space, and state transfer function of the aggregated load;
the application module is used for applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response;
the deep reinforcement learning module is used for establishing a deep reinforcement learning model for real-time regulation of the aggregated load, and training it according to the state variable, action variable, state transfer function, and reward function of the aggregated load participating in the real-time response, to obtain a real-time optimal regulation decision model for flexible load aggregation;
and the strategy output module is used for inputting the state variable of the target aggregated load into the real-time optimization regulation and control decision model for flexible load aggregation to obtain the optimal strategy for real-time regulation and control of the aggregated load.
According to the heterogeneous flexible load real-time regulation device based on deep reinforcement learning, single flexible load models are first established for the different types of heterogeneous flexible loads, and an aggregated load model is then established for multiple flexible loads with different parameters and structures. This yields the Markov decision process of the heterogeneous flexible loads participating in demand response. Based on historical data, the decision function of the aggregate is trained within a deep reinforcement learning framework, producing a real-time optimal regulation decision model for the heterogeneous flexible load aggregate and the optimal policy for real-time regulation of the aggregated load, thereby improving the flexible load response capability on the user side. This solves the technical problems that, under existing load regulation methods, the response capability of user-side flexible loads is low and the demand response potential on the user side is difficult to tap.
Further, the single flexible load dynamic model includes a load temperature control dynamic function, a user discomfort function, and a reward function.
Further, the method also comprises the following steps:
and the model testing module is used for testing the real-time optimization regulation and control decision model of flexible load aggregation.
It should be noted that the device provided in the embodiment of the present application is a virtual device embodiment corresponding to the foregoing heterogeneous flexible load real-time regulation and control method embodiment based on deep reinforcement learning, and the embodiment of the present application can achieve the same technical effects as the foregoing heterogeneous flexible load real-time regulation and control method embodiment based on deep reinforcement learning, and is not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or part of the technical solutions contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (9)
1. A heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning is characterized by comprising the following steps:
respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load;
according to the state variables, action variables, environment variables, and return functions of all the single flexible loads, establishing a heterogeneous flexible load aggregation model, wherein the heterogeneous flexible load aggregation model comprises the state variables, state spaces, action variables, action spaces, and state transfer functions of aggregated loads; wherein the heterogeneous flexible load aggregation model is as follows:
s(t+1)=F transition (s(t),a(t),w(t))
wherein s(t+1) is the state variable of the aggregated load at time t+1, F_transition(s(t), a(t), w(t)) is the state transfer function of the aggregated load at time t, s(t) is the state variable of the aggregated load at time t, a(t) is the action variable of the aggregated load at time t, w(t) is the environment variable at time t, R_agg(t) is the reward function of the aggregated load at time t, r_DR(t) is the total revenue of the aggregated load participating in demand response at time t, λ(t)·(P_agg(t) − P_base(t))·Δt is the reduction in electricity cost, f_i^unc is the user discomfort, T_i^L is the user's lower comfort temperature limit, T_i^U is the user's upper comfort temperature limit, T_i^min is the lower limit of the temperature variation of device i, T_i^max is the upper limit of the temperature variation of device i, N is the number of electric heating devices, L is the number of electric water-heating devices, T_i(t) is the indoor temperature, and the two discomfort penalty coefficients apply when the indoor temperature T_i(t) exceeds the user's upper comfort temperature limit T_i^U and when T_i(t) falls below the user's lower comfort temperature limit T_i^L, respectively;
applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response;
establishing a deep reinforcement learning model for real-time regulation of the aggregated load, and training the model according to the state variable, action variable, state transfer function, and reward function of the aggregated load participating in the real-time response, to obtain a real-time optimal regulation decision model for flexible load aggregation;
and inputting the state variable of the target aggregated load into a real-time optimization regulation and control decision model for flexible load aggregation to obtain an optimal strategy for real-time regulation and control of the aggregated load.
2. The method for real-time regulation and control of heterogeneous flexible loads based on deep reinforcement learning according to claim 1, wherein the single flexible load dynamic model comprises a load temperature control dynamic function, a user discomfort function and a reward function.
3. The heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning of claim 1, wherein the aggregate load real-time regulation and control deep reinforcement learning model is trained by adopting a deep Q value network algorithm.
4. The method for real-time regulation of heterogeneous flexible loads based on deep reinforcement learning according to claim 3, wherein the loss function of the deep reinforcement learning model is:
Loss(θ) = (1/m) Σ_{j=1}^{m} (y_j − Q(s_j, a_j | θ))²
wherein y_j is the target value of the Q network function, m is the number of samples, θ is the weight coefficient of the Q network function, s_j is the state variable of the jth sample, and a_j is the action variable of the jth sample.
5. The heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning of claim 4, wherein the training of the aggregated load real-time regulation and control deep reinforcement learning model to obtain a flexible load aggregated real-time optimization regulation and control decision model comprises:
initializing a predicted Q network function and a target Q network function, setting the number of iteration rounds as EP, the learning rate as alpha, the exploration rate as epsilon and the maximum size of an experience playback pool as M;
collecting training samples, and storing the training samples in the experience playback pool;
extracting n samples from the experience playback pool in random batch, and calculating a loss function of the Q network function;
updating the weight coefficient theta of the Q network function by adopting a gradient descent method;
continuously generating new samples, replacing the old samples in the experience playback pool with the new samples, and calculating a loss function and a weight coefficient theta of the Q network function;
updating a weight coefficient theta' of the target Q network function;
checking whether a state variable s of the aggregation load is in a final state, if so, emptying the experience playback pool, sampling again, and putting a sampling sample into the experience playback pool;
and judging whether the iteration round number reaches EP, if so, finishing the training, and otherwise, continuing the iteration.
6. The method for regulating and controlling the heterogeneous flexible load based on the deep reinforcement learning in real time as claimed in claim 5, further comprising:
and testing the real-time optimization regulation and control decision model of flexible load aggregation.
7. The utility model provides a heterogeneous flexible load real-time regulation and control device based on deep reinforcement study which characterized in that includes:
the single flexible load modeling module is used for respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load;
the aggregation load modeling module is used for establishing a heterogeneous flexible load aggregation model according to the state variables, the action state variables, the environment variables and the return functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variables, the state spaces, the action variables, the action spaces and the state transfer functions of the aggregation loads; wherein the heterogeneous flexible load aggregation model is as follows:
s(t+1)=F transition (s(t),a(t),w(t))
wherein s(t+1) is the state variable of the aggregated load at time t+1, F_transition(s(t), a(t), w(t)) is the state transfer function of the aggregated load at time t, s(t) is the state variable of the aggregated load at time t, a(t) is the action variable of the aggregated load at time t, w(t) is the environment variable at time t, R_agg(t) is the reward function of the aggregated load at time t, r_DR(t) is the total revenue of the aggregated load participating in demand response at time t, λ(t)·(P_agg(t) − P_base(t))·Δt is the reduction in electricity cost, f_i^unc is the user discomfort, T_i^L is the user's lower comfort temperature limit, T_i^U is the user's upper comfort temperature limit, T_i^min is the lower limit of the temperature variation of device i, T_i^max is the upper limit of the temperature variation of device i, N is the number of electric heating devices, L is the number of electric water-heating devices, T_i(t) is the indoor temperature, and the two discomfort penalty coefficients apply when the indoor temperature T_i(t) exceeds the user's upper comfort temperature limit T_i^U and when T_i(t) falls below the user's lower comfort temperature limit T_i^L, respectively;
the application module is used for applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response;
the deep reinforcement learning module is used for establishing a deep reinforcement learning model for real-time regulation of the aggregated load, and training it according to the state variable, action variable, state transfer function, and reward function of the aggregated load participating in the real-time response, to obtain a real-time optimal regulation decision model for flexible load aggregation;
and the strategy output module is used for inputting the state variable of the target aggregated load into the real-time optimization regulation and control decision model for flexible load aggregation to obtain the optimal strategy for real-time regulation and control of the aggregated load.
8. The device for real-time regulation and control of heterogeneous flexible loads based on deep reinforcement learning of claim 7, wherein the single flexible load dynamic model comprises a load temperature control dynamic function, a user discomfort function and a reward function.
9. The device for real-time regulation and control of heterogeneous flexible loads based on deep reinforcement learning according to claim 7, further comprising:
and the model testing module is used for testing the real-time optimization regulation and control decision model of flexible load aggregation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011389959.0A CN112488531B (en) | 2020-12-02 | 2020-12-02 | Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112488531A CN112488531A (en) | 2021-03-12 |
CN112488531B true CN112488531B (en) | 2022-09-06 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723798B (en) * | 2021-08-27 | 2022-11-11 | 广东电网有限责任公司 | Demand response control method and system based on online deep reinforcement learning |
CN115549109A (en) * | 2022-09-15 | 2022-12-30 | 清华大学 | Mass flexible load rapid aggregation control method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107453356A (en) * | 2017-08-21 | 2017-12-08 | 南京邮电大学 | User side flexible load dispatching method based on adaptive Dynamic Programming |
CN107800157A (en) * | 2017-11-14 | 2018-03-13 | 武汉大学 | The virtual power plant dual-layer optimization dispatching method of the temperature control load containing polymerization and new energy |
CN108964042A (en) * | 2018-07-24 | 2018-12-07 | 合肥工业大学 | Regional power grid operating point method for optimizing scheduling based on depth Q network |
CN110705737A (en) * | 2019-08-09 | 2020-01-17 | 四川大学 | Comprehensive optimization configuration method for multiple energy storage capacities of multi-energy microgrid |
WO2020037127A1 (en) * | 2018-08-17 | 2020-02-20 | Dauntless.Io, Inc. | Systems and methods for modeling and controlling physical dynamical systems using artificial intelligence |
CN111709672A (en) * | 2020-07-20 | 2020-09-25 | 国网黑龙江省电力有限公司 | Virtual power plant economic dispatching method based on scene and deep reinforcement learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3398116A1 (en) * | 2015-12-31 | 2018-11-07 | Vito NV | Methods, controllers and systems for the control of distribution systems using a neural network architecture |
- 2020-12-02 CN CN202011389959.0A patent/CN112488531B/en active Active
Non-Patent Citations (1)
Title |
---|
Optimal Dispatch of Distributed Electric Heating Participating in Demand Response Based on Deep Reinforcement Learning; Yan Gangui et al.; Power System Technology; 2020-11-05; Vol. 44, No. 11; pp. 4140-4147 * |
Also Published As
Publication number | Publication date |
---|---|
CN112488531A (en) | 2021-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wei et al. | A bi-level scheduling model for virtual power plants with aggregated thermostatically controlled loads and renewable energy | |
Pinto et al. | Data-driven district energy management with surrogate models and deep reinforcement learning | |
Pinto et al. | Coordinated energy management for a cluster of buildings through deep reinforcement learning | |
Javaid et al. | Towards buildings energy management: using seasonal schedules under time of use pricing tariff via deep neuro-fuzzy optimizer | |
CN113572157B (en) | User real-time autonomous energy management optimization method based on near-end policy optimization | |
CN111695793B (en) | Method and system for evaluating energy utilization flexibility of comprehensive energy system | |
Kou et al. | Model-based and data-driven HVAC control strategies for residential demand response | |
CN112488531B (en) | Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning | |
CN113112077B (en) | HVAC control system based on multi-step prediction deep reinforcement learning algorithm | |
CN112036934A (en) | Quotation method for participation of load aggregators in demand response considering thermoelectric coordinated operation | |
CN114623569B (en) | Cluster air conditioner load differential regulation and control method based on deep reinforcement learning | |
Kong et al. | Refined peak shaving potential assessment and differentiated decision-making method for user load in virtual power plants | |
CN104036328A (en) | Self-adaptive wind power prediction system and prediction method | |
Bian et al. | Demand response model identification and behavior forecast with OptNet: A gradient-based approach | |
Feng et al. | Economic dispatch of industrial park considering uncertainty of renewable energy based on a deep reinforcement learning approach | |
Zhang et al. | Networked Multiagent-Based Safe Reinforcement Learning for Low-Carbon Demand Management in Distribution Networks | |
Wu et al. | Virtual-real interaction control of hybrid load system for low-carbon energy services | |
CN113591391A (en) | Power load control device, control method, terminal, medium and application | |
Amadeh et al. | Building cluster demand flexibility: An innovative characterization framework and applications at the planning and operational levels | |
Lénet et al. | An inverse nash mean field game-based strategy for the decentralized control of thermostatic loads | |
Coffman et al. | A model-free method for learning flexibility capacity of loads providing grid support | |
Pandiyan et al. | Recursive training based physics-inspired neural network for electric water heater modeling | |
Liu et al. | A Real-time Demand Response Strategy of Home Energy Management by Using Distributed Deep Reinforcement Learning | |
CN112052989B (en) | Risk cost allocation method for comprehensive energy resource sharing community | |
Amasyali et al. | Deep Reinforcement Learning for Autonomous Water Heater Control. Buildings 2021, 11, 548 | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||