CN112488531B - Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning - Google Patents

Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning

Info

Publication number
CN112488531B
CN112488531B CN202011389959.0A CN202011389959A CN112488531B CN 112488531 B CN112488531 B CN 112488531B CN 202011389959 A CN202011389959 A CN 202011389959A CN 112488531 B CN112488531 B CN 112488531B
Authority
CN
China
Prior art keywords
load
time
real
regulation
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011389959.0A
Other languages
Chinese (zh)
Other versions
CN112488531A (en)
Inventor
肖云鹏
蔡秋娜
关玉衡
张兰
白杨
刘思捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Original Assignee
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd filed Critical Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority to CN202011389959.0A priority Critical patent/CN112488531B/en
Publication of CN112488531A publication Critical patent/CN112488531A/en
Application granted granted Critical
Publication of CN112488531B publication Critical patent/CN112488531B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0637Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • H02J3/14Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by switching loads on to, or off from, network, e.g. progressively balanced loading
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B70/00Technologies for an efficient end-user side electric power management and consumption
    • Y02B70/30Systems integrating technologies related to power network operation and communication or information technologies for improving the carbon footprint of the management of residential or tertiary loads, i.e. smart grids as climate change mitigation technology in the buildings sector, including also the last stages of power distribution and the control, monitoring or operating management systems at local level
    • Y02B70/3225Demand response systems, e.g. load shedding, peak shaving
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S20/00Management or operation of end-user stationary applications or the last stages of power distribution; Controlling, monitoring or operating thereof
    • Y04S20/20End-user application control systems
    • Y04S20/222Demand response systems, e.g. load shedding, peak shaving

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Geometry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Water Supply & Treatment (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)

Abstract

The application discloses a heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning, which solve the technical problems that, in existing load regulation and control modes, the response capability of user-side flexible loads is low and the demand response potential on the user side is difficult to tap.

Description

Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning
Technical Field
The application relates to the technical field of load regulation and control of power systems, in particular to a heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning.
Background
With a large number of different demand-side flexible loads widely connected to the grid and participating in power grid regulation, the heterogeneous characteristics of flexible loads have become increasingly prominent, and handling this heterogeneity has become a key problem for practical regulation and application. Heterogeneous loads fall into two modes of heterogeneity: type and parameter. In general, loads of different types constitute type heterogeneity, while loads of the same type but with different inherent parameters constitute parameter heterogeneity. Modeling heterogeneous flexible loads is the basis of flexible load regulation.
Conventional load regulation models heterogeneous loads with established physical parameters, and then performs objective optimization and unified scheduling by clustering the heterogeneous loads into homogeneous or equivalent groups according to parameter similarity, but the problem of complex physical parameters of heterogeneous equipment is difficult to avoid. For example, for a temperature-controlled load, the conventional method establishes a first-order thermodynamic model mainly based on the dynamic temperature characteristics and the periodic operation mode of the load. However, because the loads are diverse, the parameters differ severely, and the regulation depends on a large amount of perception and interaction information, the response capability of user-side flexible loads is reduced, and the demand response potential on the user side is difficult to tap.
Disclosure of Invention
The application provides a heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning, which are used for solving the technical problems that, in existing load regulation and control modes, the response capability of user-side flexible loads is low and the demand response potential on the user side is difficult to tap.
In view of this, the first aspect of the present application provides a method for real-time regulation and control of heterogeneous flexible loads based on deep reinforcement learning, including:
respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load;
according to the state variables, the action variables, the environment variables and the return functions of all the single flexible loads, establishing a heterogeneous flexible load aggregation model, wherein the heterogeneous flexible load aggregation model comprises the state variables, the state spaces, the action variables, the action spaces and the state transfer functions of aggregated loads;
applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response;
establishing an aggregated load real-time regulation and control deep reinforcement learning model, and training the aggregated load real-time regulation and control deep reinforcement learning model according to the state variable, the action variable, the state transfer function and the return function participating in real-time response of the aggregated load to obtain a real-time optimization regulation and control decision model for flexible load aggregation;
and inputting the state variable of the target aggregated load into a real-time optimization regulation and control decision model for flexible load aggregation to obtain an optimal strategy for real-time regulation and control of the aggregated load.
Optionally, the single flexible load dynamics model includes a load temperature control dynamics function, a user discomfort function, and a reward function.
Optionally, the heterogeneous flexible load aggregation model is:

s(t+1) = F_transition(s(t), a(t), w(t))

R_agg(t) = r_DR(t) − Σ_{i=1}^{N+L} f_i^unc(t) + λ(t)·(P_agg(t) − P_base(t))·Δt

wherein s(t+1) is the state variable of the aggregated load at time t+1, s(t) is the state variable of the aggregated load at time t, a(t) is the action variable of the aggregated load at time t, w(t) is the environment variable at time t, R_agg(t) is the return function of the aggregated load at time t, r_DR(t) is the total revenue of the aggregated load participating in demand response at time t, Σ_{i=1}^{N+L} f_i^unc(t) is the total user discomfort, and λ(t)·(P_agg(t) − P_base(t))·Δt is the reduction in electricity fee expenditure.
Optionally, the aggregate load real-time regulation deep reinforcement learning model is trained by using a deep Q-value network algorithm.
Optionally, the loss function of the deep reinforcement learning model is:

Loss(θ) = (1/m) · Σ_{j=1}^{m} ( y_j − Q(s_j, a_j | θ) )²

wherein y_j is the target value of the Q network function, m is the number of samples, θ is the weight coefficient of the Q network function, s_j is the state variable of the j-th sample, and a_j is the action variable of the j-th sample.
Optionally, the training of the aggregated load real-time regulation and control deep reinforcement learning model to obtain a real-time optimization regulation and control decision model for flexible load aggregation includes:
initializing a predicted Q network function and a target Q network function, setting the number of iteration rounds as EP, the learning rate as alpha, the exploration rate as epsilon and the maximum size of an experience playback pool as M;
collecting training samples, and storing the training samples in the experience playback pool;
extracting n samples from the experience playback pool in random batch, and calculating a loss function of the Q network function;
updating the weight coefficient theta of the Q network function by adopting a gradient descent method;
continuously generating new samples, replacing the old samples in the experience playback pool with the new samples, and calculating a loss function and a weight coefficient theta of the Q network function;
updating a weight coefficient theta' of the target Q network function;
checking whether the state variable s of the aggregated load is in a final state, if so, emptying the experience playback pool, sampling again, and putting a sampling sample into the experience playback pool;
and judging whether the iteration round number reaches the EP, if so, finishing the training, and otherwise, continuing the iteration.
Optionally, the method further comprises:
and testing the real-time optimization regulation and control decision model of flexible load aggregation.
A second aspect of the application provides a heterogeneous flexible load real-time regulation and control device based on deep reinforcement learning, including:
the single flexible load modeling module is used for respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load;
the aggregation load modeling module is used for establishing a heterogeneous flexible load aggregation model according to the state variables, the action variables, the environment variables and the return functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variables, the state spaces, the action variables, the action spaces and the state transfer functions of the aggregated loads;
the application module is used for applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response;
the deep reinforcement learning module is used for establishing an aggregated load real-time regulation and control deep reinforcement learning model, and training the aggregated load real-time regulation and control deep reinforcement learning model according to the state variable, the action variable, the state transfer function and the return function participating in real-time response of the aggregated load to obtain a real-time optimization regulation and control decision model for flexible load aggregation;
and the strategy output module is used for inputting the state variable of the target aggregation load into the real-time optimization regulation and control decision model for flexible load aggregation to obtain the optimal strategy for real-time regulation and control of the aggregation load.
Optionally, the single flexible load dynamics model includes a load temperature control dynamics function, a user discomfort function, and a reward function.
Optionally, the method further comprises:
and the model testing module is used for testing the real-time optimization regulation and control decision model of flexible load aggregation.
According to the technical scheme, the embodiment of the application has the following advantages:
the application provides a heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning, which comprises the following steps: respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load; according to the state variables, the action state variables, the environment variables and the return functions of all the single flexible loads, a heterogeneous flexible load aggregation model is established, and comprises the state variables, the state spaces, the action variables, the action spaces and the state transfer functions of aggregated loads; applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response; establishing a polymerization load real-time regulation and control deep reinforcement learning model, and training the polymerization load real-time regulation and control deep reinforcement learning model according to a state variable, an action variable, a state transfer function and a return function participating in real-time response of the polymerization load; and inputting the state variable of the target aggregated load into a real-time optimization regulation and control decision model for flexible load aggregation to obtain an optimal strategy for real-time regulation and control of the aggregated load.
According to the heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning, firstly, single flexible load models are respectively established for heterogeneous flexible loads of different types, and then an aggregated load model is established for a plurality of flexible loads with different parameters and different structures, so that the Markov decision process of the heterogeneous flexible loads participating in demand response is obtained; based on historical data, the decision function of the aggregate is trained through the machine learning framework of deep reinforcement learning to obtain a real-time optimization regulation and control decision model of the heterogeneous flexible load aggregate, and the optimal strategy for real-time regulation and control of the aggregated load is obtained, so that the flexible load response capability of the user side is improved. The technical problems that the response capability of the flexible load on the user side is low and the demand response potential on the user side is difficult to tap in the existing load regulation and control mode are solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below; it is obvious that the drawings in the following description are only some embodiments of the present invention, and for a person of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 is a neural network form of a Q network function;
fig. 2 is a block diagram of a flow structure of a heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning provided in an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application provides an embodiment of a heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning, which comprises the following steps:
step 101, respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load.
It should be noted that, first, a single flexible load dynamic model is respectively established for heterogeneous flexible loads of different types, and the value ranges and variation trends of the state variables, action variables, environment variables and return functions of the single flexible loads are obtained. For ease of understanding, two heterogeneous flexible loads, namely an electric heating load and an electric water heating load, are taken as examples in the embodiments of the present application; it should be understood that the corresponding parameters of other single flexible loads can be changed on the basis of the embodiments of the present application to obtain the same effects.
For electric heating load:
the electric heating load is typically a temperature-controlled load, the purpose of which is to maintain the room temperature within a certain comfort range. Electric heating simulation by adopting equivalent model similar to first-order circuitSetting the rated power of the electric heating equipment i as P in the running process of the load i rate The equivalent thermal resistance of the room is
Figure BDA0002812267550000051
Equivalent heat capacity of
Figure BDA0002812267550000052
Indoor temperature at time T is T i (T) outdoor temperature T ex (t), the dynamic equation of the indoor temperature can be expressed as:
Figure BDA0002812267550000053
t∈[0,T]
in the formula, K i (t) is an action variable for controlling the on-off of the electric heating equipment i, and K i (t)∈{0,1}。
When the time granularity is Δ t, the formula
Figure BDA0002812267550000061
t∈[0,T]Approximation translates into a state-transfer equation at discrete time:
Figure BDA0002812267550000062
let the comfortable temperature range of the user be [ T i L ,T i U ]The temperature of the electric heating equipment can be changed within the range of
Figure BDA0002812267550000064
The user's discomfort function may be defined as:
Figure BDA0002812267550000065
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0002812267550000066
and
Figure BDA0002812267550000067
and respectively, discomfort penalty factors of the indoor temperature exceeding the upper limit and the lower limit.
If the electricity price at time t is λ(t), the electricity fee expenditure of the user is:

f_i^elec(t) = λ(t) · P_i^rate · K_i(t) · Δt

Since the return function is to be maximized, the return function of the electric heating equipment i can be expressed as:

r_i(t) = −f_i^unc(t) − f_i^elec(t)
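For illustration, the single electric heating load model above can be sketched in Python as follows. This is a minimal sketch, assuming the Euler-discretized state transition and a piecewise-linear discomfort penalty as reconstructed above; the class name, function names and all parameter values are illustrative and are not taken from the patent.

import numpy as np

class ElectricHeatingLoad:
    """Minimal sketch of a single temperature-controlled electric heating load."""

    def __init__(self, p_rate=3.0, r_eq=2.0, c_eq=2.0,
                 t_low=20.0, t_high=24.0, alpha_low=1.0, alpha_high=1.0):
        self.p_rate = p_rate                    # rated power P_i^rate (illustrative)
        self.r_eq = r_eq                        # equivalent thermal resistance R_i^eq
        self.c_eq = c_eq                        # equivalent heat capacity C_i^eq
        self.t_low, self.t_high = t_low, t_high # comfort range [T_i^L, T_i^U]
        self.alpha_low, self.alpha_high = alpha_low, alpha_high  # penalty factors

    def step(self, t_in, t_out, k, dt):
        """Discrete state transition: one Euler step of the first-order thermal model."""
        return t_in + dt / self.c_eq * ((t_out - t_in) / self.r_eq + k * self.p_rate)

    def discomfort(self, t_in):
        """Piecewise-linear discomfort penalty outside the comfort range."""
        if t_in > self.t_high:
            return self.alpha_high * (t_in - self.t_high)
        if t_in < self.t_low:
            return self.alpha_low * (self.t_low - t_in)
        return 0.0

    def reward(self, t_in, k, price, dt):
        """Single-load return r_i(t) = -f_i^unc - f_i^elec for one step."""
        f_elec = price * self.p_rate * k * dt
        return -self.discomfort(t_in) - f_elec


# usage example: simulate one hour with 15-minute steps, heater always on
load = ElectricHeatingLoad()
t_in, t_out, dt = 18.0, 5.0, 0.25
for _ in range(4):
    r = load.reward(t_in, k=1, price=0.1, dt=dt)
    t_in = load.step(t_in, t_out, k=1, dt=dt)
print(round(t_in, 2), round(r, 3))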
for electric water heating loads:
the electric hot water load can maintain the domestic water stored in the water tank within a comfortable range. In addition to dissipating heat in the environment, hot water in the water tank may also dissipate heat by flowing in cold water due to domestic hot water flowing out. Thus, the water temperature dynamic equation in tank i can be defined as:
C_i^eq · dT_i(t)/dt = (T_ex(t) − T_i(t)) / R_i^eq + K_i(t) · P_i^rate − Q_i(t),  t ∈ [0, T]

wherein R_i^eq represents the equivalent thermal resistance of the water tank i, C_i^eq represents the equivalent heat capacity of the water tank i, T_i(t) is the water temperature in tank i at time t, T_ex(t) is the ambient temperature, P_i^rate is the rated power of the electric water heater of tank i, Q_i(t) represents the amount of heat taken away by domestic water at time t and is related to the water usage habits of the user, and K_i(t) is the control variable for switching the electric hot water tank i on and off, with K_i(t) ∈ {0, 1}.

Converting the above formula into a discrete form, the state transition equation is obtained as:

T_i(t+Δt) = T_i(t) + (Δt / C_i^eq) · [ (T_ex(t) − T_i(t)) / R_i^eq + K_i(t) · P_i^rate − Q_i(t) ]

Similarly, the comfortable temperature range of the electric hot water load can also be defined as [T_i^L, T_i^U], the water temperature can vary within the range [T_i^min, T_i^max], and the user's discomfort function, electricity fee expenditure and return function are obtained in the same way, which is not repeated herein.
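A corresponding sketch of the electric water heating load dynamics is given below, again only as an illustration: it assumes the same Euler discretization, treats the heat Q_i(t) drawn off by domestic water use as a given exogenous value, and uses illustrative parameter values and an assumed ambient-temperature argument.

def water_heater_step(t_water, t_ambient, k, q_draw, dt,
                      p_rate=4.5, r_eq=50.0, c_eq=0.3):
    """One Euler step of the water tank temperature model.

    t_water   : current tank water temperature T_i(t)
    t_ambient : ambient temperature around the tank (assumed symbol)
    k         : on/off control variable K_i(t) in {0, 1}
    q_draw    : heat taken away by domestic hot water use Q_i(t)
    """
    heat_loss = (t_ambient - t_water) / r_eq   # exchange with the environment
    return t_water + dt / c_eq * (heat_loss + k * p_rate - q_draw)


# usage example: a hot-water draw event halfway through a two-hour horizon
temps, t_water = [], 50.0
for step in range(8):                  # 8 steps of 15 minutes
    q = 3.0 if step == 4 else 0.0      # illustrative draw of domestic hot water
    t_water = water_heater_step(t_water, 20.0, k=1, q_draw=q, dt=0.25)
    temps.append(round(t_water, 2))
print(temps)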
Step 102, establishing a heterogeneous flexible load aggregation model according to the state variables, the action variables, the environment variables and the return functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variables, the state spaces, the action variables, the action spaces and the state transfer functions of aggregated loads.
The flexible loads including heterogeneous types and heterogeneous parameters are aggregated to obtain an aggregation model of heterogeneous flexible loads, where the model includes the state variables, state spaces, action variables, action spaces and state transfer functions of the aggregated loads. For convenience of explanation, the aggregation model of heterogeneous flexible loads is established by taking N electric heating loads and L electric water heating loads containing heterogeneous parameters as an example. The subscripts of the electric heating loads and their heterogeneous parameters are 1 to N, and the subscripts of the electric water heating loads are N+1 to N+L.
Let the state variable of the aggregated load be:

s(t) = [T_1(t), T_2(t), …, T_{N+L}(t)]

The action variable is:

a(t) = [K_1(t), K_2(t), …, K_{N+L}(t)]

The state space of the aggregated load is:

S = [T_1^min, T_1^max] × … × [T_N^min, T_N^max] × [T_{N+1}^min, T_{N+1}^max] × … × [T_{N+L}^min, T_{N+L}^max]

wherein, for i = N+1, …, N+L, T_i^min and T_i^max are respectively the lower limit and the upper limit of the water temperature control range of the electric water heating loads.

The action space is:

A = {0, 1}^(N+L)
since each load in the aggregate load has heterogeneous parameters, a single dynamic equation cannot be directly established. The state transition equations of the respective loads can be simultaneously established to obtain:
Figure BDA0002812267550000081
the above state transition equation can be simplified as:
s(t+1)=F transition (s(t),a(t),w(t))
where w (t) represents an environmental variable.
Let the demand response regulation power that the aggregated load is required to execute at time t be ΔP_DR(t), the baseline load of the aggregated load at time t be P_base(t), and the aggregate power at time t be P_agg(t):

P_agg(t) = Σ_{i=1}^{N+L} P_i^rate · K_i(t)
the aggregated load is in real-time market for revenue in response to system regulatory instructions. Assuming that the unit benefit of demand response at time t is μ (t), and that the full benefit is available only when the power of the actual response of the user is within a certain error range [1- ε,1+ ε ], ε is typically taken to be 20%, and the excess is not available. Thus, the total revenue of the aggregate load participating in the demand response is:
Figure BDA0002812267550000083
thus, the reward function at the time of the aggregate load t can be expressed as:
Figure BDA0002812267550000091
wherein R is agg (t) is a reward function of the aggregate load at time t, r DR (t) aggregate load participation total revenue for demand response at time t,
Figure BDA0002812267550000092
λ (t) (P) for total user discomfort agg (t)-P base (t)) Δ t is an electric power fee expenditure reduction amount.
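To make the aggregation concrete, the following sketch combines the per-load transitions, the aggregate power P_agg(t) and the return R_agg(t) into one environment step. It is a simplified, self-contained illustration: the sign convention of the electricity-fee term and the all-or-nothing revenue inside the ±ε band follow the reconstructed formulas above, the discomfort penalty factors are fixed to 1, and all names and parameter values are illustrative rather than the patent's.

import numpy as np

def aggregate_step(temps, t_out, actions, q_draw, dt, params):
    """One transition of the aggregated load: returns next temperatures and P_agg."""
    r_eq, c_eq, p_rate = params["r_eq"], params["c_eq"], params["p_rate"]
    gains = (t_out - temps) / r_eq + actions * p_rate - q_draw  # q_draw is 0 for heaters
    next_temps = temps + dt / c_eq * gains
    p_agg = float(np.sum(p_rate * actions))
    return next_temps, p_agg

def aggregate_reward(p_agg, p_base, dp_dr, temps, params, price, mu, dt, eps=0.2):
    """R_agg(t) = r_DR(t) - total discomfort + fee-change term (reconstructed form)."""
    t_low, t_high = params["t_low"], params["t_high"]
    discomfort = np.sum(np.maximum(temps - t_high, 0) + np.maximum(t_low - temps, 0))
    response = abs(p_agg - p_base)
    in_band = (1 - eps) * abs(dp_dr) <= response <= (1 + eps) * abs(dp_dr)
    r_dr = mu * abs(dp_dr) * dt if in_band else 0.0
    fee_term = price * (p_agg - p_base) * dt   # named the fee-expenditure reduction in the text
    return r_dr - discomfort + fee_term

# usage example: 3 heating loads + 2 water heaters with illustrative parameters
params = {"r_eq": np.array([2.0, 2.5, 1.8, 50.0, 60.0]),
          "c_eq": np.array([2.0, 2.2, 1.9, 0.3, 0.35]),
          "p_rate": np.array([3.0, 3.0, 2.5, 4.5, 4.5]),
          "t_low": np.array([20, 20, 20, 45, 45], dtype=float),
          "t_high": np.array([24, 24, 24, 55, 55], dtype=float)}
temps = np.array([21.0, 22.0, 20.5, 50.0, 48.0])
actions = np.array([1, 0, 1, 1, 0])
q_draw = np.array([0.0, 0.0, 0.0, 2.0, 0.0])
next_temps, p_agg = aggregate_step(temps, 5.0, actions, q_draw, 0.25, params)
r = aggregate_reward(p_agg, p_base=12.0, dp_dr=-3.0, temps=next_temps,
                     params=params, price=0.1, mu=0.5, dt=0.25)
print(np.round(next_temps, 2), p_agg, round(r, 3))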
The purpose of real-time optimal regulation of the aggregated load at time t_0 is to find the optimal strategy at t_0 such that the cumulative expected return of the user from time t_0 to period T is maximized. Considering the uncertainty of future periods, the gains occurring in future periods are multiplied by an attenuation coefficient γ. Assuming the initial state variable s_0 is known, the real-time regulation optimization problem can be expressed as:

max E_{s(t),a(t)} [ Σ_{t=t_0}^{T} γ^(t−t_0) · R_agg(t) ]

s.t.: s(t_0) = s_0, s(t) ∈ S, a(t) ∈ A

s(t+1) = F_transition(s(t), a(t), w(t))

where E_{s(t),a(t)}[·] denotes the expectation over the feasible state space S and action space A. This real-time regulation optimization problem is a mixed-integer nonlinear programming problem, and the optimization variable is a(t_0).
And 103, applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response.
It should be noted that after the aggregation model is obtained, the aggregated load needs to be placed in the real-time regulation and control environment of the power system, interact with the real-time environment, and evolve continuously to obtain the return function of the aggregated load participating in real-time response.
And step 104, establishing an aggregated load real-time regulation and control deep reinforcement learning model, and training the aggregated load real-time regulation and control deep reinforcement learning model according to the state variable, the action variable, the state transfer function and the return function participating in real-time response of the aggregated load to obtain a real-time optimization regulation and control decision model for flexible load aggregation.
It should be noted that, as can be seen from the real-time regulation and optimization problem, the dimension of the constraint conditions reaches (N+L)·(T−t_0). Obviously, when the number of aggregated loads is large, the complexity of a direct optimization solution is large, and it is difficult to meet the real-time requirement of real-time optimization regulation. Therefore, an approximation of a(t_0) is obtained by the following deep reinforcement learning method. From steps 102 to 103, it can be seen that the process of the aggregated load participating in real-time regulation is a Markov decision process, that is, the decision of the individual at time t_0 and the subsequent state are related only to the current state s(t_0) and are independent of historical information. From the above formulas, the quadruple <s, a, r, s'> of the Markov decision process can be obtained, namely:

the state variable s ∈ S;

the action variable a ∈ A;

the state transition function s' = F_transition(s, a);

the return function r = R_agg(s, a).
Let π be the policy of the aggregated load, which denotes the probability of taking a possible action variable a given a state variable s in the Markov decision process, expressed as π(a|s), as shown in the following formula:

π(a|s) = Pr[ a(t) = a | s(t) = s ]

Thus, the goal of deep reinforcement learning is to find the optimal strategy π* that maximizes the expected value of the cumulative return function.
Defining a Q network function of the aggregated load and expressing the Q network function through a neural network, defining a loss function of a learning Q network function, and initializing a prediction Q network and a target Q network. Initializing the state of the aggregation load and collecting samples to store in an experience replay pool. And performing off-line training on the predicted Q network by using batch sampling in the empirical playback pool and the value of the target Q network, updating the parameter of the predicted Q network by a gradient descent method, repeating the step and periodically updating the parameter of the target Q network until the iteration number reaches the maximum value.
And 105, inputting the state variable of the target aggregated load into a real-time optimization regulation and control decision model for flexible load aggregation to obtain an optimal strategy for real-time regulation and control of the aggregated load.
It should be noted that after the real-time optimal regulation and control decision model for flexible load aggregation is obtained, the state variable s of the target aggregated load is input into the real-time optimal regulation and control decision model for flexible load aggregation, and an optimal strategy for aggregated load real-time regulation and control is obtained, which is expressed as:
a* = argmax_{a∈A} Q(s, a | θ)

where Q(s, a | θ) is the Q network function.
According to the heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning, firstly, single flexible load models are respectively established for heterogeneous flexible loads of different types, and then an aggregated load model is established for a plurality of heterogeneous flexible loads with different parameters, so that the Markov decision process of the heterogeneous flexible loads participating in demand response is obtained; based on historical data, the decision function of the aggregate is trained through the machine learning framework of deep reinforcement learning to obtain a real-time optimization regulation and control decision model of the heterogeneous flexible load aggregate, and the optimal strategy for real-time regulation and control of the aggregated load is obtained, so that the flexible load response capability of the user side is improved. The technical problems that the response capability of the flexible load on the user side is low and the demand response potential on the user side is difficult to tap in the existing load regulation and control mode are solved.
Specifically, the aggregated load real-time regulation and control deep reinforcement learning model is trained by adopting a deep Q-value network algorithm. The deep Q-value network algorithm introduces two functions to search for the optimal strategy, namely a value function and a Q network function. The value function represents the cumulative return expectation obtained by the individual when strategy π is adopted from state s, and is expressed as:

V_π(s) = E_π[ Σ_{τ=t}^{T} γ^(τ−t) · R_agg(τ) | s(t) = s ]

The Q network function represents the cumulative return expectation obtained by selecting the action variable a in state s and then continuing to adopt strategy π, and is expressed as:

Q_π(s, a) = E_π[ Σ_{τ=t}^{T} γ^(τ−t) · R_agg(τ) | s(t) = s, a(t) = a ]
at the optimal strategy
Figure BDA0002812267550000113
Next, for any other strategy π, given an arbitrary state variable s, the individual's cost function should be satisfied
Figure BDA0002812267550000114
Based on the Bellman optimal equation, the optimal strategy adopted can be obtained
Figure BDA0002812267550000115
In the case of (2), the relationship between the cost function and the Q-network function is expressed as:
Figure BDA0002812267550000116
that is, the Q network function can be decomposed into two parts, namely a reward function in the current state and a cost function in the next state multiplied by an attenuation coefficient.
And the value function under the optimal strategy satisfies the following conditions:
Figure BDA0002812267550000117
will be provided with
Figure BDA0002812267550000118
Substitution into
Figure BDA0002812267550000119
In (3), it can be obtained that the Q network function satisfies
Figure BDA00028122675500001110
Can be combined with
Figure BDA00028122675500001111
The method is applied to the training of the neural network.
Figure BDA00028122675500001112
The left side is regarded as the predicted value Q of the Q network function, and the right side is regarded as the target value Q' of the Q network function.
The Q network function is parameterized by a neural network. The mapping from the input (s, a) to Q is represented by a typical fully connected neural network, as shown in fig. 1, where the inputs are s and a, the output is Q, and the weight coefficients are denoted by θ. The purpose of deep reinforcement learning is to train the weight coefficients θ so that the predicted value Q approaches the target value Q' as closely as possible.
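The fully connected Q network of fig. 1 can be sketched as follows. This is a minimal PyTorch sketch under the assumption that the state vector and the action vector are concatenated at the input and that the network outputs a single scalar Q value; the layer sizes and dimensions are illustrative.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected network mapping (s, a) to a scalar Q value."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

# prediction network Q (theta) and target network Q' (theta') with identical structure
q_net = QNetwork(state_dim=5, action_dim=5)
q_target = QNetwork(state_dim=5, action_dim=5)
q_target.load_state_dict(q_net.state_dict())   # theta' <- theta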
If the Q network functions on both sides of the above relation are trained with the same parameters, the dependency between the two becomes too strong, which is not conducive to the convergence of the algorithm. Therefore, the Q network functions on the two sides are represented by two neural networks Q and Q', called the prediction Q network and the target Q network respectively; the structures of the two networks are completely consistent, and the corresponding weight coefficients are θ and θ', respectively.
Deep reinforcement learning requires training the neural network parameters with the available data so that the output of the neural network approaches the target value as closely as possible. Let the current data contain m samples (s_j, a_j, s'_j, r_j), j = 1, 2, …, m. The mean square error loss function of the neural network can then be expressed as:

Loss(θ) = (1/m) · Σ_{j=1}^{m} ( y_j − Q(s_j, a_j | θ) )²

wherein y_j represents the target value of the Q network function, with the expression:

y_j = r_j + γ · max_{a'∈A} Q'(s'_j, a' | θ')
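Using such networks, the target value y_j and the mean square error loss can be computed as sketched below. Because the action space is the finite set {0,1}^(N+L), the maximum over a' is taken here by enumerating a list of candidate action vectors; this enumeration and the helper names are implementation assumptions, not prescribed by the patent.

import itertools
import torch

def target_values(q_target, s_next, r, gamma, candidate_actions):
    """y_j = r_j + gamma * max_{a'} Q'(s'_j, a' | theta')."""
    with torch.no_grad():
        q_vals = torch.stack([q_target(s_next, a.expand(s_next.shape[0], -1))
                              for a in candidate_actions], dim=1)
        return r + gamma * q_vals.max(dim=1).values

def dqn_loss(q_net, q_target, batch, gamma, candidate_actions):
    """Mean square error loss Loss(theta) over a sampled batch (s, a, s', r)."""
    s, a, s_next, r = batch
    y = target_values(q_target, s_next, r, gamma, candidate_actions)
    return torch.mean((y - q_net(s, a)) ** 2)

# usage example for a small case with 2 devices: enumerate all 4 on/off combinations
candidate_actions = [torch.tensor(c, dtype=torch.float32).unsqueeze(0)
                     for c in itertools.product([0.0, 1.0], repeat=2)]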
as shown in fig. 2, the process of training the aggregation load real-time regulation deep reinforcement learning model is as follows:
(1) the neural network functions Q and Q' are initialized. The iteration round number is set to be EP, the learning rate is alpha, the exploration rate is epsilon, and the maximum size of the experience playback pool is M. Iterative training then begins.
(2) Sampling to obtain the experience playback pool.

Data samples for training the neural network can be obtained through offline sampling, and the collected samples are stored in the experience playback pool. Firstly, the state variable of the aggregated load is randomly initialized to obtain s = s_1.

Then the ε-greedy strategy is adopted to obtain a = a_1. The ε-greedy strategy is as follows:

a = a random action drawn from A, if δ < ε;  a = argmax_{a∈A} Q(s, a | θ), otherwise

wherein the exploration rate ε is a constant between 0 and 1, and δ is a random sampling value between 0 and 1. This method is adopted to explore as much of the action space as possible and to avoid falling into a local optimum.
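A short sketch of the ε-greedy selection described above, reusing the candidate-action list and Q network of the earlier sketches; the helper names are assumptions.

import random
import torch

def epsilon_greedy(q_net, s, candidate_actions, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:          # delta < epsilon: explore
        return random.choice(candidate_actions)
    with torch.no_grad():                  # otherwise exploit: argmax_a Q(s, a | theta)
        q_vals = [q_net(s, a) for a in candidate_actions]
        return candidate_actions[int(torch.stack(q_vals).argmax())]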
Then s_1 and a_1 are substituted into the state transition function and the return function to obtain the next state value s'_1 and the return r_1, yielding the sample quadruple (s_1, a_1, s'_1, r_1).

Let s_2 = s'_1. The above steps are repeated to obtain (s_2, a_2, s'_2, r_2), …, (s_M, a_M, s'_M, r_M), where M is the maximum number of samples in the experience playback pool. The initial state variable needs to be reset each time t reaches its maximum value.
(3) n samples are randomly extracted in a batch from the experience playback pool and substituted into the loss function above, and the corresponding loss function Loss(θ) is calculated.
(4) The parameter θ of the Q network function is updated by gradient descent:

θ ← θ − α · ∂Loss(θ)/∂θ
(5) New samples (s_j, a_j, s'_j, r_j), j = M+1, M+2, …, continue to be generated, and the oldest samples in the experience playback pool are replaced with the new samples. Steps (3) and (4) are repeated every time n new data samples are put in.
(6) The parameter θ' of the target Q network function Q' is updated periodically, namely:
θ′←θ
(7) checking whether the state s is a final state, and if so, emptying the experience playback pool and jumping to the step (2) to restart.
(8) Steps (2) to (7) are repeated until the number of iteration rounds reaches EP.
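Putting steps (1) to (8) together, a skeleton of the training loop might look as follows. It reuses the QNetwork, dqn_loss and epsilon_greedy sketches above, assumes hypothetical env_step(s, a) and reset_state() functions implementing the state transition and return of the aggregated load, and treats the hyperparameter values, replay-pool handling and target-network update period as illustrative choices.

import random
from collections import deque
import torch

def train_dqn(env_step, reset_state, candidate_actions, state_dim, action_dim,
              ep=50, horizon=96, alpha=1e-3, epsilon=0.1, gamma=0.95,
              pool_size=10000, batch_size=64, target_period=200):
    q_net = QNetwork(state_dim, action_dim)
    q_target = QNetwork(state_dim, action_dim)
    q_target.load_state_dict(q_net.state_dict())              # step (1)
    optimizer = torch.optim.Adam(q_net.parameters(), lr=alpha)
    pool = deque(maxlen=pool_size)                            # experience playback pool
    updates = 0
    for _ in range(ep):                                       # iteration rounds EP
        s = reset_state()                                     # step (2): random initial state
        for _ in range(horizon):
            a = epsilon_greedy(q_net, s, candidate_actions, epsilon)
            s_next, r = env_step(s, a)                        # transition and return functions
            pool.append((s, a, s_next, torch.tensor([r])))
            s = s_next
            if len(pool) >= batch_size:                       # steps (3)-(4)
                batch = random.sample(list(pool), batch_size)
                s_b, a_b, sn_b, r_b = (torch.cat(x) for x in zip(*batch))
                loss = dqn_loss(q_net, q_target, (s_b, a_b, sn_b, r_b),
                                gamma, candidate_actions)
                optimizer.zero_grad()
                loss.backward()                               # gradient descent on theta
                optimizer.step()
                updates += 1
                if updates % target_period == 0:              # step (6): theta' <- theta
                    q_target.load_state_dict(q_net.state_dict())
    return q_net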
After neural network training is completed, test set data may be generated to verify the validity of the strategy.
In any state, given the state s of the aggregated load, the optimal decision for real-time regulation is obtained as:

a* = argmax_{a∈A} Q(s, a | θ)

The optimized regulation results of the aggregated load are tested and recorded.
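At deployment time, the real-time regulation decision for a given aggregated-load state is simply the greedy action of the trained network, as sketched below with the same assumed helpers (the exploration branch of the ε-greedy strategy is dropped).

import torch

def regulate(q_net, s, candidate_actions):
    """Optimal real-time regulation decision: a* = argmax_a Q(s, a | theta)."""
    with torch.no_grad():
        q_vals = torch.stack([q_net(s, a) for a in candidate_actions])
        return candidate_actions[int(q_vals.argmax())]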
The application also provides an embodiment of a heterogeneous flexible load real-time regulation and control device based on deep reinforcement learning, which comprises:
the single flexible load modeling module is used for respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load;
the aggregation load modeling module is used for establishing a heterogeneous flexible load aggregation model according to the state variables, the action variables, the environment variables and the return functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variables, the state spaces, the action variables, the action spaces and the state transfer functions of the aggregated loads;
the application module is used for applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response;
the deep reinforcement learning module is used for establishing an aggregated load real-time regulation and control deep reinforcement learning model, and training the aggregated load real-time regulation and control deep reinforcement learning model according to the state variable, the action variable, the state transfer function and the return function participating in real-time response of the aggregated load to obtain a real-time optimization regulation and control decision model for flexible load aggregation;
and the strategy output module is used for inputting the state variable of the target aggregated load into the real-time optimization regulation and control decision model for flexible load aggregation to obtain the optimal strategy for real-time regulation and control of the aggregated load.
According to the heterogeneous flexible load real-time regulation and control device based on deep reinforcement learning, firstly, single flexible load models are respectively established for heterogeneous flexible loads of different types, and then an aggregated load model is established for a plurality of flexible loads with different parameters and different structures, so that the Markov decision process of the heterogeneous flexible loads participating in demand response is obtained; based on historical data, the decision function of the aggregate is trained through the machine learning framework of deep reinforcement learning to obtain a real-time optimization regulation and control decision model of the heterogeneous flexible load aggregate, and the optimal strategy for real-time regulation and control of the aggregated load is obtained, so that the flexible load response capability of the user side is improved. The technical problems that the response capability of the flexible load on the user side is low and the demand response potential on the user side is difficult to tap in the existing load regulation and control mode are solved.
Further, the single flexible load dynamic model includes a load temperature control dynamic function, a user discomfort function, and a reward function.
Further, the method also comprises the following steps:
and the model testing module is used for testing the real-time optimization regulation and control decision model of flexible load aggregation.
It should be noted that the device provided in the embodiment of the present application is a virtual device embodiment corresponding to the foregoing heterogeneous flexible load real-time regulation and control method embodiment based on deep reinforcement learning, and the embodiment of the present application can achieve the same technical effects as the foregoing heterogeneous flexible load real-time regulation and control method embodiment based on deep reinforcement learning, and is not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or part of the technical solutions contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (9)

1. A heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning is characterized by comprising the following steps:
respectively establishing a single flexible load dynamic model for different types of heterogeneous flexible loads of the power system to obtain a state variable, an action variable, an environment variable and a return function of the single flexible load;
according to the state variables, the action variables, the environment variables and the return functions of all the single flexible loads, establishing a heterogeneous flexible load aggregation model, wherein the heterogeneous flexible load aggregation model comprises the state variables, the state spaces, the action variables, the action spaces and the state transfer functions of aggregated loads; wherein the heterogeneous flexible load aggregation model is:

s(t+1) = F_transition(s(t), a(t), w(t))

R_agg(t) = r_DR(t) − Σ_{i=1}^{N+L} f_i^unc(t) + λ(t)·(P_agg(t) − P_base(t))·Δt

f_i^unc(t) = α_i^U · (T_i(t) − T_i^U) when T_i(t) > T_i^U;  0 when T_i^L ≤ T_i(t) ≤ T_i^U;  α_i^L · (T_i^L − T_i(t)) when T_i(t) < T_i^L

wherein s(t+1) is the state variable of the aggregated load at time t+1, F_transition(s(t), a(t), w(t)) is the state transfer function of the aggregated load at time t, s(t) is the state variable of the aggregated load at time t, a(t) is the action variable of the aggregated load at time t, w(t) is the environment variable at time t, R_agg(t) is the return function of the aggregated load at time t, r_DR(t) is the total revenue of the aggregated load participating in demand response at time t, λ(t)·(P_agg(t) − P_base(t))·Δt is the reduction in electricity fee expenditure, f_i^unc is the user discomfort, T_i^L is the lower comfort temperature limit of the user, T_i^U is the upper comfort temperature limit of the user, T_i^min is the lower limit of the temperature variation of equipment i, T_i^max is the upper limit of the temperature variation of equipment i, N is the number of electric heating equipment, L is the number of electric water heating equipment, T_i(t) is the indoor temperature, α_i^U is the discomfort penalty factor for the indoor temperature T_i(t) exceeding the user's upper comfort temperature limit T_i^U, and α_i^L is the discomfort penalty factor for the indoor temperature T_i(t) falling below the user's lower comfort temperature limit T_i^L;
applying the aggregation model to a real-time regulation and control environment of the power system to obtain a return function of aggregation load participating in real-time response;
establishing an aggregated load real-time regulation and control deep reinforcement learning model, and training the aggregated load real-time regulation and control deep reinforcement learning model according to the state variable, the action variable, the state transfer function and the return function participating in real-time response of the aggregated load to obtain a real-time optimization regulation and control decision model for flexible load aggregation;
and inputting the state variable of the target aggregated load into a real-time optimization regulation and control decision model for flexible load aggregation to obtain an optimal strategy for real-time regulation and control of the aggregated load.
2. The method for real-time regulation and control of heterogeneous flexible loads based on deep reinforcement learning according to claim 1, wherein the single flexible load dynamic model comprises a load temperature control dynamic function, a user discomfort function and a reward function.
3. The heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning of claim 1, wherein the aggregate load real-time regulation and control deep reinforcement learning model is trained by adopting a deep Q value network algorithm.
4. The method for regulating and controlling the heterogeneous flexible load based on the deep reinforcement learning according to claim 3, wherein the loss function of the deep reinforcement learning model is as follows:
Loss(θ) = (1/m) · Σ_{j=1}^{m} ( y_j − Q(s_j, a_j | θ) )²

wherein y_j is the target value of the Q network function, m is the number of samples, θ is the weight coefficient of the Q network function, s_j is the state variable of the j-th sample, and a_j is the action variable of the j-th sample.
5. The heterogeneous flexible load real-time regulation and control method based on deep reinforcement learning according to claim 4, wherein training the deep reinforcement learning model for real-time regulation and control of the aggregated load to obtain the real-time optimal regulation and control decision model for flexible load aggregation comprises:
initializing a predicted Q network function and a target Q network function, and setting the number of iteration rounds to EP, the learning rate to α, the exploration rate to ε, and the maximum size of the experience replay pool to M;
collecting training samples and storing them in the experience replay pool;
randomly extracting a batch of n samples from the experience replay pool and calculating the loss function of the Q network function;
updating the weight coefficient θ of the Q network function by gradient descent;
continuously generating new samples, replacing the old samples in the experience replay pool with the new samples, and recalculating the loss function and the weight coefficient θ of the Q network function;
updating the weight coefficient θ' of the target Q network function;
checking whether the state variable s of the aggregated load has reached the final state; if so, emptying the experience replay pool, sampling again, and putting the new samples into the experience replay pool;
and judging whether the number of iteration rounds has reached EP; if so, ending the training; otherwise, continuing the iteration.
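Claim 5 describes a conventional DQN training procedure with an experience replay pool, ε-greedy exploration and a separate target network. The skeleton below follows those steps under a generic gym-style environment interface; the hyperparameter values, the target-network update frequency and the episode length are assumptions (the claim fixes only the symbols EP, α, ε, M and n), and the claim's step of emptying and refilling the replay pool at the final state is simplified here to simply starting a new episode. It reuses the QNet module and dqn_loss function from the previous sketch.

```python
import random
from collections import deque
import torch

def train_dqn(env, state_dim, n_actions, EP=200, alpha=1e-3, epsilon=0.1,
              M=10_000, n=64, target_sync=100, steps_per_round=96):
    """DQN training skeleton following the steps of claim 5 (hyperparameters assumed)."""
    q_net = QNet(state_dim, n_actions)             # predicted Q network, weights theta
    target_net = QNet(state_dim, n_actions)        # target Q network, weights theta'
    target_net.load_state_dict(q_net.state_dict())

    optimizer = torch.optim.SGD(q_net.parameters(), lr=alpha)  # gradient descent on theta
    replay = deque(maxlen=M)                        # experience replay pool, max size M
    total_steps = 0

    for episode in range(EP):                       # iteration rounds EP
        state = env.reset()
        for _ in range(steps_per_round):
            # Epsilon-greedy exploration with exploration rate epsilon.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    q = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
                    action = int(q.argmax(dim=1).item())

            next_state, reward, done = env.step(action)   # assumed env interface
            replay.append((state, action, reward, next_state, float(done)))
            state = next_state
            total_steps += 1

            if len(replay) >= n:
                # Random batch of n samples from the replay pool.
                samples = random.sample(replay, n)
                s, a, r, s2, d = map(list, zip(*samples))
                batch = (torch.as_tensor(s, dtype=torch.float32),
                         torch.as_tensor(a, dtype=torch.int64),
                         torch.as_tensor(r, dtype=torch.float32),
                         torch.as_tensor(s2, dtype=torch.float32),
                         torch.as_tensor(d, dtype=torch.float32))
                loss = dqn_loss(q_net, target_net, batch)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()                    # update theta by gradient descent

            if total_steps % target_sync == 0:
                # Periodically copy theta into the target-network weights theta'.
                target_net.load_state_dict(q_net.state_dict())

            if done:                                # final state reached: start a new episode
                break
    return q_net
```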
6. The method for regulating and controlling the heterogeneous flexible load based on the deep reinforcement learning in real time as claimed in claim 5, further comprising:
testing the real-time optimal regulation and control decision model for flexible load aggregation.
7. A heterogeneous flexible load real-time regulation and control device based on deep reinforcement learning, characterized by comprising:
a single flexible load modeling module, used for respectively establishing a single flexible load dynamic model for each type of heterogeneous flexible load of the power system, to obtain the state variable, action variable, environment variable and reward function of each single flexible load;
an aggregated load modeling module, used for establishing a heterogeneous flexible load aggregation model according to the state variables, action variables, environment variables and reward functions of all the single flexible loads, wherein the heterogeneous flexible load aggregation model comprises the state variable, state space, action variable, action space and state transition function of the aggregated load; wherein the heterogeneous flexible load aggregation model is as follows:
s(t+1) = F_transition(s(t), a(t), w(t))
Figure FDA0003750916910000031
Figure FDA0003750916910000032
wherein s(t+1) is the state variable of the aggregated load at time t+1, F_transition(s(t), a(t), w(t)) is the state transition function of the aggregated load at time t, s(t) is the state variable of the aggregated load at time t, a(t) is the action variable of the aggregated load at time t, w(t) is the environment variable at time t, R_agg(t) is the reward function of the aggregated load at time t, r_DR(t) is the total revenue of the aggregated load from participating in demand response at time t, λ(t)(P_agg(t) − P_base(t))Δt is the reduction in electricity charge expenditure, f_i^unc is the discomfort of user i, T_i^L is the lower comfort temperature limit of the user, T_i^U is the upper comfort temperature limit of the user, T_i^min is the lower limit of the temperature variation of device i, T_i^max is the upper limit of the temperature variation of device i, N is the number of electric heating devices, L is the number of electric water heating devices, T_i(t) is the indoor temperature,
Figure FDA0003750916910000033
is the discomfort penalty factor applied when the indoor temperature T_i(t) exceeds the user's upper comfort temperature limit T_i^U, and
Figure FDA0003750916910000034
is the discomfort penalty factor applied when the indoor temperature T_i(t) falls below the user's lower comfort temperature limit T_i^L;
an application module, used for applying the aggregation model to the real-time regulation and control environment of the power system to obtain the reward function of the aggregated load participating in real-time response;
a deep reinforcement learning module, used for establishing a deep reinforcement learning model for real-time regulation and control of the aggregated load, and training the model according to the state variable, action variable, state transition function and reward function of the aggregated load participating in real-time response, to obtain a real-time optimal regulation and control decision model for flexible load aggregation;
and a strategy output module, used for inputting the state variable of the target aggregated load into the real-time optimal regulation and control decision model for flexible load aggregation to obtain the optimal strategy for real-time regulation and control of the aggregated load.
8. The device for real-time regulation and control of heterogeneous flexible loads based on deep reinforcement learning of claim 7, wherein the single flexible load dynamic model comprises a load temperature control dynamic function, a user discomfort function and a reward function.
9. The device for real-time regulation and control of heterogeneous flexible loads based on deep reinforcement learning according to claim 7, further comprising:
a model testing module, used for testing the real-time optimal regulation and control decision model for flexible load aggregation.
CN202011389959.0A 2020-12-02 2020-12-02 Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning Active CN112488531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011389959.0A CN112488531B (en) 2020-12-02 2020-12-02 Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011389959.0A CN112488531B (en) 2020-12-02 2020-12-02 Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112488531A CN112488531A (en) 2021-03-12
CN112488531B true CN112488531B (en) 2022-09-06

Family

ID=74939630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011389959.0A Active CN112488531B (en) 2020-12-02 2020-12-02 Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112488531B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723798B (en) * 2021-08-27 2022-11-11 广东电网有限责任公司 Demand response control method and system based on online deep reinforcement learning
CN115549109A (en) * 2022-09-15 2022-12-30 清华大学 Mass flexible load rapid aggregation control method and device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3398116A1 (en) * 2015-12-31 2018-11-07 Vito NV Methods, controllers and systems for the control of distribution systems using a neural network architecture

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107453356A (en) * 2017-08-21 2017-12-08 南京邮电大学 User side flexible load dispatching method based on adaptive Dynamic Programming
CN107800157A (en) * 2017-11-14 2018-03-13 武汉大学 The virtual power plant dual-layer optimization dispatching method of the temperature control load containing polymerization and new energy
CN108964042A (en) * 2018-07-24 2018-12-07 合肥工业大学 Regional power grid operating point method for optimizing scheduling based on depth Q network
WO2020037127A1 (en) * 2018-08-17 2020-02-20 Dauntless.Io, Inc. Systems and methods for modeling and controlling physical dynamical systems using artificial intelligence
CN110705737A (en) * 2019-08-09 2020-01-17 四川大学 Comprehensive optimization configuration method for multiple energy storage capacities of multi-energy microgrid
CN111709672A (en) * 2020-07-20 2020-09-25 国网黑龙江省电力有限公司 Virtual power plant economic dispatching method based on scene and deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Optimal scheduling of distributed electric heating participating in demand response based on deep reinforcement learning; Yan Gangui et al.; Power System Technology; 2020-11-05; Vol. 44, No. 11; pp. 4140-4147 *

Also Published As

Publication number Publication date
CN112488531A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
Wei et al. A bi-level scheduling model for virtual power plants with aggregated thermostatically controlled loads and renewable energy
Pinto et al. Data-driven district energy management with surrogate models and deep reinforcement learning
Pinto et al. Coordinated energy management for a cluster of buildings through deep reinforcement learning
Javaid et al. Towards buildings energy management: using seasonal schedules under time of use pricing tariff via deep neuro-fuzzy optimizer
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
CN111695793B (en) Method and system for evaluating energy utilization flexibility of comprehensive energy system
Kou et al. Model-based and data-driven HVAC control strategies for residential demand response
CN112488531B (en) Heterogeneous flexible load real-time regulation and control method and device based on deep reinforcement learning
CN113112077B (en) HVAC control system based on multi-step prediction deep reinforcement learning algorithm
CN112036934A (en) Quotation method for participation of load aggregators in demand response considering thermoelectric coordinated operation
CN114623569B (en) Cluster air conditioner load differential regulation and control method based on deep reinforcement learning
Kong et al. Refined peak shaving potential assessment and differentiated decision-making method for user load in virtual power plants
CN104036328A (en) Self-adaptive wind power prediction system and prediction method
Bian et al. Demand response model identification and behavior forecast with OptNet: A gradient-based approach
Feng et al. Economic dispatch of industrial park considering uncertainty of renewable energy based on a deep reinforcement learning approach
Zhang et al. Networked Multiagent-Based Safe Reinforcement Learning for Low-Carbon Demand Management in Distribution Networks
Wu et al. Virtual-real interaction control of hybrid load system for low-carbon energy services
CN113591391A (en) Power load control device, control method, terminal, medium and application
Amadeh et al. Building cluster demand flexibility: An innovative characterization framework and applications at the planning and operational levels
Lénet et al. An inverse nash mean field game-based strategy for the decentralized control of thermostatic loads
Coffman et al. A model-free method for learning flexibility capacity of loads providing grid support
Pandiyan et al. Recursive training based physics-inspired neural network for electric water heater modeling
Liu et al. A Real-time Demand Response Strategy of Home Energy Management by Using Distributed Deep Reinforcement Learning
CN112052989B (en) Risk cost allocation method for comprehensive energy resource sharing community
Amasyali et al. Deep Reinforcement Learning for Autonomous Water Heater Control. Buildings 2021, 11, 548

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant