CN113112077B - HVAC control system based on multi-step prediction deep reinforcement learning algorithm - Google Patents

HVAC control system based on multi-step prediction deep reinforcement learning algorithm Download PDF

Info

Publication number
CN113112077B
Authority
CN
China
Prior art keywords
neural network
value
output
current
environment temperature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110403130.XA
Other languages
Chinese (zh)
Other versions
CN113112077A (en)
Inventor
任密蜂
刘祥飞
杨之乐
张建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Technology
Priority to CN202110403130.XA
Publication of CN113112077A
Application granted
Publication of CN113112077B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Electricity, gas or water supply
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00: Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00: Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70: Smart grids as climate change mitigation technology in the energy generation sector

Abstract

The invention relates to an intelligent control method for a system that controls temperature, humidity, air cleanliness and air circulation (HVAC), in particular to an HVAC control system based on a Long Short-Term Memory neural network (LSTM) with a generalized correntropy (GC) loss function and a Deep Reinforcement Learning (DRL) algorithm. The method comprises the following steps: collecting outdoor ambient temperature, indoor ambient temperature and grid electricity-price information; preprocessing the collected data; predicting the future multi-step outdoor ambient temperature from historical outdoor-temperature data; and controlling the power output of the HVAC system with the Deep Deterministic Policy Gradient (DDPG) algorithm of DRL, based on the predicted future outdoor temperatures, the indoor ambient temperature and the grid electricity-price information. The invention can intelligently control the HVAC system in real time to reduce the user's cost while ensuring the user's satisfaction, and has high practical engineering application value.

Description

HVAC control system based on multi-step prediction deep reinforcement learning algorithm
Technical Field
The invention relates to a method for intelligent, optimal control of an HVAC system, in particular to a method for intelligently controlling the HVAC system based on a GC-LSTM neural network and a DRL algorithm.
Background
Household users are the terminal users of the power grid; their electricity-use habits, together with the addition of distributed renewable energy sources, directly cause peaks and troughs in grid load, which can severely impact and threaten the power grid. With the development of the smart grid and the implementation of demand-response strategies in recent years, residential users have changed from passively drawing power to actively participating in the grid; in the smart-grid environment, the grid's electricity-price and generation-capacity information communicates bidirectionally with users' demand information. Within a household, the air-conditioning system accounts for about 35% of total electricity consumption, so intelligently controlling the output power of the HVAC system according to the grid electricity price and the ambient temperature, on the premise of maintaining a certain user comfort, is of great significance for reducing electricity use, lowering user cost and mitigating the greenhouse effect.
At present, HVAC systems mainly adopt traditional closed-loop control and model predictive control algorithms. In closed-loop control, a temperature sensor is installed indoors and the HVAC system stops working when the indoor temperature reaches a set value; such a system is simple to operate and easy to implement, but in a smart-grid environment with demand-response strategies it can hardly adjust its power according to dynamic electricity prices to meet energy-saving and emission-reduction targets. Model predictive control steers the HVAC system by establishing an accurate model of the indoor temperature variation, yet the complexity of indoor-temperature dynamics limits the accuracy of such modeling. With the development of intelligent algorithms, researchers have also proposed optimizing the power output of the HVAC system with particle swarm optimization and genetic algorithms under a real-time electricity-price mechanism to reduce user cost; however, these algorithms are difficult to tune, they ignore the delay with which the HVAC system's power output affects the indoor temperature, and they therefore do not truly guarantee user comfort. It is thus necessary to predict future outdoor ambient temperature values first.
Disclosure of Invention
The invention provides a control method for an HVAC control system based on a multi-step prediction deep reinforcement learning algorithm, addressing the nonlinearity and randomness of the outdoor ambient temperature and the smart-grid electricity price, and the time delay between the HVAC system's output power and the resulting indoor-temperature change.
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm is realized with the following technical scheme; its model structure is shown in FIG. 1. The system comprises two stages, multi-step prediction of the outdoor ambient temperature and real-time control of the indoor temperature, wherein the outdoor-temperature prediction stage comprises the following steps:
Step one: according to actual data points of the outdoor environment, the outdoor ambient temperature at i consecutive moments, X = [T_1, …, T_i], is selected as the input of the multi-step temperature prediction model, with h = [h_{i+1}, …, h_{i+n}] as the real output of the model, where n is the number of prediction steps;
Step two: the acquired data are preprocessed, abnormal data are corrected, and the time-series data are converted into supervised-sequence data;
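For illustration (not part of the claimed method), a minimal Python sketch of this conversion from a time series to supervised input/output pairs; the window length i and horizon n are example values:

    import numpy as np

    def series_to_supervised(series, i=6, n=3):
        # Slide a window over a 1-D temperature series to build
        # (input, target) pairs: i past values -> n future values.
        X, Y = [], []
        for t in range(len(series) - i - n + 1):
            X.append(series[t:t + i])          # past i temperatures
            Y.append(series[t + i:t + i + n])  # next n temperatures
        return np.asarray(X), np.asarray(Y)

    temps = np.sin(np.linspace(0, 20, 500)) * 10 + 15  # synthetic 30-min samples
    X, Y = series_to_supervised(temps, i=6, n=3)
    print(X.shape, Y.shape)  # (492, 6) (492, 3)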
Step three: the input quantity is fed into a long short-term memory neural network based on a generalized correntropy (GC) loss function, and the forget gate, input gate and output gate of the long short-term memory neural network are used to forget, memorize and learn the input quantity; the nonlinear regression model of the long short-term memory neural network based on the generalized correntropy loss function is described as follows:
1) The input X = [T_1, …, T_i] is fed into the first block of the long short-term memory neural network. Through a sigmoid function, the forget gate determines how much of the input information X_t at the current moment and the output information h_{t-1} at the previous moment can be retained by the current block, i.e. the output of the forget gate is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weight and bias value of the neural network and σ denotes the sigmoid function;
2) The input gate determines the information to be updated: first the update information i_t = σ(w_i[h_{t-1}, X_t] + b_i) is determined through the σ function, and then a new candidate value c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c) is generated through the tanh function. Finally the candidate value c_t of the current block is determined jointly by the output of the forget gate, the output of the input gate, the new candidate value and the candidate value of the previous block, that is: c_t = f_t ∗ c_{t-1} + i_t ∗ c̃_t, where w_i, w_c, b_i and b_c are the parameter values of the input-gate neural network;
3) The output gate obtains the output of the model: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained through the σ function, and then the candidate value information c_t from 2) is scaled to a value between −1 and 1 through the activation function tanh, finally giving the output of the model h_t = o_t ∗ tanh(c_t), where w_o and b_o are the parameters of the output-gate neural network;
4) The error between the true value Y_t and the predicted value h_t is calculated on the basis of the GC loss function, as in the following equation:
L_GC = G_{α,β}(0) − (1/N) Σ_{i=1}^{N} G_{α,β}(Y_i − h_i), with G_{α,β}(e) = α / (2βΓ(1/α)) · exp(−|e/β|^α),
where G_{α,β}(0) is the zero-mean generalized Gaussian density function, (1/N) Σ_{i=1}^{N} G_{α,β}(Y_i − h_i) is the sample estimate of the correntropy between the predicted and true values, N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter; training is iterated many times, and the weight w and bias value b of the neural network are updated by the mini-batch gradient descent method so that the error between the true and predicted values is minimized;
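For illustration, a minimal NumPy sketch of this generalized correntropy loss as reconstructed above; α and β are example choices of the tunable shape and bandwidth parameters:

    import numpy as np
    from math import gamma

    def generalized_gaussian(e, alpha=2.0, beta=1.0):
        # G_{alpha,beta}(e) = alpha / (2*beta*Gamma(1/alpha)) * exp(-|e/beta|^alpha)
        coef = alpha / (2.0 * beta * gamma(1.0 / alpha))
        return coef * np.exp(-np.abs(e / beta) ** alpha)

    def gc_loss(y_true, y_pred, alpha=2.0, beta=1.0):
        # G(0) minus the sample correntropy estimate; 0 only for a perfect fit
        e = y_true - y_pred
        return (generalized_gaussian(0.0, alpha, beta)
                - np.mean(generalized_gaussian(e, alpha, beta)))

    y = np.array([20.1, 19.8, 21.0])  # true temperatures
    h = np.array([20.0, 20.2, 20.7])  # predicted temperatures
    print(gc_loss(y, h))              # small non-negative value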
Step four: finally, based on the long short-term memory neural network with the generalized correntropy loss function, a nonlinear mapping model from the outdoor ambient temperature at the previous i moments to the outdoor ambient temperature at the next n moments is obtained;
the real-time control of the indoor temperature comprises the following steps:
Step one: the outdoor ambient temperature at i consecutive moments, X = [T_1, …, T_i], is collected, and the outdoor ambient temperature at the next n consecutive moments, h = [h_{i+1}, …, h_{i+n}], is obtained from the long short-term memory neural network based on the generalized correntropy loss function; the grid electricity price ρ_t and the indoor temperature T_t^in at the current moment are obtained, and h, ρ_t and T_t^in are taken together as the environment information, that is: S_t = {h, ρ_t, T_t^in};
Step two: the current state information S_t is input to the Actor current neural network of the deep-reinforcement-learning DDPG algorithm, and an action a_t = μ(S_t|θ^μ) + N_t is selected based on the current policy μ(S_t|θ^μ) and Gaussian noise N_t, with a_t ∈ [P_min, P_max]. The Gaussian noise N_t serves to increase the exploration rate of the action and is reduced as the number of iteration cycles increases; θ^μ is the parameter of the Actor current neural network, and P_min and P_max are the minimum and maximum output power of the HVAC system, respectively;
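For illustration, a sketch of this noisy action selection; the actor is assumed to be any callable mapping state to power, and the decay schedule and power bounds are example values:

    import numpy as np

    P_MIN, P_MAX = 0.0, 2.0  # example HVAC power bounds (kW)

    def select_action(actor, state, episode, sigma0=0.5, decay=0.995):
        # a_t = mu(S_t) + N_t, with Gaussian exploration noise that
        # shrinks as training proceeds, clipped to [P_min, P_max]
        sigma = sigma0 * (decay ** episode)
        noise = np.random.normal(0.0, sigma)
        return float(np.clip(actor(state) + noise, P_MIN, P_MAX))

    a = select_action(lambda s: 1.0, state=None, episode=10)  # dummy actor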
Step three: the action a_t is executed to control the output power of the air conditioner. The power output of the HVAC system changes the indoor ambient temperature according to
T_{t+1}^in = εT_t^in + (1 − ε)(h_{t+1} − η_HVAC · a_t / A),
and a timely reward r_t, which trades off electricity cost against user comfort, is then obtained and the next state S_{t+1} is reached; ε, η_HVAC and A are the inertia coefficient, the thermal conversion efficiency and the overall thermal conductivity of the HVAC system, respectively;
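For illustration, a sketch of this indoor-temperature transition; the formula is the reconstruction given above, and ε, η_HVAC and A are placeholder constants:

    def indoor_step(T_in, T_out_next, a, eps=0.7, eta=2.5, A=0.14):
        # T_in_next = eps*T_in + (1 - eps)*(T_out_next - eta*a/A)
        # (assumed thermal model with example constants; a is HVAC power)
        return eps * T_in + (1.0 - eps) * (T_out_next - eta * a / A)

    T_next = indoor_step(T_in=26.0, T_out_next=30.0, a=1.0)  # cooling at 1 kW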
Step four: (S_t, a_t, r_t, S_{t+1}) is stored into the experience pool buff-C;
Step five: if the amount of data in the experience pool buff-C is larger than the sampling number M, M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, are taken randomly from the experience pool buff-C, where r_i is the reward of sample i, and the following steps are performed; otherwise, step eleven is performed directly;
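For illustration, a minimal sketch of such an experience pool; the capacity is an arbitrary example:

    import random
    from collections import deque

    class ReplayBuffer:
        # experience pool buff-C: stores (S_t, a_t, r_t, S_{t+1}) tuples
        def __init__(self, capacity=100000):
            self.buf = deque(maxlen=capacity)

        def push(self, s, a, r, s_next):
            self.buf.append((s, a, r, s_next))

        def sample(self, m):
            return random.sample(self.buf, m)  # uniform random minibatch

        def __len__(self):
            return len(self.buf)

    # as in step five, updates start only once len(buffer) > M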
Step six: the target expected value y_i = r_i + γQ′(S_{i+1}, μ′(S_{i+1}|θ^{μ′})|θ^{Q′}) is calculated, where μ′(S_{i+1}|θ^{μ′}) is the optimal action obtained from the Actor target neural network, Q′(S_{i+1}, μ′(S_{i+1}|θ^{μ′})|θ^{Q′}) is the future target value that the Critic target network Q′ outputs based on the state information and the optimal-action information at the next moment, γ is the discount factor, and θ^{μ′} and θ^{Q′} are the parameters of the Actor target neural network and of the Critic target network, respectively;
Step seven: the Critic current neural network Q of the DDPG algorithm evaluates the action a_t taken, calculating the evaluation value Q(S_t, a_t|θ^Q), where θ^Q is the parameter of the Critic current neural network;
Step eight: the error between the target expected value and the evaluation value of the samples is calculated with the mean squared error
L = (1/M) Σ_{i=1}^{M} (y_i − Q(S_i, a_i|θ^Q))²,
and the parameter θ^Q of the Critic current neural network is updated by the mini-batch gradient descent method;
Step nine: the parameter θ^μ of the Actor current neural network is updated with the sampled policy gradient; the loss gradient ∇_{θ^μ}J is given by the following equation:
∇_{θ^μ}J ≈ (1/M) Σ_{i=1}^{M} ∇_a Q(S_i, a|θ^Q)|_{a=μ(S_i)} ∇_{θ^μ} μ(S_i|θ^μ);
Step ten: the parameters of the Critic and Actor current neural networks are soft-copied to the parameters of the Critic and Actor target neural networks with the proportionality coefficient τ, that is:
θ^{Q′} ← τθ^Q + (1 − τ)θ^{Q′},
θ^{μ′} ← τθ^μ + (1 − τ)θ^{μ′};
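For illustration, steps six to ten amount to one DDPG update; a condensed PyTorch sketch, assuming the actor/critic networks and their targets are ordinary nn.Module instances and that γ and τ are example values:

    import torch

    def ddpg_update(batch, actor, critic, actor_t, critic_t,
                    actor_opt, critic_opt, gamma=0.99, tau=0.005):
        s, a, r, s_next = batch  # minibatch tensors of size M
        # step six: target expected value y_i
        with torch.no_grad():
            y = r + gamma * critic_t(s_next, actor_t(s_next))
        # steps seven and eight: evaluate, then fit the current critic
        critic_loss = torch.mean((y - critic(s, a)) ** 2)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
        # step nine: ascend the policy gradient via Q(s, mu(s))
        actor_loss = -critic(s, actor(s)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # step ten: soft-copy current parameters into the targets
        for net, net_t in ((critic, critic_t), (actor, actor_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)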
Step eleven: the state at the next moment is taken as the state at the current moment, that is: S_t ← S_{t+1}. Steps one to eleven are iterated in a loop until a converged Actor current neural network is finally obtained; the parameter θ^μ of the neural network is output to obtain the final HVAC control system model, and step twelve is then performed;
Step twelve: the current state information S_t is input to the Actor current neural network of the deep-reinforcement-learning DDPG algorithm, an action a_t is selected based on the optimal policy, and the action a_t is executed to control the power output of the HVAC system.
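For illustration, deployment then reduces to this greedy policy; a sketch of the resulting control loop, where get_state and apply_power are hypothetical interfaces to the sensors and the HVAC actuator:

    import time

    def control_loop(actor, get_state, apply_power, interval_s=1800):
        # act greedily with the converged actor (no exploration noise)
        while True:
            S_t = get_state()   # {predicted outdoor temps, price, indoor temp}
            a_t = actor(S_t)    # optimal power within [P_min, P_max]
            apply_power(a_t)
            time.sleep(interval_s)  # 30-minute decision interval, as sampled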
Drawings
FIG. 1 is a schematic diagram of the establishment of an HVAC intelligent control system.
Fig. 2 is a graph of loss functions of the outdoor environment temperature training set and the test set in the debugging stage, where 1 represents a loss function curve of the outdoor environment temperature training set, and 2 represents a loss function curve of the outdoor environment temperature test set.
Fig. 3 is a graph showing a real value and a predicted value of the outdoor environment temperature test set at the debugging stage, where 3 represents the predicted value of the outdoor environment temperature test set, and 4 represents the real value of the outdoor environment temperature test set.
Detailed Description
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm is trained and tested below, taking collected real ambient-temperature data as the experimental object.
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm comprises two stages of multi-step prediction of outdoor environment temperature and real-time control of indoor temperature, wherein the prediction stage of the outdoor environment temperature comprises the following steps:
Step one: according to actual data points of the outdoor environment, the outdoor ambient temperature at 6 consecutive moments, X = [T_1, …, T_6], is selected as the input of the model, with h = [h_7, …, h_{6+n}] as the real output of the model; the sampling interval is 30 minutes.
Step two: the acquired data are preprocessed, abnormal data are corrected, the time-series data are converted into supervised-sequence data, and the data are divided into a training set of 2500 samples and a test set of 1000 samples.
Step three: the number of cells of the long short-term memory neural network is set to 100, the number of training epochs to 500, the learning rate to 0.001 and the batch size of the mini-batch gradient descent method to 32;
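For illustration, a minimal PyTorch sketch of a predictor under these settings; the single-layer topology and output head are assumptions, and gc_loss mirrors the loss defined earlier:

    import torch
    import torch.nn as nn

    class GCLSTM(nn.Module):
        # LSTM with 100 cells mapping 6 past temperatures to n future ones
        def __init__(self, n_steps=3):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=100, batch_first=True)
            self.head = nn.Linear(100, n_steps)

        def forward(self, x):             # x: [batch, 6, 1]
            out, _ = self.lstm(x)
            return self.head(out[:, -1])  # predictions: [batch, n_steps]

    def gc_loss(y, h, alpha=2.0, beta=1.0):
        coef = alpha / (2 * beta * torch.exp(torch.lgamma(torch.tensor(1.0 / alpha))))
        g = lambda e: coef * torch.exp(-torch.abs(e / beta) ** alpha)
        return g(torch.tensor(0.0)) - g(y - h).mean()

    model = GCLSTM()
    opt = torch.optim.Adam(model.parameters(), lr=0.001)  # 500 epochs, batch 32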
Step four: the input quantities of the training set are input into the long short-term memory neural network based on the generalized correntropy loss function, and the forget gate, input gate and output gate of the long short-term memory neural network are used to forget, memorize and learn the input quantities; the nonlinear regression process of the long short-term memory neural network based on the generalized correntropy loss function is described as follows:
1) The input X = [T_1, …, T_i] is fed into the first block of the long short-term memory neural network. Through a sigmoid function, the forget gate determines how much of the input information X_t at the current moment and the output information h_{t-1} at the previous moment can be retained by the current block, i.e. the output of the forget gate is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weight and bias value of the neural network and σ denotes the sigmoid function;
2) The input gate determines the information to be updated: first the update information i_t = σ(w_i[h_{t-1}, X_t] + b_i) is determined through the σ function, and then a new candidate value c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c) is generated through the tanh function. Finally the candidate value c_t of the current block is determined jointly by the output of the forget gate, the output of the input gate, the new candidate value and the candidate value of the previous block, that is: c_t = f_t ∗ c_{t-1} + i_t ∗ c̃_t, where w_i, w_c, b_i and b_c are the parameter values of the input-gate neural network;
3) The output gate obtains the output of the model: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained through the σ function, and then the candidate value information c_t from 2) is scaled to a value between −1 and 1 through the activation function tanh, finally giving the output of the model h_t = o_t ∗ tanh(c_t), where w_o and b_o are the parameters of the output-gate neural network;
4) The error between the true value Y_t and the predicted value h_t is calculated on the basis of the GC loss function, as in the following equation:
L_GC = G_{α,β}(0) − (1/N) Σ_{i=1}^{N} G_{α,β}(Y_i − h_i), with G_{α,β}(e) = α / (2βΓ(1/α)) · exp(−|e/β|^α),
where G_{α,β}(0) is the zero-mean generalized Gaussian density function, (1/N) Σ_{i=1}^{N} G_{α,β}(Y_i − h_i) is the sample estimate of the correntropy between the predicted and true values, N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter; training is iterated many times, and the weight w and bias value b of the neural network are updated by the mini-batch gradient descent method so that the error between the true and predicted values is minimized;
Step five: finally, based on the long short-term memory neural network with the generalized correntropy loss function, a nonlinear mapping model from the outdoor ambient temperature at the previous i moments to the outdoor ambient temperature at the next n moments is obtained, and the accuracy of the model is tested with the test set;
Step six: the accuracy of the model is tested with the test set, using the root mean square error (RMSE) between the true and predicted values, the probability density distribution of the error, and R² as the evaluation indexes of the model. They are defined as:
RMSE = sqrt((1/m) Σ_{i=1}^{m} (y_i − h_i)²),
R² = 1 − Σ_{i=1}^{m} (y_i − h_i)² / Σ_{i=1}^{m} (y_i − ȳ)²,
p(e) = (1/(mσ)) Σ_{i=1}^{m} k((e − e_i)/σ),
where y_i and h_i are the corresponding true and predicted values of each step, ȳ is the mean of the true samples in each step, m is the number of samples in the test set, k(·) is a Gaussian kernel function, e_i = y_i − h_i is the prediction error and σ is the window width; the probability density function of the error is implemented in a sliding-window manner.
The real-time control of the indoor temperature comprises the following steps:
Step one: the outdoor ambient temperature at 6 consecutive moments, X = [T_1, …, T_6], is acquired, and based on the GC-LSTM neural network model the outdoor ambient temperature at the next 3 consecutive moments, h = [h_7, h_8, h_9], is obtained; the grid electricity price ρ_t and the indoor temperature T_t^in at the current moment are obtained, and the data are divided into a training set of 2500 samples and a test set of 1000 samples. h, ρ_t and T_t^in are taken together as the environment information, that is: S_t = {h, ρ_t, T_t^in};
Step two: the deep-reinforcement-learning DDPG algorithm is set up with four neural networks: the Actor current neural network and the Actor target neural network are three-layer neural networks with the same structure whose hidden-layer activation function is tanh, and the Critic current neural network and the Critic target neural network have the same neural network structure with hidden-layer activation function relu;
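For illustration, a sketch of these four networks; only the three-layer structure and the tanh/relu activations are stated in the text, so the widths and the state dimension are assumptions:

    import copy
    import torch
    import torch.nn as nn

    STATE_DIM = 5  # 3 predicted outdoor temps + price + indoor temp

    class Critic(nn.Module):
        # three layers, relu hidden activations, input = state plus action
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM + 1, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1))

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1))

    actor = nn.Sequential(  # three layers, tanh hidden activations
        nn.Linear(STATE_DIM, 64), nn.Tanh(),
        nn.Linear(64, 64), nn.Tanh(),
        nn.Linear(64, 1))
    critic = Critic()
    actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)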
Step three: the current state information S_t in the training set is input to the Actor current neural network, and an action a_t = μ(S_t|θ^μ) + N_t is selected based on the current policy and Gaussian noise N_t, with a_t ∈ [P_min, P_max], where P_min and P_max are the minimum and maximum output power of the HVAC system, respectively;
Step four: the action a_t is executed to control the output power of the air conditioner, and then a timely reward r_t is obtained and the next state S_{t+1} is reached. The reward r_t is related to the electricity cost and the comfort of the user, as follows:
r_t = −λ_1 ρ_t a_t − λ_2 (max(T_t^in − T_max, 0) + max(T_min − T_t^in, 0)),
where T_min and T_max are the minimum and maximum comfortable temperatures, respectively, and λ_1 and λ_2 are weighting factors for balancing the two reward terms;
Step five: (S_t, a_t, r_t, S_{t+1}) is stored into the experience pool buff-C;
Step six: M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, are then taken randomly from the experience pool buff-C;
Step seven: based on the state at the next moment and the action obtained from the Actor target network, the target expected value y_i = r_i + γQ′(S_{i+1}, μ′(S_{i+1}|θ^{μ′})|θ^{Q′}) is calculated;
Step eight: the Critic current neural network Q of the DDPG algorithm evaluates the action a_t taken, calculating the evaluation value Q(S_i, a_i|θ^Q);
Step nine: calculating an error value between a target desired value and an evaluation value of a sample using a root mean square error
Figure GDA0003608820890000061
Updating the parameters of the Critic current neural network by using a minimum batch gradient descent method;
Step ten: the parameter θ^μ of the Actor current neural network is updated with the sampled policy gradient; the loss gradient ∇_{θ^μ}J is given by the following equation:
∇_{θ^μ}J ≈ (1/M) Σ_{i=1}^{M} ∇_a Q(S_i, a|θ^Q)|_{a=μ(S_i)} ∇_{θ^μ} μ(S_i|θ^μ);
Step eleven: the parameters of the Critic and Actor current neural networks are respectively soft-copied to the parameters of the Critic and Actor target neural networks with the proportionality coefficient τ:
θ^{Q′} ← τθ^Q + (1 − τ)θ^{Q′},
θ^{μ′} ← τθ^μ + (1 − τ)θ^{μ′};
Step twelve: a converged Actor current neural network is obtained through training on the training set, and the parameter θ^μ of the neural network is output; the reward value obtained in each training iteration and the error value L of each step are used as the indexes for judging network convergence;
Step thirteen: the current state information S_t of the test set is input to the Actor current neural network of the DDPG algorithm, an action a_t is selected based on the optimal policy, and the action a_t is executed to control the power output of the HVAC system; the electricity cost of the HVAC system and the comfort of the user are used as the performance indexes of the system.
The invention has the advantages that: the long short-term memory neural network is used to predict the future outdoor ambient temperature, which improves the comfort of the user, and the generalized correntropy loss function is used as the loss function of the long short-term memory neural network to improve the prediction accuracy; then, based on the DDPG algorithm, the power output of the HVAC system is intelligently adjusted according to changes in the grid electricity price, the indoor temperature and the future outdoor temperature, saving the user's electricity cost while guaranteeing the user's comfort.
The above description is only an example of the present invention, and the structural features of the present invention are not limited thereto; any changes or modifications made by those skilled in the art within the scope of the present invention are covered by the present invention.

Claims (1)

1. An HVAC control system based on a multi-step prediction deep reinforcement learning algorithm, characterized in that it comprises two stages, multi-step prediction of the outdoor ambient temperature and real-time control of the indoor temperature, wherein the outdoor-temperature prediction stage comprises the following steps:
Step one: according to actual data points of the outdoor environment, the outdoor ambient temperature at i consecutive moments, X = [T_1, …, T_i], is selected as the input of the multi-step temperature prediction model, with h = [h_{i+1}, …, h_{i+n}] as the real output of the model, where n is the number of prediction steps;
Step two: the acquired data are preprocessed, abnormal data are corrected, and the time-series data are converted into supervised-sequence data;
Step three: the input quantity is fed into a long short-term memory neural network based on a generalized correntropy loss function, and the forget gate, input gate and output gate of the long short-term memory neural network are used to forget, memorize and learn the input quantity; the nonlinear regression model of the long short-term memory neural network based on the generalized correntropy loss function is described as follows:
1) The input X = [T_1, …, T_i] is fed into the first block of the long short-term memory neural network. Through a sigmoid function, the forget gate determines how much of the input information X_t at the current moment and the output information h_{t-1} at the previous moment can be retained by the current block, i.e. the output of the forget gate is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weight and bias value of the neural network and σ denotes the sigmoid function;
2) The input gate determines the information to be updated: first the update information i_t = σ(w_i[h_{t-1}, X_t] + b_i) is determined through the σ function, and then a new candidate value c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c) is generated through the tanh function. Finally the candidate value c_t of the current block is determined jointly by the output of the forget gate, the output of the input gate, the new candidate value and the candidate value of the previous block, that is: c_t = f_t ∗ c_{t-1} + i_t ∗ c̃_t, where w_i, w_c, b_i and b_c are the parameter values of the input-gate neural network;
3) The output gate obtains the output of the model: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained through the σ function, and then the candidate value information c_t from 2) is scaled to a value between −1 and 1 through the activation function tanh, finally giving the output of the model h_t = o_t ∗ tanh(c_t), where w_o and b_o are the parameters of the output-gate neural network;
4) The error between the true value Y_t and the predicted value h_t is calculated on the basis of the generalized correntropy loss function, as shown in the following equation:
L_GC = G_{α,β}(0) − (1/N) Σ_{i=1}^{N} G_{α,β}(Y_i − h_i), with G_{α,β}(e) = α / (2βΓ(1/α)) · exp(−|e/β|^α),
where G_{α,β}(0) is the zero-mean generalized Gaussian density function, (1/N) Σ_{i=1}^{N} G_{α,β}(Y_i − h_i) is the sample estimate of the correntropy between the predicted and true values, N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter; training is iterated many times, and the weight w and bias value b of the neural network are updated by the mini-batch gradient descent method so that the error between the true and predicted values is minimized;
Step four: finally, based on the long short-term memory neural network with the generalized correntropy loss function, a nonlinear mapping model from the outdoor ambient temperature at the previous i moments to the outdoor ambient temperature at the next n moments is obtained;
the real-time control of the indoor temperature comprises the following steps:
Step one: the outdoor ambient temperature at i consecutive moments, X = [T_1, …, T_i], is collected, and the outdoor ambient temperature at the next n consecutive moments, h = [h_{i+1}, …, h_{i+n}], is obtained from the long short-term memory neural network based on the generalized correntropy loss function; the grid electricity price ρ_t and the indoor temperature T_t^in at the current moment are obtained, and h, ρ_t and T_t^in are taken together as the environment information, that is: S_t = {h, ρ_t, T_t^in};
Step two: the current state information S_t is input to the Actor current neural network of the deep-reinforcement-learning DDPG algorithm, and an action a_t = μ(S_t|θ^μ) + N_t is selected based on the current policy μ(S_t|θ^μ) and Gaussian noise N_t, with a_t ∈ [P_min, P_max]. The Gaussian noise N_t serves to increase the exploration rate of the action and is reduced as the number of iteration cycles increases; θ^μ is the parameter of the Actor current neural network, and P_min and P_max are the minimum and maximum output power of the HVAC system, respectively;
Step three: the action a_t is executed to control the output power of the air conditioner. The power output of the HVAC system changes the indoor ambient temperature according to
T_{t+1}^in = εT_t^in + (1 − ε)(h_{t+1} − η_HVAC · a_t / A),
and a timely reward r_t, which trades off electricity cost against user comfort, is then obtained and the next state S_{t+1} is reached; ε, η_HVAC and A are the inertia coefficient, the thermal conversion efficiency and the overall thermal conductivity of the HVAC system, respectively;
Step four: (S_t, a_t, r_t, S_{t+1}) is stored into the experience pool buff-C;
Step five: if the amount of data in the experience pool buff-C is larger than the sampling number M, M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, are taken randomly from the experience pool buff-C, where r_i is the reward of sample i, and the following steps are performed; otherwise, step eleven is performed directly;
Step six: the target expected value y_i = r_i + γQ′(S_{i+1}, μ′(S_{i+1}|θ^{μ′})|θ^{Q′}) is calculated, where μ′(S_{i+1}|θ^{μ′}) is the optimal action obtained from the Actor target neural network, Q′(S_{i+1}, μ′(S_{i+1}|θ^{μ′})|θ^{Q′}) is the future target value that the Critic target network Q′ outputs based on the state information and the optimal-action information at the next moment, γ is the discount factor, and θ^{μ′} and θ^{Q′} are the parameters of the Actor target neural network and of the Critic target network, respectively;
Step seven: the Critic current neural network Q of the DDPG algorithm evaluates the action a_t taken, calculating the evaluation value Q(S_t, a_t|θ^Q), where θ^Q is the parameter of the Critic current neural network;
Step eight: the error between the target expected value and the evaluation value of the samples is calculated with the mean squared error
L = (1/M) Σ_{i=1}^{M} (y_i − Q(S_i, a_i|θ^Q))²,
and the parameter θ^Q of the Critic current neural network is updated by the mini-batch gradient descent method;
Step nine: the parameter θ^μ of the Actor current neural network is updated with the sampled policy gradient; the loss gradient ∇_{θ^μ}J is given by the following equation:
∇_{θ^μ}J ≈ (1/M) Σ_{i=1}^{M} ∇_a Q(S_i, a|θ^Q)|_{a=μ(S_i)} ∇_{θ^μ} μ(S_i|θ^μ);
Step ten: the parameters of the Critic and Actor current neural networks are respectively soft-copied to the parameters of the Critic and Actor target neural networks with the proportionality coefficient τ, that is:
θ^{Q′} ← τθ^Q + (1 − τ)θ^{Q′},
θ^{μ′} ← τθ^μ + (1 − τ)θ^{μ′};
Step eleven: the state at the next moment is taken as the state at the current moment, that is: S_t ← S_{t+1}. Steps one to eleven are iterated in a loop until a converged Actor current neural network is finally obtained; the parameter θ^μ of the neural network is output to obtain the final HVAC control system model, and step twelve is then performed;
Step twelve: the current state information S_t is input to the Actor current neural network of the deep-reinforcement-learning DDPG algorithm, an action a_t is selected based on the optimal policy, and the action a_t is executed to control the power output of the HVAC system.
CN202110403130.XA 2021-04-14 2021-04-14 HVAC control system based on multi-step prediction deep reinforcement learning algorithm Active CN113112077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110403130.XA CN113112077B (en) 2021-04-14 2021-04-14 HVAC control system based on multi-step prediction deep reinforcement learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110403130.XA CN113112077B (en) 2021-04-14 2021-04-14 HVAC control system based on multi-step prediction deep reinforcement learning algorithm

Publications (2)

Publication Number Publication Date
CN113112077A (en) 2021-07-13
CN113112077B (en) 2022-06-10

Family

ID=76716975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110403130.XA Active CN113112077B (en) 2021-04-14 2021-04-14 HVAC control system based on multi-step prediction deep reinforcement learning algorithm

Country Status (1)

Country Link
CN (1) CN113112077B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113485498B (en) * 2021-07-19 2022-10-18 北京工业大学 Indoor environment comfort level adjusting method and system based on deep learning
CN113741449B (en) * 2021-08-30 2023-07-14 南京信息工程大学 Multi-agent control method for sea-air collaborative observation task
CN113940218B (en) * 2021-09-30 2022-09-16 上海易航海芯农业科技有限公司 Intelligent heat supply method and system for greenhouse
CN113659246B (en) * 2021-10-20 2022-01-25 中国气象科学研究院 Battery system suitable for polar region ultralow temperature environment and temperature control method thereof
CN114488811B (en) * 2022-01-25 2023-08-29 同济大学 Greenhouse environment energy-saving control method based on second-order Woltai model prediction
TWI795283B (en) * 2022-05-04 2023-03-01 台灣松下電器股份有限公司 Control method of air conditioning system
CN115412923B (en) * 2022-10-28 2023-02-03 河北省科学院应用数学研究所 Multi-source sensor data credible fusion method, system, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458443A (en) * 2019-08-07 2019-11-15 南京邮电大学 A kind of wisdom home energy management method and system based on deeply study
CN111080002A (en) * 2019-12-10 2020-04-28 华南理工大学 Deep learning-based multi-step prediction method and system for building electrical load
CN111365828A (en) * 2020-03-06 2020-07-03 上海外高桥万国数据科技发展有限公司 Model prediction control method for realizing energy-saving temperature control of data center by combining machine learning
CN112460741A (en) * 2020-11-23 2021-03-09 香港中文大学(深圳) Control method of building heating, ventilation and air conditioning system
CN112561728A (en) * 2020-10-28 2021-03-26 西安交通大学 Attention mechanism LSTM-based comprehensive energy consumption cost optimization method, medium and equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102353119B (en) * 2011-08-09 2013-04-24 北京建筑工程学院 Control method of VAV (variable air volume) air-conditioning system
JP6553933B2 (en) * 2015-04-24 2019-07-31 京セラ株式会社 Power control method, power control apparatus, and power control system
CN105805822B (en) * 2016-03-24 2018-11-13 常州英集动力科技有限公司 Heating energy-saving control method based on neural network prediction and system
CN105870483B (en) * 2016-03-31 2019-01-11 华中科技大学 Solid oxide fuel battery system power tracking process thermoelectricity cooperative control method
US10997491B2 (en) * 2017-10-04 2021-05-04 Huawei Technologies Co., Ltd. Method of prediction of a state of an object in the environment using an action model of a neural network
JP2019200040A (en) * 2018-05-18 2019-11-21 ジョンソン コントロールズ テクノロジー カンパニーJohnson Controls Technology Company Hvac control system with model driven deep learning
US11966840B2 (en) * 2019-08-15 2024-04-23 Noodle Analytics, Inc. Deep probabilistic decision machines

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458443A (en) * 2019-08-07 2019-11-15 南京邮电大学 A kind of wisdom home energy management method and system based on deeply study
CN111080002A (en) * 2019-12-10 2020-04-28 华南理工大学 Deep learning-based multi-step prediction method and system for building electrical load
CN111365828A (en) * 2020-03-06 2020-07-03 上海外高桥万国数据科技发展有限公司 Model prediction control method for realizing energy-saving temperature control of data center by combining machine learning
CN112561728A (en) * 2020-10-28 2021-03-26 西安交通大学 Attention mechanism LSTM-based comprehensive energy consumption cost optimization method, medium and equipment
CN112460741A (en) * 2020-11-23 2021-03-09 香港中文大学(深圳) Control method of building heating, ventilation and air conditioning system

Also Published As

Publication number Publication date
CN113112077A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN113112077B (en) HVAC control system based on multi-step prediction deep reinforcement learning algorithm
Yuce et al. ANN–GA smart appliance scheduling for optimised energy management in the domestic sector
CN112614009B (en) Power grid energy management method and system based on deep expectation Q-learning
CN110705743B (en) New energy consumption electric quantity prediction method based on long-term and short-term memory neural network
Li et al. Predicting hourly cooling load in the building: A comparison of support vector machine and different artificial neural networks
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
CN114370698B (en) Indoor thermal environment learning efficiency improvement optimization control method based on reinforcement learning
CN104484715A (en) Neural network and particle swarm optimization algorithm-based building energy consumption predicting method
CN112070262B (en) Air conditioner load prediction method based on support vector machine
CN116187601B (en) Comprehensive energy system operation optimization method based on load prediction
CN112926795A (en) SBO (statistical analysis) -based CNN (continuous casting) optimization-based high-rise residential building group heat load prediction method and system
CN111898856B (en) Analysis method of physical-data fusion building based on extreme learning machine
CN113902582A (en) Building comprehensive energy load prediction method and system
CN114119273A (en) Park comprehensive energy system non-invasive load decomposition method and system
CN115907191A (en) Adaptive building photovoltaic skin model prediction control method
CN112418495A (en) Building energy consumption prediction method based on longicorn stigma optimization algorithm and neural network
Zhao et al. Heating load prediction of residential district using hybrid model based on CNN
CN108303898B (en) Intelligent scheduling method of novel solar-air energy coupling cold and heat cogeneration system
CN112560160B (en) Model and data driven heating ventilation air conditioner optimal set temperature acquisition method and equipment
Godahewa et al. Simulation and optimisation of air conditioning systems using machine learning
CN116880169A (en) Peak power demand prediction control method based on deep reinforcement learning
CN111859242A (en) Household power energy efficiency optimization method and system
Zileska Pancovska et al. Prediction of energy consumption in buildings using support vector machine
CN115169839A (en) Heating load scheduling method based on data-physics-knowledge combined drive
CN115511218A (en) Intermittent type electrical appliance load prediction method based on multi-task learning and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant