CN113112077B - HVAC control system based on multi-step prediction deep reinforcement learning algorithm - Google Patents
HVAC control system based on multi-step prediction deep reinforcement learning algorithm Download PDFInfo
- Publication number
- CN113112077B (application CN202110403130.XA)
- Authority
- CN
- China
- Prior art keywords
- neural network
- value
- output
- current
- environment temperature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
Abstract
The invention relates to an intelligent control method for a heating, ventilation and air-conditioning (HVAC) system, which regulates temperature, humidity, air cleanliness and air circulation, and in particular to an HVAC control system based on a Long Short-Term Memory (LSTM) neural network with a generalized correntropy (GC) loss function and a Deep Reinforcement Learning (DRL) algorithm. The method comprises the following steps: collecting outdoor environment temperature, indoor environment temperature and grid electricity price information; preprocessing the collected data; predicting the future multi-step outdoor environment temperature from historical outdoor temperature data; and controlling the power output of the HVAC system with the Deep Deterministic Policy Gradient (DDPG) algorithm of DRL based on the predicted future outdoor temperature, the indoor environment temperature and the grid electricity price. The invention can intelligently control the HVAC system in real time to reduce user cost while guaranteeing user satisfaction, and has high practical engineering application value.
Description
Technical Field
The invention relates to a method for intelligent optimal control of an HVAC system, and in particular to a method for intelligently controlling an HVAC system based on a GC-LSTM neural network and a DRL algorithm.
Background
Household users are the terminal users of the power grid; users' electricity habits and the addition of distributed renewable energy sources directly cause peaks and troughs in grid load, which can severely impact and threaten the power grid. With the development of the smart grid and the implementation of demand response strategies in recent years, residential users are changing from a passive mode to an active mode of participation in the grid; in the smart grid environment, the grid's electricity price and generation information communicate bidirectionally with users' demand information. In a household, the air-conditioning system accounts for about 35% of total electricity consumption, so intelligently controlling the output power of the HVAC system according to the grid electricity price and ambient temperature information, while maintaining a certain level of user comfort, is of great significance for reducing electricity use, lowering user cost and mitigating the greenhouse effect.
At present, HVAC systems mainly adopt traditional closed-loop control and model predictive control algorithms. In closed-loop control, a temperature sensor is installed indoors and the HVAC system stops working when the indoor temperature reaches a set value; an HVAC system based on closed-loop control is simple to operate and easy to implement, but in the smart grid and demand response environment it is difficult to adjust power according to dynamic electricity prices so as to meet energy-saving and emission-reduction standards. Model predictive control regulates the HVAC system by establishing an accurate model of the indoor temperature variation; however, the complexity of indoor ambient temperature variation limits the accuracy of such modeling. With the development of intelligent algorithms, researchers have also proposed optimizing HVAC control with particle swarm optimization and genetic algorithms, which optimize the power output of the HVAC system under a real-time electricity price mechanism to reduce user cost; however, these algorithms are difficult to tune, and they do not consider the delay with which the HVAC system's power output affects the indoor temperature, so user comfort is not truly guaranteed. It is therefore necessary to first predict future outdoor ambient temperature values.
Disclosure of Invention
The invention provides a method for an HVAC control system based on a multi-step prediction deep reinforcement learning algorithm, addressing the nonlinearity and randomness of the outdoor environment temperature and the smart grid electricity price, as well as the delay with which the HVAC system's output power affects the indoor environment temperature.
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm is realized with the following technical scheme. The model structure of the HVAC control system is shown in figure 1 and comprises two stages: multi-step prediction of the outdoor environment temperature and real-time control of the indoor temperature. The prediction stage of the outdoor environment temperature comprises the following steps:
Step one: according to actual data points of the outdoor environment, the outdoor environment temperature X = [T_1, …, T_i] at i consecutive moments is selected as the input of the multi-step temperature prediction model, with h = [h_{i+1}, …, h_{i+n}] as the real output of the model, where n is the number of steps of the multi-step prediction;
Step two: the acquired data are preprocessed, abnormal data are corrected, and the time-series data are converted into supervised-sequence data;
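As a concrete illustration of the step above, the conversion from a raw temperature series to supervised input/output pairs can be sketched with a sliding window (a minimal sketch; the window lengths and sample values below are illustrative, not taken from the patent):

```python
def series_to_supervised(series, n_in=6, n_out=3):
    """Slide a window over the series: each sample maps n_in past
    temperatures to the next n_out temperatures (the multi-step target)."""
    X, y = [], []
    for t in range(len(series) - n_in - n_out + 1):
        X.append(series[t:t + n_in])
        y.append(series[t + n_in:t + n_in + n_out])
    return X, y

# Illustrative half-hourly outdoor temperatures
temps = [20.1, 20.4, 20.9, 21.3, 21.8, 22.0, 22.1, 21.9, 21.5, 21.0]
X, y = series_to_supervised(temps, n_in=6, n_out=3)
```

Each row of X is then one input sequence [T_1, …, T_i] and the matching row of y the multi-step target [h_{i+1}, …, h_{i+n}].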
Step three: the input is fed into a long short-term memory neural network based on the generalized correntropy loss function, and the forget gate, input gate and output gate of the LSTM are used to forget, memorize and learn from the input; the nonlinear regression model of the LSTM based on the generalized correntropy loss function is described as follows:
1) the input X = [T_1, …, T_i] is fed into the first block of the long short-term memory neural network; the forget gate uses a sigmoid function to determine how much of the input information X_t at the current moment and the output information h_{t-1} at the previous moment is retained by the current block, i.e. the output of the forget gate is f_t = σ(w_f·[h_{t-1}, X_t] + b_f), where w_f and b_f are the weight and bias value of the forget-gate network and σ denotes the sigmoid function;
2) the input gate determines the information to be updated: first the update signal i_t = σ(w_i·[h_{t-1}, X_t] + b_i) is obtained through the σ function, then a new candidate value c̃_t = tanh(w_c·[h_{t-1}, X_t] + b_c) is generated through the tanh function; the cell state c_t of the current block is determined jointly by the output of the forget gate, the output of the input gate, the new candidate value and the cell state of the previous block, namely: c_t = f_t * c_{t-1} + i_t * c̃_t, where w_i, w_c, b_i and b_c are the parameter values of the input-gate network;
3) the output gate produces the output of the model: first an initial output o_t = σ(w_o·[h_{t-1}, X_t] + b_o) is obtained through the σ function, then the cell state c_t from 2) is scaled to a value between -1 and 1 by the activation function tanh, finally giving the model output h_t = o_t * tanh(c_t), where w_o and b_o are the parameters of the output-gate network;
4) the error between the true value Y_t and the predicted value h_t is calculated with the GC loss function, as in the following equation:
L_GC = G_{α,β}(0) − (1/N) Σ_{t=1}^{N} G_{α,β}(Y_t − h_t), with G_{α,β}(e) = (α / (2βΓ(1/α))) · exp(−|e/β|^α)
where G_{α,β}(0) is the value of the zero-mean generalized Gaussian density function at zero, (Y_t, h_t) are the sample pairs of predicted and true values, N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Through multiple training iterations, the weights w and bias values b of the neural network are updated by the mini-batch gradient descent method so that the error between the true and predicted values is minimized;
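The GC loss above can be sketched numerically as follows. This is a minimal illustration assuming the standard generalized Gaussian kernel G_{α,β}(e) = (α/(2βΓ(1/α)))·exp(−|e/β|^α); the defaults α = 2, β = 1 are illustrative, not values from the patent:

```python
import math

def gc_kernel(e, alpha=2.0, beta=1.0):
    # Generalized Gaussian density G_{alpha,beta}(e)
    coef = alpha / (2.0 * beta * math.gamma(1.0 / alpha))
    return coef * math.exp(-abs(e / beta) ** alpha)

def gc_loss(y_true, y_pred, alpha=2.0, beta=1.0):
    # L = G(0) - (1/N) * sum_t G(Y_t - h_t): zero for a perfect prediction
    n = len(y_true)
    return gc_kernel(0.0, alpha, beta) - sum(
        gc_kernel(yt - yp, alpha, beta) for yt, yp in zip(y_true, y_pred)
    ) / n
```

With α = 2 the kernel reduces to a Gaussian, so the loss behaves like correntropy-based regression, which is less sensitive to outliers than plain mean squared error.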
Step four: finally, based on the long short-term memory neural network with the generalized correntropy loss function, a nonlinear mapping model from the outdoor environment temperature of the previous i moments to the outdoor environment temperature of the next n moments is obtained;
the real-time control of the indoor temperature comprises the following steps:
Step one: the outdoor environment temperature X = [T_1, …, T_i] at i consecutive moments is collected, and the long short-term memory neural network based on the generalized correntropy loss function yields the outdoor environment temperature h = [h_{i+1}, …, h_{i+n}] at n consecutive future moments; the grid electricity price ρ_t and indoor temperature T_t^in at the current moment are obtained, and h, ρ_t and T_t^in are taken together as the environment state, namely: S_t = {h, ρ_t, T_t^in};
Step two: the current state S_t is input to the Actor current neural network of the deep reinforcement learning DDPG algorithm, and an action a_t ∈ [P_min, P_max] is selected based on the current policy μ(S_t|θ^μ) and Gaussian noise N_t, i.e. a_t = μ(S_t|θ^μ) + N_t; the Gaussian noise N_t increases exploration of the action space and is reduced as the number of training iterations increases; θ^μ is the parameter of the Actor current network, and P_min and P_max are the minimum and maximum output power of the HVAC system, respectively;
Step three: the action a_t is executed to control the output power of the air conditioner; the power output of the HVAC system changes the indoor ambient temperature, for example according to the first-order model T_{t+1}^in = ε·T_t^in + (1 − ε)·(T_t^out − η_HVAC·a_t / A). A timely reward r_t is then obtained and the next state S_{t+1} is reached; ε, η_HVAC and A are the inertia coefficient, the thermal conversion efficiency and the overall thermal conductivity of the HVAC environment, respectively;
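The first-order indoor-temperature update used in the step above can be sketched as below. The functional form and the numeric values of ε, η_HVAC and A are illustrative assumptions consistent with the symbols named in the text (inertia coefficient, thermal conversion efficiency, thermal conductivity), not parameters disclosed by the patent:

```python
def next_indoor_temp(t_in, t_out, power, eps=0.7, eta=2.5, cond=0.14):
    """One-step first-order thermal model (cooling mode):
        T_in(t+1) = eps*T_in(t) + (1 - eps)*(T_out(t) - eta*P(t)/cond)
    More HVAC power drives the indoor temperature further down."""
    return eps * t_in + (1.0 - eps) * (t_out - eta * power / cond)
```

The inertia term eps captures exactly the delay the invention targets: the effect of a power change on the room temperature is spread over several future steps.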
Step four: the tuple (S_t, a_t, r_t, S_{t+1}) is stored into the experience pool buff-C;
Step five: if the amount of data in the experience pool buff-C is larger than the sampling number M, M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, are taken at random from the experience pool buff-C, where r_i is the reward of sample i, and the following steps are performed; otherwise step eleven is performed directly;
Step six: the target expected value y_i = r_i + γQ'(S_{i+1}, μ'(S_{i+1}|θ^μ')|θ^Q') is calculated, where μ'(S_{i+1}|θ^μ') is the optimal action obtained from the Actor target network, Q'(S_{i+1}, μ'(S_{i+1}|θ^μ')|θ^Q') is the future target value output by the Critic target network Q' based on the state information and the optimal-action information at the next moment, γ is the discount factor, and θ^μ' and θ^Q' are the parameters of the Actor target network and of the Critic target network, respectively;
step seven: critic current neural network Q pair action a taken based on DDPG algorithmtPerforming evaluation to calculate an evaluation value of θQParameters of the Critic current neural network;
Step eight: the error between the target expected value and the evaluation value of the samples is calculated with the mean squared error L = (1/M) Σ_{i=1}^{M} (y_i − Q(S_i, a_i|θ^Q))², and the parameter θ^Q of the Critic current network is updated by the mini-batch gradient descent method;
Step nine: the Actor current network parameter θ^μ is updated with the sampled policy gradient; the gradient of the loss J is given by the following equation:
∇_{θ^μ} J ≈ (1/M) Σ_{i=1}^{M} ∇_a Q(S, a|θ^Q)|_{S=S_i, a=μ(S_i)} · ∇_{θ^μ} μ(S|θ^μ)|_{S=S_i}
Step ten: the parameters of the Critic and Actor current neural networks are soft-copied to the parameters of the Critic and Actor target neural networks with a proportionality coefficient τ, respectively, that is:
θQ'←τθQ+(1-τ)θQ'
θμ'←τθμ+(1-τ)θμ'
Step eleven: the state at the next moment is regarded as the state at the current moment, namely: S_t ← S_{t+1}; steps one to eleven are iterated in a loop to finally obtain a converged Actor current neural network, the network parameter θ^μ is output to obtain the final HVAC control system model, and step twelve is then performed;
Step twelve: the current state S_t is input to the Actor current neural network of the deep reinforcement learning DDPG algorithm, the action a_t is selected based on the optimal policy, and a_t is executed to control the power output of the HVAC system.
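Two numerical building blocks of steps six to ten — the target expected value y_i and the soft copy of parameters with the coefficient τ — can be sketched as follows (parameters are represented as flat lists for illustration; a real implementation would operate on network weight tensors):

```python
def td_target(r, q_next, gamma=0.99):
    """y_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1})): discounted target value."""
    return r + gamma * q_next

def soft_update(target, source, tau=0.005):
    """theta' <- tau*theta + (1 - tau)*theta', element-wise soft copy."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]
```

A small τ makes the target networks trail the current networks slowly, which is what stabilizes the moving target y_i during training.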
Drawings
FIG. 1 is a schematic diagram of the establishment of an HVAC intelligent control system.
Fig. 2 is a graph of loss functions of the outdoor environment temperature training set and the test set in the debugging stage, where 1 represents a loss function curve of the outdoor environment temperature training set, and 2 represents a loss function curve of the outdoor environment temperature test set.
Fig. 3 is a graph showing a real value and a predicted value of the outdoor environment temperature test set at the debugging stage, where 3 represents the predicted value of the outdoor environment temperature test set, and 4 represents the real value of the outdoor environment temperature test set.
Detailed Description
The collected real environment temperature data are taken as the experimental object to train and test the HVAC control system based on the multi-step prediction deep reinforcement learning algorithm.
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm comprises two stages of multi-step prediction of outdoor environment temperature and real-time control of indoor temperature, wherein the prediction stage of the outdoor environment temperature comprises the following steps:
Step one: according to actual data points of the outdoor environment, the outdoor environment temperature X = [T_1, …, T_6] at 6 consecutive moments is selected as the model input, with h = [h_{i+1}, …, h_{i+n}] as the real output of the model; the sampling interval is 30 minutes.
Step two: the acquired data are preprocessed, abnormal data are corrected, the time-series data are converted into supervised-sequence data, and the data are divided into a training set of 2500 samples and a test set of 1000 samples.
Step three: setting the number of cells of the long-short term memory neural network as 100, the training times as 500, the learning rate as 0.001 and the batch of the minimum batch gradient descent method as 32;
Step four: the input quantities of the training set are fed into the long short-term memory neural network based on the generalized correntropy loss function, and the forget gate, input gate and output gate of the LSTM are used to forget, memorize and learn from the input; the nonlinear regression process of the LSTM based on the generalized correntropy loss function is described as follows:
1) the input X = [T_1, …, T_i] is fed into the first block of the long short-term memory neural network; the forget gate uses a sigmoid function to determine how much of the input information X_t at the current moment and the output information h_{t-1} at the previous moment is retained by the current block, i.e. the output of the forget gate is f_t = σ(w_f·[h_{t-1}, X_t] + b_f), where w_f and b_f are the weight and bias value of the forget-gate network and σ denotes the sigmoid function;
2) the input gate determines the information to be updated: first the update signal i_t = σ(w_i·[h_{t-1}, X_t] + b_i) is obtained through the σ function, then a new candidate value c̃_t = tanh(w_c·[h_{t-1}, X_t] + b_c) is generated through the tanh function; the cell state c_t of the current block is determined jointly by the output of the forget gate, the output of the input gate, the new candidate value and the cell state of the previous block, namely: c_t = f_t * c_{t-1} + i_t * c̃_t, where w_i, w_c, b_i and b_c are the parameter values of the input-gate network;
3) the output gate produces the output of the model: first an initial output o_t = σ(w_o·[h_{t-1}, X_t] + b_o) is obtained through the σ function, then the cell state c_t from 2) is scaled to a value between -1 and 1 by the activation function tanh, finally giving the model output h_t = o_t * tanh(c_t), where w_o and b_o are the parameters of the output-gate network;
4) the error between the true value Y_t and the predicted value h_t is calculated with the GC loss function, as in the following equation:
L_GC = G_{α,β}(0) − (1/N) Σ_{t=1}^{N} G_{α,β}(Y_t − h_t), with G_{α,β}(e) = (α / (2βΓ(1/α))) · exp(−|e/β|^α)
where G_{α,β}(0) is the value of the zero-mean generalized Gaussian density function at zero, (Y_t, h_t) are the sample pairs of predicted and true values, N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Through multiple training iterations, the weights w and bias values b of the neural network are updated by the mini-batch gradient descent method so that the error between the true and predicted values is minimized;
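The gate equations in 1)–3) above can be sketched for a single scalar LSTM cell as follows (a hand-rolled minimal illustration; in practice the 100-cell network would be built with a deep learning framework, and the weight layout here is an assumption for readability):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x_t, h_prev, c_prev, w, b):
    """One step of a scalar LSTM cell; w[g] = (weight on h_prev, weight on x_t)
    and b[g] is the bias for gate g in {'f', 'i', 'c', 'o'}."""
    f = sigmoid(w['f'][0] * h_prev + w['f'][1] * x_t + b['f'])          # forget gate
    i = sigmoid(w['i'][0] * h_prev + w['i'][1] * x_t + b['i'])          # input gate
    c_tilde = math.tanh(w['c'][0] * h_prev + w['c'][1] * x_t + b['c'])  # candidate value
    c = f * c_prev + i * c_tilde                                        # new cell state
    o = sigmoid(w['o'][0] * h_prev + w['o'][1] * x_t + b['o'])          # output gate
    h = o * math.tanh(c)                                                # block output
    return h, c
```

Feeding the 6 input temperatures through this cell in sequence, carrying (h, c) forward each time, yields the recurrent behaviour described in the steps above.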
Step five: finally, based on the long short-term memory neural network with the generalized correntropy loss function, a nonlinear mapping model from the outdoor environment temperature of the previous i moments to the outdoor environment temperature of the next n moments is obtained, and the test set is used to test the accuracy of the model;
Step six: the accuracy of the model is tested with the test set, using the root mean square error (RMSE) between the true and predicted values, the probability density distribution of the error, and R² as evaluation indexes of the model. They are defined as:
RMSE = sqrt( (1/m) Σ_{i=1}^{m} (y_i − h_i)² )
R² = 1 − Σ_{i=1}^{m} (y_i − h_i)² / Σ_{i=1}^{m} (y_i − ȳ)²
p(e) = (1/m) Σ_{i=1}^{m} k(e − e_i)
where y_i and h_i are the true and predicted values of each step, ȳ is the mean of the true samples of each step, m is the number of samples in the test set, k(·) is a Gaussian kernel function and e_i = y_i − h_i is the prediction error; the probability density function of the error is implemented with a sliding-window approach.
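The RMSE and R² indexes named in step six can be sketched as follows (a minimal sketch; the kernel-density estimate of the error distribution is omitted here):

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    m = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / m)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 means a perfect fit."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot
```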
The real-time control of the indoor temperature comprises the following steps:
Step one: the outdoor environment temperature X = [T_1, …, T_6] at 6 consecutive moments is acquired, and the GC-LSTM neural network model yields the outdoor environment temperature h = [h_{i+1}, …, h_{i+3}] at 3 consecutive future moments; the grid electricity price ρ_t and indoor temperature T_t^in at the current moment are obtained, and the data are divided into a training set of 2500 samples and a test set of 1000 samples. h, ρ_t and T_t^in are taken as the environment state, namely: S_t = {h, ρ_t, T_t^in};
Step two: setting a DDPG algorithm of deep reinforcement learning as four neural networks, wherein a current neural network of an Actor and a target neural network of the Actor have three layers of neural networks with the same structure, a hidden layer activation function is tanh, a current neural network of Critic and a target neural network of Critic have the same neural network structure, and the hidden layer activation function is relu;
Step three: the current state S_t in the training set is input to the Actor current neural network, and an action a_t ∈ [P_min, P_max] is selected based on the current policy and Gaussian noise N_t; P_min and P_max are the minimum and maximum output power of the HVAC system, respectively;
Step four: the action a_t is executed to control the output power of the air conditioner, after which a timely reward r_t is obtained and the next state S_{t+1} is reached; the reward r_t is related to the electricity cost and the comfort of the user, for example:
r_t = −λ_1·ρ_t·a_t − λ_2·( max(T_t^in − T_max, 0) + max(T_min − T_t^in, 0) )
where T_min and T_max are the minimum and maximum comfortable temperatures, respectively, and λ_1 and λ_2 are weighting factors for balancing the reward terms;
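The comfort-and-cost reward of step four can be sketched as below; the exact functional form and the numeric weights are assumptions chosen for illustration, consistent with the symbols T_min, T_max, λ_1 and λ_2 named in the text, not the patent's disclosed formula:

```python
def reward(price, power, t_in, t_min=19.0, t_max=24.0, lam1=0.5, lam2=0.5):
    """Negative weighted sum of electricity cost and comfort-band violation."""
    cost = price * power
    discomfort = max(t_in - t_max, 0.0) + max(t_min - t_in, 0.0)
    return -(lam1 * cost + lam2 * discomfort)
```

Inside the comfort band only the cost term is active, so the agent is pushed to shift power away from high-price periods; outside the band the discomfort penalty dominates.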
Step five: the tuple (S_t, a_t, r_t, S_{t+1}) is stored into the experience pool buff-C;
Step six: M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, are then taken at random from the experience pool buff-C;
Step seven: the target expected value y_i = r_i + γQ'(S_{i+1}, μ'(S_{i+1}|θ^μ')|θ^Q') is calculated based on the state at the next moment and the action obtained from the Actor target network;
Step eight: the Critic current neural network Q of the DDPG algorithm evaluates the action a_t taken, computing the evaluation value Q(S_t, a_t|θ^Q);
Step nine: the error between the target expected value and the evaluation value of the samples is calculated with the mean squared error L = (1/M) Σ_{i=1}^{M} (y_i − Q(S_i, a_i|θ^Q))², and the parameters of the Critic current network are updated by the mini-batch gradient descent method;
Step ten: the Actor current network parameter θ^μ is updated with the sampled policy gradient; the gradient of the loss J is given by the following equation:
∇_{θ^μ} J ≈ (1/M) Σ_{i=1}^{M} ∇_a Q(S, a|θ^Q)|_{S=S_i, a=μ(S_i)} · ∇_{θ^μ} μ(S|θ^μ)|_{S=S_i}
Step eleven: the parameters of the Critic and Actor current neural networks are soft-copied to the parameters of the Critic and Actor target neural networks with the proportionality coefficient τ, respectively;
θQ'←τθQ+(1-τ)θQ'
θμ'←τθμ+(1-τ)θμ'
Step twelve: a converged Actor current neural network is obtained through training on the training set and the network parameter θ^μ is output; the reward value obtained in each training iteration and the error value L of each step are used as judgment indexes of network convergence;
Step thirteen: the current state S_t of the test set is input to the Actor current neural network of the DDPG algorithm, the action a_t is selected based on the optimal policy, and a_t is executed to control the power output of the HVAC system; the electricity cost of the HVAC system and the comfort of the user are used as performance indexes of the system.
The advantages of the invention are as follows: the long short-term memory neural network predicts the future outdoor environment temperature, which improves user comfort, and the generalized correntropy loss function is used as the loss function of the LSTM to improve prediction accuracy; then, based on the DDPG algorithm, the power output of the HVAC system is intelligently adjusted according to changes in the grid electricity price, the indoor temperature and the future outdoor temperature, saving the user's electricity cost while guaranteeing user comfort.
The above description is only an example of the present invention, but the structural features of the present invention are not limited thereto, and any changes or modifications within the scope of the present invention by those skilled in the art are covered by the present invention.
Claims (1)
1. An HVAC control system based on a multi-step prediction deep reinforcement learning algorithm, characterized in that it comprises two stages, multi-step prediction of the outdoor environment temperature and real-time control of the indoor temperature, wherein the prediction stage of the outdoor environment temperature comprises the following steps:
Step one: according to actual data points of the outdoor environment, the outdoor environment temperature X = [T_1, …, T_i] at i consecutive moments is selected as the input of the multi-step temperature prediction model, with h = [h_{i+1}, …, h_{i+n}] as the real output of the model, where n is the number of steps of the multi-step prediction;
Step two: the acquired data are preprocessed, abnormal data are corrected, and the time-series data are converted into supervised-sequence data;
Step three: the input is fed into a long short-term memory neural network based on the generalized correntropy loss function, and the forget gate, input gate and output gate of the LSTM are used to forget, memorize and learn from the input; the nonlinear regression model of the LSTM based on the generalized correntropy loss function is described as follows:
1) the input X = [T_1, …, T_i] is fed into the first block of the long short-term memory neural network; the forget gate uses a sigmoid function to determine how much of the input information X_t at the current moment and the output information h_{t-1} at the previous moment is retained by the current block, i.e. the output of the forget gate is f_t = σ(w_f·[h_{t-1}, X_t] + b_f), where w_f and b_f are the weight and bias value of the forget-gate network and σ denotes the sigmoid function;
2) the input gate determines the information to be updated: first the update signal i_t = σ(w_i·[h_{t-1}, X_t] + b_i) is obtained through the σ function, then a new candidate value c̃_t = tanh(w_c·[h_{t-1}, X_t] + b_c) is generated through the tanh function; the cell state c_t of the current block is determined jointly by the output of the forget gate, the output of the input gate, the new candidate value and the cell state of the previous block, namely: c_t = f_t * c_{t-1} + i_t * c̃_t, where w_i, w_c, b_i and b_c are the parameter values of the input-gate network;
3) the output gate produces the output of the model: first an initial output o_t = σ(w_o·[h_{t-1}, X_t] + b_o) is obtained through the σ function, then the cell state c_t from 2) is scaled to a value between -1 and 1 by the activation function tanh, finally giving the model output h_t = o_t * tanh(c_t), where w_o and b_o are the parameters of the output-gate network;
4) the error between the true value Y_t and the predicted value h_t is calculated with the generalized correntropy loss function, as shown in the following equation:
L_GC = G_{α,β}(0) − (1/N) Σ_{t=1}^{N} G_{α,β}(Y_t − h_t), with G_{α,β}(e) = (α / (2βΓ(1/α))) · exp(−|e/β|^α)
where G_{α,β}(0) is the value of the zero-mean generalized Gaussian density function at zero, (Y_t, h_t) are the sample pairs of predicted and true values, N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Through multiple training iterations, the weights w and bias values b of the neural network are updated by the mini-batch gradient descent method so that the error between the true and predicted values is minimized;
Step four: finally, based on the long short-term memory neural network with the generalized correntropy loss function, a nonlinear mapping model from the outdoor environment temperature of the previous i moments to the outdoor environment temperature of the next n moments is obtained;
the real-time control of the indoor temperature comprises the following steps:
Step one: the outdoor environment temperature X = [T_1, …, T_i] at i consecutive moments is collected, and the long short-term memory neural network based on the generalized correntropy loss function yields the outdoor environment temperature h = [h_{i+1}, …, h_{i+n}] at n consecutive future moments; the grid electricity price ρ_t and indoor temperature T_t^in at the current moment are obtained, and h, ρ_t and T_t^in are taken together as the environment state, namely: S_t = {h, ρ_t, T_t^in};
Step two: the current state S_t is input to the Actor current neural network of the deep reinforcement learning DDPG algorithm, and an action a_t ∈ [P_min, P_max] is selected based on the current policy μ(S_t|θ^μ) and Gaussian noise N_t, i.e. a_t = μ(S_t|θ^μ) + N_t; the Gaussian noise N_t increases exploration of the action space and is reduced as the number of training iterations increases; θ^μ is the parameter of the Actor current network, and P_min and P_max are the minimum and maximum output power of the HVAC system, respectively;
Step three: the action a_t is executed to control the output power of the air conditioner; the power output of the HVAC system changes the indoor ambient temperature, for example according to the first-order model T_{t+1}^in = ε·T_t^in + (1 − ε)·(T_t^out − η_HVAC·a_t / A). A timely reward r_t is then obtained and the next state S_{t+1} is reached; ε, η_HVAC and A are the inertia coefficient, the thermal conversion efficiency and the overall thermal conductivity of the HVAC environment, respectively;
Step four: the tuple (S_t, a_t, r_t, S_{t+1}) is stored into the experience pool buff-C;
Step five: if the amount of data in the experience pool buff-C is larger than the sampling number M, M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, are taken at random from the experience pool buff-C, where r_i is the reward of sample i, and the following steps are performed; otherwise step eleven is performed directly;
Step six: the target expected value y_i = r_i + γQ'(S_{i+1}, μ'(S_{i+1}|θ^μ')|θ^Q') is calculated, where μ'(S_{i+1}|θ^μ') is the optimal action obtained from the Actor target network, Q'(S_{i+1}, μ'(S_{i+1}|θ^μ')|θ^Q') is the future target value output by the Critic target network Q' based on the state information and the optimal-action information at the next moment, γ is the discount factor, and θ^μ' and θ^Q' are the parameters of the Actor target network and of the Critic target network, respectively;
Step seven: the Critic current neural network Q evaluates the action a_t taken by the DDPG algorithm and calculates the evaluation value Q(S_t, a_t | θ^Q), where θ^Q is the parameter of the Critic current neural network;
Step eight: calculate the error between the target expected value and the evaluation value of the samples using the mean squared error, L = (1/M)·Σ_{i=1}^{M} (y_i − Q(S_i, a_i | θ^Q))², and update the parameter θ^Q of the Critic current neural network using the mini-batch gradient descent method;
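Steps six and eight reduce to computing TD targets and a minibatch mean-squared error; a minimal sketch over already-evaluated network outputs:

```python
import numpy as np

def td_targets(rewards, q_next, gamma=0.99):
    """Step six: y_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1}))."""
    return np.asarray(rewards, dtype=float) + gamma * np.asarray(q_next, dtype=float)

def critic_loss(y, q_pred):
    """Step eight: mean squared error over the minibatch of M samples."""
    y, q_pred = np.asarray(y, dtype=float), np.asarray(q_pred, dtype=float)
    return float(np.mean((y - q_pred) ** 2))
```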
Step nine: update the Actor current neural network parameter θ^μ using the sampled policy gradient; the loss gradient ∇_{θ^μ}J is given by: ∇_{θ^μ}J ≈ (1/M)·Σ_{i=1}^{M} ∇_a Q(S, a | θ^Q)|_{S=S_i, a=μ(S_i)} · ∇_{θ^μ} μ(S | θ^μ)|_{S=S_i};
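Step nine's sampled policy gradient can be made concrete with toy linear networks (hypothetical forms chosen so the chain rule can be checked by hand, not the patent's architectures): mu(s) = theta_mu·s and Q(s, a) = w_s·s + w_a·a, so dQ/da = w_a and dmu/dtheta_mu = s.

```python
import numpy as np

w_s, w_a = 0.2, 1.5  # toy Critic weights (assumed values)

def actor_gradient(states):
    """grad_J ≈ (1/M) * sum_i dQ/da|_{a=mu(S_i)} * dmu/dtheta|_{S_i}.
    For the linear toy networks this is just mean(w_a * S_i)."""
    states = np.asarray(states, dtype=float)
    return float(np.mean(w_a * states))

theta_mu = 0.5
theta_mu += 1e-3 * actor_gradient([1.0, 2.0, 3.0])  # one gradient-ascent step
```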
Step ten: the parameters of the Critic and Actor current neural networks are softly copied to the parameters of the Critic and Actor target neural networks with a proportionality coefficient τ, namely:
θ^{Q′} ← τ·θ^Q + (1 − τ)·θ^{Q′}
θ^{μ′} ← τ·θ^μ + (1 − τ)·θ^{μ′}
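The soft update in step ten, sketched over a parameter dictionary (one entry per network weight):

```python
def soft_update(target_params, online_params, tau=0.005):
    """Step ten: theta' <- tau*theta + (1 - tau)*theta', per parameter.
    tau is the proportionality coefficient; 0.005 is an assumed value."""
    return {k: tau * online_params[k] + (1 - tau) * target_params[k]
            for k in target_params}
```

A small tau keeps the target networks slowly tracking the current networks, which stabilizes the TD targets of step six.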
Step eleven: take the state at the next moment as the state at the current moment, that is: S_t ← S_{t+1}; iterate from step one to step eleven until a converged Actor current neural network is obtained; output the neural network parameter θ^μ to obtain the final HVAC control system model, then perform step twelve;
Step twelve: input the current state information S_t into the converged Actor current neural network of the deep reinforcement learning DDPG algorithm, select a_t based on the optimal policy, and perform action a_t to control the power output of the HVAC system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110403130.XA CN113112077B (en) | 2021-04-14 | 2021-04-14 | HVAC control system based on multi-step prediction deep reinforcement learning algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113112077A CN113112077A (en) | 2021-07-13 |
CN113112077B true CN113112077B (en) | 2022-06-10 |
Family
ID=76716975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110403130.XA Active CN113112077B (en) | 2021-04-14 | 2021-04-14 | HVAC control system based on multi-step prediction deep reinforcement learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113112077B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113485498B (en) * | 2021-07-19 | 2022-10-18 | 北京工业大学 | Indoor environment comfort level adjusting method and system based on deep learning |
CN113741449B (en) * | 2021-08-30 | 2023-07-14 | 南京信息工程大学 | Multi-agent control method for sea-air collaborative observation task |
CN113940218B (en) * | 2021-09-30 | 2022-09-16 | 上海易航海芯农业科技有限公司 | Intelligent heat supply method and system for greenhouse |
CN113659246B (en) * | 2021-10-20 | 2022-01-25 | 中国气象科学研究院 | Battery system suitable for polar region ultralow temperature environment and temperature control method thereof |
CN114488811B (en) * | 2022-01-25 | 2023-08-29 | 同济大学 | Greenhouse environment energy-saving control method based on second-order Woltai model prediction |
TWI795283B (en) * | 2022-05-04 | 2023-03-01 | 台灣松下電器股份有限公司 | Control method of air conditioning system |
CN115412923B (en) * | 2022-10-28 | 2023-02-03 | 河北省科学院应用数学研究所 | Multi-source sensor data credible fusion method, system, equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458443A (en) * | 2019-08-07 | 2019-11-15 | 南京邮电大学 | A kind of wisdom home energy management method and system based on deeply study |
CN111080002A (en) * | 2019-12-10 | 2020-04-28 | 华南理工大学 | Deep learning-based multi-step prediction method and system for building electrical load |
CN111365828A (en) * | 2020-03-06 | 2020-07-03 | 上海外高桥万国数据科技发展有限公司 | Model prediction control method for realizing energy-saving temperature control of data center by combining machine learning |
CN112460741A (en) * | 2020-11-23 | 2021-03-09 | 香港中文大学(深圳) | Control method of building heating, ventilation and air conditioning system |
CN112561728A (en) * | 2020-10-28 | 2021-03-26 | 西安交通大学 | Attention mechanism LSTM-based comprehensive energy consumption cost optimization method, medium and equipment |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102353119B (en) * | 2011-08-09 | 2013-04-24 | 北京建筑工程学院 | Control method of VAV (variable air volume) air-conditioning system |
JP6553933B2 (en) * | 2015-04-24 | 2019-07-31 | 京セラ株式会社 | Power control method, power control apparatus, and power control system |
CN105805822B (en) * | 2016-03-24 | 2018-11-13 | 常州英集动力科技有限公司 | Heating energy-saving control method based on neural network prediction and system |
CN105870483B (en) * | 2016-03-31 | 2019-01-11 | 华中科技大学 | Solid oxide fuel battery system power tracking process thermoelectricity cooperative control method |
US10997491B2 (en) * | 2017-10-04 | 2021-05-04 | Huawei Technologies Co., Ltd. | Method of prediction of a state of an object in the environment using an action model of a neural network |
JP2019200040A (en) * | 2018-05-18 | 2019-11-21 | ジョンソン コントロールズ テクノロジー カンパニーJohnson Controls Technology Company | Hvac control system with model driven deep learning |
US11966840B2 (en) * | 2019-08-15 | 2024-04-23 | Noodle Analytics, Inc. | Deep probabilistic decision machines |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113112077B (en) | HVAC control system based on multi-step prediction deep reinforcement learning algorithm | |
Yuce et al. | ANN–GA smart appliance scheduling for optimised energy management in the domestic sector | |
CN112614009B (en) | Power grid energy management method and system based on deep expectation Q-learning | |
CN110705743B (en) | New energy consumption electric quantity prediction method based on long-term and short-term memory neural network | |
Li et al. | Predicting hourly cooling load in the building: A comparison of support vector machine and different artificial neural networks | |
CN113572157B (en) | User real-time autonomous energy management optimization method based on near-end policy optimization | |
CN114370698B (en) | Indoor thermal environment learning efficiency improvement optimization control method based on reinforcement learning | |
CN104484715A (en) | Neural network and particle swarm optimization algorithm-based building energy consumption predicting method | |
CN112070262B (en) | Air conditioner load prediction method based on support vector machine | |
CN116187601B (en) | Comprehensive energy system operation optimization method based on load prediction | |
CN112926795A (en) | SBO (statistical analysis) -based CNN (continuous casting) optimization-based high-rise residential building group heat load prediction method and system | |
CN111898856B (en) | Analysis method of physical-data fusion building based on extreme learning machine | |
CN113902582A (en) | Building comprehensive energy load prediction method and system | |
CN114119273A (en) | Park comprehensive energy system non-invasive load decomposition method and system | |
CN115907191A (en) | Adaptive building photovoltaic skin model prediction control method | |
CN112418495A (en) | Building energy consumption prediction method based on longicorn stigma optimization algorithm and neural network | |
Zhao et al. | Heating load prediction of residential district using hybrid model based on CNN | |
CN108303898B (en) | Intelligent scheduling method of novel solar-air energy coupling cold and heat cogeneration system | |
CN112560160B (en) | Model and data driven heating ventilation air conditioner optimal set temperature acquisition method and equipment | |
Godahewa et al. | Simulation and optimisation of air conditioning systems using machine learning | |
CN116880169A (en) | Peak power demand prediction control method based on deep reinforcement learning | |
CN111859242A (en) | Household power energy efficiency optimization method and system | |
Zileska Pancovska et al. | Prediction of energy consumption in buildings using support vector machine | |
CN115169839A (en) | Heating load scheduling method based on data-physics-knowledge combined drive | |
CN115511218A (en) | Intermittent type electrical appliance load prediction method based on multi-task learning and deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||