CN114234381A - Central air conditioner control method and system based on reinforcement learning - Google Patents
- Publication number
- CN114234381A (application CN202111420612.2A)
- Authority
- CN
- China
- Prior art keywords: network, central air-conditioning system, current, control
- Prior art date: 2021-11-26
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- F24F11/54 — Control or safety arrangements characterised by user interfaces or communication using one central controller connected to several sub-controllers
- F24F11/58 — Remote control using Internet communication
- F24F11/61 — Control or safety arrangements characterised by user interfaces or communication using timers
- F24F2110/10 — Control inputs relating to air properties: temperature
- F24F2140/20 — Control inputs relating to system states: heat-exchange fluid temperature
Abstract
A central air-conditioning system control method based on reinforcement learning comprises the following steps: S1, designing the space in which the central air-conditioning system is located and the state space S of the central air-conditioning system, the control action A for controlling the central air-conditioning system, and the reward function r_t; S2, designing a DDPG network based on the state space S, the control action A and the reward function r_t; and S3, executing the DDPG network to control the central air-conditioning system. The method solves for the control action with a Deep Deterministic Policy Gradient (DDPG) method, is not influenced by model parameters, and increases the air-conditioning load regulation capacity while ensuring user comfort.
Description
Technical Field
The invention relates to the technical field of central air-conditioning system control, in particular to a central air-conditioning control method and a central air-conditioning control system based on reinforcement learning.
Background
The central air-conditioning (HVAC) system, as a major consumer of building energy, is characterized by long operating hours, high power and a flexible temperature-regulation range, making it a demand-side resource with great potential. Because the building environment has heat-storage capacity, air-conditioning load regulation exhibits an energy-storage characteristic compared with conventional loads, and implementing demand response to shed load during power peaks makes the central air-conditioning system one of the most promising fields for building energy conservation. To adapt to continuously changing outdoor weather conditions and varying indoor loads, how to select a suitable controller that reasonably regulates the central air-conditioning system while guaranteeing user comfort, so as to reduce the building load at peak hours, has long been the focus of building-operation optimization research.
Currently, the control methods of the central air conditioning system include:
1) Conventional control modes, including rule-based control (such as start-stop control) and PID control. Conventional control determines the supervisory-level set points of the central air-conditioning system, such as the various temperature/flow-rate set points, using rule-based methods that are typically static and derived from the experience of engineers and facility managers, and therefore require extensive prior knowledge and accurate system model parameters. Because of its simple design and low cost, this mode is widely used in practical engineering projects; however, since the central air-conditioning system is a typical complex multivariable system with strong nonlinearity, coupling, time-varying behavior and uncertainty, conventional control often fails to achieve an ideal operating effect.
2) Model predictive control (MPC), whose basic idea is to obtain an optimal control strategy at each time step by rolling optimization over a future time window. By predicting future indoor disturbances and outdoor weather conditions, building energy efficiency can be significantly improved. However, the actual performance of MPC depends heavily on model accuracy; for the building hygrothermal-environment control problem in particular, it is difficult to establish a building dynamic model that is both accurate and usable for real-time optimization control, and once the mathematical model deviates substantially from reality, the effect of the control strategy computed by MPC is difficult to guarantee. Moreover, MPC requires low-order system dynamics and objective functions, developing an MPC "model" is complex, and linear models are typically used to describe the building temperature response, so the control variables must be chosen carefully to preserve a low-order relationship between central air-conditioning energy consumption and the state and control variables.
3) Heuristic algorithms (such as genetic algorithms and particle-swarm optimization). A genetic algorithm can realize energy-saving optimized operation of the central air-conditioning system, but this optimization method requires building a black-box model, and the mechanism modeling and parameter-identification work is complex.
4) Reinforcement learning methods, which mainly use the traditional tabular Q-learning algorithm to optimize air-conditioning operation. In practical control problems, however, the system's state and action spaces are high-dimensional, and the algorithm faces the curse of dimensionality. Exploiting the strong generalization ability of neural networks, the curse of dimensionality can be mitigated by a parameterized approximate value function, but a single neural-network structure tends to over-estimate the value function during learning. Building on the reinforcement learning method, an LSTM neural network can address the vanishing-gradient problem and improve the stability of the reinforcement learning algorithm, but the over-estimation of the value function still remains; a minimal sketch of the tabular update is given below.
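For concreteness, here is the standard tabular Q-learning update referred to in item 4), as a hedged Python sketch; the discretization sizes are assumptions, and the point is that the table must enumerate every discretized state-action pair, which is what causes the curse of dimensionality:

```python
import numpy as np

# Tabular Q-learning sketch: Q is a dense table over discretized states and
# actions. The sizes below are assumed for illustration only; realistic HVAC
# state spaces make this table intractably large.
n_states, n_actions = 1000, 5
Q = np.zeros((n_states, n_actions))

def q_update(s: int, a: int, r: float, s_next: int,
             alpha: float = 0.1, gamma: float = 0.99) -> None:
    # Standard one-step Q-learning update: Q[s,a] += alpha * TD error.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```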
The problems of existing central air-conditioning control methods can thus be summarized as modeling that is either difficult or inaccurate. It is therefore necessary to provide a central air-conditioning system control method that avoids the difficulty of accurate modeling.
Disclosure of Invention
The invention provides a central air-conditioning control method based on reinforcement learning, which solves for the control action with a Deep Deterministic Policy Gradient (DDPG) method, is not influenced by model parameters, and increases the air-conditioning load regulation capacity while ensuring user comfort.
To achieve the above and other related objects, the present invention provides a central air conditioning system control method based on reinforcement learning, comprising the steps of:
S1, designing the state space S of the central air-conditioning system, the control action A for controlling the central air-conditioning system, and the reward function r_t;

the state space S at least comprises the air-conditioning load, the temperature of the controlled area subject to weather and disturbance factors, the outdoor weather conditions, the chilled-water supply temperature, the advance cooling time, the demand-response duration, the chiller operating state and a time series;

the control action A is to shut down the central air-conditioning system or to select a supply temperature from a supply-temperature set as the chilled-water supply temperature of the central air-conditioning system, and the control action A is selected based on the state space S;

the reward function r_t is used to evaluate the control result produced by the control action A to obtain a reward value;

S2, designing a DDPG network based on the state space S, the control action A and the reward function r_t;

and S3, executing the DDPG network to control the central air-conditioning system.
Preferably, the formula of the reward function r_t is:

r_t = −[η×(T_setlow − T_ave)×λ + β×P_hvac]

where η, λ and β are adjustable hyper-parameters; η and β control the relative importance between building air-conditioning energy consumption and indoor thermal comfort for optimization; λ is the penalty level for the room temperature violating the controlled-area temperature during idle time; T_setlow is the penalty threshold of the indoor air temperature; T_ave is the average indoor temperature; and all parameters are normalized.
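For illustration, a minimal Python sketch of this reward computation follows; the function name, the default hyper-parameter values and the assumption that inputs arrive already normalized are ours, not the patent's:

```python
# Minimal sketch of r_t = -[eta*(T_setlow - T_ave)*lam + beta*P_hvac].
# Function name and default hyper-parameter values are illustrative assumptions.
def reward(t_setlow: float, t_ave: float, p_hvac: float,
           eta: float = 1.0, lam: float = 1.0, beta: float = 1.0) -> float:
    # All inputs are assumed to be normalized, as the formula requires.
    return -(eta * (t_setlow - t_ave) * lam + beta * p_hvac)
```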
Preferably, the implementation of the DDPG network comprises the following steps:

S3.1, randomly initializing the current critic network Q, the current actor network μ and their target networks Q' and μ', randomly initializing the replay buffer R, and randomly initializing the noise N;

S3.2, setting an initial state s_t based on the state space S, and inputting the initial state s_t into the current actor network μ to obtain an initial action a_t;

S3.3, executing the initial action a_t, receiving an initial reward R_t according to the reward function r_t, entering the next state s_{t+1}, and storing the tuple [s_t, a_t, R_t, s_{t+1}] into the replay buffer R;

S3.4, randomly sampling m tuples [s_t, a_t, R_t, s_{t+1}] (t = 1, 2, ..., m, m ≥ 2) from the replay buffer R, and computing the target value y_t from the target network Q' of the current critic network Q based on the m samples;

S3.5, substituting the target value y_t into the loss function of the current critic network Q to update the current critic network Q, and updating the current actor network μ by gradient back-propagation;

S3.6, proportionally updating the target networks Q' and μ'.
Preferably, the loss function of the current critic network Q is:

L(θ^Q) = (1/m) × Σ_{t=1}^{m} (y_t − Q(s_t, a_t | θ^Q))²

where Q(s_t, a_t | θ^Q) denotes substituting s_t and a_t into the current critic network Q, whose network parameters are θ^Q, t = 1, 2, ..., m, m ≥ 2.
Preferably, the specific formula for updating the current actor network μ by gradient back-propagation is:

∇_{θ^μ} J ≈ (1/m) × Σ_{t=1}^{m} ∇_a Q(s, a | θ^Q)|_{s=s_t, a=μ(s_t)} × ∇_{θ^μ} μ(s | θ^μ)|_{s=s_t}

where θ^μ denotes the network parameters of the current actor network μ.
Preferably, proportionally updating the target networks Q' and μ' means updating their network parameters θ^{Q'} and θ^{μ'} according to the following formulas:

θ^{Q'} ← τθ^Q + (1−τ)θ^{Q'}

θ^{μ'} ← τθ^μ + (1−τ)θ^{μ'}

where τ denotes the update coefficient.
Preferably, the time series S_m consists of a weekly component taking seven values (one per day of the week) and a daily component taking 144 values (one per 10-minute interval of the day).
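As a hedged sketch (the patent gives no code, and the timestamp handling below is an assumption), the two components of S_m can be derived from a wall-clock time as follows:

```python
from datetime import datetime

# Sketch: derive the weekly component (1..7) and the daily 10-minute slot
# (1..144) of the time series S_m from a timestamp. The exact encoding used
# by the patent is not given; this layout is an assumption.
def time_series_features(t: datetime) -> tuple[int, int]:
    s_week = t.isoweekday()                      # 1 (Monday) .. 7 (Sunday)
    s_day = (t.hour * 60 + t.minute) // 10 + 1   # 1 .. 144
    return s_week, s_day

print(time_series_features(datetime(2021, 11, 26, 13, 20)))  # -> (5, 81)
```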
Based on the same inventive concept, the present invention also provides a control system for a central air-conditioning system, comprising an ARM-based embedded device on which a program implementing the reinforcement-learning-based central air-conditioning system control method described above is deployed, so that the embedded device performs the shutdown and temperature operations of the air conditioner.
In conclusion, the reinforcement-learning-based central air-conditioning control method and system provided by the invention can reduce the building load at peak hours, without affecting user comfort, through a series of technical means that regulate the building air-conditioning system; the method has strong convergence ability and good stability, and improves system efficiency through continuous learning.
Drawings
Fig. 1 is a schematic step diagram of a central air-conditioning control method based on reinforcement learning according to an embodiment of the present invention;
fig. 2 is a diagram illustrating accumulated rewards in a DDPG network learning process in a central air-conditioning control method based on reinforcement learning according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating the control effect of the DDPG network in the central air-conditioning control method based on reinforcement learning according to an embodiment of the present invention;
fig. 4 is a schematic diagram of load shedding potential when the DDPG network selects different pre-cooling time lengths in the central air-conditioning control method based on reinforcement learning according to an embodiment of the present invention.
Detailed Description
The following describes the reinforcement-learning-based central air-conditioning control method and system in detail with reference to the accompanying drawings and specific embodiments. The advantages and features of the present invention will become more apparent from the following description. It should be noted that the drawings are in a highly simplified form and use imprecise scales, intended only to assist in conveniently and clearly explaining the embodiments of the present invention. The structures, proportions and sizes shown in the drawings are used only together with the disclosure of the specification, so as to be understood and read by those skilled in the art, and do not limit the conditions under which the present invention can be implemented; any structural modification, change of proportional relationship or adjustment of size that does not affect the efficacy or purpose of the present invention should still fall within its scope.
First, the DDPG mentioned in the present invention is explained. DDPG evolved from DDQN. In DDQN, the current Q network is responsible for computing the Q values of the executable actions in the current state space S, selecting an action A with an ε-greedy strategy, executing A to obtain a new state space S' and a reward, and putting the sample into the experience pool (the replay buffer); for the next state space S' sampled from the replay buffer, the executable actions are computed and an action A' is selected with a greedy strategy, the target Q network computes the target Q value, and once the target Q value is computed, the loss function is calculated and the parameters are updated by gradient back-propagation. The target Q network is responsible for computing the target Q values of the experience-pool samples, following the idea of decoupling Q-value computation from action selection in combination with the current Q network, and periodically copies its parameters from the current Q network. In DDPG, the functional roles of the current and target Critic networks are essentially similar to those of the current and target Q networks of DDQN. However, DDPG has its own Actor policy network, so the ε-greedy strategy is not needed: action A is selected by the current Actor network, and for the next state space S' sampled from the experience pool, action A' is selected by the target Actor network without a greedy strategy. DDPG incorporates experience replay and the dual-network concept, i.e., a current network and a target network; since there are both an Actor network and a Critic network, the two networks become four: the current Actor network, the target Actor network, the current Critic network and the target Critic network. The two Actor networks share the same structure, as do the two Critic networks.
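To make the four-network structure concrete, here is a minimal PyTorch sketch; the layer widths, activations and the flattened state dimension are illustrative assumptions, not the patent's design:

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 9, 1   # assumed: flattened state features, one setpoint action

class Actor(nn.Module):
    """Policy network mu(s): maps a state to a deterministic action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())   # action squashed to [-1, 1]

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

class Critic(nn.Module):
    """Value network Q(s, a): scores a state-action pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1))

# The four networks: the targets start as exact copies of the current networks.
actor, critic = Actor(), Critic()
actor_target, critic_target = Actor(), Critic()
actor_target.load_state_dict(actor.state_dict())
critic_target.load_state_dict(critic.state_dict())
```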
Referring to fig. 1, the present invention provides a central air conditioning system control method based on reinforcement learning, including the following steps:
S1, designing the state space S of the central air-conditioning system, the control action A for controlling the central air-conditioning system, and the reward function r_t. The state space S at least comprises the air-conditioning load, the temperature of the controlled area subject to weather and disturbance factors, the outdoor weather conditions, the chilled-water supply temperature, the advance cooling time, the demand-response duration, the chiller operating state and a time series. The control action A is to shut down the central air-conditioning system or to select a supply temperature from a supply-temperature set as the chilled-water supply temperature of the central air-conditioning system, and the control action A is selected based on the state space S. The reward function r_t is used to evaluate the control result produced by the control action A to obtain a reward value;
S2, designing a DDPG network based on the state space S, the control action A and the reward function r_t;
and S3, executing the DDPG network to control the central air-conditioning system.
In this embodiment, for step S1, the invention formulates the demand-response problem of the central air-conditioning system as a Markov decision process, determines the observable state space and the control information, and designs a reward function to accelerate the optimization process of the agent.
First, the state-space design is carried out, i.e., the space in which the central air-conditioning system is located and the state space S of the central air-conditioning system are designed: S = [P_hvac, T_in, T_out, T_supply, T_p, T_e, s_{i,t}, S_m], where P_hvac denotes the air-conditioning load, influenced by the control-strategy actions; T_in is the temperature of the controlled area, subject to weather and disturbance factors; T_out is the outdoor weather condition; T_supply is the chilled-water supply temperature; T_p is the advance cooling time; T_e is the demand-response duration; s_{i,t} is the operating state of the chiller; and the time series S_m consists of a weekly component taking seven values (one per day of the week) and a daily component taking 144 values (one per 10-minute interval of the day).
Then, the control-action design is carried out: the action is denoted by A, A = [off, a_1, a_2, ..., a_n], where off is the shutdown state and a_i (i = 1, 2, ..., n) denotes the different values the building's chilled-water supply temperature takes at different times.
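A small sketch of how such a discrete action set might be decoded follows; the concrete temperature values are assumptions for illustration only:

```python
# Sketch of decoding the action A = [off, a_1, ..., a_n]; the candidate
# chilled-water supply temperatures below are assumed example values.
SUPPLY_TEMPS_C = [7.0, 8.0, 9.0, 10.0]

def decode_action(index: int) -> str | float:
    """Map an action index to 'off' or a supply-temperature setpoint (deg C)."""
    return "off" if index == 0 else SUPPLY_TEMPS_C[index - 1]
```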
Finally, the reward-function design is carried out, denoted by r_t: r_t = −[η×(T_setlow − T_ave)×λ + β×P_hvac], where η, λ and β are adjustable hyper-parameters; η and β control the relative importance between building air-conditioning energy consumption and indoor thermal comfort for optimization; λ is the penalty level for a temperature violation in the controlled area during idle time; T_setlow is the penalty threshold of the indoor air temperature; T_ave is the average indoor temperature; and all parameters are normalized.
In this embodiment, the DDPG is carried out according to the following steps:

S3.1, randomly initializing the current critic network Q, the current actor network μ and their target networks Q' and μ', and randomly initializing the replay buffer R;

S3.2, setting an initial state s_t based on the state space S and inputting it into the current actor network μ to obtain an initial action a_t = μ(s_t | θ^μ) + N, where N denotes random noise: in order to add some randomness to the learning process and increase the learning coverage, DDPG adds a certain noise N to the selected action;

S3.3, executing the initial action a_t, receiving an initial reward R_t according to the reward function r_t, entering the next state s_{t+1}, and storing the tuple [s_t, a_t, R_t, s_{t+1}] into the replay buffer R;

S3.4, randomly sampling m tuples [s_t, a_t, R_t, s_{t+1}] (t = 1, 2, ..., m, m ≥ 2) from the replay buffer R, and computing the target value y_t = R_t + γQ'(s_{t+1}, μ'(s_{t+1} | θ^{μ'}) | θ^{Q'}) from the target networks based on the m samples, where γ is the discount factor;

S3.5, substituting the target value y_t into the loss function of the current critic network Q to update the current critic network Q, and then updating the current actor network μ by gradient back-propagation;

S3.6, proportionally updating the target networks Q' and μ'.
The basic idea is to use a convolutional neural network, namely the μ network and the Q network, as an approximation of the policy function, and then train the neural network with deep-learning methods. This is a deterministic behavior policy: at each step the behavior is obtained directly as a determined value from the policy function, and meanwhile the convolutional neural network is continuously optimized through deep learning so that the policy function improves.
In this embodiment, the loss function of the current critic network Q is:

L(θ^Q) = (1/m) × Σ_{t=1}^{m} (y_t − Q(s_t, a_t | θ^Q))²

where Q(s_t, a_t | θ^Q) denotes substituting s_t and a_t into the current critic network Q, whose network parameters are θ^Q, t = 1, 2, ..., m, m ≥ 2.
In this embodiment, the specific formula for updating the current actor network μ by gradient back-propagation is:

∇_{θ^μ} J ≈ (1/m) × Σ_{t=1}^{m} ∇_a Q(s, a | θ^Q)|_{s=s_t, a=μ(s_t)} × ∇_{θ^μ} μ(s | θ^μ)|_{s=s_t}

where θ^μ denotes the network parameters of the current actor network μ.
In this embodiment, proportionally updating the target networks Q' and μ' means updating their network parameters θ^{Q'} and θ^{μ'} according to the following formulas:

θ^{Q'} ← τθ^Q + (1−τ)θ^{Q'}

θ^{μ'} ← τθ^μ + (1−τ)θ^{μ'}

where τ denotes the update coefficient, which is generally taken to be relatively small, such as 0.1 or 0.01.
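Putting steps S3.4-S3.6 together, a hedged PyTorch sketch of one training iteration follows; it reuses the Actor/Critic classes sketched earlier, and the optimizer choice and tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.01   # discount factor and soft-update coefficient tau

def soft_update(target, source, tau: float = TAU) -> None:
    # theta' <- tau * theta + (1 - tau) * theta'
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)

def train_step(batch, actor, critic, actor_t, critic_t, actor_opt, critic_opt):
    s, a, r, s_next = batch                       # minibatch of m tuples from R
    # Target value y_t from the target networks (step S3.4)
    with torch.no_grad():
        y = r + GAMMA * critic_t(s_next, actor_t(s_next))
    # Critic update: minimize the mean-squared loss against y_t (step S3.5)
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor update: gradient ascent on Q(s, mu(s)) via back-propagation (S3.5)
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Proportional (soft) update of both target networks (step S3.6)
    soft_update(critic_t, critic)
    soft_update(actor_t, actor)
```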
In addition, the inventors tested the method. Over the whole continuous demand-response process, pre-cooling is carried out a certain time in advance according to the learned strategy, and the start time, the duration and the pre-cooling temperature are all learned by the agent. The overall demand-response duration is learned autonomously by the agent according to its sensitivity to changes in the outdoor temperature difference; after the demand response ends, the unit operates normally according to the outdoor temperature. In the control process of verifying the demand response of the central air-conditioning system with reinforcement learning, the parameters were configured as follows: actor-network learning rate 0.001, critic-network learning rate 0.0001, discount factor 0.99 and target-network update parameter 0.001.
The experimental data come from part of the actual operating data of a building in a certain region during July and August of two years, and the outdoor temperature set in the experiments varies according to the daily temperatures of that building in July and August of those two years.
Fig. 2 shows the accumulated reward over the whole DDPG network learning process. At the start of training on the collected data, with no prior knowledge or rules, the requirement on indoor comfort is frequently violated, causing large losses, so the accumulated reward is very small. After a period of trial and error, the deep reinforcement learning becomes effective: the indoor temperature of the controlled area is kept within the required range, and the accumulated reward gradually increases. Finally, the algorithm learns a strategy that avoids temperature violations and minimizes energy consumption while also learning a good demand-response strategy, i.e., a balance among the comfort requirement, the energy-consumption requirement and the demand-response strategy. The Q value then stabilizes, indicating that the proposed method has successfully learned a strategy that maximizes the accumulated reward.
Because the outdoor temperature of the air conditioner's operating environment has a certain regularity, the cooling load of the central air conditioner is influenced by the outdoor temperature and is closely related to time. Therefore m_in = 1440 min and m_hvac = 1440 min are selected as the lengths of the input time sequences of the above DDPG network, and the agent has four observations. When the DDPG network learns to lower the indoor set temperature in advance, the room is pre-cooled during the valley period; the control effect is shown in fig. 3.
By exploiting the heat-storage characteristic of the building, an advance cooling control strategy is adopted in the period of about 13:20-14:00 (namely, minutes 800 to 840 of the day). Under this strategy, the operating time at a high indoor-temperature load rate is shortened, achieving a certain peak-clipping effect and meeting a certain demand-response requirement. However, a higher demand-response effect can be achieved by selecting an appropriate advance cooling duration; the load-reduction potential for different pre-cooling durations is shown in fig. 4.
On the premise that the advance cooling strategy has been shown to be feasible, exploring different advance cooling times reveals different effects on the load-regulation potential. As the advance cooling time increases, the temperature of the controlled area decreases and the load-reduction potential of the system strengthens; however, once the advance cooling time exceeds 40 minutes, the load-reduction potential no longer changes much. This is because, after the load-reduction strategy is implemented, the building's room temperature changes very sensitively owing to the heat-storage characteristics: the longer the advance cooling time, the greater the regulation potential of the central air-conditioning system, but after the demand response has lasted for a while, the temperature differences between periods become small, the room temperature has already reached the constraint's lower limit of 24 °C, and the load-reduction capability no longer improves much. Generally speaking, the regulation potential does not change much as the demand-response time increases, and when the demand-response duration is shorter, the strategy in this scenario exhibits better demand-response characteristics.
The above analysis shows that cooling in advance during the regulation period can well guarantee the user's comfort requirement while reducing the system's own load. It also reflects the thermal-energy-storage characteristic of building air conditioning participating in demand response, demonstrating that the central air-conditioning system is a good user-side demand-response resource.
Based on the same inventive concept, the invention also provides a control system for the central air-conditioning system, comprising an ARM-based embedded device on which a program implementing the reinforcement-learning-based central air-conditioning system control method is deployed, so that the embedded device performs the shutdown and temperature operations of the air conditioner.
The central air-conditioning control method and the control system based on reinforcement learning provided by the invention can achieve the purpose of reducing the building load in the peak time period by a series of technical means of regulating and controlling the building air-conditioning system without influencing the comfort level of a user.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.
Claims (8)
1. A central air-conditioning system control method based on reinforcement learning is characterized by comprising the following steps:
S1, designing the state space S of the central air-conditioning system, the control action A for controlling the central air-conditioning system, and the reward function r_t;

the state space S at least comprises the air-conditioning load, the temperature of the controlled area subject to weather and disturbance factors, the outdoor weather conditions, the chilled-water supply temperature, the advance cooling time, the demand-response duration, the chiller operating state and a time series;

the control action A is to shut down the central air-conditioning system or to select a supply temperature from a supply-temperature set as the chilled-water supply temperature of the central air-conditioning system, and the control action A is selected based on the state space S;

the reward function r_t is used to evaluate the control result produced by the control action A to obtain a reward value;

S2, designing a DDPG network based on the state space S, the control action A and the reward function r_t;

and S3, executing the DDPG network to control the central air-conditioning system.
2. The reinforcement-learning-based central air-conditioning system control method of claim 1, wherein the formula of the reward function r_t is:

r_t = −[η×(T_setlow − T_ave)×λ + β×P_hvac]

where η, λ and β are adjustable hyper-parameters; η and β control the relative importance between building air-conditioning energy consumption and indoor thermal comfort for optimization; λ is the penalty level for the room temperature violating the controlled-area temperature during idle time; T_setlow is the penalty threshold of the indoor air temperature; T_ave is the average indoor temperature; and all parameters are normalized.
3. The reinforcement-learning-based central air-conditioning system control method of claim 1, wherein the implementation of the DDPG network comprises the following steps:

S3.1, randomly initializing the current critic network Q, the current actor network μ and their target networks Q' and μ', randomly initializing the replay buffer R, and randomly initializing the noise N;

S3.2, setting an initial state s_t based on the state space S, and inputting the initial state s_t into the current actor network μ to obtain an initial action a_t;

S3.3, executing the initial action a_t, receiving an initial reward R_t according to the reward function r_t, entering the next state s_{t+1}, and storing the tuple [s_t, a_t, R_t, s_{t+1}] into the replay buffer R;

S3.4, randomly sampling m tuples [s_t, a_t, R_t, s_{t+1}] (t = 1, 2, ..., m, m ≥ 2) from the replay buffer R, and computing the target value y_t from the target network Q' of the current critic network Q based on the m samples;

S3.5, substituting the target value y_t into the loss function of the current critic network Q to update the current critic network Q, and then updating the current actor network μ by gradient back-propagation;

S3.6, proportionally updating the target networks Q' and μ'.
4. The reinforcement-learning-based central air-conditioning system control method of claim 3, wherein the loss function of the current critic network Q is:

L(θ^Q) = (1/m) × Σ_{t=1}^{m} (y_t − Q(s_t, a_t | θ^Q))²

where Q(s_t, a_t | θ^Q) denotes substituting s_t and a_t into the current critic network Q, whose network parameters are θ^Q, t = 1, 2, ..., m, m ≥ 2.

5. The reinforcement-learning-based central air-conditioning system control method of claim 3, wherein the specific formula for updating the current actor network μ by gradient back-propagation is:

∇_{θ^μ} J ≈ (1/m) × Σ_{t=1}^{m} ∇_a Q(s, a | θ^Q)|_{s=s_t, a=μ(s_t)} × ∇_{θ^μ} μ(s | θ^μ)|_{s=s_t}

where θ^μ denotes the network parameters of the current actor network μ.
6. The reinforcement-learning-based central air-conditioning system control method of claim 5, wherein proportionally updating the target networks Q' and μ' means updating their network parameters θ^{Q'} and θ^{μ'} according to the following formulas:

θ^{Q'} ← τθ^Q + (1−τ)θ^{Q'}

θ^{μ'} ← τθ^μ + (1−τ)θ^{μ'}

where τ denotes the update coefficient.
7. The reinforcement-learning-based central air-conditioning system control method of claim 1, wherein the time series S_m consists of a weekly component taking seven values (one per day of the week) and a daily component taking 144 values (one per 10-minute interval of the day).
8. A control system of a central air-conditioning system, characterized in that the system comprises an ARM-based embedded device on which a program of the reinforcement-learning-based central air-conditioning system control method according to any one of claims 1 to 7 is deployed, so that the embedded device is used for the shutdown and temperature operation of the air conditioner.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111420612.2A | 2021-11-26 | 2021-11-26 | Central air conditioner control method and system based on reinforcement learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN114234381A | 2022-03-25
Family
ID=80751285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111420612.2A | Central air conditioner control method and system based on reinforcement learning | 2021-11-26 | 2021-11-26 | Pending
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114234381A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040204784A1 (en) * | 2002-12-16 | 2004-10-14 | Maturana Francisco Paul | Decentralized autonomous control for complex fluid distribution systems |
CN101650063A (en) * | 2008-08-11 | 2010-02-17 | 柯细勇 | Climate compensation controller for central air conditioner and climate compensation method for central air conditioner |
CN108386971A (en) * | 2018-01-28 | 2018-08-10 | 浙江博超节能科技有限公司 | Central air-conditioning energy robot control system(RCS) |
CN109890176A (en) * | 2019-03-01 | 2019-06-14 | 北京慧辰资道资讯股份有限公司 | A kind of method and device based on artificial intelligence optimization's energy consumption of machine room efficiency |
CN110398029A (en) * | 2019-07-25 | 2019-11-01 | 北京上格云技术有限公司 | Control method and computer readable storage medium |
CN111623497A (en) * | 2020-02-20 | 2020-09-04 | 上海朗绿建筑科技股份有限公司 | Radiation air conditioner precooling and preheating method and system, storage medium and radiation air conditioner |
CN112966431A (en) * | 2021-02-04 | 2021-06-15 | 西安交通大学 | Data center energy consumption joint optimization method, system, medium and equipment |
CN113283156A (en) * | 2021-03-29 | 2021-08-20 | 北京建筑大学 | Subway station air conditioning system energy-saving control method based on deep reinforcement learning |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115289619A (en) * | 2022-07-28 | 2022-11-04 | 安徽大学 | Subway platform HVAC control method based on multi-agent deep reinforcement learning |
CN116481149A (en) * | 2023-06-20 | 2023-07-25 | 深圳市微筑科技有限公司 | Method and system for configuring indoor environment parameters |
CN116481149B (en) * | 2023-06-20 | 2023-09-01 | 深圳市微筑科技有限公司 | Method and system for configuring indoor environment parameters |
CN118224713A (en) * | 2024-05-07 | 2024-06-21 | 深圳市亚晔实业有限公司 | Air ventilation and exhaust cooperative control method and device based on multi-agent system |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20220325