CN114279042A - Central air conditioner control method based on multi-agent deep reinforcement learning - Google Patents
- Publication number
- CN114279042A (application CN202111609118.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02B—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
- Y02B30/00—Energy efficient heating, ventilation or air conditioning [HVAC]
- Y02B30/70—Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating
Landscapes
- Air Conditioning Control Device (AREA)
Abstract
The invention discloses a central air-conditioning control method based on multi-agent deep reinforcement learning. According to the current indoor required cooling load and the outdoor wet bulb temperature, the method performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pumps and cooling tower fans in a central air-conditioning system, comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan. The method does not require an accurate model of the central air-conditioning system during actual deployment, and it controls the working frequencies of the cooling water pump and the cooling tower fan with one agent each. It can train an efficient and accurate control strategy in a short time from a small amount of historical data, reducing unnecessary refrigerating capacity and the working load of the chillers, cooling water pumps and cooling tower fans, prolonging service life, lowering the failure rate, and greatly reducing the energy consumption of the whole central air-conditioning system and even the total energy consumption of the building.
Description
Technical Field
The invention relates to the technical field of central air-conditioning control, in particular to a central air-conditioning control method based on multi-agent deep reinforcement learning.
Background
According to statistics, the central air-conditioning system accounts for, and can even exceed, 50% of the total energy consumption of a building. The chillers and the cooling water system are a major part of that consumption, so optimal control of the chillers and the cooling water system is particularly important for reducing the energy consumption of the whole central air-conditioning system and even the total energy consumption of the building.
At present, the control methods for central air-conditioning systems mainly include rule-based control, model-based control and model-free control. Rule-based control is usually static: the control rules are set from the experience of engineers and plant administrators, so their applicability and optimization potential are very limited. Model-based methods need a large amount of historical data and sensor information to build an accurate central air-conditioning model, but they generally lack robustness and are unsuitable for older building complexes that lack historical data and sensors. Model-free control avoids building an accurate mathematical model, but traditional model-free methods require states and actions to be discretized, which leads to a large action space and long training time, reduces the generalization ability of the algorithm, and leaves complex problems unsolved.
Therefore, a solution to this problem is urgently needed.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a central air-conditioning control method based on multi-agent deep reinforcement learning.
To achieve this purpose, the invention adopts the following technical scheme: a central air-conditioning control method based on multi-agent deep reinforcement learning, which performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pumps and cooling tower fans in a central air-conditioning system according to the current indoor required cooling load and the outdoor wet bulb temperature, comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.
Preferably, the chillers, cooling water pumps and cooling towers in the central air-conditioning system are connected in sequence and arranged in groups; chiller sequence control is performed by a sequence controller, and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is performed by one reinforcement learning controller each.
Preferably, the method comprises the following steps:
A1. recording the outdoor wet bulb temperature with an electronic thermometer;
A2. obtaining the current indoor required cooling load by simulation with the energy-consumption software EnergyPlus;
A3. the sequence controller determines the number of chillers to be started according to the current indoor required cooling load;
A4. after receiving the current state information, the reinforcement learning controllers build an environment model from the received data and derive an optimal strategy from the environment model.
Preferably, in step A2, the current room is modeled as a whole with EnergyPlus, taking as input the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature, where $CL_s$ denotes the current indoor required cooling load, $T$ denotes the set of those four temperatures, and $model_{room}$ denotes the current room model; the output is $CL_s = \{T, model_{room}\}$.
Preferably, in step A3, the sequence controller performs threshold calculation and action execution, where $threshold_n$ denotes the threshold, $n$ (0, 1, 2, 3, …) denotes the number of chillers turned on, and refrigerating capacity denotes the rated refrigerating capacity of a single chiller; $threshold_n$ = [equation missing]. The sequence controller evaluates $CL_s$ in real time: while $CL_s$ falls within the range from $threshold_n$ to $threshold_{n+1}$, $n$ chillers are kept on. When $n = 0$, the sequence controller turns off all chillers, and indoor heat is removed solely by the operation of the cold water pump and the cooling tower fan.
Preferably, in step A4, the two reinforcement learning controllers act as the agents controlling the working frequency of the cooling water pump and of the cooling tower fan, respectively, to perform multi-agent deep reinforcement learning (MADRL) and construct a neural network comprising two fully connected layers and a playback memory unit. The input layer takes the current indoor required cooling load and the outdoor wet bulb temperature; the intermediate layer is fully connected to all possible actions; the output layer gives the value estimate of every action under the current indoor required cooling load and outdoor wet bulb temperature. The output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan. The playback memory unit records all samples $(s_t, a_t, r_t, s_{t+1})$, where $s_t$ denotes the current indoor required cooling load and outdoor wet bulb temperature, $a_t$ denotes the working frequencies of the cooling water pump and the cooling tower fan in that state, $s_{t+1}$ denotes the next state reached after performing action $a_t$ in state $s_t$, and $r_t$ denotes the immediate reward obtained by performing action $a_t$ in state $s_t$.
Preferably, in step A4, the two reinforcement learning controllers model the control of the working frequencies of the cooling water pump and the cooling tower fan as two Markov Decision Process (MDP) models, and define the states, actions and reward functions as follows:
B1. State, denoted by $S$, where $CL_s$ represents the current indoor required cooling load and $T_{wet}$ represents the current outdoor wet bulb temperature; the two agents share the same current state, $S = \{CL_s, T_{wet}\}$;
B2. Actions, denoted by $a$, where $f_{pump}$ represents the frequency of the cooling water pump and $f_{tower}$ represents the frequency of the cooling tower fan: $a_{pump} = f_{pump}$; $a_{tower} = f_{tower}$;
B3. Reward function, denoted by $r$, where $P_{chiller}$ represents the power consumption of the chiller, $P_{tower}$ represents the power consumption of the cooling tower fan, and $P_{pump}$ represents the power consumption of the cooling water pump: [equation missing].
Preferably, in step A4, the reinforcement learning controller builds a value function return model, where $R(s, a)$ denotes the return obtained by taking action $a$ in state $s$, and the value function $Q(s, a)$ is the expectation of $R(s, a)$: $Q(s, a) = E[R(s, a)]$.
Preferably, in step a4, the reinforcement learning controller solves the optimal strategy through a Deep Q learning (Deep Q Network or DQN) algorithm, where the algorithm training process is as follows:
C1. initializing the playback memory unit with capacity $N$, used to store training samples;
C2. initializing the current value network with randomly initialized weight parameters $\omega$, and initializing the target value network with the same structure and the same initial weights as the current value network;
C3. feeding the indoor required cooling load and outdoor wet bulb temperature through the current value network to obtain $Q(s, a)$ for any state $s$; after the value function is computed by the current value network, selecting an action $a$ with an ε-greedy strategy; counting each state transition as one time step $t$; and storing the data $(s, a, r, s')$ obtained at each time step in the playback memory unit;
C4. defining a loss function:
$L(\omega) = E\left[\left(r + \gamma \max_{a'} Q(s', a'; \omega^-) - Q(s, a; \omega)\right)^2\right]$;
C5. randomly drawing a sample $(s, a, r, s')$ from the playback memory unit, passing it to the current value network, the target value network and $L(\omega)$, and minimizing $L(\omega)$ with respect to $\omega$ by stochastic gradient descent, the update formula being as follows: [equation missing]
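The update formula itself is not reproduced in this text. A standard per-sample stochastic gradient descent step consistent with the loss defined in C4 would take the following form, assuming a learning rate $\alpha$ (a symbol not used in the original) and treating the target-network parameters $\omega^-$ as constants:

$$
\omega \leftarrow \omega + \alpha \left( r + \gamma \max_{a'} Q(s', a'; \omega^-) - Q(s, a; \omega) \right) \nabla_\omega Q(s, a; \omega)
$$

The constant factor 2 from differentiating the squared error is absorbed into $\alpha$. This is the textbook DQN update and is offered only as a plausible reading, not as the patent's own formula.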
Preferably, the overall algorithm training process is as follows:
D1. at the current time step $t$, performing chiller start-stop control according to the real-time cooling load;
D2. observing the environmental state $s_t$ and recording data such as the real-time cooling load and the outdoor wet bulb temperature;
D3. the model-free method gives the control action $a_t$: the action $a_t$ with the maximum current Q value is selected using a greedy strategy;
D4. the system executes the control action and obtains the next environment state $s_{t+1}$; the coefficient of refrigerating performance under the current action is calculated and used as the reward value $r_t$ in the reinforcement learning algorithm;
D5. training the multi-agent deep reinforcement learning algorithm and updating parameters: the sample $(s_t, a_t, r_t, s_{t+1})$ is stored in an experience pool, data are randomly sampled from the experience pool, and algorithm training is executed to update the network parameters;
D6. ending the current time step t and starting the next time step t + 1.
Due to the application of the technical scheme, compared with the prior art, the invention has the following beneficial effects:
1. the control method adopts multi-agent deep reinforcement learning, that is, it introduces multiple agents and a neural network on top of traditional reinforcement learning. This addresses the slow convergence of a single agent and the curse of dimensionality that arises when reinforcement learning computes and stores state-action values one by one in a high-dimensional state space, and it is particularly suited to systems with several cooperating controllers. According to the current indoor required cooling load and outdoor wet bulb temperature, the method models the control of the working frequencies of the cooling water pump and the cooling tower fan as Markov decision process models, defines a loss function, minimizes it by gradient descent, and thereby solves for the optimal strategy for controlling those working frequencies;
2. the control method does not require an accurate model of the central air-conditioning system during actual deployment, and it controls the working frequencies of the cooling water pump and the cooling tower fan with one agent each;
3. the control method can train an efficient and accurate control strategy in a short time from a small amount of historical data, reducing unnecessary refrigerating capacity and the working load of the chillers, cooling water pumps and cooling tower fans, prolonging service life, lowering the failure rate, and greatly reducing the energy consumption of the whole central air-conditioning system and even the total energy consumption of the building.
Drawings
Fig. 1 is a schematic layout diagram of a chiller, a cooling water pump and a cooling water tower in a central air-conditioning system according to an embodiment of a central air-conditioning control method based on multi-agent deep reinforcement learning.
Fig. 2 is a flowchart of an embodiment of a central air-conditioning control method based on multi-agent deep reinforcement learning according to the present invention.
Fig. 3 is a logic flow diagram of the sequence controller performing threshold calculation and action execution in an embodiment of a multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.
Fig. 4 is a flowchart of deep Q learning algorithm training performed by the reinforcement learning controller in an embodiment of the multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.
FIG. 5 is a flowchart of the overall algorithm training in an embodiment of a multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.
Detailed Description
The present invention will be further described in detail with reference to the following specific examples:
With reference to fig. 1 to 5, this embodiment is a central air-conditioning control method based on multi-agent deep reinforcement learning. It performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pumps and cooling tower fans in a central air-conditioning system according to the current indoor required cooling load and the outdoor wet bulb temperature, comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.
As shown in fig. 1, the chillers, cooling water pumps and cooling towers in the central air-conditioning system are connected in sequence and arranged in groups. As shown in fig. 2, chiller sequence control is performed by a sequence controller, and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is performed by one reinforcement learning controller each.
The present embodiment includes the following steps:
A1. recording the outdoor wet bulb temperature with an electronic thermometer;
A2. obtaining the current indoor required cooling load by simulation with the energy-consumption software EnergyPlus;
A3. the sequence controller determines the number of chillers to be started according to the current indoor required cooling load;
A4. after receiving the current state information, the reinforcement learning controllers build an environment model from the received data and derive an optimal strategy from the environment model.
In step A2, the current room is modeled as a whole with EnergyPlus, taking as input the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature, where $CL_s$ denotes the current indoor required cooling load, $T$ denotes the set of those four temperatures, and $model_{room}$ denotes the current room model; the output is $CL_s = \{T, model_{room}\}$.
As shown in fig. 3, in step A3, the sequence controller performs threshold calculation and action execution, where $threshold_n$ denotes the threshold, $n$ (0, 1, 2, 3, …) denotes the number of chillers turned on, and refrigerating capacity denotes the rated refrigerating capacity of a single chiller; $threshold_n$ = [equation missing]. The sequence controller evaluates $CL_s$ in real time: while $CL_s$ falls within the range from $threshold_n$ to $threshold_{n+1}$, $n$ chillers are kept on. When $n = 0$, the sequence controller turns off all chillers, and indoor heat is removed solely by the operation of the cold water pump and the cooling tower fan.
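The range check performed by the sequence controller in step A3 can be sketched as a simple lookup. The sketch below is illustrative only: the function name `chillers_to_run` and the numeric threshold values are hypothetical, and the thresholds are passed in as data because the formula for threshold_n is not given in this text.

```python
from bisect import bisect_right
from typing import Sequence


def chillers_to_run(cooling_load: float, thresholds: Sequence[float]) -> int:
    """Return n such that thresholds[n] <= cooling_load < thresholds[n + 1].

    thresholds[n] stands in for threshold_n and must be sorted ascending.
    A load below thresholds[0] maps to 0 chillers; a load at or above the
    last threshold maps to the largest n available.
    """
    n = bisect_right(thresholds, cooling_load) - 1
    return max(0, n)


# Hypothetical thresholds (kW) for a plant with up to three chillers.
EXAMPLE_THRESHOLDS_KW = [0.0, 300.0, 800.0, 1300.0]

if __name__ == "__main__":
    for load in (120.0, 500.0, 1500.0):
        print(load, "->", chillers_to_run(load, EXAMPLE_THRESHOLDS_KW))
```

With n = 0 the controller would leave every chiller off, matching the case described in the text where only the pump and tower fan remove heat.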
In step A4, the two reinforcement learning controllers act as the agents controlling the working frequency of the cooling water pump and of the cooling tower fan, respectively, to perform multi-agent deep reinforcement learning (MADRL) and construct a neural network comprising two fully connected layers and a playback memory unit. The input layer takes the current indoor required cooling load and the outdoor wet bulb temperature; the intermediate layer is fully connected to all possible actions; the output layer gives the value estimate of every action under the current indoor required cooling load and outdoor wet bulb temperature. The output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan. The playback memory unit records all samples $(s_t, a_t, r_t, s_{t+1})$, where $s_t$ denotes the current indoor required cooling load and outdoor wet bulb temperature, $a_t$ denotes the working frequencies of the cooling water pump and the cooling tower fan in that state, $s_{t+1}$ denotes the next state reached after performing action $a_t$ in state $s_t$, and $r_t$ denotes the immediate reward obtained by performing action $a_t$ in state $s_t$.
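As a rough illustration of the structure described above, the following sketch builds a network with two fully connected layers that maps the two-element state to one value estimate per candidate frequency, together with a fixed-capacity playback memory. Everything concrete here is an assumption made for the example: the class names, the hidden width of 8, the ReLU activation, the weight initialization and the list of pump frequencies do not come from the patent.

```python
import random
from collections import deque


class QNetwork:
    """State (CL_s, T_wet) -> hidden layer -> one Q-value per candidate action."""

    def __init__(self, n_actions, hidden=8, seed=0):
        rng = random.Random(seed)
        # First fully connected layer: 2 inputs -> hidden units.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(2)]
        self.b1 = [0.0] * hidden
        # Second fully connected layer: hidden units -> n_actions outputs.
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_actions)] for _ in range(hidden)]
        self.b2 = [0.0] * n_actions

    def q_values(self, state):
        # ReLU activation is an assumption; the text does not name one.
        hidden = [
            max(0.0, sum(x * self.w1[i][j] for i, x in enumerate(state)) + self.b1[j])
            for j in range(len(self.b1))
        ]
        return [
            sum(h * self.w2[j][k] for j, h in enumerate(hidden)) + self.b2[k]
            for k in range(len(self.b2))
        ]


class ReplayMemory:
    """Fixed-capacity store of (s_t, a_t, r_t, s_{t+1}) samples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples are dropped first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical reachable pump frequencies in Hz; one output per entry.
PUMP_FREQS_HZ = [30.0, 35.0, 40.0, 45.0, 50.0]
pump_net = QNetwork(n_actions=len(PUMP_FREQS_HZ))
```

A second `QNetwork` with its own frequency list would play the same role for the cooling tower fan agent.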
In step A4, the two reinforcement learning controllers model the control of the working frequencies of the cooling water pump and the cooling tower fan as two Markov Decision Process (MDP) models, and define the states, actions and reward functions as follows:
B1. State, denoted by $S$, where $CL_s$ represents the current indoor required cooling load and $T_{wet}$ represents the current outdoor wet bulb temperature; the two agents share the same current state, $S = \{CL_s, T_{wet}\}$;
B2. Actions, denoted by $a$, where $f_{pump}$ represents the frequency of the cooling water pump and $f_{tower}$ represents the frequency of the cooling tower fan: $a_{pump} = f_{pump}$; $a_{tower} = f_{tower}$;
B3. Reward function, denoted by $r$, where $P_{chiller}$ represents the power consumption of the chiller, $P_{tower}$ represents the power consumption of the cooling tower fan, and $P_{pump}$ represents the power consumption of the cooling water pump: [equation missing].
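The reward equation is not reproduced in this text, but step D4 states that the reward is the coefficient of refrigerating performance under the current action. One plausible reading, shown here purely as an assumption, is the ratio of the cooling load to the total power drawn by the chiller, tower fan and pump. The names `State` and `cop_reward` and the units (kW, °C) are hypothetical.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Shared state of both agents: S = {CL_s, T_wet}."""
    cooling_load_kw: float  # CL_s, current indoor required cooling load
    t_wet_c: float          # T_wet, current outdoor wet bulb temperature


def cop_reward(state: State, p_chiller_kw: float, p_tower_kw: float, p_pump_kw: float) -> float:
    """Assumed reward: cooling load divided by total electrical power.

    This ratio is a guess at the missing equation, not the patent's formula.
    """
    total_kw = p_chiller_kw + p_tower_kw + p_pump_kw
    if total_kw <= 0.0:
        raise ValueError("total power must be positive")
    return state.cooling_load_kw / total_kw
```

Under this assumption both agents would receive the same reward at each step, since they share the state and the plant-level power figures.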
In step A4, the reinforcement learning controller builds a value function return model, where $R(s, a)$ denotes the return obtained by taking action $a$ in state $s$, and the value function $Q(s, a)$ is the expectation of $R(s, a)$: $Q(s, a) = E[R(s, a)]$.
As shown in fig. 4, in step a4, the reinforcement learning controller solves the optimal strategy through a Deep Q learning (Deep Q Network or DQN) algorithm, and the algorithm training flow is as follows:
C1. initializing the playback memory unit with capacity $N$, used to store training samples;
C2. initializing the current value network with randomly initialized weight parameters $\omega$, and initializing the target value network with the same structure and the same initial weights as the current value network;
C3. feeding the indoor required cooling load and outdoor wet bulb temperature through the current value network to obtain $Q(s, a)$ for any state $s$; after the value function is computed by the current value network, selecting an action $a$ with an ε-greedy strategy; counting each state transition as one time step $t$; and storing the data $(s, a, r, s')$ obtained at each time step in the playback memory unit;
C4. defining a loss function:
$L(\omega) = E\left[\left(r + \gamma \max_{a'} Q(s', a'; \omega^-) - Q(s, a; \omega)\right)^2\right]$;
C5. randomly drawing a sample $(s, a, r, s')$ from the playback memory unit, passing it to the current value network, the target value network and $L(\omega)$, and minimizing $L(\omega)$ with respect to $\omega$ by stochastic gradient descent, the update formula being as follows: [equation missing]
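Steps C3 to C5 can be sketched in a few lines. To keep the example short and dependency-free, the sketch replaces the neural network with a linear Q-function (one weight row per action); that substitution, the discount factor `GAMMA`, the learning rate `ALPHA` and all function names are assumptions for illustration. The update shown is the standard per-sample stochastic gradient step on the C4 loss with the target weights held fixed, not a formula taken from the patent.

```python
import random

GAMMA = 0.9   # assumed discount factor
ALPHA = 0.01  # assumed learning rate


def q_values(weights, state):
    """Linear stand-in for the value network: Q(s, a) = w_a . [s, 1]."""
    features = list(state) + [1.0]
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def epsilon_greedy(weights, state, epsilon, rng):
    """C3: explore with probability epsilon, otherwise take the argmax action."""
    q = q_values(weights, state)
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


def td_loss(weights, target_weights, sample, gamma=GAMMA):
    """C4 for a single sample: (r + gamma * max_a' Q(s', a'; w-) - Q(s, a; w))^2."""
    s, a, r, s_next = sample
    target = r + gamma * max(q_values(target_weights, s_next))
    return (target - q_values(weights, s)[a]) ** 2


def sgd_step(weights, target_weights, sample, alpha=ALPHA, gamma=GAMMA):
    """C5: one gradient step on the current weights; target weights stay fixed."""
    s, a, r, s_next = sample
    target = r + gamma * max(q_values(target_weights, s_next))
    error = target - q_values(weights, s)[a]
    features = list(s) + [1.0]
    # The factor 2 from the squared error is folded into alpha.
    weights[a] = [w + alpha * error * x for w, x in zip(weights[a], features)]
```

States are assumed to be scaled to roughly unit range before being passed in; with raw kW values the fixed learning rate above would likely be too large.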
As shown in fig. 5, the present embodiment includes the following overall algorithm training process:
D1. at the current time step $t$, performing chiller start-stop control according to the real-time cooling load;
D2. observing the environmental state $s_t$ and recording data such as the real-time cooling load and the outdoor wet bulb temperature;
D3. the model-free method gives the control action $a_t$: the action $a_t$ with the maximum current Q value is selected using a greedy strategy;
D4. the system executes the control action and obtains the next environment state $s_{t+1}$; the coefficient of refrigerating performance under the current action is calculated and used as the reward value $r_t$ in the reinforcement learning algorithm;
D5. training the multi-agent deep reinforcement learning algorithm and updating parameters: the sample $(s_t, a_t, r_t, s_{t+1})$ is stored in an experience pool, data are randomly sampled from the experience pool, and algorithm training is executed to update the network parameters;
D6. ending the current time step t and starting the next time step t + 1.
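The loop D1 to D6 can be sketched end to end with two agents that share the state and the reward. This is a toy under heavy assumptions: `toy_plant` is an invented stand-in for the real plant (or an EnergyPlus simulation), the Q-function is linear rather than a neural network, the thresholds, frequencies, ε = 0.1, learning rate and target-sync interval are all hypothetical, and the loads are drawn at random rather than measured.

```python
import random
from bisect import bisect_right
from collections import deque

FREQS_HZ = [30.0, 35.0, 40.0, 45.0, 50.0]    # hypothetical reachable frequencies
THRESHOLDS_KW = [0.0, 300.0, 800.0, 1300.0]  # hypothetical threshold_n values


class Agent:
    """One controller (pump or tower fan) with a linear Q-function and experience pool."""

    def __init__(self, n_actions, rng, capacity=500, gamma=0.9, alpha=0.01):
        self.w = [[0.0, 0.0, 0.0] for _ in range(n_actions)]  # current value weights
        self.w_target = [row[:] for row in self.w]            # target value weights
        self.memory = deque(maxlen=capacity)
        self.rng, self.gamma, self.alpha = rng, gamma, alpha

    @staticmethod
    def q(weights, s):
        x = (s[0], s[1], 1.0)
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]

    def act(self, s, epsilon):
        if self.rng.random() < epsilon:
            return self.rng.randrange(len(self.w))
        q = self.q(self.w, s)
        return max(range(len(q)), key=q.__getitem__)

    def learn(self, sample):
        self.memory.append(sample)                              # store in experience pool
        s, a, r, s_next = self.rng.choice(list(self.memory))    # random replay sample
        target = r + self.gamma * max(self.q(self.w_target, s_next))
        error = target - self.q(self.w, s)[a]
        x = (s[0], s[1], 1.0)
        self.w[a] = [wi + self.alpha * error * xi for wi, xi in zip(self.w[a], x)]

    def sync_target(self):
        self.w_target = [row[:] for row in self.w]


def toy_plant(load_kw, t_wet_c, n_chillers, f_pump, f_tower):
    """Invented plant model returning a COP-like reward; not from the patent."""
    p_pump = 15.0 * (f_pump / 50.0) ** 3
    p_tower = 10.0 * (f_tower / 50.0) ** 3
    chiller_cop = 4.0 + f_pump / 50.0 + f_tower / 50.0 - 0.05 * (t_wet_c - 20.0)
    p_chiller = load_kw / chiller_cop if n_chillers > 0 else 0.0
    return load_kw / (p_chiller + p_pump + p_tower)


def train(steps=200, seed=0, epsilon=0.1):
    rng = random.Random(seed)
    pump, tower = Agent(len(FREQS_HZ), rng), Agent(len(FREQS_HZ), rng)

    def observe():
        return rng.uniform(350.0, 1200.0), rng.uniform(18.0, 28.0)

    load, t_wet = observe()
    rewards = []
    for t in range(steps):
        n = max(0, bisect_right(THRESHOLDS_KW, load) - 1)       # D1: chiller start-stop
        s = (load / 1000.0, t_wet / 30.0)                        # D2: observe (scaled) state
        a_pump, a_tower = pump.act(s, epsilon), tower.act(s, epsilon)  # D3: choose actions
        r = toy_plant(load, t_wet, n, FREQS_HZ[a_pump], FREQS_HZ[a_tower])  # D4: reward
        load, t_wet = observe()
        s_next = (load / 1000.0, t_wet / 30.0)
        pump.learn((s, a_pump, r, s_next))                       # D5: store, sample, update
        tower.learn((s, a_tower, r, s_next))
        if (t + 1) % 50 == 0:
            pump.sync_target()
            tower.sync_target()
        rewards.append(r)                                        # D6: next time step
    return pump, tower, rewards
```

Setting `epsilon=0.0` gives the purely greedy selection that step D3 describes; a small positive value is used by default here so the toy agents try every frequency.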
The innovation of the invention is as follows:
1. the control method adopts multi-agent deep reinforcement learning, that is, it introduces multiple agents and a neural network on top of traditional reinforcement learning. This addresses the slow convergence of a single agent and the curse of dimensionality that arises when reinforcement learning computes and stores state-action values one by one in a high-dimensional state space, and it is particularly suited to systems with several cooperating controllers. According to the current indoor required cooling load and outdoor wet bulb temperature, the method models the control of the working frequencies of the cooling water pump and the cooling tower fan as Markov decision process models, defines a loss function, minimizes it by gradient descent, and thereby solves for the optimal strategy for controlling those working frequencies;
2. the control method does not require an accurate model of the central air-conditioning system during actual deployment, and it controls the working frequencies of the cooling water pump and the cooling tower fan with one agent each;
3. the control method can train an efficient and accurate control strategy in a short time from a small amount of historical data, reducing unnecessary refrigerating capacity and the working load of the chillers, cooling water pumps and cooling tower fans, prolonging service life, lowering the failure rate, and greatly reducing the energy consumption of the whole central air-conditioning system and even the total energy consumption of the building.
The above-mentioned embodiments are merely illustrative of the technical idea and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the scope of the present invention, and all equivalent changes or modifications made according to the spirit of the present invention should be covered in the scope of the present invention.
Claims (10)
1. A central air-conditioning control method based on multi-agent deep reinforcement learning, characterized in that: model-free optimal control is performed on the start-stop states and working parameters of the chillers, cooling water pumps and cooling tower fans of a central air-conditioning system according to the current indoor required cooling load and the outdoor wet bulb temperature, the model-free optimal control comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.
2. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 1, characterized in that: the chillers, cooling water pumps and cooling towers in the central air-conditioning system are connected in sequence and arranged in groups, chiller sequence control is performed by a sequence controller, and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is performed by one reinforcement learning controller each.
3. The central air-conditioning control method based on multi-agent deep reinforcement learning as claimed in claim 2, characterized by comprising the following steps:
A1. recording the outdoor wet bulb temperature with an electronic thermometer;
A2. obtaining the current indoor required cooling load by simulation with the energy-consumption software EnergyPlus;
A3. the sequence controller determines the number of chillers to be started according to the current indoor required cooling load;
A4. after receiving the current state information, the reinforcement learning controllers build an environment model from the received data and derive an optimal strategy from the environment model.
4. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 3, characterized in that: in step A2, the current room is modeled as a whole with EnergyPlus, taking as input the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature, where $CL_s$ denotes the current indoor required cooling load, $T$ denotes the set of those four temperatures, and $model_{room}$ denotes the current room model; the output is $CL_s = \{T, model_{room}\}$.
5. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 4, characterized in that: in step A3, the sequence controller performs threshold calculation and action execution, where $threshold_n$ denotes the threshold, $n$ (0, 1, 2, 3, …) denotes the number of chillers turned on, and refrigerating capacity denotes the rated refrigerating capacity of a single chiller; $threshold_n$ = [equation missing]. The sequence controller evaluates $CL_s$ in real time: while $CL_s$ falls within the range from $threshold_n$ to $threshold_{n+1}$, $n$ chillers are kept on. When $n = 0$, the sequence controller turns off all chillers, and indoor heat is removed solely by the operation of the cold water pump and the cooling tower fan.
6. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 5, characterized in that: in step A4, the two reinforcement learning controllers act as the agents controlling the working frequency of the cooling water pump and of the cooling tower fan, respectively, to perform multi-agent deep reinforcement learning (MADRL) and construct a neural network comprising two fully connected layers and a playback memory unit; the input layer takes the current indoor required cooling load and the outdoor wet bulb temperature; the intermediate layer is fully connected to all possible actions; the output layer gives the value estimate of every action under the current indoor required cooling load and outdoor wet bulb temperature; the output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan; the playback memory unit records all samples $(s_t, a_t, r_t, s_{t+1})$, where $s_t$ denotes the current indoor required cooling load and outdoor wet bulb temperature, $a_t$ denotes the working frequencies of the cooling water pump and the cooling tower fan in that state, $s_{t+1}$ denotes the next state reached after performing action $a_t$ in state $s_t$, and $r_t$ denotes the immediate reward obtained by performing action $a_t$ in state $s_t$.
7. The multi-agent deep reinforcement learning-based central air conditioning control method according to claim 6, wherein in the step A4, two reinforcement learning controllers model the control problem of the working frequencies of the cooling water pump and the cooling water tower fan as two Markov Decision Process (MDP) models, and define the states, actions and reward functions therein as follows:
B1. State, denoted by S, where $CL_s$ represents the current indoor required cooling load and $T_{wet}$ represents the current outdoor wet-bulb temperature; the current states of the two agents are identical and are denoted by $S = \{CL_s, T_{wet}\}$;
B2. Actions, denoted by a, where $f_{pump}$ represents the frequency of the cooling water pump and $f_{tower}$ represents the frequency of the cooling water tower fan: $a_{pump} = f_{pump}$; $a_{tower} = f_{tower}$;
B3. Reward function, denoted by r, where $P_{chiller}$ represents the power consumption of the chiller, $P_{tower}$ represents the power consumption of the cooling water tower fan, and $P_{pump}$ represents the power consumption of the cooling water pump: r = [formula missing from the extracted text].
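The state, action, and reward definitions B1 to B3 can be written down as plain data types. In the sketch below the frequency lists are hypothetical, and the reward is assumed to be the system coefficient of performance (cooling load divided by total electrical power), which is consistent with step D4 of claim 10 but is not confirmed by the reward formula itself, since that formula is not available here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    cl_s: float   # current indoor required cooling load, kW
    t_wet: float  # current outdoor wet-bulb temperature, deg C


# Hypothetical discrete action sets: achievable frequencies in Hz.
PUMP_FREQS_HZ = (30.0, 35.0, 40.0, 45.0, 50.0)
TOWER_FREQS_HZ = (25.0, 30.0, 35.0, 40.0, 45.0, 50.0)


def reward(state, p_chiller, p_tower, p_pump):
    """Assumed reward: COP = cooling load / total electrical power (kW/kW)."""
    total_power = p_chiller + p_tower + p_pump
    if total_power <= 0.0:
        raise ValueError("total power must be positive")
    return state.cl_s / total_power


s = State(cl_s=800.0, t_wet=26.0)
r = reward(s, p_chiller=150.0, p_tower=20.0, p_pump=30.0)  # 800 / 200
```

Both agents receive the same `State`; only their action sets differ.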
8. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 7, characterized in that: in step A4, the reinforcement learning controller builds a value-function return model, where $R(s, a)$ represents the return of action a in state s, and $Q(s, a)$ is the expectation of $R(s, a)$: $Q(s, a) = \mathbb{E}[R(s, a)]$.
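Since $Q(s, a)$ is an expectation, it can be approximated from observed returns by a running sample mean. The sketch below shows that idea only; the class name `SampleAverageQ` and the sample values are hypothetical, and the claimed method estimates $Q$ with a neural network rather than with a table.

```python
from collections import defaultdict


class SampleAverageQ:
    """Estimate Q(s, a) = E[R(s, a)] as the mean of the returns seen so far."""

    def __init__(self):
        self.count = defaultdict(int)
        self.q = defaultdict(float)

    def update(self, s, a, ret):
        key = (s, a)
        self.count[key] += 1
        # Incremental mean: Q <- Q + (R - Q) / n
        self.q[key] += (ret - self.q[key]) / self.count[key]


estimator = SampleAverageQ()
for observed_return in (4.0, 5.0, 6.0):
    estimator.update("s0", 40, observed_return)
print(estimator.q[("s0", 40)])
```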
9. The multi-agent deep reinforcement learning-based central air-conditioning control method according to claim 8, wherein in step A4, the reinforcement learning controller solves for the optimal policy with the deep Q-learning (Deep Q-Network, DQN) algorithm, and the algorithm training process is as follows:
C1. initializing a replay memory unit with capacity N, used for storing training samples;
C2. initializing a current value network with randomly initialized weight parameters ω, and initializing a target value network with the same structure and the same initial weights as the current value network;
C3. passing the indoor required cooling load and the outdoor wet-bulb temperature through the current value network to obtain $Q(s, a)$ for any state s; after the current value network has computed the value function, selecting an action a with an ε-greedy policy; counting each state transition as one time step t, and storing the data $(s, a, r, s')$ obtained at each time step in the replay memory unit;
C4. defining a loss function:
$$L(\omega)=\mathbb{E}\left[\left(r+\gamma\max_{a'}Q(s',a';\omega^{-})-Q(s,a;\omega)\right)^{2}\right];$$
C5. randomly sampling one $(s, a, r, s')$ from the replay memory unit, passing it to the current value network, the target value network, and $L(\omega)$, and minimizing $L(\omega)$ with respect to ω by stochastic gradient descent, wherein the update formula is as follows: [formula missing from the extracted text]
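Steps C3 to C5 can be illustrated with a small sketch that uses the generic textbook form of a stochastic gradient step, $\omega \leftarrow \omega - \alpha \nabla_\omega L(\omega)$; this is not claimed to be the update formula of the source, which is not available in this text. To stay short, a linear function $Q(s, a; \omega) = \omega_a \cdot s$ stands in for the value network, and the function names, learning rate, and discount factor are assumptions.

```python
import random


def q_value(weights, state, action):
    """Linear stand-in for the value network: Q(s, a; w) = w[a] . s."""
    return sum(w * x for w, x in zip(weights[action], state))


def epsilon_greedy(weights, state, epsilon, rng=random):
    """C3: explore with probability epsilon, otherwise take the best action."""
    n_actions = len(weights)
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_value(weights, state, a))


def dqn_step(weights, target_weights, sample, gamma=0.9, lr=0.1):
    """C4/C5: one SGD step on the squared TD error of a single sample."""
    s, a, r, s_next = sample
    target = r + gamma * max(q_value(target_weights, s_next, b)
                             for b in range(len(target_weights)))
    td_error = target - q_value(weights, s, a)
    # The gradient of the squared error w.r.t. w[a] is -2 * td_error * s;
    # the target is held fixed because it uses the target network weights.
    weights[a] = [w + lr * 2.0 * td_error * x for w, x in zip(weights[a], s)]
    return td_error ** 2


weights = [[0.0, 0.0], [0.0, 0.0]]            # current value "network"
target_weights = [row[:] for row in weights]  # C2: same structure and weights
loss = dqn_step(weights, target_weights, sample=((1.0, 0.5), 1, 4.0, (1.0, 0.5)))
```

In a full DQN the target weights would be refreshed from the current weights at intervals; that schedule is not described in the claim and is omitted here.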
10. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 9, characterized by comprising the following overall algorithm training process:
D1. at the current time step t, performing chiller start-stop control according to the real-time cooling load;
D2. observing the environment state $s_t$ and recording data such as the real-time cooling load and the outdoor wet-bulb temperature;
D3. determining the control action $a_t$ with a model-free method, using a greedy strategy to select the action $a_t$ with the maximum current Q value;
D4. having the system execute the control action to obtain the next environment state $s_{t+1}$, calculating the refrigeration coefficient of performance under the current action, and using it as the reward value $r_t$ in the reinforcement learning algorithm;
D5. training the multi-agent deep reinforcement learning algorithm and updating its parameters: storing the sample $(s_t, a_t, r_t, s_{t+1})$ in an experience pool, randomly sampling data from the experience pool, and running algorithm training to update the network parameters;
D6. ending the current time step t and starting the next time step t + 1.
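The loop D1 to D6 can be sketched end to end with two independent agents that share one state and one reward. Everything numeric below is hypothetical: the plant model `toy_cop` is made up, the thresholds and frequency lists are illustrative, and a Q-table replaces each agent's neural network to keep the example short. Exploration uses an ε-greedy choice; with `epsilon=0` it reduces to the greedy choice of D3.

```python
import random

PUMP_FREQS_HZ = (30, 40, 50)           # hypothetical achievable frequencies
TOWER_FREQS_HZ = (30, 40, 50)          # hypothetical achievable frequencies
CHILLER_THRESHOLDS_KW = (0, 100, 600)  # hypothetical threshold_0..threshold_2


def toy_cop(load, t_wet, f_pump, f_tower):
    """Made-up plant model returning a COP-like value; not from the source."""
    p_pump = 30.0 * (f_pump / 50.0) ** 3
    p_tower = 20.0 * (f_tower / 50.0) ** 3
    p_chiller = load / (3.0 + 0.02 * (f_pump + f_tower) - 0.02 * t_wet)
    return load / (p_chiller + p_pump + p_tower)


class TabularAgent:
    """Q-table stand-in for one DQN agent (a simplification of the claimed network)."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = {}     # (state, action index) -> estimated value
        self.pool = []  # experience pool of (s, a, r, s_next)

    def act(self, state, epsilon, rng):
        if rng.random() < epsilon:
            return rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q.get((state, a), 0.0))

    def learn(self, sample, rng, gamma=0.9, lr=0.1):
        self.pool.append(sample)                 # store the new sample
        s, a, r, s_next = rng.choice(self.pool)  # random draw from the pool
        best_next = max(self.q.get((s_next, b), 0.0) for b in range(self.n_actions))
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + lr * (r + gamma * best_next - old)


def train(steps=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    pump = TabularAgent(len(PUMP_FREQS_HZ))
    tower = TabularAgent(len(TOWER_FREQS_HZ))
    state = (800, 26)  # (cooling load in kW, wet-bulb temperature in deg C)
    chillers_on = []
    for _ in range(steps):
        load, t_wet = state
        # D1: chiller start-stop from the real-time load (threshold lookup).
        chillers_on.append(sum(load >= th for th in CHILLER_THRESHOLDS_KW) - 1)
        # D2/D3: both agents observe the same state and pick a frequency index.
        a_pump = pump.act(state, epsilon, rng)
        a_tower = tower.act(state, epsilon, rng)
        # D4: apply the actions; the COP-like value is the shared reward.
        r = toy_cop(load, t_wet, PUMP_FREQS_HZ[a_pump], TOWER_FREQS_HZ[a_tower])
        next_state = (rng.choice((400, 800)), rng.choice((24, 28)))
        # D5: each agent stores its own sample and updates from a random draw.
        pump.learn((state, a_pump, r, next_state), rng)
        tower.learn((state, a_tower, r, next_state), rng)
        # D6: advance to the next time step.
        state = next_state
    return pump, tower, chillers_on


pump_agent, tower_agent, chillers_on = train()
```

Keeping one experience pool per agent mirrors the two separate decision processes of claim 7; a shared pool would be an equally plausible design that the claims do not rule out.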
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111609118.0A CN114279042B (en) | 2021-12-27 | 2021-12-27 | Central air conditioner control method based on multi-agent deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111609118.0A CN114279042B (en) | 2021-12-27 | 2021-12-27 | Central air conditioner control method based on multi-agent deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114279042A (en) | 2022-04-05 |
CN114279042B (en) | 2024-01-26 |
Family
ID=80875846
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111609118.0A Active CN114279042B (en) | 2021-12-27 | 2021-12-27 | Central air conditioner control method based on multi-agent deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114279042B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115289619A (en) * | 2022-07-28 | 2022-11-04 | 安徽大学 | Subway platform HVAC control method based on multi-agent deep reinforcement learning |
CN115544899A (en) * | 2022-11-23 | 2022-12-30 | 南京邮电大学 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
CN116068886A (en) * | 2022-12-09 | 2023-05-05 | 上海碳索能源服务股份有限公司 | Optimal control device of cooling water system of efficient refrigeration machine room |
CN116485044A (en) * | 2023-06-21 | 2023-07-25 | 南京邮电大学 | Intelligent operation optimization method for power grid interactive type efficient commercial building |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100023167A1 (en) * | 2007-04-04 | 2010-01-28 | Yasuyuki Ito | Air-conditioning system controller |
WO2010020160A1 (en) * | 2008-08-22 | 2010-02-25 | Weldtech Technology (Shanghai) Co., Ltd. | Method and system of energy-efficient control for central chiller plant systems |
CN104089362A (en) * | 2014-06-03 | 2014-10-08 | 杭州哲达科技股份有限公司 | Cooling efficiency maximization method for cooling water system in central air-conditioner and control device |
CN104534627A (en) * | 2015-01-14 | 2015-04-22 | 江苏联宏自动化系统工程有限公司 | Comprehensive efficiency control method of central air-conditioning cooling water system |
CN105004002A (en) * | 2015-07-06 | 2015-10-28 | 西安建筑科技大学 | Energy saving control system and energy saving control method used for central air conditioner cooling water system |
US9536191B1 (en) * | 2015-11-25 | 2017-01-03 | Osaro, Inc. | Reinforcement learning using confidence scores |
CN109475067A (en) * | 2018-01-15 | 2019-03-15 | 香江科技股份有限公司 | A kind of data center's multi-freezing pipe cooling and energy conserving system and its control method |
CN111950158A (en) * | 2020-08-17 | 2020-11-17 | 武汉理工大学 | Central air conditioner energy consumption optimization method based on sequence least square programming |
CN112325447A (en) * | 2020-11-02 | 2021-02-05 | 珠海米枣智能科技有限公司 | Refrigerating unit control device and control method based on reinforcement learning |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115289619A (en) * | 2022-07-28 | 2022-11-04 | 安徽大学 | Subway platform HVAC control method based on multi-agent deep reinforcement learning |
CN115544899A (en) * | 2022-11-23 | 2022-12-30 | 南京邮电大学 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
CN115544899B (en) * | 2022-11-23 | 2023-04-07 | 南京邮电大学 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
CN116068886A (en) * | 2022-12-09 | 2023-05-05 | 上海碳索能源服务股份有限公司 | Optimal control device of cooling water system of efficient refrigeration machine room |
CN116485044A (en) * | 2023-06-21 | 2023-07-25 | 南京邮电大学 | Intelligent operation optimization method for power grid interactive type efficient commercial building |
CN116485044B (en) * | 2023-06-21 | 2023-09-12 | 南京邮电大学 | Intelligent operation optimization method for power grid interactive type efficient commercial building |
Also Published As
Publication number | Publication date |
---|---|
CN114279042B (en) | 2024-01-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN114279042B (en) | Central air conditioner control method based on multi-agent deep reinforcement learning | |
WO2023093820A1 (en) | Device control optimization method, display platform, cloud server, and storage medium | |
CN107940679B (en) | Group control method based on performance curve of water chilling unit of data center | |
CN111126605B (en) | Data center machine room control method and device based on reinforcement learning algorithm | |
CN104019526B (en) | Improve PSO algorithm Fuzzy Adaptive PID temperature and humidity control system and method | |
CN113283156B (en) | Energy-saving control method for subway station air conditioning system based on deep reinforcement learning | |
WO2023030522A1 (en) | Data center air conditioning system diagnosis method and apparatus | |
CN110726218B (en) | Air conditioner, control method and device thereof, storage medium and processor | |
CN114383299B (en) | Central air-conditioning system operation strategy optimization method based on big data and dynamic simulation | |
CN108512258B (en) | Wind power plant active scheduling method based on improved multi-agent consistency algorithm | |
CN112413831A (en) | Energy-saving control system and method for central air conditioner | |
CN108168030A (en) | A kind of intelligent control method based on refrigeration performance curve | |
KR20180138371A (en) | Method for evaluating data based models and conducting predictive control of capsule type ice thermal storage system using the same | |
CN113821903A (en) | Temperature control method and device, modular data center and storage medium | |
CN118259595A (en) | Optimization control method of variable air volume air conditioning system based on fuzzy control and model predictive control | |
CN113791538A (en) | Control method, control device and control system of machine room equipment | |
CN113757922A (en) | Deep learning-based air conditioning system energy-saving control method, device and equipment and computer medium | |
CN116576542A (en) | Distributed event trigger control method and system for variable air volume central air conditioning system | |
WO2022246627A1 (en) | Method and apparatus for controlling refrigerating device | |
CN115628517A (en) | Simulation system for energy-saving strategy of central air-conditioning cooling tower | |
CN115717758A (en) | Indoor space temperature and humidity regulation and control method and system | |
CN115526504A (en) | Energy-saving scheduling method and system for water supply system of pump station, electronic equipment and storage medium | |
CN110836518A (en) | System basic knowledge based global optimization control method for self-learning air conditioning system | |
CN115877714B (en) | Control method and device for refrigerating system, electronic equipment and storage medium | |
CN116068886B (en) | Optimal control device of cooling water system of efficient refrigeration machine room |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |