CN114279042A - Central air conditioner control method based on multi-agent deep reinforcement learning - Google Patents


Info

Publication number
CN114279042A
Authority
CN
China
Prior art keywords
cooling water
reinforcement learning
current
bulb temperature
central air
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111609118.0A
Other languages
Chinese (zh)
Other versions
CN114279042B (en)
Inventor
陈建平
傅启明
陈曦尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Industrial Big Data Innovation Center Co ltd
Suzhou University of Science and Technology
Original Assignee
Chongqing Industrial Big Data Innovation Center Co ltd
Suzhou University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Industrial Big Data Innovation Center Co ltd and Suzhou University of Science and Technology
Priority to CN202111609118.0A
Publication of CN114279042A
Application granted
Publication of CN114279042B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B30/00: Energy efficient heating, ventilation or air conditioning [HVAC]
    • Y02B30/70: Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating

Abstract

The invention discloses a central air-conditioning control method based on multi-agent deep reinforcement learning. According to the current indoor required cooling load and the outdoor wet bulb temperature, the method performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pumps and cooling tower fans in a central air-conditioning system, comprising sequence control of chiller operation and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan. The method does not require an accurate model of the central air-conditioning system to be established during actual deployment, and it controls the working frequencies of the cooling water pump and the cooling tower fan with one agent each. It can train an efficient and accurate control strategy in a short time from a small amount of historical data, reducing unnecessary refrigerating capacity and the working loads of the chillers, cooling water pumps and cooling tower fans, prolonging service life, lowering the failure rate, and greatly reducing the energy consumption of the whole central air-conditioning system and even of the building as a whole.

Description

Central air conditioner control method based on multi-agent deep reinforcement learning
Technical Field
The invention relates to the technical field of central air-conditioning control, in particular to a central air-conditioning control method based on multi-agent deep reinforcement learning.
Background
According to statistics, the central air-conditioning system accounts for 50% or more of a building's total energy consumption, and the chillers and the cooling water system make up an important part of that share. Optimal control of the chillers and the cooling water system is therefore particularly important for reducing the energy consumption of the whole central air-conditioning system and even of the building as a whole.
At present, optimal control methods for central air-conditioning systems mainly include rule-based control, model-based control and model-free control. Rule-based control is usually static: the control rules are set from the experience of engineers and plant administrators, so its applicability and optimization effect are very limited. Model-based control requires a large amount of historical data and sensor information to build an accurate model of the central air-conditioning system; it generally lacks robustness and is unsuitable for groups of old buildings that lack historical data and sensors. Model-free control avoids building an accurate mathematical model, but traditional model-free methods must discretize the states and actions, which leads to a large action space and long training time, reduces the generalization ability of the algorithm, and leaves them unable to solve complex problems.
Therefore, this problem urgently needs to be solved.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a central air-conditioning control method based on multi-agent deep reinforcement learning.
To achieve this purpose, the invention adopts the following technical scheme: a central air-conditioning control method based on multi-agent deep reinforcement learning which, according to the current indoor required cooling load and the outdoor wet bulb temperature, performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pumps and cooling tower fans in a central air-conditioning system, comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.
Preferably, the chillers, cooling water pumps and cooling towers in the central air-conditioning system are connected in sequence and arranged in groups; the chiller sequence control is implemented by a sequence controller, and the agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is implemented by one reinforcement learning controller each.
Preferably, the method comprises the following steps:
A1. recording the outdoor wet bulb temperature by an electronic thermometer;
A2. obtaining the current indoor required cooling load by simulation with the energy simulation software EnergyPlus;
A3. the sequence controller determines the number of chillers to be started according to the current indoor required cooling load;
A4. after receiving the current state information, the reinforcement learning controller builds an environment model from the received data and derives an optimal strategy from that model.
Preferably, in step A2, the current room is modeled as a whole with EnergyPlus, and the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature are input, where $CL_s$ denotes the current indoor required cooling load, $T$ denotes the set of the current indoor dry bulb, outdoor dry bulb, indoor wet bulb and outdoor wet bulb temperatures, and $model_{room}$ denotes the current room model; the output is $CL_s = \{T, model_{room}\}$.
Preferably, in step A3, the sequence controller performs threshold calculation and action execution, where $threshold_n$ denotes the threshold, $n$ (0, 1, 2, 3, …) denotes the number of chillers turned on, and refrigerating capacity denotes the rated refrigerating capacity of a single chiller, from which $threshold_n$ is determined. The sequence controller calculates $CL_s$ in real time; while $CL_s$ falls in the range from $threshold_n$ to $threshold_{n+1}$, $n$ chillers are kept on. When $n = 0$, the sequence controller turns off all chillers and indoor heat is removed only by the operation of the cold water pump and the cooling tower fan.
Preferably, in step A4, the two reinforcement learning controllers act as the agents that control the working frequencies of the cooling water pump and the cooling tower fan, respectively; they perform multi-agent deep reinforcement learning (MADRL) and construct a neural network comprising two fully connected layers and a playback memory unit. The input layer takes the current indoor required cooling load and the outdoor wet bulb temperature, the intermediate layer is fully connected to all possible actions, and the output layer gives the value estimates of all actions under the current indoor required cooling load and outdoor wet bulb temperature. The output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan. The playback memory unit records all samples $(s_t, a_t, r_t, s_{t+1})$, where $s_t$ denotes the current indoor required cooling load and outdoor wet bulb temperature, $a_t$ denotes the working frequencies of the cooling water pump and the cooling tower fan in that state, $s_{t+1}$ denotes the next state reached after performing action $a_t$ in state $s_t$, and $r_t$ denotes the immediate reward obtained by performing action $a_t$ in the current state $s_t$.
Preferably, in step A4, the two reinforcement learning controllers model the control of the working frequencies of the cooling water pump and the cooling tower fan as two Markov decision process (MDP) models, whose states, actions and reward functions are defined as follows:
B1. State, denoted by $S$, where $CL_s$ denotes the current indoor required cooling load and $T_{wet}$ denotes the current outdoor wet bulb temperature; the two agents share the same current state, $S = \{CL_s, T_{wet}\}$;
B2. Action, denoted by $a$, where $f_{pump}$ denotes the frequency of the cooling water pump and $f_{tower}$ denotes the frequency of the cooling tower fan: $a_{pump} = f_{pump}$; $a_{tower} = f_{tower}$;
B3. Reward function, denoted by $r$, where $P_{chiller}$ denotes the power consumption of the chiller, $P_{tower}$ denotes the power consumption of the cooling tower fan, and $P_{pump}$ denotes the power consumption of the cooling water pump:
[Equation image BDA0003434746340000041: reward function $r$]
Preferably, in step A4, the reinforcement learning controller builds a value function return model, where $R(s, a)$ denotes the return obtained by taking action $a$ in state $s$, and the value function $Q(s, a)$ is the expectation of $R(s, a)$: $Q(s, a) = E[R(s, a)]$.
Preferably, in step A4, the reinforcement learning controller solves for the optimal strategy with the deep Q-learning (Deep Q-Network, DQN) algorithm, whose training process is as follows:
C1. initializing the playback memory unit with capacity $N$; the unit stores the training samples;
C2. initializing the current value network with randomly initialized weight parameters $\omega$, and initializing the target value network with the same structure and initial weights as the current value network;
C3. feeding the indoor required cooling load and the outdoor wet bulb temperature through the current value network to obtain $Q(s, a)$ for any state $s$; after the value function has been computed by the current value network, selecting an action $a$ with the ε-greedy strategy; each state transition counts as one time step $t$, and the data $(s, a, r, s')$ obtained at each time step is stored in the playback memory unit;
C4. defining the loss function:
$L(\omega) = E\left[\left(r + \gamma \max_{a'} Q(s', a'; \omega^-) - Q(s, a; \omega)\right)^2\right]$;
C5. randomly drawing a sample $(s, a, r, s')$ from the playback memory unit, passing it to the current value network, the target value network and $L(\omega)$, and minimizing $L(\omega)$ with respect to $\omega$ by stochastic gradient descent, with the following update formula:
[Equation image BDA0003434746340000051: update formula for $\omega$]
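The action selection of step C3 and the target term inside the loss of step C4 can be sketched in a few lines. This is only an illustration under assumptions: the function names are hypothetical, the Q values are passed in as plain lists rather than produced by a network, and the ε-greedy variant follows the wording of the detailed description.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, otherwise the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def td_target(reward, next_q_target, gamma):
    """r + gamma * max_a' Q(s', a'; w-); next_q_target comes from the target network."""
    return reward + gamma * max(next_q_target)


def squared_td_error(q_sa, target):
    """Single-sample version of the loss defined in step C4."""
    return (target - q_sa) ** 2
```

With `epsilon = 0` the selection is purely greedy, which matches the action choice described in step D3 of the overall training process.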
Preferably, the overall algorithm training process is as follows:
D1. at the current time step $t$, performing chiller start-stop control according to the real-time cooling load;
D2. observing the environment state $s_t$ and recording data such as the real-time cooling load and the outdoor wet bulb temperature;
D3. the model-free method gives the control action $a_t$: the action $a_t$ with the largest current Q value is selected with the greedy strategy;
D4. the system executes the control action, obtains the next environment state $s_{t+1}$, and computes the coefficient of refrigeration performance under the current action as the reward value $r_t$ of the reinforcement learning algorithm;
D5. training the multi-agent deep reinforcement learning algorithm and updating parameters: the sample $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience pool, data are randomly sampled from the experience pool, and algorithm training is executed to update the network parameters;
D6. ending the current time step $t$ and starting the next time step $t + 1$.
Due to the application of the technical scheme, compared with the prior art, the invention has the following beneficial effects:
1. the control method adopts multi-agent deep reinforcement learning, that is, it introduces multiple agents and a neural network on top of traditional reinforcement learning. This addresses the slow convergence of a single agent and the curse of dimensionality that arises when reinforcement learning computes and stores state-action values one by one in a high-dimensional state space, and it is particularly suited to systems with several cooperating controllers. Based on the current indoor required cooling load and outdoor wet bulb temperature, the method models the control of the working frequencies of the cooling water pump and the cooling tower fan as Markov decision process models, defines a loss function, minimizes it by gradient descent, and solves for the optimal strategy for controlling the working frequencies of the cooling water pump and the cooling tower fan;
2. the control method does not require an accurate model of the central air-conditioning system to be established during actual deployment, and it controls the working frequencies of the cooling water pump and the cooling tower fan with one agent each;
3. the control method can train an efficient and accurate control strategy in a short time from a small amount of historical data, reducing unnecessary refrigerating capacity and the working loads of the chillers, cooling water pumps and cooling tower fans, prolonging service life, lowering the failure rate, and greatly reducing the energy consumption of the whole central air-conditioning system and even of the building as a whole.
Drawings
Fig. 1 is a schematic layout diagram of a chiller, a cooling water pump and a cooling water tower in a central air-conditioning system according to an embodiment of a central air-conditioning control method based on multi-agent deep reinforcement learning.
Fig. 2 is a flowchart of an embodiment of a central air-conditioning control method based on multi-agent deep reinforcement learning according to the present invention.
Fig. 3 is a logic flow diagram of the sequence controller performing threshold calculation and action execution in an embodiment of a multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.
Fig. 4 is a flowchart of deep Q learning algorithm training performed by the reinforcement learning controller in an embodiment of the multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.
FIG. 5 is a flowchart of the overall algorithm training in an embodiment of a multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.
Detailed Description
The present invention will be further described in detail with reference to the following specific examples:
With reference to Figs. 1 to 5, this embodiment is a central air-conditioning control method based on multi-agent deep reinforcement learning. According to the current indoor required cooling load and the outdoor wet bulb temperature, it performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pumps and cooling tower fans in a central air-conditioning system, comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.
As shown in Fig. 1, the chillers, cooling water pumps and cooling towers in the central air-conditioning system are connected in sequence and arranged in groups. As shown in Fig. 2, the chiller sequence control is implemented by a sequence controller, and the agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is implemented by one reinforcement learning controller each.
The present embodiment includes the following steps:
A1. recording the outdoor wet bulb temperature by an electronic thermometer;
A2. obtaining the current indoor required cooling load by simulation with the energy simulation software EnergyPlus;
A3. the sequence controller determines the number of chillers to be started according to the current indoor required cooling load;
A4. after receiving the current state information, the reinforcement learning controller builds an environment model from the received data and derives an optimal strategy from that model.
In step A2, the current room is modeled as a whole with EnergyPlus, and the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature are input, where $CL_s$ denotes the current indoor required cooling load, $T$ denotes the set of the current indoor dry bulb, outdoor dry bulb, indoor wet bulb and outdoor wet bulb temperatures, and $model_{room}$ denotes the current room model; the output is $CL_s = \{T, model_{room}\}$.
As shown in Fig. 3, in step A3, the sequence controller performs threshold calculation and action execution, where $threshold_n$ denotes the threshold, $n$ (0, 1, 2, 3, …) denotes the number of chillers turned on, and refrigerating capacity denotes the rated refrigerating capacity of a single chiller, from which $threshold_n$ is determined. The sequence controller calculates $CL_s$ in real time; while $CL_s$ falls in the range from $threshold_n$ to $threshold_{n+1}$, $n$ chillers are kept on. When $n = 0$, the sequence controller turns off all chillers and indoor heat is removed only by the operation of the cold water pump and the cooling tower fan.
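The range lookup performed by the sequence controller in step A3 can be illustrated with a short sketch. The formula for the thresholds is not available in this text, so the thresholds are taken here as a given ascending list; the function name, the list and its kW values are hypothetical and serve only to show the lookup.

```python
from bisect import bisect_right


def chillers_to_run(cooling_load, thresholds):
    """Return n such that thresholds[n] <= cooling_load < thresholds[n + 1].

    thresholds is an ascending list [threshold_0, threshold_1, ...].
    Loads below threshold_0 map to 0 chillers; loads at or above the last
    threshold map to the last index (all chillers on).
    """
    # bisect_right counts how many thresholds are <= cooling_load.
    return max(0, bisect_right(thresholds, cooling_load) - 1)


# Hypothetical thresholds in kW for a plant with three chillers (assumed values).
EXAMPLE_THRESHOLDS_KW = [0.0, 300.0, 1200.0, 2400.0]
```

For example, `chillers_to_run(1500.0, EXAMPLE_THRESHOLDS_KW)` selects two chillers under these assumed thresholds, and any load below 300 kW selects none, which corresponds to the $n = 0$ case.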
In step A4, the two reinforcement learning controllers act as the agents that control the working frequencies of the cooling water pump and the cooling tower fan, respectively; they perform multi-agent deep reinforcement learning (MADRL) and construct a neural network comprising two fully connected layers and a playback memory unit. The input layer takes the current indoor required cooling load and the outdoor wet bulb temperature, the intermediate layer is fully connected to all possible actions, and the output layer gives the value estimates of all actions under the current indoor required cooling load and outdoor wet bulb temperature. The output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan. The playback memory unit records all samples $(s_t, a_t, r_t, s_{t+1})$, where $s_t$ denotes the current indoor required cooling load and outdoor wet bulb temperature, $a_t$ denotes the working frequencies of the cooling water pump and the cooling tower fan in that state, $s_{t+1}$ denotes the next state reached after performing action $a_t$ in state $s_t$, and $r_t$ denotes the immediate reward obtained by performing action $a_t$ in the current state $s_t$.
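A minimal pure-Python sketch of the structure described above may help: a value network with two fully connected layers that maps the state (cooling load, wet bulb temperature) to one value per reachable frequency, plus a fixed-capacity playback memory. The class names, hidden width, ReLU activation, weight initialization and the frequency lists are assumptions introduced for illustration; the source does not specify them, and it does not state whether each agent has its own network as assumed here.

```python
import random
from collections import deque

# Hypothetical discrete frequency sets in Hz (not taken from the source).
PUMP_FREQS_HZ = [30, 35, 40, 45, 50]
FAN_FREQS_HZ = [25, 30, 35, 40, 45, 50]


class QNetwork:
    """Two fully connected layers: (CL_s, T_wet) -> hidden -> one Q value per frequency."""

    def __init__(self, n_actions, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(2)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_actions)] for _ in range(hidden)]
        self.b2 = [0.0] * n_actions

    def q_values(self, state):
        # First layer with ReLU activation (the activation is an assumption).
        hidden = [
            max(0.0, sum(x * self.w1[i][j] for i, x in enumerate(state)) + self.b1[j])
            for j in range(len(self.b1))
        ]
        # Second layer: one value estimate per discrete frequency.
        return [
            sum(h * self.w2[j][k] for j, h in enumerate(hidden)) + self.b2[k]
            for k in range(len(self.b2))
        ]


class PlaybackMemory:
    """Fixed-capacity store of (s_t, a_t, r_t, s_{t+1}) samples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples are dropped first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In this sketch the pump agent would use `QNetwork(len(PUMP_FREQS_HZ))` and the fan agent `QNetwork(len(FAN_FREQS_HZ))`, so each output index corresponds to one reachable frequency.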
In step A4, the two reinforcement learning controllers model the control of the working frequencies of the cooling water pump and the cooling tower fan as two Markov decision process (MDP) models, whose states, actions and reward functions are defined as follows:
B1. State, denoted by $S$, where $CL_s$ denotes the current indoor required cooling load and $T_{wet}$ denotes the current outdoor wet bulb temperature; the two agents share the same current state, $S = \{CL_s, T_{wet}\}$;
B2. Action, denoted by $a$, where $f_{pump}$ denotes the frequency of the cooling water pump and $f_{tower}$ denotes the frequency of the cooling tower fan: $a_{pump} = f_{pump}$; $a_{tower} = f_{tower}$;
B3. Reward function, denoted by $r$, where $P_{chiller}$ denotes the power consumption of the chiller, $P_{tower}$ denotes the power consumption of the cooling tower fan, and $P_{pump}$ denotes the power consumption of the cooling water pump:
[Equation image BDA0003434746340000101: reward function $r$]
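The reward formula itself is available only as an image, so it cannot be quoted here. Step D4 states that the coefficient of refrigeration performance is used as the reward, so one plausible reading, offered strictly as an assumption, is a system-level coefficient of performance: cooling load divided by the total power of chiller, tower fan and pump. The function name and the zero-power guard are hypothetical.

```python
def cop_reward(cooling_load_kw, p_chiller_kw, p_tower_kw, p_pump_kw):
    """Assumed reward: system COP = cooling load / total electrical power.

    This is a guess at the form of the reward based on step D4; the exact
    formula in the source may differ.
    """
    total_power_kw = p_chiller_kw + p_tower_kw + p_pump_kw
    if total_power_kw <= 0.0:
        return 0.0  # guard for the all-off case (assumed behaviour)
    return cooling_load_kw / total_power_kw
```

Under this reading, a higher reward means the same cooling load is delivered with less total power, which is consistent with the stated goal of reducing energy consumption.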
In step A4, the reinforcement learning controller builds a value function return model, where $R(s, a)$ denotes the return obtained by taking action $a$ in state $s$, and the value function $Q(s, a)$ is the expectation of $R(s, a)$: $Q(s, a) = E[R(s, a)]$.
As shown in Fig. 4, in step A4, the reinforcement learning controller solves for the optimal strategy with the deep Q-learning (Deep Q-Network, DQN) algorithm, whose training process is as follows:
C1. initializing the playback memory unit with capacity $N$; the unit stores the training samples;
C2. initializing the current value network with randomly initialized weight parameters $\omega$, and initializing the target value network with the same structure and initial weights as the current value network;
C3. feeding the indoor required cooling load and the outdoor wet bulb temperature through the current value network to obtain $Q(s, a)$ for any state $s$; after the value function has been computed by the current value network, selecting an action $a$ with the ε-greedy strategy; each state transition counts as one time step $t$, and the data $(s, a, r, s')$ obtained at each time step is stored in the playback memory unit;
C4. defining the loss function:
$L(\omega) = E\left[\left(r + \gamma \max_{a'} Q(s', a'; \omega^-) - Q(s, a; \omega)\right)^2\right]$;
C5. randomly drawing a sample $(s, a, r, s')$ from the playback memory unit, passing it to the current value network, the target value network and $L(\omega)$, and minimizing $L(\omega)$ with respect to $\omega$ by stochastic gradient descent, with the following update formula:
[Equation image BDA0003434746340000121: update formula for $\omega$]
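The update formula of step C5 is available only as an image. As a hedged reconstruction, under the usual DQN assumption that the target-network parameters $\omega^-$ are held fixed while differentiating, the loss of step C4 leads to the gradient step below. The shorthand $y$ and the learning rate $\alpha$ are symbols introduced here for illustration and are not taken from the source.

```latex
\begin{aligned}
y &= r + \gamma \max_{a'} Q(s', a'; \omega^-) \\
L(\omega) &= E\left[\left(y - Q(s, a; \omega)\right)^2\right] \\
\nabla_\omega L(\omega) &= -2\, E\left[\left(y - Q(s, a; \omega)\right) \nabla_\omega Q(s, a; \omega)\right] \\
\omega &\leftarrow \omega - \alpha\, \nabla_\omega L(\omega)
\end{aligned}
```

In practice the expectation is approximated with the sample or mini-batch drawn from the playback memory unit, which is what makes the descent stochastic.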
As shown in Fig. 5, this embodiment includes the following overall algorithm training process:
D1. at the current time step $t$, performing chiller start-stop control according to the real-time cooling load;
D2. observing the environment state $s_t$ and recording data such as the real-time cooling load and the outdoor wet bulb temperature;
D3. the model-free method gives the control action $a_t$: the action $a_t$ with the largest current Q value is selected with the greedy strategy;
D4. the system executes the control action, obtains the next environment state $s_{t+1}$, and computes the coefficient of refrigeration performance under the current action as the reward value $r_t$ of the reinforcement learning algorithm;
D5. training the multi-agent deep reinforcement learning algorithm and updating parameters: the sample $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience pool, data are randomly sampled from the experience pool, and algorithm training is executed to update the network parameters;
D6. ending the current time step $t$ and starting the next time step $t + 1$.
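Steps D1 to D6 can be sketched as a single control step driven by two agents. To keep the example short and self-contained, each agent below uses a linear Q approximation instead of the two-layer network and has no separate target network; this is a deliberate simplification, not the method of the source. The class and function names, the hyperparameters, the shared reward for both agents, and the callables `select_chillers` and `plant_step` (standing in for the sequence controller and the real or simulated plant) are all hypothetical.

```python
import random
from collections import deque


class LinearQAgent:
    """Stand-in agent: Q(s, a) = w[a] . (CL_s, T_wet, 1). Not the two-layer network."""

    def __init__(self, freqs_hz, lr=0.01, gamma=0.9, epsilon=0.1, capacity=1000, seed=0):
        self.freqs_hz = freqs_hz
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)
        self.w = [[0.0, 0.0, 0.0] for _ in freqs_hz]
        self.memory = deque(maxlen=capacity)  # experience pool

    def q(self, state, action):
        w = self.w[action]
        return w[0] * state[0] + w[1] * state[1] + w[2]

    def act(self, state):
        # Explore with probability epsilon, otherwise take the largest Q value.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.freqs_hz))
        return max(range(len(self.freqs_hz)), key=lambda a: self.q(state, a))

    def learn(self, state, action, reward, next_state):
        # Store the new sample, then train on one randomly drawn sample.
        self.memory.append((state, action, reward, next_state))
        s, a, r, s_next = self.rng.choice(self.memory)
        target = r + self.gamma * max(self.q(s_next, b) for b in range(len(self.freqs_hz)))
        error = target - self.q(s, a)
        for i, x in enumerate((s[0], s[1], 1.0)):
            self.w[a][i] += self.lr * error * x


def control_step(state, select_chillers, pump_agent, fan_agent, plant_step):
    """One pass through D1-D6; returns the next state."""
    n_on = select_chillers(state[0])                      # D1: chiller start-stop
    a_pump = pump_agent.act(state)                        # D2-D3: observe and act
    a_fan = fan_agent.act(state)
    reward, next_state = plant_step(                      # D4: execute, get reward
        state, n_on, pump_agent.freqs_hz[a_pump], fan_agent.freqs_hz[a_fan]
    )
    pump_agent.learn(state, a_pump, reward, next_state)   # D5: store and train
    fan_agent.learn(state, a_fan, reward, next_state)
    return next_state                                     # D6: next time step
```

The sketch assumes the state components are scaled to a small range (for example a load fraction and a normalized temperature), since a linear model trained on raw kW values with this learning rate could become numerically unstable.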
The innovation of the invention is as follows:
1. the control method adopts multi-agent deep reinforcement learning, that is, it introduces multiple agents and a neural network on top of traditional reinforcement learning. This addresses the slow convergence of a single agent and the curse of dimensionality that arises when reinforcement learning computes and stores state-action values one by one in a high-dimensional state space, and it is particularly suited to systems with several cooperating controllers. Based on the current indoor required cooling load and outdoor wet bulb temperature, the method models the control of the working frequencies of the cooling water pump and the cooling tower fan as Markov decision process models, defines a loss function, minimizes it by gradient descent, and solves for the optimal strategy for controlling the working frequencies of the cooling water pump and the cooling tower fan;
2. the control method does not require an accurate model of the central air-conditioning system to be established during actual deployment, and it controls the working frequencies of the cooling water pump and the cooling tower fan with one agent each;
3. the control method can train an efficient and accurate control strategy in a short time from a small amount of historical data, reducing unnecessary refrigerating capacity and the working loads of the chillers, cooling water pumps and cooling tower fans, prolonging service life, lowering the failure rate, and greatly reducing the energy consumption of the whole central air-conditioning system and even of the building as a whole.
The above-mentioned embodiments are merely illustrative of the technical idea and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the scope of the present invention, and all equivalent changes or modifications made according to the spirit of the present invention should be covered in the scope of the present invention.

Claims (10)

1. A central air-conditioning control method based on multi-agent deep reinforcement learning, characterized in that: model-free optimal control is performed on the start-stop states and working parameters of the chillers, cooling water pumps and cooling tower fans of a central air-conditioning system according to the current indoor required cooling load and the outdoor wet bulb temperature, the model-free optimal control comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.
2. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 1, characterized in that: the chillers, cooling water pumps and cooling towers in the central air-conditioning system are connected in sequence and arranged in groups, the chiller sequence control is implemented by a sequence controller, and the agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is implemented by one reinforcement learning controller each.
3. The central air-conditioning control method based on multi-agent deep reinforcement learning as claimed in claim 2, characterized by comprising the following steps:
A1. recording the outdoor wet bulb temperature by an electronic thermometer;
A2. obtaining the current indoor required cooling load by simulation with the energy simulation software EnergyPlus;
A3. the sequence controller determines the number of chillers to be started according to the current indoor required cooling load;
A4. after receiving the current state information, the reinforcement learning controller builds an environment model from the received data and derives an optimal strategy from that model.
4. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 3, characterized in that: in step A2, the current room is modeled as a whole with EnergyPlus, and the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature are input, where $CL_s$ denotes the current indoor required cooling load, $T$ denotes the set of the current indoor dry bulb, outdoor dry bulb, indoor wet bulb and outdoor wet bulb temperatures, and $model_{room}$ denotes the current room model; the output is $CL_s = \{T, model_{room}\}$.
5. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 4, characterized in that: in step A3, the sequence controller performs threshold calculation and action execution, where $threshold_n$ denotes the threshold, $n$ (0, 1, 2, 3, …) denotes the number of chillers turned on, and refrigerating capacity denotes the rated refrigerating capacity of a single chiller, from which $threshold_n$ is determined; the sequence controller calculates $CL_s$ in real time, and while $CL_s$ falls in the range from $threshold_n$ to $threshold_{n+1}$, $n$ chillers are kept on; when $n = 0$, the sequence controller turns off all chillers and indoor heat is removed only by the operation of the cold water pump and the cooling tower fan.
6. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 5, characterized in that: in step A4, the two reinforcement learning controllers act as the agents that control the working frequencies of the cooling water pump and the cooling tower fan, respectively; they perform multi-agent deep reinforcement learning (MADRL) and construct a neural network comprising two fully connected layers and a playback memory unit; the input layer takes the current indoor required cooling load and the outdoor wet bulb temperature, the intermediate layer is fully connected to all possible actions, and the output layer gives the value estimates of all actions under the current indoor required cooling load and outdoor wet bulb temperature; the output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan; the playback memory unit records all samples $(s_t, a_t, r_t, s_{t+1})$, where $s_t$ denotes the current indoor required cooling load and outdoor wet bulb temperature, $a_t$ denotes the working frequencies of the cooling water pump and the cooling tower fan in that state, $s_{t+1}$ denotes the next state reached after performing action $a_t$ in state $s_t$, and $r_t$ denotes the immediate reward obtained by performing action $a_t$ in the current state $s_t$.
7. The multi-agent deep reinforcement learning-based central air conditioning control method according to claim 6, wherein in the step A4, two reinforcement learning controllers model the control problem of the working frequencies of the cooling water pump and the cooling water tower fan as two Markov Decision Process (MDP) models, and define the states, actions and reward functions therein as follows:
B1. State, denoted by S, where $CL_s$ represents the current indoor required cooling load and $T_{wet}$ represents the current outdoor wet bulb temperature; the current states of the two agents are identical and are given by $S = \{CL_s, T_{wet}\}$;
B2. Actions, denoted by a, where $f_{pump}$ represents the frequency of the cooling water pump and $f_{tower}$ represents the frequency of the cooling water tower fan: $a_{pump} = f_{pump}$; $a_{tower} = f_{tower}$;
B3. Reward function, denoted by r, where $P_{chiller}$ represents the power consumption of the chiller, $P_{tower}$ represents the power consumption of the cooling water tower fan, and $P_{pump}$ represents the power consumption of the cooling water pump:
[Equation image FDA0003434746330000041: reward function r]
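The state, action and reward definitions B1 to B3 can be sketched as below. The reward expression itself is only available as an image, so the ratio used here is an assumption: it takes the reward to be the cooling load divided by the total power of chiller, tower fan and pump, which is in line with step D4 of claim 10 using the coefficient of refrigerating performance as the reward. The names `State` and `reward` and all numeric values are hypothetical.

```python
from typing import NamedTuple

class State(NamedTuple):
    cl_s: float   # current indoor required cooling load, kW
    t_wet: float  # current outdoor wet bulb temperature, deg C

def reward(cl_s, p_chiller, p_tower, p_pump):
    """Assumed reward: cooling delivered per unit of electrical power (a COP-style ratio)."""
    total_power = p_chiller + p_tower + p_pump
    if total_power <= 0.0:
        return 0.0  # plant off: the ratio is undefined, so return a neutral value
    return cl_s / total_power

s = State(cl_s=900.0, t_wet=24.0)  # the same state is seen by both agents
a_pump, a_tower = 45.0, 40.0       # a_pump = f_pump, a_tower = f_tower, in Hz
r = reward(s.cl_s, p_chiller=180.0, p_tower=15.0, p_pump=30.0)
print(round(r, 2))  # 4.0
```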
8. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 7, characterized in that: in the step A4, the reinforcement learning controller builds a value function return model, where $R(s, a)$ represents the return value of action a in state s, and $Q(s, a)$ is the expectation of $R(s, a)$, that is, $Q(s, a) = E[R(s, a)]$.
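One way to read the value function as an expectation of the return is as a running average over observed returns for each state-action pair. The sketch below illustrates only that reading; the class name `MeanReturn` and the sample returns are hypothetical, and the claimed method estimates the value with a neural network rather than a table.

```python
from collections import defaultdict

class MeanReturn:
    """Tabular estimate of Q(s, a) = E[R(s, a)] as an incremental sample mean."""

    def __init__(self):
        self.q = defaultdict(float)  # current mean return per (s, a)
        self.n = defaultdict(int)    # number of returns seen per (s, a)

    def update(self, s, a, ret):
        key = (s, a)
        self.n[key] += 1
        # Incremental mean: new_mean = old_mean + (sample - old_mean) / count
        self.q[key] += (ret - self.q[key]) / self.n[key]
        return self.q[key]

est = MeanReturn()
for ret in (4.0, 5.0, 3.0):  # made-up returns for one state-action pair
    value = est.update("s0", "a0", ret)
print(value)  # 4.0
```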
9. The multi-agent deep reinforcement learning-based central air-conditioning control method according to claim 8, wherein in the step A4, the reinforcement learning controller solves the optimal strategy through a deep Q-learning (Deep Q-Network, DQN) algorithm, and the algorithm training process is as follows:
C1. initializing a playback memory unit with capacity N, used for storing training samples;
C2. initializing a current value network with randomly initialized weight parameters ω, and initializing a target value network whose structure and initial weights are the same as those of the current value network;
C3. passing the indoor required cooling load and the outdoor wet bulb temperature through the current value network to obtain Q(s, a) in any state s; after the value function is calculated by the current value network, selecting an action a by using an ε-greedy strategy; counting each state transition as one time step t, and storing the data (s, a, r, s') obtained at each time step into the playback memory unit;
C4. defining a loss function:
$L(\omega)=E\left[\left(r+\gamma\max_{a'}Q(s',a';\omega^{-})-Q(s,a;\omega)\right)^{2}\right]$;
C5. randomly extracting one sample (s, a, r, s') from the playback memory unit, passing it to the current value network, the target value network and L(ω), and updating ω by applying stochastic gradient descent to L(ω), wherein the updating formula is as follows:
[Equation image FDA0003434746330000051: update formula for ω]
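Steps C1 to C5 can be sketched in plain Python as follows. This is an illustrative reading under stated assumptions, not the claimed update formula, which is only available as an image: the gradient step is the ordinary derivative of the squared error in C4 for a single sample, and the helper names (`make_net`, `forward`, `epsilon_greedy`, `dqn_step`), the ReLU activation, the hidden width, the learning rate, the discount factor and the stored sample are all hypothetical. Actions are represented as indices into a frequency grid.

```python
import copy
import random

def make_net(n_in, n_hidden, n_actions, rng):
    u = lambda: rng.uniform(-0.5, 0.5)
    return {"w1": [[u() for _ in range(n_in)] for _ in range(n_hidden)],
            "b1": [0.0] * n_hidden,
            "w2": [[u() for _ in range(n_hidden)] for _ in range(n_actions)],
            "b2": [0.0] * n_actions}

def forward(net, s):
    """Two fully connected layers with a ReLU in between; returns (hidden, q_values)."""
    h = [max(0.0, sum(w * x for w, x in zip(row, s)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    q = [sum(w * v for w, v in zip(row, h)) + b
         for row, b in zip(net["w2"], net["b2"])]
    return h, q

def epsilon_greedy(q, epsilon, rng):
    """C3: random action with probability epsilon, otherwise the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return q.index(max(q))

def dqn_step(net, target_net, sample, gamma=0.9, lr=0.01):
    """C4/C5: one SGD step on (r + gamma * max_a' Q(s', a'; w-) - Q(s, a; w))^2."""
    s, a, r, s_next = sample
    _, q_next = forward(target_net, s_next)  # target value network, weights w-
    h, q = forward(net, s)                   # current value network, weights w
    td_error = r + gamma * max(q_next) - q[a]
    grad_q = -2.0 * td_error                 # d(loss) / d(Q(s, a; w))
    # Gradient w.r.t. the hidden activations, computed before w2 is changed.
    grad_h = [grad_q * net["w2"][a][j] if h[j] > 0.0 else 0.0 for j in range(len(h))]
    for j, hj in enumerate(h):
        net["w2"][a][j] -= lr * grad_q * hj
    net["b2"][a] -= lr * grad_q
    for j, gj in enumerate(grad_h):
        for i, x in enumerate(s):
            net["w1"][j][i] -= lr * gj * x
        net["b1"][j] -= lr * gj
    return td_error ** 2

rng = random.Random(0)
net = make_net(2, 8, 3, rng)       # C2: current value network, random weights
target_net = copy.deepcopy(net)    # C2: target network, same structure and weights
memory = []                        # C1: capacity handling omitted for brevity
s = (0.6, 0.4)                     # (CL_s, T_wet), assumed normalized
a = epsilon_greedy(forward(net, s)[1], epsilon=0.1, rng=rng)
memory.append((s, a, 4.0, s))      # placeholder reward and next state
losses = [dqn_step(net, target_net, rng.choice(memory)) for _ in range(200)]
print(losses[0] > losses[-1])
```

Because the target network is held fixed here, the regression target stays constant and the squared error on the stored sample shrinks over the repeated steps.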
10. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 9, characterized by comprising the following overall algorithm training process:
D1. at the current time step t, performing chiller start-stop control according to the real-time cooling load;
D2. observing the environment state $s_t$ and recording data such as the real-time cooling load and the outdoor wet bulb temperature;
D3. giving the control action $a_t$ by the model-free method: selecting the action $a_t$ with the maximum current Q value by using a greedy strategy;
D4. executing the control action by the system to obtain the next environment state $s_{t+1}$, calculating the coefficient of refrigerating performance under the current action, and taking the coefficient as the reward value $r_t$ in the reinforcement learning algorithm;
D5. training the multi-agent deep reinforcement learning algorithm and executing the parameter update: storing the sample $(s_t, a_t, r_t, s_{t+1})$ into an experience pool, randomly sampling data from the experience pool, and executing algorithm training to update the network parameters;
D6. ending the current time step t and starting the next time step t + 1.
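The order of operations in D1 to D6 can be sketched as one function per time step. Everything concrete here is a stand-in chosen for illustration: `toy_plant` is a made-up plant model, `chillers_for` is a simple ceiling rule rather than the claimed threshold logic, and `StubAgent` only mimics the interface of a trained agent (a real one would wrap a value network and its update).

```python
import math
import random
from collections import deque

PUMP_FREQS = [30, 40, 50]   # hypothetical candidate frequencies, Hz
TOWER_FREQS = [30, 40, 50]

class StubAgent:
    """Placeholder with a hypothetical interface; swap in a real DQN agent."""
    def __init__(self, freqs):
        self.freqs, self.updates = freqs, 0
    def best_action(self, state):
        return max(self.freqs)   # stands in for argmax_a Q(state, a)
    def train(self, batch):
        self.updates += 1        # stands in for a gradient step on the batch

def chillers_for(cl_s, capacity=500.0):
    """D1 stand-in: ceiling rule, an assumption rather than the claimed thresholds."""
    return math.ceil(cl_s / capacity)

def toy_plant(state, f_pump, f_tower, n_chillers):
    """Made-up plant response: returns (next_state, coefficient of performance)."""
    cl_s, t_wet = state
    p_chiller = n_chillers * (120.0 + 2.0 * t_wet - 0.25 * (f_pump + f_tower))
    p_aux = 0.0002 * (f_pump ** 3 + f_tower ** 3)
    cop = cl_s / (p_chiller + p_aux)
    next_state = (max(100.0, cl_s + random.uniform(-50.0, 50.0)),
                  t_wet + random.uniform(-0.5, 0.5))
    return next_state, cop

def control_step(state, pump, tower, memory, batch_size=4):
    n = chillers_for(state[0])                               # D1: start-stop control
    s_t = state                                              # D2: observe the state
    a_t = (pump.best_action(s_t), tower.best_action(s_t))    # D3: greedy actions
    s_next, r_t = toy_plant(s_t, a_t[0], a_t[1], n)          # D4: act, reward = COP
    memory.append((s_t, a_t, r_t, s_next))                   # D5: store the sample
    batch = random.sample(list(memory), min(batch_size, len(memory)))
    pump.train(batch)                                        # D5: update both agents
    tower.train(batch)
    return s_next                                            # D6: next time step

random.seed(0)
memory = deque(maxlen=1000)
pump, tower = StubAgent(PUMP_FREQS), StubAgent(TOWER_FREQS)
state = (900.0, 24.0)
for t in range(10):
    state = control_step(state, pump, tower, memory)
```

Both agents read the same state and draw from the same experience pool in this sketch; whether the pool is shared or kept per agent is a design choice the sketch does not settle.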
CN202111609118.0A 2021-12-27 2021-12-27 Central air conditioner control method based on multi-agent deep reinforcement learning Active CN114279042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111609118.0A CN114279042B (en) 2021-12-27 2021-12-27 Central air conditioner control method based on multi-agent deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111609118.0A CN114279042B (en) 2021-12-27 2021-12-27 Central air conditioner control method based on multi-agent deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114279042A true CN114279042A (en) 2022-04-05
CN114279042B CN114279042B (en) 2024-01-26

Family

ID=80875846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111609118.0A Active CN114279042B (en) 2021-12-27 2021-12-27 Central air conditioner control method based on multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114279042B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100023167A1 (en) * 2007-04-04 2010-01-28 Yasuyuki Ito Air-conditioning system controller
WO2010020160A1 (en) * 2008-08-22 2010-02-25 Weldtech Technology (Shanghai) Co., Ltd. Method and system of energy-efficient control for central chiller plant systems
CN104089362A (en) * 2014-06-03 2014-10-08 杭州哲达科技股份有限公司 Cooling efficiency maximization method for cooling water system in central air-conditioner and control device
CN104534627A (en) * 2015-01-14 2015-04-22 江苏联宏自动化系统工程有限公司 Comprehensive efficiency control method of central air-conditioning cooling water system
CN105004002A (en) * 2015-07-06 2015-10-28 西安建筑科技大学 Energy saving control system and energy saving control method used for central air conditioner cooling water system
US9536191B1 (en) * 2015-11-25 2017-01-03 Osaro, Inc. Reinforcement learning using confidence scores
CN109475067A (en) * 2018-01-15 2019-03-15 香江科技股份有限公司 A kind of data center's multi-freezing pipe cooling and energy conserving system and its control method
CN111950158A (en) * 2020-08-17 2020-11-17 武汉理工大学 Central air conditioner energy consumption optimization method based on sequence least square programming
CN112325447A (en) * 2020-11-02 2021-02-05 珠海米枣智能科技有限公司 Refrigerating unit control device and control method based on reinforcement learning

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115289619A (en) * 2022-07-28 2022-11-04 安徽大学 Subway platform HVAC control method based on multi-agent deep reinforcement learning
CN115544899A (en) * 2022-11-23 2022-12-30 南京邮电大学 Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning
CN115544899B (en) * 2022-11-23 2023-04-07 南京邮电大学 Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning
CN116485044A (en) * 2023-06-21 2023-07-25 南京邮电大学 Intelligent operation optimization method for power grid interactive type efficient commercial building
CN116485044B (en) * 2023-06-21 2023-09-12 南京邮电大学 Intelligent operation optimization method for power grid interactive type efficient commercial building

Also Published As

Publication number Publication date
CN114279042B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN114279042B (en) Central air conditioner control method based on multi-agent deep reinforcement learning
WO2023093820A1 (en) Device control optimization method, display platform, cloud server, and storage medium
CN107940679B (en) Group control method based on performance curve of water chilling unit of data center
CN111126605B (en) Data center machine room control method and device based on reinforcement learning algorithm
CN104019526B (en) Improve PSO algorithm Fuzzy Adaptive PID temperature and humidity control system and method
CN108168030B (en) Intelligent control method based on refrigeration performance curve
CN101782261B (en) Nonlinear self-adapting energy-saving control method for heating ventilation air-conditioning system
CN113283156B (en) Energy-saving control method for subway station air conditioning system based on deep reinforcement learning
WO2023030522A1 (en) Data center air conditioning system diagnosis method and apparatus
CN110726218B (en) Air conditioner, control method and device thereof, storage medium and processor
CN108512258B (en) Wind power plant active scheduling method based on improved multi-agent consistency algorithm
CN108089440A (en) Energy-saving control method and device
CN110609474B (en) Data center energy efficiency optimization method based on reinforcement learning
CN112413831A (en) Energy-saving control system and method for central air conditioner
CN114383299B (en) Central air-conditioning system operation strategy optimization method based on big data and dynamic simulation
KR20180138371A (en) Method for evaluating data based models and conducting predictive control of capsule type ice thermal storage system using the same
Li Comparison of the characteristics of the control strategies based on artificial neural network and genetic algorithm for air conditioning systems
CN113791538A (en) Control method, control device and control system of machine room equipment
CN113757922A (en) Deep learning-based air conditioning system energy-saving control method, device and equipment and computer medium
CN115164378A (en) Fresh air handling unit regulation and control method based on digital twins and related device
CN115526504A (en) Energy-saving scheduling method and system for water supply system of pump station, electronic equipment and storage medium
WO2020098405A1 (en) Control method for air conditioner, air conditioner and storage medium
CN115877714B (en) Control method and device for refrigerating system, electronic equipment and storage medium
WO2022246627A1 (en) Method and apparatus for controlling refrigerating device
KR20160063839A (en) Intelligent Pigsty Air Vent Method Of Control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant