WO2020115631A1 - Method of operating cooling units in data center, and system thereof

Info

Publication number
WO2020115631A1
Authority
WO
WIPO (PCT)
Prior art keywords
cooling unit
cooling
temperature
condition
priority index
Prior art date
Application number
PCT/IB2019/060347
Other languages
French (fr)
Inventor
Arun Kumar Gupta
Anurag Nandwana
Nandkishor Kubal
Original Assignee
Abb Schweiz Ag
Priority date
Filing date
Publication date
Application filed by Abb Schweiz Ag
Publication of WO2020115631A1

Classifications

    • H ELECTRICITY
    • H05 ELECTRIC TECHNIQUES NOT OTHERWISE PROVIDED FOR
    • H05K PRINTED CIRCUITS; CASINGS OR CONSTRUCTIONAL DETAILS OF ELECTRIC APPARATUS; MANUFACTURE OF ASSEMBLAGES OF ELECTRICAL COMPONENTS
    • H05K7/00 Constructional details common to different types of electric apparatus
    • H05K7/20 Modifications to facilitate cooling, ventilating, or heating
    • H05K7/20709 Modifications to facilitate cooling, ventilating, or heating for server racks or cabinets; for data centers, e.g. 19-inch computer racks
    • H05K7/20836 Thermal management, e.g. server temperature control
    • H05K7/20718 Forced ventilation of a gaseous coolant
    • H05K7/20745 Forced ventilation of a gaseous coolant within rooms for removing heat from cabinets, e.g. by air conditioning device

Definitions

  • the present invention relates in general to data centers. More particularly, the present invention relates to scheduling operations of cooling units for optimizing cooling in the data center.
  • a data center is equipped with a plurality of servers.
  • the plurality of servers is placed in arrays of server racks.
  • arrays of cooling units are used for cooling the plurality of servers.
  • pipes carrying cool air or water run beneath the server racks and are used to reduce the temperature of the servers.
  • the data center is designed to provide optimum cooling to each server rack.
  • the cooling units, such as computer room air conditioning (CRAC) units, are controlled by operators in the data center.
  • redundant CRACs are provided in the data center. When an active CRAC is shut down for maintenance or has faults, redundant CRACs are operated. Conventionally, operators in the data center shut down the active CRACs and operate the redundant CRACs.
  • a method and a control system for operating a plurality of cooling units in a data center.
  • the control system comprises a scheduler for scheduling operations of the plurality of cooling units.
  • Each cooling unit is operated and controlled by one or more controllers for regulating the temperature of a plurality of server racks in the data center.
  • each server rack comprises a plurality of servers.
  • Each server rack can be associated with at least one cooling unit from the plurality of cooling units.
  • a plurality of temperature sensors is installed in the data center to measure temperature values of the server racks.
  • the scheduler determines one or more values of temperature of each server rack.
  • the one or more values of temperature can be received either from the plurality of temperature sensors or from data center models such as a Computational Fluid Dynamics (CFD) model.
  • the received one or more values of temperature are compared with a threshold range. The comparison is performed to detect whether the temperature of the server racks has gone beyond the threshold range (regions where the temperature is beyond the threshold range are termed hotspots), and specific cooling units are activated to reduce the temperature of the server racks. Further, an operating condition (ON condition or OFF condition) of one or more cooling units proximal to the at least one cooling unit is determined.
  • the operating condition of each cooling unit is updated in the scheduler at regular time intervals or when there is a change in the operating condition.
  • one or more values of temperature of each server rack are estimated for the case where the corresponding at least one cooling unit is considered to be operated in the OFF condition. This estimation is performed to check the effect of shutting down the at least one cooling unit on the corresponding server rack. The estimation is performed by monitoring the operating load of the corresponding server racks and the operating condition of the one or more cooling units proximal to the at least one cooling unit.
  • a priority index is calculated for each cooling unit based on the determined one or more values of temperature of each server rack, estimated one or more values of temperature of each server rack and the operating condition of the one or more cooling units proximal to the at least one cooling unit. Thereafter, the scheduler generates a schedule for operating each cooling unit based on the priority index of corresponding cooling unit.
  • the priority index of a cooling unit indicates severity of operating the cooling unit in OFF condition.
  • the priority index of the cooling unit is set to "0" if the cooling unit should not be turned OFF or operated in the OFF condition.
  • the redundant cooling units are operated when the primary cooling units are turned OFF or operated in the OFF condition. A priority index of "0" for a cooling unit indicates that turning OFF that cooling unit causes hotspots in the corresponding server racks, and hence the cooling unit should not be turned OFF.
  • one or more cooling units having a priority index higher than the priority index of the rest of the cooling units are turned OFF and redundant cooling units are turned ON.
  • a priority index of "1" is provided to the redundant units which are turned ON, and the priority index of each of the rest of the cooling units is incremented by "1".
  • Figure 1 illustrates a simplified block diagram of a data center, in accordance with an embodiment of the present disclosure
  • Figure 2 shows a simplified block diagram of a scheduler in a control system for controlling and operating cooling units in a data center, in accordance with an embodiment of the present disclosure
  • Figure 3 shows an exemplary flowchart illustrating steps for generating a schedule for operating cooling units in a data center, in accordance with an embodiment of the present disclosure
  • Figures 4A to 4C show exemplary block diagrams of server racks and corresponding cooling units, in accordance with an embodiment of the present disclosure.
  • Figure 5 shows an exemplary flow chart illustrating steps of using a schedule for operating the cooling units in a data center.
  • FIG 1 shows a simplified diagram of a data center (100).
  • the data center (100) comprises a plurality of server racks (101A, 101B... 101N)/ array of server racks (collectively referred to as server racks (101)), a plurality of cooling units (102A, 102B... 102N) (collectively referred to as cooling units (102)) and a scheduler (103).
  • each cooling unit is a Computer Room Air Conditioning (CRAC) unit controlled by respective controllers (not shown).
  • the cooling units (102) are used to provide cooling to the server racks (101).
  • each server rack in the server racks (101) comprises a plurality of servers (not shown).
  • each server rack is allotted at least one cooling unit from the cooling units (102).
  • server rack (101A) is allotted cooling unit (102A).
  • each server rack (101A, 101B... 101N) is allotted at least one cooling unit (102A, 102B... 102N).
  • the allotted at least one cooling unit (102A) provides maximum cooling to a particular area in the corresponding server rack (101A).
  • the at least one cooling unit (102A) is also referred to as the active cooling unit.
  • the data center (100) comprises a plurality of redundant cooling units (102R).
  • the redundant cooling units (102R) are operated (turned ON) to provide uniform cooling when the active cooling units are shut down for maintenance or when there are faults in the active cooling units (102).
  • the scheduler (103) is provided in the data center (100) for scheduling operations of the plurality of cooling units (102). Particularly, the scheduler (103) generates a schedule indicating one or more cooling units from the cooling units (102) to be operated in an ON condition and one or more cooling units to be operated in an OFF condition.
  • FIG. 2 shows a block diagram of the scheduler (103).
  • the scheduler (103) can be a part of the control system provided in the data center (100) or can be a standalone system, integrated with the control system of the data center (100).
  • the scheduler (103) comprises a temperature detection module (201), an operating condition detection module (202), a priority index generator (203), a schedule generator (204), a processor (205) and a memory (206).
  • the temperature detection module (201) is configured to receive one or more values of temperature of each server rack from the server racks (101).
  • the one or more values of temperature can be obtained from a plurality of temperature sensors installed in the data center (100).
  • the one or more values of the temperature can be estimated using load of servers in each server rack. The estimation can be obtained from temperature models (CFD or alike) configured to estimate temperature of the data center (100) using various parameters including server load, number of cooling units, effect of cooling from each cooling unit, etc.
  • the processor (205) configures the temperature detection module (201) to obtain temperature of each server rack when corresponding cooling units (102) are assumed/ considered to be turned OFF/ operated in OFF condition.
  • the temperature of server racks (101) can be obtained during different operating conditions in the data center (100). For example, the temperature of each server rack is obtained during high loading of the servers, when corresponding cooling units are turned OFF.
  • the operating condition detection module (202) obtains operating condition of each cooling unit.
  • the operating condition is the OFF condition or the ON condition.
  • the operating condition is obtained at regular time intervals (e.g., every half hour) or based on a trigger.
  • the trigger can be generated when there is a change in operating condition of any of the cooling units (102). For example, if cooling unit (102A) changes operating state from ON condition to OFF condition.
  • the operating condition detection module (202) stores the operating status of each cooling unit (102) in the memory (206). In an embodiment, the operating condition detection module (202) overwrites operating statuses stored in the memory (206) upon obtaining updated operating statuses.
  • the priority index generator (203) determines a priority index for each cooling unit.
  • the priority index indicates severity of turning OFF the cooling units.
  • the priority index can vary from 1-10.
  • a cooling unit (102N) having a priority index of 1 can indicate high severity of shutting down the cooling unit (102N).
  • a cooling unit (102P) having the priority index of 10 can indicate that the severity of shutting down the cooling unit (102P) is low.
  • the cooling unit (102P) can be turned OFF and a redundant cooling unit (102R) can be turned ON.
  • the schedule generator (204) generates a schedule based on the priority index of each cooling unit (102).
  • the schedule provides an insight on operating the cooling units (102) in the data center (100). Based on the generated schedule, the cooling units (102) are automatically operated.
  • FIG 3 shows an exemplary flowchart for operating the plurality of cooling units (102).
  • the temperature detection module (201) determines one or more values of temperature of each server rack (101).
  • a server rack (101) is shown. As shown the server rack (101) is cooled by three cooling units (102A, 102B and 102C). Let cooling units (102A and 102B) provide maximum cooling to the server rack (101). Let cooling unit (102C) provide partial cooling to the server rack (101).
  • a plurality of temperature sensors is installed in the data center (100), preferably in the vicinity of the server rack (101).
  • the temperature detection module (201) obtains one or more values of temperature of the server rack (101) from the plurality of temperature sensors.
  • the temperature detection module (201) can obtain the one or more values of temperature from the temperature models.
  • the processor (205) determines operating condition of one or more cooling units proximal to the at least one cooling unit corresponding to each server rack.
  • each server rack is allotted with at least one cooling unit from the cooling units (102).
  • Each cooling unit has proximal cooling units, i.e., cooling units neighboring to the at least one cooling unit.
  • the allotted at least one cooling unit can have only one proximal cooling unit.
  • the allotted at least one cooling unit (102) can have a plurality of proximal cooling units.
  • the processor (205) retrieves the operating condition/ status of the one or more cooling units that is proximal to the allotted at least one cooling unit from the memory (206).
  • the operating status of each cooling unit is stored in the memory (206) by the operating condition detection module (202).
  • let cooling units (102C and 102D) be in the OFF condition and the cooling units (102A and 102B) be in the ON condition.
  • the cooling units (102B and 102C) are considered as neighboring/ proximal cooling units.
  • many cooling units (102) can be considered proximal to the cooling unit (102A).
  • the processor (205) retrieves the operating status of the proximal cooling units. In this scenario, the operating status of the cooling unit (102B) is ON condition and the operating status of the cooling unit (102C) is OFF condition. Likewise, the operating status of proximal cooling units is determined for the allotted at least one cooling unit corresponding to each server rack.
  • the processor (205) estimates one or more values of temperature of each server rack upon considering corresponding at least one cooling unit to be operated in the OFF condition.
  • the processor (205) calculates a zone of influence (ZoI) of each cooling unit (102).
  • the ZoI of a cooling unit can be defined as the cooling effect of the cooling unit on the corresponding server rack.
  • the ZoI of the cooling unit can be calculated using data related to the data center layout, the position of the server rack with respect to the cooling unit, and the temperature and flow rate of the cooling unit.
  • the ZoI can be calculated using models, such as CFD models, where flow and heat parameters are determined in a sectioned room.
  • the ZoI can be calculated using data collected during operation of the data center and identifying the effect of the cooling unit on the corresponding server rack.
  • the ZoI is used to estimate one or more values of temperature of the server rack (101) if the corresponding at least one cooling unit is turned OFF. The estimation is performed to determine the criticality of turning OFF the at least one cooling unit. The estimation is performed using the model data (CFD or like models).
  • CFD model data
  • the ZoI of each of the cooling units (102A and 102B) is estimated. Let us consider that cooling unit (102A) is turned OFF.
  • the processor (205) estimates the temperature of the server rack (101) with the cooling unit (102A) being turned OFF. In this scenario, let us consider that a hotspot/ coldspot (401) is created in the server rack (101) upon turning off the cooling unit (102A).
  • the one or more values of temperature of the server rack (101) are estimated based on the operating load of the plurality of servers in the server rack (101) and the operating condition of the proximal cooling units (102B and 102C).
  • the priority index generator (203) determines the priority index for the at least one cooling unit corresponding to each server rack.
  • the priority index indicates the criticality of switching OFF the at least one cooling unit.
  • the priority index for a cooling unit is determined based on the one or more values of temperature of corresponding server rack, estimated one or more values of temperature of the server rack upon considering the cooling unit is operated in OFF condition and operating condition/ status of the neighboring/ proximal cooling units.
  • the cooling unit is associated with the priority index 0.
  • the value 0 is only representative and any such value can be associated to indicate criticality level. For example, on a scale of 0-10, 10 may be associated to indicate high criticality. In this disclosure, the value 0 indicates high criticality.
  • the temperature of the server rack can vary due to high load, fault in the cooling unit (102), or during replacement of active cooling units with redundant cooling units (102R).
  • the corresponding cooling unit is associated with priority index 0. This indicates that the cooling unit cannot be turned OFF and if the cooling unit is turned OFF the temperature of the server rack varies beyond the threshold range, causing hotspots/ coldspots.
  • the cooling unit is associated with the priority index 0. Since the proximal cooling units are turned OFF, the server rack cannot be provided cooling upon turning OFF the cooling unit. Thus, the cooling unit is associated with the priority index 0.
  • the priority index is set to 0. In an embodiment, if all the three conditions are met, then the priority index is set to 1. In a subsequent iteration, if the conditions are met, the priority index is incremented by 1. A person skilled in the art should appreciate that the priority index can be incremented by any value and incrementing by 1 is exemplary to this disclosure. Further, the priority index is calculated for each cooling unit. The cooling unit having the highest priority index is turned OFF. Thereafter, a redundant cooling unit is turned ON and the priority index is set to 1. Thus, on a scale of 0 to N, N indicates the highest priority for turning OFF a cooling unit.
  • the cooling unit having priority index 0 is not incremented until the three conditions as described above are met.
  • the priority index can be determined based on historical data including but not limited to, load variation, time since operating cooling unit in OFF condition, total number of cooling units (102), etc. For example, let us consider a first cooling unit is operated for 10 hours and a second cooling unit is operated for 4 hours. Based on the historical working and current operating conditions of corresponding server rack, the priority index of the first cooling unit can be higher than the priority index of the second cooling unit. Thus, the first cooling unit is more likely to be shut down.
  • the schedule generator (204) generates a schedule for turning OFF the cooling units (102) based on the priority index associated with each cooling unit (102).
  • the schedule can indicate the order in which the cooling units (102) have to be turned OFF.
  • the schedule can be generated in various ways. In an exemplary embodiment, the schedule can be generated as shown in Figure 5.
  • the scheduler (103) predicts the load for an upcoming time slot (e.g., for upcoming 30 minutes).
  • the prediction can be made using the data center models.
  • the scheduler (103) checks if the cooling capacity of the cooling units (102) operated in the ON condition is greater than the predicted load.
  • the scheduler checks if any of the cooling units (102) has to be turned OFF. If the cooling capacity is less than the predicted load, then the cooling units (102) continue to be operated in the ON condition and, as shown in step 503, the scheduler waits for the next time slot. If the cooling capacity is greater than the predicted load, step 504 is executed.
  • the scheduler (103) calculates the ZoI of the cooling units (102) and the priority index of each cooling unit corresponding to each server rack.
  • the scheduler (103) identifies cooling units (102) having largest priority index.
  • the scheduler (103) configures the controller of the identified cooling units (102) to turn OFF the cooling units (102). Further, the steps 502 to 506 are iteratively repeated.
  • the identified cooling units are listed and are stored in the memory (206). Every time slot, the memory (206) is updated.
  • the objective of the optimization is to determine: min
  • R ref is the reference temperature at each sensor
  • D is the allowable variation in temperature at a given point.
  • the value of D must be selected carefully, because the low-level controller will control the temperature within this limit.
  • the demand can be calculated from historical data.
  • the historical data is stored in one or more databases associated with the data center (100).
  • the present invention results in reduction in unplanned shut-down of cooling units (102).
  • the aspects of the present invention reduce hotspots and coldspots in the data center.
  • the aspects of the present invention lead to balanced utilization of cooling units (102), and thereby result in reduced energy consumption.
  • This written description uses examples to describe the subject matter herein, including the best mode, and also to enable any person skilled in the art to make and use the subject matter.
  • the patentable scope of the subject matter is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.
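
The time-slot scheduling steps listed above (predict the load, compare it with the capacity of the units in the ON condition, compute priority indices, turn OFF the unit with the largest index, repeat) can be sketched roughly as follows. This is only an illustrative sketch, not the claimed implementation: the `CoolingUnit` class, the `capacity_kw` field, the `plan_time_slot` function and the `priority_of` callback are hypothetical names, and the guard that never lets capacity fall below the predicted load is an added assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CoolingUnit:
    unit_id: str
    capacity_kw: float  # hypothetical field: cooling capacity of the unit
    is_on: bool = True


def plan_time_slot(
    units: List[CoolingUnit],
    predicted_load_kw: float,
    priority_of: Callable[[CoolingUnit], int],
) -> List[str]:
    """Return the ids of the units turned OFF for the upcoming time slot."""
    turned_off: List[str] = []
    while True:
        on_units = [u for u in units if u.is_on]
        capacity = sum(u.capacity_kw for u in on_units)
        # No surplus capacity: keep every unit ON and wait for the next slot.
        if capacity <= predicted_load_kw:
            break
        priorities = {u.unit_id: priority_of(u) for u in on_units}
        # Priority index 0 marks a unit that must not be turned OFF.
        candidates = [u for u in on_units if priorities[u.unit_id] > 0]
        if not candidates:
            break
        best = max(candidates, key=lambda u: priorities[u.unit_id])
        # Assumption: never shed a unit if that drops capacity below the load.
        if capacity - best.capacity_kw < predicted_load_kw:
            break
        best.is_on = False
        turned_off.append(best.unit_id)
    return turned_off


# Illustrative values only.
example_units = [
    CoolingUnit("102A", 50.0),
    CoolingUnit("102B", 50.0),
    CoolingUnit("102C", 50.0),
    CoolingUnit("102D", 50.0),
]
example_priority = {"102A": 0, "102B": 2, "102C": 5, "102D": 3}
shed = plan_time_slot(example_units, 90.0, lambda u: example_priority[u.unit_id])
```

In this sketch the priority indices are looked up from a fixed table; in the described method they would be recalculated from the ZoI and the rack temperatures on each iteration.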

Abstract

The present invention relates to a method and a system for operating cooling units in a data center. The method comprises determining the temperature of server racks and determining the operating condition of cooling units proximal to the cooling units allotted to each server rack. Thereafter, the temperature of the server racks is estimated for the case where the corresponding cooling units are considered to be operated in the OFF condition. Further, a priority index is calculated for each cooling unit based on the determined temperature, the estimated temperature and the operating condition of the proximal cooling units. A schedule is generated using the priority index, and the cooling units are operated according to the schedule. Therefore, a robust mechanism to effectively operate the cooling units is proposed in this disclosure.

Description

Title: METHOD OF OPERATING COOLING UNITS IN DATA CENTER, AND SYSTEM THEREOF
Technical Field
[0001] The present invention relates in general to data centers. More particularly, the present invention relates to scheduling operations of cooling units for optimizing cooling in the data center.
Background
[0002] Typically, a data center is equipped with a plurality of servers. The plurality of servers is placed in arrays of server racks. Further, arrays of cooling units are used for cooling the plurality of servers. Generally, pipes carrying cool air or water run beneath the server racks and are used to reduce the temperature of the servers. The data center is designed to provide optimum cooling to each server rack. The cooling units, such as computer room air conditioning (CRAC) units, are controlled by operators in the data center.
[0003] Usually, redundant CRACs are provided in the data center. When an active CRAC is shut down for maintenance or has faults, redundant CRACs are operated. Conventionally, operators in the data center shut down the active CRACs and operate the redundant CRACs. However, operators do not foresee the loading requirements and the criticality of shutting down the active CRAC. Typically, if a CRAC has been operated for 10-15 hours, the operators shut it down and operate redundant CRACs. However, shutting down the active CRAC may affect the servers and may cause non-uniform supply of cooling in the data center. In particular, when high activity is anticipated, a few active CRACs must remain in operation even though they have been operating for a long time.
[0004] Also, operators do not follow a procedure or logic while commissioning redundant CRACs. The design of the data center allows a few CRACs to provide more cooling to certain server racks. However, redundant CRACs may not provide the same cooling effect as the active CRACs. Hence, the redundant CRACs have to be operated such that the cooling is not reduced, and hot spots and cold spots are prevented in the data center.
[0005] Thus, there is a need to address the abovementioned problems, and a method and a system are required for optimum cooling in the data center.
Summary
[0006] In embodiments, a method and a control system are disclosed for operating a plurality of cooling units in a data center. The control system comprises a scheduler for scheduling operations of the plurality of cooling units. Each cooling unit is operated and controlled by one or more controllers for regulating the temperature of a plurality of server racks in the data center. In an embodiment, each server rack comprises a plurality of servers. Each server rack can be associated with at least one cooling unit from the plurality of cooling units. Further, a plurality of temperature sensors is installed in the data center to measure temperature values of the server racks.
[0007] The scheduler determines one or more values of temperature of each server rack. The one or more values of temperature can be received either from the plurality of temperature sensors or from data center models such as a Computational Fluid Dynamics (CFD) model. In an embodiment, the received one or more values of temperature are compared with a threshold range. The comparison is performed to detect whether the temperature of the server racks has gone beyond the threshold range (regions where the temperature is beyond the threshold range are termed hotspots), and specific cooling units are activated to reduce the temperature of the server racks. Further, an operating condition (ON condition or OFF condition) of one or more cooling units proximal to the at least one cooling unit is determined. In an embodiment, the operating condition of each cooling unit is updated in the scheduler at regular time intervals or when there is a change in the operating condition.
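
The comparison of rack temperatures against a threshold range could look roughly like the following. This is a minimal sketch under stated assumptions: the function name `find_out_of_range_racks`, the rack identifiers and the numeric range are illustrative, and the "coldspot" label for readings below the range follows the hot spot/cold spot terminology used elsewhere in this document rather than this paragraph.

```python
from typing import Dict, List, Tuple


def find_out_of_range_racks(
    rack_temps: Dict[str, List[float]],
    threshold_range: Tuple[float, float],
) -> Dict[str, str]:
    """Label each rack whose readings leave the threshold range."""
    low, high = threshold_range
    flagged: Dict[str, str] = {}
    for rack_id, temps in rack_temps.items():
        if not temps:
            continue  # no reading available for this rack
        if max(temps) > high:
            flagged[rack_id] = "hotspot"
        elif min(temps) < low:
            flagged[rack_id] = "coldspot"
    return flagged


# Illustrative readings in degrees Celsius (assumed values).
readings = {"101A": [22.5, 24.0], "101B": [26.0, 29.5], "101C": [16.0, 19.0]}
flagged = find_out_of_range_racks(readings, (18.0, 27.0))
```

The readings could equally come from a temperature model instead of sensors; the comparison itself does not depend on the source.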
[0008] In an embodiment, one or more values of temperature of each server rack are estimated for the case where the corresponding at least one cooling unit is considered to be operated in the OFF condition. This estimation is performed to check the effect of shutting down the at least one cooling unit on the corresponding server rack. The estimation is performed by monitoring the operating load of the corresponding server racks and the operating condition of the one or more cooling units proximal to the at least one cooling unit.
[0009] In an embodiment, a priority index is calculated for each cooling unit based on the determined one or more values of temperature of each server rack, the estimated one or more values of temperature of each server rack and the operating condition of the one or more cooling units proximal to the at least one cooling unit. Thereafter, the scheduler generates a schedule for operating each cooling unit based on the priority index of the corresponding cooling unit.
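
One way to read the priority index calculation is sketched below, combining the three inputs named above with the convention used in this document that an index of 0 marks a unit that must not be turned OFF and that the index grows by an increment on each iteration in which all conditions hold. The function names, the numeric range and the choice to treat the proximal-unit condition as "at least one proximal unit is ON" are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def in_range(temps: Sequence[float], threshold_range: Tuple[float, float]) -> bool:
    """True when every temperature value lies inside the threshold range."""
    low, high = threshold_range
    return all(low <= t <= high for t in temps)


def update_priority_index(
    previous_index: int,
    measured: Sequence[float],
    estimated_off: Sequence[float],
    proximal_on: Sequence[bool],
    threshold_range: Tuple[float, float],
    increment: int = 1,
) -> int:
    """Return the new priority index of one cooling unit."""
    measured_ok = in_range(measured, threshold_range)
    # Estimated rack temperature if this unit were operated in OFF condition.
    estimated_ok = in_range(estimated_off, threshold_range)
    # Assumption: one proximal unit in ON condition is enough to take over.
    neighbours_ok = any(proximal_on)
    if not (measured_ok and estimated_ok and neighbours_ok):
        return 0  # unit must not be turned OFF
    # First iteration in which all conditions hold gives 1, later ones increment.
    return 1 if previous_index == 0 else previous_index + increment


limits = (18.0, 27.0)  # assumed threshold range in degrees Celsius
first = update_priority_index(0, [22.0], [25.0], [True, False], limits)
later = update_priority_index(3, [22.0], [25.0], [True], limits)
too_hot = update_priority_index(3, [22.0], [30.0], [True], limits)
isolated = update_priority_index(3, [22.0], [25.0], [False, False], limits)
```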
[00010] In an embodiment, the priority index of a cooling unit indicates the severity of operating the cooling unit in the OFF condition. The priority index of the cooling unit is set to "0" if the cooling unit should not be turned OFF or operated in the OFF condition. Among the plurality of cooling units, there may be a few redundant cooling units and many primary cooling units. The redundant cooling units are operated when the primary cooling units are turned OFF or operated in the OFF condition. A priority index of "0" for a cooling unit indicates that turning OFF that cooling unit causes hotspots in the corresponding server racks, and hence the cooling unit should not be turned OFF. In an embodiment, among the plurality of cooling units, one or more cooling units having a priority index higher than the priority index of the rest of the cooling units are turned OFF and redundant cooling units are turned ON. In an exemplary embodiment, a priority index of "1" is provided to the redundant units which are turned ON, and the priority index of each of the rest of the cooling units is incremented by "1". A person of ordinary skill will appreciate that the incremental value and the priority index can take any values and are not limited to the values that are used in the present invention.
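
The rotation described in this paragraph can be illustrated with a short sketch. It assumes a hypothetical `rotate_units` function, a dictionary of priority indices keyed by unit id, and a list of redundant units currently in the OFF condition; leaving units with index 0 unchanged is an interpretation of the rule that such units must not be turned OFF, not something this paragraph states outright.

```python
from typing import Dict, List, Optional, Tuple


def rotate_units(
    priority: Dict[str, int],
    redundant_off: List[str],
    increment: int = 1,
) -> Tuple[Dict[str, int], Optional[str], Optional[str]]:
    """Turn OFF the unit with the highest index and turn ON a redundant unit."""
    candidates = {u: p for u, p in priority.items() if p > 0}
    if not candidates or not redundant_off:
        return dict(priority), None, None  # nothing can be rotated
    to_turn_off = max(candidates, key=lambda u: candidates[u])
    to_turn_on = redundant_off[0]
    updated = {
        # Assumption: units with index 0 stay at 0; the others are incremented.
        u: (p + increment if p > 0 else 0)
        for u, p in priority.items()
        if u != to_turn_off
    }
    updated[to_turn_on] = 1  # redundant unit that was turned ON starts at 1
    return updated, to_turn_off, to_turn_on


# Illustrative indices only.
indices = {"102A": 0, "102B": 3, "102C": 5}
new_indices, off_unit, on_unit = rotate_units(indices, ["102R"])
```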
Brief Description of the drawings
[00011] The subject matter of the invention will be explained in more detail in the following text with reference to preferred exemplary embodiments which are illustrated in the drawings, in which:
[00012] Figure 1 illustrates a simplified block diagram of a data center, in accordance with an embodiment of the present disclosure;
[00013] Figure 2 shows a simplified block diagram of a scheduler in a control system for controlling and operating cooling units in a data center, in accordance with an embodiment of the present disclosure;
[00014] Figure 3 shows an exemplary flowchart illustrating steps for generating a schedule for operating cooling units in a data center, in accordance with an embodiment of the present disclosure;
[00015] Figures 4A to 4C show exemplary block diagrams of server racks and corresponding cooling units, in accordance with an embodiment of the present disclosure; and
[00016] Figure 5 shows an exemplary flow chart illustrating steps of using a schedule for operating the cooling units in a data center.
Detailed description
[00017] In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments, which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments, and it is to be understood that other embodiments may be utilized, and that logical, mechanical, electrical and other changes may be made without departing from the scope of the embodiments. The following detailed description is, therefore, not to be taken in a limiting sense.
[00018] Figure 1 shows a simplified diagram of a data center (100). The data center (100) comprises a plurality of server racks (101A, 101B... 101 N)/ array of server racks (collectively referred as server racks (101)), a plurality of cooling units (102A, 102B... 102N) (collectively referred as cooling units (102)) and a scheduler (103). In an embodiment each cooling unit is a Computer Room Air Conditioning (CRAC) unit controlled by respective controllers (not shown). The cooling units (102) are used to provide cooling to the server racks (101). In an embodiment, each server rack in the server racks (101) comprises a plurality of servers (not shown).
[00019] In one embodiment, each server rack is allotted at least one cooling unit from the cooling units (102). For example, server rack (101A) is allotted cooling unit (102A). Likewise, each server rack (101A, 101B... 101N) is allotted at least one cooling unit (102A, 102B... 102N). The allotted at least one cooling unit (102A) provides maximum cooling to a particular area in the corresponding server rack (101A). In an embodiment, the at least one cooling unit (102A) is also referred to as an active cooling unit. In an embodiment, the data center (100) comprises a plurality of redundant cooling units (102R). The redundant cooling units (102R) are operated (turned ON) to provide uniform cooling when the active cooling units are shut down for maintenance or when there are faults in the active cooling units (102). The scheduler (103) is provided in the data center (100) for scheduling operations of the plurality of cooling units (102). Particularly, the scheduler (103) generates a schedule indicating one or more cooling units from the cooling units (102) to be operated in an ON condition and one or more cooling units to be operated in an OFF condition.
[00020] Figure 2 shows a block diagram of the scheduler (103). In an embodiment, the scheduler (103) can be a part of the control system provided in the data center (100) or can be a standalone system, integrated with the control system of the data center (100). The scheduler (103) comprises a temperature detection module (201), an operating condition detection module (202), a priority index generator (203), a schedule generator (204), a processor (205) and a memory (206).
[00021] In an embodiment, the temperature detection module (201) is configured to receive one or more values of temperature of each server rack from the server racks (101). In an embodiment, the one or more values of temperature can be obtained from a plurality of temperature sensors installed in the data center (100). In an alternate embodiment, the one or more values of the temperature can be estimated using the load of the servers in each server rack. The estimation can be obtained from temperature models (CFD or alike) configured to estimate the temperature of the data center (100) using various parameters including server load, number of cooling units, effect of cooling from each cooling unit, etc. In an embodiment, the processor (205) configures the temperature detection module (201) to obtain the temperature of each server rack when the corresponding cooling units (102) are assumed to be operated in the OFF condition. In an embodiment, the temperature of the server racks (101) can be obtained during different operating conditions in the data center (100). For example, the temperature of each server rack is obtained during high loading of the servers, when the corresponding cooling units are turned OFF.
[00022] In an embodiment, the operating condition detection module (202) obtains the operating condition of each cooling unit. The operating condition is the OFF condition or the ON condition. The operating condition is obtained at regular time intervals (e.g., every half hour) or based on a trigger. The trigger can be generated when there is a change in the operating condition of any of the cooling units (102), for example, if cooling unit (102A) changes its operating state from the ON condition to the OFF condition. The operating condition detection module (202) stores the operating status of each cooling unit (102) in the memory (206). In an embodiment, the operating condition detection module (202) overwrites the operating statuses stored in the memory (206) upon obtaining updated operating statuses.
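The status bookkeeping described above can be illustrated with a minimal Python sketch. The class name `StatusStore`, its `update` method and the string constants are hypothetical names chosen for illustration; the disclosure does not prescribe a data structure for the memory (206).

```python
from dataclasses import dataclass, field
from typing import Dict

ON, OFF = "ON", "OFF"


@dataclass
class StatusStore:
    """Hypothetical stand-in for the memory (206): latest status per cooling unit."""

    statuses: Dict[str, str] = field(default_factory=dict)

    def update(self, unit_id: str, status: str) -> bool:
        """Overwrite the stored status; return True if the status changed."""
        if status not in (ON, OFF):
            raise ValueError(f"unknown operating condition: {status}")
        changed = self.statuses.get(unit_id) != status
        self.statuses[unit_id] = status  # older value is overwritten
        return changed


store = StatusStore()
store.update("102A", ON)
trigger = store.update("102A", OFF)  # 102A goes from ON to OFF
```

In this sketch a `True` return value plays the role of the trigger; a periodic poll (for example every half hour) could call `update` for every unit in the same way.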
[00023] In an embodiment, the priority index generator (203) determines a priority index for each cooling unit. The priority index indicates the severity of turning OFF the cooling unit. For example, the priority index can vary from 1 to 10. In an exemplary embodiment, a cooling unit (102N) having a priority index of 1 can indicate high severity of shutting down the cooling unit (102N). In an embodiment, a cooling unit (102P) having the priority index of 10 can indicate that the severity of shutting down the cooling unit (102P) is low. Thus, the cooling unit (102P) can be turned OFF and a redundant cooling unit (102R) can be initiated.
[00024] In an embodiment, the schedule generator (204) generates a schedule based on the priority index of each cooling unit (102). The schedule provides an insight on operating the cooling units (102) in the data center (100). Based on the generated schedule, the cooling units (102) are automatically operated.
[00025] Figure 3 shows an exemplary flowchart for operating the plurality of cooling units (102). At step 301, the temperature detection module (201) determines one or more values of temperature of each server rack (101). Referring to Figure 4A, a server rack (101) is shown. As shown, the server rack (101) is cooled by three cooling units (102A, 102B and 102C). Let cooling units (102A and 102B) provide maximum cooling to the server rack (101). Let cooling unit (102C) provide partial cooling to the server rack (101). In this scenario, let us consider that a plurality of temperature sensors is installed in the data center (100), preferably in the vicinity of the server rack (101). The temperature detection module (201) obtains one or more values of temperature of the server rack (101) from the plurality of temperature sensors. Alternatively, the temperature detection module (201) can obtain the one or more values of temperature from the temperature models.
[00026] At step 302, the processor (205) determines the operating condition of one or more cooling units proximal to the at least one cooling unit corresponding to each server rack. As described earlier, each server rack is allotted at least one cooling unit from the cooling units (102). Each cooling unit has proximal cooling units, i.e., cooling units neighboring the at least one cooling unit. In some embodiments, the allotted at least one cooling unit can have only one proximal cooling unit. In some embodiments, the allotted at least one cooling unit (102) can have a plurality of proximal cooling units. The processor (205) retrieves from the memory (206) the operating condition/ status of the one or more cooling units that are proximal to the allotted at least one cooling unit. The operating status of each cooling unit is stored in the memory (206) by the operating condition detection module (202).
[00027] Referring to Figure 4B, let the cooling units (102C and 102D) be in the OFF condition and the cooling units (102A and 102B) be in the ON condition. In one embodiment, while calculating the priority index for cooling unit (102A), the cooling units (102B and 102C) are considered as neighboring/ proximal cooling units. In another embodiment, many cooling units (102) can be considered proximal to the cooling unit (102A). The processor (205) retrieves the operating status of the proximal cooling units. In this scenario, the operating status of the cooling unit (102B) is the ON condition and the operating status of the cooling unit (102C) is the OFF condition. Likewise, the operating status of proximal cooling units is determined for the allotted at least one cooling unit corresponding to each server rack.
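As an illustration of the Figure 4B scenario, the lookup of proximal operating statuses might be sketched as follows. The dictionaries `PROXIMAL` and `STATUS` and the function `proximal_statuses` are hypothetical; how neighbors are chosen is left open by the disclosure.

```python
from typing import Dict, List

# Hypothetical neighbor map and stored statuses mirroring the Figure 4B example.
PROXIMAL: Dict[str, List[str]] = {"102A": ["102B", "102C"]}
STATUS: Dict[str, str] = {"102A": "ON", "102B": "ON", "102C": "OFF", "102D": "OFF"}


def proximal_statuses(
    unit_id: str, proximal: Dict[str, List[str]], status: Dict[str, str]
) -> Dict[str, str]:
    """Return the stored operating status of every unit proximal to unit_id."""
    return {neighbor: status[neighbor] for neighbor in proximal.get(unit_id, [])}


result = proximal_statuses("102A", PROXIMAL, STATUS)
```

Here `result` maps 102B to ON and 102C to OFF, matching the statuses stated for the example.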
[00028] At step 303, the processor (205) estimates one or more values of temperature of each server rack upon considering the corresponding at least one cooling unit to be operated in the OFF condition. Here, the processor (205) calculates a zone of influence (ZoI) of each cooling unit (102). The ZoI of a cooling unit can be defined as the effect of cooling of the cooling unit on the corresponding server rack. In an embodiment, the ZoI of the cooling unit can be calculated using data related to the data center layout, the position of the server rack with respect to the cooling unit, and the temperature and flow rate of the cooling unit. In an embodiment, the ZoI can be calculated using models where flow and heat parameters are determined in a sectioned room, like CFD models. In another embodiment, the ZoI can be calculated using data collected during operation of the data center and identifying the effect of the cooling unit on the corresponding server rack.
[00029] In an embodiment, the ZoI is used to estimate one or more values of temperature of the server rack (101) if the corresponding at least one cooling unit is turned OFF. The estimation is performed to determine the criticality of turning OFF the at least one cooling unit. The estimation is performed using the model data (CFD or like models).
[00030] Referring now to Figure 4C, let us consider that cooling units (102A and 102B) are the active cooling units providing maximum cooling to the server rack. The cooling units (102A and 102B) are essential to cool the server rack (101). Let the cooling unit (102C) act as a redundant cooling unit. Using model data, the ZoI of the cooling units (102A and 102B) is estimated. Let us consider that cooling unit (102A) is turned OFF. The processor (205) estimates the temperature of the server rack (101) with the cooling unit (102A) being turned OFF. In this scenario, let us consider that a hotspot/ coldspot (401) is created in the server rack (101) upon turning OFF the cooling unit (102A). The one or more values of temperature of the server rack (101) are estimated based on the operating load of the plurality of servers in the server rack (101), and the operating condition of the proximal cooling units (102B and 102C).
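The estimation for the Figure 4C scenario can be illustrated with a deliberately simplified sketch. It replaces the CFD-type model with an assumed linear superposition: each cooling unit that is ON lowers the rack temperature by a fixed amount (its assumed zone-of-influence contribution), and the uncooled rack temperature rises linearly with server load. All names, coefficients and numbers below are hypothetical and serve only to show how load, zone of influence and proximal statuses enter the estimate.

```python
from typing import Dict, Optional


def estimate_rack_temperature(
    inlet_temp_c: float,
    load_kw: float,
    rise_c_per_kw: float,
    zoi_cooling_c: Dict[str, float],
    status: Dict[str, str],
    assume_off: Optional[str] = None,
) -> float:
    """Estimate rack temperature, optionally assuming one unit is turned OFF.

    Assumption: a linear model stands in for the CFD-type model of the text.
    """
    uncooled = inlet_temp_c + load_kw * rise_c_per_kw
    cooling = sum(
        delta
        for unit, delta in zoi_cooling_c.items()
        if status.get(unit) == "ON" and unit != assume_off
    )
    return uncooled - cooling


# Hypothetical numbers: 102A and 102B are active, 102C is redundant and OFF.
zoi = {"102A": 10.0, "102B": 10.0, "102C": 4.0}
status = {"102A": "ON", "102B": "ON", "102C": "OFF"}

baseline = estimate_rack_temperature(20.0, 10.0, 2.5, zoi, status)
without_a = estimate_rack_temperature(20.0, 10.0, 2.5, zoi, status, assume_off="102A")
```

With these assumed values the estimate rises from 25.0 to 35.0 degrees Celsius when 102A is considered OFF, which would indicate a hotspot if the threshold range ended below 35.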
[00031] At step 304, the priority index generator (203) determines the priority index for the at least one cooling unit corresponding to each server rack. The priority index indicates the criticality of switching OFF the at least one cooling unit. The priority index for a cooling unit is determined based on the one or more values of temperature of corresponding server rack, estimated one or more values of temperature of the server rack upon considering the cooling unit is operated in OFF condition and operating condition/ status of the neighboring/ proximal cooling units.
[00032] In an embodiment, if the one or more values of temperature of the server rack varies beyond a certain threshold range, then the cooling unit is associated with the priority index 0. A person of ordinary skill will appreciate that the value 0 is only representative and any such value can be associated to indicate criticality level. For example, on a scale of 0-10, 10 may be associated to indicate high criticality. In this disclosure, the value 0 indicates high criticality. The temperature of the server rack can vary due to high load, fault in the cooling unit (102), or during replacement of active cooling units with redundant cooling units (102R).
[00033] In an embodiment, if the estimated one or more values of temperature of the server rack (101) varies beyond the threshold range, then the corresponding cooling unit is associated with priority index 0. This indicates that the cooling unit cannot be turned OFF and if the cooling unit is turned OFF the temperature of the server rack varies beyond the threshold range, causing hotspots/ coldspots.
[00034] In an embodiment, if the operating status of the proximal cooling units is OFF, then the cooling unit is associated with the priority index 0. Since the proximal cooling units are turned OFF, the server rack cannot be provided cooling upon turning OFF the cooling unit. Thus, the cooling unit is associated with the priority index 0.
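The three conditions described above can be collected into one check, sketched below. The function name `is_critical` and its parameters are hypothetical. The sketch assumes that the third condition applies when every proximal cooling unit is OFF; the text does not state whether one OFF neighbor is already sufficient.

```python
from typing import Dict, Iterable


def is_critical(
    measured_c: Iterable[float],
    estimated_off_c: Iterable[float],
    proximal_status: Dict[str, str],
    low_c: float,
    high_c: float,
) -> bool:
    """Return True if the cooling unit should receive priority index 0."""

    def out_of_range(temps: Iterable[float]) -> bool:
        return any(t < low_c or t > high_c for t in temps)

    # Assumption: the neighbor condition requires all proximal units to be OFF.
    neighbors_off = bool(proximal_status) and all(
        s == "OFF" for s in proximal_status.values()
    )
    return out_of_range(measured_c) or out_of_range(estimated_off_c) or neighbors_off


# Measured temperature is fine, but turning the unit OFF would create a hotspot.
hotspot_case = is_critical([25.0], [35.0], {"102B": "ON", "102C": "OFF"}, 18.0, 27.0)
# Everything within range and one neighbor still ON.
safe_case = is_critical([25.0], [26.0], {"102B": "ON", "102C": "OFF"}, 18.0, 27.0)
```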
[00035] In an embodiment, if at least one of the above three conditions holds, the priority index is set to 0. In an embodiment, if none of the three conditions holds (that is, the measured and estimated temperatures stay within the threshold range and the proximal cooling units are ON), the priority index is set to 1. In a subsequent iteration, if this is again the case, the priority index is incremented by 1. A person skilled in the art should appreciate that the priority index can be incremented by any value and incrementing by 1 is exemplary to this disclosure. Further, the priority index is calculated for each cooling unit. The cooling unit having the highest priority index is turned OFF. Thereafter, a redundant cooling unit is turned ON and the priority index is set to 1. Thus, on a scale of 0 to N, N indicates the highest priority for turning OFF a cooling unit. In an embodiment, the priority index of a cooling unit having priority index 0 is not incremented until none of the three conditions described above holds.
[00036] In an embodiment, the priority index can be determined based on historical data including, but not limited to, load variation, time since the cooling unit was operated in the OFF condition, total number of cooling units (102), etc. For example, let us consider that a first cooling unit has been operated for 10 hours and a second cooling unit has been operated for 4 hours. Based on the historical working and current operating conditions of the corresponding server rack, the priority index of the first cooling unit can be higher than the priority index of the second cooling unit. Thus, the first cooling unit is more likely to be shut down.
[00037] At step 305, the schedule generator (204) generates a schedule for turning OFF the cooling units (102) based on the priority index associated with each cooling unit (102). The schedule can indicate the order in which the cooling units (102) have to be turned OFF.
[00038] In an embodiment, the schedule can be generated in various ways. In an exemplary embodiment, the schedule can be generated as shown in Figure 5.
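A minimal sketch of the priority index update and of the resulting turn-OFF order is given below. The function names and the example values are hypothetical; the sketch assumes the exemplary increment of 1 and treats a unit with priority index 0 as not eligible for being turned OFF.

```python
from typing import Dict, List, Optional


def update_priority(index: int, critical: bool, step: int = 1) -> int:
    """Reset to 0 when a critical condition holds, otherwise increment."""
    return 0 if critical else index + step


def pick_unit_to_turn_off(priorities: Dict[str, int]) -> Optional[str]:
    """Return the unit with the highest non-zero priority index, if any."""
    candidates = {u: p for u, p in priorities.items() if p > 0}
    return max(candidates, key=candidates.get) if candidates else None


def turn_off_order(priorities: Dict[str, int]) -> List[str]:
    """Order in which units could be turned OFF, highest index first."""
    eligible = [u for u, p in priorities.items() if p > 0]
    return sorted(eligible, key=lambda u: -priorities[u])


priorities = {"102A": 0, "102B": 3, "102C": 1}
first = pick_unit_to_turn_off(priorities)
order = turn_off_order(priorities)
```

In this example 102B would be turned OFF first, 102C second, and 102A stays ON because its index is 0.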
[00039] At step 501, the scheduler (103) predicts the load for an upcoming time slot (e.g., for upcoming 30 minutes). The prediction can be made using the data center models.
[00040] At step 502, the scheduler (103) checks if the cooling capacity of the cooling units (102) operated in the ON condition is greater than the predicted load. Here, the scheduler checks if any of the cooling units (102) has to be turned OFF. If the cooling capacity is less than the predicted load, then the cooling units (102) continue to be operated in the ON condition and, as shown in step 503, the scheduler waits for the next time slot. If the cooling capacity is greater than the predicted load, step 504 is executed.
[00041] At step 504, the scheduler (103) calculates the ZoI of the cooling units (102) and the priority index of each cooling unit corresponding to each server rack.
[00042] At step 505, the scheduler (103) identifies cooling units (102) having largest priority index.
[00043] At step 506, the scheduler (103) configures the controller of the identified cooling units (102) to turn OFF the cooling units (102). Further, the steps 502 to 506 are iteratively repeated.
In an embodiment, the identified cooling units are listed and are stored in the memory (206). Every time slot, the memory (206) is updated.
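Steps 502 to 506 for a single time slot can be sketched as the loop below. The function `run_time_slot`, the capacity figures and the `priority_fn` callback (standing in for step 504) are hypothetical. The sketch adds one safeguard that the text does not state explicitly: a unit is only turned OFF if the remaining ON capacity still covers the predicted load.

```python
from typing import Callable, Dict, List


def run_time_slot(
    predicted_load_kw: float,
    capacity_kw: Dict[str, float],
    status: Dict[str, str],
    priority_fn: Callable[[Dict[str, str]], Dict[str, int]],
) -> List[str]:
    """Apply steps 502-506 for one slot and return the units turned OFF."""
    turned_off: List[str] = []
    while True:
        on_units = [u for u, s in status.items() if s == "ON"]
        on_capacity = sum(capacity_kw[u] for u in on_units)
        if on_capacity <= predicted_load_kw:
            break  # step 503: nothing to turn OFF, wait for the next slot
        priorities = priority_fn(status)  # step 504 (ZoI and priority index)
        candidates = [u for u in on_units if priorities.get(u, 0) > 0]
        if not candidates:
            break
        unit = max(candidates, key=lambda u: priorities[u])  # step 505
        # Assumed safeguard: keep enough capacity for the predicted load.
        if on_capacity - capacity_kw[unit] < predicted_load_kw:
            break
        status[unit] = "OFF"  # step 506
        turned_off.append(unit)
    return turned_off


capacity = {"102A": 30.0, "102B": 30.0, "102C": 30.0}
status = {"102A": "ON", "102B": "ON", "102C": "ON"}
fixed_priorities = {"102A": 0, "102B": 3, "102C": 1}
turned_off = run_time_slot(50.0, capacity, status, lambda _s: fixed_priorities)
```

With a predicted load of 50 kW and 90 kW of running capacity, only 102B is turned OFF; turning OFF 102C as well would leave 30 kW, which is below the predicted load.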
[00044] In an exemplary embodiment, the schedule can be generated by solving an optimization problem. For example, let
D = [projected demand matrix]_{T×1}
C = [binary state of each cooling unit]_{c×1}
S = [binary schedule matrix]_{T×c}
H = [binary history of cooling unit state]_{largeT×c}
Zc = [cooling capacity in zone of influence for each cooling unit]
R = [temperature measurement in landscape of
Figure imgf000014_0004
[00045] The objective of the optimization is to determine: min
Figure imgf000014_0003
[00046] The above objective aims at equalizing the utilization of each cooling unit (102) over a selected time. The objective is subject to constraints such that equations 1 (demand constraint) and 2 (temperature uniformity constraint) are satisfied:
Figure imgf000014_0001
Where,
Zc for each St requires solving a non-linear model;
D is projected demand;
Figure imgf000014_0002
Where,
Rref is the reference temperature at each sensor; and
Δ is the allowable variation in temperature at a given point. In an embodiment, the value of Δ has to be selected carefully, as the low-level controller will control the temperature within this limit.
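Because the objective and the two constraint equations are only available as figures, the following sketch is an assumed, simplified reading of the optimization for a single time slot and is not the formulation of the disclosure. It enumerates all binary states of the cooling units, keeps those whose total capacity covers the projected demand (demand constraint), and picks the state that minimizes the spread of accumulated ON time (balanced utilization), breaking ties by the number of units kept ON. The temperature uniformity constraint is shown as a separate check against a reference temperature and an allowable variation. All function names and numbers are hypothetical.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def temperature_uniform(
    readings_c: Sequence[float], reference_c: Sequence[float], allowed_c: float
) -> bool:
    """Check that every reading stays within the allowable variation."""
    return all(abs(r - ref) <= allowed_c for r, ref in zip(readings_c, reference_c))


def best_state(
    demand_kw: float, capacity_kw: Sequence[float], on_hours: Sequence[float]
) -> Optional[Tuple[int, ...]]:
    """Brute-force search over binary unit states for one time slot."""
    best, best_cost = None, None
    for state in product((0, 1), repeat=len(capacity_kw)):
        if sum(s * c for s, c in zip(state, capacity_kw)) < demand_kw:
            continue  # demand constraint violated
        usage = [h + s for h, s in zip(on_hours, state)]
        cost = (max(usage) - min(usage), sum(state))
        if best_cost is None or cost < best_cost:
            best, best_cost = state, cost
    return best


# Three 30 kW units with 10, 4 and 6 accumulated ON hours; demand is 50 kW.
state = best_state(50.0, [30.0, 30.0, 30.0], [10.0, 4.0, 6.0])
uniform = temperature_uniform([24.0, 25.5], [25.0, 25.0], 1.0)
```

For these assumed numbers the selected state is (0, 1, 1): the unit with the most accumulated ON time is switched OFF while the other two cover the demand. Exhaustive enumeration only scales to a handful of units; a real deployment would presumably use a dedicated solver.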
[00047] In an embodiment, the demand can be calculated from historical data. In an embodiment, the historical data is stored in one or more databases associated with the data center (100).
[00048] In an embodiment, the present invention results in a reduction in unplanned shut-downs of cooling units (102).
[00049] In an embodiment, the aspects of the present invention reduce hotspots and coldspots in the data center.
[00050] In an embodiment, the aspects of the present invention result in balanced utilization of cooling units (102), and thereby in reduced energy consumption.
[00051] This written description uses examples to describe the subject matter herein, including the best mode, and also to enable any person skilled in the art to make and use the subject matter. The patentable scope of the subject matter is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.
Reference Numerals
Data center - 100
Server racks - 101
Cooling units - 102
Scheduler - 103
Temperature detection module - 201
Operating condition detection module - 202
Priority index generator - 203
Schedule generator - 204
Processor - 205
Memory - 206
hotspots/ coldspots - 401

Claims

We claim:
1. A method of operating a plurality of cooling units in a data center, wherein the plurality of cooling units are controlled and operated for regulating temperature of a plurality of servers placed in a plurality of server racks, wherein each server rack is allotted with at least one cooling unit from the plurality of cooling units, wherein the method performed by a control system comprising a scheduler, the method comprising:
determining one or more values of temperature of each server rack, wherein the one or more values of temperature is compared with a threshold range;
determining an operating condition of one or more cooling units proximal to the at least one cooling unit allotted to each corresponding server rack;
estimating one or more values of temperature of each server rack upon considering corresponding at least one cooling unit to be operated in an “OFF” condition by calculating a Zone of Influence (ZoI) of the corresponding at least one cooling unit, and based on an operating load of each server rack and operating condition of the one or more cooling units proximal to the at least one cooling unit;
determining a priority index for each cooling unit based on the determined one or more values of temperature of corresponding server rack, the operating condition associated with the one or more cooling units and the estimated one or more values of temperature of corresponding server rack, wherein the priority index indicates severity of operating each cooling unit; and
generating a schedule for selectively operating each cooling unit in an “ON” condition or the “OFF” condition based on the priority index associated with corresponding cooling units, wherein each cooling unit is operated based on the generated schedule.
2. The method as claimed in claim 1, wherein the ZoI is indicative of effect of a cooling unit on corresponding server rack.
3. The method as claimed in claim 1, wherein the operating load of each server is predicted based on historical working of respective server.
4. The method as claimed in claim 1, wherein the priority index of a cooling unit is set to a first value when at least one of the determined one or more values of temperature of the corresponding cooling unit varies beyond the threshold range and the at least one cooling unit proximal to the cooling unit is operated in the “OFF” condition.
5. The method as claimed in claim 1, wherein the priority index of a cooling unit is incremented by a predefined value when the determined value of temperature of the corresponding cooling unit is within the threshold range and the at least one cooling unit proximal to the cooling unit is operated in the “ON” condition, wherein the cooling unit is operated in “OFF” condition upon the priority index of the cooling unit being greater than the priority index of remaining cooling units in the data center.
6. The method as claimed in claim 1, wherein the priority index indicates severity of operating each cooling unit in the “OFF” condition, wherein the priority index is determined for each cooling unit based on historical data and current data related to each cooling unit.
7. A control system comprising a scheduler for operating a plurality of cooling units in a data center, wherein the plurality of cooling units is controlled and operated for regulating temperature of a plurality of servers placed in a plurality of server racks, the scheduler comprising:
a memory; and
one or more processors configured to:
determine one or more values of temperature of each server rack, wherein the one or more values of temperature are compared with a threshold range;
determine operating condition associated with one or more cooling units proximal to the allotted at least one cooling unit corresponding to each server rack;
estimate one or more values of temperature of each server rack upon considering corresponding at least one cooling unit to be operated in an “OFF” condition by calculating a Zone of Influence (ZoI) of the corresponding at least one cooling unit, and based on an operating load of each server rack and operating condition of the one or more cooling units proximal to the at least one cooling unit;
determine a priority index for each cooling unit based on the determined one or more values of temperature of corresponding server rack, the operating condition associated with the one or more cooling units and the estimated one or more values of temperature of corresponding server rack, wherein the priority index indicates severity of operating each cooling unit; and
generate a schedule for selectively operating each cooling unit in the “ON” condition or the “OFF” condition based on the priority index associated with corresponding cooling units, wherein each cooling unit is operated based on the generated schedule.
8. The scheduler as claimed in claim 7, wherein the one or more processors are configured to set a first value of the priority index of a cooling unit when at least one of the determined one or more values of temperature of the corresponding cooling unit varies beyond the threshold range and the at least one cooling unit proximal to the cooling unit is operated in the “OFF” condition.
9. The scheduler as claimed in claim 7, wherein the one or more processors are configured to increment the priority index of a cooling unit by a predefined value when the determined one or more values of the temperature of the corresponding cooling unit is within the threshold range and the at least one cooling unit proximal to the cooling unit is operated in the “ON” condition, wherein the cooling unit is operated in “OFF” condition upon the priority index of the cooling unit being greater than the priority index of remaining cooling units in the data center.
10. The scheduler as claimed in claim 6, wherein the ZoI is indicative of effect of a cooling unit on corresponding server rack.
PCT/IB2019/060347 2018-12-03 2019-12-02 Method of operating cooling units in data center, and system thereof WO2020115631A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841045615 2018-12-03

Publications (1)

Publication Number Publication Date
WO2020115631A1 true WO2020115631A1 (en) 2020-06-11

Family

ID=69167857

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/060347 WO2020115631A1 (en) 2018-12-03 2019-12-02 Method of operating cooling units in data center, and system thereof

Country Status (1)

Country Link
WO (1) WO2020115631A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120203516A1 (en) * 2011-02-08 2012-08-09 International Business Machines Corporation Techniques for Determining Physical Zones of Influence
US20150184883A1 (en) * 2013-12-27 2015-07-02 International Business Machines Corporation Automatic Computer Room Air Conditioning Control Method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19836820

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19836820

Country of ref document: EP

Kind code of ref document: A1