WO2020115631A1

WO2020115631A1 - Method of operating cooling units in data center, and system thereof

Info

Publication number: WO2020115631A1
Application number: PCT/IB2019/060347
Authority: WO
Inventors: Arun Kumar Gupta; Anurag Nandwana; Nandkishor Kubal
Original assignee: ABB Schweiz AG
Priority date: 2018-12-03
Filing date: 2019-12-02
Publication date: 2020-06-11

Abstract

The present invention relates to a method and a system for operating cooling units in a data center. The method comprises determining the temperature of server racks and determining the operating condition of cooling units proximal to the cooling units allotted to each server rack. Thereafter, the temperature of the server racks is estimated for the case in which the corresponding cooling units are considered to be operated in the OFF condition. Further, a priority index is calculated for each cooling unit based on the determined temperature, the estimated temperature and the operating condition of the proximal cooling units. A schedule is generated using the priority index, and the cooling units are operated according to the schedule. Therefore, a robust mechanism to effectively operate the cooling units is proposed in this disclosure.

Description

Title: METHOD OF OPERATING COOLING UNITS IN DATA CENTER, AND SYSTEM THEREOF

Technical Field

[0001] The present invention relates in general to data centers. More particularly, the present invention relates to scheduling operations of cooling units for optimizing cooling in a data center.

Background

[0002] Typically, a data center is equipped with a plurality of servers. The plurality of servers is placed in arrays of server racks. Further, arrays of cooling units are used for cooling the plurality of servers. Generally, pipes carrying cool air/ water run beneath the server racks and are used to reduce the temperature of the servers. The data center is designed to provide optimum cooling to each server rack. The cooling units, such as computer room air conditioning (CRAC) units, are controlled by operators in the data center.

[0003] Usually, redundant CRACs are provided in the data center. When an active CRAC is shut down for maintenance or has faults, redundant CRACs are operated. Conventionally, operators in the data center shut down the active CRACs and operate the redundant CRACs. However, operators do not foresee the loading requirements and the criticality of shutting down the active CRAC. Typically, if a CRAC has been operated for 10-15 hours, the CRAC is shut down by the operators and redundant CRACs are operated. However, shutting down the active CRAC may affect the servers and may cause non-uniform supply of cooling in the data center. In particular, when high activity is anticipated, a few active CRACs must remain in operation even though they have been operated for a long time.

[0004] Also, operators do not follow a procedure or logic while commissioning redundant CRACs. The design of the data center allows a few CRACs to provide more cooling to certain server racks. However, redundant CRACs may not provide the same cooling effect as provided by the active CRACs. Hence, the redundant CRACs have to be operated such that the cooling is not reduced, and hot spots and cold spots are prevented in the data center.

[0005] Thus, there is a need to address the abovementioned problems, and a method and a system are required for optimum cooling in the data center.

Summary

[0006] In embodiments, a method and a control system are disclosed for operating a plurality of cooling units in a data center. The control system comprises a scheduler for scheduling operations of the plurality of cooling units. Each cooling unit is operated and controlled by one or more controllers for regulating the temperature of a plurality of server racks in the data center. In an embodiment, each server rack comprises a plurality of servers. Each server rack can be associated with at least one cooling unit from the plurality of cooling units. Further, a plurality of temperature sensors is installed in the data center to measure temperature values of the server racks.

[0007] The scheduler determines one or more values of temperature of each server rack. The one or more values of temperature can be received either from the plurality of temperature sensors or from data center models such as a Computational Fluid Dynamics (CFD) model. In an embodiment, the received one or more values of temperature are compared with a threshold range. The comparison is performed to detect whether the temperature of the server racks has gone beyond the threshold range (regions where the temperature is beyond the threshold range are termed hotspots), so that specific cooling units can be activated to reduce the temperature of the server racks. Further, an operating condition (ON condition or OFF condition) of one or more cooling units proximal to the at least one cooling unit is determined. In an embodiment, the operating condition of each cooling unit is updated in the scheduler at regular time intervals or when there is a change in the operating condition.
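By way of a non-limiting illustration, the comparison of rack temperatures with a threshold range may be sketched as follows. The function name, the rack labels, the temperature values and the range of 18 to 27 degrees Celsius are assumptions chosen for the example and are not taken from this disclosure.

```python
from typing import Dict, List, Tuple


def find_out_of_range_racks(
    rack_temps: Dict[str, List[float]],
    threshold_range: Tuple[float, float],
) -> List[str]:
    """Return the racks having at least one reading outside the threshold range."""
    low, high = threshold_range
    return sorted(
        rack
        for rack, temps in rack_temps.items()
        if any(t < low or t > high for t in temps)
    )


# Hypothetical readings in degrees Celsius, one list per server rack.
readings = {"101A": [24.0, 26.5], "101B": [31.2, 29.8], "101C": [22.1]}
flagged = find_out_of_range_racks(readings, (18.0, 27.0))
print(flagged)
```

In this example only rack 101B has readings above the assumed upper limit, so it is the only rack flagged.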

[0008] In an embodiment, one or more values of temperature of each server rack are estimated for the case in which the corresponding at least one cooling unit is considered to be operated in the OFF condition. This estimation is performed to check the effect of shutting down the at least one cooling unit on the corresponding server rack. The estimation is performed by monitoring the operating load of the corresponding server racks and the operating condition of the one or more cooling units proximal to the at least one cooling unit.

[0009] In an embodiment, a priority index is calculated for each cooling unit based on the determined one or more values of temperature of each server rack, the estimated one or more values of temperature of each server rack and the operating condition of the one or more cooling units proximal to the at least one cooling unit. Thereafter, the scheduler generates a schedule for operating each cooling unit based on the priority index of the corresponding cooling unit.

[00010] In an embodiment, the priority index of a cooling unit indicates the severity of operating the cooling unit in the OFF condition. The priority index of the cooling unit is set to "0" if the cooling unit should not be turned OFF, i.e., should not be operated in the OFF condition. Among the plurality of cooling units, there may be a few redundant cooling units and many primary cooling units. The redundant cooling units are operated when the primary cooling units are turned OFF or operated in the OFF condition. A priority index of "0" for a cooling unit indicates that turning OFF that cooling unit causes hotspots in the corresponding server racks, and hence the cooling unit should not be turned OFF. In an embodiment, among the plurality of cooling units, one or more cooling units having a priority index higher than the priority index of the rest of the cooling units are turned OFF and redundant cooling units are turned ON. In an exemplary embodiment, a priority index of "1" is assigned to the redundant units which are turned ON, and the priority index of each of the remaining cooling units is incremented by "1". A person of ordinary skill will appreciate that the incremental value and the priority index can take any values and are not limited to the values that are used in the present invention.
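As a minimal sketch of the exemplary rotation described above, the following example turns OFF the running unit with the highest priority index, turns ON one redundant unit with priority index 1 and increments the remaining units by 1. The function name and data layout are hypothetical. Leaving units with priority index 0 unchanged and dropping the unit that was turned OFF from the active map are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def rotate_units(
    active: Dict[str, int],
    standby: List[str],
    increment: int = 1,
) -> Tuple[Dict[str, int], Optional[str]]:
    """Swap the highest-priority running unit for the first standby unit.

    active maps each running unit to its priority index; standby lists the
    redundant units that are currently OFF. Returns the new active map and
    the unit that was turned OFF (None if no swap is possible).
    """
    # Units with priority index 0 must not be turned OFF.
    eligible = {unit: idx for unit, idx in active.items() if idx > 0}
    if not eligible or not standby:
        return dict(active), None

    off_unit = max(eligible, key=eligible.get)
    on_unit = standby[0]

    new_active: Dict[str, int] = {}
    for unit, idx in active.items():
        if unit == off_unit:
            continue  # this unit is now in the OFF condition
        # Assumption: an index of 0 stays at 0; all others are incremented.
        new_active[unit] = idx + increment if idx > 0 else idx
    new_active[on_unit] = 1  # redundant unit turned ON starts at index 1
    return new_active, off_unit


new_active, turned_off = rotate_units(
    {"102A": 0, "102B": 3, "102C": 2}, ["102R"]
)
print(turned_off, new_active)
```

Here unit 102B has the highest index and is turned OFF, unit 102R joins with index 1, unit 102C moves from 2 to 3, and unit 102A keeps index 0.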

Brief Description of the drawings

[00011] The subject matter of the invention will be explained in more detail in the following text with reference to preferred exemplary embodiments which are illustrated in the drawings, in which:

[00012] Figure 1 illustrates a simplified block diagram of a data center, in accordance with an embodiment of the present disclosure;

[00013] Figure 2 shows a simplified block diagram of a scheduler in a control system for controlling and operating cooling units in a data center, in accordance with an embodiment of the present disclosure;

[00014] Figure 3 shows an exemplary flowchart illustrating steps for generating a schedule for operating cooling units in a data center, in accordance with an embodiment of the present disclosure;

[00015] Figures 4A to 4C show exemplary block diagrams of server racks and corresponding cooling units, in accordance with an embodiment of the present disclosure; and

[00016] Figure 5 shows an exemplary flow chart illustrating steps of using a schedule for operating the cooling units in a data center.

Detailed description

[00017] In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments, which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments, and it is to be understood that other embodiments may be utilized, and that logical, mechanical, electrical and other changes may be made without departing from the scope of the embodiments. The following detailed description is, therefore, not to be taken in a limiting sense.

[00018] Figure 1 shows a simplified diagram of a data center (100). The data center (100) comprises a plurality of server racks (101A, 101B... 101N)/ array of server racks (collectively referred to as server racks (101)), a plurality of cooling units (102A, 102B... 102N) (collectively referred to as cooling units (102)) and a scheduler (103). In an embodiment, each cooling unit is a Computer Room Air Conditioning (CRAC) unit controlled by respective controllers (not shown). The cooling units (102) are used to provide cooling to the server racks (101). In an embodiment, each server rack in the server racks (101) comprises a plurality of servers (not shown).

[00019] In one embodiment, each server rack is allotted at least one cooling unit from the cooling units (102). For example, server rack (101A) is allotted cooling unit (102A). Likewise, each server rack (101A, 101B... 101N) is allotted at least one cooling unit (102A, 102B... 102N). The allotted at least one cooling unit (102A) provides maximum cooling to a particular area in the corresponding server rack (101A). In an embodiment, the at least one cooling unit (102A) is also referred to as an active cooling unit. In an embodiment, the data center (100) comprises a plurality of redundant cooling units (102R). The redundant cooling units (102R) are operated (turned ON) to provide uniform cooling when the active cooling units are shut down for maintenance or when there are faults in the active cooling units (102). The scheduler (103) is provided in the data center (100) for scheduling operations of the plurality of cooling units (102). Particularly, the scheduler (103) generates a schedule indicating one or more cooling units from the cooling units (102) to be operated in an ON condition and one or more cooling units to be operated in an OFF condition.

[00020] Figure 2 shows a block diagram of the scheduler (103). In an embodiment, the scheduler (103) can be a part of the control system provided in the data center (100) or can be a standalone system integrated with the control system of the data center (100). The scheduler (103) comprises a temperature detection module (201), an operating condition detection module (202), a priority index generator (203), a schedule generator (204), a processor (205) and a memory (206).

[00021] In an embodiment, the temperature detection module (201) is configured to receive one or more values of temperature of each server rack of the server racks (101). In an embodiment, the one or more values of temperature can be obtained from a plurality of temperature sensors installed in the data center (100). In an alternate embodiment, the one or more values of temperature can be estimated using the load of the servers in each server rack. The estimation can be obtained from temperature models (CFD or the like) configured to estimate the temperature of the data center (100) using various parameters including server load, number of cooling units, effect of cooling from each cooling unit, etc. In an embodiment, the processor (205) configures the temperature detection module (201) to obtain the temperature of each server rack when the corresponding cooling units (102) are assumed/ considered to be turned OFF/ operated in the OFF condition. In an embodiment, the temperature of the server racks (101) can be obtained during different operating conditions in the data center (100). For example, the temperature of each server rack is obtained during high loading of the servers, when the corresponding cooling units are turned OFF.

[00022] In an embodiment, the operating condition detection module (202) obtains the operating condition of each cooling unit. The operating condition is the OFF condition or the ON condition. The operating condition is obtained at regular time intervals (e.g., every half hour) or based on a trigger. The trigger can be generated when there is a change in the operating condition of any of the cooling units (102), for example, if cooling unit (102A) changes its operating state from the ON condition to the OFF condition. The operating condition detection module (202) stores the operating status of each cooling unit (102) in the memory (206). In an embodiment, the operating condition detection module (202) overwrites the operating statuses stored in the memory (206) upon obtaining updated operating statuses.

[00023] In an embodiment, the priority index generator (203) determines a priority index for each cooling unit. The priority index indicates the severity of turning OFF the cooling units. For example, the priority index can vary from 1 to 10. In an exemplary embodiment, a cooling unit (102N) having a priority index of 1 can indicate high severity of shutting down the cooling unit (102N). In an embodiment, a cooling unit (102P) having the priority index of 10 can indicate that the severity of shutting down the cooling unit (102P) is low. Thus, the cooling unit (102P) can be turned OFF and a redundant cooling unit (102R) can be turned ON.

[00024] In an embodiment, the schedule generator (204) generates a schedule based on the priority index of each cooling unit (102). The schedule provides an insight on operating the cooling units (102) in the data center (100). Based on the generated schedule, the cooling units (102) are automatically operated.

[00025] Figure 3 shows an exemplary flowchart for operating the plurality of cooling units (102). At step 301, the temperature detection module (201) determines one or more values of temperature of each server rack (101). Referring to Figure 4A, a server rack (101) is shown. As shown, the server rack (101) is cooled by three cooling units (102A, 102B and 102C). Let cooling units (102A and 102B) provide maximum cooling to the server rack (101). Let cooling unit (102C) provide partial cooling to the server rack (101). In this scenario, let us consider that a plurality of temperature sensors is installed in the data center (100), preferably in the vicinity of the server rack (101). The temperature detection module (201) obtains one or more values of temperature of the server rack (101) from the plurality of temperature sensors. Alternatively, the temperature detection module (201) can obtain the one or more values of temperature from the temperature models.

[00026] At step 302, the processor (205) determines the operating condition of one or more cooling units proximal to the at least one cooling unit corresponding to each server rack. As described earlier, each server rack is allotted at least one cooling unit from the cooling units (102). Each cooling unit has proximal cooling units, i.e., cooling units neighboring the at least one cooling unit. In some embodiments, the allotted at least one cooling unit can have only one proximal cooling unit. In some embodiments, the allotted at least one cooling unit (102) can have a plurality of proximal cooling units. The processor (205) retrieves from the memory (206) the operating condition/ status of the one or more cooling units that are proximal to the allotted at least one cooling unit. The operating status of each cooling unit is stored in the memory (206) by the operating condition detection module (202).

[00027] Referring to Figure 4B, let the cooling units (102C and 102D) be in the OFF condition and the cooling units (102A and 102B) be in the ON condition. In one embodiment, while calculating the priority index for cooling unit (102A), the cooling units (102B and 102C) are considered as neighboring/ proximal cooling units. In another embodiment, many cooling units (102) can be considered proximal to the cooling unit (102A). The processor (205) retrieves the operating status of the proximal cooling units. In this scenario, the operating status of the cooling unit (102B) is the ON condition and the operating status of the cooling unit (102C) is the OFF condition. Likewise, the operating status of proximal cooling units is determined for the allotted at least one cooling unit corresponding to each server rack.

[00028] At step 303, the processor (205) estimates one or more values of temperature of each server rack upon considering the corresponding at least one cooling unit to be operated in the OFF condition. Here, the processor (205) calculates a zone of influence (ZoI) of each cooling unit (102). The ZoI of a cooling unit can be defined as the effect of cooling of the cooling unit on the corresponding server rack. In an embodiment, the ZoI of the cooling unit can be calculated using data related to the data center layout, the position of the server rack with respect to the cooling unit, and the temperature and flow rate of the cooling unit. In an embodiment, the ZoI can be calculated using models where flow and heat parameters are determined in a sectioned room, such as CFD models. In another embodiment, the ZoI can be calculated using data collected during operation of the data center and identifying the effect of the cooling unit on the corresponding server rack.

[00029] In an embodiment, the ZoI is used to estimate one or more values of temperature of the server rack (101) if the corresponding at least one cooling unit is turned OFF. The estimation is performed to determine the criticality of turning OFF the at least one cooling unit. The estimation is performed using the model data (CFD or similar models).

[00030] Referring now to Figure 4C, let us consider that cooling units (102A and 102B) are the active cooling units providing maximum cooling to the server rack. The cooling units (102A and 102B) are essential to cool the server rack (101). Let the cooling unit (102C) act as a redundant cooling unit. Using model data, the ZoI of each of the cooling units (102A and 102B) is estimated. Let us consider that cooling unit (102A) is turned OFF. The processor (205) estimates the temperature of the server rack (101) with the cooling unit (102A) being turned OFF. In this scenario, let us consider that a hotspot/ coldspot (401) is created in the server rack (101) upon turning OFF the cooling unit (102A). The one or more values of temperature of the server rack (101) are estimated based on the operating load of the plurality of servers in the server rack (101) and the operating condition of the proximal cooling units (102B and 102C).
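The inputs to this estimation can be illustrated with a deliberately simple stand-in. The sketch below is a toy linear model and is not the CFD-based estimation of this disclosure. The function name, the representation of the zone of influence as a per-unit share of the rack's cooling, and the coefficient `degc_per_uncooled_kw` are all assumptions introduced for the example.

```python
from typing import Dict


def estimate_rack_temp_if_off(
    current_temp: float,
    rack_load_kw: float,
    zoi_share: Dict[str, float],
    unit_on: Dict[str, bool],
    candidate: str,
    degc_per_uncooled_kw: float = 1.5,
) -> float:
    """Toy estimate of a rack temperature if `candidate` were turned OFF.

    zoi_share gives the assumed fraction of the rack's cooling supplied by
    each cooling unit; unit_on gives the ON/OFF status of each unit.
    """
    if not unit_on.get(candidate, False):
        return current_temp  # the unit is already OFF, nothing changes

    lost = zoi_share.get(candidate, 0.0)
    # Assumption: proximal units that are ON absorb part of the lost
    # cooling in proportion to their own share for this rack.
    backup = sum(
        share
        for unit, share in zoi_share.items()
        if unit != candidate and unit_on.get(unit, False)
    )
    uncovered = lost * (1.0 - min(backup, 1.0))
    return current_temp + degc_per_uncooled_kw * rack_load_kw * uncovered


estimate = estimate_rack_temp_if_off(
    current_temp=24.0,
    rack_load_kw=4.0,
    zoi_share={"102A": 0.5, "102B": 0.4, "102C": 0.1},
    unit_on={"102A": True, "102B": True, "102C": False},
    candidate="102A",
)
print(round(estimate, 2))
```

The estimate rises with the server load and with the share of cooling lost, and falls when more proximal units are in the ON condition, which mirrors the dependencies named in the paragraph above without reproducing any actual thermal model.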

[00031] At step 304, the priority index generator (203) determines the priority index for the at least one cooling unit corresponding to each server rack. The priority index indicates the criticality of switching OFF the at least one cooling unit. The priority index for a cooling unit is determined based on the one or more values of temperature of the corresponding server rack, the one or more values of temperature of the server rack estimated for the case in which the cooling unit is operated in the OFF condition, and the operating condition/ status of the neighboring/ proximal cooling units.

[00032] In an embodiment, if the one or more values of temperature of the server rack varies beyond a certain threshold range, then the cooling unit is associated with the priority index 0. A person of ordinary skill will appreciate that the value 0 is only representative and any such value can be associated to indicate criticality level. For example, on a scale of 0-10, 10 may be associated to indicate high criticality. In this disclosure, the value 0 indicates high criticality. The temperature of the server rack can vary due to high load, fault in the cooling unit (102), or during replacement of active cooling units with redundant cooling units (102R).

[00033] In an embodiment, if the estimated one or more values of temperature of the server rack (101) varies beyond the threshold range, then the corresponding cooling unit is associated with priority index 0. This indicates that the cooling unit cannot be turned OFF and if the cooling unit is turned OFF the temperature of the server rack varies beyond the threshold range, causing hotspots/ coldspots.

[00034] In an embodiment, if the operating status of the proximal cooling units is OFF, then the cooling unit is associated with the priority index 0. Since the proximal cooling units are turned OFF, the server rack cannot be provided cooling upon turning OFF the cooling unit. Thus, the cooling unit is associated with the priority index 0.
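The three conditions described in the preceding paragraphs can be sketched as a single update rule. The function name and argument layout are hypothetical. The example assumes that the third condition applies when all proximal cooling units are in the OFF condition, and it uses an increment of 1 when none of the conditions applies.

```python
from typing import Sequence, Tuple


def update_priority_index(
    previous_index: int,
    measured_temps: Sequence[float],
    estimated_temps_if_off: Sequence[float],
    proximal_on: Sequence[bool],
    threshold_range: Tuple[float, float],
    increment: int = 1,
) -> int:
    """Return 0 if the unit must stay ON, else the incremented index."""
    low, high = threshold_range

    def outside(values: Sequence[float]) -> bool:
        return any(v < low or v > high for v in values)

    # Condition 1: measured rack temperature is beyond the threshold range.
    # Condition 2: estimated temperature with the unit OFF is beyond it.
    # Condition 3 (assumed reading): every proximal unit is OFF.
    if (
        outside(measured_temps)
        or outside(estimated_temps_if_off)
        or not any(proximal_on)
    ):
        return 0
    return previous_index + increment


limits = (18.0, 27.0)
print(update_priority_index(2, [24.0], [26.0], [True, False], limits))
print(update_priority_index(2, [24.0], [29.0], [True], limits))
```

A unit whose index was 0 and which now passes all three checks receives index 1, and a unit that fails any check is reset to 0 so that it is not selected for turning OFF.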

[00035] In an embodiment, if at least one of the above three conditions holds, the priority index is set to 0. In an embodiment, if all three checks are passed (the determined temperature and the estimated temperature lie within the threshold range and the proximal cooling units are in the ON condition), then the priority index is set to 1. In a subsequent iteration, if the checks are passed again, the priority index is incremented by 1. A person skilled in the art should appreciate that the priority index can be incremented by any value and that incrementing by 1 is exemplary to this disclosure. Further, the priority index is calculated for each cooling unit. The cooling unit having the highest priority index is turned OFF. Thereafter, a redundant cooling unit is turned ON and its priority index is set to 1. Thus, on a scale of 0 to N, N indicates the highest priority for turning OFF a cooling unit. In an embodiment, the priority index of a cooling unit having priority index 0 is not incremented until the three checks described above are passed.

[00036] In an embodiment, the priority index can be determined based on historical data including, but not limited to, load variation, time since the cooling unit was operated in the OFF condition, total number of cooling units (102), etc. For example, let us consider that a first cooling unit has been operated for 10 hours and a second cooling unit has been operated for 4 hours. Based on the historical working and the current operating conditions of the corresponding server rack, the priority index of the first cooling unit can be higher than the priority index of the second cooling unit. Thus, the first cooling unit is more likely to be shut down.

[00037] At step 305, the schedule generator (204) generates a schedule for turning OFF the cooling units (102) based on the priority index associated with each cooling unit (102). The schedule can indicate the order in which the cooling units (102) have to be turned OFF.

[00038] In an embodiment, the schedule can be generated in various ways. In an exemplary embodiment, the schedule can be generated as shown in Figure 5.

[00039] At step 501, the scheduler (103) predicts the load for an upcoming time slot (e.g., for upcoming 30 minutes). The prediction can be made using the data center models.

[00040] At step 502, the scheduler (103) checks whether the cooling capacity of the cooling units (102) operated in the ON condition is greater than the predicted load. Here, the scheduler checks whether any of the cooling units (102) has to be turned OFF. If the cooling capacity is less than the predicted load, then the cooling units (102) continue to be operated in the ON condition and, as shown in step 503, the scheduler waits for the next time slot. If the cooling capacity is greater than the predicted load, step 504 is executed.

[00041] At step 504, the scheduler (103) calculates the ZoI of the cooling units (102) and the priority index of each cooling unit corresponding to each server rack.

[00042] At step 505, the scheduler (103) identifies the cooling units (102) having the largest priority index.

[00043] At step 506, the scheduler (103) configures the controller of the identified cooling units (102) to turn OFF the cooling units (102). Further, the steps 502 to 506 are iteratively repeated.

In an embodiment, the identified cooling units are listed and are stored in the memory (206). Every time slot, the memory (206) is updated.
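Steps 502 to 506 for a single time slot can be sketched as the loop below. The function name, the capacity figures and the callable `priority_index` are hypothetical. The check that stops the loop when turning OFF the candidate would leave less capacity than the predicted load is an added safeguard assumed for the example and is not stated in this disclosure.

```python
from typing import Callable, Dict, List


def run_time_slot(
    predicted_load_kw: float,
    capacity_kw: Dict[str, float],
    unit_on: Dict[str, bool],
    priority_index: Callable[[str, Dict[str, bool]], int],
) -> List[str]:
    """Turn OFF surplus units for one time slot; mutates unit_on in place."""
    turned_off: List[str] = []
    while True:
        running = [unit for unit, on in unit_on.items() if on]
        capacity = sum(capacity_kw[unit] for unit in running)
        # Steps 502/503: no surplus capacity, wait for the next time slot.
        if not running or capacity <= predicted_load_kw:
            break
        # Step 504: priority index of each running unit.
        indices = {unit: priority_index(unit, unit_on) for unit in running}
        # Step 505: unit with the largest priority index.
        candidate = max(indices, key=indices.get)
        if indices[candidate] <= 0:
            break  # every running unit is marked as critical
        # Assumed safeguard: keep enough capacity for the predicted load.
        if capacity - capacity_kw[candidate] < predicted_load_kw:
            break
        # Step 506: turn the identified unit OFF and repeat.
        unit_on[candidate] = False
        turned_off.append(candidate)
    return turned_off


fixed_indices = {"102A": 0, "102B": 3, "102C": 2}
status = {"102A": True, "102B": True, "102C": True}
result = run_time_slot(
    predicted_load_kw=15.0,
    capacity_kw={"102A": 10.0, "102B": 10.0, "102C": 10.0},
    unit_on=status,
    priority_index=lambda unit, _status: fixed_indices[unit],
)
print(result, status)
```

With a predicted load of 15 kW and three 10 kW units, only unit 102B is turned OFF: removing unit 102C as well would leave 10 kW, which is below the predicted load.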

[00044] In an exemplary embodiment, the schedule can be generated by solving an optimization problem. For example, let

D = [demand projection matrix]_{T × 1}

C = [binary state of each cooling unit]_{c × 1}

S = [binary schedule matrix]_{T × c}

H = [binary history of cooling unit state]_{large T × c}

Z_c = [cooling capacity in zone of influence for each cooling unit]

R = [temperature measurement in landscape of

[00045] The objective of the optimization is to determine: min

[00046] The above objective aims at equalizing the utilization of each cooling unit (102) over a selected time. The objective is subject to constraints such that equations 1 (demand constraint) and 2 (temperature uniformity constraint) are satisfied:

Where,

Z_c for each S_t requires solving a non-linear model;

D is projected demand;

Where,

R_ref is the reference temperature at each sensor; and

Δ is the allowable variation in temperature at a given point. In an embodiment, the value of Δ must be selected carefully, as the low-level controller will control the temperature within this limit.
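For very small instances, the idea of equalizing utilization subject to a demand constraint can be illustrated by exhaustive search. The sketch below is not the optimization of this disclosure: it treats the cooling capacity of each unit as a constant rather than solving a non-linear model, it omits the temperature uniformity constraint, and it measures imbalance as the difference between the largest and smallest cumulative ON count. All names and numbers are assumptions for the example.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Schedule = List[Tuple[int, ...]]


def best_schedule(
    demand: Sequence[float],
    capacity: Sequence[float],
    history_on_counts: Sequence[int],
) -> Tuple[Optional[Schedule], Optional[int]]:
    """Brute-force a binary T x c schedule that balances unit utilization."""
    T, c = len(demand), len(capacity)
    best: Optional[Schedule] = None
    best_spread: Optional[int] = None
    for flat in product((0, 1), repeat=T * c):
        schedule = [flat[t * c:(t + 1) * c] for t in range(T)]
        # Demand constraint: ON capacity must cover demand in every slot.
        if any(
            sum(s * z for s, z in zip(schedule[t], capacity)) < demand[t]
            for t in range(T)
        ):
            continue
        # Utilization = past ON slots plus ON slots in this schedule.
        usage = [
            history_on_counts[j] + sum(schedule[t][j] for t in range(T))
            for j in range(c)
        ]
        spread = max(usage) - min(usage)
        if best_spread is None or spread < best_spread:
            best, best_spread = schedule, spread
    return best, best_spread


plan, spread = best_schedule(
    demand=[10.0, 10.0],
    capacity=[10.0, 10.0, 10.0],
    history_on_counts=[5, 3, 3],
)
print(plan, spread)
```

In this example the first unit has already run for more slots than the others, so the balanced plan rests it and runs the other two units in both slots. The search space grows as 2^(T·c), so this approach is only workable as an illustration.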

[00047] In an embodiment, the demand can be calculated from historical data. In an embodiment, the historical data is stored in one or more databases associated with the data center (100).

[00048] In an embodiment, the present invention results in a reduction in unplanned shut-downs of cooling units (102).

[00049] In an embodiment, the aspects of the present invention reduce hotspots and coldspots in the data center.

[00050] In an embodiment, the aspects of the present invention lead to balanced utilization of the cooling units (102), and thereby result in reduced energy consumption.

[00051] This written description uses examples to describe the subject matter herein, including the best mode, and also to enable any person skilled in the art to make and use the subject matter. The patentable scope of the subject matter is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.

Reference Numerals

Data center - 100

Server racks - 101

Cooling units - 102

Scheduler - 103

Temperature detection module - 201

Operating condition detection module - 202

Priority index generator - 203

Schedule generator - 204

Processor - 205

Memory - 206

hotspots/ coldspots - 401

Claims

We claim:

1. A method of operating a plurality of cooling units in a data center, wherein the plurality of cooling units are controlled and operated for regulating temperature of a plurality of servers placed in a plurality of server racks, wherein each server rack is allotted with at least one cooling unit from the plurality of cooling units, wherein the method is performed by a control system comprising a scheduler, the method comprising:

determining one or more values of temperature of each server rack, wherein the one or more values of temperature is compared with a threshold range;

determining an operating condition of one or more cooling units proximal to the at least one cooling unit allotted to each server rack;

estimating one or more values of temperature of each server rack upon considering corresponding at least one cooling unit to be operated in an “OFF” condition by calculating a Zone of Influence (ZoI) of the corresponding at least one cooling unit, and based on an operating load of each server rack and operating condition of the one or more cooling units proximal to the at least one cooling unit;

determining a priority index for each cooling unit based on the determined one or more values of temperature of corresponding server rack, the operating condition associated with the one or more cooling units and the estimated one or more values of temperature of corresponding server rack, wherein the priority index indicates severity of operating each cooling unit; and

generating a schedule for selectively operating each cooling unit in an “ON” condition or the “OFF” condition based on the priority index associated with corresponding cooling units, wherein each cooling unit is operated based on the generated schedule.

2. The method as claimed in claim 1, wherein the ZoI is indicative of effect of a cooling unit on corresponding server rack.

3. The method as claimed in claim 1, wherein the operating load of each server is predicted based on historical working of respective server.

4. The method as claimed in claim 1, wherein the priority index of a cooling unit is set to a first value when at least one of the determined one or more values of temperature of the corresponding cooling unit varies beyond the threshold range and the at least one cooling unit proximal to the cooling unit is operated in the “OFF” condition.

5. The method as claimed in claim 1, wherein the priority index of a cooling unit is incremented by a predefined value when the determined value of temperature of the corresponding cooling unit is within the threshold range and the at least one cooling unit proximal to the cooling unit is operated in the “ON” condition, wherein the cooling unit is operated in “OFF” condition upon the priority index of the cooling unit being greater than the priority index of remaining cooling units in the data center.

6. The method as claimed in claim 1, wherein the priority index indicates severity of operating each cooling unit in the “OFF” condition, wherein the priority index is determined for each cooling unit based on historical data and current data related to each cooling unit.

7. A control system comprising a scheduler for operating a plurality of cooling units in a data center, wherein the plurality of cooling units is controlled and operated for regulating temperature of a plurality of servers placed in a plurality of server racks, the scheduler comprising:

a memory; and

one or more processors configured to:

determine one or more values of temperature of each server rack, wherein the one or more values of temperature are compared with a threshold range;

determine operating condition associated with one or more cooling units proximal to the allotted at least one cooling unit corresponding to each server rack;

estimate one or more values of temperature of each server rack upon considering corresponding at least one cooling unit to be operated in an “OFF” condition by calculating a Zone of Influence (ZoI) of the corresponding at least one cooling unit, and based on an operating load of each server rack and operating condition of the one or more cooling units proximal to the at least one cooling unit;

determine a priority index for each cooling unit based on the determined one or more values of temperature of corresponding server rack, the operating condition associated with the one or more cooling units and the estimated one or more values of temperature of corresponding server rack, wherein the priority index indicates severity of operating each cooling unit; and

generate a schedule for selectively operating each cooling unit in the “ON” condition or the “OFF” condition based on the priority index associated with corresponding cooling units, wherein each cooling unit is operated based on the generated schedule.

8. The scheduler as claimed in claim 7, wherein the one or more processors are configured to set a first value of the priority index of a cooling unit when at least one of the determined one or more values of temperature of the corresponding cooling unit varies beyond the threshold range and the at least one cooling unit proximal to the cooling unit is operated in the “OFF” condition.

9. The scheduler as claimed in claim 7, wherein the one or more processors are configured to increment the priority index of a cooling unit by a predefined value when the determined one or more values of the temperature of the corresponding cooling unit is within the threshold range and the at least one cooling unit proximal to the cooling unit is operated in the “ON” condition, wherein the cooling unit is operated in “OFF” condition upon the priority index of the cooling unit being greater than the priority index of remaining cooling units in the data center.

10. The scheduler as claimed in claim 6, wherein the ZoI is indicative of effect of a cooling unit on corresponding server rack.