WO2015059710A2 - System and method for real-time monitoring and control of the thermal state of a data center - Google Patents

System and method for real-time monitoring and control of the thermal state of a data center

Info

Publication number
WO2015059710A2
Authority
WO
WIPO (PCT)
Prior art keywords
heat generating
data center
generating devices
temperatures
cold
Application number
PCT/IN2014/000119
Other languages
English (en)
Other versions
WO2015059710A3 (fr)
Inventor
Anirudh DEODHAR
Harshad Girish BHAGWAT
Umesh SINGH
Narayanan SANKARA
Bhavani PANNAMANENI
Amarendra Kumar Singh
Rajesh Jayaprakash
Anand Sivasubramaniam
Original Assignee
Tata Consultancy Services Limited
Application filed by Tata Consultancy Services Limited
Publication of WO2015059710A2
Publication of WO2015059710A3


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

Definitions

  • The present subject matter described herein, in general, relates to a method and a system for monitoring and controlling the thermal state of a data center.
  • Data centers are centralized repositories for housing and managing a large amount of electronic equipment. This equipment is generally placed in housing units, such as racks, arranged in the data center.
  • The electronic equipment may comprise servers, computers, communication devices, etc.
  • Temperatures inside the data center vary frequently for a variety of reasons, including changes in heat load patterns as server utilization changes, seasonal changes in ambient temperature, and temperature cycling in the cooling units. These variations create hot and cold conditions in different regions of the data center, which may seriously damage the electronic equipment placed therein.
  • A data center manager therefore faces the dual challenge of ensuring the thermal safety of the data center equipment while keeping cooling costs to a minimum. Controlling each cooling unit individually does not address the global need for energy-efficient operation.
  • In practice, a data center manager tends to overcool the data center rather than risk unsafe thermal operation, which leads to unnecessary cooling costs. A data center therefore needs a centralized, continuous monitoring and control system to keep it in a thermally safe yet energy-efficient state.
  • Computer room air conditioner (CRAC) units or other types of air cooling units are used to remove the heat generated by the electronic equipment.
  • CRACs may be further classified into supply-air controlled CRACs and return-air controlled CRACs.
  • The compressors of a typical return-air controlled CRAC may switch frequently over time, resulting in large variations in supply temperature.
  • This calls for a robust monitoring and control system that provides corrective actions based on an analysis of the current temperature scenario in the data center while maintaining energy efficiency.
  • A data center encompasses a complex interplay of fluid flow and heat transfer, in which a number of CRACs and racks interact with one another directly or indirectly.
  • Some known techniques may be used to identify the affected regions in the data center. However, locating these regions may require a large number of temperature sensors, which adds cost to the overall solution. Minimizing the number of temperature sensors is therefore another concern with the known techniques.
  • The affected regions may indicate that the data center is in either a hot condition or a cold condition.
  • A system for real-time monitoring and control to optimize operation of a data center, by controlling an operational parameter of one or more cooling units impacting one or more heat generating devices in the data center, comprises a processor and a memory coupled to the processor, the processor executing a set of modules stored in the memory.
  • The set of modules comprises an obtaining module, a temperature analyzing module, a computing module, an identification module, and a generation module.
  • The obtaining module is configured to continuously obtain, in real time, a first-set of temperatures for the one or more heat generating devices during every pre-defined time interval (t).
  • Each temperature of the first-set of temperatures is estimated in real time by a thermal predictor, and a temperature of each heat generating device is obtained at each of a pre-determined number of instances within the pre-defined time interval (t).
  • The temperature analyzing module analyzes the obtained first-set of temperatures to identify the one or more heat generating devices in either a hot condition or a cold condition, and to categorize the state of the data center into either a hot detection mode or a cold detection mode.
  • The computing module is configured to compute either a hot-reference temperature or a cold-reference temperature for the one or more heat generating devices classified under the hot condition or the cold condition, respectively.
  • The identification module is configured to identify a target cooling unit amongst the one or more cooling units based on the collective influence of the one or more cooling units on the one or more heat generating devices classified under the hot condition or the cold condition, and based on a historical control signal log.
  • The generation module is configured to iteratively generate a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit, so as to optimize operation of the data center in a stepwise manner.
  • A method for real-time monitoring and control to optimize operation of a data center, by controlling an operational parameter of one or more cooling units impacting one or more heat generating devices in the data center, comprises continuously obtaining, in real time, a first-set of temperatures for the one or more heat generating devices during a pre-defined time interval (t).
  • Each temperature of the first-set of temperatures is estimated in real time by a thermal predictor, and a temperature of each heat generating device is obtained at each of a pre-determined number of instances within the pre-defined time interval (t).
  • The method further comprises analyzing the obtained first-set of temperatures to identify the one or more heat generating devices in either a hot condition or a cold condition, and categorizing the state of the data center into either a hot detection mode or a cold detection mode. The method also comprises computing either a hot-reference temperature or a cold-reference temperature for the one or more heat generating devices classified under the hot condition or the cold condition. The method further comprises identifying a target cooling unit amongst the one or more cooling units based on the collective influence of the one or more cooling units on the one or more heat generating devices classified under the hot condition or the cold condition, and further based on a historical control signal log.
  • Upon identification of the target cooling unit, the method comprises iteratively generating a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner.
  • The steps of obtaining, analyzing, identifying, and iteratively generating are performed by the processor.
  • A computer program product having embodied thereon a computer program for real-time monitoring and control to optimize operation of a data center, by controlling an operational parameter of one or more cooling units impacting one or more heat generating devices in the data center, is also provided.
  • The computer program comprises an instruction for continuously obtaining, in real time, a first-set of temperatures for the one or more heat generating devices during a pre-defined time interval (t).
  • Each temperature of the first-set of temperatures is estimated in real time by a thermal predictor, and a temperature of each heat generating device is obtained at each of a pre-determined number of instances within the pre-defined time interval (t).
  • An instruction is provided for analyzing the first-set of temperatures to identify the one or more heat generating devices in either a hot condition or a cold condition, and to categorize the state of the data center into either a hot detection mode or a cold detection mode. A further instruction is provided for computing either a hot-reference temperature or a cold-reference temperature for the one or more heat generating devices classified under the hot condition or the cold condition.
  • A further instruction is provided for identifying a target cooling unit amongst the one or more cooling units based on the collective influence of the one or more cooling units on the one or more heat generating devices classified under the hot condition or the cold condition, and further based on a historical control signal log.
  • An instruction is provided for iteratively generating a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner.
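The stepwise, gradual change of an operational parameter described above can be illustrated with a minimal Python sketch. It assumes that the operational parameter is a temperature set-point (as the description states later) and that each control signal moves it by at most a fixed increment. The function name `gradual_setpoint_steps`, the step size, and the set-point limits are hypothetical and are not taken from the patent.

```python
from typing import List


def gradual_setpoint_steps(
    current_c: float,
    desired_c: float,
    max_step_c: float = 0.5,
    min_setpoint_c: float = 16.0,
    max_setpoint_c: float = 27.0,
) -> List[float]:
    """Return the set-points that move a cooling unit from current_c toward
    desired_c, changing by at most max_step_c per control signal.

    All parameter names and default values are hypothetical.
    """
    if max_step_c <= 0:
        raise ValueError("max_step_c must be positive")
    # Keep the requested set-point inside the (assumed) allowed operating range.
    target = min(max(desired_c, min_setpoint_c), max_setpoint_c)
    steps: List[float] = []
    value = current_c
    while value != target:
        delta = target - value
        if abs(delta) <= max_step_c:
            value = target  # final step lands exactly on the target
        else:
            # Move one full increment in the direction of the target.
            value = value + max_step_c if delta > 0 else value - max_step_c
        steps.append(round(value, 6))
    return steps
```

For example, `gradual_setpoint_steps(22.0, 20.5)` yields `[21.5, 21.0, 20.5]`. In the described system, each step would be applied and its effect on the thermal state monitored before the next one is issued; the sketch only produces the candidate sequence of set-points.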
  • Figure 1 illustrates a network implementation of a system for real-time monitoring and control to optimize operation of a data center, in accordance with an embodiment of the present subject matter.
  • Figure 2 illustrates the system, in accordance with an embodiment of the present subject matter.
  • Figure 3 illustrates the detailed working of the system, in accordance with an embodiment of the present subject matter.
  • Figure 4 illustrates in detail the obtaining module for obtaining the temperatures of the heat generating devices, in accordance with an embodiment of the present subject matter.
  • Figure 5 illustrates in detail the temperature analyzing module for analyzing the first-set of temperatures, in accordance with an embodiment of the present subject matter.
  • Figure 6 illustrates in detail the identification module and the generation module, in accordance with an embodiment of the present subject matter.
  • Figure 7 illustrates the detailed working of the system 102 for achieving a stable thermal state, in accordance with an embodiment of the present subject matter.
  • Figure 8 illustrates a method for real-time monitoring and control to optimize operation of a data center, in accordance with an embodiment of the present subject matter.
  • The data center comprises one or more cooling units and one or more heat generating devices.
  • The “one or more cooling units”, hereinafter referred to as “cooling units”, may be computer room air conditioners (CRACs), computer room air handlers (CRAHs), or other types of air cooling units used for cooling the data center.
  • The CRACs may be of different types, such as supply-air controlled CRACs and return-air controlled CRACs.
  • The “one or more heat generating devices”, hereinafter referred to as “heat generating devices”, may comprise various electronic equipment capable of generating heat, such as servers, computers, and communication devices.
  • The heat generating devices may be mounted in rack cabinets of different sizes, placed in a prearranged order based on the dynamics of the data center.
  • Data center operation is highly dynamic due to changes in server utilization, seasonal temperature variations, and CRAC cycling, among other factors. There are therefore twin challenges: maintaining the thermal safety of the data center and ensuring energy-efficient operation.
  • The present subject matter provides the means to address these challenges.
  • Addressing these challenges may require real-time temperature prediction for the heat generating devices. The present subject matter obtains the temperatures of the heat generating devices in real time at one or more instances during a pre-defined time interval (t). Each temperature of the heat generating devices may be predicted by a thermal predictor, explained in detail in subsequent paragraphs. Based on the temperatures obtained/predicted, the thermal condition of each heat generating device may be identified, i.e., a heat generating device may be identified as being in either a hot condition or a cold condition for a particular time interval. Similarly, based on the temperatures obtained/predicted, the state of the data center may be categorized into either a hot detection mode or a cold detection mode.
  • Recommendations may be generated for optimizing the operation of the data center by controlling an operational parameter of the cooling units impacting the heat generating devices in the data center.
  • The operational parameter may be a set-point, expressed as a temperature, to be determined for the cooling units.
  • The cooling units may be prioritized based on their impact on the heat generating devices in the data center.
  • The set-point may be determined for the prioritized cooling units.
  • The thermal state of the data center may be controlled in a stepwise manner. Based on the priority, only one cooling unit amongst all the cooling units in the data center is designated, at any single instance, as the most-influential cooling unit, or target cooling unit, for which the set-point is to be determined.
  • The set-point determined for the most-influential cooling unit may be applied, and its impact on the thermal state of the data center may be monitored. Based on how the thermal state of the data center responds to the applied set-point, the present subject matter may designate another cooling unit, based on the priority, as the next most-influential cooling unit. For this next most-influential cooling unit, a set-point is again determined and applied to control the thermal state of the data center.
  • Thus, only one cooling unit, i.e., the most-influential cooling unit, is considered at a time for determination of the set-point, rather than determining set-points for all the cooling units impacting the heat generating devices in the data center and applying them all at the same time.
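The prioritization of cooling units and the choice of a single target unit can be sketched in Python as follows. This is a hypothetical illustration, not the patent's procedure: it assumes that a unit's collective influence is the sum of its influence indices over the affected devices, and that the historical control signal log is used only to skip units that received a control signal within a hypothetical `cooldown_cycles` window, so that their previous change has time to take effect. All names are illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple


def rank_cooling_units(
    influence: Dict[str, Dict[str, float]],
    affected_devices: Sequence[str],
    control_log: Sequence[Tuple[int, str]],
    current_cycle: int,
    cooldown_cycles: int = 3,
) -> List[str]:
    """Rank cooling units by their collective influence on the affected devices.

    influence[unit][device] is a hypothetical influence index of a cooling
    unit on a device inlet. control_log holds (cycle, unit) entries of past
    control signals; units controlled within the last cooldown_cycles cycles
    are left out of the ranking.
    """
    recent: Set[str] = {
        unit for cycle, unit in control_log
        if current_cycle - cycle < cooldown_cycles
    }
    # Collective influence: sum of the unit's indices over hot/cold devices.
    scores = {
        unit: sum(per_device.get(d, 0.0) for d in affected_devices)
        for unit, per_device in influence.items()
        if unit not in recent
    }
    # Highest collective influence first; ties broken by name for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

Under these assumptions, the first entry of the returned list would be the target (most-influential) cooling unit. Once its set-point change is logged, it drops out of the ranking for the cooldown window, so the next call naturally designates the next most-influential unit.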
  • The present subject matter can also control return-air controlled CRACs, for which direct control of the supply temperature may not be possible. This allows the present subject matter to be used in a more generalized manner and gives it a wider application domain.
  • The system 102 facilitates optimization of the operation of the data center by controlling an operational parameter of the cooling units impacting the heat generating devices in the data center.
  • The system 102 may continuously obtain a first-set of temperatures for the one or more heat generating devices during a pre-defined time interval (t).
  • Each temperature of the first-set of temperatures may be estimated by a thermal predictor, where a temperature of each heat generating device is obtained at each of a pre-determined number of instances within the pre-defined time interval (t).
  • The system 102 may analyze the first-set of temperatures to identify one or more heat generating devices in either a hot condition or a cold condition, and to categorize the state of the data center into either a hot detection mode or a cold detection mode. Thereafter, the system 102 may compute either a hot-reference temperature or a cold-reference temperature for the one or more heat generating devices classified under the hot condition or the cold condition. The system 102 may then identify a target cooling unit amongst the one or more cooling units based on the collective influence of the one or more cooling units on the one or more heat generating devices classified under the hot condition or the cold condition, and on the historical control signal log. Further, the system 102 may iteratively generate a control signal comprising one or more gradual changes in an operational parameter of the target cooling unit to optimize operation of the data center in a stepwise manner.
  • Although the system 102 is described as being implemented on a server for optimizing operation of the data center, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2...104-N, collectively referred to as user devices 104 hereinafter, or through applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
  • The network 106 may be a wireless network, a wired network, or a combination thereof.
  • The network 106 can be implemented as one of the different types of networks, such as an intranet, a local area network (LAN), a wide area network (WAN), the internet, and the like.
  • The network 106 may be either a dedicated network or a shared network.
  • A shared network represents an association of different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another.
  • The network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
  • The system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206.
  • The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • The at least one processor 202 is configured to fetch and execute computer-readable instructions or modules stored in the memory 206.
  • The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like.
  • The I/O interface 204 may allow the system 102 to interact with a user directly or through the user devices 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown).
  • The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite.
  • The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
  • The memory 206 may include any computer-readable medium or computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, compact disks (CDs), digital versatile discs (DVDs), and magnetic tapes.
  • The modules 208 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types.
  • The modules 208 may include an obtaining module 210, a temperature analyzing module 212, a computing module 214, an identification module 216, a generation module 218, a user-interface module 220, and other modules 222.
  • The other modules 222 may include programs or coded instructions that supplement applications and functions of the system 102. According to embodiments of the present subject matter, the other modules 222 may comprise a thermal predictor 302.
  • The data 224 serves as a repository for storing data processed, received, and generated by one or more of the modules 208.
  • The data 224 may also include influence index metrics 226, a historical control signal log 228, and other data 230. The influence index metrics 226 may be obtained based on a methodology/technique disclosed in Indian Patent Application 652/MUM/2011, incorporated herein by reference.
  • A user may use the user device 104 to access the system 102 via the I/O interface 204. Users may register themselves using the I/O interface 204 in order to use the system 102.
  • The working of the system 102 is explained in detail below with reference to Figure 3.
  • The system 102 may be used for real-time monitoring and control to optimize operation of a data center.
  • The optimization may be done by controlling an operational parameter of the cooling units impacting the heat generating devices in the data center.
  • Referring to Figure 3, the detailed working of the system 102 is illustrated, in accordance with an embodiment of the present subject matter.
  • The purpose of the present subject matter is to provide a thermally safe as well as energy-efficient data center. Due to its dynamic nature, the temperature pattern of the data center may change frequently for a variety of reasons, such as changes in the load utilization of the heat generating devices, seasonal variations, and cycling of the cooling units. The system 102 is therefore intended to cope with this dynamic nature by quickly adapting to changing conditions in real time and providing recommendations accordingly.
  • The system 102 provides real-time monitoring and control to optimize operation of the data center.
  • The operation of the data center may be optimized by controlling an operational parameter of the cooling units impacting the heat generating devices in the data center.
  • The heat generating devices may comprise various electronic equipment capable of generating heat, such as servers, computers, and communication devices.
  • Dynamic factors such as seasonal variations, variations in utilization of the heat generating devices, and cycling of the cooling units may drive the data center into either an over-heated or an over-cooled condition.
  • The over-heated and over-cooled conditions are referred to as the hot detection mode and the cold detection mode, respectively, throughout the detailed description of the present subject matter.
  • The system 102 comprises various modules 208 stored in the memory 206 of the system 102.
  • One such module is the obtaining module 210, which may be configured to continuously obtain temperatures for the heat generating devices in real time at one or more instances during a pre-defined time interval (t).
  • The operation of the obtaining module 210 in obtaining the temperatures of the heat generating devices may be understood with reference to the flow diagram 400 of Figure 4.
  • The temperature obtaining process may be initiated by the obtaining module 210, and the temperatures may be obtained continuously over the pre-defined time interval (t), as shown in block 404.
  • The temperatures obtained by the obtaining module 210 for the heat generating devices in the data center may be referred to as the first-set of temperatures.
  • Each temperature of the first-set of temperatures may be estimated by a thermal predictor 302.
  • A temperature of each heat generating device may be obtained at each of a pre-determined number of instances within the pre-defined time interval (t).
  • The thermal predictor 302 may be a module/set of instructions stored in the memory 206 of the system 102.
  • The thermal predictor 302 may use the influence index metrics 226, a set of supply temperatures of the cooling units, the power dissipation of the heat generating devices, or combinations thereof. The set of supply temperatures of the cooling units may be obtained from one or more sensors (not shown in the figure). The one or more sensors may be temperature sensing devices capable of sensing the supply temperatures of the cooling units.
  • The influence index metrics 226 may indicate the influence of different sources in the data center (i.e., the cooling units and hot air recirculated from the outlets of the heat generating devices) on the data center's targets (i.e., the inlets of the heat generating devices) in terms of air flow distribution. For each heat generating device in the data center, a metric is obtained through the influence index metrics 226, indicating the influence of the above sources on that heat generating device. The influence index metrics 226 may be computed using either mathematical or experimental methods (as disclosed in Indian Patent Application 652/MUM/2011).
  • The present subject matter can use different sets of influence index metrics, preconfigured in the database of the system 102, for different varieties of data center, depending on the dynamics (design and air-flow configuration) of the data center.
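The text does not give the thermal predictor's equations. As one possible illustration only, the Python sketch below assumes a simple steady-state air-mixing model in which the influence indices are read as mixing fractions: the inlet of device $i$ receives a fraction $C_{ij}$ of its air from cooling unit $j$ and a fraction $R_{ik}$ recirculated from the outlet of device $k$, and each device heats its air by $P_k/(\dot m_k c_p)$, giving $T^{in}_i = \sum_j C_{ij} T^{sup}_j + \sum_k R_{ik}\,(T^{in}_k + P_k/(\dot m_k c_p))$. The model, the airflow inputs, and the function name are assumptions, not the method of application 652/MUM/2011.

```python
from typing import List, Sequence

CP_AIR_J_PER_KG_K = 1005.0  # approximate specific heat of air at room temperature


def predict_inlet_temperatures(
    supply_c: Sequence[float],
    power_w: Sequence[float],
    airflow_kg_s: Sequence[float],
    from_cooling: Sequence[Sequence[float]],
    from_recirc: Sequence[Sequence[float]],
    tol: float = 1e-6,
    max_iter: int = 1000,
) -> List[float]:
    """Estimate device inlet temperatures under an assumed linear mixing model.

    supply_c[j]        supply temperature of cooling unit j (deg C)
    power_w[k]         power dissipated by device k (W)
    airflow_kg_s[k]    hypothetical air mass flow through device k (kg/s)
    from_cooling[i][j] fraction of device i's inlet air supplied by unit j
    from_recirc[i][k]  fraction of device i's inlet air recirculated from device k
    """
    n = len(power_w)
    if n == 0:
        return []
    # Temperature rise across each device: dT = P / (m_dot * c_p).
    rise = [p / (m * CP_AIR_J_PER_KG_K) for p, m in zip(power_w, airflow_kg_s)]
    # Supply-air contribution; it does not depend on the unknown inlet temperatures.
    base = [sum(c * t for c, t in zip(row, supply_c)) for row in from_cooling]
    t_in = list(base)
    for _ in range(max_iter):
        # Fixed-point update: inlet = supply share + recirculated outlet share.
        new = [
            base[i] + sum(from_recirc[i][k] * (t_in[k] + rise[k]) for k in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, t_in)) < tol:
            return new
        t_in = new
    raise RuntimeError("thermal predictor did not converge")
```

The fixed-point iteration converges when every row of the recirculation fractions sums to less than one, which holds for any physically meaningful mix that includes some cooled supply air.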
  • The one or more sensors may be temperature sensing devices.
  • The temperature sensing devices may further be capable of recording and storing the sensed supply temperatures of the cooling units and transmitting them to a desired location.
  • Each temperature of the first-set of temperatures, as estimated by the thermal predictor, may then be obtained.
  • A temperature for each heat generating device may be obtained at each of a pre-determined number of instances within the pre-defined time interval (t).
  • The first-set of temperatures for the heat generating devices may then be passed on for further analysis.
  • The time interval (t), consisting of n instances of obtaining temperatures, may be measured backwards in time from the current real-time instance. Every new instance updates the first-set of temperatures by adding the temperatures obtained at the current instance and deleting the temperatures obtained at the nth instance measured backwards in time from the current instance.
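The rolling first-set of temperatures described above behaves like a fixed-length sliding window. A minimal Python sketch, assuming that at every instance each device reports one reading in a hypothetical dict of device identifier to temperature, can rely on `collections.deque` with `maxlen=n`, which discards the oldest entry when a new one is appended. The class name `FirstSetWindow` is illustrative.

```python
from collections import deque
from typing import Deque, Dict, List


class FirstSetWindow:
    """Rolling window holding the last n temperature readings per device."""

    def __init__(self, n: int) -> None:
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n  # number of instances in the interval t
        self._readings: Dict[str, Deque[float]] = {}

    def add_instance(self, temps: Dict[str, float]) -> None:
        """Add the temperatures obtained at the current instance.

        A deque with maxlen=n drops its oldest reading automatically, so each
        device keeps only the readings from the latest n instances.
        """
        for device, temp in temps.items():
            self._readings.setdefault(device, deque(maxlen=self.n)).append(temp)

    def first_set(self) -> Dict[str, List[float]]:
        """Return the current first-set of temperatures, oldest first."""
        return {device: list(q) for device, q in self._readings.items()}
```

For instance, with `n = 3`, adding the readings 20, 21, 22 and 23 for a device `"R1"` leaves `[21, 22, 23]` in its window.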
  • As the first-set of temperatures is continuously obtained and stored in the database of the system 102, it may be further processed in real time by the temperature analyzing module 212, as shown in Figure 3.
  • The analysis of the first-set of temperatures by the temperature analyzing module 212 may be understood with reference to the flow diagram 500 of Figure 5, described below.
  • At block 502, the first-set of temperatures (FST) 412 may be received for analysis by the temperature analyzing module 212.
  • One purpose of analyzing the FST may be to identify the heat generating devices (HGD) in either the hot condition or the cold condition.
  • Another purpose of analyzing the FST may be to categorize the data center into either the hot detection mode or the cold detection mode.
  • To this end, the temperature analyzing module 212 may be configured to continuously monitor and analyze the first-set of temperatures (received at block 502) over the pre-defined time interval (t).
  • In the next step of the analysis, at block 504 (a conditional block), the temperature analyzing module 212 compares the FST with a threshold temperature ($T_{Threshold}$) to determine whether the heat generating devices fall under the hot condition or the cold condition.
  • Heat generating devices satisfying the above condition, i.e., whose temperatures cross the threshold temperature at one or more instances within a sub-time interval ($t_{hd}$) of the pre-defined time interval (t), may be classified under the hot condition.
  • The heat generating devices falling under the hot condition may be listed along with their temperatures (hot-reference temperatures). The temperatures of the heat generating devices categorized under the hot condition may be referred to as a second-set of temperatures; the second-set of temperatures is thus a subset of the first-set of temperatures.
  • The sub-time interval ($t_{hd}$) introduced in the analysis at block 504 serves as a check-point for the temperature analyzing module 212, preventing hot detection of heat generating devices due to sporadic and insignificant temperature surges caused by the various factors discussed above. Since heat generating devices are classified under hot detection only if their temperatures cross the threshold temperature ($T_{Threshold}$) at one or more instances within $t_{hd}$, this enables a more accurate classification of the heat generating devices falling under the hot condition.
  • the heat generating devices may be classified under the hot condition if their temperatures cross the threshold temperature (T_Threshold) at two of the one or more instances in the sub-time interval (t_hd).
  • the sub-time interval (t_hd) may be determined on the basis of a detailed study of the dynamics of the data center. A person skilled in the art will appreciate that, more generally, the heat generating devices may be classified under the hot condition if their temperatures cross the threshold temperature (T_Threshold) at "x" of the one or more instances in the sub-time interval (t_hd).
  • a hot-reference temperature may be determined for each heat generating device identified under the hot condition.
  • the hot-reference temperature may be the temperature obtained for a heat generating device at any instance of the one or more instances in the sub-time interval (t_hd).
  • the hot-reference temperature may alternatively be determined by performing a mathematical or statistical operation on the first-set of temperatures.
  • the heat generating devices falling under the hot condition, along with their respective hot-reference temperatures, may be shortlisted at block 506 for the application of corrective actions, which are explained in detail later.
  • the heat generating devices may be identified/classified under the cold condition.
  • the identified heat generating devices under the cold condition along with their cold-reference temperature may be shortlisted at block 510.
  • another predefined time interval (t_cd) may likewise be used at block 508 as a check-point for identifying the heat generating devices under the cold condition. Determining t_cd may require detailed knowledge of the data center dynamics. Analyzing the first-set of temperatures over t_cd thus allows the heat generating devices falling under the cold condition to be identified more accurately.
  • the temperatures of the heat generating devices falling under the cold condition may be referred to as a third-set of temperatures, which is a subset of the first-set of temperatures.
  • a cold-reference temperature may be determined for each heat generating device identified under the cold condition.
  • the cold-reference temperature may be the temperature recorded for the heat generating device at the maximum number of instances, combined with the maximum temperature value, i.e., the most common maximum temperature. A person skilled in the art will appreciate that the cold-reference temperature may also be defined by other combinations of instance counts and temperature values.
  • determining the most common maximum temperature is particularly necessary because such CRACs have generally been observed to have fluctuating supply temperatures due to the switching of their compressors. This fluctuation also causes the temperatures of the heat generating devices to fluctuate. It is therefore necessary to determine the cold-reference temperature for the heat generating devices before considering them for corrective actions. At block 510, the heat generating devices under the cold condition, along with their cold-reference temperatures, may be listed for further processing.
  • the first-set of temperatures, the second-set of temperatures corresponding to the heat generating devices identified under the hot condition, the third-set of temperatures corresponding to the heat generating devices identified under the cold condition, and the threshold temperature (T_Threshold) may be stored in a database in the memory 206 of the system 102.
  • the data center may be categorized under the hot detection mode even if only one heat generating device is identified in the hot condition. In such a case, only hot-reference temperatures are determined, and no cold-reference temperature is determined.
  • a computing module 214 (figure 3) is configured to compute the hot-reference temperatures and the cold-reference temperatures in order to take corrective actions or provide recommendations for controlling the unbalanced thermal state of the data center.
  • for generating these recommendations, the identification module 216 and the generation module 218 may be required.
  • the operation of both modules, 216 and 218, in generating recommendations is explained in detail with reference to the flow diagram 600 of figure 6.
  • the operational parameters of the cooling units need to be controlled.
  • the list of the heat generating devices classified under the hot condition and the cold condition is received by the identification module 216 for analysis. From this list, the identification module 216 may check, at block 604, whether the data center is in the hot detection mode or the cold detection mode. If the data center is in the hot detection mode, only hot-reference temperatures may be determined, for the heat generating devices under the hot condition. If the data center is in the cold detection mode, only cold-reference temperatures may be calculated, for the heat generating devices under the cold condition.
  • the next step performed by the identification module 216 is to identify the most-impacting, or most-influential, cooling unit amongst the cooling units impacting the heat generating devices.
  • the most-impacting or most-influential cooling unit may be referred to as a "target cooling unit".
  • the cooling units impacting the heat generating devices (under the hot condition) may be analyzed by the identification module 216.
  • Each cooling unit of the cooling units may have an influence on the heat generating devices (under the hot condition) in the data center.
  • the influence may be expressed as a "metric" derived from influence index metrics (as disclosed in Indian Patent Application 652/MUM/2011).
  • the metric indicates an impact of the cooling unit on the heat generating devices in the data center.
  • the identification module 216 may be configured to determine a collective influence of the cooling units based on the metric. According to embodiments of the present subject matter, the collective influence may be determined by a statistical technique.
  • the statistical technique may determine the collective influence based on the temperature predicted for each of the heat generating devices and the total number of heat generating devices affected by each cooling unit. A person skilled in the art will appreciate that other statistical and/or mathematical techniques may also be used to determine the collective influence of the cooling units.
  • the cooling units may be prioritized on the basis of their impact, i.e., their collective influence.
  • the cooling unit having the maximum collective influence on the racks identified under hot detection may be considered the target cooling unit.
  • a priority list comprising one or more target cooling units may be obtained at block 608 in descending order of their impact, so that the target cooling unit of maximum collective influence comes first; this priority list may be referred to as a "hot priority list".
  • the identification module 216 may further be configured to determine a priority list of one or more target cooling units impacting the heat generating devices in the cold condition. This priority list may be determined in ascending order of their impact on the heat generating devices and may be referred to as a "cold priority list". A person skilled in the art will appreciate that, in practice, only one priority list is generated, because the data center is detected in either the hot detection mode or the cold detection mode at any one time. The priority list is therefore processed according to the mode (hot detection or cold detection) of the data center.
  • the identification module 216 may further be configured to refer to a historical control signal log 228 (figure 3).
  • the historical control signal log 228 may comprise historical recommendation data for the cooling units in the data center.
  • the historical recommendation data may indicate outcomes associated with previous recommendations. The outcomes may be a transition of the data center into the hot detection mode or into the cold detection mode.
  • the identification module 216 may check whether the current hot detection mode of the data center is a direct consequence of a previous recommendation, i.e., a previously recommended operational parameter for a target cooling unit. This is possible because implementing a cold recommendation, i.e., a recommendation provided against a cold detection mode of the data center, which typically consists of increasing the set point of one or more target cooling units, may cause the data center to transit from the cold detection mode back to the hot detection mode.
  • in that case, the hot detection mode may be assumed to be a direct consequence of the previously implemented cold recommendation.
  • the generated hot recommendation may be finalized for being implemented at block 612.
  • the hot recommendation, i.e., the combination of set-points responsible for the hot condition (the set-points in force before the recommendation), may be stored in a database with a hot flag.
  • the system 102 may ensure that this hot-flagged combination of set-points is not recommended again for a further fixed time interval T_p. All such combinations of set-points present in a generated priority list may therefore be deleted from the finalized priority list. According to embodiments of the present subject matter, all hot-flagged combinations of set-points may be reset after the time interval T_p.
  • the cold priority list (having one or more target cooling units) for the heat generating devices under the cold condition may have to be finalized.
  • the identification module 216 may be configured to check the recommendations (set-points) against the hot flags in the database. If any combination of set-points matches a hot-flagged combination stored in the database, the identification module 216 may further be configured to delete that set-point combination from the cold priority list. After this operation, the cold priority list may be finalized and implemented. Once implemented, the recommendation (the set-point combination) is deleted from the cold priority list, so that the second recommendation in the cold priority list takes its place at the top.
  • the set-point referred to in the above discussion may be considered an operational parameter of the cooling units, more specifically, of the target cooling units.
  • a generation module 218 is configured to iteratively generate a control signal to optimize operation of the data center in a stepwise manner.
  • the control signal generated may comprise one or more gradual changes in an operational parameter of the target cooling unit.
  • the operational parameter is the set-point, expressed as a temperature, of the identified target cooling unit.
  • the set-points may be increased if the data center is in the cold detection mode; this recommendation is referred to as a cold recommendation. Conversely, the set-points may be decreased if the data center is in the hot detection mode.
  • the system 102 then passes control back to the obtaining module 210.
  • the obtaining module 210 may be further configured for obtaining the first-set of temperatures (temperatures of the heat generating devices) for a next pre-defined time interval (t).
  • FIG. 7: the detailed working of the system 102 for achieving a stable thermal state is explained with reference to figure 7, in which the set-points of two CRACs, CRAC 1 and CRAC 2, are controlled to achieve a stable thermal state in a data center.
  • the stability curve shown in figure 7 is an imaginary curve representing the combinations of set-points for the two CRACs (CRAC 1 and CRAC 2) that would maintain the data center in a thermally stable and energy-efficient state.
  • the imaginary curve remains fixed provided the heat dissipation pattern of the heat generating devices and the rest of the data center environment remain constant. Any combination of set-points below the stability curve represents the cold detection mode for the data center.
  • any combination of set-points above the curve represents the hot detection mode. The curve therefore acts as an upper bound (an umbrella) on the set-points of the two CRACs (CRAC 1 and CRAC 2) that ensures the data center is not detected as hot.
  • a set point combination located close to the curve would represent one of the near optimum states for the data center both from thermal safety as well as energy efficiency point of view.
  • the curve may be replaced by a multidimensional entity if more CRACs are present.
  • at state 1, the present set-points of CRAC 1 and CRAC 2 are x and y, respectively; state 1 lies in the cold region, i.e., under the cold detection mode.
  • a cold recommendation may be provided for state 1.
  • the cold recommendation may comprise the modified set-points "x+1 and y", taking the data center to state 2.
  • the set-point of CRAC 1 is incremented by 1, i.e., a gradual increase, while the set-point of CRAC 2 remains the same (y).
  • a further recommendation may be provided by the system 102.
  • the further recommendation may modify the set-point combination to x+1 and y+1, which takes the data center from state 2 to state 3.
  • the set-point of CRAC 1 remains the same, and the set-point of CRAC 2 is modified from y to y+1, i.e., a further gradual increase in the set-point of CRAC 2.
  • state 3 appears to be closer to the stability curve, but it still falls under the cold detection mode, for which a further cold recommendation is provided by the system 102.
  • the recommendation for state 3 may be provided by modifying the current set-points of state 3, i.e., x+1 and y+1, into a new combination of set-points, x+1 and y+2. This new combination of set-points takes the data center into a hot region, i.e., under the hot detection mode, at state 4.
  • the hot detection mode at state 4 may therefore be a direct consequence of the cold recommendation previously generated for state 3.
  • in that case, another combination of set-points may be implemented directly as the recommendation.
  • this other combination of set-points is x+2 and y+1, wherein the set-point of CRAC 1 is gradually increased by 1 and the set-point of CRAC 2 is gradually decreased by 1.
  • this combination takes the data center to state 5, which still falls under the hot detection mode.
  • the system 102 therefore recommends a return to the previous set-points, i.e., x+1 and y+1. Further, since the system has found state 4 and state 5 to be under hot detection, the set-point combinations of state 4 and state 5 may be hot-flagged and hence will not be recommended again for a given time interval T_p.
  • state 3 is thus a stable and optimum state of the data center for the given heat dissipation in the given data center environment.
  • the system 102 is enabled to achieve the stable and optimum thermal state in the data center in step-wise manner.
  • a user may be notified, by a notification message, that the data center is in the hot detection mode or the cold detection mode.
  • the system 102 also comprises a user-interface module 220 configured to display the layout of the data center, the first-set of temperatures, notification messages indicating that the data center is in the hot detection mode or the cold detection mode, and the gradual changes to be applied to the operational parameter, i.e., the set-point.
  • the layout of the data center comprises arrangements of the heat generating devices and cooling units in the data center.
  • the recommendations generated in terms of the set-points of the cooling units may be automatically implemented by the system 102. Specifically, the system 102 may automatically control the set-points of the target cooling units, based on the recommendations provided, via an interface between the cooling units and a control center of the data center.
  • the operation of the data center may be optimized by controlling operational parameters of cooling units impacting heat generating devices in the data center.
  • the method 800 may be described in the general context of computer executable instructions.
  • computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types.
  • the method 800 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network.
  • computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
  • the order in which the method 800 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 800 or alternate methods. Additionally, individual blocks may be deleted from the method 800 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combinations thereof. However, for ease of explanation, in the embodiments described below, the method 800 may be considered to be implemented in the above described system 102.
  • a first-set of temperatures for the heat generating devices in the data center may be obtained during a pre-defined time interval (t).
  • Each temperature of the first-set of temperatures may be estimated by a thermal predictor 302.
  • the thermal predictor 302 may be a module or set of instructions stored in the memory 206 of the system 102. The thermal predictor 302 may estimate each temperature of the first-set of temperatures using the influence index metrics 226, the supply temperatures of the cooling units, the power dissipation of the heat generating devices, or a combination thereof. A temperature of the first-set of temperatures associated with each heat generating device may be obtained at each instance of a pre-determined number of instances of the pre-defined time interval (t).
  • the first-set of temperatures obtained may be analyzed for identification of the heat generating devices in one of a hot condition and a cold condition.
  • the first-set of temperatures may also be analyzed for categorizing state of the data center into one of a hot detection mode and a cold detection mode.
  • a second-set of temperatures and a third-set of temperatures may be selected.
  • the second-set of temperatures refers to the set of temperatures crossing a threshold temperature (T_Threshold) at one or more instances in a sub-time interval (t_hd) of the predefined time interval (t).
  • the heat generating devices under the hot condition may be identified.
  • a hot-reference temperature may be determined for each heat generating device identified under the hot condition.
  • the third-set of temperatures refers to the set of temperatures of the heat generating devices not falling under the hot condition for another predefined time interval (t_cd).
  • the heat generating devices under the cold condition may be identified.
  • a cold-reference temperature may be determined for each heat generating device identified under the cold condition.
  • the hot-reference temperatures and the cold-reference temperatures determined for the heat generating devices under the hot condition and the cold condition may be computed in order to take corrective actions or provide recommendations for controlling the thermal state of the data center.
  • one or more target cooling units may be identified amongst the cooling units impacting the heat generating devices in the data center.
  • the one or more target cooling units identified may be based on collective influence of the cooling units on the heat generating devices classified under one of the hot condition and the cold condition. Further, the one or more target cooling units identified may also be based on a historical control signal log 228.
  • the identified target cooling unit is the most-impacting, or most-influential, cooling unit amongst the cooling units in the data center. Upon identification, the one or more target cooling units may be prioritized based on their influence on the heat generating devices.
  • a control signal comprising gradual changes in an operational parameter of the one or more target cooling units may be iteratively generated to optimize the operation of the data center.
  • the operational parameter may be a set-point in terms of a temperature for the one or more target cooling units identified.
  • the gradual changes may be applied to the set-point of the one or more target cooling units depending upon the mode of the data center, i.e., the hot detection mode or the cold detection mode.
  • when the data center is detected in the hot detection mode, the set-point is decremented gradually, by a predefined value per step, in order to achieve a stable thermal state in the data center. Similarly, when the data center is detected in the cold detection mode, the set-point is incremented gradually, by a predefined value per step, to achieve the stable thermal state.
  • the system 102 may thus achieve the stable thermal state gradually, in a stepwise manner.
  • the system 102 provides an energy-efficient method for optimizing the operation of the data center in real time, thus saving energy consumed by the cooling units in controlling the thermal state of the data center.
  • the system 102 is enabled to gradually achieve and maintain a stable, close-to-optimum thermal and energy state in the data center, thereby eliminating possible overcooling costs.
  • the system 102 provides corrective recommendations in terms of the set-points of the cooling units, which facilitates control of a data center cooled by return air-controlled CRACs, since such CRACs may not provide the user with explicit control over the supply temperature.
  • by using influence index metrics for temperature prediction, the system 102 eliminates the need for an extensive temperature sensor network across the data center: sensors are used only to sense the supply temperatures of the CRACs, and no other sensors are required in the data center.
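The hot-condition check at block 504 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the system's implementation: each device's first-set of temperatures is taken as a list with one value per instance of the interval t, the sub-time interval t_hd is assumed to be the last t_hd instances, and a device is treated as hot when it crosses T_Threshold at least x times in that window. Using the largest crossing as the hot-reference temperature is one of the choices the text permits. The names `classify_hot` and `detection_mode` are hypothetical.

```python
from typing import Dict, List


def classify_hot(
    samples: Dict[str, List[float]],
    threshold: float,
    t_hd: int,
    x: int = 2,
) -> Dict[str, float]:
    """Return {device_id: hot_reference_temperature} for devices in the hot condition.

    samples: per-device temperatures, one value per instance of interval t (oldest first).
    threshold: T_Threshold.
    t_hd: length of the sub-time interval, assumed to be the last t_hd instances.
    x: minimum number of threshold crossings inside t_hd (the text's example uses 2).
    """
    if t_hd <= 0:
        raise ValueError("t_hd must be positive")
    hot: Dict[str, float] = {}
    for device, temps in samples.items():
        window = temps[-t_hd:]
        crossings = [v for v in window if v > threshold]
        # A single sporadic surge is ignored: x crossings are required.
        if len(crossings) >= x:
            # Hot-reference temperature: here, the highest crossing in t_hd.
            hot[device] = max(crossings)
    return hot


def detection_mode(hot: Dict[str, float]) -> str:
    # One hot device is enough to put the data center in hot detection mode;
    # otherwise it is assumed to be in cold detection mode (one mode at a time).
    return "hot" if hot else "cold"


samples = {
    "rack1": [24.0, 28.0, 29.0, 30.0],
    "rack2": [24.0, 25.0, 28.0, 25.0],  # only one crossing in t_hd: a sporadic surge
    "rack3": [20.0, 21.0, 22.0, 21.0],
}
hot = classify_hot(samples, threshold=27.0, t_hd=3)
print(hot, detection_mode(hot))  # {'rack1': 30.0} hot
```

Raising `x` makes the check more tolerant of short surges, which mirrors the role the text gives to t_hd as a check-point.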
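The cold-reference temperature ("most common maximum temperature") described for block 510 is open to interpretation. One possible reading, sketched below as an assumption, is the temperature value that occurs at the largest number of instances, with ties resolved in favour of the higher value; values are rounded first so that readings differing only by noise are counted together. The window rule for t_cd (every one of the last t_cd instances at or below T_Threshold) and the names `most_common_max` and `classify_cold` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def most_common_max(temps: List[float], ndigits: int = 1) -> float:
    """Most frequent temperature value; ties go to the higher temperature.

    The rounding precision `ndigits` is an assumption.
    """
    counts = Counter(round(t, ndigits) for t in temps)
    value, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return value


def classify_cold(
    samples: Dict[str, List[float]], threshold: float, t_cd: int
) -> Dict[str, float]:
    """Return {device_id: cold_reference_temperature} for devices in the cold condition.

    A device is treated as cold if it stays at or below the threshold for every
    one of the last t_cd instances (assumed reading of t_cd).
    """
    if t_cd <= 0:
        raise ValueError("t_cd must be positive")
    cold: Dict[str, float] = {}
    for device, temps in samples.items():
        window = temps[-t_cd:]
        if all(v <= threshold for v in window):
            cold[device] = most_common_max(window)
    return cold


# Temperatures oscillating with compressor switching: 22.0 and 23.5 occur
# equally often, so the higher value becomes the cold-reference temperature.
print(most_common_max([22.0, 23.5, 23.5, 22.0, 23.5, 22.0]))  # 23.5
```

Breaking ties toward the higher value keeps the reference on the warm side of the oscillation, so a later set-point increase is judged against the worst recurring temperature rather than a momentary dip.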
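The collective-influence ranking performed by the identification module 216 can be illustrated with a small sketch. The text does not give the statistic, so this assumes one: the sum of a cooling unit's influence over the devices it noticeably affects, which grows with both influence strength and the number of affected devices. The influence values are hypothetical placeholders for the influence index metrics. For the hot case the sketch puts the most influential unit first, so that its head is the target cooling unit of maximum collective influence; the cold list is ordered in ascending order of impact, as the text describes.

```python
from typing import Dict, List

# Hypothetical influence metrics: influence[crac][device], roughly in [0, 1].
Influence = Dict[str, Dict[str, float]]


def collective_influence(
    influence: Influence, devices: List[str], cutoff: float = 0.05
) -> Dict[str, float]:
    """Assumed statistic: sum of each unit's influence over the devices it
    noticeably affects (influence above `cutoff`)."""
    scores: Dict[str, float] = {}
    for crac, per_device in influence.items():
        weights = (per_device.get(d, 0.0) for d in devices)
        scores[crac] = sum(w for w in weights if w > cutoff)
    return scores


def priority_list(scores: Dict[str, float], mode: str) -> List[str]:
    """Hot mode: most influential first (head = target cooling unit).
    Cold mode: ascending order of impact."""
    return sorted(scores, key=lambda c: scores[c], reverse=(mode == "hot"))


influence = {
    "CRAC1": {"rack1": 0.6, "rack2": 0.1},
    "CRAC2": {"rack1": 0.3, "rack2": 0.7},
}
scores = collective_influence(influence, ["rack1"])  # only rack1 is hot
print(priority_list(scores, "hot"))   # ['CRAC1', 'CRAC2']
print(priority_list(scores, "cold"))  # ['CRAC2', 'CRAC1']
```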
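The hot-flag bookkeeping described above (a combination of set-points that led to hot detection is not recommended again for T_p, then its flag is reset) can be sketched as a small store keyed by set-point tuples. This is an illustrative assumption, not the system's database schema: times are plain numbers (for example seconds), a combination is a tuple with one set-point per cooling unit in a fixed order, and the class name `HotFlagStore` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

SetPoints = Tuple[float, ...]  # one set-point per cooling unit, fixed unit order


class HotFlagStore:
    """Remembers set-point combinations that led to hot detection.

    A flagged combination is excluded from priority lists until T_p has
    elapsed, after which its flag is reset.
    """

    def __init__(self, t_p: float) -> None:
        self.t_p = t_p
        self._flagged_at: Dict[SetPoints, float] = {}

    def flag(self, combo: Iterable[float], now: float) -> None:
        self._flagged_at[tuple(combo)] = now

    def is_flagged(self, combo: Iterable[float], now: float) -> bool:
        key = tuple(combo)
        flagged_at = self._flagged_at.get(key)
        if flagged_at is None:
            return False
        if now - flagged_at >= self.t_p:
            del self._flagged_at[key]  # flag reset after T_p
            return False
        return True

    def finalize(self, priority: List[SetPoints], now: float) -> List[SetPoints]:
        """Delete hot-flagged combinations from a generated priority list."""
        return [c for c in priority if not self.is_flagged(c, now)]


store = HotFlagStore(t_p=3600.0)
store.flag((21.0, 22.0), now=0.0)
print(store.finalize([(21.0, 22.0), (22.0, 21.0)], now=60.0))    # [(22.0, 21.0)]
print(store.finalize([(21.0, 22.0), (22.0, 21.0)], now=3600.0))  # [(21.0, 22.0), (22.0, 21.0)]
```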
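One iteration of the control signal generated by the generation module 218 (a gradual change to a single target cooling unit's set-point, raised in cold detection mode and lowered in hot detection mode) might look like the sketch below. The step size, the rule of trying units in priority order, and skipping combinations in a `blocked` set (for example, hot-flagged ones) are assumptions for illustration; `next_set_points` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


def next_set_points(
    current: Dict[str, float],
    mode: str,
    priority: List[str],
    step: float = 1.0,
    blocked: FrozenSet[Tuple[float, ...]] = frozenset(),
) -> Optional[Dict[str, float]]:
    """One gradual change: move a single target unit's set-point by `step`.

    Cold detection mode raises the set-point; hot detection mode lowers it.
    Units are tried in priority order; a candidate whose combination of
    set-points (ordered by unit name) is blocked is skipped.
    Returns None if every candidate is blocked.
    """
    if mode not in ("hot", "cold"):
        raise ValueError("mode must be 'hot' or 'cold'")
    delta = step if mode == "cold" else -step
    for unit in priority:
        candidate = dict(current)
        candidate[unit] += delta
        combo = tuple(candidate[u] for u in sorted(candidate))
        if combo not in blocked:
            return candidate
    return None


current = {"CRAC1": 20.0, "CRAC2": 20.0}
print(next_set_points(current, "cold", ["CRAC1", "CRAC2"]))
# {'CRAC1': 21.0, 'CRAC2': 20.0}
print(next_set_points(current, "cold", ["CRAC1", "CRAC2"],
                      blocked=frozenset({(21.0, 20.0)})))
# {'CRAC1': 20.0, 'CRAC2': 21.0}
```

After each step, the obtaining module would collect a fresh first-set of temperatures for the next interval t before another step is taken.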
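The FIG. 7 walkthrough can be reproduced with a toy model. Everything numeric here is hypothetical: the stability curve is assumed to be the line s1 + s2 = c with c = 42, starting set-points are x = y = 20, and when several increments are possible the CRAC with the lower set-point is tried first (ties go to CRAC 1). A hot result is flagged and the search falls back to the last cold state. Under these assumptions the search tries the same five combinations as the walkthrough and settles at the analogue of state 3 (x+1, y+1), although the two hot states are reached in the opposite order (x+2, y+1 before x+1, y+2) because of the assumed tie-breaking rule. The T_p expiry of hot flags is not modelled.

```python
from typing import List, Set, Tuple

State = Tuple[int, int]  # (CRAC 1 set-point, CRAC 2 set-point)


def is_hot(state: State, c: int) -> bool:
    # Hypothetical stability curve s1 + s2 = c: above it is hot detection,
    # on or below it is cold detection.
    return state[0] + state[1] > c


def cold_candidates(state: State) -> List[State]:
    # Gradual change: raise one CRAC by 1; the lower set-point is tried first
    # (ties go to CRAC 1). This ordering rule is an assumption.
    order = sorted(range(2), key=lambda i: (state[i], i))
    result: List[State] = []
    for i in order:
        s = list(state)
        s[i] += 1
        result.append((s[0], s[1]))
    return result


def walk(start: State, c: int, max_steps: int = 50):
    """Stepwise search; returns (final state, states tried, hot-flagged states)."""
    if is_hot(start, c):
        raise ValueError("start must lie in the cold region")
    current: State = start
    tried: List[State] = [start]
    flagged: Set[State] = set()
    for _ in range(max_steps):
        moved = False
        for cand in cold_candidates(current):
            if cand in flagged:
                continue
            tried.append(cand)
            if is_hot(cand, c):
                flagged.add(cand)  # hot flag, then fall back to `current`
                continue
            current, moved = cand, True
            break
        if not moved:  # every next step is hot-flagged: stay at current state
            break
    return current, tried, flagged


final, tried, flagged = walk((20, 20), c=42)
print(final)            # (21, 21)
print(tried)            # [(20, 20), (21, 20), (21, 21), (22, 21), (21, 22)]
print(sorted(flagged))  # [(21, 22), (22, 21)]
```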
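The text states only that the thermal predictor 302 estimates each temperature from the influence index metrics, the supply temperatures of the cooling units and the power dissipation of the devices; it does not give the model. The sketch below assumes a simple form for illustration: a device's temperature is the influence-weighted mix of cooling-unit supply temperatures plus a rise proportional to its dissipated power. The weights, the `rise_per_kw` coefficient and the name `predict_temperature` are hypothetical.

```python
from typing import Dict


def predict_temperature(
    influence: Dict[str, float],
    supply: Dict[str, float],
    power_kw: float,
    rise_per_kw: float,
) -> float:
    """Hypothetical thermal-predictor sketch for one heat generating device.

    influence:   weight of each cooling unit on this device (standing in for the
                 influence index metrics); normalised so the weights sum to 1.
    supply:      current supply temperature of each cooling unit (deg C).
    power_kw:    power dissipated by the device.
    rise_per_kw: assumed temperature rise per kW of dissipated power.
    """
    total = sum(influence.values())
    if total <= 0:
        raise ValueError("influence weights must sum to a positive value")
    mixed_supply = sum(w * supply[c] for c, w in influence.items()) / total
    return mixed_supply + rise_per_kw * power_kw


print(predict_temperature({"CRAC1": 0.75, "CRAC2": 0.25},
                          {"CRAC1": 18.0, "CRAC2": 22.0},
                          power_kw=4.0, rise_per_kw=0.5))  # 21.0
```

Because only supply temperatures are measured, a model of this kind is what would let the system avoid a dense sensor network, as claimed above.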

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The present invention relates to a method and a system for real-time monitoring and control that optimize the operation of a data center by controlling an operational parameter of cooling units acting on heat generating devices in the data center. A first-set of temperatures for the heat generating devices is continuously acquired for analysis. During the analysis, the heat generating devices are identified as being in a hot condition or a cold condition, and the data center is then categorized into a hot detection mode or a cold detection mode. From among the cooling units, a target cooling unit is determined for use in optimizing the operation of the data center. Furthermore, a control signal comprising gradual changes in the operational parameters of the identified target cooling unit is generated iteratively. The operational parameter further comprises a set point of the target cooling unit, which is incremented or decremented by a predefined value based on the categorization of the data center.
PCT/IN2014/000119 2013-10-21 2014-02-25 System and method for monitoring and controlling the thermal state of a data center in real time WO2015059710A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN3290/MUM/2013 2013-10-21
IN3290MU2013 IN2013MU03290A (fr) 2013-10-21 2014-02-25

Publications (2)

Publication Number Publication Date
WO2015059710A2 true WO2015059710A2 (fr) 2015-04-30
WO2015059710A3 WO2015059710A3 (fr) 2015-11-12

Family

ID=52993712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2014/000119 WO2015059710A2 (fr) 2013-10-21 2014-02-25 System and method for monitoring and controlling the thermal state of a data center in real time

Country Status (2)

Country Link
IN (1) IN2013MU03290A (fr)
WO (1) WO2015059710A2 (fr)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8290629B1 (en) * 2006-12-18 2012-10-16 Sprint Communications Company L.P. Airflow management
US8509959B2 (en) * 2010-08-12 2013-08-13 Schneider Electric It Corporation System and method for predicting transient cooling performance for a data center
US8531839B2 (en) * 2010-10-29 2013-09-10 International Business Machines Corporation Liquid cooled data center with alternating coolant supply lines
US8725307B2 (en) * 2011-06-28 2014-05-13 Schneider Electric It Corporation System and method for measurement aided prediction of temperature and airflow values in a data center
US10180665B2 (en) * 2011-09-16 2019-01-15 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Fluid-cooled computer system with proactive cooling control using power consumption trend analysis

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
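CN104793667A (zh) * 2015-05-05 2015-07-22 国家电网公司 Heating method for distribution automation terminals in high-altitude cold regions
KR20190059207A (ko) 2017-11-22 2019-05-30 동우 화인켐 주식회사 Light conversion resin composition, light conversion laminated substrate comprising the same, and image display device using the same
KR20190059208A (ko) 2017-11-22 2019-05-30 동우 화인켐 주식회사 Light conversion resin composition, light conversion laminated substrate comprising the same, and image display device using the same
CN107876347A (zh) * 2017-11-30 2018-04-06 武汉劲野科技有限公司 Control method for a high-temperature curing furnace for metal alloy processing
CN116963482A (zh) * 2023-09-21 2023-10-27 广东云下汇金科技有限公司 Intelligent energy-saving method based on a data center HVAC system, and related equipment
CN116963482B (zh) * 2023-09-21 2023-12-05 广东云下汇金科技有限公司 Intelligent energy-saving method based on a data center HVAC system, and related equipment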

Also Published As

Publication number Publication date
IN2013MU03290A (fr) 2015-07-17
WO2015059710A3 (fr) 2015-11-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14855864

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14855864

Country of ref document: EP

Kind code of ref document: A2