WO2017107187A1 - Anomaly detection techniques for servers and data centers - Google Patents

Anomaly detection techniques for servers and data centers

Info

Publication number
WO2017107187A1
Authority
WO
WIPO (PCT)
Prior art keywords
historical
probability value
operating condition
collective
data center
Prior art date
Application number
PCT/CN2015/098925
Other languages
English (en)
Inventor
Cong Li
Original Assignee
Intel Corporation
Application filed by Intel Corporation
Priority to PCT/CN2015/098925
Publication of WO2017107187A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/81 Threshold

Definitions

  • Embodiments herein generally relate to component health monitoring with respect to servers and data centers.
  • sensors may be used to monitor the run-time status of the data center and the servers therein.
  • sensors may be used to monitor the internal air temperatures of servers in the data center, the air intake temperatures at air inlets of such servers, the power supply voltages of such servers, and/or one or more other operating conditions.
  • Many operating conditions may fluctuate widely in the ordinary course of ongoing operation.
  • a substantially-elevated temperature within a given server at a given point in time may result simply from a particularly high workload, rather than from any problem with a cooling fan. As such, it may be desirable to perform some such anomaly detection based on sets of observations.
  • FIG. 1 illustrates an embodiment of a first operating environment.
  • FIG. 2 illustrates an embodiment of a second operating environment.
  • FIG. 3 illustrates an embodiment of a server.
  • FIG. 4 illustrates an embodiment of an apparatus.
  • FIG. 5 illustrates an embodiment of a first logic flow.
  • FIG. 6 illustrates an embodiment of a second logic flow.
  • FIG. 7 illustrates an embodiment of a storage medium.
  • FIG. 8 illustrates an embodiment of a computing architecture.
  • FIG. 9 illustrates an embodiment of a communications architecture.
  • an apparatus may comprise circuitry and an anomaly detection component for execution by the circuitry to identify an observation set comprising a plurality of observations associated with an operating condition of a data center and determine a probability value set for the observation set, the probability value set to comprise a plurality of probability values, each of the plurality of probability values to correspond to a respective one of the plurality of observations in the observation set, the anomaly detection component for execution by the circuitry to determine a collective probability value for the observation set as a product of the plurality of probability values in the probability value set and determine whether the observation set is indicative of an anomaly by comparing the collective probability value to a collective probability value threshold for the operating condition.
  • Other embodiments are described and claimed.
  • Various embodiments may comprise one or more elements.
  • An element may comprise any structure arranged to perform certain operations.
  • Each element may be implemented as hardware, software, or any combination thereof, as desired for a given set of design parameters or performance constraints.
  • Although an embodiment may be described with a limited number of elements in a certain topology by way of example, the embodiment may include more or fewer elements in alternate topologies as desired for a given implementation.
  • any reference to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrases “in one embodiment,” “in some embodiments,” and “in various embodiments” in various places in the specification are not necessarily all referring to the same embodiment.
  • FIG. 1 illustrates an example of an operating environment 100 that may be representative of various embodiments.
  • a data center 102 comprises a plurality of servers 112.
  • the plurality of servers 112 are distributed among four server racks 104, 106, 108, and 110.
  • data center 102 may generally occupy and/or be comprised within a substantially-enclosed area 114.
  • substantially-enclosed area 114 may comprise a dedicated room or enclosure for data center 102.
  • substantially-enclosed area 114 may comprise a region within a room or enclosure that houses other equipment in addition to data center 102.
  • a climate control system 116 may generally be configured to maintain desired/appropriate ambient air conditions in substantially-enclosed area 114.
  • climate control system 116 may be configured to provide cooling in order to counteract the generation of heat by the various servers 112 of data center 102 and/or other devices in substantially-enclosed area 114. Additional examples of operations that climate control system 116 may perform in various embodiments in conjunction with maintaining desired/appropriate ambient air conditions in substantially-enclosed area 114 may include, without limitation, heating, humidification, dehumidification, and air purification.
  • various components of the devices therein may reasonably be expected to wear out and fail over time.
  • the failure of a given component may disrupt operations within data center 102.
  • such a disruption may result in reduced availability of data stored in data center 102 to clients wishing to access that data.
  • a health monitoring scheme may be implemented in order to enable identification of unhealthy components in data center 102.
  • FIG. 2 illustrates an example of an operating environment 200 that may be representative of the implementation of such a health monitoring scheme for data center 102 of FIG. 1 according to some embodiments. More particularly, operating environment 200 may be representative of a health monitoring scheme according to which potential component health issues are identified based on detected anomalies associated with various types of measurements and/or other indicators of conditions within data center 102.
  • In operating environment 200, during ongoing operation, data center 102 generates and/or assembles health parameters 218 and provides them to a health monitoring management node 220.
  • health parameters 218 may comprise measurements, indicators, and/or other parameters that are generally indicative of one or more operating conditions within data center 102.
  • health parameters 218 may include one or more measurements, indicators, and/or other parameters that are generally indicative of ambient air conditions within data center 102.
  • health parameters 218 may additionally or alternatively include one or more measurements, indicators, and/or other parameters that are generally indicative of other types of operating conditions within data center 102. Examples of health parameters 218 according to some embodiments may include, without limitation, temperature measurements, voltage measurements, current measurements, magnetic field measurements, humidity measurements, and air purity measurements. It is to be appreciated that health monitoring management node 220 may not necessarily be external to data center 102.
  • health monitoring management node 220 may be hosted/implemented on a server within data center 102. It is also to be appreciated that in some embodiments, some or all of health parameters 218 may be indicative of operating conditions of/at particular components within data center 102. For example, a given health parameter 218 may comprise a measurement of the power supply voltage at a particular server within data center 102. As such, in various embodiments, health parameters 218 may originate from multiple devices and/or locations within data center 102. The embodiments are not limited in this context.
  • FIG. 3 illustrates an example of a server 300 that may be representative of some embodiments.
  • server 300 may be representative of a given one of the plurality of servers 112 in operating environment 100 of FIG. 1.
  • server 300 may comprise one or more sensors of various types.
  • server 300 may comprise one or more internal air temperature sensors 350, which may be operative to measure the ambient air temperature at one or more locations within a case, chassis, or cabinet of server 300.
  • server 300 may additionally or alternatively comprise one or more component temperature sensors 352, which may be operative to measure the temperatures of one or more components within server 300.
  • server 300 may additionally or alternatively comprise one or more air intake temperature sensors, which may be operative to measure the temperature of incoming air as it is drawn into server 300 through one or more air inlets.
  • server 300 may additionally or alternatively comprise one or more power supply voltage sensors 356, which may be operative to measure the power supply voltage that is provided to server 300 via a voltage regulator of server 300.
  • server 300 may additionally or alternatively comprise one or more other sensors 358, which may perform measurements associated with one or more other characteristics associated with conditions at/in server 300. Examples of such other characteristics may include, without limitation, measurements of the power supply current at server 300, measurements of a magnetic field incident upon server 300, measurements of the humidity of the ambient air within server 300 and/or the air flowing into server 300, and measurements of the purity of the ambient air within server 300 and/or the air flowing into server 300. It is to be appreciated that FIG. 3 is simply intended to illustrate examples of various types of sensors that may be comprised in server 300, and is not intended to depict a complete architecture for server 300. The embodiments are not limited in this context.
  • health monitoring management node 220 may generally provide component health monitoring for data center 102.
  • health monitoring management node 220 may analyze health parameters 218 with which it is provided by data center 102.
  • health monitoring management node 220 may identify potential component health issues by detecting anomalies in values of health parameters 218.
  • health parameters 218 may include measurements provided by one or more servers within data center 102.
  • the measurements provided by any particular one of such servers may include measurements performed by one or more sensors of that server, such as measurements performed by one or more of the example sensors previously discussed in reference to server 300 of FIG. 3.
  • health monitoring management node 220 may generally use historical health parameters 222 to check for anomalies in the values of received health parameters 218.
  • historical health parameters 222 may generally comprise previously observed values with respect to one or more health parameters.
  • historical health parameters 222 may include information indicating, or usable to determine, a historically-observed distribution with respect to one or more characteristics associated with conditions within data center 102.
  • health monitoring management node 220 may comprise one of multiple health monitoring management nodes that provides component health monitoring for data center 102.
  • data center 102 may comprise one of multiple data centers for which health monitoring management node 220 provides component health monitoring. The embodiments are not limited in this context.
  • FIG. 4 illustrates a block diagram of an example of an apparatus 400 that may implement health monitoring management node 220 of FIG. 2 according to some embodiments.
  • apparatus 400 comprises multiple elements including circuitry 402, memory 404, and storage 406.
  • the embodiments, however, are not limited to the type, number, or arrangement of elements shown in this figure.
  • apparatus 400 may comprise circuitry 402.
  • Circuitry 402 may be arranged to execute one or more software or firmware implemented modules or components, which may include a modeling component 408 and an anomaly detection component 410.
  • circuitry 402 may comprise circuitry of a processor or logic device, such as a complex instruction set computer (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, an x86 instruction set compatible processor, a processor implementing a combination of instruction sets, a multi-core processor such as a dual-core processor or dual-core mobile processor, or any other microprocessor or central processing unit (CPU).
  • circuitry 402 may comprise circuitry of a dedicated processor, such as a controller, a microcontroller, an embedded processor, a chip multiprocessor (CMP), a co-processor, a digital signal processor (DSP), a network processor, a media processor, an input/output (I/O) processor, a media access control (MAC) processor, a radio baseband processor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic device (PLD), and so forth.
  • circuitry 402 may be implemented using any of various commercially available processors, including –without limitation – processors; application, embedded and secure processors; processors; IBM and processors; Core (2) Core i3, Core i5, Core i7, Xeon processors; and similar processors. The embodiments are not limited in this context.
  • apparatus 400 may comprise or be arranged to communicatively couple with memory 404.
  • Memory 404 may be implemented using any machine-readable or computer-readable media capable of storing data, including both volatile and non-volatile memory.
  • memory 404 may include read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, or any other type of media suitable for storing information.
  • some portion or all of memory 404 may be included on the same integrated circuit as circuitry 402, or alternatively some portion or all of memory 404 may be disposed on an integrated circuit or other medium, for example a hard disk drive, that is external to the integrated circuit of circuitry 402.
  • Although memory 404 is comprised within apparatus 400 in FIG. 4, memory 404 may be external to apparatus 400 in some embodiments. The embodiments are not limited in this context.
  • apparatus 400 may comprise storage 406.
  • Storage 406 may be implemented as a non-volatile storage device such as, but not limited to, a magnetic disk drive, optical disk drive, tape drive, an internal storage device, an attached storage device, flash memory, battery backed-up SDRAM (synchronous DRAM), and/or a network accessible storage device.
  • storage 406 may include technology to increase storage performance and provide enhanced protection for valuable digital media when multiple hard drives are included, for example.
  • storage 406 may include a hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of DVD devices, a tape device, a cassette device, or the like. The embodiments are not limited in this context.
  • modeling component 408 may be executed by circuitry 402 to configure a probabilistic model 424 for an operating condition of a data center 450.
  • probabilistic model 424 may generally define a set of expectations regarding observations associated with that operating condition.
  • the operating condition may generally comprise an ambient condition within data center 450, such as an ambient air temperature within data center 450.
  • the operating condition may generally comprise a condition associated with a particular component within data center 450, such as a condition associated with data center node 460.
  • data center node 460 may comprise a server of data center 450, and the operating condition may comprise a temperature within that server.
  • Examples of operating conditions for which modeling component 408 may configure a probabilistic model 424 may include, without limitation, temperatures, voltages, currents, magnetic fields, humidity levels, and air purity levels. It is to be appreciated that, in various embodiments, modeling component 408 may be executed by circuitry 402 to configure multiple probabilistic models 424 for multiple respective operating conditions of data center 450. The embodiments are not limited in this context.
  • probabilistic model 424 may define a distribution that observations/measurements associated with an operating condition of data center 450 are expected to exhibit in the absence of a health issue.
  • a probabilistic model 424 for a power supply voltage of data center node 460 may generally define a distribution that measurements of that power supply voltage are expected to exhibit if a voltage regulator of data center node 460 is functioning properly.
  • probabilistic model 424 may define a univariate probability distribution for observations associated with the operating condition.
  • Examples of such a univariate probability distribution may include a normal distribution, a Student’s t-distribution, a chi-square distribution, an F-distribution, a gamma distribution, and a multinomial distribution.
  • the embodiments are not limited to these examples.
  • modeling component 408 may configure a probabilistic model 424 for an operating condition of data center 450 based on a plurality of historical observations 426 associated with that operating condition.
  • historical observations 426 may comprise a set of previous observations/measurements associated with the operating condition.
  • for example, where data center node 460 comprises a server, historical observations 426 may comprise a set of previously-measured temperatures within that server.
  • modeling component 408 may configure probabilistic model 424 by identifying, for a given probability distribution, distribution parameters that best fit that distribution to a plurality of historical observations 426.
  • modeling component 408 may configure probabilistic model 424 by identifying values of $\mu$ and $\sigma$ that best fit a normal distribution to a plurality of historical observations 426.
  • modeling component 408 may retrieve historical observations 426 from storage 406 in conjunction with configuring probabilistic model 424 based on historical observations 426. The embodiments are not limited to this example.
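The fitting step described above can be illustrated with a minimal sketch that estimates the parameters of a normal distribution from historical temperature readings. The sample values and variable names are hypothetical. `statistics.NormalDist.from_samples` is only one way to estimate the parameters (it uses the sample mean and the sample standard deviation); the source does not specify a fitting method.

```python
from statistics import NormalDist

# Hypothetical historical observations 426: internal server temperatures
# in degrees Celsius (illustrative values, not taken from the source).
historical_observations = [41.0, 42.5, 40.8, 43.1, 41.7, 42.0, 40.5, 42.9]

# Estimate mu (mean) and sigma (sample standard deviation) from the data.
# This is one possible fitting method, assumed for illustration.
model = NormalDist.from_samples(historical_observations)

print(f"mu = {model.mean:.3f}, sigma = {model.stdev:.3f}")
```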
  • probabilistic model 424 may comprise information indicating, or usable to determine, an estimated probability that any given observation associated with an operating condition of data center 450 will match a particular value or fall within a particular range of values.
  • probabilistic model 424 may comprise information indicating, or usable to determine, an estimated probability that any given measurement of the temperature within that server will be equal to a value $T$, or will fall within a range of values $T_1$ to $T_2$.
  • the embodiments are not limited to this example.
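A short sketch of how such probabilities might be read off a fitted model follows. It assumes a normal model with hypothetical parameters. Because a continuous distribution assigns zero probability to any exact value, the sketch treats "equal to T" as "within one sensor resolution step of T"; that interpretation, the `resolution` parameter, and the function names are assumptions, not details from the source.

```python
from statistics import NormalDist

# Hypothetical model of a server's internal temperature (degrees Celsius).
model = NormalDist(mu=42.0, sigma=1.0)

def prob_in_range(dist, t1, t2):
    """Estimated probability that a measurement falls between t1 and t2."""
    return dist.cdf(t2) - dist.cdf(t1)

def prob_near(dist, t, resolution=0.5):
    """Estimated probability that a measurement reads as t, assuming the
    sensor reports values in steps of `resolution` (an assumption)."""
    half = resolution / 2
    return prob_in_range(dist, t - half, t + half)

print(prob_in_range(model, 41.0, 43.0))  # mass within one sigma of the mean
print(prob_near(model, 45.0))            # a reading three sigma above the mean
```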
  • apparatus 400 may receive an observation set 462 from data center node 460.
  • observation set 462 may generally comprise a set of recent observations/measurements associated with a same operating condition of data center 450 as are historical observations 426.
  • for example, where historical observations 426 comprise previously-measured temperatures within a server, observation set 462 may comprise more recently measured temperatures within that server. The embodiments are not limited to this example.
  • anomaly detection component 410 may determine a probability value set 436 based on observation set 462.
  • probability value set 436 may comprise a respective probability value for each observation in observation set 462.
  • the embodiments are not limited in this context.
  • anomaly detection component 410 may determine a collective probability value 438 for observation set 462.
  • collective probability value 438 may comprise a probability value indicating an estimated probability of observing the set of observations/measurements comprised in observation set 462.
  • anomaly detection component 410 may determine collective probability value 438 based on probability value set 436.
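  • anomaly detection component 410 may use probability value set 436 to determine collective probability value 438 based on an independence assumption.
  • anomaly detection component 410 may determine collective probability value 438 based on probability value set 436 according to an algorithm that treats each probability value in probability value set 436 as an independent variable with respect to each other probability value in probability value set 436.
  • under that assumption, collective probability value 438 is the product of the probability values in probability value set 436, $\mathrm{CPV} = \prod_{i=1}^{n} p(x_i)$, where CPV represents collective probability value 438, $n$ is the number of observations in observation set 462, and $p(x_i)$ is the probability value for the $i$th observation $x_i$.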
  • the embodiments are not limited in this context.
  • anomaly detection component 410 may determine a collective probability value 438 for that observation set 462 and compare the collective probability value 438 to a collective probability value threshold 434 for the operating condition. In some embodiments, anomaly detection component 410 may determine that the observation set 462 is not indicative of an anomaly when the collective probability value 438 is greater than the collective probability value threshold 434. In various embodiments, anomaly detection component 410 may determine that the observation set 462 is indicative of an anomaly when the collective probability value 438 is less than the collective probability value threshold 434. The embodiments are not limited in this context.
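The product-and-compare step can be sketched as follows. The probability values, the threshold, and the function names are hypothetical. The source does not say how a collective probability value exactly equal to the threshold is handled, so the sketch treats that case as non-anomalous by assumption.

```python
import math

def collective_probability_value(probability_values):
    """Product of the per-observation probability values, which treats
    the observations as independent."""
    return math.prod(probability_values)

def is_anomalous(probability_values, threshold):
    """True when the collective probability value is below the threshold.
    Equality is treated as non-anomalous (an assumption of this sketch)."""
    return collective_probability_value(probability_values) < threshold

# Hypothetical probability value sets and threshold.
typical = [0.20, 0.18, 0.19, 0.17]
unusual = [0.02, 0.01, 0.03, 0.02]
threshold = 1e-4

print(is_anomalous(typical, threshold))
print(is_anomalous(unusual, threshold))
```

For long observation sets the product can underflow to zero in floating point. Summing the logarithms of the probability values and comparing against the logarithm of the threshold gives the same decision.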
  • modeling component 408 may determine a collective probability value threshold 434 for any given operating condition of data center 450 according to a threshold determination procedure.
  • the threshold determination procedure may generally comprise determining a collective probability value threshold 434 for a given operating condition of data center 450 based on historical observations 426 associated with that operating condition.
  • modeling component 408 may identify a plurality of historical observation sets 428 associated with an operating condition of data center 450.
  • each such historical observation set 428 may comprise a plurality of historical observations 426 associated with the operating condition. The embodiments are not limited in this context.
  • modeling component 408 may determine a plurality of historical probability value sets 430 based on a plurality of historical observation sets 428.
  • each of the plurality of historical probability value sets 430 may correspond to a respective one of the plurality of historical observation sets 428.
  • each one of the plurality of historical probability value sets 430 may comprise a respective probability value for each observation in its corresponding historical observation set 428.
  • modeling component 408 may determine each probability value in each historical probability value set 430 based on the probabilistic model 424 for the operating condition with which the plurality of historical observation sets 428 are associated.
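  • given a plural number $T$ of $n$-element historical observation sets 428 of the form $X^{(\tau)} = \{x_1^{(\tau)}, \ldots, x_n^{(\tau)}\}$, modeling component 408 may determine that plural number $T$ of $n$-element historical probability value sets 430 of the form $P^{(\tau)} = \{p(x_1^{(\tau)}), \ldots, p(x_n^{(\tau)})\}$, where $p(x_i^{(\tau)})$ represents a probability value indicating an estimated probability of observing $x_i^{(\tau)}$ according to the applicable probabilistic model 424.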
  • the embodiments are not limited in this context.
  • modeling component 408 may determine a plurality of historical collective probability values 432 based on a plurality of historical probability value sets 430.
  • each of the plurality of historical collective probability values 432 may correspond to a respective one of the plurality of historical probability value sets 430.
  • each one of the plurality of historical collective probability values 432 may comprise a probability value indicating an estimated probability of observing the set of observations/measurements comprised in a respective one of the plurality of historical observation sets 428.
  • modeling component 408 may use the plurality of historical probability value sets 430 to determine the plurality of historical collective probability values 432 based on an independence assumption.
  • modeling component 408 may determine the plurality of historical collective probability values 432 based on the plurality of historical probability value sets 430 according to an algorithm that treats each probability value in any particular historical probability value set 430 as an independent variable with respect to each other probability value in that historical probability value set 430.
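  • under that assumption, each historical collective probability value 432 is the product of the $n$ probability values $p(x_i^{(\tau)})$ in the corresponding set, $\mathrm{HCPV}_\tau = \prod_{i=1}^{n} p(x_i^{(\tau)})$, where $\mathrm{HCPV}_\tau$ represents the historical collective probability value 432 associated with the $\tau$th historical probability value set 430.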
  • the embodiments are not limited in this context.
  • modeling component 408 may determine a collective probability value threshold 434 for a given operating condition of data center 450 based on a plurality of historical collective probability values 432 associated with that operating condition. In various embodiments, modeling component 408 may determine such a collective probability value threshold 434 as a smallest historical collective probability value 432 comprised among the plurality of historical collective probability values 432 associated with the operating condition. In some other embodiments, modeling component 408 may determine such a collective probability value threshold 434 as a value corresponding to a predefined percentile with respect to the plurality of historical collective probability values 432. In various embodiments, based on a plural number T of historical collective probability values 432, modeling component 408 may determine collective probability value threshold 434 according to Equation (3) as follows:
  • represents the collective probability value threshold 434.
  • the embodiments are not limited in this context.
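The two threshold choices described above, the smallest historical collective probability value and a predefined percentile, can be sketched as follows. The function names and sample values are hypothetical. The nearest-rank percentile definition is an assumption, since the source does not say how the percentile is computed.

```python
import math

def threshold_from_minimum(hcpvs):
    """Smallest historical collective probability value."""
    return min(hcpvs)

def threshold_from_percentile(hcpvs, percentile):
    """Value at the given percentile, using the nearest-rank definition
    (one common convention, assumed here)."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(hcpvs)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical historical collective probability values 432.
hcpvs = [0.02, 0.005, 0.01, 0.03, 0.001]

print(threshold_from_minimum(hcpvs))
print(threshold_from_percentile(hcpvs, 40))
```

Taking the minimum means no historical observation set would have been flagged. A low percentile accepts that a small fraction of historical sets fall below the threshold, which may help when the history itself contains a few outliers.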
  • anomaly detection component 410 may generate anomaly detection information 440.
  • anomaly detection information 440 may comprise information identifying the operating condition with respect to which anomaly detection component 410 has identified the anomaly.
  • anomaly detection information 440 may additionally or alternatively comprise information identifying a component with which the operating condition is associated. For example, having detected an anomaly in the internal temperature of a particular server, anomaly detection component 410 may generate anomaly detection information 440 that identifies the server and indicates that anomalous internal temperature readings have been observed that indicate a possible health issue with respect to a cooling fan of that server. The embodiments are not limited to this example.
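One possible shape for such anomaly detection information is sketched below as a small record type. Every field name here is hypothetical. The source states only that the information may identify the operating condition and the associated component, and may indicate a possible health issue.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AnomalyDetectionInfo:
    """Illustrative record for anomaly detection information 440.
    All field names are assumptions made for this sketch."""
    operating_condition: str                # e.g. "internal temperature"
    component_id: Optional[str] = None      # e.g. an identifier for a server
    suspected_issue: Optional[str] = None   # e.g. "cooling fan"

info = AnomalyDetectionInfo(
    operating_condition="internal temperature",
    component_id="server-17",               # hypothetical identifier
    suspected_issue="cooling fan",
)
print(info)
```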
  • Some of the figures may include a logic flow. Although such figures presented herein may include a particular logic flow, it can be appreciated that the logic flow merely provides an example of how the general functionality as described herein can be implemented. Further, the given logic flow does not necessarily have to be executed in the order presented unless otherwise indicated. In addition, the given logic flow may be implemented by a hardware element, a software element executed by a processor, or any combination thereof. The embodiments are not limited in this context.
  • FIG. 5 illustrates an example of a logic flow 500 that may be representative of operations that may be performed in conjunction with one or more of the disclosed techniques according to various embodiments.
  • logic flow 500 may be representative of a threshold determination procedure that modeling component 408 of FIG. 4 may perform in some embodiments in order to determine a collective probability value threshold 434 for an operating condition of data center 450.
  • a plurality of historical observation sets may be identified at 502 that are associated with an operating condition of a data center.
  • modeling component 408 of FIG. 4 may identify a plurality of historical observation sets 428 that are associated with an operating condition of data center 450.
  • a plurality of historical probability value sets may be determined that are associated with an operating condition of a data center, and each of the plurality of historical probability value sets may correspond to a respective one of a plurality of historical observation sets associated with that operating condition of the data center.
  • modeling component 408 of FIG. 4 may determine a plurality of historical probability value sets 430 associated with an operating condition of data center 450, each of which may correspond to a respective one of a plurality of historical observation sets 428 associated with that operating condition of data center 450.
  • a plurality of historical collective probability values may be determined that are associated with an operating condition of a data center, and each of the plurality of historical collective probability values may correspond to a respective one of a plurality of historical probability value sets associated with that operating condition of the data center.
  • modeling component 408 of FIG. 4 may determine a plurality of historical collective probability values 432 associated with an operating condition of data center 450, each of which may correspond to a respective one of a plurality of historical probability value sets 430 associated with that operating condition of data center 450.
  • a collective probability value threshold associated with an operating condition of a data center may be determined based on a plurality of historical collective probability values associated with that operating condition of the data center.
  • modeling component 408 of FIG. 4 may determine a collective probability value threshold 434 for an operating condition associated with data center 450 based on a plurality of historical collective probability values 432 associated with that operating condition of data center 450.
  • the embodiments are not limited to these examples.
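  • The operations of logic flow 500 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of modeling component 408: the probabilistic model is assumed to be a normal distribution fitted to the pooled historical observations, each probability value is assumed to be the density of an observation under that model, and the threshold is assumed to be the smallest historical collective probability value. All function names and the sample temperature readings are hypothetical.

```python
import math
from statistics import NormalDist


def probability_values(observations, model):
    """One probability value (here: a normal density) per observation."""
    return [model.pdf(x) for x in observations]


def collective_probability(values):
    """Collective probability value: the product of the set's values."""
    return math.prod(values)


def determine_threshold(historical_sets, model):
    """Derive a collective probability value threshold from history."""
    # One historical probability value set per historical observation set.
    value_sets = [probability_values(s, model) for s in historical_sets]
    # One historical collective probability value per probability value set.
    collective = [collective_probability(v) for v in value_sets]
    # Assumed rule: take the smallest historical collective value.
    return min(collective)


# Hypothetical historical observation sets (e.g., temperature readings).
history = [[20.0, 21.0, 19.5], [20.5, 20.0, 21.5], [19.0, 20.0, 20.5]]
model = NormalDist.from_samples([x for s in history for x in s])
threshold = determine_threshold(history, model)
```

  • With this rule, every historical observation set has a collective probability value at or above the threshold, so none of the history would be flagged.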
  • FIG. 6 illustrates an example of a logic flow 600 that may be representative of operations that may be performed in conjunction with one or more of the disclosed techniques according to various embodiments.
  • logic flow 600 may be representative of operations that may be performed in some embodiments by anomaly detection component 410 of FIG. 4 in conjunction with the performance of component health monitoring for data center 450.
  • an observation set may be identified at 602 that comprises a plurality of observations associated with an operating condition of a data center.
  • anomaly detection component 410 of FIG. 4 may identify an observation set 462 that comprises a plurality of observations associated with an operating condition of data center 450.
  • a probability value set may be determined for the observation set, and the probability value set may comprise a plurality of probability values, each one of which may correspond to a respective one of the plurality of observations in the observation set identified at 602.
  • anomaly detection component 410 of FIG. 4 may determine, for observation set 462, a probability value set 436 comprising a plurality of probability values, each one of which may correspond to a respective one of a plurality of observations in observation set 462.
  • a collective probability value may be determined for the observation set as a product of the plurality of probability values in the probability value set.
  • anomaly detection component 410 of FIG. 4 may determine a collective probability value 438 for observation set 462 as a product of a plurality of probability values in a probability value set 436 for observation set 462.
  • the collective probability value may be compared to a collective probability value threshold for the operating condition in order to determine whether the observation set is indicative of an anomaly.
  • anomaly detection component 410 of FIG. 4 may compare a collective probability value 438 to a collective probability value threshold 434 in order to determine whether an observation set 462 indicates an anomaly with respect to an operating condition of data center 450.
  • the embodiments are not limited to these examples.
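  • The operations of logic flow 600 can be sketched in Python as follows, again assuming a normal model whose densities serve as probability values. The function names, the model parameters, and the threshold of `1e-4` are hypothetical. Treating a collective value equal to the threshold as non-anomalous is also an assumption. The second function is a numerical variation that compares sums of logarithms instead of products, which may help avoid floating-point underflow for large observation sets; it assumes a positive threshold.

```python
import math
from statistics import NormalDist


def is_anomalous(observation_set, model, threshold):
    """Compare the product of per-observation values to the threshold."""
    values = [model.pdf(x) for x in observation_set]  # probability value set
    collective = math.prod(values)                    # collective value
    return collective < threshold                     # below threshold: anomaly


def is_anomalous_log(observation_set, model, threshold):
    """Same decision in log space; requires threshold > 0."""
    log_collective = 0.0
    for x in observation_set:
        p = model.pdf(x)
        if p == 0.0:
            return True  # product would be zero, below any positive threshold
        log_collective += math.log(p)
    return log_collective < math.log(threshold)


model = NormalDist(mu=20.0, sigma=1.0)  # hypothetical temperature model
print(is_anomalous([20.0, 20.5, 19.5], model, 1e-4))  # readings near the mean
print(is_anomalous([27.0, 28.0, 27.5], model, 1e-4))  # readings far from the mean
```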
  • FIG. 7 illustrates an embodiment of a storage medium 700.
  • Storage medium 700 may comprise any non-transitory computer-readable storage medium or machine-readable storage medium, such as an optical, magnetic or semiconductor storage medium. In various embodiments, storage medium 700 may comprise an article of manufacture.
  • storage medium 700 may store computer-executable instructions, such as computer-executable instructions to implement one or both of logic flow 500 of FIG. 5 and logic flow 600 of FIG. 6.
  • Examples of a computer-readable storage medium or machine-readable storage medium may include any tangible media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth.
  • Examples of computer-executable instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, object-oriented code, visual code, and the like. The embodiments are not limited in this context.
  • FIG. 8 illustrates an embodiment of an exemplary computing architecture 800 suitable for implementing various embodiments as previously described.
  • the computing architecture 800 may comprise or be implemented as part of an electronic device.
  • the computing architecture 800 may be representative, for example, of one or both of health monitoring management node 220 of FIG. 2 and apparatus 400 of FIG. 4. The embodiments are not limited in this context.
  • a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer.
  • both an application running on a server and the server itself can be components.
  • One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Further, components may be communicatively coupled to each other by various types of communications media to coordinate operations. The coordination may involve the uni-directional or bi-directional exchange of information. For instance, the components may communicate information in the form of signals communicated over the communications media. The information can be implemented as signals allocated to various signal lines. In such allocations, each message is a signal. Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces.
  • the computing architecture 800 includes various common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components, power supplies, and so forth.
  • the embodiments are not limited to implementation by the computing architecture 800.
  • the computing architecture 800 comprises a processing unit 804, a system memory 806 and a system bus 808.
  • the processing unit 804 can be any of various commercially available processors, including without limitation application, embedded and secure processors; IBM and Cell processors; Core (2) processors; and similar processors. Dual microprocessors, multi-core processors, and other multi-processor architectures may also be employed as the processing unit 804.
  • the system bus 808 provides an interface for system components including, but not limited to, the system memory 806 to the processing unit 804.
  • the system bus 808 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures.
  • Interface adapters may connect to the system bus 808 via a slot architecture.
  • Example slot architectures may include without limitation Accelerated Graphics Port (AGP), Card Bus, (Extended) Industry Standard Architecture ((E)ISA), Micro Channel Architecture (MCA), NuBus, Peripheral Component Interconnect (Extended) (PCI(X)), PCI Express, Personal Computer Memory Card International Association (PCMCIA), and the like.
  • the system memory 806 may include various types of computer-readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory, solid state drives (SSD)), and any other type of storage media suitable for storing information.
  • the computer 802 may include various types of computer-readable storage media in the form of one or more lower speed memory units, including an internal (or external) hard disk drive (HDD) 814, a magnetic floppy disk drive (FDD) 816 to read from or write to a removable magnetic disk 818, and an optical disk drive 820 to read from or write to a removable optical disk 822 (e.g., a CD-ROM or DVD).
  • the HDD 814, FDD 816 and optical disk drive 820 can be connected to the system bus 808 by a HDD interface 824, an FDD interface 826 and an optical drive interface 828, respectively.
  • the HDD interface 824 for external drive implementations can include at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.
  • the drives and associated computer-readable media provide volatile and/or nonvolatile storage of data, data structures, computer-executable instructions, and so forth.
  • a number of program modules can be stored in the drives and memory units 810, 812, including an operating system 830, one or more application programs 832, other program modules 834, and program data 836.
  • the one or more application programs 832, other program modules 834, and program data 836 can include, for example, the various applications and/or components of one or both of health monitoring management node 220 of FIG. 2 and apparatus 400 of FIG. 4.
  • a user can enter commands and information into the computer 802 through one or more wire/wireless input devices, for example, a keyboard 838 and a pointing device, such as a mouse 840.
  • Other input devices may include microphones, infra-red (IR) remote controls, radio-frequency (RF) remote controls, game pads, stylus pens, card readers, dongles, finger print readers, gloves, graphics tablets, joysticks, keyboards, retina readers, touch screens (e.g., capacitive, resistive, etc.), trackballs, trackpads, sensors, styluses, and the like.
  • input devices are often connected to the processing unit 804 through an input device interface 842 that is coupled to the system bus 808, but can be connected by other interfaces such as a parallel port, IEEE 1394 serial port, a game port, a USB port, an IR interface, and so forth.
  • a monitor 844 or other type of display device is also connected to the system bus 808 via an interface, such as a video adaptor 846.
  • the monitor 844 may be internal or external to the computer 802.
  • a computer typically includes other peripheral output devices, such as speakers, printers, and so forth.
  • the computer 802 may operate in a networked environment using logical connections via wire and/or wireless communications to one or more remote computers, such as a remote computer 848.
  • the remote computer 848 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 802, although, for purposes of brevity, only a memory/storage device 850 is illustrated.
  • the logical connections depicted include wire/wireless connectivity to a local area network (LAN) 852 and/or larger networks, for example, a wide area network (WAN) 854.
  • LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet.
  • When used in a LAN networking environment, the computer 802 is connected to the LAN 852 through a wire and/or wireless communication network interface or adaptor 856.
  • the adaptor 856 can facilitate wire and/or wireless communications to the LAN 852, which may also include a wireless access point disposed thereon for communicating with the wireless functionality of the adaptor 856.
  • the computer 802 can include a modem 858, or be connected to a communications server on the WAN 854, or have other means for establishing communications over the WAN 854, such as by way of the Internet.
  • the modem 858, which can be internal or external and a wire and/or wireless device, connects to the system bus 808 via the input device interface 842.
  • program modules depicted relative to the computer 802, or portions thereof, can be stored in the remote memory/storage device 850. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
  • the computer 802 is operable to communicate with wire and wireless devices or entities using the IEEE 802 family of standards, such as wireless devices operatively disposed in wireless communication (e.g., IEEE 802.16 over-the-air modulation techniques).
  • the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices.
  • Wi-Fi networks use radio technologies called IEEE 802.11x (a, b, g, n, etc.) to provide secure, reliable, fast wireless connectivity.
  • a Wi-Fi network can be used to connect computers to each other, to the Internet, and to wire networks (which use IEEE 802.3-related media and functions).
  • FIG. 9 illustrates a block diagram of an exemplary communications architecture 900 suitable for implementing various embodiments as previously described.
  • the communications architecture 900 includes various common communications elements, such as a transmitter, receiver, transceiver, radio, network interface, baseband processor, antenna, amplifiers, filters, power supplies, and so forth.
  • the embodiments are not limited to implementation by the communications architecture 900.
  • the communications architecture 900 includes one or more clients 902 and servers 904.
  • the clients 902 and the servers 904 are operatively connected to one or more respective client data stores 908 and server data stores 910 that can be employed to store information local to the respective clients 902 and servers 904, such as cookies and/or associated contextual information.
  • Any one of clients 902 and/or servers 904 may implement one or more of health monitoring management node 220 of FIG. 2, apparatus 400 of FIG. 4, logic flow 500 of FIG. 5, logic flow 600 of FIG. 6, storage medium 700 of FIG. 7, and computing architecture 800 of FIG. 8.
  • one or both of health monitoring management node 220 of FIG. 2 and apparatus 400 of FIG. 4 may be implemented in one or more switching devices and/or routing devices in communications framework 906.
  • the clients 902 and the servers 904 may communicate information between each other using a communications framework 906.
  • the communications framework 906 may implement any well-known communications techniques and protocols.
  • the communications framework 906 may be implemented as a packet-switched network (e.g., public networks such as the Internet, private networks such as an enterprise intranet, and so forth), a circuit-switched network (e.g., the public switched telephone network), or a combination of a packet-switched network and a circuit-switched network (with suitable gateways and translators).
  • the communications framework 906 may implement various network interfaces arranged to accept, communicate, and connect to a communications network.
  • a network interface may be regarded as a specialized form of an input output interface.
  • Network interfaces may employ connection protocols including without limitation direct connect, Ethernet (e.g., thick, thin, twisted pair 10/100/1000 Base T, and the like), token ring, wireless network interfaces, cellular network interfaces, IEEE 802.11a-x network interfaces, IEEE 802.16 network interfaces, IEEE 802.20 network interfaces, and the like.
  • multiple network interfaces may be used to engage with various communications network types. For example, multiple network interfaces may be employed to allow for the communication over broadcast, multicast, and unicast networks.
  • a communications network may be any one or a combination of wired and/or wireless networks including without limitation a direct interconnection, a secured custom connection, a private network (e.g., an enterprise intranet), a public network (e.g., the Internet), a Personal Area Network (PAN), a Local Area Network (LAN), a Metropolitan Area Network (MAN), an Operating Missions as Nodes on the Internet (OMNI), a Wide Area Network (WAN), a wireless network, a cellular network, and other communications networks.
  • Various embodiments may be implemented using hardware elements, software elements, or a combination of both.
  • hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.
  • Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
  • circuitry may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group), and/or memory (shared, dedicated, or group) that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable hardware components that provide the described functionality.
  • the circuitry may be implemented in, or functions associated with the circuitry may be implemented by, one or more software or firmware modules.
  • circuitry may include logic, at least partially operable in hardware.
  • One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein.
  • Such representations, known as “IP cores”, may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
  • Some embodiments may be implemented, for example, using a machine-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments.
  • Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software.
  • the machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like.
  • the instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
  • Example 1 is an apparatus, comprising circuitry, and an anomaly detection component for execution by the circuitry to identify an observation set comprising a plurality of observations associated with an operating condition of a data center, determine a probability value set for the observation set, the probability value set to comprise a plurality of probability values, each of the plurality of probability values to correspond to a respective one of the plurality of observations in the observation set, determine a collective probability value for the observation set as a product of the plurality of probability values in the probability value set, and determine whether the observation set is indicative of an anomaly by comparing the collective probability value to a collective probability value threshold for the operating condition.
  • Example 2 is the apparatus of Example 1, comprising a modeling component for execution by the circuitry to configure a probabilistic model for the operating condition, the anomaly detection component for execution by the circuitry to determine the probability value set based on the probabilistic model for the operating condition.
  • Example 3 is the apparatus of Example 2, the probabilistic model to define a univariate probability distribution for observations associated with the operating condition.
  • Example 4 is the apparatus of Example 3, the univariate probability distribution to comprise a normal distribution.
  • Example 5 is the apparatus of Example 3, the univariate probability distribution to comprise a multinomial distribution.
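  • The two univariate distributions named in Examples 4 and 5 could be configured from historical observations along the following lines. This Python sketch is illustrative only: the function names are hypothetical, the multinomial case is reduced to per-category relative frequencies for a single observation, and the additive smoothing (so that a category unseen in history does not receive a probability of zero) is an assumption that the Examples do not describe.

```python
from collections import Counter
from statistics import NormalDist


def fit_normal(historical_observations):
    """Normal model with mean and standard deviation estimated from history."""
    return NormalDist.from_samples(historical_observations)


def fit_categorical(historical_observations, categories, smoothing=1.0):
    """Per-category probabilities from smoothed relative frequencies."""
    counts = Counter(historical_observations)
    total = len(historical_observations) + smoothing * len(categories)
    return {c: (counts[c] + smoothing) / total for c in categories}


# Continuous observations, e.g., temperature readings.
normal_model = fit_normal([19.0, 20.0, 21.0])

# Discrete observations, e.g., a fan speed level reported by a sensor.
levels = ["low", "medium", "high"]
level_model = fit_categorical(["low"] * 8 + ["high"] * 2, levels)
```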
  • Example 6 is the apparatus of any of Examples 2 to 5, the modeling component for execution by the circuitry to configure the probabilistic model for the operating condition based on a plurality of historical observations associated with the operating condition.
  • Example 7 is the apparatus of Example 6, the modeling component for execution by the circuitry to define the collective probability value threshold for the operating condition based on the plurality of historical observations associated with the operating condition.
  • Example 8 is the apparatus of Example 7, the modeling component for execution by the circuitry to identify a plurality of historical observation sets associated with the operating condition, determine a plurality of historical probability value sets, each of the plurality of historical probability value sets to correspond to a respective one of the plurality of historical observation sets, determine a plurality of historical collective probability values, each of the plurality of historical collective probability values to correspond to a respective one of the plurality of historical probability value sets, and determine the collective probability value threshold based on the plurality of historical collective probability values.
  • Example 9 is the apparatus of Example 8, each of the plurality of historical probability value sets to comprise a respective plurality of probability values.
  • Example 10 is the apparatus of Example 9, the modeling component for execution by the circuitry to determine the respective plurality of probability values in each of the plurality of historical probability value sets based on the probabilistic model for the operating condition.
  • Example 11 is the apparatus of any of Examples 9 to 10, the modeling component for execution by the circuitry to determine each of the plurality of historical collective probability values as a product of the plurality of probability values in a respective one of the plurality of historical probability value sets.
  • Example 12 is the apparatus of any of Examples 8 to 11, the modeling component for execution by the circuitry to determine the collective probability value threshold as a smallest historical collective probability value among the plurality of historical collective probability values.
  • Example 13 is the apparatus of any of Examples 8 to 11, the modeling component for execution by the circuitry to determine the collective probability value threshold as a value corresponding to a predefined percentile with respect to the plurality of historical collective probability values.
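  • The two threshold rules of Examples 12 and 13 can be contrasted with a short Python sketch. The function names and sample values are hypothetical, and the nearest-rank definition of a percentile is an assumption, since the Examples do not specify how the percentile is computed.

```python
import math


def threshold_minimum(historical_collective):
    """Smallest historical collective probability value."""
    return min(historical_collective)


def threshold_percentile(historical_collective, percentile):
    """Nearest-rank percentile of the historical collective values."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(historical_collective)
    # Nearest rank: smallest 1-based rank covering `percentile` percent.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


values = [0.05, 0.01, 0.20, 0.10, 0.02]  # hypothetical historical values
low = threshold_minimum(values)
mid = threshold_percentile(values, 50)
```

  • A percentile-based threshold is less sensitive to a single unusually low historical value than the minimum, at the cost of flagging a fraction of the historical observation sets themselves.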
  • Example 14 is the apparatus of any of Examples 1 to 13, the anomaly detection component for execution by the circuitry to determine that the observation set is not indicative of an anomaly in response to a determination that the collective probability value is greater than the collective probability value threshold.
  • Example 15 is the apparatus of any of Examples 1 to 14, the anomaly detection component for execution by the circuitry to determine that the observation set is indicative of an anomaly in response to a determination that the collective probability value is less than the collective probability value threshold.
  • Example 16 is the apparatus of Example 15, the anomaly detection component for execution by the circuitry to generate anomaly detection information in response to the determination that the observation set is indicative of an anomaly.
  • Example 17 is the apparatus of Example 16, the operating condition associated with a component of the data center, the anomaly detection information to identify the component with which the operating condition is associated.
  • Example 18 is the apparatus of Example 17, the component to comprise a component of a server of the data center.
  • Example 19 is the apparatus of Example 18, the component to comprise a fan of the server.
  • Example 20 is the apparatus of Example 18, the component to comprise a voltage regulator of the server.
  • Example 21 is the apparatus of Example 17, the component to comprise a climate control system of the data center.
  • Example 22 is the apparatus of any of Examples 1 to 21, the plurality of observations to comprise temperature measurements.
  • Example 23 is the apparatus of any of Examples 1 to 21, the plurality of observations to comprise voltage measurements.
  • Example 24 is the apparatus of any of Examples 1 to 21, the plurality of observations to comprise current measurements.
  • Example 25 is the apparatus of any of Examples 1 to 21, the plurality of observations to comprise magnetic field measurements.
  • Example 26 is the apparatus of any of Examples 1 to 21, the plurality of observations to comprise humidity measurements.
  • Example 27 is the apparatus of any of Examples 1 to 21, the plurality of observations to comprise air purity measurements.
  • Example 28 is a system, comprising an apparatus according to any of Examples 1 to 27, and a display.
  • Example 29 is a method, comprising identifying, by a processor circuit, an observation set comprising a plurality of observations associated with an operating condition of a data center, determining a probability value set for the observation set, the probability value set to comprise a plurality of probability values, each of the plurality of probability values to correspond to a respective one of the plurality of observations in the observation set, determining a collective probability value for the observation set as a product of the plurality of probability values in the probability value set, and determining whether the observation set is indicative of an anomaly by comparing the collective probability value to a collective probability value threshold for the operating condition.
  • Example 30 is the method of Example 29, comprising configuring a probabilistic model for the operating condition, and determining the probability value set based on the probabilistic model for the operating condition.
  • Example 31 is the method of Example 30, the probabilistic model to define a univariate probability distribution for observations associated with the operating condition.
  • Example 32 is the method of Example 31, the univariate probability distribution to comprise a normal distribution.
  • Example 33 is the method of Example 31, the univariate probability distribution to comprise a multinomial distribution.
  • Example 34 is the method of any of Examples 30 to 33, comprising configuring the probabilistic model for the operating condition based on a plurality of historical observations associated with the operating condition.
  • Example 35 is the method of Example 34, comprising defining the collective probability value threshold for the operating condition based on the plurality of historical observations associated with the operating condition.
  • Example 36 is the method of Example 35, comprising identifying a plurality of historical observation sets associated with the operating condition, determining a plurality of historical probability value sets, each of the plurality of historical probability value sets to correspond to a respective one of the plurality of historical observation sets, determining a plurality of historical collective probability values, each of the plurality of historical collective probability values to correspond to a respective one of the plurality of historical probability value sets, and determining the collective probability value threshold based on the plurality of historical collective probability values.
  • Example 37 is the method of Example 36, each of the plurality of historical probability value sets to comprise a respective plurality of probability values.
  • Example 38 is the method of Example 37, comprising determining the respective plurality of probability values in each of the plurality of historical probability value sets based on the probabilistic model for the operating condition.
  • Example 39 is the method of any of Examples 37 to 38, comprising determining each of the plurality of historical collective probability values as a product of the plurality of probability values in a respective one of the plurality of historical probability value sets.
  • Example 40 is the method of any of Examples 36 to 39, comprising determining the collective probability value threshold as a smallest historical collective probability value among the plurality of historical collective probability values.
  • Example 41 is the method of any of Examples 36 to 39, comprising determining the collective probability value threshold as a value corresponding to a predefined percentile with respect to the plurality of historical collective probability values.
  • Example 42 is the method of any of Examples 29 to 41, comprising determining that the observation set is not indicative of an anomaly in response to a determination that the collective probability value is greater than the collective probability value threshold.
  • Example 43 is the method of any of Examples 29 to 42, comprising determining that the observation set is indicative of an anomaly in response to a determination that the collective probability value is less than the collective probability value threshold.
  • Example 44 is the method of Example 43, comprising generating anomaly detection information in response to the determination that the observation set is indicative of an anomaly.
  • Example 45 is the method of Example 44, the operating condition associated with a component of the data center, the anomaly detection information to identify the component with which the operating condition is associated.
  • Example 46 is the method of Example 45, the component to comprise a component of a server of the data center.
  • Example 47 is the method of Example 46, the component to comprise a fan of the server.
  • Example 48 is the method of Example 46, the component to comprise a voltage regulator of the server.
  • Example 49 is the method of Example 45, the component to comprise a climate control system of the data center.
  • Example 50 is the method of any of Examples 29 to 49, the plurality of observations to comprise temperature measurements.
  • Example 51 is the method of any of Examples 29 to 49, the plurality of observations to comprise voltage measurements.
  • Example 52 is the method of any of Examples 29 to 49, the plurality of observations to comprise current measurements.
  • Example 53 is the method of any of Examples 29 to 49, the plurality of observations to comprise magnetic field measurements.
  • Example 54 is the method of any of Examples 29 to 49, the plurality of observations to comprise humidity measurements.
  • Example 55 is the method of any of Examples 29 to 49, the plurality of observations to comprise air purity measurements.
  • Example 56 is at least one computer-readable storage medium comprising a set of instructions that, in response to being executed on a computing device, cause the computing device to perform a method according to any of Examples 29 to 55.
  • Example 57 is an apparatus, comprising means for performing a method according to any of Examples 29 to 55.
  • Example 58 is a system, comprising the apparatus of Example 57, and a display.
  • Example 59 is at least one computer-readable storage medium comprising a set of instructions that, in response to being executed on a computing device, cause the computing device to identify an observation set comprising a plurality of observations associated with an operating condition of a data center, determine a probability value set for the observation set, the probability value set to comprise a plurality of probability values, each of the plurality of probability values to correspond to a respective one of the plurality of observations in the observation set, determine a collective probability value for the observation set as a product of the plurality of probability values in the probability value set, and determine whether the observation set is indicative of an anomaly by comparing the collective probability value to a collective probability value threshold for the operating condition.
  • Example 60 is the at least one computer-readable storage medium of Example 59, comprising instructions that, in response to being executed on the computing device, cause the computing device to configure a probabilistic model for the operating condition, and determine the probability value set based on the probabilistic model for the operating condition.
  • Example 61 is the at least one computer-readable storage medium of Example 60, the probabilistic model to define a univariate probability distribution for observations associated with the operating condition.
  • Example 62 is the at least one computer-readable storage medium of Example 61, the univariate probability distribution to comprise a normal distribution.
  • Example 63 is the at least one computer-readable storage medium of Example 61, the univariate probability distribution to comprise a multinomial distribution.
  • Example 64 is the at least one computer-readable storage medium of any of Examples 60 to 63, comprising instructions that, in response to being executed on the computing device, cause the computing device to configure the probabilistic model for the operating condition based on a plurality of historical observations associated with the operating condition.
  • Example 65 is the at least one computer-readable storage medium of Example 64, comprising instructions that, in response to being executed on the computing device, cause the computing device to define the collective probability value threshold for the operating condition based on the plurality of historical observations associated with the operating condition.
  • Example 66 is the at least one computer-readable storage medium of Example 65, comprising instructions that, in response to being executed on the computing device, cause the computing device to identify a plurality of historical observation sets associated with the operating condition, determine a plurality of historical probability value sets, each of the plurality of historical probability value sets to correspond to a respective one of the plurality of historical observation sets, determine a plurality of historical collective probability values, each of the plurality of historical collective probability values to correspond to a respective one of the plurality of historical probability value sets, and determine the collective probability value threshold based on the plurality of historical collective probability values.
  • Example 67 is the at least one computer-readable storage medium of Example 66, each of the plurality of historical probability value sets to comprise a respective plurality of probability values.
  • Example 68 is the at least one computer-readable storage medium of Example 67, comprising instructions that, in response to being executed on the computing device, cause the computing device to determine the respective plurality of probability values in each of the plurality of historical probability value sets based on the probabilistic model for the operating condition.
  • Example 69 is the at least one computer-readable storage medium of any of Examples 67 to 68, comprising instructions that, in response to being executed on the computing device, cause the computing device to determine each of the plurality of historical collective probability values as a product of the plurality of probability values in a respective one of the plurality of historical probability value sets.
  • Example 70 is the at least one computer-readable storage medium of any of Examples 66 to 69, comprising instructions that, in response to being executed on the computing device, cause the computing device to determine the collective probability value threshold as a smallest historical collective probability value among the plurality of historical collective probability values.
  • Example 71 is the at least one computer-readable storage medium of any of Examples 66 to 69, comprising instructions that, in response to being executed on the computing device, cause the computing device to determine the collective probability value threshold as a value corresponding to a predefined percentile with respect to the plurality of historical collective probability values.
  • Example 72 is the at least one computer-readable storage medium of any of Examples 59 to 71, comprising instructions that, in response to being executed on the computing device, cause the computing device to determine that the observation set is not indicative of an anomaly in response to a determination that the collective probability value is greater than the collective probability value threshold.
  • Example 73 is the at least one computer-readable storage medium of any of Examples 59 to 72, comprising instructions that, in response to being executed on the computing device, cause the computing device to determine that the observation set is indicative of an anomaly in response to a determination that the collective probability value is less than the collective probability value threshold.
  • Example 74 is the at least one computer-readable storage medium of Example 73, comprising instructions that, in response to being executed on the computing device, cause the computing device to generate anomaly detection information in response to the determination that the observation set is indicative of an anomaly.
  • Example 75 is the at least one computer-readable storage medium of Example 74, the operating condition associated with a component of the data center, the anomaly detection information to identify the component with which the operating condition is associated.
  • Example 76 is the at least one computer-readable storage medium of Example 75, the component to comprise a component of a server of the data center.
  • Example 77 is the at least one computer-readable storage medium of Example 76, the component to comprise a fan of the server.
  • Example 78 is the at least one computer-readable storage medium of Example 76, the component to comprise a voltage regulator of the server.
  • Example 79 is the at least one computer-readable storage medium of Example 75, the component to comprise a climate control system of the data center.
  • Example 80 is the at least one computer-readable storage medium of any of Examples 59 to 79, the plurality of observations to comprise temperature measurements.
  • Example 81 is the at least one computer-readable storage medium of any of Examples 59 to 79, the plurality of observations to comprise voltage measurements.
  • Example 82 is the at least one computer-readable storage medium of any of Examples 59 to 79, the plurality of observations to comprise current measurements.
  • Example 83 is the at least one computer-readable storage medium of any of Examples 59 to 79, the plurality of observations to comprise magnetic field measurements.
  • Example 84 is the at least one computer-readable storage medium of any of Examples 59 to 79, the plurality of observations to comprise humidity measurements.
  • Example 85 is the at least one computer-readable storage medium of any of Examples 59 to 79, the plurality of observations to comprise air purity measurements.
  • Example 86 is an apparatus, comprising means for identifying an observation set comprising a plurality of observations associated with an operating condition of a data center, means for determining a probability value set for the observation set, the probability value set to comprise a plurality of probability values, each of the plurality of probability values to correspond to a respective one of the plurality of observations in the observation set, means for determining a collective probability value for the observation set as a product of the plurality of probability values in the probability value set, and means for determining whether the observation set is indicative of an anomaly by comparing the collective probability value to a collective probability value threshold for the operating condition.
  • Example 87 is the apparatus of Example 86, comprising means for configuring a probabilistic model for the operating condition, and means for determining the probability value set based on the probabilistic model for the operating condition.
  • Example 88 is the apparatus of Example 87, the probabilistic model to define a univariate probability distribution for observations associated with the operating condition.
  • Example 89 is the apparatus of Example 88, the univariate probability distribution to comprise a normal distribution.
  • Example 90 is the apparatus of Example 88, the univariate probability distribution to comprise a multinomial distribution.
  • Example 91 is the apparatus of any of Examples 87 to 90, comprising means for configuring the probabilistic model for the operating condition based on a plurality of historical observations associated with the operating condition.
  • Example 92 is the apparatus of Example 91, comprising means for defining the collective probability value threshold for the operating condition based on the plurality of historical observations associated with the operating condition.
  • Example 93 is the apparatus of Example 92, comprising means for identifying a plurality of historical observation sets associated with the operating condition, means for determining a plurality of historical probability value sets, each of the plurality of historical probability value sets to correspond to a respective one of the plurality of historical observation sets, means for determining a plurality of historical collective probability values, each of the plurality of historical collective probability values to correspond to a respective one of the plurality of historical probability value sets, and means for determining the collective probability value threshold based on the plurality of historical collective probability values.
  • Example 94 is the apparatus of Example 93, each of the plurality of historical probability value sets to comprise a respective plurality of probability values.
  • Example 95 is the apparatus of Example 94, comprising means for determining the respective plurality of probability values in each of the plurality of historical probability value sets based on the probabilistic model for the operating condition.
  • Example 96 is the apparatus of any of Examples 94 to 95, comprising means for determining each of the plurality of historical collective probability values as a product of the plurality of probability values in a respective one of the plurality of historical probability value sets.
  • Example 97 is the apparatus of any of Examples 93 to 96, comprising means for determining the collective probability value threshold as a smallest historical collective probability value among the plurality of historical collective probability values.
  • Example 98 is the apparatus of any of Examples 93 to 96, comprising means for determining the collective probability value threshold as a value corresponding to a predefined percentile with respect to the plurality of historical collective probability values.
  • Example 99 is the apparatus of any of Examples 86 to 98, comprising means for determining that the observation set is not indicative of an anomaly in response to a determination that the collective probability value is greater than the collective probability value threshold.
  • Example 100 is the apparatus of any of Examples 86 to 99, comprising means for determining that the observation set is indicative of an anomaly in response to a determination that the collective probability value is less than the collective probability value threshold.
  • Example 101 is the apparatus of Example 100, comprising means for generating anomaly detection information in response to the determination that the observation set is indicative of an anomaly.
  • Example 102 is the apparatus of Example 101, the operating condition associated with a component of the data center, the anomaly detection information to identify the component with which the operating condition is associated.
  • Example 103 is the apparatus of Example 102, the component to comprise a component of a server of the data center.
  • Example 104 is the apparatus of Example 103, the component to comprise a fan of the server.
  • Example 105 is the apparatus of Example 103, the component to comprise a voltage regulator of the server.
  • Example 106 is the apparatus of Example 102, the component to comprise a climate control system of the data center.
  • Example 107 is the apparatus of any of Examples 86 to 106, the plurality of observations to comprise temperature measurements.
  • Example 108 is the apparatus of any of Examples 86 to 106, the plurality of observations to comprise voltage measurements.
  • Example 109 is the apparatus of any of Examples 86 to 106, the plurality of observations to comprise current measurements.
  • Example 110 is the apparatus of any of Examples 86 to 106, the plurality of observations to comprise magnetic field measurements.
  • Example 111 is the apparatus of any of Examples 86 to 106, the plurality of observations to comprise humidity measurements.
  • Example 112 is the apparatus of any of Examples 86 to 106, the plurality of observations to comprise air purity measurements.
  • Example 113 is a system, comprising an apparatus according to any of Examples 86 to 112, and a display.
  • “Coupled” and “connected,” along with their derivatives, are not intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
  • Terms such as “processing,” “computing,” “calculating,” “determining,” or the like refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system’s registers and/or memories into other data similarly represented as physical quantities within the computing system’s memories, registers or other such information storage, transmission or display devices.
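The method of Examples 29 to 43 can be illustrated with a minimal sketch. It assumes a normal distribution (Example 32) fitted to historical observations, and it assumes that the density value of that distribution serves as the per-observation "probability value"; the examples do not specify this. All function names and the sample temperature readings are hypothetical and are not taken from the application.

```python
import math
from statistics import mean, pstdev


def fit_normal(historical_observations):
    """Configure the model: estimate (mean, std dev) from historical observations."""
    mu = mean(historical_observations)
    sigma = pstdev(historical_observations)
    if sigma == 0:
        raise ValueError("historical observations must not all be identical")
    return mu, sigma


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (assumed here to be the probability value)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def collective_probability(observation_set, mu, sigma):
    """Collective probability value: product of the per-observation values."""
    return math.prod(normal_pdf(x, mu, sigma) for x in observation_set)


def threshold_from_history(historical_sets, mu, sigma, percentile=None):
    """Threshold from historical collective probability values.

    percentile=None picks the smallest historical value (Example 40);
    otherwise a nearest-rank percentile in (0, 100] is used (Example 41).
    """
    values = sorted(collective_probability(s, mu, sigma) for s in historical_sets)
    if percentile is None:
        return values[0]
    rank = max(1, math.ceil(percentile / 100.0 * len(values)))
    return values[rank - 1]


def is_anomaly(observation_set, mu, sigma, threshold):
    """Anomaly if the collective probability value is below the threshold."""
    return collective_probability(observation_set, mu, sigma) < threshold


# Hypothetical temperature readings, three observations per set.
historical_sets = [[40.1, 40.3, 39.8], [39.9, 40.0, 40.2], [40.4, 39.7, 40.1]]
mu, sigma = fit_normal([x for s in historical_sets for x in s])
threshold = threshold_from_history(historical_sets, mu, sigma)
print(is_anomaly([40.0, 40.2, 39.9], mu, sigma, threshold))
print(is_anomaly([45.0, 45.2, 44.9], mu, sigma, threshold))
```

With the smallest-value threshold, none of the historical observation sets is itself flagged, because each of their collective probability values is at least the threshold. A percentile threshold trades that guarantee for sensitivity.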

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Anomaly detection techniques for servers and data centers are described. In some embodiments, for example, an apparatus may comprise circuitry and an anomaly detection component for execution by the circuitry to identify an observation set comprising a plurality of observations associated with an operating condition of a data center and to determine a probability value set for the observation set, the probability value set to comprise a plurality of probability values, each of the plurality of probability values to correspond to a respective one of the plurality of observations, the anomaly detection component for execution by the circuitry to determine a collective probability value for the observation set as a product of the plurality of probability values and to determine whether the observation set is indicative of an anomaly by comparing the collective probability value to a collective probability value threshold for the operating condition. Other embodiments are also described.
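The decision rule summarized in the abstract can be written compactly. The notation below is assumed for illustration and does not appear in the application: $x_1, \dots, x_n$ are the observations in the observation set, $p_i$ is the probability value determined for $x_i$, $P$ is the collective probability value, and $\tau$ is the collective probability value threshold for the operating condition.

```latex
P = \prod_{i=1}^{n} p_i ,
\qquad
\text{observation set indicative of an anomaly} \iff P < \tau .
```

Provided every $p_i$ and $\tau$ are strictly positive, the logarithm is monotonic and the same decision can be made as $\sum_{i=1}^{n} \log p_i < \log \tau$. This is an implementation choice rather than something the application states; it avoids floating-point underflow when $n$ is large.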
PCT/CN2015/098925 2015-12-25 2015-12-25 Anomaly detection techniques for servers and data centers WO2017107187A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/098925 WO2017107187A1 (fr) 2015-12-25 2015-12-25 Anomaly detection techniques for servers and data centers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/098925 WO2017107187A1 (fr) 2015-12-25 2015-12-25 Anomaly detection techniques for servers and data centers

Publications (1)

Publication Number Publication Date
WO2017107187A1 (fr) 2017-06-29

Family

ID=59088668

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/098925 WO2017107187A1 (fr) 2015-12-25 2015-12-25 Anomaly detection techniques for servers and data centers

Country Status (1)

Country Link
WO (1) WO2017107187A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120197445A1 (en) * 2011-01-31 2012-08-02 Yamatake Corporation Air-conditioner operation controlling device and method
US20140238656A1 (en) * 2013-02-28 2014-08-28 Hitachi, Ltd. Air-conditioning control apparatus for data center
CN104075751A (zh) * 2013-03-26 2014-10-01 北京百度网讯科技有限公司 Temperature and humidity early-warning method and device for Internet data center
WO2015090114A1 (fr) * 2013-12-17 2015-06-25 北京百度网讯科技有限公司 Refrigeration control system and method for data center

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11622006B2 (en) 2020-11-04 2023-04-04 Panduit Corp. Single pair ethernet sensor device and sensor network
US11985196B2 (en) 2020-11-04 2024-05-14 Panduit Corp. Single pair ethernet sensor device and sensor network

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15911185; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 15911185; Country of ref document: EP; Kind code of ref document: A1)