WO2019186243A1 - Global data center cost/performance validation based on machine intelligence - Google Patents

Global data center cost/performance validation based on machine intelligence

Info

Publication number
WO2019186243A1
WO2019186243A1 (PCT/IB2018/052214)
Authority
WO
WIPO (PCT)
Prior art keywords
data
model
data center
operational parameter
operational
Prior art date
Application number
PCT/IB2018/052214
Other languages
French (fr)
Inventor
Yves Lemieux
Claude Gauthier
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/IB2018/052214 priority Critical patent/WO2019186243A1/en
Publication of WO2019186243A1 publication Critical patent/WO2019186243A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5094 Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to methods and apparatuses for data center infrastructure management and, in particular, to methods and apparatuses for global data center (DC) cost/performance validation based on machine intelligence (MI).
  • Cloud computing consists of transitioning computer services to offsite locations accessible over the Internet.
  • the computers making up the cloud system can be virtualized in order to maximize the resources of the available physical computers.
  • cloud-based models present significant economic opportunities for hosting public and private services by shifting applications to the cloud.
  • managing and optimizing data center infrastructure and operational parameters is a major challenge facing data centers.
  • one known method for managing energy consumption includes obtaining information, where the information includes calculated energy utilization for at least one application within a data center, the calculated energy utilization based on at least one trigger factor. This known method further includes identifying an energy optimization opportunity for at least one of the applications based on at least the obtained information and validating the energy optimization opportunity for at least one of the applications based at least in part on energy optimization for the data center.
  • Some embodiments of the present disclosure advantageously provide methods and apparatuses for using Machine Intelligence (MI) to implement intelligence into global management of DC operations, which includes at least energy management and resource balancing and may, in some embodiments, facilitate automated or semi-automated DC operations.
  • a one-size-fits-all approach to using MI models is insufficient in certain scenarios; therefore, such embodiments may provide different MI models to address different situations (e.g., policy generation, DC operation pattern reporting, fault management clustering, etc.).
  • policy generation may be considered the definition of a condition, such as an operational parameter at least meeting a threshold, together with the respective action to be taken in response to the occurrence of that condition, i.e., a policy implementation (in other words, IF this condition, THEN this action).
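  The IF-condition/THEN-action form of policy generation described above can be sketched as follows. The parameter name, threshold, and action labels are illustrative assumptions, not values taken from the disclosure.

```python
# Minimal sketch of IF-condition/THEN-action policy generation. A policy
# pairs a threshold condition on an operational parameter with the action
# to be taken when that condition occurs.

def make_policy(parameter, threshold, action):
    """Return a policy: IF `parameter` >= `threshold`, THEN `action`."""
    def evaluate(readings):
        # `readings` maps operational-parameter names to current values.
        if readings.get(parameter, float("-inf")) >= threshold:
            return action
        return None  # condition not met, no action to take
    return evaluate

# Example: IF rack inlet temperature reaches 27 C, THEN increase cooling.
policy = make_policy("inlet_temp_c", 27.0, "increase_cooling")
print(policy({"inlet_temp_c": 28.5}))  # -> increase_cooling
print(policy({"inlet_temp_c": 22.0}))  # -> None
```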
  • a DC Operational Pattern report may represent the general cost performance for certain applications. For example, an operational report may report all the video content running during the week with the number of resources used. This could show the context with time-of-the-day and day-of-the-week, for the media application and its associated relative cost in order to determine if the optimization is actually improving the situation.
  • fault management clustering may include an aggregation of faults together to counteract the propagation of one fault into e.g., thousands of faults at different layers (e.g. physical, logical, service, application, etc.) of the DC 12 system.
  • an inference switch may provide access to a plurality of ML models depending on the problem(s) to resolve, the type of analysis to be made, and/or the data set to be used as input.
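  One rough sketch of such an inference switch is a lookup that routes a recognized data type to a registered MI model. The data types and model names below are illustrative assumptions for the kinds of problems mentioned elsewhere in the disclosure (phase balancing, CRAH overload, temperature differentials), not a prescribed mapping.

```python
# Illustrative inference-switch sketch: select an MI model from a set of
# available models based on the type of the input data / problem to resolve.

MODEL_TABLE = {
    "phase_balancing": "reinforcement_learning",
    "crah_overload": "support_vector_machine",
    "temperature_differential": "time_series_nn",
}

def select_model(data_type, table=MODEL_TABLE):
    """Return the MI model registered for `data_type`, if recognized."""
    try:
        return table[data_type]
    except KeyError:
        raise ValueError(f"unrecognized data type: {data_type!r}")

print(select_model("phase_balancing"))  # -> reinforcement_learning
```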
  • Some embodiments of the present disclosure advantageously provide for direct dependency modelling identified between cloud resource entities. Some embodiments advantageously provide for access to data correlation using historical data from DC repositories. Some embodiments advantageously provide for DC automation using Validation (semi-Automation) as an intermediate step. Some embodiments provide an advantage of utilizing MI to counter-act human error to prevent, for example, prolonged shutdown times for web services.
  • embodiments may further provide a more complete “reporting” taking into account data correlation and DC operation patterns. Some embodiments may advantageously provide policy generation for preventive actions that may be adjusted as a function of time and also as a function of important DC operational metrics such as space, power and/or cooling. Some embodiments provide programmability of DC operation as opposed to hardcoded algorithms.
  • Some embodiments advantageously provide for the ability to switch optimally (e.g., inferred switch) between different MI models (e.g., reinforcement learning, support vector machine, neural network (NN), Bayesian network, etc.) to obtain higher levels of optimization.
  • Some embodiments provide for a new paradigm that brings new functions to a plurality of users (end-users, enterprise, etc.) with respect to evaluation of Cost/Energy Efficiency (EE) performance, pooling of applications across many users, virtual computing, etc.
  • the Cost/Performance ratio can apply to different performance metrics (e.g., latency, power consumption, etc.), one of which is EE.
  • Energy Efficiency may be considered the comparison of Energy Consumption once optimized with respect to Energy Consumption before optimization.
  • EE may also be considered an Energy Consumption reduction percentage. For example, if Energy Consumption was reduced by 40% with optimization techniques, the cost may be considered reduced since less energy was used.
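  The percentage-reduction reading of EE above can be computed directly; the kWh figures below are illustrative and chosen to reproduce the 40% example.

```python
# Energy efficiency as a percentage reduction of energy consumption,
# comparing consumption after optimization against the baseline before it.

def ee_reduction_pct(before_kwh, after_kwh):
    """Percentage by which optimization reduced energy consumption."""
    if before_kwh <= 0:
        raise ValueError("baseline consumption must be positive")
    return 100.0 * (before_kwh - after_kwh) / before_kwh

# 1000 kWh reduced to 600 kWh -> 40% reduction, matching the example above.
print(ee_reduction_pct(1000.0, 600.0))  # -> 40.0
```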
  • Some embodiments determine how to balance cloud physical/logical resources to accommodate high performance (e.g., processing, storage delay, networking, etc.) as pooling becomes increasingly viable and profitable as a function of time.
  • an apparatus for an inference switch associated with a data center infrastructure manager includes processing circuitry and the processing circuitry is configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center.
  • the processing circuitry is further configured to receive an indication of a type of the data corresponding to the at least one operational parameter and recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input.
  • the processing circuitry may be further configured to select a machine intelligence (MI) model from a set of available MI models based on the type of the data and output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
  • the at least one data source includes at least one of at least one sensor configured to measure at least one physical property of the at least one resource and at least one memory storing measurements from the at least one sensor.
  • the at least one resource includes at least a processing resource, a storage resource, and a network resource.
  • the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center, at least a second type of data relating to an aspect of power management at the at least one data center, and at least a third type of data relating to an aspect of cooling at the at least one data center.
  • the MI model is configured to: receive the data representing the at least one value corresponding to the at least one operational parameter as an input, and based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model.
  • the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center.
  • the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource of the at least one data center.
  • the apparatus further includes a container repository storing equipment specifications for a plurality of resources at the at least one data center; and the processing circuitry is configured to access the container repository and use at least a portion of the equipment specifications stored in the container repository to select the MI model from the set of available MI models.
  • the plurality of types of data recognizable by the processing circuitry includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature differential.
  • At least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source.
  • the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a time series neural network (NN) model.
  • the processing circuitry is one of coupled to a data center infrastructure manager (DCIM) and included in the DCIM.
  • selection of the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center, a data center operational pattern reporting, and a fault management clustering.
  • an apparatus for a machine intelligence (MI) optimizer associated with a data center infrastructure management (DCIM) includes processing circuitry configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center; identify an occurrence of a trigger based on the obtained data; and as a result of the occurrence of the trigger, execute a machine learning (ML) optimization procedure.
  • the ML optimization procedure includes selecting a machine intelligence (MI) model; receiving, from a database, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
  • the processing circuitry is further configured to identify the occurrence of the trigger based on the obtained data by being configured to calculate a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and based on the cost-to-performance ratio, determine whether to execute the ML optimization procedure.
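  The trigger logic described above can be sketched as a simple ratio test; the threshold value and the degenerate-performance handling are illustrative assumptions, not part of the disclosure.

```python
# Sketch of the trigger: compute a cost-to-performance ratio from collected
# data and decide whether to execute the ML optimization procedure.

def should_optimize(cost, performance, threshold=1.5):
    """Trigger optimization when cost per unit of performance is too high."""
    if performance <= 0:
        return True  # degenerate performance always warrants optimization
    return (cost / performance) > threshold

print(should_optimize(cost=120.0, performance=60.0))  # ratio 2.0 -> True
print(should_optimize(cost=60.0, performance=60.0))   # ratio 1.0 -> False
```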
  • the processing circuitry is further configured to obtain the data from the at least one data source, identify the occurrence of the trigger, and apply the obtained data to the trained MI model to produce the at least one recommendation periodically to provide dynamic recommendations for operation of the at least one data center.
  • the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling.
  • the at least one recommendation includes an indication of at least one of: a migration of at least one resource; a consolidation of a plurality of resources; a sleep mode of at least one resource; and a balancing of at least one data center operational metric for at least one zone of the at least one data center.
  • the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center.
  • the at least one balancing function includes a cost- performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio.
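  As a hedged sketch of a cost-performance balancing function over the three metrics named above, one option is to score the deviation of each metric from a target value (PUE here uses its standard definition: total facility energy divided by IT equipment energy). The targets and the deviation-sum scoring are assumptions for illustration, not the disclosed function.

```python
# Illustrative balancing score over cost-to-performance ratio, relative
# cost ratio, and power usage effectiveness (PUE).

def pue(total_facility_kwh, it_equipment_kwh):
    """Standard PUE: total facility energy over IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh

def balance_score(cost_perf, relative_cost, pue_value,
                  targets=(1.0, 1.0, 1.4)):
    """Lower is better: summed absolute deviation of each metric from target."""
    metrics = (cost_perf, relative_cost, pue_value)
    return sum(abs(m - t) for m, t in zip(metrics, targets))

p = pue(1400.0, 1000.0)            # PUE of 1.4
print(balance_score(1.0, 1.0, p))  # -> 0.0 (all metrics on target)
```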
  • the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response.
  • the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
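  The three-phase utilization balancing above can be sketched by measuring how far each phase deviates from the mean load; the amp figures and the mean-deviation measure are illustrative assumptions.

```python
# Sketch of power phase balancing: report the worst deviation from the
# mean per-phase load and which phase carries the most load.

def phase_imbalance(phase1_amps, phase2_amps, phase3_amps):
    """Return (max deviation from mean load, most-loaded phase index 1-3)."""
    loads = [phase1_amps, phase2_amps, phase3_amps]
    mean = sum(loads) / 3.0
    deviations = [abs(load - mean) for load in loads]
    worst = max(range(3), key=lambda i: loads[i])
    return max(deviations), worst + 1

dev, phase = phase_imbalance(30.0, 30.0, 45.0)
print(dev, phase)  # mean 35 A -> worst deviation 10 A, on phase 3
```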
  • the method includes obtaining data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center and receiving an indication of a type of the data corresponding to the at least one operational parameter.
  • the method further includes recognizing the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input; selecting a machine intelligence (MI) model from a set of available MI models based on the type of the data; and outputting the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
  • the at least one data source includes at least one of at least one sensor configured to measure at least one physical property of the at least one resource and at least one memory storing measurements from the at least one sensor.
  • the at least one resource includes at least a processing resource, a storage resource, and a network resource.
  • the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center, at least a second type of data relating to an aspect of power management at the at least one data center, and at least a third type of data relating to an aspect of cooling at the at least one data center.
  • the MI model is configured to receive the data representing the at least one value corresponding to the at least one operational parameter as an input, and based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model.
  • the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center.
  • the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource of the at least one data center.
  • the method further comprises storing, at a container repository, equipment specifications for a plurality of resources at the at least one data center; and accessing the container repository to use at least a portion of the equipment specifications stored in the container repository to select the MI model from the set of available MI models.
  • the plurality of types of data that are recognizable includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature differential.
  • at least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source.
  • the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a time series neural network (NN) model.
  • selecting the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center, a data center operational pattern reporting, and a fault management clustering.
  • a method for a machine intelligence (MI) optimizer associated with a data center infrastructure management (DCIM) comprises obtaining data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center; identifying an occurrence of a trigger based on the obtained data; and as a result of the occurrence of the trigger, executing a machine learning (ML) optimization procedure.
  • the ML optimization procedure includes selecting a machine intelligence (MI) model; receiving, from a database, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
  • identifying the occurrence of the trigger based on the obtained data includes calculating a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and based on the cost-to-performance ratio, determining whether to execute the ML optimization procedure.
  • obtaining data from at least one data source, identifying an occurrence of a trigger, and applying the obtained data to the trained MI model to produce at least one recommendation is performed periodically to provide dynamic recommendations for operation of the at least one data center.
  • the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling.
  • the at least one recommendation includes an indication of at least one of: a migration of at least one resource; a consolidation of a plurality of resources; a sleep mode of at least one resource; and a balancing of at least one data center operational metric for at least one zone of the at least one data center.
  • the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center.
  • the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio.
  • the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response.
  • the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
  • an apparatus for an inference switch associated with a data center infrastructure manager includes an inference switch module configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center, and to receive an indication of a type of the data corresponding to the at least one operational parameter.
  • the inference switch module is further configured to recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input; select a machine intelligence (MI) model from a set of available MI models based on the type of the data; and output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
  • an apparatus for a machine intelligence (MI) optimizer associated with a data center infrastructure management (DCIM) comprises a data collection module, an identification module, and a machine learning (ML) optimization module.
  • the data collection module is configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center.
  • the identification module is configured to identify an occurrence of a trigger based on the obtained data.
  • the ML optimization module is configured to, as a result of the occurrence of the trigger, execute a machine learning (ML) optimization procedure.
  • the ML optimization procedure includes selecting a machine intelligence (MI) model; receiving, from a database, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
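  The ML optimization procedure listed above (select a model, receive training data, train, apply live data for a recommendation) can be sketched end to end. The tiny stand-in "model" below learns only a historical mean, and the drift tolerance and action names are assumptions so the pipeline stays runnable; a real deployment would train an actual MI model (e.g., an NN) on the database's historical data.

```python
# End-to-end sketch of the ML optimization procedure: train on historical
# operational-parameter data, then apply an observed value to produce a
# recommendation.

def train(training_rows):
    """'Train': learn the historical mean of the operational parameter."""
    return sum(training_rows) / len(training_rows)

def recommend(model_mean, observed, tolerance=0.1):
    """Recommend an action when the observation drifts from the learned mean."""
    if observed > model_mean * (1 + tolerance):
        return "consolidate_or_migrate"
    if observed < model_mean * (1 - tolerance):
        return "sleep_mode"
    return "no_action"

historical_power_kw = [10.0, 11.0, 9.0, 10.0]  # e.g., rows from the DC database
model = train(historical_power_kw)             # learned mean = 10.0 kW
print(recommend(model, 12.5))  # -> consolidate_or_migrate
print(recommend(model, 10.2))  # -> no_action
```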
  • Cost/Performance optimization may also be considered in accordance with principles of the present disclosure.
  • FIG. 1 is a block diagram of an exemplary system for global DC cost/performance validation based on MI in accordance with principles of the present disclosure.
  • FIG. 2 is a block diagram of an alternate exemplary system for global DC cost/performance validation based on MI in accordance with principles of the present disclosure.
  • FIG. 3 is a block diagram of an exemplary inference switch in accordance with principles of the present disclosure.
  • FIG. 4 is a block diagram of an exemplary MI optimizer in accordance with principles of the present disclosure.
  • FIG. 5 is a block diagram of an exemplary alternative embodiment of the inference switch in accordance with principles of the present disclosure.
  • FIG. 6 is a block diagram of an exemplary alternative embodiment of the MI optimizer in accordance with principles of the present disclosure.
  • FIG. 7 is a flow diagram illustrating an exemplary method for providing an inference switch according to one embodiment of the present disclosure.
  • FIG. 8 is a flow diagram illustrating an exemplary method for providing MI optimization according to an alternative embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram illustrating an exemplary cost/performance validation balancing diagram according to one embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram illustrating another exemplary balancing diagram according to one embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram illustrating an exemplary phase balancing diagram according to one embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram of an exemplary power distribution unit (PDU) configuration according to one embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram of an exemplary arrangement of the inference switch according to one embodiment of the present disclosure.
  • FIG. 14 is a flow diagram illustrating an exemplary method for data center infrastructure management according to one embodiment of the present disclosure.
  • FIG. 15 is a flow diagram illustrating an exemplary method of an ML optimization procedure according to one embodiment of the present disclosure.
  • relational terms such as “first” and “second,” “top” and “bottom,” and the like, may be used solely to distinguish one entity or element from another entity or element without necessarily requiring or implying any physical or logical relationship or order between such entities or elements.
  • the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the concepts described herein.
  • the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
  • the joining term “in communication with” and the like may be used to indicate electrical or data communication, which may be accomplished by physical contact, induction, electromagnetic radiation, radio signaling, infrared signaling or optical signaling, for example.
  • the terms “coupled,” “connected,” and the like may be used herein to indicate a connection, although not necessarily directly, and may include wired and/or wireless connections.
  • the terms “machine intelligence” and “machine learning” are used interchangeably.
  • machine learning and/or machine intelligence may be used herein to indicate methods and/or devices/apparatuses that use specific mathematical models, functions and/or algorithms to e.g., make predictions, make decisions, provide recommendations, uncover hidden insights or anomalies through learning from historical relationships, trends, patterns, and the like, which may be obtained by, for example, analyzing large data sets over a period of time, etc.
  • a DCIM may be considered hardware and/or a set of software tools/programs configured to assist data center operators with organizing and managing information stored at a data center and information otherwise associated with the data center, such as, for example, facilities monitoring and access, asset/resource management, monitoring operational parameters of the data center, capacity planning, cable/connectivity planning, visualization, environmental and energy management, cost analytics, integration, etc.
  • the term “optimization” may be used to indicate methods and/or devices/apparatuses that are configured to attempt to improve at least one metric of the data center and/or balance metrics to improve or at least maintain operational efficiency in one or more categories of data center operation (e.g., cooling, space, power, cost, performance, network capacity, storage capacity, etc.).
  • the term “container repository” may be considered a database and/or a (virtual or physical) area of memory reserved for storing specific information.
  • methods and apparatuses of the present disclosure provide for access to DCIM dependency models, which may be built between cloud resource entities.
  • Some embodiments advantageously provision the metrics correlations using historical data (e.g., for training, testing Machine Intelligence models, etc.) from DC repositories (e.g., Databases).
  • Some embodiments provide for validation as an intermediate step based on recommended policies, whereby semi-automation of the DC operations may be performed.
  • policies may provide actions (e.g., a migration as an asset is updated, consolidation of already used resources, sleep mode of other resources, balancing metrics of DC resources per zone, etc.) that are adjusted as a function of time and also as a function of important operational metrics such as space, power and cooling (SPC).
  • Some embodiments provide for reporting that takes into account correlation and operation patterns, and depends on policies recommended by an MI trained model (e.g., NN).
  • Some embodiments provide an apparatus in the form of a logical inferred switch in order to choose between different MI models (e.g., reinforcement learning, Support Vector Machine (SVM), NN, Bayesian network, etc.) to obtain higher levels of optimization (as compared to a one-size-fits-all MI model approach) depending on the problem to be solved (e.g., phase balancing, temperature differentials, power consumption due to cooling systems, etc.).
  • Some embodiments advantageously provide an apparatus that supports the inference switch processing and also complements the DCIM with MI-based automation.
  • Such apparatus may interface with the DCIM in order to treat the appropriate big data and run processing on capable processors (e.g. central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), etc.).
  • such apparatus may include non-volatile memory for big data handling and may be able to return a recommended optimization action (e.g., balance phases) using different selectable MI models, depending on the particular DC operational parameters to be optimized.
  • a predicted value may determine what action should be taken when power consumption optimization is required.
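  • By way of an illustrative sketch only (the data-type names and model registry below are assumptions, not the disclosed implementation), the inference switch's choice of an MI model from a recognized problem/data type can be expressed as a simple dispatch:

```python
# Illustrative sketch of the inference switch's model selection.
# The data-type keys and model assignments are assumed examples;
# the disclosure only states that the model depends on the problem type.

MODEL_REGISTRY = {
    "phase_balancing": "reinforcement_learning",
    "temperature_differential": "time_series_nn",
    "crah_overload": "support_vector_machine",
    "power_consumption": "bayesian_network",
}

def select_mi_model(data_type: str) -> str:
    """Return the MI model selected for the recognized data type."""
    try:
        return MODEL_REGISTRY[data_type]
    except KeyError:
        raise ValueError(f"unrecognized data type: {data_type}")

# Example: phase-balancing data routes to the reinforcement-learning model.
assert select_mi_model("phase_balancing") == "reinforcement_learning"
```

  In a fuller system the registry entries would be trained model objects rather than strings, but the dispatch structure is the same.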
  • Some embodiments provide for new use cases between a plurality of users (e.g., end-users, enterprise-users, etc.) with respect to evaluation of cost/energy-efficiency (EE) performance, pooling of applications by many users, virtual computing, etc. Some embodiments provide for a method to balance cloud physical/logical resources in order to accommodate high performance (processing, storage delay, networking, etc.) as pooling of applications becomes more viable and profitable as a function of time. Some embodiments provide an Energy-method (E-method) that encompasses various scopes of the whole system; that is, although there may be demarcations between scopes, requests can be sent between demarcation interfaces.
  • system 10 may include a data center 12, a data center infrastructure manager (DCIM) 14, an inference switch 16, a machine intelligence (MI) optimizer 18, and a database (DB) 20 in communication with one another, via one or more communication links, paths, connections, and/or networks using one or more communication protocols, where the DC 12, the DCIM 14, the inference switch 16, the MI optimizer 18, and the DB 20 may be configured to perform one or more of the processes and/or techniques described herein.
  • Although FIG. 1 depicts a single DC 12, a single DCIM 14, a single inference switch 16, a single MI optimizer 18, and a single DB 20, it is contemplated that the system 10 may include any number of DCs 12, DCIMs 14, inference switches 16, MI optimizers 18, and/or DBs 20.
  • the connections illustrated in the system 10 in FIG. 1 are exemplary and it should be understood that the system 10 entities may be connected with one another, directly and/or indirectly, via more than one connection and/or over more than one network.
  • the DCIM 14, inference switch 16, MI optimizer 18, and DB 20 are shown in FIG. 1 (as well as FIG. 2) as external to the DC 12, in some embodiments, one or more of such entities may be internal to the DC 12 and/or the DC internal network.
  • the data center 12 may include multiple servers 22a, 22b, and 22c (referred to collectively herein as “servers 22”) running various applications and services.
  • the data center 12 may also include multiple sensors 24a, 24b, and 24c (referred to collectively herein as “sensors 24”). Although three sensors 24a, 24b, and 24c are shown, the disclosure is not limited to three sensors.
  • the sensors 24 may be configured to measure at least one physical property (e.g., temperature, power, current, etc.) associated with the data center and, in some embodiments, the resources at the data center 12.
  • the sensors 24 may further include memory that stores measurements from the sensor 24 and may also be in communication, via a wired and/or wireless connection, with a container repository for data, such as, for example, the DB 20.
  • the DB 20 may be considered any type of database, including but not limited to any combination of one or more of a relational database, an operational database, or a distributed database.
  • the servers 22 and sensors 24 may be communicatively coupled over an internal network within the DC 12.
  • the data center 12 may be segmented into a plurality of zones, such as, for example, zone 1, zone 2... zone n, depicted in FIG. 1, where n can be any number greater than 1.
  • Each zone may be considered a physical area or region of the data center 12 and, in some embodiments, certain operational parameters may be managed and/or optimized on a per- zone basis.
  • global DC optimization may include consideration of the per-zone operational parameters on the overall global operational efficiency of the DC.
  • the DCIM 14 may be considered an apparatus/device and/or system that monitors, manages, and/or controls data center utilization and energy consumption of resources (e.g., servers 22, storage, processors, network elements, etc.) and facility infrastructure (e.g., power distribution units (PDUs), computer room air handlers (CRAH), etc.).
  • the DCIM 14 may also refer to the software and/or computer instructions executable by processor(s) to monitor, manage, and/or control DC utilization, resources, and facility infrastructure.
  • the inference switch 16 may be considered an apparatus/device configured to perform the techniques described herein with respect to the inference switch 16 and/or may also refer to the software and/or computer instructions executable by processor(s) to perform such techniques.
  • the inference switch 16 may be configured to select one of a set of MI models based on a type of input data and may output the selected MI model for use to provide, for example, a recommendation for optimizing operational parameters of the DC 12.
  • the MI optimizer 18 may be considered an apparatus/device configured to perform the techniques described herein with respect to the MI optimizer 18 and/or may also refer to the software and/or computer instructions executable by processor(s) to perform such techniques.
  • the MI optimizer 18 may be configured to select an MI model and train the MI model in response to a triggering event and apply/input data to the trained MI model to produce DC optimization recommendations.
  • FIG. 2 illustrates an alternate embodiment of system 10.
  • the DCIM 14 combines the functions of the inference switch 16 and the MI optimizer 18 into one entity to provide global DC optimization using MI within data center 12 as discussed herein.
  • the inference switch 16 and MI optimizer 18 may be implemented in separate devices and may in other embodiments be combined so as to be implemented on a single device.
  • DCIM 14 may include a database 20, or have access to an external database 20, which may store sensor measurements and other trigger information obtained for global DC optimization using MI.
  • the inference switch 16 includes a communication interface 26, processing circuitry 28, and memory 30.
  • the communication interface 26 may be configured to communicate with one or more of the elements in the system 10.
  • the communication interface 26 may be formed as or may include, for example, one or more radio frequency (RF) transmitters, one or more RF receivers, and/or one or more RF transceivers, and/or may be considered a radio interface.
  • the communication interface 26 may include a wired and/or a wireless interface. Wired connections associated with the communication interface 26 may include, for example, a high-speed serial or parallel interface, a bus, an optical connection, an Ethernet connection, and the like.
  • the processing circuitry 28 may include one or more processors 31 and memory, such as, the memory 30.
  • the processing circuitry 28 may comprise integrated circuitry for processing and/or control, e.g., one or more processors and/or processor cores and/or FPGAs (Field Programmable Gate Array) and/or ASICs (Application Specific Integrated Circuitry) adapted to execute instructions.
  • the processor 31 and/or the processing circuitry 28 may be configured to access (e.g., write to and/or read from) the memory 30, which may comprise any kind of volatile and/or nonvolatile memory, e.g., cache and/or buffer memory and/or RAM (Random Access Memory) and/or ROM (Read-Only Memory) and/or optical memory and/or EPROM (Erasable Programmable Read-Only Memory).
  • the inference switch 16 may further include software stored internally in, for example, memory 30, or stored in external memory (e.g., database 20) accessible by the inference switch 16 via an external connection.
  • the software may be executable by the processing circuitry 28.
  • the processing circuitry 28 may be configured to control any of the methods and/or processes described herein and/or to cause such methods, and/or processes to be performed, e.g., by the inference switch 16.
  • the memory 30 is configured to store data, programmatic software code and/or other information described herein.
  • the software may include instructions that, when executed by the processor 31 and/or processing circuitry 28, causes the processor 31 and/or processing circuitry 28 to perform the processes described herein with respect to the inference switch 16.
  • an apparatus may be configured to provide the inference switch 16 (e.g., DCIM 14 or one or more other entities).
  • the apparatus may include processing circuitry 28.
  • the processing circuitry 28 may be configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center.
  • the processing circuitry 28 may also be configured to receive (via e.g., communication interface 26) an indication of a type of the data corresponding to the at least one operational parameter.
  • the processing circuitry 28 may be configured to recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input, select a MI model from a set of available MI models based on the type of the data, and output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
  • the at least one data source includes at least one of at least one sensor 24 configured to measure at least one physical property of the at least one resource and at least one memory storing measurements from the at least one sensor 24.
  • the at least one resource includes at least a processing resource, a storage resource, and a network resource.
  • the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center 12, at least a second type of data relating to an aspect of power management at the at least one data center 12, and at least a third type of data relating to an aspect of cooling at the at least one data center 12.
  • the MI model is configured to receive the data (via e.g., communication interface 26) representing the at least one value corresponding to the at least one operational parameter as an input and, based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model.
  • the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center 12.
  • the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource of the at least one data center 12.
  • the apparatus may further include a container repository (e.g., memory 30, DB 20, etc.) storing equipment specifications for a plurality of resources at the at least one data center 12.
  • the processing circuitry 28 is configured to access the container repository and use at least a portion of the equipment specifications stored in the container repository to select the MI model from the set of available MI models.
  • the plurality of types of data recognizable by the processing circuitry 28 includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature differential.
  • At least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source.
  • the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a time series neural network (NN).
  • the processing circuitry 28 is one of coupled to a data center infrastructure manager (DCIM) 14 and included in the DCIM 14.
  • selection of the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center 12, a data center 12 operational pattern reporting, and a fault management clustering.
  • the MI optimizer 18 includes a communication interface 32, processing circuitry 34, and memory 36.
  • the communication interface 32 may be configured to communicate with one or more elements in the system 10.
  • the communication interface 32 may be formed as or may include, for example, one or more radio frequency (RF) transmitters, one or more RF receivers, and/or one or more RF transceivers, and/or may be considered a radio interface.
  • the communication interface 32 may include a wired and/or a wireless interface. Wired connections associated with the communication interface 32 may include, for example, a high-speed serial or parallel interface, a bus, an optical connection, an Ethernet connection, and the like.
  • the processing circuitry 34 may include one or more processors 38 and memory, such as, the memory 36.
  • the processing circuitry 34 may comprise integrated circuitry for processing and/or control, e.g., one or more processors and/or processor cores and/or FPGAs (Field Programmable Gate Array) and/or ASICs (Application Specific Integrated Circuitry) adapted to execute instructions.
  • the processor 38 and/or the processing circuitry 34 may be configured to access (e.g., write to and/or read from) the memory 36, which may comprise any kind of volatile and/or nonvolatile memory, e.g., cache and/or buffer memory and/or RAM (Random Access Memory) and/or ROM (Read-Only Memory) and/or optical memory and/or EPROM (Erasable Programmable Read-Only Memory).
  • the MI optimizer 18 may further include software stored internally in, for example, memory 36, or stored in external memory (e.g., database 20) accessible by the MI optimizer 18 via an external connection.
  • the software may be executable by the processing circuitry 34.
  • the processing circuitry 34 may be configured to control any of the methods and/or processes described herein and/or to cause such methods, and/or processes to be performed, e.g., by the MI optimizer 18.
  • the memory 36 is configured to store data, programmatic software code and/or other information described herein.
  • the software may include instructions that, when executed by the processor 38 and/or processing circuitry 34, causes the processor 38 and/or processing circuitry 34 to perform the processes described herein with respect to the MI optimizer 18.
  • an apparatus for the DCIM 14 may be provided in some embodiments.
  • the apparatus may include processing circuitry 34.
  • the processing circuitry 34 may be configured to obtain data (via e.g., communication interface 32) from at least one data source (e.g., DB 20, sensors 24, memory 36, etc.) of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12.
  • the processing circuitry 34 may be further configured to identify an occurrence of a trigger based on the obtained data; and, as a result of the occurrence of the trigger, execute a machine learning (ML) optimization procedure.
  • the ML optimization procedure may include selecting a machine intelligence (MI) model; receiving, from a database 20, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
  • the processing circuitry 34 is further configured to identify the occurrence of the trigger based on the obtained data by being configured to calculate a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and based on the cost-to-performance ratio, determine whether to execute the ML optimization procedure.
  • the processing circuitry 34 is further configured to obtain the data (via e.g., communication interface 32) from the at least one data source (e.g., DB 20, sensors 24, memory 36, etc.), identify the occurrence of the trigger, and apply the obtained data to the trained MI model to produce the at least one recommendation periodically to provide dynamic recommendations for operation of the at least one data center 12.
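  • As a minimal illustrative sketch (the threshold value, the stand-in model class, and the data shapes are assumptions, not the disclosed implementation), one periodic cycle of the trigger check and the ML optimization procedure might look like:

```python
# Sketch of the MI optimizer's trigger-driven cycle: compute the
# cost-to-performance ratio, and only when it exceeds an assumed
# threshold, select a model, train it on historical data, and apply
# the obtained data to produce a recommendation.

COST_PERF_THRESHOLD = 1.5  # assumed trigger threshold, for illustration

class SimpleModel:
    """Stand-in MI model: learns the mean historical ratio and recommends
    consolidation when the current ratio exceeds that baseline."""
    def train(self, historical_ratios):
        self.baseline = sum(historical_ratios) / len(historical_ratios)

    def recommend(self, current_ratio):
        return "consolidate_resources" if current_ratio > self.baseline else "no_action"

def optimization_cycle(cost, performance, training_data):
    ratio = cost / performance          # trigger metric from obtained data
    if ratio <= COST_PERF_THRESHOLD:    # no trigger occurred this cycle
        return None
    model = SimpleModel()               # select an MI model
    model.train(training_data)          # train on historical data from DB
    return model.recommend(ratio)       # apply obtained data -> recommendation

# e.g. optimization_cycle(9.0, 3.0, [1.2, 1.4, 1.6]) triggers and recommends.
```

  Running this function on a schedule corresponds to the periodic operation described above, yielding dynamic recommendations over time.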
  • the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling.
  • the at least one recommendation includes an indication of at least one of a migration of at least one resource; a consolidation of a plurality of resources; a sleep mode of at least one resource; and a balancing of at least one data center operational metric for at least one zone of the at least one data center 12.
  • the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center 12.
  • the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio.
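  • One hypothetical way to combine the three metrics above into a single per-zone score (the geometric-mean combination and the "lower is better" convention are assumptions for illustration, not the disclosed balancing function) is:

```python
# Illustrative cost-performance balancing score per zone. The disclosure
# names the three metrics (cost-to-performance ratio, relative cost ratio,
# power usage effectiveness ratio); the combination below is an assumed
# example that keeps any single metric from dominating.

def balance_score(cost_perf_ratio, relative_cost_ratio, pue):
    """Geometric mean of the three ratios; lower is better for each."""
    product = cost_perf_ratio * relative_cost_ratio * pue
    return product ** (1.0 / 3.0)

def most_imbalanced_zone(zones):
    """zones: {name: (cost_perf, relative_cost, pue)}; worst-scoring zone."""
    return max(zones, key=lambda z: balance_score(*zones[z]))

# A perfectly balanced zone (all ratios 1.0) scores 1.0.
```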
  • the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response.
  • the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
  • FIG. 5 depicts an alternative embodiment for an apparatus for an inference switch 16 associated with a DCIM 14, which apparatus may include an inference switch module 40.
  • the inference switch module 40 may be configured to obtain data from at least one data source of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12.
  • the inference switch module 40 may also be configured to receive an indication of a type of the data corresponding to the at least one operational parameter; recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input; select a machine intelligence (MI) model from a set of available MI models based on the type of the data; and output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
  • FIG. 6 depicts an alternative embodiment for an apparatus for a machine intelligence (MI) optimizer 18 associated with a DCIM 14, which apparatus may include a data collection module 42 configured to obtain data from at least one data source of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12.
  • the MI optimizer 18 may further include an identification module 44 configured to identify an occurrence of a trigger based on the obtained data, and a machine learning (ML) optimization module 46.
  • the ML optimization module 46 may be configured to, as a result of the occurrence of the trigger, execute a machine learning (ML) optimization procedure.
  • the ML optimization procedure may include selecting a machine intelligence (MI) model; receiving, from a database 20, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
  • FIG. 7 is a flowchart illustrating an exemplary method for an inference switch 16 associated with a DCIM 14. The exemplary method may be implemented in the DCIM 14 or may, in some embodiments, be implemented in a device/apparatus separate from and in communication with the DCIM 14.
  • the method may include obtaining data from at least one data source of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12 (block S50).
  • the method may further include receiving an indication of a type of the data corresponding to the at least one operational parameter (block S52) and recognizing the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input (block S54).
  • the method also includes selecting a machine intelligence (MI) model from a set of available MI models based on the type of the data (block S56); and outputting the selected MI model to be used to process the data for data center optimization based on at least the selected MI model (block S58).
  • the at least one data source includes at least one of at least one sensor 24 configured to measure at least one physical property of the at least one resource and at least one memory (e.g., DB 20) storing measurements from the at least one sensor 24.
  • the at least one resource includes at least a processing resource, a storage resource, and a network resource.
  • the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center 12, at least a second type of data relating to an aspect of power management at the at least one data center 12, and at least a third type of data relating to an aspect of cooling at the at least one data center 12.
  • the MI model is configured to receive the data representing the at least one value corresponding to the at least one operational parameter as an input, and based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model.
  • the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center 12.
  • the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource (e.g., server 22) of the at least one data center 12.
  • the method further includes storing, at a container repository (e.g., DB 20), equipment specifications for a plurality of resources at the at least one data center 12; and accessing the container repository to use at least a portion of the equipment specifications stored in the container repository to select the MI model from the set of available MI models.
  • the plurality of types of data that are recognizable includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature differential.
  • At least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source (e.g., sensors 24, DB 20, etc.).
  • the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a time series neural network (NN).
  • selecting the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center, a data center operational pattern reporting, and a fault management clustering.
  • FIG. 8 is a flowchart illustrating an exemplary method for an MI optimizer 18 associated with a DCIM 14.
  • the exemplary method includes obtaining data from at least one data source of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12 (block S60).
  • the method further includes identifying an occurrence of a trigger based on the obtained data (block S62); and as a result of the occurrence of the trigger, executing a machine learning (ML) optimization procedure (block S64).
  • the ML optimization procedure includes selecting a machine intelligence (MI) model; and receiving, from a database 20, training data associated with the at least one operational parameter.
  • the ML optimization procedure may further include training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
  • identifying the occurrence of the trigger based on the obtained data includes calculating a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and based on the cost-to-performance ratio, determining whether to execute the ML optimization procedure.
  • obtaining data from at least one data source, identifying an occurrence of a trigger, and applying the obtained data to the trained MI model to produce at least one recommendation is performed periodically to provide dynamic recommendations for operation of the at least one data center 12.
  • the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling.
  • the at least one recommendation includes an indication of at least one of: a migration of at least one resource; a consolidation of a plurality of resources; a sleep mode of at least one resource; and a balancing of at least one data center operational metric for at least one zone of the at least one data center 12.
  • the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center 12.
  • the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio.
  • the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response.
  • the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
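  • The power phase balancing metric for the three phase utilizations can be sketched as follows (the maximum-deviation-from-mean measure and the 10% threshold are assumed example values, not the disclosed function):

```python
# Sketch of a power phase balancing check for one zone: given phase I/II/III
# utilizations, measure imbalance as the largest deviation from the mean,
# normalized by the mean. Threshold of 10% is an assumed example.

def phase_imbalance(p1: float, p2: float, p3: float) -> float:
    """Fractional imbalance across the three phases (0.0 = fully balanced)."""
    mean = (p1 + p2 + p3) / 3.0
    return max(abs(p - mean) for p in (p1, p2, p3)) / mean

def needs_rebalancing(p1, p2, p3, threshold=0.10):
    return phase_imbalance(p1, p2, p3) > threshold

# Example: 40/40/40 A is balanced; 60/30/30 A exceeds the threshold.
assert not needs_rebalancing(40, 40, 40)
assert needs_rebalancing(60, 30, 30)
```

  A recommendation engine could then propose migrating load from the most-loaded phase whenever `needs_rebalancing` is true for a zone.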
  • Some embodiments of the present disclosure provide a method and apparatus to measure power consumption of applications using DC 12 assets (e.g., servers, storage, networking, etc.), whereby the DCIM 14 controls DC 12 assets logically according to certain criteria and policies and recommends more economical operations.
  • the DCIM 14 may promote and/or recommend different actions such as, for example, a migration as the asset is updated; a consolidation of already used resources; a sleep mode of other resources; balancing metrics (e.g., balancing load, hot spot temperature areas, etc.) of DC 12 resources (e.g., servers, storage, networking, processing load, power, space, cooling, etc.) per zone, etc.
  • some embodiments of the present disclosure may advantageously provide for an intelligent logical and/or physical inferred switch using a plurality of Machine Intelligence (MI) models (e.g., reinforcement learning, support vector machine, Bayesian network, etc.) that may be developed and that may consider one or more parameters and attributes in order to provide optimal predictive analytics for Cost/Energy Performance operation purposes.
  • system specification library may be stored in, for example, a container repository and/or database 20, and may be used to assist in building the characteristics of the DC 12 systems for the MI models.
  • the system specification library may include global specifications for parameters at the data center 12 and/or for each zone of the data center 12.
  • the system specification library may include temperature and/or power thresholds, space-related thresholds for considering expansion (e.g., square footage available in the data center for additional equipment, racks, etc.), schematics for the DC 12 facility(ies)/building (e.g., floor space, raised floors, etc.), and mechanical specifications (heating, ventilation, air conditioning, humidification equipment, pressurization, etc.).
  • the systems specifications library may also include equipment specifications from equipment manufacturers (e.g., processors, racks, boards, power supplies, etc.), such as, for example, cooling and energy requirements, processing speed, memory capacity, storage capacity, temperature thresholds, etc.
  • the systems specifications library may be modular, with each module representing specifications for a single type of equipment (e.g., all specs for a particular model of processor), and the library may be accessed and used by DC 12 personnel via, e.g., a software platform (e.g., a software tool of the DCIM 14) where multiple instances of each module can be created and arranged to model the DC 12 for purposes of, for example, validating an MI model, etc.
  • embodiments of the present invention may utilize such data for data-driven modeling of various characteristics of the DC 12 globally and/or on a per zone basis.
  • Some embodiments may further use one or more MI models to provide operational recommendations in response to certain detected triggers or anomalies, in response to an autonomous identification (via e.g., a prediction by an MI model) of a potential weakness in the DC 12 design (e.g., hidden insights through learning from historical relationships, patterns, trends, etc. obtained from data inputs), and/or discovery (via e.g., an MI model) of potential areas of operational improvement beyond a minimum level of efficiency, etc.
  • the DCIM 14 or an apparatus in communication with the DCIM 14 may obtain measurements of the energy performance based on consumption of the assets/resources at the DC 12. For example, energy performance based on power consumption associated with cooling may be obtained by the DCIM 14.
  • DCs 12 strive for optimal energy utilization.
  • a hypothetical, perfectly efficient DC 12 would be one in which energy is used exclusively to power information technology (IT) equipment.
  • An index known as Power Usage Effectiveness (“PUE”) can be used to measure energy efficiency.
  • a PUE index of 1.0 represents an ideal data center where no energy is lost to the surrounding elements.
  • the DCIM 14 or an apparatus in communication with the DCIM 14 may determine and/or monitor the PUE index.
  • use and/or training of an MI model may be triggered by, for example, the PUE index at least meeting a threshold PUE index.
  • the MI model may be used to provide a recommended course of action (e.g., migrate resources/assets or power down resources/assets, etc.) to move the PUE index toward the ideal PUE index.
  • operational optimization actions may be provided for resources/assets at the DC 12 (e.g., a migration as the asset is updated; consolidation of already used resources may be proposed; sleep mode of other resources may be actuated; balancing operational metrics of DC 12 resources per zone may be monitored, etc.).
  • operational optimization actions/recommendations may be based on historical data and traffic and/or known or determined technological tendencies represented in an MI model.
  • the MI model may output a report of proposed migration(s)/resource allocation.
  • one or more Machine Learning techniques may be used to provide an inference for future re-allocation of resources.
  • Some machine learning techniques may include decision tree learning, association rule learning, artificial neural networks, deep learning, inductive logic programming, support vector machines, clustering, Bayesian networks, reinforcement learning, representation learning, learning classifier systems, etc.
  • One specific type of ML model, the neural network (NN) model, will be described in more detail below; however, a multitude of different types of MI models are known to those of ordinary skill in the art, and therefore not all MI models that may be used with embodiments of the present disclosure will be described in great detail.
  • a sample of related data may be obtained from the DCIM 14 at a given rate (or Sampling Rate).
  • Such sampling of data may be used to train an ML model and/or may be used as input into an ML model for outputting an inferred recommendation.
  • In some embodiments, real-time streams of available information (e.g., data from sensors 24 at the DC 12) may also be used as inputs.
  • a balancing diagram may be used for overall DC 12 operational optimization.
  • FIG. 9 is a visual representation of an exemplary balancing diagram for cost/performance validation.
  • a “balancing diagram” may be considered to be a representation of a multi-constraint optimization (e.g., cost ratio, PUE ratio, and cost/performance ratio) whereby a perfect triangle centered in the middle is the preferred optimization (or as close to being centered in the middle as can be reasonably obtained).
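The "centered triangle" objective can be illustrated as minimizing each metric's deviation from the common mean. The following is a hypothetical sketch only; the function names and the tolerance value are assumptions, not values from the disclosure, and the metrics are assumed to be pre-normalized to a common scale.

```python
# Hypothetical "balance score" for a three-metric balancing diagram.
# Metrics are assumed pre-normalized to [0, 1]; a perfectly centered
# triangle means every metric equals the common mean (score 0.0).
def balance_score(metrics):
    mean = sum(metrics) / len(metrics)
    return max(abs(m - mean) for m in metrics)

def is_balanced(metrics, tolerance=0.1):
    # The tolerance is an illustrative policy threshold, not from the text.
    return balance_score(metrics) <= tolerance
```

A score of 0.0 corresponds to the perfectly centered triangle; a policy threshold on the score corresponds to "as close to centered as can be reasonably obtained."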
  • validation in MI may be considered an action to decide which MI model best fits which data set.
  • the balancing diagram may be configured to balance constraints specific to a particular zone in the DC 12.
  • one constraint may be a cost-to-performance ratio, which may, in some embodiments, be a ratio of an absolute monetary cost (e.g., US dollars) to a performance metric (e.g., power consumption).
  • another constraint may be a relative cost-to-performance ratio.
  • the relative operational cost may be 1,000,000 (optimized) as opposed to 1,300,000 (non-optimized). This may be considered an optimization of 30% or a 30% relative cost ratio. This may, of course, be different from 30% depending on, for example, the Power Grid Provider tariffing.
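The relative cost ratio arithmetic in the example above can be sketched as follows (the function name is illustrative):

```python
def relative_cost_ratio(non_optimized_cost, optimized_cost):
    # Improvement expressed as a fraction of the optimized cost.
    return (non_optimized_cost - optimized_cost) / optimized_cost

# With the figures above: (1,300,000 - 1,000,000) / 1,000,000 = 0.30 (30%).
```

As the text notes, the actual percentage would vary with, for example, the Power Grid Provider tariffing.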
  • the PUE ratio may be defined as the total power consumption of a DC 12 divided by the Information and Communication Technology (ICT) only power consumption.
  • a PUE of 1.4 means that the total power consumption is 1.4 x the power consumption of the ICT, since a DC 12 has ICT equipment and accessory equipment for power transport, cooling, heating, office equipment, etc.
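The PUE definition above lends itself to a one-line calculation; the threshold helper mirrors the trigger described earlier, and its default threshold value is an illustrative assumption:

```python
def pue(total_power_kw, ict_power_kw):
    # Power Usage Effectiveness: total facility power over ICT-only power.
    return total_power_kw / ict_power_kw

def pue_alarm(pue_value, threshold=1.5):
    # Trigger MI model use/training when the PUE at least meets a threshold;
    # the 1.5 default is an assumption, not a value from the disclosure.
    return pue_value >= threshold
```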
  • a function represents a balancing diagram for re-allocation of resources.
  • Such balancing diagram may be considered to reflect the current and, in some embodiments, real-time network utilization, processing utilization and storage utilization of that particular zone.
  • FIG. 10 depicts an exemplary balancing diagram for network, processing, and storage constraints.
  • a zone may vary in size depending on the level of granularity required.
  • one or more threshold lines may be established to set a policy as to what consolidation of resources may produce a more centered triangle in the balancing diagram.
  • the function corresponding to the balancing diagram is configured to learn from past data (i.e., historical data) and/or expert knowledge that may be provided into a software platform by, for example, data entry or programming. Such function may learn from historical data in order to discover the optimal balanced combination between related constraints, such as, for example, network, processing, and storage.
  • the learning is a re-iterative process that fine-tunes as a function of time (learning time or rate) and may vary with respect to changing acquired historical data.
  • the function can be used as a preventive measure if, for example, a validation phase is included in the process.
  • the function may be used as a predictive measure in embodiments in which predictive analytics is used instead of validation in order to make a final decision based on learned patterns and trends.
  • the function may be the MI model or at least a portion of the MI model.
  • the function may be considered a balancing function that may be configured to balance multiple constraints.
  • one of the constraints for the exemplary balancing diagram shown is a network bandwidth, which may be defined in Megabits per second (10⁶ bps) or Gigabits per second (10⁹ bps), and may be considered to represent the speed of bits that flow on a given network link.
  • Another constraint shown is the processing effectiveness/utilization, which may indicate how many instructions per second a central processing unit (CPU), graphical processing unit (GPU), or network processing unit (NPU) can process. It is conventionally expressed in millions of instructions per second (MIPS). It may be considered a general measure of computing performance and, by implication, the amount of work a computer can do.
  • Another constraint shown in FIG. 10 is the storage response, which may be expressed in seconds and can vary with the type of storage technology used.
  • For example, the storage response may range from that of a hard disk drive (HDD), to Flash memory of about 100 μsec (microseconds), to other types of non-volatile memory of about 500 nsec (nanoseconds).
  • One primary consideration for the balancing diagram is to ensure that the three (3) main resources of the DC 12 (e.g., network, processing, and storage) are well-configured relative to one another so that none of these resources becomes a bottleneck to the others.
  • the DC processing and storage may be determined to be relatively fast, but the DC network may be providing only 100 Mbps per link, which is not fast enough. Therefore, the other two resources (processing and storage) may suffer due to the sluggishness of the network performance, since each of these aspects of the DC 12 (processing, storage, network) is dependent on the others.
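A minimal sketch of this bottleneck check might compare each resource's normalized utilization and flag one that is markedly ahead of the others. The 0.8 and 0.2 cutoffs below are illustrative assumptions, not values from the disclosure:

```python
def find_bottleneck(utilization):
    # utilization: resource name -> fraction of capacity in use (0..1).
    # Flags the most-utilized resource only when it is both heavily loaded
    # and markedly ahead of the rest; cutoffs are illustrative assumptions.
    name, load = max(utilization.items(), key=lambda kv: kv[1])
    others = [v for k, v in utilization.items() if k != name]
    if load > 0.8 and load - max(others) > 0.2:
        return name
    return None
```

In the example above, a saturated network alongside lightly loaded processing and storage would be flagged as the bottleneck.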
  • an MI model may be used to consider so called what-if scenarios and to present the difference between current and future energy savings. In some embodiments, an MI model may be used to predict outcomes.
  • FIG. 11 depicts Power Phases I, II and III being balanced according to embodiments of the present disclosure.
  • an alarm threshold may be set to trigger a notification to installers to connect machines into a proper power phase so as to not overload a particular phase. For example, if the power on one phase exceeds an 80% threshold, the intelligent system (e.g., DCIM 14) may be triggered to analyze what can be done to re-balance the physical machines following rules and best practices.
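The 80% phase-overload trigger described above could be sketched as follows; the capacity argument and dictionary layout are illustrative assumptions:

```python
PHASE_ALARM_THRESHOLD = 0.80  # the 80% threshold described above

def phases_to_rebalance(phase_loads_kw, phase_capacity_kw):
    # Return the phases whose load exceeds the alarm threshold; a non-empty
    # result would trigger the re-balancing analysis described in the text.
    return [phase for phase, load in phase_loads_kw.items()
            if load / phase_capacity_kw > PHASE_ALARM_THRESHOLD]
```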
  • DC 12 policies and associated objectives may be considered for the transition of applications onto new resources (logical or physical).
  • For example, a policy may specify that a balancing function (e.g., a network-storage-processing balancing function) is not to exceed a certain threshold (e.g., a balance metric not to exceed). This policy may correspond to a preventive action.
  • a policy may correspond to a certain threshold, which may be used to prevent undesired consequences (i.e., preventative) before they happen.
  • an MI model may include predictive analytics in order to identify patterns and trends for predictive action (e.g., discovering hidden patterns in data and using them to make predictions), rather than setting policies.
  • best practice rules are integrated into the machine learned models. In some embodiments, such best practice rules may be integrated into the DCIM 14 and/or into an MI model associated with the DCIM 14. In some embodiments, different model types may be added or modified based on operational expertise. The model type may provide all the policy rules that are part of the already known best practices. These best practice policy rules may be integrated into an MI model according to known techniques for building and training an MI model.
  • feedback to the tenants of the DC 12 may be provided by the DCIM 14, for current and future asset usage and/or modification.
  • the delta of economic improvement may be reported to tenants to demonstrate the effectiveness of using such techniques at the DC 12 as, for example, an advantage over other DCs that do not use such MI techniques to optimize operational parameters.
  • there may be a cost conversion function that may convert operational parameter improvements into dollar amounts saved (e.g., reduced power consumption due to recommendations predicted to result in reducing fan usage may be converted into dollar amounts saved due to the power consumption reduction). Even if the cost savings is an estimate, such data may be useful to tenants, as well as to decision-makers deciding whether to implement the MI model.
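A cost conversion function of the kind described could be as simple as multiplying the estimated energy reduction by a tariff. The tariff would come from the Power Grid Provider; both inputs here are estimates supplied by the caller:

```python
def power_savings_to_dollars(kwh_saved, tariff_usd_per_kwh):
    # Convert an estimated energy reduction into an estimated dollar saving.
    # Both arguments are estimates; the tariff is provider-specific.
    return kwh_saved * tariff_usd_per_kwh
```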
  • the MI model may output audit criteria that may result in improved DC 12 equipment maintenance and regulation compliance.
  • Some reporting nodes (e.g., agents) may be defined within the DC 12 in order to align maintenance with regulation compliance. This may be performed by an auditing function that matches preplanned timely events for the maintenance of the assets.
  • One benefit may be to pass from a targeted operational efficiency to a future operational efficiency, based on the use of ML models.
  • at least one of the ML models may be considered an “E-Model”, where “E” means Efficiency.
  • Each of the example scenarios is provided to demonstrate at least three different areas of DC 12 operation in which monitoring certain data and triggering certain conditions, according to, for example, the method described below with respect to FIG. 14, can lead to using an MI model to provide a recommendation to, for example, improve or eliminate the triggering condition.
  • a load on one or more power distribution units (PDUs) at the DC 12 may be monitored.
  • an installer can read the display on the PDU itself and make a judgement call on which branch to use.
  • monitoring of the load may be used to provide a recommendation as to which outlet to use for assets connected to the PDUs.
  • it is considered a best practice to always use the same outlet for the same device.
  • Once an installer/technician has used the same outlet as proposed by the best practice, such technician should also ensure that the sum of the current for a particular circuit is less than a threshold current, e.g., 16 A.
  • the triggering condition may be a threshold current value.
  • FIG. 12 depicts an example rack associated with at least 2 PDUs.
  • the system shown in FIG. 12 shows that branch 3 has a total power of 1,740 W, branch 2 has a total power of 2,100 W, and branch 1 has a total power of 2,340 W.
  • the system shown in FIG. 12 may have the following exemplary parameters:
  • a PDU should not exceed 40% of its maximum capacity, which may be, for example, 7.5 kW for the first two PDUs.
  • an alarm may be triggered by, for example, the DCIM 14, as a result of the PDU at least reaching 40% of its maximum capacity.
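Using the figures from FIG. 12 (and assuming, for illustration, that the three branches belong to a single 7.5 kW PDU), the 40% alarm and a least-loaded-branch recommendation might look like the following sketch; the default capacity and fraction are the example values from the text:

```python
def pdu_alarm(branch_loads_w, pdu_capacity_w=7500, max_fraction=0.40):
    # Alarm once the PDU's total load at least reaches 40% of its capacity,
    # as in the example above (40% of 7.5 kW = 3,000 W).
    return sum(branch_loads_w.values()) >= max_fraction * pdu_capacity_w

def recommend_branch(branch_loads_w):
    # Recommend the least-loaded branch/outlet for a newly connected asset.
    return min(branch_loads_w, key=branch_loads_w.get)
```

With branch loads of 2,340 W, 2,100 W, and 1,740 W, the total of 6,180 W exceeds the 3,000 W alarm level, and branch 3 would be recommended for new equipment.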
  • the sensors 24 may be used to detect a potential CRAH overload condition.
  • the CRAH overload condition may be considered a CRAH overload binary state associated with power consumption optimization between CRAH overload and server fans.
  • Data/measurements from the sensors 24 may be obtained, stored in for example the DB 20, and may be used to optimize the DC 12.
  • the data may be input into the inference switch 16 and an ML model selected based on the type of data.
  • the data may be analyzed by, for example, the MI optimizer 18, and may be used to identify a trigger that results in training an MI model and using the data as an input into the trained MI model.
  • In some embodiments, an alarm (e.g., audible, visual, etc.) may be triggered.
  • an MI model may be used to infer a potential anomaly based on the sensor 24 data by, for example, inputting the sensor 24 data into the MI model and allowing the MI model to make intelligent inferences based on, for example, recognizing historical patterns or trends.
  • the MI model may recognize that processing loads are lowest at certain times of the day or year, and may therefore recommend that maintenance activities be scheduled at those times.
  • One or more metrics such as, for example, the rack cooling index, a quantifiable metric that measures the level of compliance with the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) standards, etc. may be used to determine a triggering condition.
  • ASHRAE publishes allowable temperature and humidity levels within data centers, setting clear operating ranges.
  • One advantage of installing environmental sensors 24 in the DC 12 is that, when judiciously placed, the sensors 24 can assist the facility operator with raising temperatures safely within certain compliance thresholds (e.g., 35 Centigrade). For example, if computing equipment is running too cold, there is a chance that the cooling equipment within the data center 12 is being over-used and thereby incurring a power overload. On the other hand, if equipment runs too hot, depending on the configuration of the service, energy consumption may become unnecessarily excessive due to the over-use of fans running in the servers. In other words, data from sensors 24 at the DC 12 (e.g., temperature sensors, humidity sensors, etc.) may be used to identify whether cooling equipment is being inefficiently used.
  • temperature differentials may be monitored by the sensors 24 (e.g., temperature sensors) at the racks/cabinets.
  • temperature differential may refer to an increase in temperature near DC 12 cabinets, which may be acceptable compared to real-time temperatures at the assets in the cabinets, but which is demanding in terms of cooling the room.
  • data from temperature sensors 24 at the DC 12 may be obtained from each rack (e.g., a set of 3 temperature sensors per rack), with the data from the set of rack sensors 24 representing an average temperature in each row. Such data may then be compared to information corresponding to the manufacturer temperatures for all assets/resources within each row. Such manufacturer information may be stored in, for example, the systems specifications library and/or the DB 20.
  • the data from rack sensors 24 may be continuously and/or periodically obtained and compared to identify the occurrence of a trigger.
  • the trigger may be, for example, when a measured temperature at least meets a temperature threshold in the row or room.
  • Information from the systems specifications library may be used to determine an appropriate temperature threshold value for the trigger.
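The rack-sensor trigger described above might be sketched as averaging a rack's readings against the manufacturer threshold obtained from the systems specifications library; the function and argument names are illustrative:

```python
def rack_temperature_trigger(sensor_readings_c, manufacturer_max_c):
    # Average the rack's sensor readings (e.g., a set of 3 per rack) and
    # fire the trigger when the average at least meets the manufacturer
    # threshold taken from the systems specifications library.
    average = sum(sensor_readings_c) / len(sensor_readings_c)
    return average >= manufacturer_max_c
```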
  • the inference switch 16 may be in communication with an input identification unit and an MI model output unit.
  • the inference switch 16 may be considered a type of logical and/or physical selector switch configured to receive an input corresponding to a data type from e.g., the input identification unit, and identify the data type of the input (i.e., differentiate between different types of data that may be input into the switch 16).
  • the inference switch 16 may output an MI model to e.g., the MI model output unit.
  • the output MI model may be, for example, a Time Series + NN model, a support vector machine (SVM) model, a reinforcement learning model, etc.
  • the inference switch 16 may optimize operation of the DC 12 by providing an inference of a particular MI model that should be used for a particular scenario.
  • the inference switch 16 may be configured to identify at least one of the scenarios described herein above (e.g., power phase balancing, CRAH overload, and temperature differentials) by the type of data received as input and then select an MI model that matches (or most optimally fits) the scenario.
  • the MI model may already be trained and/or the MI model may require further training before being used by the DCIM 14.
  • the inference switch 16 may determine that the particular scenario is a phase balancing scenario and may therefore output an MI model suitable for analyzing current data to balance phases.
  • the inference switch 16 may determine that the particular scenario is a CRAH scenario and may therefore output an MI model suitable for analyzing temperature data.
  • an indication of the particular data type may be input into the inference switch 16.
  • the indicator may be, for example, an attribute, an index, a value, or a signal recognizable by the inference switch 16 as indicating a particular type of data of a set of available types of data recognizable by the inference switch 16 for responding with a particular MI model.
  • the three exemplary scenarios demonstrate that by using the inference switch 16, a suitable optimization MI model can be selected by looking at the data type available and switching to that specific MI model for analysis and optimization.
  • the particular MI model suitable for each of the scenarios of temperature differential, CRAH overload binary state, and balancing phases may be an RL model, an SVM model, and a Time Series + NN model, respectively. Once a suitable MI model is selected, the selected MI model may be executed and a data set input therein.
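The inference switch's data-type-to-model selection for the three example scenarios could be sketched as a simple lookup. The indicator strings below are hypothetical; the disclosure names only the scenario/model pairings, not a concrete encoding:

```python
# Hypothetical indicator strings mapped to the model families named in
# the text (RL, SVM, and Time Series + NN, respectively).
MODEL_BY_DATA_TYPE = {
    "temperature_differential": "reinforcement_learning",
    "crah_overload_binary": "support_vector_machine",
    "power_phase_balancing": "time_series_nn",
}

def select_model(data_type_indicator):
    # Inference-switch-style selection: map the indicated data type to an
    # MI model, or None when the type is not recognized.
    return MODEL_BY_DATA_TYPE.get(data_type_indicator)
```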
  • the inference switch 16 may include, for example, a “pull metric” set of inputs that samples on changes since the last optimization; memory (e.g., random access memory (RAM)); storage (e.g., a solid-state drive (SSD), etc.); a processor configured to execute instructions corresponding to one or more of the techniques described herein; and an output port that outputs the selected optimization trained model to process the data from a particular scenario.
  • the inference switch 16 may be implemented in other ways.
  • the input identification unit and MI model output unit shown in FIG. 13 may be implemented such that a portion of each unit is stored in a corresponding memory within a processing circuitry, or may be considered the processing circuitry.
  • the units as well as the inference switch 16 may be implemented in hardware or in a combination of hardware and software within processing circuitry of one or more devices/entities described herein.
  • the DCIM 14 may discover “n” servers 22 (block S70).
  • DCIM 14 discovery of “n” servers may be performed on a continuous and/or periodic basis in the background (“n” can be any number).
  • the DCIM 14 may request a metric from the DC 12 using, for example, signaling (e.g., Intelligent Platform Management Interface (IPMI)) (block S72).
  • the metric may include a CPU utilization percentage for each server 22.
  • the DCIM 14 may determine whether utilization is greater than or equal to a threshold (e.g., 80%) (block S74). If CPU utilization does not at least meet the threshold, the DCIM 14 may return to block S70 where the DCIM 14 continues to discover servers 22 at a sampling rate and repeats the process. If the CPU utilization does at least meet the threshold, the DCIM 14 may identify each of the assets (e.g., network and storage devices) that are associated with the overused CPU(s) (block S76). The DCIM 14 may then determine the power consumption of the identified assets of the overused CPU(s) and may convert such power consumption to cost (block S78).
  • the cost may be a monetary cost or a cost ratio.
  • the cost ratio may be a ratio of, for example, a cost associated with the determined power consumption and a cost associated with a target power consumption level.
  • the DCIM 14 may determine whether an MI optimization procedure should be triggered (block S80). It should be understood that different metrics can be used for the performance, and the cost may be the relative operating cost in terms of cost units. Optimization may be triggered if, for example, the cost-to-performance ratio at least meets a threshold ratio (e.g., 10% optimization).
  • Optimization actions may include, e.g., workload balancing, adding additional processor cores, recommending balancing phases, etc. If optimization is not triggered, the process may return to block S72 (or block S70) to continue sampling data. On the other hand, if optimization is triggered, the ML optimization procedure may be performed/executed (block S82). One example of the ML optimization procedure will be described in more detail with reference to FIG. 15. After the ML optimization procedure is executed, the DCIM 14 may determine the PUE as a result of the optimization and may store the PUE in the DB 20 (block S84). In some embodiments, a portion or all of the steps in the process may be executed on a per zone basis for each zone.
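The FIG. 14 flow (blocks S70 through S82) might be sketched as a single monitoring pass. The server dictionary field names and the watt-to-cost conversion below are illustrative assumptions, not part of the disclosure:

```python
CPU_THRESHOLD = 0.80        # block S74
OPTIMIZATION_RATIO = 0.10   # e.g., a 10% threshold for block S80

def dcim_cycle(servers, cost_per_watt, target_cost):
    # One pass of the FIG. 14 flow: find overused CPUs (S74), total the
    # power of their associated assets (S76/S78), convert to cost, and
    # decide whether to trigger the ML optimization procedure (S80/S82).
    overused = [s for s in servers if s["cpu_util"] >= CPU_THRESHOLD]
    if not overused:
        return None  # back to discovery (block S70)
    cost = sum(s["power_w"] for s in overused) * cost_per_watt
    ratio = (cost - target_cost) / target_cost
    return "run_ml_optimization" if ratio >= OPTIMIZATION_RATIO else None
```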
  • FIG. 15 an exemplary method for an ML optimization procedure, as discussed briefly in block S82 of FIG. 14, will now be described.
  • the steps shown in the flow chart of FIG. 15 may be one example implementation of block S82 of FIG. 14.
  • Data may be obtained from e.g., the DB 20 (block S90).
  • Clustering techniques may be applied to the obtained data (block S92). Based on the clustering, data may be divided into different categories (block S94). For example, data from different resources can be grouped into different categories.
  • Clustering techniques are known for building and training ML models and will therefore not be described in great detail herein.
  • emergency, cost, priority, and/or other informative data points may be identified in order to determine the accuracy needed for the ML model.
  • the data may be determined to be suitable for use with a NN model, which may be determined by, for example, the inference switch 16.
  • the particular data set obtained may be suitable for other types of MI models. It should be understood that many different types of MI models may be used with embodiments of the present disclosure and that the NN model is used as one example. Having identified an accuracy threshold required for the ML model, according to known techniques, the NN model may be trained and/or validated.
  • training the neural network may take a substantial amount of time depending on the type of neural network as well as on the configuration of the selected neural network. After a certain time period, convergence is achieved and the neural network is trained and ready to use. Collecting samples and training a new NN model or updating an existing NN model can be done at regular intervals, when a number of samples have been accumulated, or according to any other suitable criteria. Examples of types of NN include Feedforward, Radial basis function, Recurrent, Modular, etc., and a person skilled in the art would know how to choose an appropriate type.
  • the NN model can be built based on a number of layers and a number of neurons in each layer. Some technical methods for dealing with overfitting or underfitting might also be applied. Overfitting occurs when the model is over-specified. The outcome of overfitting is that the trained model is not general enough and captures too many details of the training dataset. This leads to inaccurate prediction on a non-training dataset. Underfitting occurs when the model is under-specified. The outcome of underfitting is that the trained model is too general and misses some details of the training dataset. This also leads to inaccurate prediction on a non-training dataset.
  • NN models may include a plurality of layers and, more specifically, one or more“hidden layers” between at least one input layer and at least one output layer.
  • some NN models may be deep learning models and can comprise up to tens or hundreds of hidden layers depending on the complexity of the data samples to process.
  • the coefficients (W, B) at each layer are tuned during the training process to minimize a “loss function.”
  • the loss function is defined to measure how accurate the prediction is.
  • the coefficients in each layer may be given arbitrary numbers at the beginning. For simplicity, those arbitrary numbers might be drawn from a certain distribution around zero.
  • the neurons are trained to converge to certain values, which may form the model that can be used to perform the prediction on any non-training dataset.
  • the NN model may be qualified based on“n” hidden layers, where n can be any number, and a correlation between different data types may be performed (block S96). Weights and biases may also be configured for each hidden layer (block S98).
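The per-layer computation with weights (W) and biases (B) described above can be sketched as a minimal feedforward pass. This is a pure-Python illustration only; the activation choice (ReLU) is an assumption, as the disclosure does not prescribe one:

```python
def relu(x):
    # A common activation function; the disclosure does not prescribe one.
    return max(0.0, x)

def layer(inputs, weights, biases, activation=relu):
    # One dense layer: each neuron computes activation(W . inputs + B),
    # where weights is a list of per-neuron weight rows.
    return [activation(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(inputs, layers):
    # Feed the inputs through a stack of (weights, biases) hidden/output
    # layers; training would tune these coefficients to minimize the loss.
    out = inputs
    for weights, biases in layers:
        out = layer(out, weights, biases)
    return out
```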
  • the DCIM 14 may determine whether the confidence margin/accuracy threshold is met by the NN model (block S100), e.g., validation of the NN model. If the accuracy threshold has not been met, the DCIM 14 may continue to train the NN model by “exercising” the NN layers (block S102), according to known techniques for training NN model layers.
  • an inference network may be built (block S104).
  • a dynamic policy engine may be configured or built that includes a plurality of potential recommendations (e.g., migration resources, power down, re-balance power phases, etc.).
  • a recommendation may be provided (block S106), e.g., recommend a re-balancing of 220-volt phases.
  • the recommendation may be provided by, for example, inputting obtained data from e.g., database 20 and/or sensors 24, into the trained and validated NN model and interpreting the output as a recommendation.
  • the data obtained to train the NN model may be considered historical data or training data; while the data obtained to output a recommendation or generate a prediction may be considered real time data or sampled data or a nontraining data set.
  • test data may be used for testing that the MI model has been trained properly. For example, a set of test data may be used to verify that the trained model meets certain predetermined criteria.
  • the training process for the MI model may occur in the background and may be considered“offline,” which may be distinguishable from monitored data that is provided continuously in the forefront.
  • the concepts described herein may be embodied as a method, data processing system, and/or computer program product. Accordingly, the concepts described herein may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, all generally referred to herein as a “circuit” or “module.” Furthermore, the disclosure may take the form of a computer program product on a tangible computer usable storage medium having computer program code embodied in the medium that can be executed by a computer. Any suitable tangible computer readable medium may be utilized including hard disks, CD-ROMs, electronic storage devices, optical storage devices, or magnetic storage devices.
  • These computer program instructions may also be stored in a computer readable memory or storage medium that can direct a computer or other
  • the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • Computer program code for carrying out operations of the concepts described herein may be written in an object oriented programming language such as Java® or C++.
  • the computer program code for carrying out operations of the disclosure may also be written in conventional procedural programming languages, such as the "C" programming language.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer.
  • the remote computer may be connected to the user's computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • LAN: local area network
  • WAN: wide area network
  • Internet Service Provider (ISP): for example, AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
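The distinction drawn above between training (historical) data, test data, and real-time (sampled) data can be sketched as follows; the function name, variable names, and the 80/20 split ratio are illustrative assumptions, not part of the disclosure:

```python
# Hypothetical sketch: splitting historical DC measurements into training
# and test sets, with the test set used to verify the trained MI model.
# All names and the split ratio are illustrative assumptions.

def split_history(history, test_fraction=0.2):
    """Split historical samples into a training set and a test set."""
    cutoff = int(len(history) * (1 - test_fraction))
    return history[:cutoff], history[cutoff:]

# Historical (offline) data, e.g., (hour, watts) samples from a sensor log:
HISTORY = [(hour, watts) for hour, watts in enumerate(range(100, 124))]
train_data, test_data = split_history(HISTORY)
print(len(train_data), len(test_data))  # 19 5
```

Real-time (sampled) data would then be fed to the trained model separately, never mixed into the training or test sets.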

Abstract

A method and apparatus are provided for an inference switch associated with a data center infrastructure manager (DCIM). The inference switch may be configured to: obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter associated with at least one resource of the at least one data center; receive an indication of a type of the data corresponding to the at least one operational parameter; recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input; select a machine intelligence (MI) model based on the type of the data; and output the selected MI model to be used to process the data for data center optimization.

Description

GLOBAL DATA CENTER COST/PERFORMANCE VALIDATION BASED
ON MACHINE INTELLIGENCE
TECHNICAL FIELD
The present disclosure relates to methods and apparatuses for data center infrastructure management and, in particular, to methods and apparatuses for global data center (DC) cost/performance validation based on machine intelligence (MI).
BACKGROUND
Cloud computing involves transitioning computer services to offsite locations accessible over the Internet. The computers making up the cloud system can be virtualized in order to maximize the resources of the available physical computers. In today’s cloud computing industry, cloud-based models present significant economic opportunities for hosting public and private services by shifting applications to the cloud. However, managing and optimizing data center infrastructure and operational parameters remains a major challenge facing data centers.
For example, one known method for managing energy consumption includes obtaining information, where the information includes calculated energy utilization for at least one application within a data center, the calculated energy utilization being based on at least one trigger factor. This known method further includes identifying an energy optimization opportunity for at least one of the applications based on at least the obtained information and validating the energy optimization opportunity for at least one of the applications based at least in part on energy optimization for the data center.
However, this previous solution does not address overall/global DC cost/performance considerations. Further, existing methods of managing data center infrastructure typically rely on hardcoded algorithms for optimization. However, hardcoded algorithms are not dynamically adaptable to real-time operational parameters in the DC and do not provide for operational automation for DC infrastructure management. Existing DC infrastructure management techniques also do not include Machine Learning (ML), use of data mining for global DC
optimization, and/or predictive analytics for decision making.
SUMMARY
Some embodiments of the present disclosure advantageously provide methods and apparatuses for using Machine Intelligence (MI) to implement intelligence into global management of DC operations, which includes at least energy management and resource balancing and may, in some embodiments, facilitate automated
cost/performance optimization procedures.
In some embodiments, a one-size-fits-all approach to using MI models is insufficient in certain scenarios. Therefore, such embodiments may provide different ML models to address different situations (e.g., policy generation, DC operation pattern reporting, fault management clustering, etc.).
In one embodiment, policy generation may be considered to be a definition of one condition, such as an operational parameter at least meeting a threshold, and the respective action to be taken in response to the occurrence of such condition, such as a policy implementation (in other words, IF this condition, THEN this action). This can be generated by a Long Short-Term Memory (LSTM) network, for example.
In one embodiment, a DC Operational Pattern report may represent the general cost performance for certain applications. For example, an operational report may report all the video content running during the week with the number of resources used. This could show the context with time-of-the-day and day-of-the-week for the media application and its associated relative cost in order to determine whether the optimization is actually improving the situation. Thus, for example, if DC operational pattern reporting is desired, a particular MI model may be selected that is particularly suitable for identifying such patterns.
In one embodiment, fault management clustering may include an aggregation of faults together to counteract the propagation of one fault into, e.g., thousands of faults at different layers (e.g., physical, logical, service, application, etc.) of the DC 12 system. Thus, in one example, if fault management clustering is desired, a particular MI model suitable for such clustering may be selected.
In a further aspect of some embodiments of the present disclosure, an inference switch may provide access to a plurality of ML models depending on the problem(s) to resolve, the type of analysis to be made, and/or the data set to be used as input. Some embodiments of the present disclosure advantageously provide for direct dependency modelling identified between cloud resource entities. Some embodiments advantageously provide for access to data correlation using historical data from DC repositories.
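By way of illustration only, an inference-switch-style selection among MI models for the three recommendation categories above might be sketched as follows; the category keys and model labels are assumptions for the example, not terminology from this disclosure:

```python
# Hypothetical sketch of selecting an MI model per recommendation category.
# Category keys and model labels are illustrative assumptions.

MODEL_BY_CATEGORY = {
    "policy_generation": "lstm",             # IF condition, THEN action policies
    "operational_pattern_report": "time_series_nn",
    "fault_management_clustering": "clustering",
}

def select_mi_model(category):
    """Return the MI model family suited to the requested recommendation."""
    try:
        return MODEL_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"unknown recommendation category: {category}")

print(select_mi_model("policy_generation"))  # lstm
```

A real inference switch would also factor in the type of the input data and the problem to resolve, as described above, rather than the category alone.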
Some embodiments advantageously provide for DC automation using Validation (semi-Automation) as an intermediate step. Some embodiments provide an advantage of utilizing MI to counteract human error to prevent, for example, prolonged shutdown times for web services. Some embodiments may further provide a more complete “reporting” taking into account data correlation and DC operation patterns. Some embodiments may advantageously provide policy generation for preventive actions that may be adjusted as a function of time and also as a function of important DC operational metrics such as space, power, and/or cooling. Some embodiments provide programmability of DC operation as opposed to hardcoded algorithms.
Some embodiments advantageously provide for the ability to switch optimally (e.g., via an inference switch) between different MI models (e.g., reinforcement learning, support vector machine, neural network (NN), Bayesian network, etc.) to obtain higher levels of optimization. Some embodiments provide for a new paradigm that brings new functions to a plurality of users (end-users, enterprises, etc.) with respect to Evaluation of Cost/Energy Efficiency (EE) Performance, pooling of applications among many users, virtual computing, etc. The Cost/Performance ratio can apply to different performance metrics (e.g., latency, power consumption, etc.), one of which is EE. Energy Efficiency may be considered the comparison of Energy Consumption once optimized with respect to Energy Consumption before optimization. EE may also be considered an Energy Consumption reduction percentage. For example, if Energy Consumption was reduced by 40% with optimization techniques, the cost may be considered reduced since less energy was used. Some embodiments determine how to balance cloud physical/logical resources to accommodate high performance (e.g., processing, storage delay, networking, etc.) as pooling becomes increasingly viable and profitable as a function of time.
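The reduction-percentage form of EE described above can be illustrated with a small sketch; the function name and kWh values are illustrative assumptions:

```python
# Hypothetical illustration of EE as an Energy Consumption reduction
# percentage. The kWh figures are made up for the example.

def energy_efficiency(before_kwh, after_kwh):
    """EE expressed as the percentage reduction in Energy Consumption."""
    return 100.0 * (before_kwh - after_kwh) / before_kwh

# Matching the 40% reduction example above:
print(energy_efficiency(1000.0, 600.0))  # 40.0
```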
According to one aspect of the disclosure, an apparatus for an inference switch associated with a data center infrastructure manager (DCIM) is provided. The apparatus includes processing circuitry and the processing circuitry is configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center. The processing circuitry is further configured to receive an indication of a type of the data corresponding to the at least one operational parameter and recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input. The processing circuitry may be further configured to select a machine intelligence (MI) model from a set of available MI models based on the type of the data and output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
According to this aspect of the disclosure, in some embodiments, the at least one data source includes at least one of at least one sensor configured to measure at least one physical property of the at least one resource and at least one memory storing measurements from the at least one sensor. In some embodiments, the at least one resource includes at least a processing resource, a storage resource, and a network resource. In some embodiments, the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center, at least a second type of data relating to an aspect of power management at the at least one data center, and at least a third type of data relating to an aspect of cooling at the at least one data center. In some embodiments, the MI model is configured to: receive the data representing the at least one value corresponding to the at least one operational parameter as an input, and based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model. In some embodiments, the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center. In some embodiments, the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource of the at least one data center. In some embodiments, the apparatus further includes a container repository storing equipment specifications for a plurality of resources at the at least one data center; and the processing circuitry is configured to access the container repository and use at least a portion of the equipment
specifications stored in the container repository to select the MI model from the set of available MI models. In some embodiments, the plurality of types of data
recognizable by the processing circuitry includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature
differential. In some embodiments, at least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source. In some embodiments, the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a Time Series/+ Neural Network, NN. In some embodiments, the processing circuitry is one of coupled to a data center infrastructure manager (DCIM) and included in the DCIM.
In some embodiments, selection of the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center, a data center operational pattern reporting, and a fault management clustering.
According to another aspect of the present disclosure, an apparatus for a machine intelligence (MI) optimizer associated with a data center infrastructure management (DCIM) is provided. The apparatus includes processing circuitry configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center; identify an occurrence of a trigger based on the obtained data; and as a result of the occurrence of the trigger, execute a machine learning (ML) optimization procedure. The ML optimization procedure includes selecting a machine intelligence (MI) model; receiving, from a database, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
According to this aspect, in some embodiments, the processing circuitry is further configured to identify the occurrence of the trigger based on the obtained data by being configured to calculate a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and based on the cost-to-performance ratio, determine whether to execute the ML optimization procedure. In some embodiments, the processing circuitry is further configured to obtain the data from the at least one data source, identify the occurrence of the trigger, and apply the obtained data to the trained MI model to produce the at least one recommendation periodically to provide dynamic recommendations for operation of the at least one data center. 
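As one hypothetical sketch of the trigger described in this aspect, a cost-to-performance ratio could gate the ML optimization procedure; the threshold, function names, and return values below are illustrative assumptions, not details from the disclosure:

```python
# Hypothetical sketch of the cost-to-performance trigger gating the ML
# optimization procedure. Threshold and labels are illustrative.

COST_PERF_THRESHOLD = 1.5  # assumed trigger threshold

def cost_to_performance(cost, performance):
    """Cost-to-performance ratio for the sampled operational parameter."""
    return cost / performance

def maybe_optimize(cost, performance):
    """Run the ML optimization procedure only when the ratio degrades."""
    if cost_to_performance(cost, performance) > COST_PERF_THRESHOLD:
        # On trigger: select MI model, fetch training data, train the model,
        # then apply the sampled data to produce a recommendation.
        return "run_ml_optimization"
    return "no_action"

print(maybe_optimize(cost=12.0, performance=4.0))  # run_ml_optimization
print(maybe_optimize(cost=4.0, performance=4.0))   # no_action
```

Run periodically, such a check would yield the dynamic recommendations described above.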
In some embodiments, the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling. In some embodiments, the at least one recommendation includes an indication of at least one of: a migration of at least one resource; a consolidation of a plurality of resources; a sleep mode of at least one resource; and a balancing of at least one data center operational metric for at least one zone of the at least one data center. In some embodiments, the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center. In some embodiments, the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio. In some
embodiments, the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response. In some embodiments, the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
According to yet another aspect of the disclosure, a method for an inference switch associated with a data center infrastructure manager (DCIM) is provided. The method includes obtaining data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center and receiving an indication of a type of the data corresponding to the at least one operational parameter. The method further includes recognizing the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input; selecting a machine intelligence (MI) model from a set of available MI models based on the type of the data; and outputting the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
According to this aspect, in some embodiments, the at least one data source includes at least one of at least one sensor configured to measure at least one physical property of the at least one resource and at least one memory storing measurements from the at least one sensor. In some embodiments, the at least one resource includes at least a processing resource, a storage resource, and a network resource. In some embodiments, the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center, at least a second type of data relating to an aspect of power management at the at least one data center, and at least a third type of data relating to an aspect of cooling at the at least one data center. In some embodiments, the MI model is configured to receive the data representing the at least one value corresponding to the at least one operational parameter as an input, and based at least on the input, output at least one
recommendation for the at least one operational parameter inferred according to the MI model. In some embodiments, the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center. In some embodiments, the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource of the at least one data center. In some embodiments, the method further comprises storing, at a container repository, equipment specifications for a plurality of resources at the at least one data center; and accessing the container repository to use at least a portion of the equipment specifications stored in the container repository to select the MI model from the set of available MI models. In some embodiments, the plurality of types of data that are recognizable includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature differential. In some embodiments, at least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source. In some embodiments, the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a Time Series/+ Neural Network, NN. 
In some embodiments, selecting the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center, a data center operational pattern reporting, and a fault management clustering.
According to another aspect of the disclosure, a method for a machine intelligence (MI) optimizer associated with a data center infrastructure management (DCIM) is provided. The method comprises obtaining data from at least one data source of at least one data center, the data representing at least one value
corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center;
identifying an occurrence of a trigger based on the obtained data; and as a result of the occurrence of the trigger, executing a machine learning (ML) optimization procedure. The ML optimization procedure includes selecting a machine intelligence (MI) model; receiving, from a database, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model. According to this aspect, in some embodiments, identifying the occurrence of the trigger based on the obtained data includes calculating a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and based on the cost-to-performance ratio, determining whether to execute the ML optimization procedure. In some embodiments, obtaining data from at least one data source, identifying an occurrence of a trigger, and applying the obtained data to the trained MI model to produce at least one recommendation is performed periodically to provide dynamic recommendations for operation of the at least one data center. In some embodiments, the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling. 
In some embodiments, the at least one recommendation includes an indication of at least one of: a migration of at least one resource; a consolidation of a plurality of resources; a sleep mode of at least one resource; and a balancing of at least one data center operational metric for at least one zone of the at least one data center. In some embodiments, the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center. In some embodiments, the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio. In some embodiments, the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response. In some embodiments, the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
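The power phase balancing function may, for example, compare phase I, II, and III utilizations; in this illustrative sketch the 10% imbalance tolerance is an assumption, not a value from the disclosure:

```python
# Hypothetical sketch of a power phase balancing check over phase I, II,
# and III utilizations (each in percent). Tolerance is an assumption.

def phase_imbalance(phase_1, phase_2, phase_3):
    """Spread between the most- and least-loaded phases."""
    loads = (phase_1, phase_2, phase_3)
    return max(loads) - min(loads)

def needs_rebalancing(phase_1, phase_2, phase_3, tolerance=10.0):
    """Recommend rebalancing when the spread exceeds the assumed tolerance."""
    return phase_imbalance(phase_1, phase_2, phase_3) > tolerance

print(needs_rebalancing(70.0, 55.0, 62.0))  # True  (spread 15 > 10)
print(needs_rebalancing(60.0, 58.0, 63.0))  # False (spread 5 <= 10)
```

An analogous three-way comparison could sketch the cost-performance balancing function over its three ratios.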
According to yet another alternative aspect of the present disclosure, an apparatus for an inference switch associated with a data center infrastructure manager (DCIM) is provided. The inference switch module is configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center;
and receive an indication of a type of the data corresponding to the at least one operational parameter. The inference switch module is further configured to recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input; select a machine intelligence (MI) model from a set of available MI models based on the type of the data; and output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
According to another alternative aspect of the present disclosure, an apparatus for a machine intelligence (MI) optimizer associated with a data center infrastructure management (DCIM) is provided. The apparatus comprises a data collection module, an identification module, and a machine learning (ML) optimization module. The data collection module is configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center. The identification module is configured to identify an occurrence of a trigger based on the obtained data. The ML optimization module is configured to, as a result of the occurrence of the trigger, execute a machine learning (ML) optimization procedure. The ML optimization procedure includes selecting a machine intelligence (MI) model; receiving, from a database, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
Other embodiments derived from combinations of other MI models, metrics, and operational optimizations that do not depart from the general theme of Cost/Performance optimization may also be considered in accordance with principles of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
A more complete understanding of the present embodiments, and the attendant advantages and features thereof, will be more readily understood by reference to the following detailed description when considered in conjunction with the accompanying drawings wherein:
FIG. 1 is a block diagram of an exemplary system for global data center (DC) cost/performance validation based on MI in accordance with principles of the present disclosure;
FIG. 2 is a block diagram of an alternate exemplary system for global DC cost/performance validation based on MI in accordance with principles of the present disclosure;
FIG. 3 is a block diagram of an exemplary inference switch in accordance with principles of the present disclosure;
FIG. 4 is a block diagram of an exemplary MI optimizer in accordance with principles of the present disclosure;
FIG. 5 is a block diagram of an exemplary alternative embodiment of the inference switch in accordance with principles of the present disclosure;
FIG. 6 is a block diagram of an exemplary alternative embodiment of the MI optimizer in accordance with principles of the present disclosure;
FIG. 7 is a flow diagram illustrating an exemplary method for providing an inference switch according to one embodiment of the present disclosure;
FIG. 8 is flow diagram illustrating an exemplary method for providing MI optimization according to an alternative embodiment of the present disclosure;
FIG. 9 is a schematic diagram illustrating an exemplary cost/performance validation balancing diagram according to one embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating another exemplary balancing diagram according to one embodiment of the present disclosure;
FIG. 11 is a schematic diagram illustrating an exemplary phase balancing diagram according to one embodiment of the present disclosure;
FIG. 12 is a schematic diagram of an exemplary power distribution unit (PDU) configuration according to one embodiment of the present disclosure;
FIG. 13 is a schematic diagram of an exemplary arrangement of the inference switch according to one embodiment of the present disclosure;
FIG. 14 is a flow diagram illustrating an exemplary method for data center infrastructure management according to one embodiment of the present disclosure; and
FIG. 15 is a flow diagram illustrating an exemplary method of an ML optimization procedure according to one embodiment of the present disclosure.
DETAILED DESCRIPTION
Before describing in detail exemplary embodiments, it is noted that the embodiments reside primarily in combinations of apparatus components and processing steps related to global data center (DC) cost/performance validation based on MI. Accordingly, components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
As used herein, relational terms, such as “first” and “second,” “top” and “bottom,” and the like, may be used solely to distinguish one entity or element from another entity or element without necessarily requiring or implying any physical or logical relationship or order between such entities or elements. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the concepts described herein. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In embodiments described herein, the joining term “in communication with” and the like may be used to indicate electrical or data communication, which may be accomplished by physical contact, induction, electromagnetic radiation, radio signaling, infrared signaling or optical signaling, for example. One having ordinary skill in the art will appreciate that multiple components may interoperate and that modifications and variations are possible for achieving the electrical and data communication.
In some embodiments described herein, the terms “coupled,” “connected,” and the like, may be used to indicate a connection, although not necessarily directly, and may include wired and/or wireless connections.
In some embodiments, the terms “machine intelligence” and “machine learning” are used interchangeably. The terms machine learning and/or machine intelligence may be used herein to indicate methods and/or devices/apparatuses that use specific mathematical models, functions and/or algorithms to, e.g., make predictions, make decisions, provide recommendations, or uncover hidden insights or anomalies through learning from historical relationships, trends, patterns, and the like, which may be obtained by, for example, analyzing large data sets over a period of time, etc.
Note further, that functions described herein as being performed by an apparatus or an inference switch or an MI optimizer may be implemented in the data center infrastructure manager (DCIM) or may be distributed over a plurality of devices, which plurality of devices may include the DCIM and/or other devices. In other words, it is contemplated that the functions of the apparatuses described herein are not limited to performance by a single physical device and, in fact, can be distributed among several physical devices.
In some embodiments, a DCIM may be considered hardware and/or a set of software tools/programs configured to assist data center operators with organizing and managing information stored at a data center and information otherwise associated with the data center, such as, for example, facilities monitoring and access, asset/resource management, monitoring operational parameters of the data center, capacity planning, cable/connectivity planning, visualization, environmental and energy management, cost analytics, integration, etc.
In some embodiments, the term "optimization" may be used to indicate methods and/or devices/apparatuses that are configured to attempt to improve at least one metric of the data center and/or balance metrics to improve or at least maintain operational efficiency in one or more categories of data center operation (e.g., cooling, space, power, cost, performance, network capacity, storage capacity, etc.).
In some embodiments, the term "container repository" may be considered a database and/or a (virtual or physical) area of memory reserved for storing specific information.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In some embodiments, methods and apparatuses of the present disclosure provide for access to DCIM dependency models, which may be built between cloud resource entities. Some embodiments advantageously provision the metrics correlations using historical data (e.g., for training and testing Machine Intelligence models, etc.) from DC repositories (e.g., databases). Some embodiments provide for validation as an intermediate step based on recommended policies, whereby semi-automation of the DC operations may be performed. In some embodiments, such policies may provide actions (e.g., a migration as an asset is updated, consolidation of already used resources, sleep mode of other resources, balancing metrics of DC resources per zone, etc.) that are adjusted as a function of time and also as a function of important operational metrics such as space, power and cooling (SPC). Some embodiments provide for reporting that takes into account correlation and operation patterns, and depends on policies recommended by an MI-trained model (e.g., NN). Some embodiments provide an apparatus in the form of a logical inference switch in order to choose between different MI models (e.g., reinforcement learning, Support Vector Machine (SVM), NN, Bayesian network, etc.) to obtain higher levels of optimization (as compared to a one-size-fits-all MI model approach) depending on the problem to be solved (e.g., phase balancing, temperature differentials, power consumption due to cooling systems, etc.).
Some embodiments advantageously provide an apparatus that supports the inference switch processing and also complements the DCIM with MI-based automation. Such an apparatus may interface with the DCIM in order to treat the appropriate big data and run processing on capable processors (e.g., central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), etc.). In further embodiments, such an apparatus may include non-volatile memory for big data handling and may be able to return a recommended optimization action (e.g., balance phases) using different selectable MI models, depending on the particular DC operational parameters to be optimized. In some embodiments, a predicted value may determine what action should be taken when power consumption optimization is required. Some embodiments provide for new use cases between a plurality of users (e.g., end-users, enterprise-users, etc.) with respect to evaluation of Cost/EE performance, pooling of applications among many users, virtual computing, etc. Some embodiments provide for a method to balance cloud physical/logical resources in order to accommodate high performance (processing, storage delay, networking, etc.) as pooling of applications becomes more viable and profitable as a function of time. Some embodiments provide an Energy-method (E-method) that encompasses the various scopes of the whole system, meaning that although there may be demarcations between scopes, requests can be sent between demarcation interfaces.
Referring now to the drawings, in which like reference designators refer to like elements, there is shown in FIG. 1 an exemplary system, and its related components, constructed in accordance with the principles of the present disclosure and designated generally as "10." Referring to FIG. 1, system 10 may include a data center 12, a data center infrastructure manager (DCIM) 14, an inference switch 16, a machine intelligence (MI) optimizer 18, and a database (DB) 20 in communication with one another, via one or more communication links, paths, connections, and/or networks using one or more communication protocols, where the DC 12, the DCIM 14, the inference switch 16, the MI optimizer 18, and the DB 20 may be configured to perform one or more of the processes and/or techniques described herein. Although the system 10 shown in FIG. 1 depicts a single DC 12, a single DCIM 14, a single inference switch 16, a single MI optimizer 18, and a single DB 20, it is contemplated that the system 10 may include any number of DCs 12, DCIMs 14, inference switches 16, MI optimizers 18, and/or DBs 20. Furthermore, the connections illustrated in the system 10 in FIG. 1 are exemplary and it should be understood that the system 10 entities may be connected with one another, directly and/or indirectly, via more than one connection and/or over more than one network. Also, although the DCIM 14, inference switch 16, MI optimizer 18, and DB 20 are shown in FIG. 1 (as well as FIG. 2) as external to the DC 12, in some embodiments, one or more of such entities may be internal to the DC 12 and/or the DC internal network.
The data center 12 may include multiple servers 22a, 22b, and 22c (referred to collectively herein as "servers 22") running various applications and services.
Although three servers 22a, 22b and 22c are shown, the disclosure is not limited to three servers. The data center 12 may also include multiple sensors 24a, 24b, and 24c (referred to collectively herein as "sensors 24"). Although three sensors 24a, 24b, and 24c are shown, the disclosure is not limited to three sensors. The sensors 24 may be configured to measure at least one physical property (e.g., temperature, power, current, etc.) associated with the data center and, in some embodiments, the resources at the data center 12. The sensors 24 may further include memory that stores measurements from the sensor 24 and may also be in communication, via a wired and/or wireless connection, with a container repository for data, such as, for example, the DB 20. It is within the scope of the present disclosure that DB 20 be considered any type of database including but not limited to any combination of one or more of a relational database, an operational database, or a distributed database. The servers 22 and sensors 24 may be communicatively coupled over an internal network within the DC 12.
The data center 12 may be segmented into a plurality of zones, such as, for example, zone 1, zone 2... zone n, depicted in FIG. 1, where n can be any number greater than 1. Each zone may be considered a physical area or region of the data center 12 and, in some embodiments, certain operational parameters may be managed and/or optimized on a per- zone basis. In some embodiments, global DC optimization may include consideration of the per-zone operational parameters on the overall global operational efficiency of the DC.
In some embodiments, the DCIM 14 may be considered an apparatus/device and/or system that monitors, manages, and/or controls data center utilization and energy consumption of resources (e.g., servers 22, storage, processors, network elements, etc.) and facility infrastructure (e.g., power distribution units (PDUs), computer room air handlers (CRAH), etc.). In some embodiments, the DCIM 14 may also refer to the software and/or computer instructions executable by processor(s) to monitor, manage, and/or control DC utilization, resources, and facility infrastructure.
In some embodiments, the inference switch 16 may be considered an apparatus/device configured to perform the techniques described herein with respect to the inference switch 16 and/or may also refer to the software and/or computer instructions executable by processor(s) to perform such techniques. In some embodiments, the inference switch 16 may be configured to select one of a set of MI models based on a type of input data and may output the selected MI model for use to provide, for example, a recommendation for optimizing operational parameters of the DC 12.
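The type-based selection performed by the inference switch 16 can be sketched as a simple lookup; the data-type names and model labels below are illustrative assumptions drawn loosely from the examples in this disclosure, not the disclosed implementation:

```python
# Hypothetical sketch of the inference switch: map a recognized input-data
# type to one of several MI model families.  Unrecognized types fall back
# to a default model (an assumption for illustration).
MODEL_BY_DATA_TYPE = {
    "phase_balancing": "reinforcement_learning",
    "crah_overload": "support_vector_machine",
    "temperature_differential": "time_series_nn",
}

def select_mi_model(data_type: str, default: str = "neural_network") -> str:
    """Return the MI model selected for the recognized data type."""
    return MODEL_BY_DATA_TYPE.get(data_type, default)
```

In a fuller system the selection could also weigh equipment specifications from the container repository and the requested recommendation category, as described below.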
In some embodiments, the MI optimizer 18 may be considered an apparatus/device configured to perform the techniques described herein with respect to the MI optimizer 18 and/or may also refer to the software and/or computer instructions executable by processor(s) to perform such techniques. In some embodiments, the MI optimizer 18 may be configured to select an MI model and train the MI model in response to a triggering event and apply/input data to the trained MI model to produce DC optimization recommendations.
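The trigger-driven select/train/apply flow of the MI optimizer 18 can be sketched roughly as follows; the model labels, the baseline-style "training," and the action names are illustrative assumptions rather than the disclosed implementation:

```python
# Toy sketch of the MI optimizer's ML optimization procedure: select an MI
# model for the operational parameter, "train" it on historical data from
# the database, then apply the current data to infer a recommendation.

def select_model(parameter_name):
    """Pick a (hypothetical) model family for the operational parameter."""
    return {"power": "time_series_nn", "cooling": "svm"}.get(parameter_name, "nn")

def train(history):
    """Toy training step: learn a baseline as the mean of historical values."""
    return sum(history) / len(history)

def ml_optimization_procedure(parameter_name, history, current_value):
    """Run select -> train -> apply and return (model, recommendation)."""
    model = select_model(parameter_name)
    baseline = train(history)
    # Apply the obtained data to the trained model to infer a recommendation.
    action = "consolidate" if current_value > baseline else "maintain"
    return model, action
```

A real optimizer would train an actual MI model (e.g., an NN) rather than a mean baseline, but the control flow mirrors the procedure described here.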
FIG. 2 illustrates an alternate embodiment of system 10. In this embodiment, the DCIM 14 combines the functions of the inference switch 16 and the MI optimizer 18 into one entity to provide global DC optimization using MI within data center 12 as discussed herein. As can be seen in a comparison of FIGS. 1 and 2, the inference switch 16 and MI optimizer 18 may be implemented in separate devices or may, in other embodiments, be combined so as to be implemented on a single device. The DCIM 14 may include a database 20 or have access to an external database 20, which may store sensor measurements and other trigger information obtained for global DC optimization using MI.
Referring now to FIG. 3, with brief reference to FIGS. 1 and 2, in one embodiment, the inference switch 16 includes a communication interface 26, processing circuitry 28, and memory 30. The communication interface 26 may be configured to communicate with one or more of the elements in the system 10. In some embodiments, the communication interface 26 may be formed as or may include, for example, one or more radio frequency (RF) transmitters, one or more RF receivers, and/or one or more RF transceivers, and/or may be considered a radio interface. In some embodiments, the communication interface 26 may include a wired and/or a wireless interface. Wired connections associated with the communication interface 26 may include, for example, a high-speed serial or parallel interface, a bus, an optical connection, an Ethernet connection, and the like.
The processing circuitry 28 may include one or more processors 31 and memory, such as, the memory 30. In particular, in addition to a traditional processor and memory, the processing circuitry 28 may comprise integrated circuitry for processing and/or control, e.g., one or more processors and/or processor cores and/or FPGAs (Field Programmable Gate Array) and/or ASICs (Application Specific Integrated Circuitry) adapted to execute instructions. The processor 31 and/or the processing circuitry 28 may be configured to access (e.g., write to and/or read from) the memory 30, which may comprise any kind of volatile and/or nonvolatile memory, e.g., cache and/or buffer memory and/or RAM (Random Access Memory) and/or ROM (Read-Only Memory) and/or optical memory and/or EPROM (Erasable Programmable Read-Only Memory).
Thus, the inference switch 16 may further include software stored internally in, for example, memory 30, or stored in external memory (e.g., database 20) accessible by the inference switch 16 via an external connection. The software may be executable by the processing circuitry 28. The processing circuitry 28 may be configured to control any of the methods and/or processes described herein and/or to cause such methods and/or processes to be performed, e.g., by the inference switch 16. The memory 30 is configured to store data, programmatic software code and/or other information described herein. In some embodiments, the software may include instructions that, when executed by the processor 31 and/or processing circuitry 28, cause the processor 31 and/or processing circuitry 28 to perform the processes described herein with respect to the inference switch 16.
For example, an apparatus may be configured to provide the inference switch 16 (e.g., the DCIM 14 or one or more other entities). The apparatus may include processing circuitry 28. The processing circuitry 28 may be configured to obtain data from at least one data source of at least one data center, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center. The processing circuitry 28 may also be configured to receive (via, e.g., communication interface 26) an indication of a type of the data corresponding to the at least one operational parameter. The processing circuitry 28 may be configured to recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input, select an MI model from a set of available MI models based on the type of the data, and output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model. In some embodiments, the at least one data source includes at least one of at least one sensor 24 configured to measure at least one physical property of the at least one resource and at least one memory storing measurements from the at least one sensor 24. In some embodiments, the at least one resource includes at least a processing resource, a storage resource, and a network resource. In some embodiments, the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center 12, at least a second type of data relating to an aspect of power management at the at least one data center 12, and at least a third type of data relating to an aspect of cooling at the at least one data center 12. In some embodiments, the MI model is configured to receive the data (via, e.g., communication interface 26) representing the at least one value corresponding to the at least one operational parameter as an input and, based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model. In some embodiments, the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center 12. In some embodiments, the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource of the at least one data center 12.
In some embodiments, the apparatus may further include a container repository (e.g., memory 30, DB 20, etc.) storing equipment specifications for a plurality of resources at the at least one data center 12. In some embodiments, the processing circuitry 28 is configured to access the container repository and use at least a portion of the equipment specifications stored in the container repository to select the MI model from the set of available MI models. In some embodiments, the plurality of types of data recognizable by the processing circuitry 28 includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature differential. In some embodiments, at least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source. In some embodiments, the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a Time Series + Neural Network (NN) model. In some embodiments, the processing circuitry 28 is one of coupled to a data center infrastructure manager (DCIM) 14 and included in the DCIM 14. In some embodiments, selection of the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center 12, a data center 12 operational pattern reporting, and a fault management clustering.
Referring now to FIG. 4, with brief reference to FIGS. 1 and 2, in another embodiment, the MI optimizer 18 includes a communication interface 32, processing circuitry 34, and memory 36. The communication interface 32 may be configured to communicate with one or more elements in the system 10. In some embodiments, the communication interface 32 may be formed as or may include, for example, one or more radio frequency (RF) transmitters, one or more RF receivers, and/or one or more RF transceivers, and/or may be considered a radio interface. In some embodiments, the communication interface 32 may include a wired and/or a wireless interface. Wired connections associated with the communication interface 32 may include, for example, a high-speed serial or parallel interface, a bus, an optical connection, an Ethernet connection, and the like.
The processing circuitry 34 may include one or more processors 38 and memory, such as, the memory 36. In particular, in addition to a traditional processor and memory, the processing circuitry 34 may comprise integrated circuitry for processing and/or control, e.g., one or more processors and/or processor cores and/or FPGAs (Field Programmable Gate Array) and/or ASICs (Application Specific Integrated Circuitry) adapted to execute instructions. The processor 38 and/or the processing circuitry 34 may be configured to access (e.g., write to and/or read from) the memory 36, which may comprise any kind of volatile and/or nonvolatile memory, e.g., cache and/or buffer memory and/or RAM (Random Access Memory) and/or ROM (Read-Only Memory) and/or optical memory and/or EPROM (Erasable Programmable Read-Only Memory).
Thus, the MI optimizer 18 may further include software stored internally in, for example, memory 36, or stored in external memory (e.g., database 20) accessible by the MI optimizer 18 via an external connection. The software may be executable by the processing circuitry 34. The processing circuitry 34 may be configured to control any of the methods and/or processes described herein and/or to cause such methods and/or processes to be performed, e.g., by the MI optimizer 18. The memory 36 is configured to store data, programmatic software code and/or other information described herein. In some embodiments, the software may include instructions that, when executed by the processor 38 and/or processing circuitry 34, cause the processor 38 and/or processing circuitry 34 to perform the processes described herein with respect to the MI optimizer 18.
For example, an apparatus for the DCIM 14 may be provided in some embodiments. The apparatus may include processing circuitry 34. The processing circuitry 34 may be configured to obtain data (via, e.g., communication interface 32) from at least one data source (e.g., DB 20, sensors 24, memory 36, etc.) of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12. The processing circuitry 34 may be further configured to identify an occurrence of a trigger based on the obtained data and, as a result of the occurrence of the trigger, execute a machine learning (ML) optimization procedure. The ML optimization procedure may include selecting a machine intelligence (MI) model; receiving, from a database 20, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model. In some embodiments, the processing circuitry 34 is further configured to identify the occurrence of the trigger based on the obtained data by being configured to calculate a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter and, based on the cost-to-performance ratio, determine whether to execute the ML optimization procedure. In some embodiments, the processing circuitry 34 is further configured to obtain the data (via, e.g., communication interface 32) from the at least one data source (e.g., DB 20, sensors 24, memory 36, etc.), identify the occurrence of the trigger, and apply the obtained data to the trained MI model to produce the at least one recommendation periodically to provide dynamic recommendations for operation of the at least one data center 12. In some embodiments, the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling. In some embodiments, the at least one recommendation includes an indication of at least one of a migration of at least one resource; a consolidation of a plurality of resources; a sleep mode of at least one resource; and a balancing of at least one data center operational metric for at least one zone of the at least one data center 12. In some embodiments, the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center 12. In some embodiments, the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio. In some embodiments, the at least three data center operational metrics to be balanced by the at least one balancing function include a network bandwidth, a processing effectiveness, and a storage response. In some embodiments, the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
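The power phase balancing function described above can be illustrated with a minimal sketch; the 10% tolerance and the deviation-from-mean measure are illustrative assumptions, not values taken from the disclosure:

```python
# Hedged sketch of a power phase balancing check: compare phase I/II/III
# utilizations and flag an imbalance beyond a tolerance.

def phase_imbalance(phase_utilizations):
    """Largest deviation of any phase from the mean utilization."""
    avg = sum(phase_utilizations) / len(phase_utilizations)
    return max(abs(u - avg) for u in phase_utilizations)

def rebalance_recommended(phase_utilizations, tolerance=0.10):
    """Recommend a phase-balancing action when imbalance exceeds tolerance."""
    return phase_imbalance(phase_utilizations) > tolerance
```

An analogous function could balance the other metric triples named here, such as network bandwidth, processing effectiveness, and storage response.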
FIG. 5 depicts an alternative embodiment for an apparatus for an inference switch 16 associated with a DCIM 14, which apparatus may include an inference switch module 40. In this embodiment, the inference switch module 40 may be configured to obtain data from at least one data source of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12. The inference switch module 40 may also be configured to receive an indication of a type of the data corresponding to the at least one operational parameter; recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input; select a machine intelligence (MI) model from a set of available MI models based on the type of the data; and output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
FIG. 6 depicts an alternative embodiment for an apparatus for a machine intelligence (MI) optimizer 18 associated with a DCIM 14, which apparatus may include a data collection module 42 configured to obtain data from at least one data source of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12. The MI optimizer 18 may further include an identification module 44 configured to identify an occurrence of a trigger based on the obtained data, and a machine learning (ML) optimization module 46. The ML optimization module 46 may be configured to, as a result of the occurrence of the trigger, execute a machine learning (ML) optimization procedure. The ML optimization procedure may include selecting a machine intelligence (MI) model; receiving, from a database 20, training data associated with the at least one operational parameter; training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.

FIG. 7 is a flowchart illustrating an exemplary method for an inference switch 16 associated with a DCIM 14. The exemplary method may be implemented in the DCIM 14 or may, in some embodiments, be implemented in a device/apparatus separate from and in communication with the DCIM 14. The method may include obtaining data from at least one data source of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12 (block S50).
The method may further include receiving an indication of a type of the data corresponding to the at least one operational parameter (block S52) and recognizing the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input (block S54). The method also includes selecting a machine intelligence (MI) model from a set of available MI models based on the type of the data (block S56); and outputting the selected MI model to be used to process the data for data center optimization based on at least the selected MI model (block S58). In some embodiments, the at least one data source includes at least one of at least one sensor 24 configured to measure at least one physical property of the at least one resource and at least one memory (e.g., DB 20) storing measurements from the at least one sensor 24. In some embodiments, the at least one resource includes at least a processing resource, a storage resource, and a network resource. In some embodiments, the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center 12, at least a second type of data relating to an aspect of power management at the at least one data center 12, and at least a third type of data relating to an aspect of cooling at the at least one data center 12. In some embodiments, the MI model is configured to receive the data representing the at least one value corresponding to the at least one operational parameter as an input, and based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model. 
In some embodiments, the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center 12. In some embodiments, the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource (e.g., server 22) of the at least one data center 12. In some embodiments, the method further includes storing, at a container repository (e.g., DB 20), equipment specifications for a plurality of resources at the at least one data center 12; and accessing the container repository to use at least a portion of the equipment specifications stored in the container repository to select the MI model from the set of available MI models. In some embodiments, the plurality of types of data that are recognizable includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature differential. In some embodiments, at least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source (e.g., sensors 24, DB 20, etc.). In some embodiments, the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a Time Series + Neural Network (NN) model.
In some embodiments, selecting the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center, a data center operational pattern reporting, and a fault management clustering.
FIG. 8 is a flowchart illustrating an exemplary method for an MI optimizer 18 associated with a DCIM 14. The exemplary method includes obtaining data from at least one data source of at least one data center 12, the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center 12 (block S60). The method further includes identifying an occurrence of a trigger based on the obtained data (block S62); and as a result of the occurrence of the trigger, executing a machine learning (ML) optimization procedure (block S64). The ML optimization procedure includes selecting a machine intelligence (MI) model; and receiving, from a database 20, training data associated with the at least one operational parameter. The ML optimization procedure may further include training the MI model using the training data associated with the at least one operational parameter; and applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model. In some embodiments, identifying the occurrence of the trigger based on the obtained data includes calculating a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and based on the cost-to-performance ratio, determining whether to execute the ML optimization procedure. In some embodiments, obtaining data from at least one data source, identifying an occurrence of a trigger, and applying the obtained data to the trained MI model to produce at least one recommendation is performed periodically to provide dynamic recommendations for operation of the at least one data center 12. 
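The trigger check in this method, i.e., deciding from a cost-to-performance ratio whether to run the ML optimization procedure, can be sketched as follows; the threshold value is an illustrative assumption:

```python
# Toy version of the FIG. 8 trigger: compute a cost-to-performance ratio
# from the obtained data and gate the ML optimization procedure on it.

def cost_to_performance(cost, performance):
    """Cost per unit of delivered performance."""
    return cost / performance

def should_optimize(cost, performance, threshold=1.5):
    """Execute the ML optimization procedure only when the ratio exceeds
    the (assumed) threshold."""
    return cost_to_performance(cost, performance) > threshold
```

Run periodically against fresh sensor data, such a gate yields the dynamic recommendations for data center operation described here.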
In some embodiments, the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling. In some embodiments, the at least one recommendation includes an indication of at least one of: a migration of at least one resource; a consolidation of a plurality of resources; a sleep mode of at least one resource; and a balancing of at least one data center operational metric for at least one zone of the at least one data center 12. In some embodiments, the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center 12. In some embodiments, the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio. In some embodiments, the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response. In some embodiments, the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization. Having described some embodiments of the present disclosure, a more detailed description of some of the embodiments will now be described below, with reference primarily to FIGS. 9-15.
Some embodiments of the present disclosure provide a method and apparatus to measure power consumption of applications using DC 12 assets (e.g., servers, storage, networking, etc.), whereby the DCIM 14 controls DC 12 assets logically according to certain criteria and policies, and to recommend more economical operations. In some embodiments, there may be a semi-automation stage whereby validation by a DC operator is used as an intermediate step. In some embodiments, the DCIM 14 may promote and/or recommend different actions such as, for example, a migration as the asset is updated; a consolidation of already used resources; a sleep mode of other resources; balancing metrics (e.g., balancing load, hot spot temperature areas, etc.) of DC 12 resources (e.g., servers, storage, networking, processing load, power, space, cooling, etc.) per zone, etc. For such purposes, some embodiments of the present disclosure may advantageously provide for an intelligent logical and/or physical inferred switch using a plurality of Machine Intelligence (MI) models (e.g., reinforcement learning, support vector machine, Bayesian network, etc.) that may be developed and that may consider one or more parameters and attributes in order to provide optimal predictive analytics for Cost/Energy Performance operation purposes.
Some embodiments of the present disclosure provide a system specification library that may be stored in, for example, a container repository and/or database 20, and may be used to assist in building the characteristics of the DC 12 systems for the MI models. In one embodiment, the system specification library may include global specifications for parameters at the data center 12 and/or for each zone of the data center 12. For example, the system specification library may include temperature and/or power thresholds, space-related thresholds for considering expansion (e.g., square footage available in the data center for additional equipment, racks, etc.), schematics for the DC 12 facility(ies)/building (e.g., floor space, raised floors, etc.), mechanical specifications (heating, ventilation, air conditioning, humidification equipment, pressurization etc. for the infrastructure of the DC 12), electrical specifications for the DC 12 facility(ies) (e.g., electrical configurations, voltage and other power requirements, information on back-up power sources, generators, switches, etc.), fire and other security systems specifications, environmental control systems specifications, etc. In some embodiments, further information derived from the systems specifications and other information may be included in the systems specification or another database 20, e.g., cable routing, identification of critical servers, network architecture, etc. The systems specifications library may also include equipment specifications from equipment manufacturers (e.g., processors, racks, boards, power supplies, etc.), such as, for example, cooling and energy requirements, processing speed, memory capacity, storage capacity, temperature thresholds, etc. 
Since there may be multiple instances of the same type of equipment in the DC 12, in one embodiment, the systems specifications library may be modular, with each module representing specifications for a single type of equipment (e.g., all specs for a particular model of processor) and the library may be accessed and used by DC 12 personnel via, e.g., a software platform (e.g., software tool of the DCIM 14) where multiple instances of each module can be created and arranged to model the DC 12 for purposes of, for example, validating an MI model, validating a
recommendation output by the MI model, selecting the MI model, and/or building the MI model.
As is apparent, there is a multitude of data associated with the DC 12 and embodiments of the present invention may utilize such data for data-driven modeling of various characteristics of the DC 12 globally and/or on a per zone basis. Some embodiments may further use one or more MI models to provide operational recommendations in response to certain detected triggers or anomalies, in response to an autonomous identification (via e.g., a prediction by an MI model) of a potential weakness in the DC 12 design (e.g., hidden insights through learning from historical relationships, patterns, trends, etc. obtained from data inputs), and/or discovery (via e.g., an MI model) of potential areas of operational improvement beyond a minimum level of efficiency, etc.
In a first embodiment of the present disclosure, the DCIM 14 or an apparatus in communication with the DCIM 14 may obtain measurements of the energy performance based on consumption of the assets/resources at the DC 12. For example, energy performance based on power consumption associated with cooling may be obtained by the DCIM 14. Generally, DCs 12 strive for optimal energy utilization. Ideally, a hypothetically efficient DC 12 is one where energy is used exclusively to power information technology (IT). An index known as Power Usage Effectiveness (“PUE”) can be used to measure energy efficiency. A PUE index of 1.0 represents an ideal data center where no energy is lost to the surrounding elements. The DCIM 14 or an apparatus in communication with the DCIM 14 may determine and/or monitor the PUE index. In some embodiments, use and/or training of an MI model may be triggered by, for example, the PUE index at least meeting a threshold PUE index. Essentially, in this manner, the MI model may be used to provide a recommended course of action (e.g., migrate resources/assets or power down resources/assets, etc.) to move the PUE index toward the ideal PUE index.
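A minimal sketch of PUE computation and the threshold trigger described above; the 1.5 threshold value is an illustrative assumption:

```python
def pue(total_power_kw, it_power_kw):
    # PUE = total facility power / IT equipment power; 1.0 is the ideal index
    return total_power_kw / it_power_kw

def pue_trigger(total_power_kw, it_power_kw, threshold=1.5):
    # Use and/or training of an MI model may be triggered when the
    # PUE index at least meets a threshold PUE index
    return pue(total_power_kw, it_power_kw) >= threshold
```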
It should be understood that where the present disclosure discusses methods and techniques to be performed by the DCIM 14, such methods and techniques may also be performed in some embodiments by an apparatus/device in communication with the DCIM 14 (e.g., as depicted in FIG. 1), even if not expressly stated in the description.
In some embodiments of the present disclosure, operational optimization actions may be provided for resources/assets at the DC 12 (e.g., a migration as the asset is updated; consolidation of already used resources may be proposed; sleep mode of other resources may be actuated; balancing operational metrics of DC 12 resources per zone may be monitored, etc.). In some embodiments, such operational optimization actions/recommendations may be based on historical data and traffic and/or known or determined technological tendencies represented in an MI model.
In one embodiment, the MI model may output a report of proposed migration(s)/resource allocation. In some embodiments, one or more Machine Learning techniques may be used to provide an inference for future re-allocation of resources. Some machine learning techniques that may be used with some embodiments of the present disclosure may include decision tree learning, association rule learning, artificial neural networks, deep learning, inductive logic programming, support vector machines, clustering, Bayesian networks, reinforcement learning, representation learning, learning classifier systems, etc. One specific type of ML model, the neural network (NN) model, will be described in more detail below; however, a multitude of different types of MI models are known to those of ordinary skill in the art and therefore all MI models that may be used with embodiments of the present disclosure will not be described in great detail.
Based on features of, e.g., Cost/Performance, a sample of related data (e.g., processor utilization, bandwidth utilization, storage utilization, etc.) may be obtained from the DCIM 14 at a given rate (or Sampling Rate). Such sampling of data may be used to train an ML model and/or may be used as input into an ML model for outputting an inferred recommendation. In some embodiments, real-time streams of available information (e.g., data from sensors 24 at the DC 12) may be inputs into a selected MI model according to embodiments of the present disclosure, which may result in an output corresponding to an operational recommendation (e.g., migration, resource allocation, etc.).
In some embodiments, a balancing diagram may be used for overall DC 12 operational optimization. FIG. 9 is a visual representation of an exemplary balancing diagram for cost/performance validation. In some embodiments, a “balancing diagram” may be considered to be a representation of a multi-constraint optimization (e.g., cost ratio, PUE ratio, and cost/performance ratio) whereby a perfect triangle centered in the middle is the preferred optimization (or as close to being centered in the middle as can be reasonably obtained). In some embodiments, validation in MI may be considered an action to decide which MI model best answers which data set. As indicated in FIG. 9, in some embodiments, the balancing diagram may be configured to balance constraints specific to a particular zone in the DC 12. For example, one constraint may be a cost-to-performance ratio, which may, in some embodiments, be a ratio of an absolute monetary cost (e.g., US dollars) to a performance metric (e.g., power consumption). As shown in FIG. 9, another constraint may be a relative cost-to-performance ratio. For example, to get a 30% “Power Consumption” reduction with respect to the usage without optimization (baseline), the relative operational cost may be 1,000,000 (optimized) as opposed to 1,300,000 (non-optimized). This may be considered an optimization of 30% or a 30% relative cost ratio. This figure may, of course, differ from 30% depending on, for example, the Power Grid Provider tariffing. Thus, in some cases, for example, a 30% Power Consumption Reduction may lead to a 20% relative operational cost reduction. Yet another constraint depicted in FIG. 9 is a Power Usage Effectiveness ratio (PUE ratio). In one embodiment, the PUE ratio may be defined as the total power consumption of a DC 12 divided by the Information and Communication Technology (ICT) only power consumption.
For instance, a PUE of 1.4 means that the total power consumption is 1.4 x the power consumption of the ICT, since a DC 12 has ICT equipment and accessory equipment for power transport, cooling, heating, office equipment, etc.
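The arithmetic of the FIG. 9 example can be reproduced as follows; note that the 30% figure is the savings expressed relative to the optimized cost:

```python
def relative_cost_ratio(optimized_cost, baseline_cost):
    # (1,300,000 - 1,000,000) / 1,000,000 = 0.30, i.e., a 30% relative cost ratio
    return (baseline_cost - optimized_cost) / optimized_cost

def pue_ratio(total_power, ict_power):
    # A PUE of 1.4 means total consumption is 1.4 x the ICT-only consumption
    return total_power / ict_power
```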
In one embodiment, for a given zone at the DC 12, a function represents a balancing diagram for re-allocation of resources. Such balancing diagram may be considered to reflect the current and, in some embodiments, real-time network utilization, processing utilization and storage utilization of that particular zone. FIG. 10 depicts an exemplary balancing diagram for network, processing, and storage constraints. A zone may vary in size depending on the level of granularity required.
In some embodiments, as part of the balancing diagram, one or more threshold lines may be established to set a policy as to what consolidation of resources may produce a more centered triangle in the balancing diagram. In some embodiments, the function corresponding to the balancing diagram is configured to learn from past data (i.e., historical data) and/or expert knowledge that may be provided into a software platform by, for example, data entry or programming. Such function may learn from historical data in order to discover the optimal balanced combination between related constraints, such as, for example, network, processing, and storage. In some embodiments, the learning is a re-iterative process that fine-tunes as a function of time (learning time or rate) and may vary with respect to changing acquired historical data. In some embodiments, the function can be used as a preventive measure if, for example, a validation phase is included in the process. In some embodiments, the function may be used as a predictive measure in embodiments in which predictive analytics is used instead of validation in order to make a final decision based on learned patterns and trends. In some embodiments, the function may be the MI model or at least a portion of the MI model. In some embodiments, the function may be considered a balancing function that may be configured to balance multiple constraints.
Referring still to FIG. 10, one of the constraints for the exemplary balancing diagram shown is a network bandwidth, which may be defined in Megabits per second (10^6 bps) or Gigabits per second (10^9 bps), and may be considered to represent the speed of bits that flow on a given network link. Another constraint shown is the processing effectiveness/utilization, which may indicate how many instructions per second a central processing unit (CPU), graphical processing unit (GPU), or network processing unit (NPU) can process. It is conventionally expressed in millions of instructions per second (MIPS). It may be considered a general measure of computing performance and, by implication, the amount of work a computer can do. Yet another constraint of the balancing diagram shown in FIG. 10 is the storage response, which may be expressed in seconds and can vary with the type of storage technology used. For example, a hard disk drive (HDD) may give a response of about 4 msec; Flash memory of about 100 µsec (micro-seconds); and other types of non-volatile memory of about 500 nsec (nano-seconds). One primary consideration for the balancing diagram is to ensure that the three (3) main resources of the DC 12 (e.g., network, processing, and storage) are well-configured relative to one another so that none of these resources becomes a bottleneck to the others. For example, the DC processing and storage may be determined to be relatively fast, but the DC network may be providing only 100 Mbps per link, which is not fast enough. Therefore, the other two resources (processing and storage) may suffer due to the sluggishness of the network performance, since each of these aspects of the DC 12 (processing, storage, network) is dependent on the others.
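One hypothetical way to score how centered a balancing diagram is — the maximum deviation of any normalized metric from the zone mean, with 0.0 being a perfectly centered triangle; the metric names and values are illustrative:

```python
def imbalance(metrics):
    """Largest deviation of any normalized utilization from the zone mean.

    0.0 means a perfectly centered (balanced) triangle; a large value means
    one resource (e.g., the network) has become a bottleneck for the others.
    """
    mean = sum(metrics.values()) / len(metrics)
    return max(abs(v - mean) for v in metrics.values())


balanced_zone = {"network": 0.5, "processing": 0.5, "storage": 0.5}
bottlenecked_zone = {"network": 0.10, "processing": 0.85, "storage": 0.80}
```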
In some embodiments, an MI model may be used to consider so-called what-if scenarios and to present the difference between current and future energy savings. In some embodiments, an MI model may be used to predict outcomes. Yet another exemplary balancing diagram for another scenario at a DC 12 is depicted in FIG. 11, showing Power Phases I, II and III being balanced according to embodiments of the present disclosure. In order to balance phases, an alarm threshold may be set to trigger a notification to installers to connect machines into a proper power phase so as to not overload a particular phase. For example, if the power on one phase exceeds an 80% threshold, the intelligent system (e.g., DCIM 14) may be triggered to analyze what can be done to re-balance the physical machines following rules and best practices. A more detailed description of this scenario is provided below with reference to FIG. 12. In some embodiments, DC 12 policies and associated objectives may be considered for the transition of applications onto new resources (logical or physical). In one embodiment, once a balancing function (e.g., network-storage-processing balancing function) outputs a diagram, a certain threshold (e.g., balance metric not to exceed) is interpreted by a policy function to recommend a new consolidation of resources in an additional zone. This policy may correspond to a preventive action. In other words, a policy may correspond to a certain threshold, which may be used to prevent undesired consequences (i.e., preventative) before they happen. In some embodiments, an MI model may include predictive analytics in order to identify patterns and trends for predictive action (e.g., discovering hidden patterns in data and using them to make predictions), rather than setting policies. In some embodiments, best practice rules are integrated into the machine learned models.
In some embodiments, such best practice rules may be integrated into the DCIM 14 and/or into an MI model associated with the DCIM 14. In some embodiments, different model types may be added or modified based on operational expertise. The model type may provide all the policy rules that are part of the already known best practices. These best practice policy rules may be integrated into an MI model according to known techniques for building and training an MI model.
In one embodiment, feedback to the tenants of the DC 12 may be provided by the DCIM 14, for current and future asset usage and/or modification. For example, the delta of economic improvement may be reported to tenants to demonstrate the effectiveness of using such techniques at the DC 12 as, for example, an advantage over other DCs that do not use such MI techniques to optimize operational parameters. In one embodiment, there may be a cost conversion function that may convert operational parameter improvements into dollar amounts saved (e.g., reduced power consumption due to recommendations predicted to result in reducing fan usage may be converted into dollar amounts saved due to the power consumption reduction). Even if the cost savings is an estimate, such data may be useful to tenants as well as decision-makers as to whether to implement the MI model recommendations.
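A minimal sketch of such a cost conversion function; the flat tariff value is an illustrative assumption:

```python
def power_savings_to_dollars(kwh_saved, tariff_usd_per_kwh=0.12):
    # Convert an operational parameter improvement (e.g., reduced fan usage
    # measured in kWh saved) into an estimated dollar amount saved; in
    # practice the tariff would come from the Power Grid Provider
    return kwh_saved * tariff_usd_per_kwh
```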
In one embodiment, the MI model may output audit criteria that may result in improved DC 12 equipment maintenance and regulation compliance. Some reporting nodes (e.g., agents) may be defined within the DC 12 in order to align maintenance with regulation compliance. This may be performed by an auditing function that matches preplanned timely events for the maintenance of the assets. One benefit may be to pass from a targeted operational efficiency to a future operational efficiency, based on the use of ML models. In one embodiment, at least one of the ML models may be considered an “E-Model”, where “E” means Efficiency.
Three example scenarios will now be generally described in which one or more techniques disclosed herein may be used for operational optimization of the DC 12. Each of the example scenarios is provided to demonstrate at least three different areas of DC 12 operation in which monitoring certain data and triggering certain conditions, according to, for example, the method described below with respect to FIG. 14, can lead to using an MI model to provide a recommendation to, for example, improve or eliminate the triggering condition.
In a first example scenario, a load on one or more power distribution units (PDUs) at the DC 12 may be monitored. Generally, an installer can read the display on the PDU itself and make a judgement call on which branch to use. In one embodiment, monitoring of the load may be used to provide a recommendation as to which outlet to use for assets connected to the PDUs. Generally, it is considered a best practice to always use the same outlet for the same device. However, if an installer/technician has used the same outlet as proposed by the best practice, such technician should also ensure that the sum of the current for a particular circuit is less than a threshold current, e.g., 16 A. In some scenarios, if the sum of the current is higher than a certain threshold, there is a risk of overloading the remaining PDUs if there is an outage of one of the PDUs. Thus, in some embodiments, the triggering condition may be a threshold current value.
FIG. 12 depicts an example rack associated with at least 2 PDUs. The system shown in FIG. 12 shows that branch 3 has a total power of 1,740 W, branch 2 has a total power of 2,100 W, and branch 1 has a total power of 2,340 W. In one embodiment, the system shown in FIG. 12 may have the following exemplary parameters:
• Circuit 1 of PDU A = 6 A
• Circuit 1 of PDU B = 5.5 A
• Sum: 11.5 A is less than 16 A, therefore redundancy is acceptably below the threshold current
• Circuit 1 of PDU A = 8 A
• Circuit 1 of PDU B = 8.5 A
• Sum: 16.5 A is greater than 16 A, therefore redundancy is in jeopardy due to the total current exceeding the threshold current.
When an installer is ready to connect an asset to a PDU, the installer should ideally spread the load on the PDU equally over the branches (e.g., 3 branches) in order to balance all three phases. This means that the installer should not start connecting the asset starting from outlet 1 and up. By doing so, there is a risk of only using branches 1 and 2 and leaving branch 3 empty, which could result in an imbalance of the 3 phases. For 2N redundancy, in one embodiment, a PDU should not exceed 40% of its maximum capacity, which may be, for example, 7.5 kW for the first two PDUs. In one embodiment, an alarm may be triggered by, for example, the DCIM 14, as a result of the PDU at least reaching 40% of its maximum capacity.
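The two checks in this scenario — the summed-current redundancy rule and the 40% capacity alarm — can be sketched as follows; the default values mirror the example figures above:

```python
def redundancy_ok(circuit_currents_a, threshold_a=16.0):
    # The sum of the current over paired circuits must stay below the
    # threshold (e.g., 16 A), or an outage of one PDU risks overloading the rest
    return sum(circuit_currents_a) < threshold_a

def capacity_alarm(load_kw, max_capacity_kw=7.5, limit=0.40):
    # For 2N redundancy, alarm when a PDU at least reaches 40% of its capacity
    return load_kw >= limit * max_capacity_kw
```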
In a second example scenario, the sensors 24 may be used to detect a potential computer room air handler (CRAH) overload condition. In one embodiment, the CRAH overload condition may be considered a CRAH overload binary state associated with power consumption optimization between CRAH overload and server fans. Data/measurements from the sensors 24 may be obtained, stored in, for example, the DB 20, and may be used to optimize the DC 12. In one embodiment, the data may be input into the inference switch 16 and an ML model selected based on the type of data. In another embodiment, the data may be analyzed by, for example, the MI optimizer 18, and may be used to identify a trigger that results in training an MI model and using the data as an input into the trained MI model. With CRAHs, overheating may occur if the sensors 24 are located in a hot spot above the racks where the average cooling can sometimes not propagate evenly. In one exemplary embodiment, while monitoring the CRAH according to one or more techniques described herein, an alarm (e.g., audible, visual, etc.) may be configured to be triggered and thereby notify DC 12 operators of any triggering conditions. In one embodiment, an MI model may be used to infer a potential anomaly based on the sensor 24 data by, for example, inputting the sensor 24 data into the MI model and allowing the MI model to make intelligent inferences based on, for example, recognizing historical patterns or trends. As one simplistic example, the MI model may recognize that processing loads are lowest at certain times of the day or year, and may therefore recommend that maintenance activities be scheduled at those times.
One or more metrics such as, for example, the rack cooling index, a quantifiable metric that measures the level of compliance with the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) standards, etc. may be used to determine a triggering condition. ASHRAE publishes allowable temperature and humidity levels within data centers, setting clear operating
parameters for computing equipment. One advantage of installing environmental sensors 24 in the DC 12 is that, when judiciously placed, the sensors 24 can assist the facility operator with raising temperatures safely within certain compliance thresholds (e.g., 35 Centigrade). For example, if computing equipment is running too cold, there is a chance that the cooling equipment within the data center 12 is being over-used and thereby incurring a power overload. On the other hand, if equipment runs too hot, depending on the configuration of the service, energy consumption may become unnecessarily excessive due to the over-use of fans running in the servers. In other words, data from sensors 24 at the DC 12 (e.g., temperature sensors, humidity sensors, etc.) may be used to identify whether cooling equipment is being inefficiently used.
In a third example scenario, temperature differentials may be monitored by the sensors 24 (e.g., temperature sensors) at the racks/cabinets. In one embodiment, temperature differential may refer to an increase in temperature nearby DC 12 cabinets, which may be acceptable compared to real-time temperatures at the assets in the cabinets, but is demanding in terms of cooling the room. In one example, data from temperature sensors 24 at the DC 12 may be obtained from each rack (e.g., there may be, for example, a set of 3 temperature sensors per rack), with the data from the set of rack sensors 24 representing an average temperature in each row. Such data may then be compared to information corresponding to the manufacturer temperatures for all assets/resources within each row. Such manufacturer information may be stored in, for example, the systems specifications library and/or the DB 20. In one embodiment, the data from rack sensors 24 may be continuously and/or periodically obtained and compared to identify the occurrence of a trigger. The trigger may be, for example, when a measured temperature at least meets a temperature threshold in the row or room. Information from the systems specifications library may be used to determine an appropriate temperature threshold value for the trigger. Improved efficiency in managing temperature differentials in the DC 12 can result in a significant cost savings.
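A sketch of the per-row temperature trigger described above; the three-sensor average and the sample readings are illustrative:

```python
def temperature_trigger(rack_sensor_readings_c, threshold_c):
    """Compare the average of the rack sensors 24 (e.g., a set of 3 per rack)
    against a threshold derived from the systems specifications library / DB 20."""
    row_average = sum(rack_sensor_readings_c) / len(rack_sensor_readings_c)
    # The trigger occurs when the measured temperature at least meets the threshold
    return row_average >= threshold_c
```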
Having described at least three example scenarios in which DC 12 operational optimization may be achieved (e.g., power phase balancing, CRAH overload, temperature differentials) by monitoring data and responding to a detected trigger, an exemplary inference switch 16 will now be described with reference primarily to FIG. 13. In one embodiment, the inference switch 16 may be in communication with an input identification unit and an MI model output unit. The inference switch 16 may be considered a type of logical and/or physical selector switch configured to receive an input corresponding to a data type from e.g., the input identification unit, and identify the data type of the input (i.e., differentiate between different types of data that may be input into the switch 16). Based on the data type of the input, the inference switch 16 may output an MI model to e.g., the MI model output unit. The output MI model may be, for example, a Time Series + NN model, a support vector machine (SVM) model, a reinforcement learning model, etc. The inference switch 16 may optimize operation of the DC 12 by providing an inference of a particular MI model that should be used for a particular scenario. In one embodiment, the inference switch 16 may be configured to identify at least one of the scenarios described herein above (e.g., power phase balancing, CRAH overload, and temperature differentials) by the type of data received as input and then select an MI model that matches (or most optimally fits) the scenario. The MI model may already be trained and/or the MI model may require further training before being used by the DCIM 14. As one simplified example, based on the input being Amps, the inference switch 16 may determine that the particular scenario is a phase balancing scenario and may therefore output an MI model suitable for analyzing current data to balance phases. 
As another example, based on the input being temperature data, the inference switch 16 may determine that the particular scenario is a CRAH scenario and may therefore output an MI model suitable for analyzing temperature data. In one embodiment, an indication of the particular data type may be input into the inference switch 16. The indicator may be, for example, an attribute, an index, a value, or a signal recognizable by the inference switch 16 as indicating a particular type of data of a set of available types of data recognizable by the inference switch 16 for responding with a particular MI model.
The three exemplary scenarios demonstrate that by using the inference switch 16, a suitable optimization MI model can be selected by looking at the data type available and switching to that specific MI model for analysis and optimization. In some embodiments, the particular MI model suitable for each of the scenarios of temperature differential, CRAH overload binary state, and balancing phases may be an RL model, an SVM model, and a Time Series + NN model, respectively. Once a suitable MI model is selected, the selected MI model may be executed and a data set input therein.
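The data-type-to-model switching described above can be sketched as a simple lookup; the key strings and model names here are illustrative assumptions:

```python
# Hypothetical data-type -> MI-model mapping for the three scenarios
MODEL_BY_DATA_TYPE = {
    "temperature_differential": "reinforcement_learning",  # RL model
    "crah_overload_state": "support_vector_machine",       # SVM model
    "current_amps": "time_series_nn",                      # Time Series + NN model
}

def select_mi_model(data_type):
    # The inference switch 16 identifies the input data type and switches
    # to the MI model that most optimally fits the scenario
    return MODEL_BY_DATA_TYPE.get(data_type)
```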
In one embodiment, the inference switch 16 may include, for example, a “pull metric” set of inputs that samples on changes since the last optimization, memory (e.g., random access memory (RAM)), storage (e.g., a solid-state drive (SSD), etc.), a processor configured to execute instructions corresponding to one or more of the techniques described herein, and an output port that outputs the selected optimization trained model to process the data from a particular scenario. In other embodiments, the inference switch 16 may be implemented in other ways.
Also, it is contemplated that the units described herein (e.g., input
identification unit and MI model output unit shown in FIG. 13) may be implemented such that a portion of the unit is stored in a corresponding memory within a processing circuitry, or may be considered the processing circuitry. In other words, the units as well as the inference switch 16 may be implemented in hardware or in a combination of hardware and software within processing circuitry of one or more devices/entities described herein.
Referring now to the flow chart depicted in FIG. 14, an exemplary method for data center infrastructure management is described. In the example method, CPU load is monitored; however, it should be understood that in other scenarios other operational parameters may be monitored and optimized according to the exemplary method (e.g., temperature, Amps, etc.). In the exemplary method, the DCIM 14 may discover “n” servers 22 (block S70). In one embodiment, DCIM 14 discovery of “n” servers may be performed on a continuous and/or periodic basis in the background (“n” can be any number). The DCIM 14 may request a metric from the DC 12 using, for example, signaling (e.g., Intelligent Platform Management Interface (IPMI)) (block S72). In one embodiment, the metric may include a CPU utilization percentage for each server 22. The DCIM 14 may determine whether utilization is greater than or equal to a threshold (e.g., 80%) (block S74). If CPU utilization does not at least meet the threshold, the DCIM 14 may return to block S70 where the DCIM 14 continues to discover servers 22 at a sampling rate and repeats the process. If the CPU utilization does at least meet the threshold, the DCIM 14 may identify each of the assets (e.g., network and storage devices) that are associated with the overused CPU(s) (block S76). The DCIM 14 may then determine the power consumption of the identified assets of the overused CPU(s) and may convert such power consumption to cost (block S78). In one embodiment, the cost may be a monetary cost or a cost ratio. The cost ratio may be a ratio of, for example, a cost associated with the determined power consumption and a cost associated with a target power consumption level. Having determined the CPU performance (e.g., power consumption) and a cost of the determined CPU performance, based on a cost-to-performance ratio, the DCIM 14 may determine whether an MI optimization procedure should be triggered (block S80).
It should be understood that different metrics can be used for the performance, and the cost may be the relative operating cost in terms of cost units. Optimization may be triggered if, for example, the cost-to-performance ratio at least meets a threshold ratio (e.g., 10% optimization).
Optimization actions may include, e.g., workload balancing, adding additional processor cores, recommending balancing phases, etc. If optimization is not triggered, the process may return to block S72 (or block S70) to continue sampling data. On the other hand, if optimization is triggered, the ML optimization procedure may be performed/executed (block S82). One example of the ML optimization procedure will be described in more detail with reference to FIG. 15. After the ML optimization procedure is executed, the DCIM 14 may determine the PUE as a result of the optimization and may store the PUE in the DB 20 (block S84). In some embodiments, a portion or all of the steps in the process may be executed on a per zone basis for each zone.
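As a non-limiting illustration, the threshold check and trigger decision of blocks S70-S80 can be sketched as follows. The function names, the 80% CPU threshold, the 10% optimization threshold, and the simple power-to-cost ratio are hypothetical assumptions for illustration only, not taken from the embodiments above.

```python
# Hypothetical sketch of the FIG. 14 decision points: compare CPU utilization
# against a threshold (block S74), convert the power draw of assets tied to an
# overused CPU into a cost ratio (block S78), and trigger the ML optimization
# procedure when the ratio crosses a threshold (block S80).

CPU_THRESHOLD = 80.0   # block S74: utilization threshold, in percent (assumed)
OPT_THRESHOLD = 0.10   # block S80: e.g., 10% potential optimization (assumed)

def cost_ratio(measured_watts: float, target_watts: float) -> float:
    """Excess of measured power cost over a target power cost (block S78)."""
    return (measured_watts - target_watts) / target_watts

def should_optimize(cpu_util_pct: float, measured_watts: float,
                    target_watts: float) -> bool:
    """Blocks S74-S80: decide whether the ML optimization procedure runs."""
    if cpu_util_pct < CPU_THRESHOLD:
        return False                      # back to discovery (block S70)
    return cost_ratio(measured_watts, target_watts) >= OPT_THRESHOLD

# Example: 85% CPU with assets drawing 460 W against a 400 W target triggers
# optimization, since the cost ratio (0.15) meets the 10% threshold.
print(should_optimize(85.0, 460.0, 400.0))
```

In this sketch the per-server loop, asset identification, and PUE bookkeeping of FIG. 14 are omitted; only the two threshold decisions are shown.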
Referring now to the flow chart depicted in FIG. 15, an exemplary method for an ML optimization procedure, as discussed briefly in block S82 of FIG. 14, will now be described. Stated another way, the steps shown in the flow chart of FIG. 15 may be one example implementation of block S82 of FIG. 14. Data may be obtained from e.g., the DB 20 (block S90). Clustering techniques may be applied to the obtained data (block S92). Based on the clustering, data may be divided into different categories (block S94). For example, data from different resources can be
categorized. Clustering techniques are known for building and training ML models and will therefore not be described in great detail herein. In one embodiment, from the clustered data categories, emergency, cost, priority, and/or other informative data points may be identified in order to determine the accuracy needed for the ML model.
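The categorization of blocks S92-S94 can be illustrated with a minimal one-dimensional k-means sketch. A deployed DCIM would more likely rely on a library implementation; the sample values, variable names, and initialization scheme below are hypothetical.

```python
# Minimal 1-D k-means sketch of blocks S92-S94: cluster operational samples
# (here, per-server power readings) into categories. Illustrative only.

def kmeans_1d(samples, k=2, iters=20):
    # Spread the initial centroids across the sorted samples.
    centroids = sorted(samples)[:: max(1, len(samples) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        # Block S92: assign each sample to its nearest centroid.
        for s in samples:
            i = min(range(len(centroids)), key=lambda j: abs(s - centroids[j]))
            groups[i].append(s)
        # Recompute each centroid as the mean of its group.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids, groups

# Block S94: two obvious usage regimes emerge as two categories.
power_watts = [110, 120, 115, 380, 400, 390]
centroids, groups = kmeans_1d(power_watts, k=2)
```

From categories such as these, the informative data points mentioned above (emergency, cost, priority) could then be identified per cluster.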
In one embodiment, the data may be determined to be suitable for use with a NN model, which may be determined by, for example, the inference switch 16. In other embodiments, the particular data set obtained may be suitable for other types of MI models. It should be understood that many different types of MI models may be used with embodiments of the present disclosure and that the NN model is used as one example. Having identified an accuracy threshold required for the ML model, according to known techniques, the NN model may be trained and/or validated.
As a brief aside, training the neural network may take a substantial amount of time depending on the type of neural network as well as on the configuration of the selected neural network. After a certain time period, convergence is achieved and the neural network is trained and ready to use. Collecting samples and training a new NN model or updating an existing NN model can be done at regular intervals, when a number of samples have been accumulated, or according to any other suitable criteria. Examples of types of NN include Feedforward, Radial basis function, Recurrent, Modular, etc., and a person skilled in the art would know how to choose an
appropriate NN for a particular application, and how to configure, train and use these different types of NN. The NN model can be built based on a number of layers and a number of neurons in each layer. Some technical methods for dealing with overfitting or underfitting might also be applied. Overfitting occurs when the model is over specified. The outcome of overfitting is that the trained model is not general enough and captures too many details of the training dataset. This leads to inaccurate prediction on a non-training dataset. Underfitting occurs when the model is under specified. The outcome of underfitting is that the trained model is too general and misses some details of the training dataset. This also leads to inaccurate prediction on a non-training dataset. NN models may include a plurality of layers and, more specifically, one or more "hidden layers" between at least one input layer and at least one output layer. As would be recognized by a person of ordinary skill in the art, some NN models may be deep learning models and can comprise up to tens or hundreds of hidden layers, depending on the complexity of the data samples to be processed. Each of the layers may comprise one or more neurons, with each neuron being represented by a formula or function (e.g., y = w*x + b, where x is an input of the neuron, y is an output of the neuron, w is a weight coefficient and b is a bias). The coefficients (w, b) at each layer are tuned during the training process to minimize a "loss function," which is defined to measure how accurate the prediction is. The coefficients in each layer may be given arbitrary values at the beginning; for simplicity, those arbitrary values might be drawn from a certain distribution around zero. Eventually, with the training dataset, the neurons are trained to converge to certain values, which may form the model that can be used to perform the prediction on any non-training dataset.
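The neuron formula y = w*x + b and the coefficient tuning described above can be illustrated with a single-neuron sketch trained by gradient descent on a squared-error loss. The learning rate, epoch count, and toy dataset are illustrative assumptions.

```python
# Sketch of a single neuron y = w*x + b: coefficients start at zero and are
# tuned to minimize a squared-error loss on a training dataset, as described
# above. Purely illustrative; a real NN model would have many such neurons
# arranged in hidden layers.

def train_neuron(data, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0                      # arbitrary values around zero
    for _ in range(epochs):
        for x, target in data:
            y = w * x + b                # forward pass through the neuron
            err = y - target             # gradient of 0.5*(y-target)**2 w.r.t. y
            w -= lr * err * x            # tune coefficients to reduce the loss
            b -= lr * err
    return w, b

# Training dataset following y = 2x + 1; the neuron converges close to it.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_neuron(data)
```

The converged (w, b) can then be used for prediction on a non-training dataset, mirroring the training/inference split described in the text.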
Having generally described one type of MI model, an NN model, the exemplary method for training the NN model depicted in FIG. 15 will be described. The NN model may be qualified based on "n" hidden layers, where n can be any number, and a correlation between different data types may be performed (block S96). Weights and biases may also be configured for each hidden layer (block S98). The DCIM 14 may determine whether the confidence margin/accuracy threshold is met by the NN model (block S100), e.g., validation of the NN model. If the accuracy threshold has not been met, the DCIM 14 may continue to train the NN model by "exercising" the NN layers (block S102), according to known techniques for training NN model layers. If the accuracy threshold has been at least met, an inference network may be built (block S104). In one embodiment, a dynamic policy engine may be configured or built that includes a plurality of potential recommendations (e.g., migration of resources, power down, re-balance power phases, etc.). Thus, a recommendation may be provided (block S106), e.g., recommend a re-balancing of 220 volt phases. The recommendation may be provided by, for example, inputting data obtained from, e.g., database 20 and/or sensors 24, into the trained and validated NN model and interpreting the output as a recommendation. In one embodiment, the data obtained to train the NN model may be considered historical data or training data, while the data obtained to output a recommendation or generate a prediction may be considered real-time data, sampled data, or a non-training data set. In one embodiment, test data may be used for testing that the MI model has been trained properly. For example, a set of test data may be used to verify that the trained model meets certain predetermined criteria.
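A minimal sketch of the validation and recommendation steps (blocks S100 and S106) follows. The policy table, the 90% accuracy threshold, and the toy stand-in "model" are hypothetical; the three recommendations are taken from the examples mentioned above.

```python
# Hypothetical sketch of blocks S100 and S106: validate a trained model against
# an accuracy threshold on test data, then interpret the model output as a
# recommendation drawn from a dynamic policy table.

POLICY = {0: "migrate resources",
          1: "power down",
          2: "re-balance power phases"}

def validate(model, test_set, threshold=0.9):
    """Block S100: fraction of correct predictions must meet the threshold."""
    correct = sum(1 for x, label in test_set if model(x) == label)
    return correct / len(test_set) >= threshold

def recommend(model, sample):
    """Block S106: interpret the model output as a recommendation."""
    return POLICY[model(sample)]

# Toy stand-in for a trained NN model: map a phase-imbalance score to a class.
model = lambda score: 2 if score > 0.5 else (1 if score < 0.1 else 0)
test_set = [(0.7, 2), (0.05, 1), (0.3, 0), (0.9, 2)]
ok = validate(model, test_set)   # all four predictions correct
rec = recommend(model, 0.8)      # high imbalance -> re-balance power phases
```

Here the test set plays the role of the predetermined criteria mentioned above, and the policy lookup stands in for the dynamic policy engine.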
In one embodiment, the training process for the MI model may occur in the background and may be considered "offline," which may be distinguishable from monitored data that is provided continuously in the foreground.
Abbreviations that may be used in the preceding description include:
Abbreviations Explanation
DCIM Data Center Infrastructure Management
EME Energy Management Entity
E-Method Energy Method
MI Machine Intelligence
ML Machine Learning
As will be appreciated by one of skill in the art, the concepts described herein may be embodied as a method, data processing system, and/or computer program product. Accordingly, the concepts described herein may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects, all generally referred to herein as a "circuit" or "module." Furthermore, the disclosure may take the form of a computer program product on a tangible computer usable storage medium having computer program code embodied in the medium that can be executed by a computer. Any suitable tangible computer readable medium may be utilized including hard disks, CD-ROMs, electronic storage devices, optical storage devices, or magnetic storage devices. Some embodiments are described herein with reference to flowchart illustrations and/or block diagrams of methods, systems and computer program products. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable memory or storage medium that can direct a computer or other
programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
It is to be understood that the functions/acts noted in the blocks may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that
communication may occur in the opposite direction to the depicted arrows.
Computer program code for carrying out operations of the concepts described herein may be written in an object oriented programming language such as Java® or C++. However, the computer program code for carrying out operations of the disclosure may also be written in conventional procedural programming languages, such as the "C" programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer. In the latter scenario, the remote computer may be connected to the user's computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Many different embodiments have been disclosed herein, in connection with the above description and the drawings. It will be understood that it would be unduly repetitious and obfuscating to literally describe and illustrate every combination and subcombination of these embodiments. Accordingly, all embodiments can be combined in any way and/or combination, and the present specification, including the drawings, shall be construed to constitute a complete written description of all combinations and subcombinations of the embodiments described herein, and of the manner and process of making and using them, and shall support claims to any such combination or subcombination.
It will be appreciated by persons skilled in the art that the embodiments described herein are not limited to what has been particularly shown and described herein above. In addition, unless mention was made above to the contrary, it should be noted that all of the accompanying drawings are not to scale. A variety of modifications and variations are possible in light of the above teachings without departing from the scope of the following claims.

Claims

What is claimed is:
1. An apparatus for an inference switch (16) associated with a data center infrastructure manager, DCIM, (14) the apparatus comprising:
processing circuitry (28), the processing circuitry (28) configured to:
obtain data from at least one data source of at least one data center (12), the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center (12);
receive an indication of a type of the data corresponding to the at least one operational parameter;
recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input;
select a machine intelligence, MI, model from a set of available MI models based on the type of the data; and
output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
2. The apparatus according to Claim 1, wherein the at least one data source includes at least one of at least one sensor (24) configured to measure at least one physical property of the at least one resource and at least one memory (30) storing measurements from the at least one sensor (24).
3. The apparatus according to any of Claims 1 and 2, wherein the at least one resource includes at least a processing resource, a storage resource, and a network resource.
4. The apparatus according to any of Claims 1-3, wherein the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center (12), at least a second type of data relating to an aspect of power management at the at least one data center (12), and at least a third type of data relating to an aspect of cooling at the at least one data center (12).
5. The apparatus according to any of Claims 1-4, wherein the MI model is configured to:
receive the data representing the at least one value corresponding to the at least one operational parameter as an input, and
based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model.
6. The apparatus according to Claim 5, wherein the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center (12).
7. The apparatus according to any of Claims 1-6, wherein the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource of the at least one data center (12).
8. The apparatus according to any of Claims 1-7, further comprising:
a container repository (20) storing equipment specifications for a plurality of resources at the at least one data center (12); and
the processing circuitry (28) configured to access the container repository (20) and use at least a portion of the equipment specifications stored in the container repository (20) to select the MI model from the set of available MI models.
9. The apparatus according to any of Claims 1-8, wherein the plurality of types of data recognizable by the processing circuitry (28) includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler (CRAH) overload condition, and a type of data representing a temperature differential.
10. The apparatus according to any of Claims 1-9, wherein at least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source.
11. The apparatus according to any of Claims 1-10, wherein the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a Time Series/+ Neural Network, NN.
12. The apparatus according to any of Claims 1-11, wherein the processing circuitry (28) is one of coupled to the DCIM (14) and included in the DCIM (14).
13. The apparatus according to any of Claims 1-12, wherein selection of the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center (12), a data center operational pattern reporting, and a fault management clustering.
14. An apparatus for a machine intelligence, MI, optimizer (18) associated with a data center infrastructure management, DCIM, (14) the apparatus comprising processing circuitry (34) configured to:
obtain data from at least one data source of at least one data center (12), the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center (12);
identify an occurrence of a trigger based on the obtained data; and
as a result of the occurrence of the trigger, execute a machine learning, ML, optimization procedure, the ML optimization procedure including:
selecting a machine intelligence (MI) model;
receiving, from a database (20), training data associated with the at least one operational parameter;
training the MI model using the training data associated with the at least one operational parameter; and
applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
15. The apparatus according to Claim 14, wherein the processing circuitry (34) is further configured to identify the occurrence of the trigger based on the obtained data by being configured to:
calculate a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and
based on the cost-to-performance ratio, determine whether to execute the ML optimization procedure.
16. The apparatus according to any of Claims 14-15, wherein the processing circuitry (34) is further configured to obtain the data from the at least one data source, identify the occurrence of the trigger, and apply the obtained data to the trained MI model to produce the at least one recommendation periodically to provide dynamic recommendations for operation of the at least one data center (12).
17. The apparatus according to any of Claims 14-16, wherein the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling.
18. The apparatus according to any of Claims 14-17, wherein the at least one recommendation includes an indication of at least one of:
a migration of at least one resource;
a consolidation of a plurality of resources;
a sleep mode of at least one resource; and
a balancing of at least one data center operational metric for at least one zone of the at least one data center (12).
19. The apparatus according to any of Claims 14-18, wherein the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center (12).
20. The apparatus according to Claim 19, wherein the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio.
21. The apparatus according to Claim 19, wherein the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response.
22. The apparatus according to Claim 19, wherein the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
23. A method for an inference switch (16) associated with a data center
infrastructure manager, DCIM, (14) the method comprising:
obtaining data from at least one data source of at least one data center (12), the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center (12) (S50);
receiving an indication of a type of the data corresponding to the at least one operational parameter (S52);
recognizing the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input (S54);
selecting a machine intelligence, MI, model from a set of available MI models based on the type of the data (S56); and
outputting the selected MI model to be used to process the data for data center optimization based on at least the selected MI model (S58).
24. The method according to Claim 23, wherein the at least one data source includes at least one of at least one sensor (24) configured to measure at least one physical property of the at least one resource and at least one memory (30) storing measurements from the at least one sensor (24).
25. The method according to any of Claims 23 and 24, wherein the at least one resource includes at least a processing resource, a storage resource, and a network resource.
26. The method according to any of Claims 23-25, wherein the plurality of types of data includes at least a first type of data relating to physical space available at the at least one data center (12), at least a second type of data relating to an aspect of power management at the at least one data center (12), and at least a third type of data relating to an aspect of cooling at the at least one data center (12).
27. The method according to any of Claims 23-26, wherein the MI model is configured to:
receive the data representing the at least one value corresponding to the at least one operational parameter as an input, and
based at least on the input, output at least one recommendation for the at least one operational parameter inferred according to the MI model.
28. The method according to Claim 27, wherein the at least one recommendation includes an indication of at least one action step that is inferred, according to the selected MI model, to at least one of improve, maintain, and balance at least one metric associated with the at least one operational parameter for at least one zone of the at least one data center (12).
29. The method according to any of Claims 23-28, wherein the MI model is trained using at least historical data corresponding to the at least one operational parameter associated with the at least one resource of the at least one data center (12).
30. The method according to any of Claims 23-29, further comprising:
storing, at a container repository (20), equipment specifications for a plurality of resources at the at least one data center (12); and
accessing the container repository (20) to use at least a portion of the equipment specifications stored in the container repository (20) to select the MI model from the set of available MI models.
31. The method according to any of Claims 23-30, wherein the plurality of types of data that are recognizable includes at least a type of data associated with phase balancing, a type of data associated with a computer room air handler, CRAH, overload condition, and a type of data representing a temperature differential.
32. The method according to any of Claims 23-31, wherein at least one of the set of available MI models is a neural network (NN) model capable of learning based on the data from the at least one data source.
33. The method according to any of Claims 23-32, wherein the set of available MI models includes at least a reinforcement learning model, a support vector machine, and a Time Series/+ Neural Network, NN.
34. The method according to any of Claims 23-33, wherein selecting the MI model from the set of available MI models is further based on at least one requested recommendation category of a set of available recommendation categories, the set of available recommendation categories including at least a policy generation for the at least one data center (12), a data center operational pattern reporting, and a fault management clustering.
35. A method for a machine intelligence, MI, optimizer (18) associated with a data center infrastructure management, DCIM, (14) the method comprising:
obtaining data from at least one data source of at least one data center (12), the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center (12) (S60);
identifying an occurrence of a trigger based on the obtained data (S62); and
as a result of the occurrence of the trigger, executing a machine learning (ML) optimization procedure (S64), the ML optimization procedure including at least:
selecting a machine intelligence (MI) model;
receiving, from a database (20), training data associated with the at least one operational parameter;
training the MI model using the training data associated with the at least one operational parameter; and
applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
36. The method according to Claim 35, wherein identifying the occurrence of the trigger based on the obtained data includes:
calculating a cost-to-performance ratio associated with the data corresponding to the at least one operational parameter; and
based on the cost-to-performance ratio, determining whether to execute the ML optimization procedure.
37. The method according to any of Claims 35-36, wherein obtaining data from at least one data source, identifying an occurrence of a trigger, and applying the obtained data to the trained MI model to produce at least one recommendation is performed periodically to provide dynamic recommendations for operation of the at least one data center (12).
38. The method according to any of Claims 35-37, wherein the at least one recommendation includes an adjustable policy including an indication of at least one action step that is inferred, according to the selected MI model, and is adjustable as a function of time and a function of at least one data center operational metric, the at least one data center operational metric including at least one of space, power, and cooling.
39. The method according to any of Claims 35-38, wherein the at least one recommendation includes an indication of at least one of:
a migration of at least one resource;
a consolidation of a plurality of resources;
a sleep mode of at least one resource; and
a balancing of at least one data center operational metric for at least one zone of the at least one data center (12).
40. The method according to any of Claims 35-39, wherein the at least one recommendation is based at least on at least one balancing function, the at least one balancing function configured to balance at least three data center operational metrics for at least one zone of the at least one data center (12).
41. The method according to Claim 40, wherein the at least one balancing function includes a cost-performance balancing function, the at least three data center operational metrics to be balanced by the cost-performance balancing function including a cost-to-performance ratio, a relative cost ratio, and a power usage effectiveness ratio.
42. The method according to Claim 40, wherein the at least three data center operational metrics to be balanced by the at least one balancing function includes a network bandwidth, a processing effectiveness, and a storage response.
43. The method according to Claim 40, wherein the at least one balancing function includes a power phase balancing function, the at least three data center operational metrics to be balanced by the power phase balancing function including a phase I utilization, a phase II utilization, and a phase III utilization.
44. An apparatus for an inference switch associated with a data center
infrastructure manager, DCIM, (14) the apparatus comprising an inference switch module (40), the inference switch module (40) configured to:
obtain data from at least one data source of at least one data center (12), the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center (12);
receive an indication of a type of the data corresponding to the at least one operational parameter;
recognize the indication of the type of the data from a plurality of types of data available as inputs to the at least one data input;
select a machine intelligence, MI, model from a set of available MI models based on the type of the data; and
output the selected MI model to be used to process the data for data center optimization based on at least the selected MI model.
45. An apparatus for a machine intelligence, MI, optimizer (18) associated with a data center infrastructure management, DCIM, (14) the apparatus comprising:
a data collection module (42) configured to obtain data from at least one data source of at least one data center (12), the data representing at least one value corresponding to at least one operational parameter, the at least one operational parameter associated with at least one resource of the at least one data center (12);
an identification module (44) configured to identify an occurrence of a trigger based on the obtained data; and
a machine learning, ML, optimization module (46) configured to, as a result of the occurrence of the trigger, execute a machine learning, ML, optimization procedure, the ML optimization procedure including at least:
selecting a machine intelligence (MI) model;
receiving, from a database (20), training data associated with the at least one operational parameter;
training the MI model using the training data associated with the at least one operational parameter; and
applying the obtained data to the trained MI model to produce at least one recommendation for the at least one operational parameter inferred from the trained MI model.
PCT/IB2018/052214 2018-03-29 2018-03-29 Global data center cost/performance validation based on machine intelligence WO2019186243A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2018/052214 WO2019186243A1 (en) 2018-03-29 2018-03-29 Global data center cost/performance validation based on machine intelligence


Publications (1)

Publication Number Publication Date
WO2019186243A1 true WO2019186243A1 (en) 2019-10-03

Family

ID=62063113

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2018/052214 WO2019186243A1 (en) 2018-03-29 2018-03-29 Global data center cost/performance validation based on machine intelligence

Country Status (1)

Country Link
WO (1) WO2019186243A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110568754A (en) * 2019-10-24 2019-12-13 厦门华夏国际电力发展有限公司 DCS (distributed control system) -based automatic optimization searching control method and system for redundant equipment station
WO2022216375A1 (en) * 2021-04-05 2022-10-13 Nec Laboratories America, Inc. Anomaly detection in multiple operational modes

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010096283A2 (en) * 2009-02-23 2010-08-26 Microsoft Corporation Energy-aware server management
US20110213508A1 (en) * 2010-02-26 2011-09-01 International Business Machines Corporation Optimizing power consumption by dynamic workload adjustment
US20150026108A1 (en) * 2013-03-15 2015-01-22 Citrix Systems, Inc. Managing Computer Server Capacity
US20170109205A1 (en) * 2015-10-20 2017-04-20 Nishi Ahuja Computing Resources Workload Scheduling

Similar Documents

Publication Publication Date Title
US11695649B1 (en) System, method, and computer program for determining a network situation in a communication network
US10481629B2 (en) Cognitive platform and method for energy management for enterprises
US8140682B2 (en) System, method, and apparatus for server-storage-network optimization for application service level agreements
US11283863B1 (en) Data center management using digital twins
EP3207432B1 (en) A method for managing subsystems of a process plant using a distributed control system
US11212173B2 (en) Model-driven technique for virtual network function rehoming for service chains
US10466686B2 (en) System and method for automatic configuration of a data collection system and schedule for control system monitoring
CN109324679A (en) A kind of server energy consumption control method and device
JP2023547849A (en) Method or non-transitory computer-readable medium for automated real-time detection, prediction, and prevention of rare failures in industrial systems using unlabeled sensor data
Friesen et al. Machine learning for zero-touch management in heterogeneous industrial networks-a review
Mahan et al. A novel resource productivity based on granular neural network in cloud computing
WO2019186243A1 (en) Global data center cost/performance validation based on machine intelligence
Najafizadegan et al. An autonomous model for self‐optimizing virtual machine selection by learning automata in cloud environment
Koukaras et al. Proactive buildings: A prescriptive maintenance approach
KR20200126766A (en) Operation management apparatus and method in ict infrastructure
CN117762644A (en) Resource dynamic scheduling technology of distributed cloud computing system
Taherizadeh et al. Incremental learning from multi-level monitoring data and its application to component based software engineering
US10756970B1 (en) System, method, and computer program for automatic reconfiguration of a communication network
KR20210058468A (en) Apparatus and method for artificial intelligence operator support system of intelligent edge networking
Liu Using neural network to establish manufacture production performance forecasting in IoT environment
KR20200063343A (en) System and method for managing operaiton in trust reality viewpointing networking infrastucture
KR20080087571A (en) Context prediction system and method thereof
Carlsson et al. Possibilistic bayes modelling for predictive analytics
Mohazabiyeh et al. Energy-aware adaptive four thresholds technique for optimal virtual machine placement
Vasilakos et al. iOn-Profiler: Intelligent Online Multi-Objective VNF Profiling With Reinforcement Learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18720366

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18720366

Country of ref document: EP

Kind code of ref document: A1