CA2845341A1 - A computer system, methods, apparatus for processing applications, dispensing workloads, monitor energy and sequence power to nonhierarchical multi-tier blade servers in data centers


Info

Publication number
CA2845341A1
Authority
CA
Canada
Prior art keywords
power
prt
blade server
tier
pmct
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA2845341A
Other languages
French (fr)
Inventor
Pierre Popovic
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CA2845341A
Publication of CA2845341A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/329 Power saving characterised by the action undertaken by task scheduling
    • G06F1/3293 Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)

Abstract

A computer system, methods, apparatus and readable medium to reduce Data Centers' (DCs) energy costs, improve their computing infrastructures' green energy efficiency and the workload performance of their application processes by executing them on a blade server system with embedded, nonhierarchical heterogeneous multi-processors and scalable Processing Resource Tiers - PRT optimized for energy efficiency and performance delivery. In this disclosure, green energy efficiency of a computing resource is defined as a function of reduced power, reduced heat dissipation, reduced process execution times, and increased process workload performance delivery. A low-power General-Purpose Processor (GPP) in an active mode is the primary blade server's processor. The primary blade server, also called the Front-end Processing Tier - FPT, is responsible for the initial processing of tasks before transferring the computing process to the most green-energy-efficient PRT as identified and selected by a Master Command Controller - MCC module and powered ON by a Power Management Control Tier - PMCT.

Description

A computer system, methods, apparatus for processing applications, dispensing workloads, monitor energy and sequence power to nonhierarchical multi-tier blade servers in Data Centers.
1. BACKGROUND OF INVENTION
[001] The insatiable worldwide appetite for instant access to information, video, communication, cloud computing and social networking on any portable device vastly increases the amount of energy consumed by DCs. The accessibility and availability of DCs are becoming ever more important, and their number, power densities, and size are growing fast.
1.1. The impact of "cloud computing"
[002] This growth has been further fuelled by newer concepts such as cloud computing and a plethora of new data center related services such as "software as a service" (SaaS), "platform as a service" (PaaS), "infrastructure as a service" (IaaS), "IT (Information Technology) as a service" (ITaaS), and "pay-as-you-go" IT services. DC business has grown from basic searches, database queries, e-mail, and web-hosting to include a large variety of heterogeneous Internet-based business applications, Customer Relations, Enterprise Resource Planning, and a multitude of office-related software. The technical foundations of Cloud Computing include the "as-a-Service" usage model, Service-Oriented Architecture (SOA) and Virtualization of hardware and software. The goal of Cloud Computing is to consolidate infrastructure and share resources among the cloud service consumers, social networking sites, and multimedia applications that require more computing power and faster processes. In responding to the increased demands and wide scope of heterogeneous applications, Data Center Infrastructure Management (DCIM) development strategies are relying on low-cost homogeneous multi-core x86 Instruction Set Architecture (ISA) based servers. These architectures, even though they may not be the optimum architectures for heterogeneous applications, have dominated the DC market and have largely become a de facto standard. To maintain a guaranteed level of service, DCs overprovision and build redundancy into their server farms, which leads to increased consumption of both materials and energy. The result is that DCs keep their server farms operating at all times, even when there is little or no computing traffic. This wastes much energy and natural resources and is detrimental to sustainability.
2. TECHNICAL FIELD
[003] This invention pertains to the field of DCs, blade servers, and the processing of heterogeneous applications in DCs. More precisely, this invention relates to a system of smart methods for managing, sequencing, and monitoring energy and the consumption of computing resources in nonhierarchical multi-tier heterogeneous multi-processor blade server architectures with variable, scalable and selectable power schemes. It is designed to reduce DC computing infrastructures' overall energy costs as well as optimize the performance of DCs' heterogeneous application processes. This invention matches processes to the specific optimum Processing Resource Tier selected according to its greatest green energy efficiency. Every Processing Resource Tier - PRT is a server in itself with all the associated components. The Processing Resource Tier - PRT is in an OFF state, as defined in "[0007] Glossary" of this invention, unless otherwise turned ON by the system. The selected optimum Processing Resource Tier - PRT is one of the many Processing Resource Tiers that could be on the same blade server; alternately, should a blade server tier be busy, the process would be directed to a similar Processing Resource Tier - PRT either on the same blade server or on another blade server tier in the same enclosure.
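The dispatch behaviour described in this section (match a process to the most green-energy-efficient PRT, which stays OFF until selected, and fall back to a similar tier on another blade when the preferred blade's tiers are busy) can be sketched in Python. This is a minimal illustration only: the names PRT, mcc_select_prt and pmct_power_on, and the scalar green_efficiency metric, are hypothetical stand-ins and not the disclosure's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PRT:
    """One Processing Resource Tier: an independent, complete processing
    resource embedded in a blade server (hypothetical model)."""
    name: str
    blade: str
    green_efficiency: float  # illustrative score: higher = more green-energy-efficient
    busy: bool = False
    powered_on: bool = False  # tiers default to the true Power Mode OFF state

def pmct_power_on(prt: PRT) -> None:
    """Stand-in for the Power Management Control Tier (PMCT) switching a tier ON."""
    prt.powered_on = True

def mcc_select_prt(prts: List[PRT], preferred_blade: str) -> Optional[PRT]:
    """Stand-in for the Master Command Controller (MCC) selection step:
    prefer the most green-energy-efficient idle tier on the preferred blade;
    if every tier there is busy, fall back to a similar idle tier on another
    blade in the same enclosure."""
    idle = [p for p in prts if not p.busy]
    if not idle:
        return None
    same_blade = [p for p in idle if p.blade == preferred_blade]
    best = max(same_blade or idle, key=lambda p: p.green_efficiency)
    pmct_power_on(best)  # only the selected tier is ever powered ON
    return best
```

The sketch mirrors the nonhierarchical character of the tiers: no PRT outranks another; selection is driven purely by the efficiency metric and availability.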
3. COMMONALITIES IN PRIOR ARTS
[004] The widespread use of common terms and technical generalizations in cited prior arts, such as "heterogeneous," "homogeneous," "tiers," "power," "multiprocessors," "multi-core," "scheduler," and "predictor," to name a few, can group substantially diverse inventions under the same heading. Fundamentally different inventions described in the same widely used terms can easily be misunderstood and their intent misinterpreted. When writing a patent application, inventors, such as the ones for the current invention, must highlight differences repeatedly to be certain that the design goals beneath the widely used terms are not misinterpreted.
4. GLOSSARY
[005] Various aspects of the illustrative embodiments will be described in terms commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art. However, it will be apparent to those skilled in the art that alternate embodiments may be practiced with only some of the characteristics described.
It is necessary, therefore, that the description of this invention be accompanied by a glossary of the most common terms used herein.
[006] As used in this document, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Also, singular words should be read as plural and vice versa, and masculine as feminine and vice versa, where appropriate, and alternative embodiments do not necessarily imply that the two are mutually exclusive. In addition, the terms "component," "module," "system," "processing component," "processing engine" and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
[007] Glossary.
ADC device: An analog-to-digital converter device.
BTS: Base Transceiver Station.
Blade server system: May be synonymous with computer system.
DC: Data Center.
Front-end Processing Tier: Hereinafter FPT, also called the main server. In the most basic sense, the FPT represents the main blade server architecture as seen by the network and the network manager. The FPT is the processing system that handles the initial processes before transferring the process execution to a PRT. Its central processing unit is connected to the PRTs through a hub and a bus with which it shares a common main memory in the form of a RAM, and may share a storage medium. It may comprise a single processor, or it may include a processing device comprising: a processing module capable of multitasking multiple tasks; one or more associated circuits, which may be selectively configured responsive to a control signal, coupled to said processing module for supporting the processing module; and a memory storing control words for configuring the associated circuits, additional memory, network controllers, and input/output (I/O) ports, and functions as a server.
Heterogeneous: In prior arts, the definition of the term "heterogeneous" varies from prior art to prior art. As used in this invention, "heterogeneous" refers to multi-processors that may have multi-core systems that may use different and sometimes incompatible Instruction Set Architectures (ISA) that may lead to binary incompatibility, that may interpret memory in different ways, and that may have different Application Binary Interfaces (ABI) and/or Application Programming Interfaces (API). These multi-core systems may be of different sizes and may run at different speeds.
Homogeneous: The antipode of "heterogeneous." Its definition varies from prior art to prior art but generally stays within the meaning of having identical cumulative distribution functions or values. For example, multi-processors that may have multi-core systems that run at different speeds and may have different sizes but remain compatible through their Instruction Set Architecture (ISA).
Monitor or Monitoring: The term "monitor" has several implied connotations in prior arts. In general, the implied preposition "for" after the term "monitor" is often omitted. This may lead to misinterpretations about what is being monitored and for what purpose. One can monitor for a condition and repair it or start a repair process, or one can monitor for a system health status and advise service managers. For this invention, the term "Monitor" means "collecting and storing data". The "Monitor" will monitor temperatures, energy used, or fan speeds, for example, and store the information in a given storage area.
NIC: Network Interface Controller. Hereinafter NIC. These devices span the most common physical media types, from the simple CX4, a low-power copper wire designed for short distances (max. 15 metres), up to the complex Base-T. The average NIC idle power may vary between 4.6 W (CX4) and 21.2 W (Base-T). A 10 Gbps NIC's average may vary between 18.0 W and 21.2 W in idle mode. Typically, a blade server would have at least two NICs: a high-rate one dedicated to communicating process results to the top-of-rack switch and on to the DC access layer network, and a lower-rate one dedicated to the out-of-band service processor that provides DC managers with remote power control management and monitoring and enables a remote console for servers.
Power Mode: The term "Power Mode" is frequently used in all prior arts related to server power management. It can, in some instances, describe a "state of readiness", while in others it can describe a state in which current or energy is drawn. In the latter, the term is normally followed by an adjective such as "low", meaning, without specifying an amount, that the current energy draw is not as "significant" as another current draw such as "high". In prior arts, when they want to indicate that a small amount of current is being used, vendors/inventors use different adjectival forms such as "sleep mode", "deep sleep mode", and "standby mode" as alternate terms to avoid being accused of plagiarism. This imprecision is amplified as vendors/inventors interweave states of readiness and rates of energy consumption in the same prior art when referring, again without being specific, to "a plurality of components", which could mean either certain components, for example a processor or memory, or the entire system composed of a plurality of components. In the current invention, the term "Power Mode" is associated with an entire processing tier unless specifically stated otherwise.
Power Mode OFF/ON: For the purpose of the current invention, "Power Mode OFF" or "Power Mode ON" is a state in which either no energy is consumed and the device is not using power of any kind (OFF), or energy is consumed (ON). The referenced prior arts have various and different definitions of the term "OFF", but not as defined herein. In addition to its true "Power Mode OFF/ON" state, the current invention also provides the other common consumption mode adjectives, such as "low", "sleep" or "deep sleep" modes, the values of which depend on processor manufacturers' or other specifications.
Process: A process is defined as an instance of a computer program that is being executed. It contains the program code and its current activity. A process may be made up of multiple threads of execution that execute instructions concurrently. A computer program is a passive collection of instructions; a process is the actual execution of those instructions.
Processing Resource Tier: Hereinafter PRT. While the term "tier" implies a multi-level or multi-layer hierarchy in a given order, in the current invention a Processing Resource Tier, alternately referred to as a PRT, is defined as a nonhierarchical, independent, stand-alone, and complete processing resource system embedded in the main blade server architecture. The main blade server may comprise several PRTs, each of them with its associated processor that may be heterogeneous (as defined herein), multi-core, or homogeneous, and with an individual integrated NIC that connects each PRT directly to the TOR. The PRT may include a non-volatile solid state storage system, memory and memory controller, component sensors, and a plurality of components necessary for its operation. Each PRT has its own bus structure that internally connects its components and a separate external bus that connects the PRT architecture to the main blade server architecture hub. Each PRT has its own individual and independent power architecture.
"PRT" and "Processing Resources" As used in this document, these terms may be considered as synonymous.
Top-of-Rack switch: Hereinafter TOR. DC servers typically connect to Ethernet switches installed inside the enclosure (rack). The term "top-of-rack" was coined because these switches' physical location is often at the top of the server rack enclosure. However, the switch does not necessarily need to be at the top of the rack and may be located in other positions in the rack, such as the bottom or the middle. The top-of-rack position is the most common due to easier accessibility and cable management. The Ethernet TOR switch directly links the rack to the DC common aggregation area, connecting to redundant "Distribution" or "Aggregation" high-density modular Ethernet switches.
5. GENERAL DESCRIPTION OF PRIOR ARTS
[008] The fundamental energy problems of DCs oblige technology suppliers to explore ways to reduce energy consumed by DCs without restricting the sprawling communication networks, cloud computing or internet business. This has given rise to several methods and systems that can save energy largely by leveraging processors' power modes.
[009] Prior art and patents in the field of power modes management and workload performance for DC servers primarily control and manage servers' power through various similar approaches and means that, in practically all embodiments cited in prior arts, share the same fundamental commonalities. These commonalities include areas such as, among others, homogeneous system architectures, tasks allocation, power level queuing, sleep tasks scheduling, workload sharing, and workload spreading.
[0010] Representative of these commonalities of means are the disclosures and narratives contained in one form or another in the prior art descriptions cited from [0012] to [0039] below. One skilled in the art will readily recognize and appreciate that in these prior arts a multiplicity of similar variations of power management and process performance themes, depending upon the needs of the particular application, have been proposed and have been patented.
6. PRIOR ART REFERENCED
[0011] The referenced prior arts, cited from [0012] to [0039], were selected by their application and usage profile as being the closest to the current invention. There are numerous modifications and variations of the same themes; these inventions are too numerous to be listed, but all fit within the same scope.
[0012] US 8489904 B2 "Allocating computing system power levels responsive to service level agreements"

[0013] WO 2013001576 Al "Multiprocessor system and method of saving energy therein"
[0014] WO 2012036954 A3 "Scheduling amongst multiple processors"
[0015] US 6901522 "System and method for reducing power consumption in multiprocessor system"
[0016] WO 2008021024 A2 "Multiprocessor architecture with hierarchical processor organization"
[0017] EP 1715405 A1 "Processing method, system and computer program product for dynamic allocation of processing tasks in a multiprocessor cluster platforms with power adjustment"
[0018] US 8176341 B2 "Platform power management based on latency guidance" (also published as CN101598969A, CN101598969B, DE102009015495A1, DE102009015495B4, US8631257, US20090249103, US20120198248)
[0019] US 20020147932 A1 "Controlling power and performance in a multiprocessing system"
[0020] WO 2012040684 A2 "Application scheduling in heterogeneous multiprocessor computing platforms"
[0021] EP 1653332 B1 "Multiprocessor computer for task distribution with heat emission levelling"
[0022] WO 2004019194 A2 "Method and apparatus for adaptive power consumption"
[0023] WO 2007072458 A2 "Performance analysis based system level power management"
[0024] EP 1182552 A2 "Dynamic hardware configuration for energy management systems using task attributes".

[0025] US 20130042246 "Suspension and/or throttling of processes for connected standby"
[0026] US 8214843 "Framework for distribution of computer workloads based on real-time energy costs"
[0027] US 7412609 "Handling wake events in a device driver"
[0028] WO 2013077972 A1 "Thermally driven workload scheduling in a heterogeneous multi-processor system on a chip"
[0029] WO 2012175144 Al "Blade server system and method for power management"
[0030] US 8190939 B2 "Reducing power consumption of computing devices by forecasting computing performance needs"
[0031] US 8195859 B2 "Techniques for managing processor resource for a multi-processor server executing multiple operating systems".
[0032] US 7555666 B2 "Power profiling application for managing power allocation in an information handling system"
[0033] 978-1-4244-9721 IEEE 1269 Asilomar 2010 "On prediction to dynamically assign heterogeneous microprocessors to the minimum joint power state to achieve ultra low power cloud computing"
[0034] ACM paper, ISLPED'12, 2012 ACM 978-1-4503-1249 "Energy-Efficient Scheduling on Heterogeneous Multi-core Architectures"
[0035] IEICE Transactions, Center for Information Science, JAIST, Ishikawa-ken, 923-1292 Japan. DOI: 10.1587 "A Prediction-Based Green Scheduler for Datacenters in Clouds"
[0036] IEEE paper, 1087-4089/12 2012 IEEE DOI 10.1109/1-1 "EHA: The Extremely Heterogeneous Architecture"

[0037] ACM Paper, "Scheduling for Heterogeneous Processors in Server Systems"
[0038] IEEE Paper, 2010 IEEE 10th Int'l Symposium on Quality Electronic Design "Minimizing the Power Consumption of a Chip Multiprocessor under an Average Throughput Constraint"
[0039] "ACM paper Eurosys 2006 ACM 1-59593-322-0/06/0004 Balancing Power Consumption in Multiprocessor Systems"
7. PRIOR ART COMMONALITIES AND DIFFERENCES WITH THE INVENTION
7.1. Power mode "OFF" commonality
[0040] A commonality in all prior arts, cited from [0012] to [0039] above, is the misconstrued use of the terms "Power Mode" and "Power Mode OFF", as defined in "[0007] Glossary." This has given rise to several methods and systems that can save energy by leveraging processors' various consumption modes (not necessarily those of the entire system), called "low power," "sleep" or "deep sleep" modes, that are commonly and misleadingly referred to as "OFF" state modes. Under these conditions the term "OFF" can be misconstrued, as energy, regardless of the amount, is still being consumed.
[0041] As defined in "[0007] Glossary," in the current invention the term "Power Mode OFF" equates to a complete cessation of power. There is no energy consumed by the Processing Resource Tier - PRT while in the "Power Mode OFF" state. The current invention, in addition to its "Power Mode OFF" state, also provides the other common power consumption modes such as "low power," "sleep" or "deep sleep" modes. The referenced prior arts differ in their definitions, as they have various and different energy consumption terms referred to as "OFF".
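The distinction drawn here, a true zero-energy OFF state versus the sleep-family modes that still draw current, can be captured in a small sketch. The enum and helper names below are hypothetical illustrations, not part of the disclosure.

```python
from enum import Enum

class PowerMode(Enum):
    """Power modes as distinguished in this disclosure (illustrative names).
    OFF means a complete cessation of power: zero energy drawn by the tier.
    Every other mode, however deeply asleep, still draws some energy."""
    OFF = "off"            # no energy consumed at all
    DEEP_SLEEP = "deep_sleep"
    SLEEP = "sleep"
    LOW_POWER = "low_power"
    ON = "on"

def draws_energy(mode: PowerMode) -> bool:
    """Only the true OFF state consumes no energy; 'sleep', 'deep sleep'
    and 'low power' modes, often mislabelled 'OFF' in prior arts, do."""
    return mode is not PowerMode.OFF
```

Under this model, the prior arts' "OFF" corresponds to DEEP_SLEEP or LOW_POWER, for which draws_energy is still true.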
[0042] In some referenced embodiments, "low power" modes, frequently referred to as "sleep" or "deep sleep" modes, are based on various but similar approaches in which processing cores are placed in different power states when, depending on the clock frequency mode, the power state register or the device registers have been updated. In other cases the term "high power state" refers to a processing core that has been placed in a high clock frequency mode and whose power state register has been updated to reflect this mode, while other cores in the same silicon may not have been. The "OFF" state is a low-power consumption state in which processors may sleep fully or run at low speed. Another power management approach is to turn off unnecessary periodic services, or other management scenarios based on package thermal constraints. Many variants of these common techniques can be found in referenced prior arts such as US 8489904 B2, EP 1715405 A1, US 20020147932 A1, US 20130042246 A1, US 7412609 B2, WO 2013001576 A1, US 6901522 B2, WO 2004019194 A2, and EP 1182552 A2. In addition, these prior arts do not define any cessation of energy such as the "Power Mode OFF" defined in "[0007] Glossary" of the current invention.
[0043] In other embodiments of the referenced prior arts, such as WO 2012040684 A2, WO 2007072458 A2, 2006 ACM 1-59593-322, and EP 1653332 B1, it should be realized by those skilled in the art that these publications are variations of the same power management or saving techniques. Some of these publications focus on the scheduling of applications in heterogeneous (not as defined herein at "[0007] Glossary") multiprocessor computing platforms, or relate to dynamic adaptive power management for the execution of data-dependent applications and the use of performance counters to collect run-time performance data, determining the energy characteristics of tasks by means of event monitoring counters to assign tasks to CPUs in a way that avoids overheating individual CPUs. Power management is limited to scheduling hot and cool tasks, not to turning power ON or OFF.
[0044] WO 2012036954 A3 is unrelated to power but to other claims in the present invention. In WO 2008021024 A2, there is no power management as defined herein. US 8190939 B2 is unrelated to power modes; it aims at reducing the power consumption of a system by forecasting performance needs from historical system data.
[0045] US 8176341 B2 describes a power management (PM) approach based on components' maximum latency tolerances. The concept is based on a PM controller that receives latency guidelines from components associated with a system hosting the platform. The "OFF" modes are "temporary" and undefined and are for undefined components. Other undefined modes are based on application energy and performance needs and not on the components' latencies.
[0046] US 8214843 describes a method to distribute and route tasks to the most energy-efficient Data Center. Server power management, as defined in "[0007] Glossary" of the current invention, is not used in this patent.
[0047] WO 2013077972 A1 was developed for portable computing devices that contain a single heterogeneous multiprocessor system on a chip (SoC) and not a multiprocessor heterogeneous blade server. The invention works on the principle of thermally aware workload scheduling.
[0048] WO 2012175144 A1 describes a technique and system in which a blade server controller is configured to query power supplies to determine the available total power and the power limits of the blade server in the enclosure. Power management is associated with distributing power according to statistical analysis and does not turn power OFF for any computing resources as the current invention does.
[0049] US 8195859 B2 is a technique mainly based on managing a multiprocessor or single server system that executes multiple operating systems, each using a plurality of storage adapters and a plurality of network adapters. Power modes per se are not addressed.
[0050] US 7555666 B2 is based on the general concept of embedded server management technology, which provides out-of-band management facilities (also known as a Lights Out Manager) and permits turning servers OFF at the power supply or by a service manager (human), and allocates power to an information handling system. In this disclosure the term "information handling system" is an umbrella term that amounts to a blade server that may or may not include various components, in one form or another, of a server system. One skilled in the art will readily recognize from the description that, other than profiling applications for various physical attributes, there are no specific power modes directly associated with the actual processor chip other than the generic and general approach of throttling up or down or shutting off the power supply of the enclosure or the blade server.

[0051] 978-1-4244-9721 IEEE 1269 Asilomar 2010: this publication and method make use of two types of processors, a high performance/energy version as well as a medium/low power version. The power management here is limited to two powered-ON values, not the multiple power modes proposed by the current invention.
[0052] ISLPED'12, 2012 ACM 978-1-4503-1249: this paper studies energy-efficient scheduling on a heterogeneous (not as defined in "[0007] Glossary" of the current invention) prototype platform consisting of two Intel processors with the same ISA. There is no power management module discussed in this paper.
[0053] IEICE Transactions: this paper presents a Green Scheduler for energy savings in Cloud computing. It is composed of four algorithms, one of which is a turning ON/OFF algorithm. Similar to the aforementioned prior arts, this algorithm defines "power mode OFF" as a low-power state to which an entire server (not a processing component or resource) is sent for low energy use, not a cessation of energy.
[0054] IEEE paper, 1087-4089/12 2012 IEEE DOI 10.1109/1, this research project developed a combination of generic core and specialized cores to greatly improve performance while maintaining low energy consumption. Energy consumption, power management and power modes were not the subject.
[0055] ACM Paper CF'05, 5 ACM 1-59593-018-3, this research paper develops a task-to-frequency scheduler for a computing environment that uses processors operating at different frequencies and voltages. Power management, within the definition of the current invention, is not discussed in this paper.
[0056] 2010 IEEE 10th Int'l Symposium on Quality Electronic Design. This research paper studies the problem of minimizing the total power consumption of a Chip Multiprocessor (CMP) while maintaining a target average throughput. The term "powering OFF" is mentioned without a clear definition and as a precaution but not as a clearly defined procedure.
[0057] Although only a few exemplary embodiments have been described in detail in the aforementioned prior arts, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of the embodiments of the present disclosure.
7.2. DATA TRANSMISSION COMMONALITIES
[0058] Communication between blade servers and the DC access layer is a fundamental function of blade servers' NICs, and the efficiency of any server is also linked to how quickly and efficiently data can be moved from the blades to the TOR switch and onto the DC access layer. Considering the volume of data traffic, it follows that each blade server has a dedicated Network Interface Controller, hereinafter NIC, to transmit data provided by the blade server processing unit and multi-processing units to the TOR.
[0059] Multi-core and multi-processing blade servers generate a high volume of data traffic and, to compensate for the lack of physical NICs, manufacturers frequently use virtualization to replicate several NICs on one physical NIC. The NICs, depending on their transmission speeds, cabling structure, and voltage, are powered continuously and consume energy even in idle mode. These idle energy levels can vary between 4.6 and 21.2 watts, which could represent, including DC infrastructure overhead, in the range of 12MW to 60MW for idle times only, per year and for an average DC.
[0060] Power efficiency and minimizing power usage are important issues in networked systems such as blade servers in DCs.
Programs which monitor the usage of various components of a computer system and shut down or minimize temporarily some of those components have been used in the past.
However, one area in which such power conservation has not been utilized is with respect to NIC units' power management. Optimizing NICs' power efficiency and power modes has not been previously addressed.
[0061] An aspect of the prior art cited from [0011] to [0038] is that no consideration is given specifically to the power management of the actual NIC. In addition, the outgoing data flow traffic from a given blade server is directed to the TOR switch through a dedicated single blade server NIC, which may be virtualized in some embodiments.
[0062] There are other NICs on a server, but these are dedicated to out-of-band management and not to communication with the TOR. This condition is such that performance improvements and power usage optimization that may have occurred through prior art embodiments may be offset by potential sequential dataflow bottlenecks and slowdowns through the NIC, TOR switch or at the network level.
[0063] In the current invention, this potential bottleneck or slowdown of dataflow has been removed, as every PRT in the blade server has its dedicated physical and non-virtualized NIC and hence direct high-throughput access per PRT to the TOR switch and ultimately to the DC access layer, and therefore a higher throughput per blade server.
[0064] Methods have been disclosed in the past, such as US 8176341 B2 at [0047], introducing a platform power management controller and power management guidelines based on the maximum latency tolerances of one or more of a plurality of components. In this embodiment, the components may include a network adapter. The power management guidelines and policy of such a system determine a minimum of the maximum latency tolerances of components to determine one or more consumption levels such as "sleep", but not a "Power Mode" OFF state.
[0065] Another aspect of the prior art cited from [0011] to [0038] is methods and techniques of optimizing power efficiency with network processors (a.k.a. network adapters or NICs) by using novel power saving algorithms for minimal energy operations. However, one area in which such power conservation has not been utilized is with respect to integrating the NIC in an embedded PRT power management scheme such as that of the current invention. This strategy maintains the PRT and the integrated plurality of components, NIC included, in a "Power Mode" OFF state except when required by an application.
[0066] In the current invention, each PRT has its dedicated and integrated NIC, which is supplied with power by the PRT power plane, which is in turn powered ON by the TPC device controlled by the Power Management Controller unit in the blade server Front-end Processing Tier - FPT. Such that, when a PRT is in a "Power Mode OFF" state, the NIC embedded in the PRT is in an "OFF" state as well.

7.3. COMMONALITIES WITH HETEROGENEOUS AND HOMOGENEOUS
[0067] As used in the prior art cited from [0011] to [0038], the term "homogeneous", as defined in "[0007] Glossary," correctly specifies architectures of multiprocessors that may be multi-cores of the same kind and based on the same instruction set architecture (ISA).
[0068] In other embodiments of the cited prior art, the term "heterogeneous" misleadingly defines architectures of ISA-compatible or same ISA processors of different core size, core speed, and number of cores that may be on the same or different silicon chips.
[0069] In the prior art cited from [0011] to [0038], it is also understood by those skilled in the art that a single blade server is generally a system architecture comprised of a processing unit or units, memory, storage, a single NIC specific to a TOR connection, and other out-of-band network interfaces, all of which are connected by a plurality of various components. In some cases these system architectures have several processors of the same ISA but a different number of cores, core size, speed, and different clock frequencies. In some other variants, these processing units may be identical but run different operating systems executing identical binaries. These architectures, commonly and mistakenly referred to as "heterogeneous" architectures, do not reflect the meaning of "heterogeneous" as defined in "[0007] Glossary" and intended in the current invention.
[0070] In another aspect, the current invention differs from the prior art in its definition of the term "heterogeneous", as it defines a heterogeneous system architecture as a system that may use more than one kind of processor with different ISAs. These processors may be multi-core, may be of multiple sizes, may be on different silicon chips and may have different binary interfaces without leading to application incompatibilities.
8. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0071] Therefore, there is a need, as proposed by the current invention, for a DC multi-processor single board blade server technology that could be, as defined at "[0007] Glossary," heterogeneous in some embodiments as well as homogeneous in others. Such a server would be associated with a system and methods that would result in a significant reduction in Data Centers' (DCs) energy costs, improve their computing infrastructures' green energy efficiency and the workload performance of their application processes, and reduce the overall power consumption of DCs' servers and infrastructures. This invention would lead to a reduction of the overall energy bill of DCs and improve DCs' overall environmental energy efficiency as well as their sustainability.
[0072] While heterogeneous (as defined in "[0007] Glossary" of this invention) multi-processor system architectures are common in other computing fields, they appear not to be in use in DCs. Homogeneous X-86-based-ISA multi-processor servers dominate DC server architectures, and in DCs there are no heterogeneous (as defined in "[0007] Glossary" of this invention) multi-processors on a single blade server board with a plurality of processors and sub-processors with such a sophisticated data transmission and energy control scheme. The object of this invention is to provide a fully satisfactory response to those needs.
8.1. THE TIER ARCHITECTURE
[0073] According to the present invention, a single blade server architecture is comprised of three major integrated and nonhierarchical tiers as follows:
- The main blade server per se, also called the Front-end Processing Tier, hereinafter FPT;
- A plurality of "Processing Resource Tiers", hereinafter PRTs;
- The "Power Management Control Tier", hereinafter PMCT.
[0074] This invention also relates to a corresponding system, as well as a related computer program product, loadable in the memory of at least one PRT, the FPT, the PMCT, and including software code portions for performing the steps of the method of the invention when the product is run on a blade server. As used herein, reference to such a PRT, FPT or PMCT program product is intended to be equivalent to reference to a computer-readable medium containing instructions that may control a PRT, FPT or PMCT that may control a blade server, that may control integrated elements, that may coordinate various management schemes and performance of the method of the invention. Reference to "at least one Processing Resource Tier - PRT " or "at least one PRT" is intended to highlight the possibility for the present invention to be implemented in a distributed and/or modular fashion.

8.2. THE MAIN BLADE SERVER, FRONT-END PROCESSING TIER - FPT
[0075] In the following blade server description, details are set forth in order to provide an understanding of various embodiments. However, various embodiments of the invention may be practiced without specific details. Well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments of the invention.
[0076] DC, cloud computing, or hybrid cloud computing systems include a plurality of servers that are typically rack mounted. Normally, the servers are blade-type servers that may share resources such as power, cooling, network connectivity, and storage. Each server, as generally understood in the art, has on-board power and cooling. In order to manage power to the servers, an out-of-band power management entity, also known as a "resource processor/manager," is in communication with the servers through a separate and independent low speed network.
[0077] In the current invention, the main server board, also called the Front-end Processing Tier - FPT, is a blade server with its own low-power-consumption General Purpose Processor (GPP), memory, storage, network controllers (NICs), various Processing Resource Tiers - PRTs, a "Power Management Control Tier - PMCT," an out-of-band server manager, an operating system, and software tools.
8.3. PROCESSING RESOURCE TIERS - PRTs
[0078] The current invention differs from the prior art in that every single blade server, in addition to the standard blade server's plurality of components and processor, integrates a plurality of independent PRTs that may be scalable on demand into either a homogeneous or a heterogeneous multi-processor system, each having its own storage, memory, and a dedicated independent NIC directly connected to the TOR switch.
[0079] Each PRT has its dedicated power plane that has selectable "Power Mode" or "states" as defined in "[0007] Glossary"
in this invention. The PRT power plane is triggered ON or OFF by a Tier Power Control device, hereinafter TPC, on the "Power Management Control Tier - PMCT."
[0080] By default, PRTs are in a "Power OFF" state. Powering ON the PRT will power the plurality of its components, including all gateways connecting the powered-up PRT to the blade server FPT and the PMCT architecture, as well as their buses, shared memory, storage devices and the associated NIC interface to the TOR switch. Once the PRT has gone through its wake-up latency cycles, the TPC will notify the PMCT, which will in turn notify the Master Command Controller (MCC) software module running on the FPT that the PRT is ready to compute and transmit data.
[0081] The instruction that triggers the TPC to change the power state, ON or OFF, of a specific PRT power plane is issued by the MCC Monitor sub-module to the PMCT, which will in turn instruct the TPC to trigger the ON or OFF. Once this powering ON or OFF is completed, the "Power Function" portion of the TPC goes on standby and waits for another power ON or OFF instruction. Once the PRT architecture power planes have been powered ON, the PRT can then have several other power modes that can vary according to the application requirements and are dependent on its integrated processor components' specifications. These can be, as dictated by the application through the MCC and PMCT, any of the following: an "active mode" such as a throttled-up mode, thereby a high-power mode; or a "sleep", "deep sleep" or "standby" mode, thereby a low-power mode.
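The power-state behaviour described in paragraphs [0080] and [0081] can be sketched as a small state machine: OFF by default, powered ON by the TPC, with the secondary low- and high-power modes only reachable once the power plane is ON. This is an illustrative model only; the class, method and state names are hypothetical and not part of the disclosed hardware.

```python
from enum import Enum, auto

class PowerMode(Enum):
    OFF = auto()         # default state: the PRT power plane is unpowered
    ACTIVE = auto()      # throttled-up, high-power mode
    SLEEP = auto()       # low-power modes, selectable once the plane is ON
    DEEP_SLEEP = auto()
    STANDBY = auto()

class PRT:
    """Hypothetical model of a Processing Resource Tier power plane."""

    def __init__(self, prt_id):
        self.prt_id = prt_id
        self.mode = PowerMode.OFF   # PRTs default to the Power OFF state

    def power_on(self):
        # The TPC powers the PRT plane; after its wake-up latency cycles
        # the tier is active and the PMCT/MCC are notified it is ready.
        self.mode = PowerMode.ACTIVE
        return f"PRT {self.prt_id} ready"

    def set_mode(self, mode):
        # Secondary power modes are only reachable once the plane is ON.
        if self.mode is PowerMode.OFF:
            raise RuntimeError("power plane is OFF; TPC must power it ON first")
        self.mode = mode

    def power_off(self):
        # The MCC, via the PMCT, instructs the TPC to power the plane OFF.
        self.mode = PowerMode.OFF
```

In this sketch, only `power_on` and `power_off` stand in for the TPC "Power Function"; mode changes while powered mimic the application-dictated active/sleep/standby selection.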
[0082] The PRTs may have computing resources that may be multi-processor, multi-core systems, that may use different and sometimes incompatible Instruction Set Architectures (ISAs) without leading to binary incompatibility, that may interpret memory in different ways, and that may have different Application Binary Interfaces (ABIs) and/or Application Programming Interfaces (APIs). These multi-core, multi-processor PRTs may be of different or of the same processing capacity. Several independent heterogeneous and homogeneous PRTs may be embedded in the blade server architecture. The blade server must include at least one PRT that it identifies as the default PRT.
[0083] As an example, a PRT computing resource may include a processor; a sub-processor; a microprocessor; a multi-core processor; a digital signal processor; digital logic; a field programmable gate array that executes operational instructions to perform a sequence of tasks; a "general-purpose processor" (GPP) device; and a graphic processing unit (GPU) having a fixed form and whose functionality is variable, wherein this variable functionality is defined by fetching instructions and executing those instructions (for example, an Intel processor, an AMD processor, or a Texas Instruments DSP). The instructions can be stored in firmware or software and can represent anywhere from a very limited to a very general instruction set.
[0084] The PRT architecture configuration allows direct transmission of data results from each individual PRT to the TOR

switch. This direct connection is independent of the main blade server NIC and results in a reduction of overhead and congestion associated with transmission between the main blade server FPT resource and the TOR switch.
8.4. THE "POWER MANAGEMENT CONTROL TIER - PMCT"
[0085] It is yet another object of the present invention to provide an embedded "Power Management Control Tier", hereinafter PMCT. The PMCT is an integral part of the blade server main board, or FPT, architecture. It is one of the blade server's key power management devices, comprising a plurality of components and sub-devices. The PMCT power plane is the same as the FPT power plane, and the PMCT is a main computing device that fulfills the following functions:
- It is the hardware interface for the Master Command Controller (MCC) software module running on the FPT.
- It dispenses power instructions to its components and to all PRTs through its integrated Tier Power Control TPC device.
- It manages bus interfaces between its architecture, the PRTs and the FPT.
- It monitors the blade server and all PRTs physical data.
- It manages the reconfigurable Virtual Processing Element VPE.
8.4.1. PMCT Interface to the MCC
[0086] The PMCT has hardware management functions as directed by the Master Command Controller MCC software module, such as the "Scheduler" or the "Predictor" among others. It directs, as instructed by the MCC, application execution to the appropriate Tier and ensures that the Tier Power Control TPC device powers up or powers down the selected Tier. The PMCT also manages its Virtual Processing Element VPE power and provides all processors on the FPT and PRTs access to the VPE when instructed by the MCC.
8.4.2. PMCT Power dispensing functions
[0087] An exemplary embodiment of the present invention utilizes the PMCT to maintain PRTs in their default power OFF states or, as instructed by the Master Command Controller - MCC, to instruct its integrated Tier Power Control TPC device to provide the selected PRT power plane with power from the FPT power plane. The single purpose of the TPC "Power Function" is to power ON or OFF and supply power to the PRTs' architecture power planes as well as the VPE on the PMCT.
[0088] In yet another embodiment, the PMCT power management function is responsible to power up, as instructed by the MCC Scheduler sub-module, the Virtual Processing Element VPE integrated in the PMCT architecture.
8.4.3. Bus Interfaces
[0089] In this exemplary embodiment, the PMCT is connected via a dedicated high speed bus to each of the following:
- All Processing Resource Tier - PRTs;
- The Smart Out-of-band Management controller;
- The blade server main architecture;
- The server main I/O hub.

8.4.4. Monitoring
[0090] Yet another aspect of the PMCT architecture is that, through the ADC device connection, the PMCT would monitor (as the term is defined at "[0007] Glossary") PRT and FPT temperatures, fan speed and voltage levels. The data resulting from such monitoring would be transmitted to the Smart Out-of-Band Management - SOM controller and to the operating system via the ACPI protocol or similar protocols.
[0091] In accordance with another aspect of the invention, the PMCT architecture, through a clock generator sub-device, would generate clock signals to permit PRT clock frequency changes thereby allowing clock-gating power management schemes in subsequent versions of this invention.
8.4.5. The "Virtual Processing Element VPE".
[0092] Yet another aspect of the invention is the integration in the PMCT of a reconfigurable co-processor accelerator, the "Virtual Processing Element VPE". The VPE is used for the Reconfigurable Computing Accelerator function, hereinafter RCA function, which provides, through its system interface, a sub-processor computing resource that may be a co-processor accelerator and/or an "Application Specific Instruction Set Processor - ASIP" to other PRTs' processors and/or to the blade server, i.e. the FPT processor. In its co-processor and accelerator role, the VPE may have one or several cores that are architecturally different processor cores both in number and in type, have different instruction sets, and have different bit address segmentations. It is therefore one of the purposes of the VPE in the co-processor exemplary embodiment to process standard relevance ranking algorithms for other sub-processors, such as PageRank and the rank boost algorithm; other types of algorithms such as Fast Fourier Transforms and complex matrices; highly repetitive algorithms such as those used by web search engines; or algorithms such as Data Encryption, Advanced Encryption, Stemming, and Tokenization. In this RCA configuration, the VPE interface would select the high-speed bus I/O connecting the said VPE to the requesting PRT or FPT.
[0093] Another aspect of the invention is that by providing additional processing computing resources such as a VPE co-processor, the blade server would be a more effective computation resource, faster, and have a higher green efficiency level when computing compute-intensive functionalities. This also enables the flexible addition, or reassignment, of child processes to a co-processor such as the VPE.
[0094] The VPE is by default in a "Power OFF" state. Powering ON the VPE is a function of the TPC and of an instruction from the MCC to the PMCT to the TPC.
8.5. Green Energy Predictor - GEP
[0095] In all embodiments of this invention, the term "greenest most favourable energy efficiency" is often interchanged with the term "green energy efficiency"; they are considered as having the same meaning.
[0096] In all embodiments of this invention, the greenest most favourable energy efficiency "level" is a weighting value generated by a "Green Energy Predictor", hereafter GEP, scheme that defines the best possible and greenest energy efficiency performance of a given Processing Resource Tier - PRT for a given process and its sub-processes.
[0097] In all embodiments of this invention it is understood that the greenest most favourable energy efficiency "level" is the result of a scheme based on an expression that combines the shortest length of time a said Processing Resource Tier - PRT executes a said process; the least heat dissipated by the said Processing Resource Tier - PRT
during the said process execution; and the least amount of electricity or electrical energy used by the said Processing Resource Tier - PRT during the said process execution.
[0098] It is therefore one of the objects of the present invention to dynamically assign DC processes and sub-processes assignability levels that correspond to the greenest most favourable energy efficient PRT on the said blade server for the said process and associated sub-processes.
[0099] Therefore it is an object of the present invention to provide a system and method that, when executing a DC application in the said blade server, are capable of dynamically sequencing and disseminating to the appropriate Processing Resource Tier - PRT in the said blade server power and a power status according to the greenest most favourable energy efficiency level as determined by the GEP scheme.
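As a rough illustration of the expression described in [0097], a GEP level could combine the three disclosed factors (execution time, heat dissipated, electrical energy used) as a weighted sum, with lower values being greener. The function names, weights, and units below are purely hypothetical; the disclosure does not give a concrete formula.

```python
def gep_level(exec_time_s, heat_j, energy_j, weights=(1.0, 1.0, 1.0)):
    # Hypothetical weighting: combines the shortest execution time, the
    # least heat dissipated, and the least energy used, as named in [0097].
    # Lower values represent a greener, more favourable efficiency level.
    wt, wh, we = weights
    return wt * exec_time_s + wh * heat_j + we * energy_j

def greenest_prt(measurements):
    # measurements maps a PRT id to a (time, heat, energy) tuple for a
    # given process; the PRT with the lowest GEP level is selected.
    return min(measurements, key=lambda prt_id: gep_level(*measurements[prt_id]))
```

For example, `greenest_prt({"PRT-DSP": (2.0, 5.0, 10.0), "PRT-GPU": (1.0, 20.0, 30.0)})` selects "PRT-DSP" (level 17.0 vs. 51.0), even though the GPU tier finishes faster, because its heat and energy terms dominate.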
8.6. MASTER COMMAND CONTROLLER - MCC
[00100] It is another object of the present invention to provide maximum-performance execution of DC applications while maintaining the best possible greenest energy efficiency during the said execution by providing a "Master Command Controller", hereafter MCC module, that manages and maintains energy efficiency through its sub-modules.
[00101] The embodiment of the MCC includes three main modules as follows:
- A Monitor to monitor processes;
- A Predictor that predicts which PRT is the most green-efficient PRT;
- A Scheduler that schedules the powering sequence of the said PRT.
[00102] In an exemplary embodiment the MCC, through its Scheduler, would instruct the PMCT element to power ON a specific PRT
or the VPE. Upon receiving that instruction, the PMCT
would instruct the "Tier Power Control - TPC" device to power ON the selected PRT power plane or to power ON the VPE as requested by the MCC-Scheduler and PMCT.
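The Monitor-Predictor-Scheduler hand-off described above can be sketched as a simple lookup-and-dispatch routine. The table contents, process names, and PRT identifiers below are invented for illustration; only the control flow (unrecognized processes fall back to the default PRT, recognized ones are matched to the greenest PRT via the GEP table) follows the disclosure.

```python
DEFAULT_PRT = "PRT-0"   # the blade server's default PRT (hypothetical id)

# Hypothetical GEP table: process name -> (PRT id, green-efficiency weighting)
GEP_TABLE = {
    "pagerank": ("PRT-VPE", 0.91),
    "fft":      ("PRT-DSP", 0.87),
}

def mcc_dispatch(process_name, gep_table=GEP_TABLE):
    """Sketch of the MCC Monitor -> Predictor -> Scheduler flow."""
    # Monitor: if no process name is recognized, bypass the Predictor
    # and have the Scheduler notify the PMCT to use the default PRT.
    if process_name not in gep_table:
        return DEFAULT_PRT
    # Predictor: match the process name to the greenest PRT in the GEP table.
    prt_id, _level = gep_table[process_name]
    # Scheduler: the selected PRT id would be passed to the PMCT,
    # which instructs the TPC to power that PRT ON (returned here).
    return prt_id
```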
[00103] What has been described above includes examples of the subject invention. It is, of course, not possible to describe every conceivable combination of Processing Resource Tiers or components or methodologies for purposes of describing the subject invention, but one of ordinary skill in the art may recognize that many further combinations and permutations of the subject invention are possible. Accordingly, the subject invention is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.
9. BRIEF DESCRIPTION OF THE DRAWINGS
[00104] Embodiments of the invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
FIG.1 - Illustrates a high-level system flow chart of an exemplary heterogeneous blade server system in accordance with an embodiment for which the invention applies.
FIG.2 - Represents an exploded view of the Master Command Controller-MCC module and the flow of the process through the MCC and into the PMCT.
FIG.3 - Illustrates the Power Management Control Tier - PMCT architecture and its interfaces and common buses with the blade server architecture (FPT) and links to the Processing Resource Tiers - PRTs.
FIG.4 - Illustrates the Processing Resource Tier - PRTs architectures and their interfaces and common buses with the blade server architecture (FPT) and links to the Power Management Control Tier - PMCT.
Fig.5 Illustrates the complete blade server with its PRTs, PMCT, and its plurality of storage, memory, memory controller, buses, and connections to the top-of-rack switch and to the network.
Fig.6 Illustrates the final and complete blade server motherboard with all its Tiers such as PRTs, PMCT, and FPT (GPP) and its plurality of storage, memory, memory controller, buses, and connections to the top-of-rack switch and to the network.
10. DETAILED DESCRIPTION OF THE DRAWINGS
[00105] The details of the figures in this patent application are progressive; that is, they become increasingly detailed as components are added across figures. A component detailed in one figure may not be detailed or numbered in subsequent figures. For example, the Front-end Processing Tier - FPT role would be explained in one figure; it would then appear in subsequent figures with no explanation, to avoid pointless repetition.
[00106] FIG.1 - Illustrates a high-level system flow chart of an exemplary heterogeneous blade server system in accordance with an embodiment for which the invention applies.
- 1001 - A TOR switch is connected to the IP network DC
core layer via the DC aggregation layer. In other embodiments, the network could be a base transceiver station (BTS) connected to a cellular or wireless network.
- 1002 - A blade server architecture designed on the current patent application receives a process from the network 1001 via a TOR switch.
- 1003 - The blade server (Front-end Processing Tier -FPT) starts with a Power-On-Self-Test (POST) and liveness test.

- 1004 - The Power Management Control Tier - PMCT is activated and executes a self-liveness test.
- 1005 - The blade server FPT system boots and the rest of the operating system (O/S) starts up.
- 1006 - The Master Command Controller MCC is initiated and loaded into memory.
- 1007 - The blade server FPT runs the initial operating system tasks.
- 1008 - The Green Energy Predictor (GEP) table is loaded into memory.
- 1009 - Process begins.
- 1010 - The blade server FPT starts processing the initial tasks of the first process (Process 1).
(Tasks from 1011 to 1015 take place inside the Master Command Controller-MCC module, which is composed of three sub-modules: the MCC-Monitor, the MCC-Predictor and the MCC-Scheduler.)
- 1011 - The MCC-Monitor sub-module monitors the process (1) to identify a process name.
- 1012 - Once the MCC-Monitor sub-module has located the process name it notifies the MCC-Predictor sub-module of the information (process name) and continues to monitor the process.
- 1013 - The MCC-Predictor matches the process name to a value on the Green Energy Predictor (GEP) table to obtain the value that represents the greenest most favourable energy efficiency level for one of the Processing Resource Tier - PRT.
- 1014 - The PRT identity information associated to greenest most favourable energy efficiency level is then relayed by the MCC-Predictor to the MCC-Scheduler sub-module.
- 1015 - The MCC-Scheduler notifies the Power Management Control Tier - PMCT device which PRT is needed for the process (1) and should be powered ON.
- 1016 - The PMCT notifies the Tier Power Control TPC
device to power ON the selected PRT power plane and notifies the MCC main module that the PRT is powered ON and ready to process the process (1).
- 1017 - The MCC then lets the process (1) run on the powered up PRT, freeing up the blade server's General Purpose Processor GPP of the process (1) task.
- 1018 - The blade server (FPT) relinquishes control to the PRT running the process (1) and goes idle, waiting for another process (2).
- 1019 - When the selected PRT has completed the process (1), the O/S and the MCC are notified.
- 1020 - Once the process is terminated, the MCC is notified of the termination through the O/S.

- 1021 - The MCC notifies the PMCT to power OFF the PRT.
- 1022 - The cycle ends unless the blade server (FPT) runs a second process and the cycle (1010) starts over again.
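Steps 1010 through 1022 amount to a power-bracketed execution cycle: select a PRT, power it ON, run the process there, then return the PRT to its default Power OFF state. A minimal sketch follows, with the dispatch, power and execution steps passed in as callables, since the actual MCC/PMCT/TPC interfaces are hardware- and firmware-specific and not spelled out in the disclosure.

```python
def run_process(process_name, dispatch, power_on, power_off, execute):
    """Sketch of the FIG.1 cycle for one process (hypothetical interfaces)."""
    # 1011-1015: the MCC selects the greenest PRT for this process.
    prt_id = dispatch(process_name)
    # 1016: the PMCT instructs the TPC to power the selected PRT plane ON.
    power_on(prt_id)
    try:
        # 1017-1019: the process runs on the PRT while the FPT goes idle.
        return execute(prt_id, process_name)
    finally:
        # 1021: on completion, the MCC notifies the PMCT to power the PRT OFF.
        power_off(prt_id)
```

The `try`/`finally` bracket mirrors the invention's invariant that a PRT always returns to its default Power OFF state once its process terminates.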
[00107] FIG.2 - Represents an exploded view of the Master Command Controller-MCC and the flow of the process through the MCC and into the Power Management Control Tier - PMCT.
- 201 - Illustrates the entire Master Command Controller-MCC
with its three main modules (in decision diamonds) as follows:
203 The MCC-Monitor;
204 The MCC-Predictor;
213 The MCC-Scheduler.
The blade server (FPT) 202 starts to process a process and a three-way decision diamond 201 ("YES", "NO" and "DONE") is reached, where the MCC-Monitor monitors the FPT for a process name and takes one of the following three decisions:
- If "NO" process name is recognized 205, the MCC-Monitor bypasses the Predictor 204 and instructs the Scheduler to notify the PMCT that the default PRT should be used for this no-name process.
- If "YES", a process name is identified, and the MCC-Monitor transfers the process name information to the Predictor.
- When "DONE" with its two main decisions, the MCC-Monitor returns to monitoring the blade server processes (206).

- 206 - The MCC-Monitor has completed its tasks ("DONE") and returns to monitoring the blade server for new processes.
- 207 - The MCC-Predictor receives the process name from the MCC-Monitor.
- 208 - The MCC-Predictor looks up the process name in the GEP- table.
- 209 - The MCC-Predictor matches the process name to the best weighting value in the GEP table.
- 210 - The MCC-Predictor matches the value to a PRT.
- 211 - The MCC-Predictor identifies the PRT.
- 212 - The MCC-Predictor notifies the MCC-Scheduler of the PRT ID, and the MCC-Predictor goes idle, waiting for further processes from the MCC-Monitor.
- 213 - The MCC-Scheduler notifies the PMCT of the PRT ID.
- 214 - The PMCT receives the PRT ID from the MCC.
- 215 - The PMCT notifies the TPC device to trigger the selected Processing Resource Tier to power ON.
[00108] FIG.3 - Illustrates the Power Management Control Tier - PMCT architecture and its interfaces and common buses with the blade server architecture (FPT) and links to the Processing Resource Tiers - PRTs.
The PMCT architecture 300 is integrated in the FPT architecture and shares the same power plane as the FPT. Its main computing device 301 is the Power Management Control, which is connected to the FPT general purpose processor via a gateway 308 that connects it via a high speed bus 311 to the blade server I/O hub 312. In addition to the PMCT task of controlling its architecture's internal and external buses, it is connected to a plurality of components such as the 302 clock generator and the 303 voltage regulator, as well as the 305 Analog to Digital converter that works with the PMCT main computing module for sensing temperature, fan speeds, etc., as described in paragraph 8.4.4. One of its most important components is the 305 Virtual Processing Element VPE that fulfills the role of a co-processor accelerator as described in paragraph 8.4.5.
The VPE is powered by an instruction given to the 306 Tier Power Control - TPC device by the PMCT. The TPC is also responsible, upon instruction from the PMCT, for powering ON or OFF the PRTs via its high speed bus 309. In its role as a co-processor, the VPE sends and receives data to and from other PRTs via a gateway 307 connected to a high speed bus 310.
The VPE communicates with the FPT General Purpose Processor (GPP) 313 via the PMCT gateway 308 and the high speed bus 311 connected to the I/O hub. The I/O hub is also connected to other PRTs via similar high speed buses 311. Data is sent to the Smart Out-of-band Management - SOM controller 317 via another bus 316 connected to the I/O hub, while outgoing data from the SOM controller flows out to the service manager via an Ethernet port 319. A RAM 323 is shared among all processing resources via a shared RAM bus 324. The GPP 313 has two high speed network connections: one, 314, connects the FPT to the TOR, and the other, 315, enables the blade server to share the Green Efficiency Predictor data stored in the non-volatile solid state drive (SSD) 322 of the blade server (FPT). The GPP also has access to its RAM 321 and shares a computer readable medium storage 320 with all computing resources; it can access this resource 320 through its connection 325 with the I/O hub.

[00109] FIG.4 - Illustrates the Processing Resource Tier - PRTs architectures and their interfaces and common buses with the blade server architecture (FPT) and links to the Power Management Control Tier - PMCT.
The PRTs are similar in their architectures 400, with the exception of the type of processing resource 401, i.e. the processor. The processing resource may be a Graphic Processing Unit (GPU); it may be a Digital Signal Processor, a General Purpose Processor (GPP), or any type of heterogeneous or homogeneous processor as required by the application. The PRT architectures 400 are independent from the PMCT and the blade server architecture FPT. While they draw their power from the blade server power planes, their power plane can only be powered ON by an instruction sent to the PMCT Tier Power Control - TPC device and forwarded on a high speed bus 409 (309 on Fig.3) to the 408 PRT tier power control connection. All PRTs, like all other processing resources on the FPT, have access to the PMCT VPE. This access is application dependent and can be done through the 406 gateway and the high speed bus 410 (310 on Fig.3). Similarly, all PRTs have their dedicated Network Interface Controller (NIC) 402 and direct access to the Top-of-Rack switch through a high speed port 403.
To facilitate communication with the blade server and the PMCT, every PRT has a gateway 404 that provide them access to the high speed connecting all PRTs to the blade server I/O hub. All computing resources share a system RAM 413 and in addition have their own RAM 406 and independent storage 412.
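The power-sequencing rule above, under which a PRT power plane is energized only by an instruction routed through the PMCT's TPC device, can be sketched as follows. This is a hedged illustration, not the patent's implementation; every class and method name (`TierPowerControl`, `handle_instruction`, and so on) is hypothetical.

```python
# Illustrative sketch only: models the rule that a PRT power plane is
# energized solely by an instruction routed through the PMCT's Tier
# Power Control (TPC) device. All names are hypothetical.

class PowerPlane:
    def __init__(self, prt_id):
        self.prt_id = prt_id
        self.powered = False  # planes start de-energized

class TierPowerControl:
    """Hypothetical TPC: the only component allowed to switch PRT planes."""
    def __init__(self, planes):
        self.planes = {p.prt_id: p for p in planes}

    def handle_instruction(self, prt_id, action):
        # The instruction arrives over the high-speed bus
        # (409 in FIG. 4 / 309 in FIG. 3).
        plane = self.planes[prt_id]
        if action == "POWER_ON":
            plane.powered = True
        elif action == "POWER_OFF":
            plane.powered = False
        return plane.powered

# A PRT cannot power itself; its plane changes state only through the TPC.
tpc = TierPowerControl([PowerPlane("PRT-0"), PowerPlane("PRT-1")])
tpc.handle_instruction("PRT-0", "POWER_ON")
```

The design point the sketch captures is centralization: because every power transition passes through one TPC object, the PMCT retains a single authoritative view of which tiers are energized.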

[00110] Fig.5 illustrates the complete blade server with its PRTs, PMCT, and its plurality of storage, memory, memory controllers, buses, and connections to the top-of-rack switch and to the network.
As illustrated, all processing resources (i.e. PRTs, GPP, and PMCT) have both a direct bus connection 501 to the TOR switch 502 (and on to the DC network 503) and a connection, via the I/O hub, to the GPP and its storage (NAND) 508. The GPP has two Network Interface Controllers; one of them, 504, is dedicated to the TOR switch, while the other, 505, connects to an inter-server network in the same enclosure. It is through the inter-server NIC 505 that the GPP connects to an enclosure network 506 that allows the sharing of GEP data stored in a non-volatile NAND (USB flash drives or solid-state drives) 508. Data collected for or by the Smart Out-of-band Management (SOM) controller is sent to the service manager via an Ethernet connection 502 and bus 507.
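The enclosure-level sharing of GEP data described above can be sketched as a peer lookup: each blade consults its local non-volatile store first and falls back to its enclosure peers. This is an illustrative model under assumed semantics; the patent does not define this protocol, and all names here are hypothetical.

```python
# Illustrative sketch: blades on the same enclosure network share the GEP
# data held in their local non-volatile storage (NAND/SSD, 508 in FIG. 5).
# All names are hypothetical; the patent does not define this protocol.

class Blade:
    def __init__(self, name):
        self.name = name
        self.local_gep = {}    # stands in for the NAND/SSD store 508
        self.enclosure = None  # stands in for the enclosure network 506

    def record_gep(self, workload, predicted_watts):
        self.local_gep[workload] = predicted_watts

    def lookup_gep(self, workload):
        # Check local storage first, then ask peers over the enclosure network.
        if workload in self.local_gep:
            return self.local_gep[workload]
        for peer in self.enclosure:
            if peer is not self and workload in peer.local_gep:
                return peer.local_gep[workload]
        return None

enclosure = [Blade("blade-0"), Blade("blade-1")]
for b in enclosure:
    b.enclosure = enclosure

enclosure[0].record_gep("transcode", 85.0)
# blade-1 has no local record, so it finds the prediction via its peer.
watts = enclosure[1].lookup_gep("transcode")
```

The benefit this models is that a blade which has never run a given workload can still obtain an energy prediction for it from a neighbour in the same enclosure, without a round trip to the data-center network.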
[00111] Fig.6 illustrates the final, complete blade server motherboard with all its tiers (PRTs, PMCT, and FPT (GPP)) and its plurality of storage, memory, memory controllers, buses, and connections to the top-of-rack switch and to the network.
[00112] What has been described above includes examples of the subject invention. It is, of course, not possible to describe every conceivable combination of computing resources or components or methodologies for purposes of describing the subject invention, but one of ordinary skill in the art may recognize that many further combinations and permutations of the subject invention are possible. Accordingly, the subject invention is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.

CA2845341A 2014-03-11 2014-03-11 A computer system, methods, apparatus for processing applications, dispensing workloads, monitor energy and sequence power to nonhierarchical multi-tier blade servers in data centers Abandoned CA2845341A1 (en)
