CN111427744A - Power consumption management method, equipment and medium for server - Google Patents

Info

Publication number
CN111427744A
Authority
CN
China
Prior art keywords
power consumption
bmc
cpu
gpu
temperature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010176261.4A
Other languages
Chinese (zh)
Inventor
杨洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3058Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F11/3062Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations where the monitored property is the power consumption
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16Constructional details or arrangements
    • G06F1/20Cooling means
    • G06F1/206Cooling means comprising thermal management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/324Power saving characterised by the action undertaken by lowering clock frequency
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/3243Power saving in microcontroller unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3058Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Power Sources (AREA)

Abstract

The invention discloses a power consumption management method of a server, which comprises the following steps: establishing connection between a first BMC in the computing node and a second BMC in the GPU chassis; collecting the power consumption of the computing node and the temperature of a CPU by using the first BMC and collecting the power consumption of the GPU case by using the second BMC; the second BMC uploads the collected power consumption of the GPU chassis to the first BMC; and the first BMC generates corresponding control signals according to the temperature of the CPU, the power consumption of the computing node and the power consumption of the GPU case so as to adjust the power consumption of the computing node and/or the GPU case. The invention also discloses a computer device and a readable storage medium. The scheme provided by the invention can realize integrated dynamic power consumption management between the computing node and the GPUBOX, and can realize the optimal performance power consumption ratio between the computing node and the GPUBOX.

Description

Power consumption management method, equipment and medium for server
Technical Field
The present invention relates to the field of servers, and in particular, to a power consumption management method, device, and storage medium for a server.
Background
The rapid development of artificial intelligence keeps raising the industry's demand for computing power. For applications such as deep learning, the gains in computing power come from GPUs and NNPs (dedicated neural network processors), and the mainstream form in the industry is computing node + JBOG, in which one computing node may be paired with one or more JBOGs. Mainstream GPUs are currently high-power components: the power consumption of a single GPU can generally exceed 350 W, and the GPUs of some manufacturers also exhibit EDPP (electrical design point peak), meaning that the instantaneous power consumption of the GPU can reach 2 times its TDP (thermal design power) or even higher. As a result, the power consumption of a server for AI (artificial intelligence) applications is more than twice that of a common rack server. Most cabinets in existing data centers are designed for traditional rack servers, so the maximum power consumption they can support is relatively low, and the whole machine power supply and the whole cabinet power supply of the AI server are
Most existing server power consumption management strategies apply dynamic power consumption control only to the CPU and the memory, so as to achieve a good performance-to-power ratio between the CPU and the memory. This is because, in a traditional rack-mounted server, the CPU and the memory are the main high-power components. For an Intel x86 server, the built-in ME of the server already provides dynamic power consumption control for the CPU and the memory, and a designer only needs to enable that function in the ME.
For the AI server, the CPU and the memory account for only a small portion of the total power consumption, so dynamically adjusting only the CPU and the memory cannot effectively reduce the dynamic power consumption of the AI server. The practical value of the existing solution is greatly diminished for GPUBOX in particular, when a single compute node is paired with multiple GPUBOX.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problem, an embodiment of the present invention provides a power consumption management method for a server, including the following steps:
establishing connection between a first BMC in the computing node and a second BMC in the GPU chassis;
collecting the power consumption of the computing node and the temperature of a CPU by using the first BMC and collecting the power consumption of the GPU case by using the second BMC;
the second BMC uploads the collected power consumption of the GPU chassis to the first BMC;
and the first BMC generates corresponding control signals according to the temperature of the CPU, the power consumption of the computing node and the power consumption of the GPU case so as to adjust the power consumption of the computing node and/or the GPU case.
In some embodiments, the first BMC generates a corresponding control signal to adjust the power consumption of the compute node and/or the GPU chassis according to the temperature of the CPU and the power consumption of the compute node and the GPU chassis, further comprising:
judging whether the temperature of the CPU is greater than a temperature threshold value or not;
in response to the temperature of the CPU being greater than the temperature threshold, the first BMC generates a PWM signal that increases the rotational speed of the fan of the computing node and sends a temperature-control frequency reduction signal to the CPU so as to reduce the frequency of the CPU.
In some embodiments, the first BMC generates a corresponding control signal to adjust the power consumption of the compute node and/or the GPU chassis according to the temperature of the CPU and the power consumption of the compute node and the GPU chassis, further comprising:
judging whether the sum of the power consumption of the computing node and the power consumption of the GPU case is larger than a power consumption threshold value;
in response to the sum of the power consumptions being greater than the power consumption threshold, the first BMC sends power-consumption frequency reduction signals to the memory and the CPU of the computing node respectively, so as to reduce the power consumption of the memory and of the CPU.
In some embodiments, further comprising:
judging whether the temperature of the CPU is greater than a temperature threshold value or not;
in response to the temperature of the CPU being greater than the temperature threshold, the first BMC generating a PWM signal that increases a rotational speed of a fan of the compute node;
in response to the temperature of the CPU being less than the temperature threshold, the first BMC generates a PWM signal that decreases a rotational speed of a fan of the compute node to decrease power consumption of the fan.
In some embodiments, further comprising:
in response to the sum of the power consumption of the computing node and the power consumption of the GPU chassis exceeding the power consumption threshold again within a preset time period after the memory and the CPU have undergone frequency reduction, the first BMC generates a frequency reduction signal of a corresponding level, according to the number of times the sum has exceeded the power consumption threshold again, and sends it to the second BMC;
and the second BMC performs frequency reduction processing of corresponding levels on the GPU and the fan in the GPU case according to the frequency reduction signals of corresponding levels.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of:
establishing connection between a first BMC in the computing node and a second BMC in the GPU chassis;
collecting the power consumption of the computing node and the temperature of the CPU by using the first BMC, and collecting the power consumption of the GPU chassis by using the second BMC;
the second BMC uploads the collected power consumption of the GPU chassis to the first BMC;
And the first BMC generates corresponding control signals according to the temperature of the CPU, the power consumption of the computing node and the power consumption of the GPU case so as to adjust the power consumption of the computing node and/or the GPU case.
In some embodiments, the first BMC generates a corresponding control signal to adjust the power consumption of the compute node and/or the GPU chassis according to the temperature of the CPU and the power consumption of the compute node and the GPU chassis, further comprising:
judging whether the temperature of the CPU is greater than a temperature threshold value or not;
in response to the temperature of the CPU being greater than the temperature threshold, the first BMC generates a PWM signal that increases the rotational speed of the fan of the computing node and sends a temperature-control frequency reduction signal to the CPU so as to reduce the frequency of the CPU.
In some embodiments, the first BMC generates a corresponding control signal to adjust the power consumption of the compute node and/or the GPU chassis according to the temperature of the CPU and the power consumption of the compute node and the GPU chassis, further comprising:
judging whether the sum of the power consumption of the computing node and the power consumption of the GPU case is larger than a power consumption threshold value;
in response to the sum of the power consumptions being greater than the power consumption threshold, the first BMC sends power-consumption frequency reduction signals to the memory and the CPU of the computing node respectively, so as to reduce the power consumption of the memory and of the CPU.
In some embodiments, the steps further comprise:
judging whether the temperature of the CPU is greater than a temperature threshold value or not;
in response to the temperature of the CPU being greater than the temperature threshold, the first BMC generating a PWM signal that increases a rotational speed of a fan of the compute node;
in response to the temperature of the CPU being less than the temperature threshold, the first BMC generating a PWM signal that reduces a rotational speed of a fan of the compute node to reduce power consumption of the fan;
in response to the sum of the power consumption of the computing node and the power consumption of the GPU chassis exceeding the power consumption threshold again within a preset time period after the memory and the CPU have undergone frequency reduction, the first BMC generates a frequency reduction signal of a corresponding level, according to the number of times the sum has exceeded the power consumption threshold again, and sends it to the second BMC;
and the second BMC performs frequency reduction processing of corresponding levels on the GPU and the fan in the GPU case according to the frequency reduction signals of corresponding levels.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of any of the above-described power consumption management methods of the server.
The invention has one of the following beneficial technical effects: the scheme provided by the invention can realize integrated dynamic power consumption management between the computing node and the GPUBOX, and can realize the optimal performance power consumption ratio between the computing node and the GPUBOX.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a power consumption management method of a server according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a server according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or two parameters that share the same name but are not the same. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
According to an aspect of the present invention, an embodiment of the present invention provides a power consumption management method for a server, as shown in fig. 1, which may include the steps of:
S1, establishing a connection between a first BMC in the computing node and a second BMC in the GPU chassis;
S2, collecting the power consumption of the computing node and the temperature of the CPU by using the first BMC, and collecting the power consumption of the GPU chassis by using the second BMC;
S3, the second BMC uploads the collected power consumption of the GPU chassis to the first BMC;
S4, the first BMC generates corresponding control signals according to the temperature of the CPU, the power consumption of the computing node and the power consumption of the GPU chassis so as to adjust the power consumption of the computing node and/or the GPU chassis.
The scheme provided by the invention can realize integrated dynamic power consumption management between the computing node and the GPUBOX, and can realize the optimal performance power consumption ratio between the computing node and the GPUBOX.
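The four steps of the method can be illustrated with a minimal Python sketch. It is not the patented firmware: the class names `HostBmc` and `SwBmc`, the method names, the signal strings and the simulated sensor readings are all assumptions made for illustration, and the link between the two BMCs is reduced to a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class SwBmc:
    """Second BMC, one per GPU chassis (GPUBOX). Readings are simulated."""
    chassis_power_w: float

    def collect_power(self) -> float:
        # S2 (GPU side): measure the power of the whole GPU chassis.
        return self.chassis_power_w


@dataclass
class HostBmc:
    """First BMC on the compute node; it holds the control policy."""
    node_power_w: float
    cpu_temp_c: float
    peers: list = field(default_factory=list)

    def connect(self, sw_bmc: SwBmc) -> None:
        # S1: establish the link between the first and the second BMC.
        self.peers.append(sw_bmc)

    def control_cycle(self, temp_limit_c: float, power_limit_w: float) -> list:
        # S2/S3: use the local readings and receive each chassis report.
        total_w = self.node_power_w + sum(p.collect_power() for p in self.peers)
        # S4: derive control signals from CPU temperature and total power.
        signals = []
        if self.cpu_temp_c > temp_limit_c:
            signals.append("cpu_thermal_throttle")
        if total_w > power_limit_w:
            signals.append("reduce_power")
        return signals
```

With a 900 W node at 70 °C and one 2300 W chassis, `control_cycle(85.0, 3000.0)` would return only the power signal, since the total of 3200 W exceeds the limit while the CPU temperature does not.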
In some embodiments, as shown in fig. 2, the system proposed by the present invention may include one or more GPUBOX (GPU chassis). The power consumption control core of the entire system is the HOST BMC (first BMC) located on the compute node, and the power consumption control policy runs on the HOST BMC. In addition to the HOST BMC, there is one SW BMC (second BMC) on each GPUBOX. The SW BMC mainly has two roles. First, it is responsible for collecting and monitoring the power consumption of the entire GPUBOX and reporting it to the HOST BMC over a cable via the I2C interface. Second, it executes power consumption control commands: the SW BMC may, through an I2C command, cause the fan control to slow down the GPUBOX fans so as to reduce the power consumption of the fans, and it may also control the CPLD of the GPU board through an I2C command, so that the CPLD issues a GPU power reduction command and the GPU triggers frequency reduction at different power levels to reduce its power consumption. The SW BMC may also trigger a GPU hold-down through GPIO.
The HOST BMC is responsible for monitoring and collecting the power consumption of the computing node and for receiving the power consumption reported by each GPUBOX, so that the HOST BMC can obtain the total power consumption of the whole computing node and all the GPUBOX. Once the HOST BMC has obtained this total, it can trigger power consumption management commands on the computing node and on the GPUBOX. For the computing node, the HOST BMC can trigger frequency reduction of the CPU and the memory by controlling the CPLD on the main board, thereby reducing power consumption, and it can also regulate the fan speed of the computing node through PWM signals to reduce the power consumption of the system. Meanwhile, the HOST BMC can control the SW BMC over the cable through I2C commands, and the SW BMC then triggers the power consumption management strategy of the GPUBOX.
For example, consider a cabinet whose power budget is 12 kW, in which a customer needs to place 4 sets of computing node + GPUBOX. The peak power consumption of each set may reach 4000 W, so in theory the combined peak of the 4 sets may reach 16 kW, which exceeds the budget of the cabinet. Most cabinets currently have no RMC. Because the computing nodes and the GPUBOX do not normally reach their maximum power consumption at the same time, the scheme provided by the invention can realize integrated dynamic power consumption management between the computing node and the GPUBOX under such conditions, so that the optimal performance power consumption ratio between the computing node and the GPUBOX is realized.
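The arithmetic behind this cabinet example can be written out as a short Python sketch. The function names are hypothetical, and the even split of the cabinet budget across sets is only one possible capping policy assumed here for illustration; the source does not state how a per-set threshold is chosen.

```python
def rack_headroom_w(rack_limit_w: float, set_peak_w: float, set_count: int) -> float:
    """Cabinet budget minus the theoretical simultaneous peak of all sets.

    A negative result means the cabinet would be oversubscribed if every
    compute node + GPUBOX set peaked at the same time.
    """
    return rack_limit_w - set_peak_w * set_count


def even_per_set_cap_w(rack_limit_w: float, set_count: int) -> float:
    """Assumed policy: split the cabinet budget evenly across the sets."""
    return rack_limit_w / set_count


# Figures from the example: 12 kW cabinet, 4 sets, 4000 W peak per set.
headroom = rack_headroom_w(12_000, 4_000, 4)
cap = even_per_set_cap_w(12_000, 4)
print(f"headroom at simultaneous peak: {headroom} W")
print(f"even per-set cap: {cap} W")
```

Under these figures the simultaneous peak overshoots the cabinet by 4000 W, and an even split gives 3000 W per set, the same value used as the power threshold in the later example of this description.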
In some embodiments, the first BMC generates a corresponding control signal to adjust the power consumption of the compute node and/or the GPU chassis according to the temperature of the CPU and the power consumption of the compute node and the GPU chassis, further comprising:
judging whether the temperature of the CPU is greater than a temperature threshold value or not;
in response to the temperature of the CPU being greater than the temperature threshold, the first BMC generates a PWM signal that increases the rotational speed of the fan of the computing node and sends a temperature-control frequency reduction signal to the CPU so as to reduce the frequency of the CPU.
Specifically, as shown in fig. 2, when the temperature of the CPU of the computing node is greater than the threshold, the frequency of the CPU is reduced regardless of the power consumption at that moment: the BMC sends the THROTTLE_N signal (GPIO-I3) and simultaneously generates a PWM signal that increases the rotation speed of the fan of the computing node, thereby preventing the CPU from overheating.
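The temperature branch described above can be sketched as a small Python function. The function name, the action strings and the 85 °C figure in the usage note are assumptions; the point illustrated is only that the decision depends on the CPU temperature alone and ignores the current power consumption.

```python
def thermal_actions(cpu_temp_c: float, temp_threshold_c: float) -> list:
    """Return the actions the first BMC would take for a CPU temperature.

    The total power consumption is deliberately not an input: per the
    description, an over-temperature CPU is throttled no matter how much
    power the system is drawing.
    """
    if cpu_temp_c > temp_threshold_c:
        # Both actions are issued together: throttle the CPU and speed
        # up the compute-node fan.
        return ["assert_cpu_throttle", "raise_fan_pwm"]
    return []
```

For instance, with an assumed threshold of 85 °C, a reading of 92 °C yields both actions and a reading of 60 °C yields none.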
In some embodiments, the first BMC generates a corresponding control signal to adjust the power consumption of the compute node and/or the GPU chassis according to the temperature of the CPU and the power consumption of the compute node and the GPU chassis, further comprising:
judging whether the sum of the power consumption of the computing node and the power consumption of the GPU case is larger than a power consumption threshold value;
in response to the sum of the power consumptions being greater than the power consumption threshold, the first BMC sends power-consumption frequency reduction signals to the memory and the CPU of the computing node respectively, so as to reduce the power consumption of the memory and of the CPU.
Specifically, as shown in fig. 2, when the sum of the power consumptions is greater than the power consumption threshold, the HOST BMC generates the PROCHOT_N and MEMHOT_N signals to reduce the power consumption of the CPU and of the memory, respectively.
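A minimal Python sketch of this power branch follows. It assumes that the `_N` suffix marks an active-low signal, so a level of 0 means asserted; that convention, the function name and the dictionary representation of the pin levels are assumptions rather than details given in the source.

```python
def power_cap_signals(node_power_w: float, gpubox_power_w: list,
                      power_threshold_w: float) -> dict:
    """Compute the levels the HOST BMC would drive on the two throttle pins.

    Assumption: both signals are active-low, so 0 = asserted (throttle)
    and 1 = released (no throttle).
    """
    total_w = node_power_w + sum(gpubox_power_w)
    level = 0 if total_w > power_threshold_w else 1
    # CPU and memory are throttled together when the total is too high.
    return {"PROCHOT_N": level, "MEMHOT_N": level}
```

The list argument allows for one compute node paired with several GPUBOX, each contributing its own reported power to the sum.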
In some embodiments, the method further comprises:
judging whether the temperature of the CPU is greater than a temperature threshold value or not;
in response to the temperature of the CPU being greater than the temperature threshold, the first BMC generating a PWM signal that increases a rotational speed of a fan of the compute node;
in response to the temperature of the CPU being less than the temperature threshold, the first BMC generates a PWM signal that decreases a rotational speed of a fan of the compute node to decrease power consumption of the fan.
Specifically, when the power consumption is greater than a power consumption threshold, the temperature of the CPU is determined, if the temperature of the CPU is greater than the temperature threshold, the HOST BMC is required to generate a PWM signal for increasing the rotation speed of the fan of the compute node, and if the temperature of the CPU is less than the temperature threshold, the HOST BMC is required to generate a PWM signal for decreasing the rotation speed of the fan of the compute node, so as to decrease the power consumption of the fan.
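The fan decision taken while the power threshold is exceeded can be sketched as follows. The step size and the duty-cycle bounds are hypothetical tuning values, not figures from the source, and the function only computes the next duty cycle rather than driving real PWM hardware.

```python
def fan_duty_under_power_cap(cpu_temp_c: float, temp_threshold_c: float,
                             current_duty_pct: int, step_pct: int = 10,
                             min_duty_pct: int = 20,
                             max_duty_pct: int = 100) -> int:
    """Next fan PWM duty cycle (percent) once the power cap is exceeded."""
    if cpu_temp_c > temp_threshold_c:
        # Too hot: cooling takes priority over saving fan power.
        return min(max_duty_pct, current_duty_pct + step_pct)
    if cpu_temp_c < temp_threshold_c:
        # Thermal headroom available: slow the fan to save power.
        return max(min_duty_pct, current_duty_pct - step_pct)
    # Exactly at the threshold: the source defines no action, so hold.
    return current_duty_pct
```

The clamping to a minimum and a maximum duty is a design choice of this sketch, added so that repeated calls cannot stall the fan or exceed 100%.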
In some embodiments, further comprising:
in response to the sum of the power consumption of the computing node and the power consumption of the GPU chassis exceeding the power consumption threshold again within a preset time period after the memory and the CPU have undergone frequency reduction, the first BMC generates a frequency reduction signal of a corresponding level, according to the number of times the sum has exceeded the power consumption threshold again, and sends it to the second BMC;
and the second BMC performs frequency reduction processing of corresponding levels on the GPU and the fan in the GPU case according to the frequency reduction signals of corresponding levels.
Specifically, when the Host BMC detects that the total power consumption of the computing node and the GPUBOX exceeds 3000 W, it triggers the CPU and the memory of the computing node to reduce power consumption; at the same time, when the temperature of the CPU is below the temperature threshold, the Fan Duty of the computing node is limited to below 60%. When the Host BMC detects that the total power consumption of the computing node and the GPUBOX exceeds 3000 W again, the GPUBOX L1 power reduction is triggered. When the total exceeds 3000 W again, the GPUBOX L2 power reduction is triggered. When the total exceeds 3000 W again, the GPUBOX L3 power reduction is triggered. When the total exceeds 3000 W yet again, the GPUBOX is triggered to reduce the power consumption further.
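The escalation sequence in this example can be modelled as a small state machine. The sketch below is one possible reading, not the patented implementation: the class name, the 60-second window, the reset once the window has elapsed, and the choice to saturate at level 3 are all assumptions (the source names levels L1 to L3 and does not specify the step after L3).

```python
from typing import Optional


class PowerCapEscalator:
    """Tracks repeated threshold violations and picks the next action."""

    def __init__(self, threshold_w: float = 3000.0, window_s: float = 60.0,
                 max_gpu_level: int = 3) -> None:
        self.threshold_w = threshold_w
        self.window_s = window_s  # assumed "preset time period"
        self.max_gpu_level = max_gpu_level
        self.last_trigger_s: Optional[float] = None
        self.gpu_level = 0  # 0 = only the compute node has been throttled

    def step(self, now_s: float, total_power_w: float) -> str:
        if total_power_w <= self.threshold_w:
            return "none"
        within_window = (self.last_trigger_s is not None
                         and now_s - self.last_trigger_s <= self.window_s)
        self.last_trigger_s = now_s
        if not within_window:
            # First violation (or an old one): throttle CPU and memory.
            self.gpu_level = 0
            return "throttle_cpu_and_memory"
        # Repeated violation inside the window: escalate the GPUBOX level.
        self.gpu_level = min(self.gpu_level + 1, self.max_gpu_level)
        return f"gpubox_level_{self.gpu_level}"
```

Feeding the escalator successive over-threshold readings inside the window walks it from the compute-node throttle through GPUBOX levels 1, 2 and 3; a violation arriving after the window has elapsed starts again from the compute node.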
The scheme provided by the invention can realize integrated dynamic power consumption management between the computing node and the GPUBOX, and can realize the optimal performance power consumption ratio between the computing node and the GPUBOX.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer apparatus 501, comprising:
at least one processor 520; and
the memory 510, the memory 510 stores a computer program 511 that is executable on the processor, and the processor 520 executes the program to perform the steps of any of the above power consumption management methods of the server.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, where the computer-readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the steps of any of the above power consumption management methods of the server.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program to instruct related hardware to implement the methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
In addition, the apparatuses, devices, and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal device, such as a server, and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed by the embodiment of the invention can be applied to any one of the electronic terminal devices in the form of electronic hardware, computer software or a combination of the electronic hardware and the computer software.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be understood that the computer-readable storage media (e.g., memory) herein may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. By way of example and not limitation, nonvolatile memory may include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory may include Random Access Memory (RAM), which may serve as external cache memory. By way of example and not limitation, RAM is available in a variety of forms, such as synchronous RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. A computer-readable medium includes both a computer storage medium and a communication medium, the latter including any medium that facilitates transfer of a computer program from one location to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of an embodiment of the invention, technical features of the above embodiment or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A power consumption management method for a server, characterized by comprising the following steps:
establishing a connection between a first BMC in a computing node and a second BMC in a GPU chassis;
collecting the power consumption of the computing node and the temperature of a CPU by using the first BMC, and collecting the power consumption of the GPU chassis by using the second BMC;
uploading, by the second BMC, the collected power consumption of the GPU chassis to the first BMC; and
generating, by the first BMC, corresponding control signals according to the temperature of the CPU, the power consumption of the computing node, and the power consumption of the GPU chassis, so as to adjust the power consumption of the computing node and/or the GPU chassis.
2. The method of claim 1, wherein the first BMC generating corresponding control signals according to the temperature of the CPU, the power consumption of the computing node, and the power consumption of the GPU chassis, so as to adjust the power consumption of the computing node and/or the GPU chassis, further comprises:
determining whether the temperature of the CPU is greater than a temperature threshold; and
in response to the temperature of the CPU being greater than the temperature threshold, generating, by the first BMC, a PWM signal that increases the rotational speed of a fan of the computing node, and sending a temperature-control frequency reduction signal to the CPU so as to reduce the frequency of the CPU.
3. The method of claim 1, wherein the first BMC generating corresponding control signals according to the temperature of the CPU, the power consumption of the computing node, and the power consumption of the GPU chassis, so as to adjust the power consumption of the computing node and/or the GPU chassis, further comprises:
determining whether the sum of the power consumption of the computing node and the power consumption of the GPU chassis is greater than a power consumption threshold; and
in response to the sum of the power consumption being greater than the power consumption threshold, sending, by the first BMC, power-consumption frequency reduction signals to the memory of the computing node and to the CPU respectively, so as to reduce the power consumption of the memory and of the CPU.
4. The method of claim 3, further comprising:
determining whether the temperature of the CPU is greater than a temperature threshold;
in response to the temperature of the CPU being greater than the temperature threshold, generating, by the first BMC, a PWM signal that increases the rotational speed of a fan of the computing node; and
in response to the temperature of the CPU being less than the temperature threshold, generating, by the first BMC, a PWM signal that decreases the rotational speed of the fan of the computing node so as to reduce the power consumption of the fan.
5. The method of claim 3, further comprising:
in response to the sum of the power consumption of the computing node and the power consumption of the GPU chassis again being greater than the power consumption threshold within a preset time period after the memory and the CPU have undergone frequency reduction, generating, by the first BMC, a frequency reduction signal of a corresponding level according to the frequency with which the sum of the power consumption again exceeds the power consumption threshold, and sending the signal to the second BMC; and
performing, by the second BMC, frequency reduction of the corresponding level on the GPU and the fan in the GPU chassis according to the frequency reduction signal of the corresponding level.
6. A computer device, comprising:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of:
establishing a connection between a first BMC in a computing node and a second BMC in a GPU chassis;
collecting the power consumption of the computing node and the temperature of a CPU by using the first BMC, and collecting the power consumption of the GPU chassis by using the second BMC;
uploading, by the second BMC, the collected power consumption of the GPU chassis to the first BMC; and
generating, by the first BMC, corresponding control signals according to the temperature of the CPU, the power consumption of the computing node, and the power consumption of the GPU chassis, so as to adjust the power consumption of the computing node and/or the GPU chassis.
7. The device of claim 6, wherein the first BMC generating corresponding control signals according to the temperature of the CPU, the power consumption of the computing node, and the power consumption of the GPU chassis, so as to adjust the power consumption of the computing node and/or the GPU chassis, further comprises:
determining whether the temperature of the CPU is greater than a temperature threshold; and
in response to the temperature of the CPU being greater than the temperature threshold, generating, by the first BMC, a PWM signal that increases the rotational speed of a fan of the computing node, and sending a temperature-control frequency reduction signal to the CPU so as to reduce the frequency of the CPU.
8. The device of claim 6, wherein the first BMC generating corresponding control signals according to the temperature of the CPU, the power consumption of the computing node, and the power consumption of the GPU chassis, so as to adjust the power consumption of the computing node and/or the GPU chassis, further comprises:
determining whether the sum of the power consumption of the computing node and the power consumption of the GPU chassis is greater than a power consumption threshold; and
in response to the sum of the power consumption being greater than the power consumption threshold, sending, by the first BMC, power-consumption frequency reduction signals to the memory of the computing node and to the CPU respectively, so as to reduce the power consumption of the memory and of the CPU.
9. The device of claim 8, wherein the steps further comprise:
determining whether the temperature of the CPU is greater than a temperature threshold;
in response to the temperature of the CPU being greater than the temperature threshold, generating, by the first BMC, a PWM signal that increases the rotational speed of a fan of the computing node;
in response to the temperature of the CPU being less than the temperature threshold, generating, by the first BMC, a PWM signal that decreases the rotational speed of the fan of the computing node so as to reduce the power consumption of the fan;
in response to the sum of the power consumption of the computing node and the power consumption of the GPU chassis again being greater than the power consumption threshold within a preset time period after the memory and the CPU have undergone frequency reduction, generating, by the first BMC, a frequency reduction signal of a corresponding level according to the frequency with which the sum of the power consumption again exceeds the power consumption threshold, and sending the signal to the second BMC; and
performing, by the second BMC, frequency reduction of the corresponding level on the GPU and the fan in the GPU chassis according to the frequency reduction signal of the corresponding level.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1-5.
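
As a reading aid, the decision flow recited in the method claims can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation. The names `Readings`, `FirstBmcPolicy`, and the action strings are hypothetical. The sketch also assumes that the temperature branches of claims 2 and 4 are combined, that the "frequency" of repeated threshold violations is a simple repeat count mapped one-to-one onto a frequency reduction level capped at `max_level`, and that the preset time period is measured from the most recent memory/CPU frequency reduction. The claims do not specify any of these details.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Readings:
    """One sample held by the first BMC; GPU chassis power is uploaded by the second BMC."""
    cpu_temp_c: float
    node_power_w: float
    gpu_chassis_power_w: float


@dataclass
class FirstBmcPolicy:
    """Hypothetical decision logic; thresholds and levels are illustrative assumptions."""
    temp_threshold_c: float
    power_threshold_w: float
    window_s: float                       # the "preset time period" of claim 5
    max_level: int = 3                    # assumed cap on the frequency reduction level
    throttled_at: Optional[float] = None  # time of the last memory/CPU frequency reduction
    repeat_count: int = 0                 # repeat violations seen inside the window

    def step(self, r: Readings, now: float) -> List[str]:
        """Return the control actions for one sample taken at time `now` (seconds)."""
        actions: List[str] = []

        # Claims 2 and 4: fan speed (and CPU frequency) follow the CPU temperature.
        if r.cpu_temp_c > self.temp_threshold_c:
            actions += ["node_fan_pwm_up", "cpu_thermal_freq_down"]
        elif r.cpu_temp_c < self.temp_threshold_c:
            actions.append("node_fan_pwm_down")

        # Claim 3: compare the summed power consumption against the threshold.
        total_w = r.node_power_w + r.gpu_chassis_power_w
        if total_w <= self.power_threshold_w:
            return actions

        in_window = (self.throttled_at is not None
                     and now - self.throttled_at <= self.window_s)
        if in_window:
            # Claim 5: still over budget after node-side reduction, so escalate
            # to the GPU chassis through the second BMC.
            self.repeat_count += 1
            level = min(self.repeat_count, self.max_level)
            actions.append(f"second_bmc_freq_down_level_{level}")
        else:
            # Claim 3: first reduce memory and CPU power on the computing node.
            self.throttled_at = now
            self.repeat_count = 0
            actions += ["memory_freq_down", "cpu_power_freq_down"]
        return actions
```

For instance, with a 3000 W threshold and a 60 s window, a first over-threshold sample produces the memory and CPU actions, and a second over-threshold sample taken 10 s later produces `second_bmc_freq_down_level_1`. Returning action names instead of driving hardware keeps the sketch independent of any particular BMC interface.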
CN202010176261.4A 2020-03-13 2020-03-13 Power consumption management method, equipment and medium for server Withdrawn CN111427744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010176261.4A CN111427744A (en) 2020-03-13 2020-03-13 Power consumption management method, equipment and medium for server

Publications (1)

Publication Number Publication Date
CN111427744A (en) 2020-07-17

Family

ID=71546298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010176261.4A Withdrawn CN111427744A (en) 2020-03-13 2020-03-13 Power consumption management method, equipment and medium for server

Country Status (1)

Country Link
CN (1) CN111427744A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113064479A (en) * 2021-03-03 2021-07-02 山东英信计算机技术有限公司 Power supply redundancy control system, method and medium of GPU server
CN114035662A (en) * 2021-10-15 2022-02-11 苏州浪潮智能科技有限公司 AI server heat dissipation regulation and control method, system, terminal and storage medium
CN114035662B (en) * 2021-10-15 2023-07-14 苏州浪潮智能科技有限公司 AI server heat dissipation regulation and control method, system, terminal and storage medium
CN115877938A (en) * 2022-12-23 2023-03-31 摩尔线程智能科技(北京)有限责任公司 Control method, device, equipment, storage medium and program product of GPU
CN117369612A (en) * 2023-12-08 2024-01-09 电子科技大学 Server hardware management system and method
CN117369612B (en) * 2023-12-08 2024-02-13 电子科技大学 Server hardware management system and method

Similar Documents

Publication Publication Date Title
CN111427744A (en) Power consumption management method, equipment and medium for server
US7984311B2 (en) Demand based power allocation
WO2021043300A1 (en) Operation frequency adjustment method for switched power supply, and device
US20080189569A1 (en) Adjusting performance method for multi-core processor
US10560022B2 (en) Setting operating points for circuits in an integrated circuit chip using an integrated voltage regulator power loss model
US10170994B1 (en) Voltage regulators for an integrated circuit chip
KR20040083464A (en) Dram power management
US10410688B2 (en) Managing power state in one power domain based on power states in another power domain
CN108803860A (en) A kind of regulating power consumption method and electronic equipment
CN113826082B (en) Method and equipment for controlling heat dissipation device
US20240152191A1 (en) Cpu performance adjustment method and apparatus, and medium
KR20150083550A (en) Power supply device and micro server having the same
CN113849431A (en) System topology structure switching method, device and medium
CN112114644A (en) Server power supply current sharing method, system, equipment and medium
US8850444B2 (en) System for setting each transfer module in a network device into one of a plurality of standby states based upon the level of traffic
US10230263B2 (en) Adaptive power availability controller
WO2021098497A1 (en) Power supply system, power supply method, power supply apparatus, and terminal device
WO2022110199A1 (en) Power consumption control apparatus, processor, and power consumption control method
CN111562835A (en) Control method and electronic equipment
US20070148019A1 (en) Method and device for connecting several types of fans
US20130138981A1 (en) Power distribution method and server system using the same
CN112084089B (en) Method, device and equipment for determining upper limit of power consumption of data center node and storage medium
US9466982B2 (en) System and method for control of power consumption of information handling system devices
US11233679B2 (en) Phase adjustments for computer nodes
CN113867515B (en) Method, device, terminal and storage medium for automatically adjusting current sharing of server power supply

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200717
