CN116823587A - Graphics processor control method and device, electronic equipment and storage medium - Google Patents

Graphics processor control method and device, electronic equipment and storage medium

Info

Publication number
CN116823587A
Authority
CN
China
Prior art keywords
server
power consumption
management controller
baseboard management
power supply
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310786656.XA
Other languages
Chinese (zh)
Inventor
苗永威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310786656.XA
Publication of CN116823587A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Power Sources (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention provides a graphics processor control method and apparatus, an electronic device, and a storage medium. The method is applied to a server that comprises a baseboard management controller and a graphics processor, the baseboard management controller being in communication with the graphics processor. The method comprises: acquiring, through the baseboard management controller, the current power consumption of the server, the failed server power supplies, and the non-failed server power supplies; and, when the number of failed server power supplies is greater than or equal to a number threshold, controlling the power consumption of the graphics processor through the baseboard management controller according to a comparison of the maximum power that the non-failed server power supplies can provide with the current power consumption of the server. Because the power consumption of the graphics processor is adjusted in real time according to the power actually available from the server power supplies and the power the server is currently consuming, downtime faults caused by excessive server power consumption can be reduced.

Description

Graphics processor control method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of server technologies, and in particular, to a method and apparatus for controlling a graphics processor, an electronic device, and a storage medium.
Background
At present, artificial intelligence has become a hot industry and has spread into many fields, and with the rapid development of artificial intelligence algorithms, the demand for computing power to support it is growing rapidly. The graphics processing unit (GPU) server is the core carrier of intelligent computing, and its stability underpins the stability of the computing power of an intelligent computing center. As GPUs are iterated, their computing power has improved greatly, and their power consumption and heat-dissipation requirements have increased accordingly. When the actual power consumption of a GPU server exceeds its design limit, the server suffers faults such as downtime, GPU card drop, or shutdown and restart, which cause computation to fail.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention are directed to providing a graphics processor control method, apparatus, electronic device, and storage medium that overcome or at least partially solve the foregoing problems.
To solve the above-mentioned problem, in a first aspect, an embodiment of the present invention discloses a graphics processor control method, which is applied to a server, the server including a baseboard management controller and a graphics processor, the baseboard management controller being in communication with the graphics processor, the method including:
acquiring the current power consumption of the server and the in-place condition of the server power supplies through the baseboard management controller; the in-place condition of a server power supply comprises: the power supply has failed, or the power supply has not failed;
judging whether the number of the server failure power supplies is larger than or equal to a number threshold value or not through the baseboard management controller;
and if the number of failed server power supplies is greater than or equal to the number threshold, controlling the power consumption of the graphics processor through the baseboard management controller according to a comparison of the maximum power that the non-failed server power supplies can provide with the current power consumption of the server.
Optionally, the server includes a complex programmable logic device and a first pin, the baseboard management controller is in communication with the complex programmable logic device, the complex programmable logic device is in communication with the first pin, and the first pin is connected with the graphics processor;
the power consumption control of the graphics processor is performed by the baseboard management controller according to the comparison result of the maximum power consumption provided by the server without the failure power supply and the current power consumption of the server, and the method comprises the following steps:
judging whether the maximum power consumption provided by the server without the failure power supply is greater than or equal to the current power consumption of the server through the baseboard management controller;
if the maximum power consumption provided by the server without the failure power supply is smaller than the current power consumption of the server, sending a power consumption limiting instruction of the graphic processor to the complex programmable logic device through the baseboard management controller;
driving the first pin to a low level through the complex programmable logic device in response to the graphics processor power consumption limiting instruction;
and running the graphics processor at a reduced speed in response to the low level on the first pin.
Optionally, the method further comprises:
after the baseboard management controller sends the graphics processor power consumption limiting instruction to the complex programmable logic device, re-acquiring the current power consumption of the server through the baseboard management controller;
judging whether the maximum power consumption provided by the server without the failure power supply is greater than or equal to the current power consumption of the re-acquired server or not through the baseboard management controller;
and if the maximum power consumption provided by the server without the failure power supply is smaller than the current power consumption of the re-acquired server, sending a power consumption limiting instruction of the graphic processor to the complex programmable logic device again through the baseboard management controller.
Optionally, the method further comprises:
and if the maximum power consumption provided by the server without the failure power supply is smaller than the current power consumption of the re-acquired server, generating a power consumption limiting failure alarm log through the baseboard management controller.
Optionally, the method further comprises:
after the graphic processor operates at a reduced speed, the in-place condition of a server power supply and the maximum power consumption of the server before the graphic processor operates at the reduced speed are acquired again through the baseboard management controller;
judging whether the failure power supply of the server is recovered to be normal or not through the baseboard management controller;
and if the server failure power supply is recovered to be normal, controlling the power consumption of the graphic processor according to a comparison result of the maximum power consumption provided by the re-acquired server failure power supply and the maximum power consumption of the server before the graphic processor operates at a reduced speed by the baseboard management controller.
Optionally, the controlling, by the baseboard management controller, the power consumption of the graphics processor according to a comparison result between the maximum power consumption that the re-acquired server can provide without the failure power supply and the maximum power consumption of the server before the graphics processor runs at a reduced speed, includes:
judging whether the maximum power consumption provided by the re-acquired server without the failure power supply is larger than or equal to the maximum power consumption of the server before the graphic processor operates at a reduced speed or not through the baseboard management controller;
if the maximum power consumption provided by the re-acquired server without the failure power supply is greater than or equal to the maximum power consumption of the server before the graphic processor operates at a reduced speed, sending a graphic processor power consumption recovery instruction to the complex programmable logic device through the baseboard management controller;
driving the first pin to a high level through the complex programmable logic device in response to the graphics processor power consumption recovery instruction;
and restoring the graphics processor to normal operation in response to the high level on the first pin.
Optionally, the number threshold is set according to a maximum power consumption of the server and a maximum power consumption that can be provided by a server power supply.
In a second aspect, an embodiment of the present invention discloses a graphics processor control apparatus, where the apparatus is applied to a server, the server includes a baseboard management controller and a graphics processor, the baseboard management controller communicates with the graphics processor, and the apparatus includes:
the first server information acquisition module is used for acquiring the current power consumption of the server and the in-place condition of the server power supplies through the baseboard management controller; the in-place condition of a server power supply comprises: the power supply has failed, or the power supply has not failed;
the first server failure power supply monitoring module is used for judging whether the number of the server failure power supplies is larger than or equal to a number threshold value through the baseboard management controller;
and the first graphic processor power consumption control module is used for controlling the power consumption of the graphic processor according to the comparison result of the maximum power consumption provided by the server without the failed power supply and the current power consumption of the server through the baseboard management controller if the number of the failed power supplies of the server is larger than or equal to the number threshold.
Optionally, the server includes a complex programmable logic device and a first pin, the baseboard management controller is in communication with the complex programmable logic device, the complex programmable logic device is in communication with the first pin, and the first pin is connected with the graphics processor;
the first graphics processor power consumption control module is specifically configured to: judge, through the baseboard management controller, whether the maximum power that the non-failed server power supplies can provide is greater than or equal to the current power consumption of the server; if it is smaller than the current power consumption of the server, send a graphics processor power consumption limiting instruction to the complex programmable logic device through the baseboard management controller; drive the first pin to a low level through the complex programmable logic device in response to the graphics processor power consumption limiting instruction; and run the graphics processor at a reduced speed in response to the low level on the first pin.
Optionally, the apparatus further comprises:
the graphics processor speed-down monitoring module is used for the baseboard management controller to send a graphics processor power consumption limiting instruction to the complex programmable logic device and re-acquire the current power consumption of the server through the baseboard management controller; judging whether the maximum power consumption provided by the server without the failure power supply is greater than or equal to the current power consumption of the re-acquired server or not through the baseboard management controller; and if the maximum power consumption provided by the server without the failure power supply is smaller than the current power consumption of the re-acquired server, sending a power consumption limiting instruction of the graphic processor to the complex programmable logic device again through the baseboard management controller.
Optionally, the apparatus further comprises:
and the log generation module is used for generating a power consumption limiting failure alarm log through the baseboard management controller if the maximum power consumption which can be provided by the server without the failure power supply is smaller than the current power consumption of the re-acquired server.
Optionally, the apparatus further comprises:
the second server information acquisition module is used for re-acquiring the on-site condition of a server power supply and the maximum power consumption of the server before the graphics processor operates at a reduced speed through the baseboard management controller after the graphics processor operates at a reduced speed;
the second server failure power supply monitoring module is used for judging whether the server failure power supply is recovered to be normal or not through the baseboard management controller;
and the second graphic processor power consumption control module is used for controlling the power consumption of the graphic processor according to the comparison result of the maximum power consumption provided by the re-acquired server without the failure power supply and the maximum power consumption of the server before the graphic processor operates in a speed-down mode through the baseboard management controller if the failure power supply of the server is recovered to be normal.
Optionally, the second graphics processor power consumption control module is specifically configured to: judge, through the baseboard management controller, whether the re-acquired maximum power that the non-failed server power supplies can provide is greater than or equal to the maximum power consumption of the server before the graphics processor ran at a reduced speed; if it is greater than or equal to that value, send a graphics processor power consumption recovery instruction to the complex programmable logic device through the baseboard management controller; drive the first pin to a high level through the complex programmable logic device in response to the graphics processor power consumption recovery instruction; and restore the graphics processor to normal operation in response to the high level on the first pin.
Optionally, the number threshold is set according to a maximum power consumption of the server and a maximum power consumption that can be provided by a server power supply.
In a third aspect, an embodiment of the present invention discloses an electronic device, including: a processor, a memory and a computer program stored on the memory and capable of running on the processor, which when executed by the processor, implements the steps of the graphics processor control method as described above.
In a fourth aspect of the present invention, embodiments of the present invention disclose a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of a graphics processor control method as described above.
The embodiment of the invention has the following advantages:
The power supply condition and the power consumption of the GPU server are acquired in real time through the baseboard management controller (BMC) of the GPU server. When the number of failed power supplies of the GPU server is found to reach a preset number threshold, the power consumption of the GPU is controlled according to a comparison of the maximum power that the non-failed power supplies of the GPU server can provide with the current power consumption of the GPU server. The embodiment of the invention thus adjusts the power consumption of the GPU in real time based on the power supply condition and the power consumption of the GPU server, which reduces faults such as downtime, GPU card drop, or shutdown and restart caused by excessive power consumption.
Drawings
FIG. 1 is a flow chart of steps of a method for controlling a graphics processor according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of another method for controlling a graphics processor according to an embodiment of the present invention;
FIG. 3 is a flowchart of an automatic triggering of GPU power consumption limits provided by an embodiment of the present invention;
FIG. 4 is a flowchart for automatically triggering GPU power consumption recovery according to an embodiment of the present invention;
FIG. 5 is a block diagram of a graphics processor control apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the present invention;
FIG. 7 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
To ensure a stable power supply for GPU servers, existing schemes mainly increase the maximum output power of the power supply modules, add redundant power supply modules, or provide redundant multi-path power inputs. However, when a batch of power supply modules shares a latent fault, several modules may fail together, and the remaining supply may be unable to sustain the computing power of the system. One of the core concepts of the embodiments of the invention is as follows: the BMC captures the power consumption of the whole machine in real time and checks the in-place condition of the power supply modules to determine whether the power the modules can output in their current state is sufficient; based on the fault alarm state of the power supply modules, the complex programmable logic device (Complex Programmable Logic Device, CPLD) is automatically triggered to control the level of a fixed pin, which protects the GPU by limiting its power consumption and frequency. The GPU is thus slowed automatically when the power supply modules of the GPU server cannot meet the power demand of the whole machine, which avoids faults such as GPU card drop, downtime, and shutdown and restart, and keeps the computation of the GPU server continuous and stable.
Referring to fig. 1, a flowchart illustrating steps of a method for controlling a graphics processor according to an embodiment of the present invention is shown, where the method is applied to a server, and the server includes a baseboard management controller and a graphics processor, and the baseboard management controller communicates with the graphics processor, and the method specifically may include the following steps:
Step 101, obtaining the current power consumption of the server and the in-place condition of the server power supplies through the baseboard management controller; the in-place condition of a server power supply comprises: the power supply has failed, or the power supply has not failed.
The BMC chip is generally built in the server, and a manager can monitor or schedule the server by accessing the BMC system. The server includes slots, and the GPU may access the server by plugging into a particular slot on the server. The BMC in the server is communicatively coupled to the GPU.
The BMC can acquire the current power consumption of the server and the in-place condition of the server power supplies in real time. The server power supplies include those the server is using and those the server can use but is not currently using, such as backup power supplies. The BMC can obtain the in-place condition of all of these power supplies. The in-place condition of a power supply is one of: the power supply has failed, or the power supply has not failed. A failed power supply cannot supply power to the server normally, whereas a non-failed power supply can. For example, if the server has 4 power supplies in total and the BMC detects that 1 of them has failed, the remaining 3 have not failed.
Step 102, judging whether the number of the server failure power supplies is greater than or equal to a number threshold value by the baseboard management controller.
The number threshold may be a preset value. It may be set empirically, or according to the maximum power consumption of the server and the maximum power that the server power supplies can provide. For example, suppose the maximum power consumption of the server is 400 W and the server has 8 power supplies, each able to provide at most 100 W. As long as 4 of the 8 power supplies have not failed, the non-failed power supplies can sustain normal operation of the server. If 5 power supplies fail, the remaining 3 can provide at most 300 W, which is below 400 W, so the number threshold may be set to 5.
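The arithmetic behind this example can be sketched as follows. This is only an illustration under the assumption that all power supplies are identical; the function name `failed_psu_threshold` and its parameters are hypothetical and do not appear in the patent.

```python
import math


def failed_psu_threshold(server_max_w: float, psu_count: int, psu_max_w: float) -> int:
    """Smallest number of failed supplies at which the remaining supplies
    can no longer cover the server's maximum power consumption.

    Assumption (not from the patent): every supply has the same rating.
    """
    # Number of healthy supplies needed to carry the maximum load.
    needed = math.ceil(server_max_w / psu_max_w)
    if needed > psu_count:
        raise ValueError("the supplies cannot cover the maximum load even when all are healthy")
    # (psu_count - needed) failures are tolerable; one more is not.
    return psu_count - needed + 1


# The example from the text: 400 W server, 8 supplies of 100 W each.
print(failed_psu_threshold(400, 8, 100))  # 5
```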
After the BMC obtains the server power supply in-place condition, the quantity of the server failure power supplies can be counted, and whether the quantity of the server failure power supplies is larger than or equal to a quantity threshold value is judged.
Step 103, if the number of failed server power supplies is greater than or equal to the number threshold, controlling the power consumption of the graphics processor through the baseboard management controller according to a comparison of the maximum power that the non-failed server power supplies can provide with the current power consumption of the server.
When the number of failed server power supplies is greater than or equal to the number threshold, the BMC can calculate the maximum power that the non-failed server power supplies can provide and control the power consumption of the GPU according to a comparison of that value with the current power consumption of the server. For example, when the maximum power that the non-failed power supplies can provide is smaller than the current power consumption of the server, the GPU power consumption can be reduced to lower the power consumption of the server and avoid downtime caused by excessive power consumption; when it is greater than or equal to the current power consumption of the server, the GPU can continue to operate normally. Methods of reducing GPU power consumption include lowering the GPU running rate, turning off some unimportant threads, and so on. The invention does not limit the specific method used to reduce GPU power consumption.
In one embodiment, after the GPU power consumption is reduced, the server power supply in-place condition can be monitored in real time, and if the failed power supplies are recovered to be normal and the number of the failed power supplies is smaller than the number threshold, the GPU can be recovered to be in normal operation.
According to the embodiment of the invention, the power consumption of the graphic processor is adjusted in real time according to the actual available power consumption of the server power supply and the current consumed power consumption of the server, so that downtime faults of the server caused by overhigh power consumption can be reduced.
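Steps 101 to 103 can be sketched as a single decision function. This is a minimal sketch rather than the patented implementation: the `PowerSupply` and `GpuAction` types and the function name `decide_gpu_action` are hypothetical, and the readings that a real BMC would obtain from its sensors are passed in as plain values.

```python
from dataclasses import dataclass
from enum import Enum


class GpuAction(Enum):
    KEEP_NORMAL = "keep_normal"
    LIMIT_POWER = "limit_power"


@dataclass
class PowerSupply:
    max_output_w: float  # maximum power this supply can provide
    failed: bool         # in-place condition: failed or not failed


def decide_gpu_action(supplies, current_power_w, failed_threshold):
    """Decide whether the GPU should be power-limited (steps 102 and 103)."""
    # Step 102: count the failed supplies and compare with the threshold.
    failed_count = sum(1 for p in supplies if p.failed)
    if failed_count < failed_threshold:
        return GpuAction.KEEP_NORMAL
    # Step 103: compare what the non-failed supplies can provide
    # with what the server is currently consuming.
    available_w = sum(p.max_output_w for p in supplies if not p.failed)
    if available_w < current_power_w:
        return GpuAction.LIMIT_POWER
    return GpuAction.KEEP_NORMAL


# 8 supplies of 100 W, 5 of them failed, threshold 5.
supplies = [PowerSupply(100, failed=(i < 5)) for i in range(8)]
print(decide_gpu_action(supplies, current_power_w=350, failed_threshold=5))
print(decide_gpu_action(supplies, current_power_w=280, failed_threshold=5))
```

With 300 W available, the first call selects `LIMIT_POWER` and the second selects `KEEP_NORMAL`.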
Referring to fig. 2, a flowchart illustrating steps of another method for controlling a graphics processor according to an embodiment of the present invention is shown, where the method is applied to a server, the server includes BMC, GPU, CPLD and a first pin, the BMC communicates with the GPU and the CPLD, the CPLD communicates with the first pin, and the first pin is connected with the GPU, and the method specifically includes the following steps:
Step 201, obtaining the current power consumption of the server and the in-place condition of the server power supplies through the baseboard management controller; the in-place condition of a server power supply comprises: the power supply has failed, or the power supply has not failed.
Step 202, determining, by the baseboard management controller, whether the number of server failure power supplies is greater than or equal to a number threshold.
A CPLD (complex programmable logic device) is a digital integrated circuit whose logic functions are configured by the user as needed.
The first pin is an additional pin provided on the GPU slot of the server. After the GPU is inserted into the server slot, the server can communicate with the GPU through the first pin, and the first pin can be used to control the running power consumption of the GPU. The first pin is based on the Peripheral Component Interconnect Express (PCIe) specification, a high-speed serial computer expansion bus standard.
Steps 201 to 202 of the embodiment of the present invention are similar to steps 101 to 102, and are not repeated here.
Step 203, if the number of failed server power supplies is greater than or equal to the number threshold, judging, through the baseboard management controller, whether the maximum power that the non-failed server power supplies can provide is greater than or equal to the current power consumption of the server.
Step 204, if the maximum power that the non-failed server power supplies can provide is smaller than the current power consumption of the server, sending a graphics processor power consumption limiting instruction to the complex programmable logic device through the baseboard management controller.
Step 205, driving the first pin to a low level through the complex programmable logic device in response to the graphics processor power consumption limiting instruction.
Step 206, running the graphics processor at a reduced speed in response to the low level on the first pin.
If the number of failed server power supplies is greater than or equal to the number threshold, the BMC can judge whether the maximum power that the non-failed server power supplies can provide is greater than or equal to the current power consumption of the server. If it is, the GPU continues to operate normally. If it is smaller, the BMC can send a GPU power consumption limiting instruction to the CPLD; on receiving the instruction, the CPLD drives the first pin, which is connected to the GPU, to a low level, and the GPU responds to the low level by starting to run at a reduced speed. The GPU may be set to slow down by 25%, by 50%, and so on in response to the low level on the first pin. The specific amount of slowdown can be set according to actual requirements.
In this embodiment, a first pin is added where the server connects to the GPU. Through out-of-band management by the BMC and the CPLD, the level of the first pin is controlled so that the GPU runs at a reduced speed, which reduces the power consumption of the server.
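The signal path of steps 203 to 206 can be modelled in software as three cooperating objects. This is a sketch for illustration only: the classes `Bmc`, `Cpld`, and `Gpu`, the instruction string, and the assumption that the pin idles at a high level are all hypothetical, and real hardware would use registers and a physical pin rather than method calls.

```python
LIMIT_INSTRUCTION = "GPU_POWER_LIMIT"  # hypothetical instruction name


class Gpu:
    """GPU model: full speed while the pin is high, slowed while it is low."""

    def __init__(self, slowdown_fraction: float = 0.5):
        self.slowdown_fraction = slowdown_fraction  # e.g. 0.25 or 0.5
        self.speed = 1.0  # fraction of normal running speed

    def on_pin_level(self, pin_high: bool) -> None:
        # Step 206: respond to the level on the first pin.
        self.speed = 1.0 if pin_high else 1.0 - self.slowdown_fraction


class Cpld:
    """CPLD model: translates a BMC instruction into a pin level."""

    def __init__(self, gpu: Gpu):
        self.gpu = gpu
        self.pin_high = True  # assumption: the first pin idles high

    def handle(self, instruction: str) -> None:
        if instruction != LIMIT_INSTRUCTION:
            raise ValueError(f"unknown instruction: {instruction}")
        # Step 205: drive the first pin low.
        self.pin_high = False
        self.gpu.on_pin_level(self.pin_high)


class Bmc:
    """BMC model: compares available power with current consumption."""

    def __init__(self, cpld: Cpld):
        self.cpld = cpld

    def check(self, available_w: float, current_w: float) -> bool:
        # Steps 203 and 204: limit the GPU only if supply falls short.
        if available_w < current_w:
            self.cpld.handle(LIMIT_INSTRUCTION)
            return True
        return False


gpu = Gpu(slowdown_fraction=0.25)
bmc = Bmc(Cpld(gpu))
bmc.check(available_w=300, current_w=350)
print(gpu.speed)  # 0.75
```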
In one embodiment, the embodiment of the present invention further includes: the baseboard management controller sends a power consumption limiting instruction of the graphic processor to the complex programmable logic device, and then the current power consumption of the server is obtained again through the baseboard management controller; judging whether the maximum power consumption provided by the server without the failure power supply is greater than or equal to the current power consumption of the server obtained again through the baseboard management controller; and if the maximum power consumption provided by the server without the failure power supply is smaller than the current power consumption of the re-acquired server, sending a power consumption limiting instruction of the graphic processor to the complex programmable logic device again through the baseboard management controller.
In one embodiment, the embodiment of the present invention further includes: if the maximum power consumption provided by the server without the failure power supply is smaller than the current power consumption of the re-acquired server, generating a power consumption limiting failure alarm log through the baseboard management controller.
After the BMC sends the GPU power consumption limiting instruction to the CPLD, the BMC can further detect whether the power consumption of the GPU has actually dropped, and determine whether the maximum power that the non-failed power supplies of the server can provide covers the current power consumption of the server after the GPU has slowed down. If it does, the GPU continues to run at reduced speed. If it does not, the BMC generates a power consumption limiting failure alarm log and sends the GPU power consumption limiting instruction to the CPLD again; the CPLD responds to the instruction by driving the first pin to a low level, and the GPU responds to the low-level signal of the first pin by slowing down again.
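The re-check described above can be sketched in Python as a bounded retry loop. The function `enforce_limit`, its callback parameters, the log text, and in particular the `max_attempts` bound are assumptions added for illustration; the embodiment does not state how many times the instruction may be re-sent.

```python
from typing import Callable, List


def enforce_limit(available_w: float,
                  read_power_w: Callable[[], float],
                  send_limit: Callable[[], None],
                  max_attempts: int = 3) -> List[str]:
    """Send the limiting instruction, re-read the server power, and repeat
    while the non-failed supplies still cannot cover the load.

    Returns the alarm log entries generated along the way.
    """
    logs: List[str] = []
    for attempt in range(1, max_attempts + 1):
        send_limit()  # BMC -> CPLD: GPU power consumption limiting instruction
        if available_w >= read_power_w():
            # Budget is met: the GPU simply keeps running at reduced speed.
            return logs
        logs.append(f"power consumption limiting failure (attempt {attempt})")
    return logs
```

The bound keeps the sketch from looping forever if the GPU never responds; a real BMC might instead re-check on every monitoring cycle.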
In one embodiment, the method further includes: after the graphics processor runs at reduced speed, re-acquiring, through the baseboard management controller, the in-place status of the server power supplies and the maximum power consumption of the server before the graphics processor ran at reduced speed; determining, through the baseboard management controller, whether the failed server power supply has returned to normal; and if the failed server power supply has returned to normal, controlling, through the baseboard management controller, the power consumption of the graphics processor according to a result of comparing the re-acquired maximum power that the non-failed power supplies of the server can provide with the maximum power consumption of the server before the graphics processor ran at reduced speed.
After the GPU starts running at reduced speed, the BMC continues to monitor the in-place status of the server power supplies. If it detects that the failed power supply has recovered, the BMC recalculates the maximum power that the non-failed power supplies can provide, and controls the power consumption of the graphics processor according to a result of comparing that value with the maximum power consumption of the server before the graphics processor ran at reduced speed.
In one embodiment, controlling, through the baseboard management controller, the power consumption of the graphics processor according to the result of comparing the re-acquired maximum power that the non-failed power supplies of the server can provide with the maximum power consumption of the server before the graphics processor ran at reduced speed may specifically include: determining, through the baseboard management controller, whether the re-acquired maximum power that the non-failed power supplies of the server can provide is greater than or equal to the maximum power consumption of the server before the graphics processor ran at reduced speed; if it is, sending a graphics processor power consumption recovery instruction to the complex programmable logic device through the baseboard management controller; driving, by the complex programmable logic device in response to the graphics processor power consumption recovery instruction, the first pin to output a high-level signal; and restoring the graphics processor to normal operation in response to the high-level signal of the first pin.
If the failed server power supply has returned to normal, the BMC determines whether the re-acquired maximum power that the non-failed power supplies of the server can provide is greater than or equal to the maximum power consumption of the server before the GPU ran at reduced speed. If it is, the BMC sends a GPU power consumption recovery instruction to the CPLD; the CPLD responds to the instruction by driving the first pin to a high level, and the GPU responds to the high-level signal of the first pin by resuming normal operation. If the re-acquired maximum power that the non-failed power supplies can provide is smaller than the maximum power consumption of the server before the GPU ran at reduced speed, the GPU keeps running at reduced speed.
Referring to fig. 3, a flowchart of automatically triggering GPU power consumption limitation according to an embodiment of the present invention is shown. The BMC automatically monitors the in-place status of the server power supplies and the power consumption of the whole server. If the BMC detects a failed power supply, it determines whether the maximum power that the non-failed power supplies can provide is sufficient for the server to operate. If it is, the BMC generates an alarm log for the failed power supply module; if it is not, the BMC generates an alarm log for the failed power supply module and at the same time sends a GPU power consumption limiting instruction to the CPLD. The CPLD drives the B30 pin of the GPU slot to a low level so that the GPU runs at reduced speed; the B30 pin is the first pin. The BMC then continues to monitor whether the maximum power that the non-failed power supplies can provide is sufficient for the server to operate. If it is, the GPU runs with low power consumption; if it is not, the BMC generates a power consumption limiting failure alarm log and sends the GPU power consumption limiting instruction to the CPLD again.
Referring to fig. 4, a flowchart of automatically triggering GPU power consumption recovery according to an embodiment of the present invention is shown. The BMC automatically monitors the in-place status of the server power supplies and the power consumption of the whole server. If the BMC detects that the failed power supply has returned to normal, it determines whether the maximum power that the non-failed power supplies can provide covers the maximum power consumption of the server before the GPU slowed down. If it does not, the GPU continues to run with low power consumption; if it does, the BMC sends a GPU power consumption recovery instruction to the CPLD so that the CPLD drives the B30 pin to a high level and the GPU returns to normal operation. Meanwhile, the BMC continues to monitor whether the power consumption of the GPU has recovered. If it has, the server is determined to be operating normally; if it has not, the BMC generates a power consumption limiting failure alarm log.
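The recovery flow of fig. 4 can be sketched in Python as a single monitoring step that runs once the BMC has seen the failed power supply return to normal. The function `recovery_cycle`, its callbacks, and the state names are hypothetical; in particular, how the BMC verifies that GPU power consumption has recovered is not specified in the embodiment and is represented here by an assumed `gpu_power_restored` callback.

```python
from typing import Callable, List, Tuple


def recovery_cycle(available_w: float,
                   pre_slowdown_max_w: float,
                   send_recover: Callable[[], None],
                   gpu_power_restored: Callable[[], bool]) -> Tuple[str, List[str]]:
    """One pass of the recovery flow.

    Returns (state, logs), where state is "low_power", "normal",
    or "recovery_unverified" (names chosen for this sketch only).
    """
    logs: List[str] = []
    if available_w < pre_slowdown_max_w:
        # Recovered supplies still cannot cover the pre-slowdown peak.
        return "low_power", logs
    send_recover()  # BMC -> CPLD: drive the first (B30) pin high
    if gpu_power_restored():
        return "normal", logs
    # Log name follows the text of the embodiment.
    logs.append("power consumption limiting failure alarm")
    return "recovery_unverified", logs
```

Comparing against the maximum power consumption before the slowdown, rather than the current reduced consumption, avoids restoring the GPU into a load that the supplies could not sustain at peak.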
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 5, there is shown a block diagram of a graphics processor control apparatus according to an embodiment of the present invention, where the apparatus is applied to a server, and the server includes a baseboard management controller and a graphics processor, and the baseboard management controller communicates with the graphics processor, and the apparatus may specifically include the following modules:
a first server information acquisition module 301, configured to acquire, through the baseboard management controller, the current power consumption of the server and the in-place status of the server power supplies; the in-place status of a server power supply comprises: power supply failed, or power supply not failed;
a first server failure power supply monitoring module 302, configured to determine, by the baseboard management controller, whether the number of server failure power supplies is greater than or equal to a number threshold;
and a first graphics processor power consumption control module 303, configured to, if the number of failed server power supplies is greater than or equal to the number threshold, control the power consumption of the graphics processor through the baseboard management controller according to a result of comparing the maximum power that the non-failed power supplies of the server can provide with the current power consumption of the server.
Optionally, the server includes a complex programmable logic device and a first pin, the baseboard management controller is in communication with the complex programmable logic device, the complex programmable logic device is in communication with the first pin, and the first pin is connected with the graphics processor;
the first graphics processor power consumption control module 303 is specifically configured to: determine, through the baseboard management controller, whether the maximum power that the non-failed power supplies of the server can provide is greater than or equal to the current power consumption of the server; if it is smaller than the current power consumption of the server, send a graphics processor power consumption limiting instruction to the complex programmable logic device through the baseboard management controller; drive, by the complex programmable logic device in response to the graphics processor power consumption limiting instruction, the first pin to output a low-level signal; and run the graphics processor at reduced speed in response to the low-level signal of the first pin.
Optionally, the apparatus further comprises:
a graphics processor speed-reduction monitoring module, configured to: after the baseboard management controller sends the graphics processor power consumption limiting instruction to the complex programmable logic device, re-acquire the current power consumption of the server through the baseboard management controller; determine, through the baseboard management controller, whether the maximum power that the non-failed power supplies of the server can provide is greater than or equal to the re-acquired current power consumption of the server; and if it is smaller than the re-acquired current power consumption of the server, send the graphics processor power consumption limiting instruction to the complex programmable logic device again through the baseboard management controller.
Optionally, the apparatus further comprises:
a log generation module, configured to generate a power consumption limiting failure alarm log through the baseboard management controller if the maximum power that the non-failed power supplies of the server can provide is smaller than the re-acquired current power consumption of the server.
Optionally, the apparatus further comprises:
a second server information acquisition module, configured to re-acquire, through the baseboard management controller after the graphics processor runs at reduced speed, the in-place status of the server power supplies and the maximum power consumption of the server before the graphics processor ran at reduced speed;
a second server failed power supply monitoring module, configured to determine, through the baseboard management controller, whether the failed server power supply has returned to normal;
and a second graphics processor power consumption control module, configured to, if the failed server power supply has returned to normal, control the power consumption of the graphics processor through the baseboard management controller according to a result of comparing the re-acquired maximum power that the non-failed power supplies of the server can provide with the maximum power consumption of the server before the graphics processor ran at reduced speed.
Optionally, the second graphics processor power consumption control module is specifically configured to: determine, through the baseboard management controller, whether the re-acquired maximum power that the non-failed power supplies of the server can provide is greater than or equal to the maximum power consumption of the server before the graphics processor ran at reduced speed; if it is, send a graphics processor power consumption recovery instruction to the complex programmable logic device through the baseboard management controller; drive, by the complex programmable logic device in response to the graphics processor power consumption recovery instruction, the first pin to output a high-level signal; and restore the graphics processor to normal operation in response to the high-level signal of the first pin.
Optionally, the number threshold is set according to a maximum power consumption of the server and a maximum power consumption that can be provided by a server power supply.
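One possible way to derive such a number threshold is sketched below in Python. The formula is an assumption made for illustration, not something stated in the embodiment: it treats all power supplies as identical and picks the smallest number of failed supplies at which the remaining ones can no longer cover the maximum power consumption of the server. The function name `failed_supply_threshold` and its parameters are hypothetical.

```python
import math


def failed_supply_threshold(total_supplies: int,
                            supply_max_w: float,
                            server_max_w: float) -> int:
    """Smallest number of failed supplies at which the remaining supplies
    can no longer cover `server_max_w`, assuming identical supplies."""
    if total_supplies <= 0 or supply_max_w <= 0 or server_max_w <= 0:
        raise ValueError("all arguments must be positive")
    # Supplies needed to carry the server at its maximum power consumption.
    needed = math.ceil(server_max_w / supply_max_w)
    # Losing (total - needed + 1) supplies leaves fewer than `needed`.
    # Clamp to 1 so that a threshold of zero failures is never produced.
    return max(1, total_supplies - needed + 1)
```

For example, with four assumed 800 W supplies and a 2000 W server maximum, three supplies are needed, so the threshold under this assumption is two failed supplies.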
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Referring to fig. 6, a block diagram of an electronic device 40 according to an embodiment of the present invention is shown, where the electronic device 40 includes:
the processor 401, the memory 402, and the computer program 4021 stored in the memory 402 and executable on the processor 401. When executed by the processor 401, the computer program 4021 implements the processes of the foregoing embodiments of the graphics processor control method and achieves the same technical effects; to avoid repetition, the details are not described here again.
Referring to fig. 7, a block diagram of a computer readable storage medium 50 according to an embodiment of the present invention is shown. A computer program 501 is stored on the computer readable storage medium 50; when executed by a processor, the computer program 501 implements the processes of the foregoing embodiments of the graphics processor control method and achieves the same technical effects; to avoid repetition, the details are not described here again.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The graphics processor control method and device, the electronic device, and the storage medium provided by the invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. Meanwhile, those skilled in the art may, in accordance with the ideas of the invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the invention.

Claims (10)

1. A graphics processor control method, characterized by being applied to a server, the server including a baseboard management controller and a graphics processor, the baseboard management controller being in communication with the graphics processor, the method comprising:
acquiring, through the baseboard management controller, the current power consumption of the server and the in-place status of the server power supplies; the in-place status of a server power supply comprises: power supply failed, or power supply not failed;
determining, through the baseboard management controller, whether the number of failed server power supplies is greater than or equal to a number threshold;
and if the number of failed server power supplies is greater than or equal to the number threshold, controlling, through the baseboard management controller, the power consumption of the graphics processor according to a result of comparing the maximum power that the non-failed power supplies of the server can provide with the current power consumption of the server.
2. The method of claim 1, wherein the server comprises a complex programmable logic device and a first pin, the baseboard management controller in communication with the complex programmable logic device, the complex programmable logic device in communication with the first pin, the first pin in connection with the graphics processor;
the controlling, through the baseboard management controller, the power consumption of the graphics processor according to the result of comparing the maximum power that the non-failed power supplies of the server can provide with the current power consumption of the server comprises:
determining, through the baseboard management controller, whether the maximum power that the non-failed power supplies of the server can provide is greater than or equal to the current power consumption of the server;
if the maximum power that the non-failed power supplies of the server can provide is smaller than the current power consumption of the server, sending a graphics processor power consumption limiting instruction to the complex programmable logic device through the baseboard management controller;
driving, by the complex programmable logic device in response to the graphics processor power consumption limiting instruction, the first pin to output a low-level signal;
and running the graphics processor at reduced speed in response to the low-level signal of the first pin.
3. The method according to claim 2, wherein the method further comprises:
after the baseboard management controller sends the graphics processor power consumption limiting instruction to the complex programmable logic device, re-acquiring the current power consumption of the server through the baseboard management controller;
determining, through the baseboard management controller, whether the maximum power that the non-failed power supplies of the server can provide is greater than or equal to the re-acquired current power consumption of the server;
and if the maximum power that the non-failed power supplies of the server can provide is smaller than the re-acquired current power consumption of the server, sending the graphics processor power consumption limiting instruction to the complex programmable logic device again through the baseboard management controller.
4. A method according to claim 3, characterized in that the method further comprises:
if the maximum power that the non-failed power supplies of the server can provide is smaller than the re-acquired current power consumption of the server, generating a power consumption limiting failure alarm log through the baseboard management controller.
5. The method according to claim 2, wherein the method further comprises:
after the graphics processor runs at reduced speed, re-acquiring, through the baseboard management controller, the in-place status of the server power supplies and the maximum power consumption of the server before the graphics processor ran at reduced speed;
determining, through the baseboard management controller, whether the failed server power supply has returned to normal;
and if the failed server power supply has returned to normal, controlling, through the baseboard management controller, the power consumption of the graphics processor according to a result of comparing the re-acquired maximum power that the non-failed power supplies of the server can provide with the maximum power consumption of the server before the graphics processor ran at reduced speed.
6. The method according to claim 5, wherein the controlling, through the baseboard management controller, the power consumption of the graphics processor according to the result of comparing the re-acquired maximum power that the non-failed power supplies of the server can provide with the maximum power consumption of the server before the graphics processor ran at reduced speed comprises:
determining, through the baseboard management controller, whether the re-acquired maximum power that the non-failed power supplies of the server can provide is greater than or equal to the maximum power consumption of the server before the graphics processor ran at reduced speed;
if the re-acquired maximum power that the non-failed power supplies of the server can provide is greater than or equal to the maximum power consumption of the server before the graphics processor ran at reduced speed, sending a graphics processor power consumption recovery instruction to the complex programmable logic device through the baseboard management controller;
driving, by the complex programmable logic device in response to the graphics processor power consumption recovery instruction, the first pin to output a high-level signal;
and restoring the graphics processor to normal operation in response to the high-level signal of the first pin.
7. The method of claim 1, wherein the number threshold is set based on a maximum power consumption of the server and a maximum power consumption that a server power supply can provide.
8. A graphics processor control apparatus for use with a server, the server comprising a baseboard management controller and a graphics processor, the baseboard management controller in communication with the graphics processor, the apparatus comprising:
a first server information acquisition module, configured to acquire, through the baseboard management controller, the current power consumption of the server and the in-place status of the server power supplies; the in-place status of a server power supply comprises: power supply failed, or power supply not failed;
a first server failed power supply monitoring module, configured to determine, through the baseboard management controller, whether the number of failed server power supplies is greater than or equal to a number threshold;
and a first graphics processor power consumption control module, configured to, if the number of failed server power supplies is greater than or equal to the number threshold, control the power consumption of the graphics processor through the baseboard management controller according to a result of comparing the maximum power that the non-failed power supplies of the server can provide with the current power consumption of the server.
9. An electronic device, comprising: a processor, a memory and a computer program stored on the memory and capable of running on the processor, which when executed by the processor, implements the steps of the graphics processor control method as claimed in any one of claims 1 to 7.
10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, which when executed by a processor, implements the steps of the graphics processor control method according to any one of claims 1-7.
CN202310786656.XA 2023-06-29 2023-06-29 Graphics processor control method and device, electronic equipment and storage medium Pending CN116823587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310786656.XA CN116823587A (en) 2023-06-29 2023-06-29 Graphics processor control method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310786656.XA CN116823587A (en) 2023-06-29 2023-06-29 Graphics processor control method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116823587A true CN116823587A (en) 2023-09-29

Family

ID=88142584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310786656.XA Pending CN116823587A (en) 2023-06-29 2023-06-29 Graphics processor control method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116823587A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117331425A (en) * 2023-12-01 2024-01-02 苏州元脑智能科技有限公司 Power consumption management system, power consumption management method, storage medium, and electronic device
CN117331425B (en) * 2023-12-01 2024-03-22 苏州元脑智能科技有限公司 Power consumption management system, power consumption management method, storage medium, and electronic device

Similar Documents

Publication Publication Date Title
CN111752776B (en) Cyclic power-on and power-off test method and system for server
US9304562B2 (en) Server rack system and power management method applicable thereto
US20060041767A1 (en) Methods, devices and computer program products for controlling power supplied to devices coupled to an uninterruptible power supply (UPS)
CN111475288A (en) Server and power supply protection system thereof
CN116823587A (en) Graphics processor control method and device, electronic equipment and storage medium
JP2010524071A (en) Computer program, system, and method for thresholding system power loss notification in a data processing system
WO2023029375A1 (en) Power source consumption management apparatus for four-way server
CN104699215A (en) Power supply protection system and power supply protection method
US7132822B1 (en) Multi-processor restart stabilization system and method
CN115686935A (en) Data backup method, computer device and storage medium
US20050086460A1 (en) Apparatus and method for wakeup on LAN
CN113568707B (en) Computer control method and system for ocean platform based on container technology
CN111309132B (en) Method for multi-gear power supply redundancy of server
CN116755542B (en) Whole machine power consumption reduction method, system, substrate management controller and server
CN112527570B (en) I2C communication recovery method, device, equipment and computer readable storage medium
CN111984471B (en) Cabinet power BMC redundancy management system and method
CN108459984A (en) A kind of cabinet I2C buses deadlock treatment method, system, medium and equipment
CN112433580A (en) Fan control method and device, computer equipment and storage medium
CN112732058A (en) Multi-node server power failure protection apparatus, method and readable storage medium
CN109917895B (en) Control device and control method for voltage regulation module VRM
CN111880992A (en) Monitoring and maintaining method for controller state in storage device
CN107276832B (en) Method and device for improving communication reliability of PSU and system
CN110671350A (en) Method and system for storing speed regulation of double-control fan
CN115904050A (en) Power control system and method for preventing power failure shutdown of server
CN109491867A (en) A kind of communication automatic recovery method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination