CN117331425A - Power consumption management system, power consumption management method, storage medium, and electronic device - Google Patents
- Publication number
- CN117331425A (application CN202311634162.6A)
- Authority
- CN
- China
- Prior art keywords
- power consumption
- server
- power
- consumption
- power supply
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/30—Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3293—Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3296—Power saving characterised by the action undertaken by lowering the supply or operating voltage
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Power Sources (AREA)
Abstract
Embodiments of the application provide a power consumption management system, a power consumption management method, a storage medium, and an electronic device. The system comprises: a power consumption monitoring unit, configured to monitor the input power consumption provided by N power supply units in the power supply unit corresponding to a server and the consumption power consumption of the power utilization unit in the server; a power consumption control unit, connected to the power consumption monitoring unit and configured to control the sub-consumption power consumption of the different types of components in the power utilization unit according to the input power consumption, the consumption power consumption, and a preset adjustment priority; and a power consumption protection unit, connected to the power consumption control unit and configured, when at least one of the N power supply units becomes abnormal, to record the target value of the consumption power consumption before the abnormality occurred and to compare it with the input value of the target input power consumption corresponding to all power supplies among the N power supply units other than the at least one abnormal power supply unit, so as to determine whether to reduce the consumption power consumption of the server.
Description
Technical Field
Embodiments of the application relate to the technical field of servers, and in particular to a power consumption management system, a power consumption management method, a storage medium, and an electronic device.
Background
With the rapid development of AI, big data, deep learning, and accelerated computing in recent years, industries are adopting efficient, low-cost modes of data management and computation to maximize their return. The GPU (Graphics Processing Unit) server has risen rapidly in this wave and has quickly become the most widely applied heterogeneous accelerated computing solution in the artificial intelligence field, attracting the largest investment across industries and offering the best return. A GPU server combines the parallel computing capability of the GPU with the logic control strengths of the CPU (Central Processing Unit), so it can better meet the requirements of practical application scenarios, including artificial intelligence workloads and various deep learning and inference applications. However, existing protection against power supply abnormalities in GPU servers targets only a single server and cannot support power supply protection for a whole cabinet with distributed power consumption.
In the related art, no effective solution has yet been proposed for the problem that the power consumption of a server cannot be dynamically managed while the performance of the server is guaranteed.
Disclosure of Invention
Embodiments of the application provide a power consumption management system, a power consumption management method, a storage medium, and an electronic device, which at least solve the problem in the related art that the power consumption of a server cannot be dynamically managed while the performance of the server is guaranteed.
According to one embodiment of the present application, there is provided a power consumption management system including: a power consumption monitoring unit, configured to monitor the input power consumption provided by N power supply units in the power supply unit corresponding to a server and the consumption power consumption of the power utilization unit in the server, wherein the power utilization unit includes at least a first type of component, whose consumption power consumption is monitored through a power management engine in the server, and a second type of component, whose consumption power consumption is monitored through a power limiting function module of the server, the first type of component being a component that supports system logic control of the server, the second type of component being a component with which the server performs parallel computation, and N being a positive integer; a power consumption control unit, connected to the power consumption monitoring unit and configured to control the sub-consumption power consumption of the different types of components in the power utilization unit according to the input power consumption, the consumption power consumption, and a preset adjustment priority; and a power consumption protection unit, connected to the power consumption control unit and configured, when at least one of the N power supply units becomes abnormal, to record the target value of the consumption power consumption before the abnormality occurred and to compare the target value with the input value of the target input power consumption corresponding to all power supplies among the N power supply units other than the at least one abnormal power supply unit, so as to determine whether to reduce the consumption power consumption of the server.
In one exemplary embodiment, the power consumption management system includes: a power management bus; one end of the power management bus is connected with the N power supply units, and the other end of the power management bus is connected with the power consumption monitoring unit and used for acquiring the running state information and the sub-input power consumption corresponding to each power supply in the N power supply units.
In one exemplary embodiment, the power consumption monitoring unit includes: the first monitoring subunit is connected with the power management engine through a bus and is used for determining first sub-consumption power consumption corresponding to the first type component according to engine data in the power management engine under the condition that the server is in an operation state; the second monitoring subunit is connected with the power limiting function module through a preset communication channel and is used for determining second sub-consumption power consumption according to the running condition of the second type component under the condition that the server is in a running state, wherein the second type component is a module consisting of M graphic processors, and M is a positive integer.
In an exemplary embodiment, the power consumption protection unit further includes: a load-reduction subunit, configured to send a forced load-reduction instruction to the power consumption control unit when the target value is greater than the input value of the target input power consumption corresponding to all power supplies among the N power supply units other than the at least one abnormal power supply unit, wherein the forced load-reduction instruction instructs that the consumption power consumption corresponding to the target value be adjusted to a target consumption power consumption that is less than or equal to the target input power consumption.
In an exemplary embodiment, the power consumption protection unit further includes: a computing subunit, configured to determine, once the preset adjustment priority has been obtained, the power consumption range within which the second type component is allowed to be reduced and the operation power consumption that the first type component requires to operate normally, wherein the preset adjustment priority includes at least: a first priority corresponding to the first type component and a second priority corresponding to the second type component.
In an exemplary embodiment, the power consumption protection unit further includes: and the notification subunit is used for sending first notification information carrying the operation power consumption to the power management engine and sending second notification information carrying the power consumption range to the power limiting functional module under the condition that the calculation subunit completes the calculation of the power consumption range and the operation power consumption.
In an exemplary embodiment, the power consumption control unit further includes: the first control subunit is used for controlling the reduction of the first sub-consumption power consumption corresponding to the first type of component according to the first notification information; and the second control subunit is used for controlling the reduction of the second sub-consumption power consumption corresponding to the second type of component according to the second notification information.
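The calculation performed by the computing subunit can be illustrated with a minimal Python sketch. The source does not give a formula, so everything here is an assumption made for illustration: the names `ReductionPlan` and `plan_reduction`, the idea of a per-module GPU floor, and the rule that the first type component (CPU side) keeps its full operation power while the second type component (GPU side) absorbs the reduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReductionPlan:
    cpu_operating_power_w: float  # operation power reserved for the first type component
    gpu_min_power_w: float        # lower bound of the allowed GPU power range
    gpu_max_power_w: float        # upper bound of the allowed GPU power range
    feasible: bool                # True if the plan fits within the available input


def plan_reduction(available_input_w: float,
                   cpu_required_w: float,
                   gpu_current_w: float,
                   gpu_floor_w: float) -> ReductionPlan:
    """Hypothetical budget split: reserve CPU power first, then bound the GPU."""
    # Whatever remains after reserving the CPU's operation power goes to the GPU.
    gpu_budget_w = available_input_w - cpu_required_w
    # The GPU cap never exceeds its current draw and never drops below its floor.
    gpu_max_w = min(gpu_current_w, max(gpu_budget_w, gpu_floor_w))
    feasible = cpu_required_w + gpu_max_w <= available_input_w
    return ReductionPlan(cpu_required_w, gpu_floor_w, gpu_max_w, feasible)
```

In this sketch the two values of the plan correspond to the two notifications described above: the operation power would go to the power management engine and the GPU range to the power limiting function module. When `feasible` is false, lowering power limits alone cannot fit the remaining input.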
According to another embodiment of the present application, there is provided a power consumption management method including: monitoring, through a power consumption monitoring unit, the N power supply units in the power supply unit corresponding to a server and the power utilization unit in the server, so as to obtain the input power consumption corresponding to the N power supply units and the consumption power consumption corresponding to the power utilization unit, wherein the power utilization unit includes at least a first type of component, whose consumption power consumption is monitored through a power management engine in the server, and a second type of component, whose consumption power consumption is monitored through a power limiting function module of the server, the first type of component being a component that supports system logic control of the server, the second type of component being a component with which the server performs parallel computation, and N being a positive integer; performing power consumption control on the sub-consumption power consumption of the different types of components in the power utilization unit according to the input power consumption, the consumption power consumption, and a preset adjustment priority; and, when the power consumption control has been executed and at least one power supply unit is determined to be abnormal, recording the target value of the consumption power consumption before the abnormality occurred, and comparing the target value with the input value of the target input power consumption corresponding to all power supplies among the N power supply units other than the at least one abnormal power supply unit, so as to determine whether to reduce the consumption power consumption of the server.
In an exemplary embodiment, comparing the target value with the input values of target input power consumption corresponding to all power sources except the at least one power source unit in the N power source units to determine whether to reduce the power consumption of the server includes: determining to reduce the power consumption of the server if the target value is greater than the input value; and determining not to perform reduction of the consumption power consumption of the server in the case that the target value is less than or equal to the input value.
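The comparison described above reduces to a single inequality, sketched below in Python. The function name and the representation of the remaining supplies as a list of per-supply input powers in watts are assumptions made for illustration.

```python
from typing import Iterable


def should_reduce_consumption(target_value_w: float,
                              healthy_inputs_w: Iterable[float]) -> bool:
    """Return True if the server's consumption power must be reduced.

    target_value_w:   consumption power recorded before the abnormality.
    healthy_inputs_w: input power of every supply that is still working.
    """
    # Input value: total input power of all supplies except the abnormal ones.
    input_value_w = sum(healthy_inputs_w)
    # Reduce only when the pre-fault load exceeds what remains available.
    return target_value_w > input_value_w
```

For example, a server that was drawing 2400 W from two supplies and is left with a single 1600 W supply must reduce its consumption, whereas a server that was drawing exactly 1600 W need not.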
In an exemplary embodiment, after determining to reduce the consumption power consumption of the server, the method further comprises: determining the difference between the target value and the input value; when the difference is greater than a preset threshold, determining to execute a first reduction strategy on the server, wherein the first reduction strategy instructs that a target number of graphics processors, selected from the M graphics processors of the second type component, be released from operation; and when the difference is less than or equal to the preset threshold, determining to execute a second reduction strategy on the server, wherein the second reduction strategy instructs that the number of parallel computing tasks of the second type component be reduced so as to reduce the second sub-consumption power consumption.
In an exemplary embodiment, the method further comprises: once it has been determined to reduce the consumption power consumption of the server, determining the operation information of the server after the reduction; and determining, according to the operation information, the type of information sent to a target object.
In an exemplary embodiment, determining the type of information sent to the target object according to the operation information includes: when the operation information indicates that the server is operating normally, sending a first message whose information type is prompt to a target object associated with the server, wherein the first message indicates how long the server can continue to operate while the at least one power supply unit is abnormal; and when the operation information indicates that the server is operating abnormally, sending a second message whose information type is failure to the target object associated with the server, wherein the second message indicates the reason for the abnormal operation of the server.
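The choice between the two reduction strategies can be sketched as follows. The enum and function names are hypothetical, and the sketch covers only the threshold test; how the target number of graphics processors is chosen is not specified in the source and is left out.

```python
from enum import Enum
from typing import Optional


class ReductionStrategy(Enum):
    RELEASE_GPUS = "first"        # take a target number of GPUs out of operation
    REDUCE_TASKS = "second"       # run fewer parallel computing tasks on the GPUs


def select_strategy(target_value_w: float,
                    input_value_w: float,
                    threshold_w: float) -> Optional[ReductionStrategy]:
    """Pick a reduction strategy from the gap between load and remaining input."""
    difference_w = target_value_w - input_value_w
    if difference_w <= 0:
        # The remaining supplies cover the load, so no reduction is needed.
        return None
    if difference_w > threshold_w:
        return ReductionStrategy.RELEASE_GPUS
    return ReductionStrategy.REDUCE_TASKS
```

A large gap calls for the coarser measure of releasing whole graphics processors, while a gap at or below the threshold is handled by the finer measure of trimming the task count.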
In an exemplary embodiment, the second message includes at least one of: a first type component supporting system logic control in the server fails; the second type of component in the server performing the parallel computation fails.
In an exemplary embodiment, after sending a second message with a failed information type to the target object associated with the server, the method further includes: determining the number of times of occurrence of the same second message after a preset time period; determining to initiate a maintenance task of the server under the condition that the times are larger than preset times; and under the condition that the times are smaller than or equal to the preset times, determining the second message as a fault message with reduced consumption power consumption, and recording the second message in the server.
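The follow-up handling of repeated failure messages can be sketched as a count over a time window. The log format (a list of timestamp and reason pairs), the function name, and the returned labels are assumptions made for illustration.

```python
from typing import List, Tuple


def handle_repeated_failure(failure_log: List[Tuple[float, str]],
                            reason: str,
                            window_start_s: float,
                            window_end_s: float,
                            preset_count: int) -> str:
    """Decide what to do with a failure message after the preset time period.

    failure_log holds (timestamp_seconds, reason) entries for second messages.
    """
    # Count how often the same failure reason occurred inside the window.
    occurrences = sum(
        1 for timestamp_s, logged_reason in failure_log
        if logged_reason == reason and window_start_s <= timestamp_s <= window_end_s
    )
    # More repeats than allowed: open a maintenance task; otherwise just record it.
    return "maintenance" if occurrences > preset_count else "record"
```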
In an exemplary embodiment, after the power consumption monitoring unit monitors N power supply units in the power supply units corresponding to the server and the power consumption units in the server to obtain the input power consumption corresponding to the N power supply units and the consumption power consumption corresponding to the power consumption units, the method further includes: comparing the input power consumption with a last recorded historical input power consumption in the server; determining that the power supply unit is abnormal under the condition that the input power consumption is smaller than the historical input power consumption; and determining that the power supply unit is not abnormal in the case where the input power consumption is equal to or greater than the historical input power consumption.
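The abnormality check described above compares the current reading with the last recorded one. A minimal sketch follows; the function name is hypothetical, and the optional `tolerance_w` parameter is an addition not found in the source (it defaults to zero, which reproduces the comparison exactly as described).

```python
def is_supply_abnormal(input_power_w: float,
                       last_recorded_input_w: float,
                       tolerance_w: float = 0.0) -> bool:
    """Flag the power supply unit as abnormal when its input power has dropped.

    tolerance_w is a hypothetical allowance for measurement noise; with the
    default of 0.0 any drop below the last recorded value counts as abnormal.
    """
    return input_power_w < last_recorded_input_w - tolerance_w
```

For instance, a drop from 1600 W to 800 W after one of two supplies fails is flagged, while an unchanged or higher reading is not.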
According to a further embodiment of the present application, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the present application, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the present application, the power consumption monitoring unit monitors the consumption power consumption of the first type of component, which supports system logic control of the server, and of the second type of component, with which the server performs parallel computation. With the input power consumption of the power supply unit corresponding to the server known, the input power consumption is compared with the monitored consumption power consumption to determine whether the current power supply unit meets the power demand of the server. When the power supply unit becomes abnormal, the power consumption protection unit and the power consumption control unit manage the consumption power consumption of the server so that the server keeps operating effectively. This solves the problem that the power consumption of a server cannot be dynamically managed while its performance is guaranteed: the power consumption of the different components in the server is adjusted dynamically according to changes in the power supplied by the power supply unit, which preserves the operation of the server when a fault occurs in its system.
Drawings
FIG. 1 is a block diagram of a power consumption management system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a power consumption monitoring unit according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a power consumption protection unit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a power consumption control unit according to an embodiment of the present application;
FIG. 5 is a hardware configuration block diagram of a server device for a power consumption management method according to an embodiment of the present application;
FIG. 6 is a flow chart of a power consumption management method according to an embodiment of the present application;
FIG. 7 is a flow chart of overall dynamic protection initiation according to an embodiment of the present application;
FIG. 8 is a block diagram of a computer system of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In this embodiment, a power consumption management system is provided. The system is used to implement the embodiments and preferred implementations described herein; what has already been described is not repeated. As used below, the terms "module" and "unit" refer to a combination of software and/or hardware that can implement the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 1 is a block diagram of a power consumption management system according to an embodiment of the present application. As shown in FIG. 1, the system includes:
a power consumption monitoring unit 12, configured to monitor the input power consumption provided by N power supply units in the power supply unit corresponding to a server and the consumption power consumption of the power utilization unit in the server, wherein the power utilization unit includes at least a first type of component, whose consumption power consumption is monitored through a power management engine in the server, and a second type of component, whose consumption power consumption is monitored through a power limiting function module of the server, the first type of component being a component that supports system logic control of the server, the second type of component being a component with which the server performs parallel computation, and N being a positive integer;
a power consumption control unit 14, connected to the power consumption monitoring unit and configured to control the sub-consumption power consumption of the different types of components in the power utilization unit according to the input power consumption, the consumption power consumption, and a preset adjustment priority;
and a power consumption protection unit 16, connected to the power consumption control unit and configured, when at least one of the N power supply units becomes abnormal, to record the target value of the consumption power consumption before the abnormality occurred and to compare the target value with the input value of the target input power consumption corresponding to all power supplies among the N power supply units other than the at least one abnormal power supply unit, so as to determine whether to reduce the consumption power consumption of the server.
When it is determined that the power supply unit corresponding to the server has abnormal power input, a power failure, or another power-related risk, the power consumption protection unit triggers operation protection for the server in time. That is, various protection rules can be preset in the power consumption protection unit; when the server triggers one of these rules during operation, the server as a whole is at risk of abnormal operation. The power consumption of the server then needs to be adjusted promptly by reducing the actual power consumption of the graphics processors in the server. With the stable operation of the central processing unit, and hence of the server's system as a whole, guaranteed, reducing the actual power consumption of the graphics processors brings the overall power consumption of the server within the range that the effective input power consumption of the power supply unit can support, without affecting the operation of the server. Therefore, when the power supply unit corresponding to the server is abnormal, the power consumption protection unit can trigger the power consumption control unit to adjust the overall power consumption of the server in time, which keeps the server as a whole operating effectively.
In this system, the power consumption monitoring unit monitors the consumption power consumption of the first type of component, which supports system logic control of the server, and of the second type of component, with which the server performs parallel computation. With the input power consumption of the power supply unit corresponding to the server known, the input power consumption is compared with the monitored consumption power consumption to determine whether the current power supply unit meets the power demand of the server. When the power supply unit becomes abnormal, the power consumption protection unit and the power consumption control unit manage the server so that it keeps operating effectively. This solves the problem that the power consumption of a server cannot be dynamically managed while its performance is guaranteed: the power consumption of the different components in the server is adjusted dynamically according to changes in the power supplied by the power supply unit, which preserves the operation of the server when a fault occurs in its system.
In one exemplary embodiment, the power consumption management system includes: a power management bus; one end of the power management bus is connected with the N power supply units, and the other end of the power management bus is connected with the power consumption monitoring unit and used for acquiring the running state information and the sub-input power consumption corresponding to each power supply in the N power supply units.
It can be understood that, in order to facilitate effective management of the power supplies corresponding to the server, the operation state information and the sub-input power consumption of a plurality of power supplies for supplying power to the server are obtained in real time through the corresponding power management bus, and then the power consumption provided by the power supply unit for the server can be accurately determined, the real-time states of different power supplies in the power supply unit can be monitored through the operation state information, and when at least one power supply in the power supply unit has abnormal power supply, the abnormal power supply in the power supply unit can be rapidly positioned through the operation state information, so that the influence of the abnormal power supply on the power supply of the server is reduced.
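How per-supply readings might be aggregated can be sketched in Python. The `PsuReading` record and the `summarize_supplies` helper are hypothetical; actual readings would come from the power management bus, which is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PsuReading:
    slot: int             # position of the power supply in the power supply unit
    status_ok: bool       # operation state information reported for this supply
    input_power_w: float  # sub-input power consumption of this supply, in watts


def summarize_supplies(readings: Sequence[PsuReading]) -> Tuple[float, List[int]]:
    """Return the usable input power and the slots of abnormal supplies."""
    # Only supplies that report a normal state count toward the usable input.
    usable_input_w = sum(r.input_power_w for r in readings if r.status_ok)
    # The state information also locates each abnormal supply by slot.
    abnormal_slots = [r.slot for r in readings if not r.status_ok]
    return usable_input_w, abnormal_slots
```

With three supplies of which the middle one has failed, the helper returns the combined input of the other two and the slot number of the failed one.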
It can be understood that, after determining the power consumption of different central processing units and graphics processors in the server, in order to ensure the effectiveness of adjustment, real-time information of a power supply unit supplying power to the server needs to be obtained in time, so as to determine the consumable power consumption of the power supply unit for the server, that is, determine the power supply of the server while determining the corresponding load of the server; therefore, when the subsequent power consumption is adjusted, the effective power supply of the power supply unit is ensured, and the overload operation is avoided.
As an alternative implementation, fig. 2 is a schematic structural diagram of a power consumption monitoring unit according to an embodiment of the present application. The power consumption monitoring unit is disposed in the baseboard management controller (BMC). It obtains the current state and input power consumption of the power supply unit through the PMBus connection between the BMC and the power supply unit, and monitors changes in CPU power consumption through the I2C communication line between the BMC and Intel ME power management. In addition, the BMC communicates with the GPU module installed in the server through the SMBPBI protocol to monitor changes in GPU power consumption, so that the power consumption monitoring unit knows the power consumption requirement of the system in real time. By monitoring and processing indicators of the system modules such as power consumption and complexity, the unit can accurately judge the power consumption states of the system and the GPU.
In summary, through the above manner, when monitoring different power consumption components in the server, the consumable power consumption provided by the corresponding power supply of the server is determined, so that the situation that the power consumption is insufficient for the use of the server in the adjustment process is avoided, the running safety of the server is ensured, the data loss caused by the stopping of the server is avoided, and the safety management of the corresponding power supply of the server is improved.
In one exemplary embodiment, the power consumption monitoring unit includes: the first monitoring subunit is connected with the power management engine through a bus and is used for determining first sub-consumption power consumption corresponding to the first type component according to engine data in the power management engine under the condition that the server is in an operation state; the second monitoring subunit is connected with the power limiting function module through a preset communication channel and is used for determining second sub-consumption power consumption according to the running condition of the second type component under the condition that the server is in a running state, wherein the second type component is a module consisting of M graphic processors, and M is a positive integer.
Optionally, in practical application, the first monitoring subunit is an application function provided in the baseboard management controller (BMC) installed in the server. A bus provides the physical connection between the BMC and the central processor, so the BMC can collect the first power consumption information corresponding to the central processor directly, which ensures the accuracy of that information. The second monitoring subunit is likewise an application function provided in the BMC. Because the BMC is connected to the motherboard of the server and the graphics processor is installed on that motherboard, the BMC and the graphics processor are physically connected; by configuring a communication protocol between them, the BMC can read the second power consumption information from the graphics processor through that protocol.
In summary, by the above manner, different monitoring subunits corresponding to different functions are built in the existing BMC, so that corresponding power consumption information is collected in detail before power consumption operation is performed by using different power consumption information, and the accuracy of post power consumption operation is ensured.
In an exemplary embodiment, the power consumption protection unit further includes: a load reduction subunit, configured to send a strong load reduction instruction to the power consumption control unit when the target value is greater than the input value of the target input power consumption corresponding to all power supply units other than the at least one abnormal power supply unit among the N power supply units, wherein the strong load reduction instruction instructs that the consumption power consumption corresponding to the target value be adjusted to a target consumption power consumption that is less than or equal to the target input power consumption.
Optionally, while the server is running, changes in its power consumption are tracked by recording total power consumption information. The total power consumption information accumulated by the server is recorded before the power supply unit becomes abnormal, which establishes the power consumption requirement of the server during its prior operation. When the target input power consumption supplied to the server after the anomaly cannot cover the consumption power consumption corresponding to the total power consumption information, the strong load reduction function in the power consumption control unit is started promptly to reduce the power consumption of the server, so that the system keeps running stably when the power supply unit becomes abnormal.
As an alternative implementation, fig. 3 is a schematic structural diagram of a power consumption protection unit according to an embodiment of the present application. The power consumption protection unit is also disposed in the baseboard management controller. It monitors the power input condition in real time and, through the power consumption monitoring unit, monitors the power consumption condition of the system in real time. Based on these two conditions, it performs power consumption management while the power supply unit is not abnormal. When the input from the power supply (that is, the power supply unit) becomes abnormal, the system starts protection, and dynamic power consumption management is then performed by the power consumption control unit.
It should be noted that the automatic protection unit responds to abnormal conditions of the system. When abnormal power input, a power failure, or another abnormal power consumption risk is detected, the unit immediately triggers protection and protects the GPU and the whole system by controlling both the Intel ME power management and NVIDIA Power Limit functions.
In summary, by comparing the total power consumption information with the target input power consumption input by the power supply unit in the above manner, when the target input power consumption cannot support the consumption power corresponding to the total power consumption information, the strong load reduction of the power consumption of the server is performed, so that the running stability of the whole system of the server is ensured, the safety and the running stability of the system at present are ensured, and the normal running of the system is ensured when the abnormality occurs.
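As an illustration of the load reduction subunit, the sketch below builds a strong load reduction instruction when the recorded target value exceeds the input still available from the healthy supplies. The function name and the dictionary layout of the instruction are assumptions made for this example; the source does not define a message format.

```python
def build_strong_load_reduction(target_value, remaining_input):
    """Return a hypothetical instruction dict, or None if no reduction is needed.

    target_value    -- consumption recorded before the supply became abnormal (W)
    remaining_input -- input power of all supplies except the abnormal ones (W)
    """
    if target_value <= remaining_input:
        return None  # the remaining supplies can still carry the recorded load
    return {
        "type": "strong_load_reduction",
        # the target consumption must not exceed the remaining input
        "target_consumption": remaining_input,
        "reduce_by": target_value - remaining_input,
    }


if __name__ == "__main__":
    print(build_strong_load_reduction(2800.0, 1600.0))
    print(build_strong_load_reduction(1500.0, 1600.0))
```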
In an exemplary embodiment, the power consumption protection unit further includes: a computing subunit, configured to determine, when the preset adjustment priority is obtained, a power consumption range within which the second type component is allowed to be reduced and the operating power consumption required for the first type component to operate normally, where the preset adjustment priority includes at least: a first priority corresponding to the first type component and a second priority corresponding to the second type component.
It should be noted that, when the server is a GPU server, the first type of component is a server component that controls the overall system operation of the GPU server, for example the CPU, the memory of the server, or a heat dissipation component in the server, and the second type of component is the GPU module that performs data processing or executes service tasks in the GPU server. When the input power consumption available to the server is low, the power consumption of some of the GPU units in the GPU module is therefore controlled first, so that power consumption is reduced without affecting the overall operating performance of the server. The power consumption of each GPU unit in the GPU module can be reduced according to the performance requirement of the current data processing or service task without affecting overall processing efficiency. In other words, during power consumption adjustment the normal operation of the server system is protected first, and only when the second type of component cannot be adjusted further is power consumption control applied to those first type components that have little influence on the operation of the server.
It can be understood that, because the running requirements of different types of servers are different, some servers need to adjust the power consumption corresponding to the graphics processor preferentially when the power consumption is adjusted, so as to reduce the power consumption of the graphics processor in the whole module, and some servers need to ensure the efficient running of the central processor preferentially. Therefore, the adjusted priority can be preset in the server system according to the actual operation requirement, and then the operation requirement of the corresponding equipment can be ensured after the adjustment is triggered.
In an exemplary embodiment, the power consumption protection unit further includes: and the notification subunit is used for sending first notification information carrying the operation power consumption to the power management engine and sending second notification information carrying the power consumption range to the power limiting functional module under the condition that the calculation subunit completes the calculation of the power consumption range and the operation power consumption.
Under the condition that the calculation of the power consumption adjustment corresponding to the first type of components and the second type of components is completed, first notification information carrying the operation power consumption is sent to a power management engine for managing the first type of components, second notification information carrying the power consumption range is sent to a power limiting function module for managing the second type of components, and accordingly the power management engine and the power limiting function module are instructed to dynamically adjust the overall power consumption of the server in the power consumption adjustment range corresponding to the notification information, and therefore good operation effect of the server can be guaranteed under the condition that an external power supply unit is abnormal.
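The calculation performed before these notifications are sent can be sketched as below. It is a simplified model under stated assumptions: the GPU module is reduced first, each GPU has a hypothetical minimum power `gpu_min_each`, and the first type components are reduced only when the GPUs are already at their floor. The function name and parameters are illustrative and not taken from the source.

```python
def plan_power_budget(input_budget, cpu_required, gpu_current, gpu_min_each):
    """Return (operating power for first type components, (gpu_floor, gpu_ceiling)).

    input_budget -- power available from the remaining supplies (W)
    cpu_required -- power the first type components need to run normally (W)
    gpu_current  -- list of current per-GPU power draws (W)
    gpu_min_each -- assumed lowest power limit a single GPU accepts (W)
    """
    gpu_now = sum(gpu_current)
    gpu_floor = gpu_min_each * len(gpu_current)
    # GPUs absorb the reduction first, but never below their floor.
    gpu_ceiling = min(gpu_now, max(input_budget - cpu_required, gpu_floor))
    # Only if the GPUs cannot go lower is the first type component reduced.
    cpu_power = max(0.0, min(cpu_required, input_budget - gpu_ceiling))
    return cpu_power, (gpu_floor, gpu_ceiling)


if __name__ == "__main__":
    gpus = [400.0] * 8
    print(plan_power_budget(3000.0, 800.0, gpus, 200.0))
    print(plan_power_budget(2000.0, 800.0, gpus, 200.0))
```

In this sketch the first return value would travel in the first notification information (to the power management engine) and the range in the second notification information (to the power limiting function module).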
In an exemplary embodiment, the power consumption control unit further includes: the first control subunit is used for controlling the reduction of the first sub-consumption power consumption corresponding to the first type of component according to the first notification information; and the second control subunit is used for controlling the reduction of the second sub-consumption power consumption corresponding to the second type of component according to the second notification information.
In an exemplary embodiment, the power consumption protection unit further includes: a recording subunit, configured to record the total power consumption information currently accumulated by the server when the power supply unit is abnormal.
In an exemplary embodiment, the power consumption control unit further includes: a comparing subunit, configured to determine, according to the power supply information, a first power consumption value of the target input power consumption provided by the power supply unit for the server; determine, according to the first power consumption information and the second power consumption information, a second power consumption value of the real-time consumption of the server; compare the first power consumption value with the second power consumption value; and adjust the applied power consumption of the graphics processor according to the comparison result.
As an optional implementation, when the power consumption of the graphics processor reaches the maximum power consumption output requirement, the power consumption monitoring unit calculates and collates the monitored current power consumption and outputs a current total power consumption output value. This value is transmitted to the power consumption control unit, which dynamically adjusts the power supplied to the server according to the received value to ensure stable operation of the system.
As an alternative implementation manner, fig. 4 is a schematic structural diagram of a power consumption control unit according to an embodiment of the present application, and based on power consumption information provided by the power consumption monitoring unit, the power consumption control unit of the system dynamically adjusts system power consumption by combining Intel ME power management and NVIDIA Power limit functions with power supply, so as to protect system stability.
As an alternative implementation, the method embodiment provided in the embodiment of the present application may be executed in a server device or a similar computing device. Taking the operation on the server device as an example, fig. 5 is a hardware block diagram of the server device of a power consumption management method according to an embodiment of the present application. As shown in fig. 5, the server device may include one or more (only one is shown in fig. 5) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU, a programmable logic device FPGA, or the like processing means) and a memory 104 for storing data, wherein the server device may further include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those of ordinary skill in the art that the structure shown in fig. 5 is merely illustrative and is not intended to limit the structure of the server apparatus described above. For example, the server device may also include more or fewer components than shown in fig. 5, or have a different configuration than shown in fig. 5.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to a power consumption management method in the embodiment of the present application, and the processor 102 executes the computer program stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located with respect to the processor 102, which may be connected to the server device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of a server device. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
In this embodiment, a power consumption management method is provided, and fig. 6 is a flowchart of the power consumption management method according to an embodiment of the present application, as shown in fig. 6, where the flowchart includes the following steps:
Step S602, monitoring, by the power consumption monitoring unit, the N power supply units corresponding to the server and the power consumption unit in the server, to obtain the input power consumption corresponding to the N power supply units and the consumption power consumption corresponding to the power consumption unit; wherein the power consumption unit includes at least: a first type of component whose power consumption is monitored through a power management engine in the server, and a second type of component whose power consumption is monitored through a power limiting function module of the server, the first type of component being a component that supports system logic control of the server, the second type of component being a component that executes parallel computation for the server, and N being a positive integer;
Step S604, performing power consumption control on the sub-consumption power consumption corresponding to the different types of components in the power consumption unit according to the input power consumption, the consumption power consumption, and a preset adjustment priority;
Step S606, when the power consumption control is being performed and it is determined that at least one of the N power supply units is abnormal, recording a target value of the consumption power consumption before the at least one power supply unit became abnormal, and comparing the target value with an input value of the target input power consumption corresponding to all power supply units other than the at least one abnormal power supply unit, so as to determine whether to reduce the consumption power consumption of the server.
Through the above steps, the power consumption monitoring unit monitors the consumption power consumption of the first type of component, which supports system logic control of the server, and of the second type of component, which executes parallel computation for the server. With the input power consumption of the power supply unit corresponding to the server known, the input power consumption is compared with the monitored consumption power consumption to determine whether the current power supply unit meets the power consumption requirement of the server. When the power supply unit is abnormal, the power consumption protection unit and the power consumption control unit manage the power consumption of the server so that the server keeps operating effectively. This solves the problem that the power consumption of a server cannot be managed dynamically while preserving server performance: the applied power consumption of the different components in the server is adjusted dynamically according to changes in the power provided by the power supply unit, which preserves the operation of the server during a system fault.
In an exemplary embodiment, comparing the target value with the input values of target input power consumption corresponding to all power sources except the at least one power source unit in the N power source units to determine whether to reduce the power consumption of the server includes: determining to reduce the power consumption of the server if the target value is greater than the input value; and determining not to perform reduction of the consumption power consumption of the server in the case that the target value is less than or equal to the input value.
In an exemplary embodiment, after determining to reduce the consumption power consumption of the server, the method further comprises: determining a difference between the target value and the input value; when the difference is greater than a preset threshold, determining to execute a first reduction strategy on the server, wherein the first reduction strategy indicates that a target number of graphics processors, selected from the M graphics processors corresponding to the second type component, are to be released from operation; and when the difference is less than or equal to the preset threshold, determining to execute a second reduction strategy on the server, wherein the second reduction strategy indicates that the number of parallel computing tasks corresponding to the second type component is to be reduced so as to reduce the second sub-consumption power consumption.
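The comparison and the choice between the two reduction strategies can be sketched as a single decision function. The enum names and the `choose_reduction` helper are hypothetical; the threshold value is whatever the operator presets.

```python
from enum import Enum


class Reduction(Enum):
    NONE = "no reduction"
    RELEASE_GPUS = "first strategy: release a target number of GPUs"
    FEWER_TASKS = "second strategy: reduce parallel computing tasks"


def choose_reduction(target_value, input_value, threshold):
    """Pick a reduction strategy from the recorded target value and the input value."""
    if target_value <= input_value:
        return Reduction.NONE          # remaining input covers the load
    difference = target_value - input_value
    if difference > threshold:
        return Reduction.RELEASE_GPUS  # large shortfall
    return Reduction.FEWER_TASKS       # small shortfall


if __name__ == "__main__":
    print(choose_reduction(2800.0, 1600.0, 500.0))
    print(choose_reduction(1800.0, 1600.0, 500.0))
    print(choose_reduction(1500.0, 1600.0, 500.0))
```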
It can be understood that, when the GPU power consumption in the server reaches the maximum power consumption output requirement, the power consumption monitoring unit calculates and collates the monitored current power consumption and outputs the current total power consumption output value, which is then transmitted to the power consumption control unit. On receiving this value, the control unit dynamically adjusts the power supply through the Intel ME power management and NVIDIA power limiting functions to ensure stable operation of the system. The dynamic power consumption protection unit of the system then starts detection. When it detects that the system power consumption is abnormal, the protection unit dynamically adjusts the system power consumption through the power consumption control unit, which implements the protection mechanism of the system. This ensures the safety and stability of the current operation of the system and keeps the system running normally when an abnormality occurs.
In summary, through the above manner, in the process of adjusting the corresponding power consumption of the server, the input power consumption of the corresponding power supply unit of the server and the consumption power consumption of the whole system operation of the server are balanced, so that in the process of dynamically adjusting, the adjusted power consumption of the server can be ensured to meet the requirement of stable operation, data cannot be lost under the fault condition, and in the condition that the input power consumption of the power supply unit is overloaded, the overload operation of the GPU cannot be caused, and the operation safety of the GPU is ensured.
The main execution body of the above steps may be a server, a terminal, or the like, but is not limited thereto.
To facilitate an understanding of the embodiments of the present application, relevant scenarios will now be explained, but this application is not limited thereto.
With the advent of computationally intensive tasks such as artificial intelligence (AI) and deep learning, graphics processors (GPUs) have been widely used as one of the main tools for high performance computing. However, high performance brings high power consumption, which presents new challenges for the stability and reliability of hardware. In particular, in large data centers and AI training scenarios, power management of GPU servers becomes critical.
Traditionally, many GPU servers use fixed power limits or settings to control power consumption. Although the static method can ensure the stable operation of hardware, the static method cannot effectively adapt to the change of different operation loads. In addition, static constraints can lead to wasted resources and reduced performance, as AI and deep learning workloads are often characterized by uncertainty.
In the marketplace, the need for GPU servers that enable dynamic power management is increasing. These demands come from a variety of fields, including large data centers, cloud computing providers, scientific research institutions, and businesses that require high performance computing. In these applications, the efficiency and stability of the GPU server is directly related to cost savings and task completion time.
Against this background, the application provides a dynamic power consumption management and protection architecture in which the BMC manages the Intel ME power capping and NVIDIA Power Limit functions. It aims to overcome the defects of existing methods and provide a more effective power management solution for GPU servers, thereby improving their application value on the market.
As an alternative embodiment, the above power consumption management method may be applied in the following example architecture.
Optionally, the example architecture combines the baseboard management controller of the server with the Intel ME power management and NVIDIA power limiting functions. Using the dynamic adjustment function of the baseboard management controller, it determines the relationship between changes in input power and applied power in the server, and dynamically controls the Intel ME power management and NVIDIA power limiting functions to manage the power consumption of the server.
Optionally, the architecture comprises a power consumption monitoring unit, a system power consumption control unit, and a system dynamic power consumption protection unit.
The power consumption monitoring unit continuously monitors changes in the power consumption of the CPU and the GPU in the server so as to know the power consumption requirement of the server system in real time. By monitoring and processing indicators such as power consumption and complexity of the system modules corresponding to the different servers in the whole cabinet, it accurately judges the power consumption states of the server system and of the GPU in the server.
It should be noted that the above power consumption monitoring unit is implemented mainly by the baseboard management controller (BMC) installed in the server.
Optionally, the system power consumption control unit determines the power consumption of the running server from the power consumption information provided by the power consumption monitoring unit, and adjusts the current power consumption of the server through the Intel ME power management function and the NVIDIA power limiting function. During adjustment it takes the power supply into account: by determining the input power consumption of the power supply unit of the server and the overall power consumption of the server, it dynamically adjusts the amount of power consumed by the system in the server so as to protect the stable operation of that system.
Optionally, the system dynamic power consumption protection unit automatically protects the system under abnormal conditions. When abnormal power input, a power failure, or another abnormal power consumption risk is detected, the unit immediately triggers protection and protects the GPU and the whole system by controlling both the Intel ME power management and NVIDIA power limiting functions.
In an exemplary embodiment, the method further comprises: determining operation information of the server after reduction under the condition that the consumption power consumption of the server is reduced is determined; and determining the type of information sent to the target object according to the running information.
That is, after determining that the power consumption of the server is reduced, the running information of the adjusted server may be estimated, and the running information may be sent to the target object for managing the server, so that the target object knows the adjustment result in advance, and when sending the information, the type of information sent to the target object may be determined according to the adjustment result, for example, the first type of information is adjusted for the first type of component, and the second type of information is adjusted for the second type of component.
In an exemplary embodiment, determining the type of information sent to the target object according to the running information includes: when the operation information indicates that a server is in normal operation, a first message with a prompt information type is sent to a target object associated with the server, wherein the first message is used for indicating duration of continuous operation of the server when at least one power supply unit is abnormal; and under the condition that the running information indicates that the server is in abnormal running, sending a second message with the information type of failure to a target object associated with the server, wherein the second message is used for indicating the reason of the abnormal running of the server.
In an exemplary embodiment, the second message includes at least one of: a first type component supporting system logic control in the server fails; the second type of component in the server performing the parallel computation fails.
In an exemplary embodiment, after sending a second message with a failed information type to the target object associated with the server, the method further includes: determining the number of times of occurrence of the same second message after a preset time period; determining to initiate a maintenance task of the server under the condition that the times are larger than preset times; and under the condition that the times are smaller than or equal to the preset times, determining the second message as a fault message with reduced consumption power consumption, and recording the second message in the server.
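The handling of repeated fault messages can be sketched as a counting step over the second messages seen in the preset time period. The message strings and the `triage_fault_messages` helper are hypothetical; how messages are collected and timed is outside this sketch.

```python
from collections import Counter


def triage_fault_messages(messages, preset_count):
    """Map each distinct fault message to 'maintenance' or 'record'.

    messages     -- second messages observed during the preset time period
    preset_count -- preset number of occurrences; strictly more triggers maintenance
    """
    counts = Counter(messages)
    return {
        msg: ("maintenance" if n > preset_count else "record")
        for msg, n in counts.items()
    }


if __name__ == "__main__":
    seen = ["second type component failed"] * 3 + ["first type component failed"]
    print(triage_fault_messages(seen, preset_count=2))
```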
In an exemplary embodiment, after the power consumption monitoring unit monitors N power supply units in the power supply units corresponding to the server and the power consumption units in the server to obtain the input power consumption corresponding to the N power supply units and the consumption power consumption corresponding to the power consumption units, the method further includes: comparing the input power consumption with a last recorded historical input power consumption in the server; determining that the power supply unit is abnormal under the condition that the input power consumption is smaller than the historical input power consumption; and determining that the power supply unit is not abnormal in the case where the input power consumption is equal to or greater than the historical input power consumption.
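The anomaly check above reduces to a comparison against the last recorded input. A minimal sketch follows; the optional `tolerance` parameter is an assumption added for illustration (real readings fluctuate) and is not part of the described method, so it defaults to zero.

```python
def psu_is_abnormal(input_power, historical_input_power, tolerance=0.0):
    """Return True when the input power has dropped below the last recorded value.

    tolerance -- hypothetical allowance in watts; 0.0 reproduces the rule as stated
    """
    return input_power < historical_input_power - tolerance


if __name__ == "__main__":
    print(psu_is_abnormal(1600.0, 3200.0))  # input dropped: abnormal
    print(psu_is_abnormal(3200.0, 3200.0))  # unchanged: not abnormal
```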
As an optional implementation, once the architecture is in place, the dynamic management function of the server can be executed. With this function enabled, dynamic power consumption management and protection of the GPU server begins after the system of the server starts. The monitoring management unit (corresponding to the power consumption monitoring unit in the above embodiment) uses the system management chip, i.e. the baseboard management controller (BMC), to monitor the power input state in real time and to acquire the current state and input power consumption of each power supply over PMBus; it also monitors the CPU/memory power consumption state through the Intel ME management function. When an input power supply becomes abnormal or is damaged, the real-time monitoring management unit detects the fault and the system dynamic power consumption protection unit (equivalent to the power consumption protection unit in the above embodiment) is started in emergency mode: the current power consumption information is recorded and an emergency load reduction is performed, so that when one power supply fails the system can continue to work normally on another power supply. The system power consumption control unit is then started and dynamically adjusts the system power consumption value by comparing it with the power consumption value recorded before the most recent power supply abnormality. Adjustment follows a priority order: the GPU Power Limit is adjusted first; the dynamic power consumption protection unit then performs its internal logic calculation to determine the amount of power consumption that Intel ME needs to make up; after the calculation, the BMC notifies the ME to perform power consumption management. This process is executed in a loop, with continuous monitoring and dynamic adjustment, so that dynamic power consumption management of the GPU server is realized and the system can maintain a degree of optimized efficiency in a fault scenario.
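The adjustment order described above (GPU power limit first, with the remainder handed to Intel ME) can be illustrated with the following minimal Python sketch. All names, the per-domain minimum values and the way the deficit is split are assumptions made for illustration; the application does not disclose the internal logic calculation itself.

```python
from dataclasses import dataclass


@dataclass
class PowerPlan:
    gpu_limit_w: float   # new total GPU power limit
    me_limit_w: float    # new CPU/memory cap to be passed to the ME by the BMC
    shortfall_w: float   # deficit that cannot be covered (0 if the plan fits)


def plan_power_reduction(available_w: float, gpu_w: float, cpu_mem_w: float,
                         gpu_min_w: float, cpu_mem_min_w: float) -> PowerPlan:
    """Split a power deficit between GPU and CPU/memory, GPU first (hypothetical)."""
    # Deficit between recorded consumption and what the remaining supplies deliver.
    deficit = max(0.0, gpu_w + cpu_mem_w - available_w)
    # Step 1: lower the GPU power limit, but not below its assumed minimum.
    gpu_cut = min(deficit, max(0.0, gpu_w - gpu_min_w))
    deficit -= gpu_cut
    # Step 2: whatever remains is requested from the ME-managed CPU/memory domain.
    me_cut = min(deficit, max(0.0, cpu_mem_w - cpu_mem_min_w))
    deficit -= me_cut
    return PowerPlan(gpu_w - gpu_cut, cpu_mem_w - me_cut, deficit)


if __name__ == "__main__":
    print(plan_power_reduction(2000.0, 2400.0, 600.0, 1200.0, 400.0))
```

In this sketch a non-zero `shortfall_w` signals that the two limits alone cannot bring consumption under the available input power; how such a case is handled is outside what the example assumes.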
As an alternative implementation, Fig. 7 is a flowchart of the overall dynamic protection start-up according to an embodiment of the present application, including the following steps:
Step S702: while the system of the server is operating normally, continuously collect the monitored state and input of all PSUs;
Step S704: when the system of the server is operating abnormally, record the current total power consumption and start the dynamic power management and protection unit;
Step S706: start the CPU/GPU forced load reduction function;
Step S708: start the dynamic power calculation for the first time;
Step S710: start the power consumption control unit;
Step S712: perform the power consumption comparison calculation;
Step S714: determine that the system is operating normally when the last recorded total power is less than or equal to the current power supply power consumption;
Step S716: determine that the system is operating abnormally when the last recorded total power is greater than the current power supply power consumption, and start the dynamic power calculation a second time;
Step S718: start ME power management and GPU power management according to the result of the dynamic power calculation;
Step S720: after management is completed, prompt that the system is operating normally and wait for maintenance personnel to repair the power supply.
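For illustration only, the branching of steps S702 to S720 can be traced with the short Python sketch below. The function and parameter names are hypothetical, and the sketch assumes that the normal branch (S714) also ends at the prompt of step S720, which the step list does not state explicitly.

```python
from typing import List


def protection_flow(psu_ok: List[bool],
                    last_recorded_total_w: float,
                    current_supply_w: float) -> List[str]:
    """Return the step labels visited in one pass of the flow (illustrative)."""
    steps = ["S702"]                       # collect PSU state and input
    if all(psu_ok):
        return steps                       # normal operation: keep monitoring
    # A PSU is abnormal: record, throttle, calculate, start control, compare.
    steps += ["S704", "S706", "S708", "S710", "S712"]
    if last_recorded_total_w <= current_supply_w:
        steps.append("S714")               # remaining supply covers the load
    else:
        steps += ["S716", "S718"]          # recalculate, then ME/GPU management
    steps.append("S720")                   # prompt and wait for repair (assumed)
    return steps


if __name__ == "__main__":
    print(protection_flow([True, False], 3000.0, 1600.0))
```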
In summary, the present application designs a dynamic power consumption management and protection architecture for artificial intelligence GPU servers based on an Intel server platform, thereby providing a fault protection mechanism for current GPU servers. The architecture meets the requirement for a power protection mechanism when a data center PDU is damaged and keeps the whole-cabinet system in the machine room running without interruption. Through the Intel ME power management and NVIDIA Power Limit functions, efficient power management and protection are realized and more intelligent power management is provided; when the whole system experiences a fault, the power consumption supporting efficiency is ensured, hardware damage is prevented, and system reliability is improved.
Optionally, the technical scheme of the application has the following advantages: by combining the two functions of Intel ME power management and NVIDIA Power Limit, a complete set of dynamic power protection and management functions dedicated to GPU server products is defined. This overcomes the shortcomings of existing power management protection as implemented in GPU server products and provides a more effective power management solution for GPU servers, thereby improving the application value of GPU servers in the market. In addition, the implementation scheme strengthens the overall dynamic power management mode in the server and optimizes the power management function of the system.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.
It should be noted that each of the above modules may be implemented by software or hardware; for the latter, this may be achieved in, but is not limited to, the following manners: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing a computer program.
Embodiments of the present application also provide an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Fig. 8 schematically shows a block diagram of a computer system for implementing an electronic device according to an embodiment of the present application. As shown in fig. 8, the computer system 800 includes a central processing unit 801 (Central Processing Unit, CPU) which can execute various appropriate actions and processes according to a program stored in a Read-Only Memory 802 (ROM) or a program loaded from a storage section 808 into a random access Memory 803 (Random Access Memory, RAM). In the random access memory 803, various programs and data required for system operation are also stored. The central processing unit 801, the read only memory 802, and the random access memory 803 are connected to each other through a bus 804. An Input/Output interface 805 (i.e., an I/O interface) is also connected to the bus 804.
The following components are connected to the input/output interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and the like, and a speaker, and the like; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a local area network card, modem, or the like. The communication section 809 performs communication processing via a network such as the internet. The drive 810 is also connected to the input/output interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is mounted into the storage section 808 as needed.
In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor, and an input/output device connected to the processor.
For specific examples of this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementations, which are not repeated here.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principles of the present application should be included in the protection scope of the present application.
Claims (17)
1. A power consumption management system, applied to a server, comprising:
a power consumption monitoring unit, configured to monitor the input power consumption provided by N power supply units among the power supply units corresponding to the server and the consumption power consumption corresponding to the power utilization unit in the server, wherein the power utilization unit at least comprises: a first type of component whose power consumption is monitored through a power management engine in the server, and a second type of component whose power consumption is monitored through a power limiting function module of the server, wherein the first type of component is a component supporting the server in performing system logic control, the second type of component is a component with which the server executes parallel computation, and N is a positive integer;
a power consumption control unit, connected with the power consumption monitoring unit and configured to perform power consumption control on the sub-consumption power consumption corresponding to different types of components in the power utilization unit according to the input power consumption, the consumption power consumption and a preset adjustment priority;
and a power consumption protection unit, connected with the power consumption control unit and configured to, when at least one of the N power supply units is abnormal, record a target value of the consumption power consumption before the at least one power supply unit became abnormal, and compare the target value with the input value of the target input power consumption corresponding to all power supply units among the N power supply units other than the at least one power supply unit, so as to determine whether to reduce the consumption power consumption of the server.
2. The power consumption management system of claim 1, wherein the power consumption management system comprises: a power management bus;
one end of the power management bus is connected with the N power supply units, and the other end of the power management bus is connected with the power consumption monitoring unit and used for acquiring the running state information and the sub-input power consumption corresponding to each power supply in the N power supply units.
3. The power consumption management system according to claim 1, wherein the power consumption monitoring unit includes:
the first monitoring subunit is connected with the power management engine through a bus and is used for determining first sub-consumption power consumption corresponding to the first type component according to engine data in the power management engine under the condition that the server is in an operation state;
the second monitoring subunit is connected with the power limiting function module through a preset communication channel and is used for determining second sub-consumption power consumption according to the running condition of the second type component under the condition that the server is in a running state, wherein the second type component is a module consisting of M graphic processors, and M is a positive integer.
4. The power consumption management system according to claim 1, wherein the power consumption protection unit further comprises:
a load reduction subunit, configured to send a forced load reduction instruction to the power consumption control unit when the target value is greater than the input value of the target input power consumption corresponding to all power supply units among the N power supply units other than the at least one power supply unit, wherein the forced load reduction instruction is used to instruct that the consumption power consumption corresponding to the target value be adjusted to a target consumption power consumption that is less than or equal to the target input power consumption.
5. The power consumption management system according to claim 1, wherein the power consumption protection unit further comprises:
a computing subunit, configured to determine, when the preset adjustment priority is obtained, a power consumption range by which the second type component is allowed to be reduced and an operation power consumption required for the first type component to operate normally, wherein the preset adjustment priority at least includes: a first priority corresponding to the first type component and a second priority corresponding to the second type component.
6. The power consumption management system according to claim 5, wherein the power consumption protection unit further comprises:
and the notification subunit is used for sending first notification information carrying the operation power consumption to the power management engine and sending second notification information carrying the power consumption range to the power limiting functional module under the condition that the calculation subunit completes the calculation of the power consumption range and the operation power consumption.
7. The power consumption management system according to claim 6, wherein the power consumption control unit further comprises:
the first control subunit is used for controlling the reduction of the first sub-consumption power consumption corresponding to the first type of component according to the first notification information;
and the second control subunit is used for controlling the reduction of the second sub-consumption power consumption corresponding to the second type of component according to the second notification information.
8. A power consumption management method, characterized by being applied to the server of the power consumption management system according to any one of claims 1 to 7, comprising:
monitoring, through the power consumption monitoring unit, N power supply units among the power supply units corresponding to the server and the power utilization unit in the server, to obtain the input power consumption corresponding to the N power supply units and the consumption power consumption corresponding to the power utilization unit; wherein the power utilization unit at least comprises: a first type of component whose power consumption is monitored through a power management engine in the server, and a second type of component whose power consumption is monitored through a power limiting function module of the server, wherein the first type of component is a component supporting the server in performing system logic control, the second type of component is a component with which the server executes parallel computation, and N is a positive integer;
performing power consumption control on the sub-consumption power consumption corresponding to different types of components in the power utilization unit according to the input power consumption, the consumption power consumption and a preset adjustment priority;
and, when the power consumption control is executed and at least one power supply unit is determined to be abnormal, recording a target value of the consumption power consumption before the at least one power supply unit became abnormal, and comparing the target value with the input value of the target input power consumption corresponding to all power supply units among the N power supply units other than the at least one power supply unit, to determine whether to reduce the consumption power consumption of the server.
9. The power consumption management method according to claim 8, wherein comparing the target value with the input value of target input power consumption corresponding to all power sources except the at least one power source unit of the N power source units to determine whether to perform reduction of consumption power consumption of the server, comprises:
determining to reduce the power consumption of the server if the target value is greater than the input value;
and determining not to perform reduction of the consumption power consumption of the server in the case that the target value is less than or equal to the input value.
10. The power consumption management method according to claim 9, wherein after determining to reduce the power consumption of the server, the method further comprises:
determining a difference between the target value and the input value;
determining to execute a first reduction strategy on the server when the difference is greater than a preset threshold, wherein the first reduction strategy is used to instruct that a target number of graphics processors, selected from the M graphics processors corresponding to the second type component, be placed in a state of being released from operation;
and determining to execute a second reduction strategy on the server when the difference is less than or equal to the preset threshold, wherein the second reduction strategy is used to instruct that the number of parallel computing tasks corresponding to the second type component be reduced so as to reduce the second sub-consumption power consumption.
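By way of a non-limiting sketch, the selection between the two reduction strategies of claims 9 and 10 can be expressed in Python as follows. The function name, the returned labels and the use of watts are assumptions introduced only for this illustration.

```python
def choose_reduction_strategy(target_value_w: float,
                              input_value_w: float,
                              threshold_w: float) -> str:
    """Pick a reduction strategy from the recorded and available power (illustrative)."""
    difference = target_value_w - input_value_w
    if difference <= 0:
        # Target value not greater than input value: no reduction is performed.
        return "no_reduction"
    if difference > threshold_w:
        # Large gap: first strategy, release a target number of GPUs from operation.
        return "release_gpus"
    # Small gap: second strategy, reduce the number of parallel computing tasks.
    return "reduce_parallel_tasks"


if __name__ == "__main__":
    print(choose_reduction_strategy(3000.0, 1600.0, 500.0))
```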
11. The power consumption management method according to claim 8, wherein the method further comprises:
when it is determined to reduce the consumption power consumption of the server, determining operation information of the server after the reduction;
and determining the type of information sent to the target object according to the running information.
12. The power consumption management method according to claim 11, wherein determining the type of information transmitted to the target object based on the operation information comprises:
when the operation information indicates that the server is operating normally, sending a first message whose information type is prompt to a target object associated with the server, wherein the first message is used to indicate the duration of continuous operation of the server while the at least one power supply unit is abnormal;
and under the condition that the running information indicates that the server is in abnormal running, sending a second message with the information type of failure to a target object associated with the server, wherein the second message is used for indicating the reason of the abnormal running of the server.
13. The power consumption management method of claim 12, wherein the second message comprises at least one of:
a first type component supporting system logic control in the server fails;
the second type of component in the server performing the parallel computation fails.
14. The power consumption management method according to claim 13, wherein after sending the second message of which information type is a failure to the target object associated with the server, the method further comprises:
determining the number of times the same second message occurs after a preset time period;
determining to initiate a maintenance task for the server when the number of times is greater than a preset number of times;
and when the number of times is less than or equal to the preset number of times, determining the second message to be a fault message associated with the reduced consumption power consumption, and recording the second message in the server.
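As an illustrative sketch only, the counting of repeated second messages in claim 14 could look like the following Python function. The message strings, the returned action labels and the function name are hypothetical and are not part of the claimed method.

```python
from collections import Counter
from typing import Dict, Iterable


def triage_fault_messages(messages: Iterable[str], preset_count: int) -> Dict[str, str]:
    """Map each distinct fault message seen in the period to an action (illustrative)."""
    counts = Counter(messages)
    return {
        # More occurrences than the preset number: start a maintenance task;
        # otherwise only record the message in the server.
        msg: ("start_maintenance" if n > preset_count else "record_only")
        for msg, n in counts.items()
    }


if __name__ == "__main__":
    seen = ["second_type_component_failed"] * 4 + ["first_type_component_failed"]
    print(triage_fault_messages(seen, preset_count=3))
```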
15. The power consumption management method according to claim 8, wherein after the power consumption monitoring unit monitors N power supply units in the power supply units corresponding to the server and the power consumption units in the server to obtain the input power consumption corresponding to the N power supply units and the consumption power consumption corresponding to the power consumption units, the method further comprises:
comparing the input power consumption with a last recorded historical input power consumption in the server;
determining that the power supply unit is abnormal under the condition that the input power consumption is smaller than the historical input power consumption;
and determining that the power supply unit is not abnormal in the case where the input power consumption is equal to or greater than the historical input power consumption.
16. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 8 to 15.
17. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method as claimed in any one of claims 8 to 15 when the computer program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311634162.6A CN117331425B (en) | 2023-12-01 | 2023-12-01 | Power consumption management system, power consumption management method, storage medium, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311634162.6A CN117331425B (en) | 2023-12-01 | 2023-12-01 | Power consumption management system, power consumption management method, storage medium, and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117331425A true CN117331425A (en) | 2024-01-02 |
CN117331425B CN117331425B (en) | 2024-03-22 |
Family
ID=89279694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311634162.6A Active CN117331425B (en) | 2023-12-01 | 2023-12-01 | Power consumption management system, power consumption management method, storage medium, and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117331425B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117666750A (en) * | 2024-01-31 | 2024-03-08 | 苏州元脑智能科技有限公司 | Power supply energy consumption adjusting method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116823587A (en) * | 2023-06-29 | 2023-09-29 | 苏州浪潮智能科技有限公司 | Graphics processor control method and device, electronic equipment and storage medium |
CN116991221A (en) * | 2023-05-19 | 2023-11-03 | 苏州浪潮智能科技有限公司 | Power consumption adjusting method and device |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116991221A (en) * | 2023-05-19 | 2023-11-03 | 苏州浪潮智能科技有限公司 | Power consumption adjusting method and device |
CN116823587A (en) * | 2023-06-29 | 2023-09-29 | 苏州浪潮智能科技有限公司 | Graphics processor control method and device, electronic equipment and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117666750A (en) * | 2024-01-31 | 2024-03-08 | 苏州元脑智能科技有限公司 | Power supply energy consumption adjusting method and device |
CN117666750B (en) * | 2024-01-31 | 2024-04-30 | 苏州元脑智能科技有限公司 | Power supply energy consumption adjusting method and device |
Also Published As
Publication number | Publication date |
---|---|
CN117331425B (en) | 2024-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117331425B (en) | Power consumption management system, power consumption management method, storage medium, and electronic device | |
US8473768B2 (en) | Power control apparatus and method for cluster system | |
US20160378570A1 (en) | Techniques for Offloading Computational Tasks between Nodes | |
US10466729B2 (en) | Power supply system, power management, apparatus, power management method, and power management program | |
US20080201595A1 (en) | Intelligent power control | |
US9037878B2 (en) | Server rack system | |
US11733762B2 (en) | Method to allow for higher usable power capacity in a redundant power configuration | |
CN114530904A (en) | Control method of uninterruptible power supply | |
CN115686935A (en) | Data backup method, computer device and storage medium | |
CN109639490B (en) | Downtime notification method and device | |
CN116557322A (en) | Fan control method and device | |
CN113360344B (en) | Server monitoring method, device, equipment and computer readable storage medium | |
US8046602B2 (en) | Controlling connection status of network adapters | |
CN109491867A (en) | A kind of communication automatic recovery method and device | |
CN116991221A (en) | Power consumption adjusting method and device | |
WO2023056851A1 (en) | Voltage monitoring method and apparatus, electronic device and storage medium | |
CN115912556A (en) | Battery charging method and device, electronic equipment and storage medium | |
CN115129565A (en) | Log data processing method, device, system, equipment and medium | |
US20200394081A1 (en) | Leveraging reserved data center resources to improve data center utilization | |
CN112230755A (en) | Power management method, device, equipment and machine-readable storage medium | |
CN115857641B (en) | Control method, device and equipment for fan rotor and storage medium | |
Mukherjee et al. | AMAS: Adaptive auto-scaling on the edge | |
CN118245269B (en) | PCI equipment fault processing method and device and fault processing system | |
CN117033084B (en) | Virtual machine backup method and device, electronic equipment and storage medium | |
CN117220653B (en) | Solid-state switch control method, solid-state switch system, control unit and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |