CN115373507A - Whole machine resource balance management method and system based on electric energy loss - Google Patents

Whole machine resource balance management method and system based on electric energy loss

Info

Publication number
CN115373507A
Authority
CN
China
Prior art keywords: gpu, resource, gpu hardware, value, hardware
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211316675.8A
Other languages
Chinese (zh)
Other versions
CN115373507B (en
Inventor
耿春胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Pinli Technology Co ltd
Original Assignee
Beijing Pinli Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Pinli Technology Co ltd
Priority to CN202211316675.8A
Publication of CN115373507A
Application granted
Publication of CN115373507B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3287 Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Power Sources (AREA)

Abstract

A complete machine resource balance management method and system based on electric energy loss, relating to the technical field of computer electric energy management. The method comprises: acquiring a GPU resource consumption curve of each task and determining a GPU resource consumption upper limit value of each task; determining a prepared resource value of each task as a preset proportion of that task's GPU resource consumption upper limit value, and determining a GPU resource theoretical total occupation value of each task by combining the prepared resource value with the GPU resource consumption upper limit value; determining a total required GPU resource value from the GPU resource theoretical total occupation values of all tasks; determining the GPU hardware that needs to be started according to a preset GPU hardware resource table; migrating all tasks to the GPU hardware that needs to be started; and finally shutting down the GPU hardware on which no task is running, so that the power consumption of the GPU cluster host is reduced as far as possible.

Description

Whole machine resource balance management method and system based on electric energy loss
Technical Field
The invention relates to the technical field of computer electric energy management, in particular to a complete machine resource balance management method and system based on electric energy loss.
Background
Because the computing power that a single computer can provide is limited, a computing task with a large computing requirement is usually processed on a cluster, that is, a supercomputer formed by interconnecting a plurality of computers through a high-speed network. The GPU is also a kind of computing resource, and computing operations related to artificial intelligence and machine learning are currently generally performed on GPUs. Each node of a cluster configured with GPU resources usually has multiple GPU graphics cards installed, for example 8 or more, so the total number of GPU graphics cards in the cluster is very large.
A GPU cluster is a computer cluster in which each node is equipped with a graphics processing unit. By utilizing the computational power of modern GPUs through general-purpose computation on the graphics processing unit, a GPU cluster can perform very fast computations.
When the computer cluster is in use, most GPUs keep running at a low frequency even when the task load is very low, which wastes part of the performance resources.
Disclosure of Invention
The invention aims to provide a method and a system capable of configuring the electric energy consumption of a GPU cluster at the whole-machine level.
The invention discloses a complete machine resource balance management method based on electric energy loss, which comprises the following steps:
acquiring a GPU resource consumption curve of each task, determining a GPU resource consumption upper limit value of each task, determining a prepared resource value of each task based on a preset proportion of the GPU resource consumption upper limit value of each task, and determining a GPU resource theoretical total occupation value of each task in combination with the GPU resource consumption upper limit value;
and determining a total GPU resource value according to the GPU resource theoretical total occupation value of each task, determining GPU hardware needing to be started according to a preset GPU hardware resource table, migrating all tasks to the GPU hardware needing to be started, and finally closing the GPU hardware without task operation.
In some embodiments of the present application, in order to determine the GPU hardware that needs to be started, contents of the GPU hardware resource table are disclosed, and the contents of the GPU hardware resource table include:
a plurality of GPU hardware, with position information, a performance resource value, and a power consumption value set correspondingly for each GPU hardware.
In some embodiments of the present application, in order to avoid overload of GPU hardware due to a newly injected task, the overall resource balancing management method is improved, and the overall resource balancing management method further includes:
and determining and starting prepared GPU hardware according to the total needed GPU resource value and the GPU hardware resource table, wherein the prepared GPU hardware is used for preparing to run the newly added task.
In some embodiments of the present application, in order to avoid overload of the GPU hardware caused by a newly injected task, the method for determining the preparation GPU hardware includes:
determining a total required prepared resource value according to a preset proportion of the total required GPU resource value, and determining prepared GPU hardware needing to be called according to the GPU hardware resource table, wherein the performance resource value of the prepared GPU hardware is larger than or equal to the total required prepared resource value.
In some embodiments of the present application, a method for presetting the GPU hardware resource table is disclosed, and the method for presetting the GPU hardware resource table includes:
after the position information of the GPU hardware is obtained, sending a virtual task to the GPU hardware so as to enable the GPU hardware to run at full load;
evaluating the execution effect of the virtual task based on the GPU hardware to determine a performance resource value of the GPU hardware;
and when the GPU hardware runs at full load, acquiring the power consumption of the GPU hardware so as to determine the power consumption value of the GPU hardware.
In some embodiments of the present application, in order to determine the GPU hardware that needs to be booted, a method for applying the GPU hardware resource table is disclosed, and the method for applying the GPU hardware resource table includes:
and determining a plurality of GPU hardware as the GPU hardware needing to be started according to the total needed GPU resource value and the GPU hardware resource table, so that the total performance resource value of the GPU hardware needing to be started is larger than the total needed GPU resource value, and the total energy consumption value of the GPU hardware is the lowest.
In some embodiments of the present application, a method for applying the GPU hardware resource table is further disclosed, and the method for applying the GPU hardware resource table further includes:
establishing a temporary GPU hardware call table, successively filling position information of GPU hardware in the temporary GPU hardware call table according to the total needed GPU resource value, and successively calculating the total performance resource value of the GPU hardware in the temporary GPU hardware call table;
and if the total performance resource value of the GPU hardware in the temporary GPU hardware call list is larger than or equal to the total needed GPU resource value, determining the GPU hardware corresponding to the position information of the GPU hardware recorded in the temporary GPU hardware call list as the GPU hardware needing to be started.
In some embodiments of the present application, in order to minimize the total energy consumption value of the GPU hardware that needs to be started, the method for applying the GPU hardware resource table further includes:
denoting the performance resource value of the GPU hardware as a, the energy consumption value as b, and the performance energy consumption ratio of the GPU hardware as h, and calculating the performance energy consumption ratio as h = a/b;
and when the temporary GPU hardware call table is established, calling GPU hardware in descending order of the performance energy consumption ratio h.
According to the overall resource balance management method based on the electric energy loss, the actually needed GPU resources are determined by establishing the relation between tasks and GPU resource consumption, GPU hardware needing to be started is determined according to the actually needed GPU resources, all tasks are migrated to the GPU hardware needing to be started, and then the GPU hardware without task execution is closed, so that the power consumption of a GPU cluster host is reduced to the maximum extent.
In some embodiments of the present application, a system for balancing and managing overall resources based on power consumption is disclosed, the system comprising:
the resource occupation condition acquisition unit, which is used for acquiring the GPU performance resource occupancy rate of each task;
the analysis unit, which is used for: generating a GPU resource consumption curve according to the GPU performance resource occupancy rate; analyzing and determining the GPU resource consumption upper limit value of the task in a preset time period; determining the prepared resource value of the task as a preset proportion of the GPU resource consumption upper limit value; determining the GPU resource theoretical total occupation value of the task by combining the prepared resource value with the GPU resource consumption upper limit value; determining the total needed GPU resource value according to the GPU resource theoretical total occupation value of each task; determining the GPU hardware needing to be started according to a preset GPU hardware resource table; migrating all tasks to the GPU hardware needing to be started; and determining the GPU hardware that runs no task and needs to be shut down;
and the power supply control unit, which is used for connecting or disconnecting the power supply of each GPU hardware according to the analysis unit's determination of whether that GPU hardware needs to be started or shut down.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
Fig. 1 is a step diagram of the complete machine resource balance management method based on electric energy loss in an embodiment of the present application;
Fig. 2 is a flowchart of performing energy consumption control on a GPU cluster in this embodiment.
Detailed Description
The technical solution of the present invention is further illustrated by the accompanying drawings and examples.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. The use of "first," "second," and similar terms in the present application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
Example:
The invention aims to provide a method and a system capable of configuring the electric energy consumption of a GPU cluster at the whole-machine level.
The invention discloses a complete machine resource balance management method based on electric energy loss, and with reference to fig. 1, the method comprises the following steps:
and S100, acquiring a GPU resource consumption curve of each task, and determining the GPU resource consumption upper limit value of the task.
And S200, determining a prepared resource value of the task based on a preset proportion of the GPU resource consumption upper limit value of the task, and determining a GPU resource theoretical total occupancy value of the task by combining the GPU resource consumption upper limit value.
The GPU resource consumption upper limit value and the prepared resource value are measured against the same standard. Before they are set, the resource value of the GPU hardware needs to be determined; this resource value is the capacity of the GPU hardware for processing tasks, and it can be set according to the rendering effect of a graphic or the execution effect of an algorithm. The prepared resource value is reserved for each task so that the GPU hardware is not overloaded while a single task is executing. The GPU resource theoretical total occupation value of the task is equal to the sum of the GPU resource consumption upper limit value and the prepared resource value.
Step S300, determining a total GPU resource value according to the theoretical total occupation value of the GPU resources of each task, and determining GPU hardware needing to be started according to a preset GPU hardware resource table.
And step S400, migrating all tasks to GPU hardware needing to be started, and finally closing the GPU hardware without running the tasks.
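Steps S100 to S300 reduce to simple arithmetic once a consumption curve is available for each task. The following is a minimal sketch only. It assumes that the upper limit value is the peak sample of the curve, and the function names and the `reserve_ratio` parameter are hypothetical; this application does not specify a value for the preset proportion.

```python
from typing import Dict, Sequence


def task_total_occupation(curve: Sequence[float], reserve_ratio: float) -> float:
    """Theoretical total occupation value of one task.

    curve: sampled GPU resource consumption of the task (same unit as the
           performance resource values of the GPU hardware).
    reserve_ratio: the preset proportion (hypothetical parameter name).
    """
    if not curve:
        raise ValueError("consumption curve is empty")
    upper_limit = max(curve)                # assumption: upper limit = peak sample
    prepared = upper_limit * reserve_ratio  # prepared resource value
    return upper_limit + prepared


def total_required_value(curves: Dict[str, Sequence[float]], reserve_ratio: float) -> float:
    """Total needed GPU resource value: sum over all tasks."""
    return sum(task_total_occupation(c, reserve_ratio) for c in curves.values())


if __name__ == "__main__":
    curves = {"render": [10, 35, 50, 20], "train": [60, 80, 70]}
    print(total_required_value(curves, reserve_ratio=0.25))
```

Taking the peak over a preset time window is only one possible reading of "upper limit value"; a percentile of the curve would fit the same structure.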
In order to determine the GPU hardware that needs to be booted, in some embodiments of the present application, contents of the GPU hardware resource table are disclosed, and the contents of the GPU hardware resource table include:
a plurality of GPU hardware, with position information, a performance resource value, and a power consumption value set correspondingly for each GPU hardware.
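One possible in-memory shape for such a table is sketched below. The class name `GpuEntry`, the field names, the position format "node/gpu", and the sample figures are all illustrative assumptions, not data from this application.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class GpuEntry:
    position: str       # position information, e.g. "node1/gpu0" (format assumed)
    performance: float  # performance resource value
    power: float        # power consumption value at full load


def make_table(entries: Iterable[GpuEntry]) -> Dict[str, GpuEntry]:
    """Index the entries by position so each GPU hardware can be looked up."""
    return {entry.position: entry for entry in entries}


# Made-up sample rows for illustration only.
RESOURCE_TABLE = make_table([
    GpuEntry("node1/gpu0", performance=100.0, power=250.0),
    GpuEntry("node1/gpu1", performance=80.0, power=160.0),
])
```

Keeping the entries immutable reflects that the table is preset once and then only read during scheduling.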
In some embodiments of the present application, in order to avoid overload of GPU hardware due to a newly injected task, the overall resource balancing management method is improved, and the overall resource balancing management method further includes: and determining and starting prepared GPU hardware according to the total needed GPU resource value and the GPU hardware resource table, wherein the prepared GPU hardware is used for preparing to run the newly added task.
In some embodiments of the present application, a method for presetting the GPU hardware resource table is disclosed, and the method for presetting the GPU hardware resource table includes:
First, after the position information of the GPU hardware is obtained, a virtual task is sent to the GPU hardware so that the GPU hardware runs at full load.
The virtual task may be based on a neural network learning algorithm or on dynamic image rendering.
Second, the execution effect of the virtual task on the GPU hardware is evaluated to determine the performance resource value of the GPU hardware.
For the evaluation, the same virtual task is set first, and the effect of each GPU hardware executing that virtual task is used as the evaluation criterion.
Third, while the GPU hardware runs at full load, its power consumption is acquired to determine the power consumption value of the GPU hardware.
Wherein the energy consumption value and the consumed power are in direct proportion.
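The three presetting steps can be sketched as a loop over GPU positions. The benchmark itself is hardware-specific, so the sketch takes it as an injected callable: `run_virtual_task` is a hypothetical function that is assumed to drive one GPU at full load with the shared virtual task and return both an execution score and the power measured during that run.

```python
from typing import Callable, Dict, Iterable, Tuple

# (performance resource value, power consumption value)
Measurement = Tuple[float, float]


def preset_resource_table(
    positions: Iterable[str],
    run_virtual_task: Callable[[str], Measurement],
) -> Dict[str, Measurement]:
    """Build the GPU hardware resource table by benchmarking each GPU once."""
    table: Dict[str, Measurement] = {}
    for position in positions:
        # The same virtual task is used for every GPU so scores are comparable.
        score, watts = run_virtual_task(position)
        if score <= 0 or watts <= 0:
            raise ValueError(f"implausible measurement for {position}")
        table[position] = (score, watts)
    return table


if __name__ == "__main__":
    # Stand-in measurements; a real callable would talk to the hardware.
    fake = {"node1/gpu0": (100.0, 250.0), "node1/gpu1": (80.0, 160.0)}
    print(preset_resource_table(fake, lambda pos: fake[pos]))
```

Injecting the benchmark keeps the table-building logic testable without GPU access.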
In order to avoid overload of GPU hardware caused by a newly injected task, in some embodiments of the present application, the method for determining the preparation GPU hardware includes: and determining a total required prepared resource value according to a preset proportion of the total required GPU resource value, and determining prepared GPU hardware needing to be called according to the GPU hardware resource table, wherein the performance resource value of the prepared GPU hardware is greater than or equal to the total required prepared resource value.
In contrast to the prepared resource value described above, the prepared GPU hardware does not mean reserving a part of the performance resources in each GPU hardware. Instead, the prepared GPU hardware is started in advance so that a newly injected task does not overload the GPU hardware it is injected into and thereby degrade task execution.
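A minimal sketch of choosing the prepared GPU hardware follows. The application only requires that the prepared hardware's performance resource value reach the total required prepared resource value; picking candidates in descending performance-to-power order is an assumption made here, and the function name, the `prepared_ratio` parameter, and the table layout (position mapped to a performance and power pair) are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def select_prepared(
    table: Dict[str, Tuple[float, float]],  # position -> (performance, power)
    started: Iterable[str],                 # GPU hardware already chosen for tasks
    total_required: float,                  # total needed GPU resource value
    prepared_ratio: float,                  # preset proportion (assumed parameter)
) -> List[str]:
    """Pick standby GPUs whose combined performance covers the prepared value."""
    needed = total_required * prepared_ratio
    busy = set(started)
    candidates = sorted(
        (pos for pos in table if pos not in busy),
        key=lambda pos: table[pos][0] / table[pos][1],
        reverse=True,
    )
    chosen: List[str] = []
    capacity = 0.0
    for pos in candidates:
        if capacity >= needed:
            break
        chosen.append(pos)
        capacity += table[pos][0]
    if capacity < needed:
        raise RuntimeError("not enough idle GPU hardware for the prepared value")
    return chosen


if __name__ == "__main__":
    table = {"a": (100.0, 250.0), "b": (80.0, 160.0), "c": (40.0, 100.0), "d": (40.0, 120.0)}
    print(select_prepared(table, started=["a", "b"], total_required=150.0, prepared_ratio=0.25))
```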
In some embodiments of the present application, in order to determine the GPU hardware that needs to be started, a method for applying the GPU hardware resource table is disclosed, and the method for applying the GPU hardware resource table includes:
and determining a plurality of GPU hardware as GPU hardware needing to be started according to the total needed GPU resource value and the GPU hardware resource table, so that the total performance resource value of the GPU hardware needing to be started is larger than the total needed GPU resource value, and the total power consumption value of the GPU hardware is the lowest.
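One way to read this selection is as a small optimization problem. In the formulation below, a_i and b_i denote the performance resource value and the power consumption value of the i-th GPU hardware in the resource table; R (the total needed GPU resource value), n (the number of GPU hardware in the table), and the binary variable x_i (1 if the i-th GPU hardware is started) are notation introduced here for illustration. The text above says "larger than", while the call-table embodiment uses "larger than or equal to"; the sketch uses the latter.

```latex
\min_{x_1,\dots,x_n \in \{0,1\}} \; \sum_{i=1}^{n} b_i \, x_i
\qquad \text{subject to} \qquad
\sum_{i=1}^{n} a_i \, x_i \;\ge\; R
```

Under this reading, a procedure that adds GPU hardware in descending order of a_i / b_i until the constraint holds is a greedy heuristic for the problem; it is simple to run but is not guaranteed to reach the exact minimum in every case.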
In some embodiments of the present application, a method for applying the GPU hardware resource table is further disclosed, and the method for applying the GPU hardware resource table further includes:
First, a temporary GPU hardware call table is established, the position information of GPU hardware is filled into the temporary GPU hardware call table one entry at a time according to the total needed GPU resource value, and the total performance resource value of the GPU hardware in the table is recalculated after each entry.
Second, once the total performance resource value of the GPU hardware in the temporary GPU hardware call table is greater than or equal to the total needed GPU resource value, the GPU hardware corresponding to the position information recorded in the table is determined as the GPU hardware needing to be started.
In some embodiments of the present application, in order to minimize the total energy consumption value of the GPU hardware that needs to be started, the method for applying the GPU hardware resource table further includes:
First, the performance resource value of the GPU hardware is denoted a, the energy consumption value is denoted b, and the performance energy consumption ratio of the GPU hardware is denoted h, calculated as h = a/b.
Second, when the temporary GPU hardware call table is established, GPU hardware is called in descending order of the performance energy consumption ratio h.
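The call-table procedure can be sketched as a short greedy loop. The function name `build_call_table` and the table layout (position mapped to the pair a, b) are assumptions for illustration, and the sample figures are made up.

```python
from typing import Dict, List, Tuple


def build_call_table(
    table: Dict[str, Tuple[float, float]],  # position -> (a, b)
    total_required: float,                  # total needed GPU resource value
) -> List[str]:
    """Fill a temporary call table in descending order of h = a / b."""
    ordered = sorted(table.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    call_table: List[str] = []
    capacity = 0.0  # running total performance resource value
    for position, (a, _b) in ordered:
        if capacity >= total_required:
            break  # the call table already covers the demand
        call_table.append(position)
        capacity += a
    if capacity < total_required:
        raise RuntimeError("the cluster cannot cover the total needed GPU resource value")
    return call_table


if __name__ == "__main__":
    table = {
        "n1/g0": (100.0, 250.0),  # h = 0.4
        "n1/g1": (80.0, 160.0),   # h = 0.5
        "n2/g0": (40.0, 110.0),   # h is about 0.36
        "n2/g1": (60.0, 200.0),   # h = 0.3
    }
    print(build_call_table(table, 150.0))  # ['n1/g1', 'n1/g0']
```

With a demand of 150, the two GPUs with the highest ratio supply 180 in total, so the remaining two stay out of the call table and are candidates for shutdown.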
According to the complete machine resource balance management method based on the electric energy loss, the actually needed GPU resources are determined by establishing the relation between tasks and GPU resource consumption, GPU hardware needing to be started is determined according to the actually needed GPU resources, all the tasks are migrated to the GPU hardware needing to be started, and then the GPU hardware without task execution is closed, so that the power consumption of a GPU cluster host is reduced to the maximum extent.
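The migration and shutdown steps can be illustrated as follows. This application does not say how tasks are placed on the started GPU hardware, so the sketch substitutes a first-fit-decreasing placement as an assumption; the function names are hypothetical, and prepared GPU hardware is ignored for brevity.

```python
from typing import Dict, Iterable, List


def plan_migration(task_demand: Dict[str, float], capacity: Dict[str, float]) -> Dict[str, str]:
    """Assign each task to a started GPU (first-fit decreasing, an assumed policy).

    task_demand: task -> theoretical total occupation value
    capacity:    started GPU position -> performance resource value
    """
    free = dict(capacity)
    placement: Dict[str, str] = {}
    for task, demand in sorted(task_demand.items(), key=lambda kv: kv[1], reverse=True):
        for position in free:
            if free[position] >= demand:
                placement[task] = position
                free[position] -= demand
                break
        else:
            raise RuntimeError(f"no started GPU can host task {task}")
    return placement


def gpus_to_shut_down(all_positions: Iterable[str], placement: Dict[str, str]) -> List[str]:
    """GPU hardware with no task after migration."""
    busy = set(placement.values())
    return [pos for pos in all_positions if pos not in busy]


if __name__ == "__main__":
    placement = plan_migration({"t1": 60, "t2": 30, "t3": 50}, {"g0": 100, "g1": 80})
    print(placement)
    print(gpus_to_shut_down(["g0", "g1", "g2", "g3"], placement))
```

Because tasks are indivisible in this sketch, a placement can fail even when the summed capacity is sufficient; a production scheduler would need a fallback such as starting one more GPU.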
To further illustrate the technical solution of the present application, a specific application scenario is now disclosed to explain the technical solution of the present application.
In order to implement the tasks of rendering and arithmetic operations of large-scale graphics, it is necessary to establish a GPU cluster, which is a computer cluster in which each node is equipped with a Graphics Processing Unit (GPU). Taking advantage of the computational power of modern GPUs through general-purpose computation on a graphics processing unit (GPGPU), very fast computations can be performed using a cluster of GPUs.
A GPU cluster may run in a state with a low task load while executing tasks, in which case a plurality of GPU hardware still consumes electric energy. To solve this problem, a complete machine resource balance management system based on electric energy loss is needed; the management system comprises an analysis unit, a resource occupation condition acquisition unit, and a power supply control unit.
Referring to fig. 2, the method for controlling the GPU cluster by using the analysis unit, the resource occupation condition acquisition unit, and the power supply control unit includes the following steps:
First, a GPU resource consumption curve of each task is obtained.
Second, the GPU resource consumption upper limit value is determined.
Third, the prepared resource value of the task is determined.
Fourth, the GPU resource theoretical total occupation value is determined.
Fifth, the total needed GPU resource value is determined.
Sixth, the GPU hardware needing to be started is determined.
Seventh, the tasks are migrated to the GPU hardware needing to be started.
Eighth, the GPU hardware on which no task is running is shut down.
The GPU resource consumption curve of each task is generated as follows: the resource occupation condition acquisition unit obtains the GPU performance resource occupancy rate of the task, the occupancy rate is evaluated against the GPU hardware performance resource value determined by the analysis unit, and the GPU resource consumption curve of the task is then generated.
The analysis unit is used for: generating a GPU resource consumption curve according to the GPU performance resource occupancy rate; analyzing and determining the GPU resource consumption upper limit value of a task in a preset time period; determining the prepared resource value of the task as a preset proportion of the GPU resource consumption upper limit value; determining the GPU resource theoretical total occupation value of the task by combining the prepared resource value with the GPU resource consumption upper limit value; determining the total needed GPU resource value according to the GPU resource theoretical total occupation value of each task; determining the GPU hardware needing to be started according to a preset GPU hardware resource table; migrating all tasks to the GPU hardware needing to be started; and determining the GPU hardware that runs no task and needs to be shut down.
The power supply control unit is used for connecting or disconnecting the power supply of each GPU hardware according to the analysis unit's determination of whether that GPU hardware needs to be started or shut down.
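The division of labour between the acquisition unit and the power supply control unit might be sketched as below. The class names, the `sample_occupancy` and `switch` callables, and the assumption that all GPU hardware is powered initially are hypothetical; how occupancy is sampled and how power is physically switched are not specified in this application.

```python
from typing import Callable, Dict, Iterable, List, Set


class ResourceAcquisitionUnit:
    """Collects per-task occupancy samples that form the consumption curves."""

    def __init__(self, sample_occupancy: Callable[[str], float]):
        self._sample = sample_occupancy       # hypothetical sampling hook
        self.curves: Dict[str, List[float]] = {}

    def poll(self, task_ids: Iterable[str]) -> None:
        for task in task_ids:
            self.curves.setdefault(task, []).append(self._sample(task))


class PowerControlUnit:
    """Connects or disconnects GPU power to match the analysis unit's decision."""

    def __init__(self, all_positions: Iterable[str], switch: Callable[[str, bool], None]):
        self._all = list(all_positions)
        self._on: Set[str] = set(self._all)   # assumption: everything starts powered
        self._switch = switch                 # hypothetical hardware hook

    def apply(self, needed: Iterable[str]) -> None:
        wanted = set(needed)
        for position in self._all:
            want_on = position in wanted
            if want_on == (position in self._on):
                continue                      # already in the desired state
            self._switch(position, want_on)
            if want_on:
                self._on.add(position)
            else:
                self._on.discard(position)
```

Tracking the current power state means repeated decisions with the same outcome do not toggle any hardware.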
According to the method and the system for the complete machine resource balanced management based on the electric energy loss, the GPU resources which are actually needed are determined by establishing the relation between the tasks and the GPU resource consumption, the GPU hardware which needs to be started is determined according to the relation, all the tasks are migrated to the GPU hardware which needs to be started, and then the GPU hardware which does not execute the tasks is closed, so that the power consumption of a GPU cluster host is reduced to the maximum extent.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the invention without departing from the spirit and scope of the invention.

Claims (9)

1. A complete machine resource balance management method based on electric energy loss is characterized by comprising the following steps:
acquiring a GPU resource consumption curve of each task, determining a GPU resource consumption upper limit value of each task, determining a prepared resource value of each task based on a preset proportion of the GPU resource consumption upper limit value of each task, and determining a GPU resource theoretical total occupation value of each task in combination with the GPU resource consumption upper limit value;
and determining a total GPU resource value according to the GPU resource theoretical total occupation value of each task, determining GPU hardware needing to be started according to a preset GPU hardware resource table, migrating all tasks to the GPU hardware needing to be started, and finally closing the GPU hardware without task operation.
2. The overall machine resource balance management method based on the electric energy loss according to claim 1, wherein the content of the GPU hardware resource table comprises:
and the plurality of GPU hardware are correspondingly set with position information, performance resource values and energy consumption values aiming at each GPU hardware.
3. The overall resource balance management method based on the electric energy loss as claimed in claim 2, wherein the overall resource balance management method further comprises:
and determining and starting prepared GPU hardware according to the total needed GPU resource value and the GPU hardware resource table, wherein the prepared GPU hardware is used for preparing to run the newly added task.
4. The power consumption-based complete machine resource balance management method according to claim 3, wherein the method for determining the standby GPU hardware comprises the following steps:
and determining a total required prepared resource value according to a preset proportion of the total required GPU resource value, and determining prepared GPU hardware needing to be called according to the GPU hardware resource table, wherein the performance resource value of the prepared GPU hardware is greater than or equal to the total required prepared resource value.
5. The power consumption-based complete machine resource balance management method according to claim 2, wherein the method for presetting the GPU hardware resource table comprises the following steps:
after the position information of the GPU hardware is obtained, sending a virtual task to the GPU hardware so as to enable the GPU hardware to run at full load;
evaluating the execution effect of the virtual task based on the GPU hardware to determine a performance resource value of the GPU hardware;
and when the GPU hardware runs at full load, acquiring the power consumption of the GPU hardware so as to determine the power consumption value of the GPU hardware.
6. The power consumption-based complete machine resource balance management method according to claim 2, wherein the method for applying the GPU hardware resource table comprises the following steps:
and determining a plurality of GPU hardware as GPU hardware needing to be started according to the total needed GPU resource value and the GPU hardware resource table, so that the total performance resource value of the GPU hardware needing to be started is larger than the total needed GPU resource value, and the total power consumption value of the GPU hardware is the lowest.
7. The power consumption-based complete machine resource balance management method according to claim 6, wherein the method for applying the GPU hardware resource table further comprises the following steps:
establishing a temporary GPU hardware call table, filling position information of GPU hardware in the temporary GPU hardware call table successively according to the total needed GPU resource value, and calculating the total performance resource value of the GPU hardware in the temporary GPU hardware call table successively;
and if the total performance resource value of the GPU hardware in the temporary GPU hardware call table is larger than or equal to the total needed GPU resource value, determining the GPU hardware corresponding to the position information of the GPU hardware recorded in the temporary GPU hardware call table as the GPU hardware needing to be started.
8. The power consumption-based complete machine resource balance management method according to claim 7, wherein the method for applying the GPU hardware resource table further comprises the following steps:
denoting the performance resource value of the GPU hardware as a, its power consumption value as b, and its performance-to-energy-consumption ratio as h, where h = a/b;
and, when the temporary GPU hardware call table is established, calling GPU hardware sequentially in descending order of the performance-to-energy-consumption ratio h.
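The procedure of claims 7 and 8 amounts to a greedy fill ordered by h = a/b. A minimal sketch, assuming the resource table maps each position to a pair (a, b); the function name and the sample numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# position -> (a, b): performance resource value and power consumption value (made-up)
Table = Dict[str, Tuple[float, float]]


def build_call_table(table: Table, total_needed: float) -> Tuple[List[str], float]:
    """Fill a temporary call table in descending order of h = a / b."""
    ranked = sorted(table.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    call_table: List[str] = []
    total = 0.0
    for position, (a, _b) in ranked:
        if total >= total_needed:  # stop once the accumulated performance suffices
            break
        call_table.append(position)
        total += a
    if total < total_needed:
        raise ValueError("GPU hardware resource table cannot satisfy the requirement")
    return call_table, total


table = {"slot0": (100, 250), "slot1": (60, 120), "slot2": (40, 160), "slot3": (80, 100)}
print(build_call_table(table, 130))  # (['slot3', 'slot1'], 140.0)
```

A greedy fill of this kind is cheap to compute, but in general it is a heuristic and is not guaranteed to find the subset with the lowest total power consumption.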
9. A complete machine resource balance management system based on electric energy loss, characterized by comprising:
a resource occupation acquisition unit, configured to acquire the GPU performance resource occupancy rate of a task;
an analysis unit, configured to: generate a GPU resource consumption curve from the acquired GPU performance resource occupancy; analyze the curve to determine the GPU resource consumption upper limit value of the task within a preset time period; determine the standby resource value of the task as a preset proportion of the GPU resource consumption upper limit value; determine the theoretical total GPU resource occupation value of the task by combining the standby resource value with the GPU resource consumption upper limit value; determine the total needed GPU resource value from the theoretical total GPU resource occupation values of all tasks; determine, according to a preset GPU hardware resource table, the GPU hardware needing to be started; migrate all tasks to the GPU hardware needing to be started; and determine the GPU hardware that is not running tasks as the GPU hardware needing to be shut down;
and a power supply control unit, configured to connect or disconnect the power supply of the GPU hardware according to the analysis unit's determination of which GPU hardware needs to be started or shut down.
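The arithmetic performed by the analysis unit, and the resulting power decision, can be sketched as below. Two points are assumptions rather than statements from the claims: the upper limit is taken as the maximum sample in the window, and the reserve is combined with the upper limit by addition. All names and numbers are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def theoretical_total(samples: Sequence[float], proportion: float) -> float:
    """Upper limit of the consumption curve plus a reserve proportional to it."""
    upper = max(samples)            # assumption: upper limit = maximum sample in the window
    reserve = proportion * upper    # reserve value as a preset proportion
    return upper + reserve          # assumption: the two values are combined by addition


def total_needed(task_samples: Dict[str, Sequence[float]], proportion: float) -> float:
    """Sum the theoretical total occupation values of all tasks."""
    return sum(theoretical_total(s, proportion) for s in task_samples.values())


def power_plan(all_positions: Iterable[str],
               needed_positions: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split GPU positions into those to power on and those to power off."""
    positions = list(all_positions)
    needed = set(needed_positions)
    connect = [p for p in positions if p in needed]
    disconnect = [p for p in positions if p not in needed]
    return connect, disconnect


samples = {"taskA": [10, 40, 30], "taskB": [20, 16, 8]}
print(total_needed(samples, proportion=0.25))                          # 75.0
print(power_plan(["slot0", "slot1", "slot2"], ["slot1"]))  # (['slot1'], ['slot0', 'slot2'])
```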
CN202211316675.8A 2022-10-26 2022-10-26 Whole machine resource balance management method and system based on electric energy loss Active CN115373507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211316675.8A CN115373507B (en) 2022-10-26 2022-10-26 Whole machine resource balance management method and system based on electric energy loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211316675.8A CN115373507B (en) 2022-10-26 2022-10-26 Whole machine resource balance management method and system based on electric energy loss

Publications (2)

Publication Number Publication Date
CN115373507A true CN115373507A (en) 2022-11-22
CN115373507B CN115373507B (en) 2023-01-06

Family

ID=84072821

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202211316675.8A Active CN115373507B (en) 2022-10-26 2022-10-26 Whole machine resource balance management method and system based on electric energy loss

Country Status (1)

Country Link
CN (1) CN115373507B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050268063A1 (en) * 2004-05-25 2005-12-01 International Business Machines Corporation Systems and methods for providing constrained optimization using adaptive regulatory control
CN106951955A (en) * 2017-03-09 2017-07-14 中国人民解放军军械工程学院 Electronic cell number system of selection in bus embryo's electronic cell array
CN108074022A (en) * 2016-11-10 2018-05-25 中国电力科学研究院 A kind of hardware resource analysis and appraisal procedure based on concentration O&M
WO2018161842A1 (en) * 2017-03-10 2018-09-13 Huawei Technologies Co., Ltd. Optimization of energy management of mobile devices based on specific user and device metrics uploaded to cloud
WO2018232746A1 (en) * 2017-06-23 2018-12-27 上海诺基亚贝尔股份有限公司 Method and apparatus for resource management in edge cloud
CN109379727A (en) * 2018-10-16 2019-02-22 重庆邮电大学 Task distribution formula unloading in car networking based on MEC carries into execution a plan with cooperating
CN111614746A (en) * 2020-05-15 2020-09-01 北京金山云网络技术有限公司 Load balancing method and device of cloud host cluster and server
CN112540854A (en) * 2020-12-28 2021-03-23 上海体素信息科技有限公司 Deep learning model scheduling deployment method and system under condition of limited hardware resources
CN112764905A (en) * 2021-01-25 2021-05-07 江苏赞奇科技股份有限公司 Energy management method of cloud rendering system based on software definition

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050268063A1 (en) * 2004-05-25 2005-12-01 International Business Machines Corporation Systems and methods for providing constrained optimization using adaptive regulatory control
CN108074022A (en) * 2016-11-10 2018-05-25 中国电力科学研究院 A kind of hardware resource analysis and appraisal procedure based on concentration O&M
CN106951955A (en) * 2017-03-09 2017-07-14 中国人民解放军军械工程学院 Electronic cell number system of selection in bus embryo's electronic cell array
WO2018161842A1 (en) * 2017-03-10 2018-09-13 Huawei Technologies Co., Ltd. Optimization of energy management of mobile devices based on specific user and device metrics uploaded to cloud
WO2018232746A1 (en) * 2017-06-23 2018-12-27 上海诺基亚贝尔股份有限公司 Method and apparatus for resource management in edge cloud
CN109379727A (en) * 2018-10-16 2019-02-22 重庆邮电大学 Task distribution formula unloading in car networking based on MEC carries into execution a plan with cooperating
CN111614746A (en) * 2020-05-15 2020-09-01 北京金山云网络技术有限公司 Load balancing method and device of cloud host cluster and server
WO2021228103A1 (en) * 2020-05-15 2021-11-18 北京金山云网络技术有限公司 Load balancing method and apparatus for cloud host cluster, and server
CN112540854A (en) * 2020-12-28 2021-03-23 上海体素信息科技有限公司 Deep learning model scheduling deployment method and system under condition of limited hardware resources
CN112764905A (en) * 2021-01-25 2021-05-07 江苏赞奇科技股份有限公司 Energy management method of cloud rendering system based on software definition

Also Published As

Publication number Publication date
CN115373507B (en) 2023-01-06

Similar Documents

Publication Publication Date Title
CN102508718B (en) Method and device for balancing load of virtual machine
US20190080429A1 (en) Adaptive scheduling for task assignment among heterogeneous processor cores
US8910153B2 (en) Managing virtualized accelerators using admission control, load balancing and scheduling
US10664318B2 (en) Method and apparatus for allocating computing resources of processor
US10514957B2 (en) Network service infrastructure management system and method of operation
CN112905326B (en) Task processing method and device
CN111506434B (en) Task processing method and device and computer readable storage medium
CN112527513B (en) Method and system for dynamically distributing multiple GPUs
CN114936173B (en) Read-write method, device, equipment and storage medium of eMMC device
CN110096339B (en) System load-based capacity expansion and contraction configuration recommendation system and method
CN105242954A (en) Mapping method between virtual CPUs (Central Processing Unit) and physical CPUs, and electronic equipment
US20210011764A1 (en) Workload/processor resource scheduling system
CN105335236B (en) A kind of distributed dynamic load leveling dispatching method and device of collecting evidence
CN115373507B (en) Whole machine resource balance management method and system based on electric energy loss
CN107423114B (en) Virtual machine dynamic migration method based on service classification
CN111367655B (en) Method, system and storage medium for GPU resource scheduling in cloud computing environment
CN116088971A (en) Service providing method and device, electronic equipment and storage medium
CN110806918A (en) Virtual machine operation method and device based on deep learning neural network
CN115718603A (en) Python model distributed online deployment method and system
CN107341060B (en) Virtual machine memory allocation method and device
CN112506622B (en) Cloud-mobile-phone-oriented GPU computing performance prediction method and device
CN115934349A (en) Resource scheduling method, device, equipment and computer readable storage medium
CN114153592A (en) Physical host load scheduling method and device of cloud platform, electronic equipment and medium
CN110908783A (en) Management and control method, system and equipment for virtual machine of cloud data center
CN111399942A (en) Network card configuration method, network card configuration device, network card configuration equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant