CN112559191B - Method and device for dynamically deploying GPU resources and computer equipment - Google Patents


Info

Publication number
CN112559191B
CN112559191B (application CN202011538689.5A)
Authority
CN
China
Prior art keywords: date, specified, model, historical, service
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202011538689.5A
Other languages
Chinese (zh)
Other versions
CN112559191A (en)
Inventor
孙浩鑫
王晟宇
赖众程
李会璟
李骁
Current Assignee
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202011538689.5A
Publication of CN112559191A
Application granted
Publication of CN112559191B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/505 Allocation of resources to service a request, the resource being a machine, considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/50 Indexing scheme relating to G06F9/50
    • G06F 2209/5021 Priority
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/50 Indexing scheme relating to G06F9/50
    • G06F 2209/508 Monitor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to the field of big data and discloses a method for dynamically deploying GPU resources, which comprises the following steps: acquiring historical service data corresponding to a specified model in a system to be matched; predicting the response time of a specified date corresponding to the specified model according to each historical date, the working day state corresponding to each historical date, and the service request amount of the day before each historical date; calculating the response efficiency of the specified model corresponding to the specified date according to the predicted response time; acquiring monitoring data of the specified model on the statistical termination day corresponding to the historical service data; calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency and the monitoring data of the statistical termination day; and controlling the number of service containers through the container cluster according to the business effect score, thereby dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date. By dynamically adjusting the deployment state of the GPU resources in this way, the GPU resources are allocated reasonably.

Description

Method and device for dynamically deploying GPU resources and computer equipment
Technical Field
The present application relates to the field of big data, and in particular, to a method, an apparatus, and a computer device for dynamically deploying GPU resources.
Background
With the rapid growth of internet services, service access volume and data traffic increase rapidly, and the demand for system computing resources grows accordingly; the deployment mode of the GPU graphics card, a key resource for application computing, directly influences how services progress. With AI engineering still at a primary stage, the deployment of GPU resources in AI application computation generally depends on manual adjustment: the deployment is solidified, must be manually expanded when application access traffic increases, and easily wastes GPU resources when the traffic falls again. The deployment scheme of GPU resources therefore cannot be adjusted in real time according to dynamic changes of application access traffic, excess computing resources cannot be released in time, and the service matching requirement cannot be met.
Disclosure of Invention
The main purpose of the application is to provide a method for dynamically deploying GPU resources, which aims to solve the technical problem that GPU resources cannot be adjusted in real time according to dynamic changes of application access traffic.
The application provides a method for dynamically deploying GPU resources, which comprises the following steps:
acquiring historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises the statistical termination day corresponding to the historical service data, each historical date, the working day state corresponding to each historical date, and the service request amount of the day before each historical date; the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of all models in the system to be matched;
Predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on a day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data;
calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
acquiring monitoring data of the specified model on a statistical termination day corresponding to the historical service data;
calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day;
and controlling the number of service containers through the container cluster according to the service effect score corresponding to the specified model, and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
Preferably, the step of calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date includes:
Acquiring a response time threshold corresponding to the specified model;
calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date and the response time threshold through a first calculation formula, wherein the first calculation formula is P = (Tm - T)/Tm, where P represents the response efficiency, P ∈ (0, 1); Tm represents the response time threshold, Tm ∈ (0, 1); and T represents the response time of the specified date, T ∈ (0, 1).
Preferably, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the step of calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination date includes:
calculating a GPU load state quantity according to the display card utilization rate, the GPU utilization rate and the temperature duty ratio through a second calculation formula, wherein the second calculation formula is F= (a is Wa+b is Wb+c is Wc)/(Wa+Wb+wc), F represents the GPU load state quantity, a represents the display card utilization rate, a belongs to (0, 1, b represents the GPU utilization rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), wa represents the weight corresponding to the display card utilization rate, wb represents the weight corresponding to the GPU utilization rate, wc represents the weight corresponding to the temperature duty ratio, and Wa, wb and Wc are non-zero real numbers;
Acquiring a preset priority corresponding to the specified model;
and calculating a service effect score corresponding to the specified model according to the GPU load state quantity, the preset priority and the response efficiency through a third calculation formula, wherein the third calculation formula is Y = (P·Wp + U·Wu)/F, where Y represents the service effect score; U represents the preset priority, U ∈ (0, 1); Wp represents the weight corresponding to the response efficiency; Wu represents the weight corresponding to the preset priority; and Wp and Wu are non-zero real numbers.
Preferably, the step of dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date by controlling the number of service containers through the container cluster according to the service effect score corresponding to the specified model includes:
acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and the capacity expansion threshold and the capacity contraction threshold are non-zero real numbers;
comparing the numerical relation between the business effect score and the capacity expansion threshold and the capacity contraction threshold respectively;
and according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
Preferably, the step of dynamically adjusting the GPU resource duty ratio corresponding to the specified model according to the numerical relation by controlling the number of service containers through a container cluster includes:
judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value;
if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the specified model by creating a specified service container, and if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold;
if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
Preferably, the step of predicting the response time of the specified date corresponding to the specified model according to each of the history dates, the working day status corresponding to each of the history dates, and the service request amount on the day before each of the history dates, includes:
forming a training set of the XGBoost model by the historical dates, working day states corresponding to the historical dates and service request quantity on the day before the historical dates;
Training the XGBoost model under an objective function by utilizing a training set of the XGBoost model;
judging whether an objective function of the XGBoost model is converged or not;
if so, inputting the data of the specified model on the statistical termination day of the historical service data into the trained XGBoost model;
and taking the output of the XGBoost model for that input as the predicted response time of the specified model on the specified date.
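As a sketch, the training set described in the steps above can be assembled as follows. The feature encoding (weekday plus day of month) is an assumption, since the patent does not specify how dates are encoded, and the sample rows are hypothetical; in practice the resulting matrix would be fed to an XGBoost regressor.

```python
from datetime import date

def build_training_set(records):
    """Assemble (features, labels) for the response-time model.

    records: iterable of (history_date, workday_state, prev_day_requests,
    response_time) tuples, mirroring the fields named above. workday_state
    is 0 for a working day and 1 otherwise, as in the embodiment.
    """
    features, labels = [], []
    for day, workday, prev_requests, response_time in records:
        # Date encoding (weekday + day of month) is illustrative only.
        features.append([day.weekday(), day.day, workday, prev_requests])
        labels.append(response_time)
    return features, labels

# Hypothetical rows in the shape of Table 1.
rows = [
    (date(2020, 9, 1), 0, 1200, 0.15),
    (date(2020, 9, 2), 0, 1350, 0.17),
]
X, y = build_training_set(rows)
```

With a real data set, `X` and `y` would then be passed to an XGBoost regressor's `fit` method and training continued until the objective function converges, as the steps describe.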
The application also provides a device for dynamically deploying GPU resources, which comprises:
the first acquisition module is used for acquiring historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises a statistical termination day, each historical date, a working day state corresponding to each historical date, and a service request amount of the day before each historical date; the specified model is any one of all models in the system to be matched, and the plurality of models in the system to be matched share the GPU resources;
the prediction module is used for predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on the day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data;
The first calculation module is used for calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
the second acquisition module is used for acquiring the monitoring data of the specified model on the statistical termination day corresponding to the historical service data;
the second calculation module is used for calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day;
and the matching module is used for dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date by controlling the number of service containers through the container cluster according to the service effect score corresponding to the specified model.
The present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method.
According to the method and the device, an application layer resource monitoring function module is designed at the service application layer, and the deployment state of the GPU resources is comprehensively and dynamically adjusted according to the service priority and the response efficiency of each model, so that GPU resources are allocated reasonably and each service runs efficiently.
Drawings
FIG. 1 is a flow chart of a method for dynamically deploying GPU resources according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a system flow for dynamically deploying GPU resources according to one embodiment of the present application;
FIG. 3 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Referring to fig. 1, a method for dynamically deploying GPU resources according to an embodiment of the present application includes:
S1: acquiring historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises the statistical termination day corresponding to the historical service data, each historical date, the working day state corresponding to each historical date, and the service request amount of the day before each historical date; the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of all models in the system to be matched;
S2: predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on a day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data;
S3: calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
S4: acquiring monitoring data of the specified model on the statistical termination day corresponding to the historical service data;
S5: calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day;
S6: and controlling the number of service containers through the container cluster according to the service effect score corresponding to the specified model, and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
In this embodiment of the application, an application layer resource management function module is provided to analyze in real time the running states of the plurality of models sharing GPU resources in the system to be matched, and to adjust each model's share of the GPU resources in real time. The application layer resource management function module comprises an operation effect scoring function block, a GPU resource management function block, a response time prediction function block and a dynamic resource adjustment function block. The dynamic resource adjustment function block collects real-time data from the operation effect scoring, GPU resource management and response time prediction function blocks, performs logic analysis, and transmits the analysis result to the Kubernetes scheduling function block of the Kubernetes container cluster, which adjusts the GPU resource duty ratio of each model in the GPU resource pool in real time. The adjusted GPU resource pool feeds adjustment service information back to the operation effect scoring function block in real time through service monitoring and hardware monitoring, thereby realizing real-time dynamic adjustment of the GPU resource allocation of each model.
The GPU resource management function block maintains a GPU resource pool, ensures that multiple models run smoothly at the same time, and allocates suitable GPU resources to the Kubernetes cluster as far as possible. The response time prediction function block predicts the response time of a future day from the historical response time data of the model. The dynamic resource adjustment function block allocates more GPU resources to tasks that urgently need resources and reduces the GPU resources of tasks that are neither important nor urgent. The Kubernetes scheduling function block calls the Kubernetes API to perform server resource monitoring and resource matching operations. Kubernetes can realize HPA (Horizontal Pod Autoscaling) based on the current container service conditions, and through the HPA settings Kubernetes provides the elastic scaling function for containers. For a container cluster in Kubernetes, the HPA can implement many automated behaviors: for example, when the traffic load in the container cluster increases, new service containers can be created to keep the service system running stably, and when the traffic load decreases, some service containers can be destroyed to reduce resource waste. Current elastic scaling indicators include: CPU, memory, concurrency, and packet transfer size.
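The HPA behavior described above can be stated concretely; per the Kubernetes documentation, the controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue):

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Kubernetes HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).
    """
    return math.ceil(current_replicas * current_metric / target_metric)
```

So if the observed load per metric doubles, the replica count doubles (containers are created); if it halves, the count shrinks (containers are destroyed), which is the expansion/reduction behavior the text relies on.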
The service monitoring and hardware monitoring feedback services feed back the service state of the deep model, the health state of the GPU resources, the allocation state of the GPU resources and the utilization efficiency of the GPU card to the resource monitoring service, and record the monitoring data of each feedback.
The operation effect scoring function block comprehensively scores the operation effect of each model in its service scenario according to the model's response time and the load condition of the hardware resources, which reflects each model's running state more concretely. The response time prediction function block predicts, from the historical response time data of each model, the model's response time on the date immediately following the statistical termination day, providing important reference data on each model's future operation effect so that capacity expansion or reduction of each model's GPU resources can be planned in a more timely manner. The future date here refers to the day after the statistical termination day. In this embodiment of the application, the application layer resource management function module is decoupled from the Kubernetes resource scheduling service module, which reduces the dependence between function modules and increases the extensibility and maintainability of the resource adjustment mechanism.
In this embodiment of the application, the historical service data includes the statistical termination day corresponding to the historical service data, each historical date, the working day state corresponding to each historical date, and the service request amount of the day before each historical date. The working day state indicates whether the date is a working day: a working day is marked as 0, otherwise 1. The prior-day service request amount is the request volume of the day immediately before a given historical date, and the historical service data covers consecutive dates, as shown in Table 1 below. The historical dates in Table 1 run from 2020/9/1 to 2020/9/15, the statistical termination day corresponding to the historical service data is 2020/9/15, and the specified date is the adjacent date after the statistical termination day in time sequence, namely 2020/9/16.
TABLE 1
[Table 1 is an image in the source and is not reproduced; per the text, it lists the historical dates 2020/9/1 through 2020/9/15 with their working day states and prior-day service request amounts.]
According to the method and the device, an application layer resource monitoring function module is designed at the service application layer, and the deployment state of the GPU resources is comprehensively and dynamically adjusted according to the service priority and the response efficiency of each model, so that GPU resources are allocated reasonably and each service runs efficiently.
Further, the step S3 of calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date includes:
S31: acquiring a response time threshold corresponding to the specified model;
S32: calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date and the response time threshold through the first calculation formula P = (Tm - T)/Tm, where P represents the response efficiency, P ∈ (0, 1); Tm represents the response time threshold, Tm ∈ (0, 1); and T represents the response time of the specified date, T ∈ (0, 1).
In this embodiment of the application, the response efficiency of the model is calculated as (response time threshold - response time)/response time threshold. The response efficiency of a model directly influences its operation effect score, and in turn the allocation of GPU resources, so that the allocation of GPU resources is kept in a normal state according to the response efficiency of each model. The response time threshold in this example is 0.22, which can be obtained by statistical analysis. If the response time exceeds the response time threshold, the response efficiency is set to 0.1.
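A sketch of the first calculation formula, using the embodiment's 0.22 threshold and 0.1 over-threshold floor as defaults:

```python
def response_efficiency(response_time, threshold=0.22):
    """First calculation formula: P = (Tm - T) / Tm.

    threshold (Tm) defaults to the 0.22 value of this example; if the
    response time exceeds the threshold, P is set to 0.1 as described.
    """
    if response_time > threshold:
        return 0.1
    return (threshold - response_time) / threshold
```

For example, a predicted response time of 0.11 against the 0.22 threshold yields a response efficiency of 0.5.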
Further, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the step S5 of calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination date includes:
S51: calculating a GPU load state quantity according to the display card utilization rate, the GPU utilization rate and the temperature duty ratio through a second calculation formula, wherein the second calculation formula is F= (a is Wa+b is Wb+c is Wc)/(Wa+Wb+wc), F represents the GPU load state quantity, a represents the display card utilization rate, a belongs to (0, 1, b represents the GPU utilization rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), wa represents the weight corresponding to the display card utilization rate, wb represents the weight corresponding to the GPU utilization rate, wc represents the weight corresponding to the temperature duty ratio, and Wa, wb and Wc are non-zero real numbers;
S52: acquiring a preset priority corresponding to the specified model;
S53: calculating a service effect score corresponding to the specified model according to the GPU load state quantity, the preset priority and the response efficiency through the third calculation formula Y = (P·Wp + U·Wu)/F, where Y represents the service effect score; U represents the preset priority, U ∈ (0, 1); Wp represents the weight corresponding to the response efficiency; Wu represents the weight corresponding to the preset priority; and Wp and Wu are non-zero real numbers.
In this embodiment of the application, the service effect score corresponding to the model is calculated from the GPU load state quantity, the preset priority and the response efficiency so as to evaluate the running state accurately. The GPU load state quantity depends on the graphics card (video memory) usage rate, the GPU usage rate and the temperature duty ratio. The video memory usage rate and the GPU usage rate are both hardware monitoring data, taken as statistical averages over a period of time, and do not exceed the value 1. The temperature duty ratio equals the GPU hardware temperature divided by the GPU hardware temperature threshold and is a non-zero real number between 0 and 1. The preset priority is set by the service layer according to the urgency of the service, with 1 the highest priority; in this embodiment the priority is 0.6. The weight values can be obtained by experiment for specific service scenarios; for example, in this embodiment of the application, Wa is 2, Wb is 3, Wc is 1, Wp is 6, and Wu is 4. All parameter values except the business effect score are non-zero real numbers between 0 and 1, and any value exceeding 1 is counted as 1. The higher the business effect score, the better the business effect; the score is out of 10, and any value above 10 is set to 10.
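The second and third calculation formulas, taking the embodiment's example weights (Wa=2, Wb=3, Wc=1, Wp=6, Wu=4) as defaults, can be sketched as:

```python
def gpu_load_state(a, b, c, wa=2, wb=3, wc=1):
    """Second formula: F = (a*Wa + b*Wb + c*Wc) / (Wa + Wb + Wc).

    a: graphics card (video memory) usage, b: GPU usage,
    c: temperature duty ratio; weights default to the example values.
    """
    return (a * wa + b * wb + c * wc) / (wa + wb + wc)

def business_effect_score(p, u, f, wp=6, wu=4):
    """Third formula: Y = (P*Wp + U*Wu) / F, out of 10 and capped at 10."""
    return min((p * wp + u * wu) / f, 10.0)
```

With the worked numbers used later in the text (P = 0.36, U = 0.6, F = 0.8), this gives Y = 5.7.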
Further, the step S6 of dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date according to the service effect score corresponding to the specified model by controlling the number of service containers through the container cluster includes:
S61: acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and both thresholds are non-zero real numbers;
S62: comparing the numerical relation of the business effect score with the capacity expansion threshold and the capacity contraction threshold respectively;
S63: according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
According to the embodiment of the application, the relation between the service effect score and the preset capacity expansion and capacity reduction thresholds determines how the GPU resource duty ratio corresponding to the specified model is dynamically adjusted, namely whether the GPU resource duty ratio is expanded or reduced.
Further, the step S63 of dynamically adjusting the GPU resource duty ratio corresponding to the specified model according to the numerical relationship by controlling the number of service containers through the container cluster includes:
s631: judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value;
S632: if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the appointed model by creating an appointed service container, and if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold;
s633: if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
According to the embodiment of the application, the GPU resources are dynamically adjusted by comparing the service effect score with the preset capacity expansion and capacity reduction thresholds. The adjustment rule is as follows: if the service effect score is smaller than the capacity expansion threshold, GPU resources are added to the model; if the service effect score is larger than the capacity reduction threshold, GPU resources are reduced for the model; if the service effect score lies between the capacity expansion threshold and the capacity reduction threshold, no GPU resource adjustment is performed. For example, in the embodiment of the present application, the service effect score = (0.36×6+0.6×4)/0.8 = 5.7, the capacity expansion threshold is 7, and the capacity reduction threshold is 8; since the service effect score is smaller than the expansion threshold, GPU resources are added to the model of this embodiment, that is, the capacity is expanded. Through the capacity expansion of GPU resources, the model runs more smoothly and effectively.
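The adjustment rule of steps S631 to S633 can be sketched as follows. The thresholds default to the embodiment's example values 7 and 8, and the container-cluster operations are represented only by returned action names, since the embodiment does not name a concrete orchestration API:

```python
def scaling_decision(score, expand_threshold=7.0, shrink_threshold=8.0):
    """Map a business effect score to a container-cluster action.

    A score below the expansion threshold means the service is under
    pressure, so a service container is created (GPU ratio grows); a
    score above the reduction threshold means resources are idle, so a
    container is destroyed (GPU ratio shrinks); otherwise nothing changes.
    """
    if score < expand_threshold:
        return "create_container"   # expand GPU resource ratio
    if score > shrink_threshold:
        return "destroy_container"  # shrink GPU resource ratio
    return "no_change"

# The embodiment's example: score 5.7 < expansion threshold 7, so expand
action = scaling_decision(5.7)
```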
Further, a step S2 of predicting a response time of a specified date corresponding to the specified model according to each of the history dates, a working day status corresponding to each of the history dates, and a service request amount located on a day before each of the history dates, includes:
s21: forming a training set of the XGBoost model by the historical dates, working day states corresponding to the historical dates and service request quantity on the day before the historical dates;
s22: training the XGBoost model under an objective function by utilizing a training set of the XGBoost model;
s23: judging whether an objective function of the XGBoost model is converged or not;
s24: if yes, inputting the response time of the specified model on the statistical termination day of the historical service data into the XGBoost model;
s25: acquiring the output of the XGBoost model for the response time of the statistical termination day, obtaining the predicted response time of the specified model on the specified date.
In the embodiment of the application, the future trend of model data is predicted from the historical service data, so that the load condition of the model is predicted and an allocation plan for the GPU resources occupied by the model, namely capacity expansion or capacity reduction, is obtained. According to the embodiment of the application, training data for an XGBoost (Extreme Gradient Boosting) model used for time-series prediction are formed by collecting and processing the historical service data. For example, for the data in Table 1, each history date, the working day status corresponding to each history date, and the service request amount of the previous day are recorded as x1, x2, and x3, and the response time of the service request is recorded as y. The XGBoost model is trained with these training data, so that the trained model can predict future response time from historical service data and thereby predict the load condition, in order to determine whether to expand or reduce the GPU resources. In the embodiment of the application, for simplicity, x3 is normalized. The normalization formula is as follows:
Xnorm = (X - Xmin) / (Xmax - Xmin)
wherein Xnorm represents the normalized value of x3, X represents the x3 value to be normalized, and Xmin and Xmax are respectively the minimum and the maximum of all x3 values.
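A minimal sketch of this min-max normalization; the request counts below are hypothetical sample values:

```python
def min_max_normalize(values):
    """Scale each value to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

requests = [39064, 21000, 52000]  # hypothetical previous-day request counts
normalized = min_max_normalize(requests)
```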
According to the embodiment of the application, for the AI application scenario, the objective function of the XGBoost model is set as follows:

Obj = Σi l(yi, ŷi) + Σk Ω(fk)

wherein yi is the true value, ŷi is the predicted value, and ŷi = Σk fk(xi) is the accumulated output of the entire model. The objective function is divided into two parts: a loss function l, which reveals the model training error, i.e. the difference between the predicted and true values; and a regularization term Ω, a function representing the complexity of the trees, where the smaller its value, the lower the complexity and the stronger the generalization capability. Its expression is

Ω(f) = γT + (λ/2) Σj ωj²

wherein T represents the number of leaf nodes, γ controls the number of leaf nodes, and ω represents the score of a leaf node. The training targets are that the prediction error is as small as possible, the number of leaf nodes T is as small as possible, and the leaf node values ω are as low as possible; that is, λ controls the leaf node scores so that they do not become too large, thereby preventing overfitting. According to the embodiment of the application, through repeated iterative training, the optimal parameters of the XGBoost model obtained at training convergence are: learning_rate: 0.085; n_estimators: 500; max_depth: 5; min_child_weight: 1; subsample: 0.75; colsample_bytree: 0.8; gamma: 0; reg_alpha: 0; reg_lambda: 1.
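For reference, the converged hyperparameters listed above correspond one-to-one to keyword arguments of xgboost's scikit-learn style regressor; a sketch of collecting them (constructing the actual model, e.g. xgboost.XGBRegressor(**XGB_PARAMS), is assumed and not performed here):

```python
# Hyperparameters reported at training convergence in this embodiment.
# They match xgboost.XGBRegressor keyword arguments; xgboost itself is
# assumed to be installed and is deliberately not imported in this sketch.
XGB_PARAMS = {
    "learning_rate": 0.085,
    "n_estimators": 500,
    "max_depth": 5,
    "min_child_weight": 1,
    "subsample": 0.75,
    "colsample_bytree": 0.8,
    "gamma": 0,
    "reg_alpha": 0,
    "reg_lambda": 1,
}
```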
For example, the XGBoost model trained on the historical service data of September 1 to 15, 2020 in Table 1 above predicts a model response time of 0.3 s for September 16, 2020; the specific result is shown in Table 2 below. The predicted response time is then substituted into the first calculation formula and, combined with the second and third calculation formulas, the service effect score for September 16, 2020 is calculated, so that the GPU resource ratio can be adjusted according to the service effect score.
TABLE 2

Date        Working day    Previous-day requests    RT (s)
2020/9/16   0              39064                    0.30
Referring to fig. 2, an apparatus for dynamically deploying GPU resources according to an embodiment of the present application includes:
a first obtaining module 1, configured to obtain historical service data corresponding to a specified model in a system to be matched, where the historical service data includes a statistical termination day corresponding to the historical service data, each historical date, a working day state corresponding to each historical date, and a service request amount located on a day before each historical date, the system to be matched includes multiple models sharing GPU resources, and the specified model is any one of all models in the system to be matched;
a prediction module 2, configured to predict a response time of a specified date corresponding to the specified model according to each of the historical dates, a working day state corresponding to each of the historical dates, and a service request amount located on a day before each of the historical dates, where the specified date is a date adjacent to a statistical termination day corresponding to the historical service data and located after a time sequence of the statistical termination day corresponding to the historical service data;
A first calculation module 3, configured to calculate response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
the second obtaining module 4 is configured to obtain monitoring data of the specified model on a statistical termination day corresponding to the historical service data;
a second calculation module 5, configured to calculate a business effect score corresponding to the specified date by the specified model according to the response efficiency corresponding to the specified date by the specified model and the monitoring data of the statistical termination date;
and the matching module 6 is used for controlling the number of service containers through the container cluster according to the service effect scores corresponding to the specified model and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
The relevant explanation of the embodiments of the present application refers to the corresponding method parts and is not repeated.
Further, the first computing module 3 includes:
the first acquisition unit is used for acquiring a response time threshold value corresponding to the specified model;
a first calculation unit configured to calculate, according to the response time of the specified date and the response time threshold, the response efficiency of the specified model corresponding to the specified date through a first calculation formula, where the first calculation formula is P= (Tm-T)/Tm, P represents the response efficiency, P belongs to (0, 1), Tm represents the response time threshold, Tm belongs to (0, 1), T represents the response time of the specified date, and T belongs to (0, 1).
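A minimal sketch of this first calculation formula; the threshold and response-time values used in the example are hypothetical:

```python
def response_efficiency(t, t_max):
    """First calculation formula: P = (Tm - T) / Tm.

    The closer the response time t gets to the threshold t_max, the lower
    the efficiency: t == t_max yields 0, while t == 0 yields 1.
    """
    return (t_max - t) / t_max

# Hypothetical values: threshold 0.5 s, predicted response time 0.3 s
p = response_efficiency(0.3, 0.5)
```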
Further, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the second computing module 5 includes:
the second calculation unit is configured to calculate a GPU load state quantity according to the graphics card usage rate, the GPU usage rate and the temperature duty ratio according to a second calculation formula, where the second calculation formula is f= (a×wa+b×wb+c×wc)/(wa+wb+wc), F represents the GPU load state quantity, a represents the graphics card usage rate, a belongs to (0, 1), b represents the GPU usage rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), wa represents a weight corresponding to the graphics card usage rate, wb represents a weight corresponding to the GPU usage rate, wc represents a weight corresponding to the temperature duty ratio, wa, wb and Wc are non-zero real numbers;
the second acquisition unit is used for acquiring a preset priority corresponding to the specified model;
and a third calculation unit, configured to calculate, according to the GPU load state quantity, the preset priority, and the response efficiency, a service effect score corresponding to the specified model through a third calculation formula, where the third calculation formula is Y= (P×Wp+U×Wu)/F, Y represents the service effect score, U represents the preset priority, U belongs to (0, 1), Wp represents the weight corresponding to the response efficiency, Wu represents the weight corresponding to the preset priority, and Wp and Wu are non-zero real numbers.
Further, the matching module 6 includes:
the third acquisition unit is used for acquiring a preset capacity expansion threshold and a preset capacity reduction threshold, wherein the capacity expansion threshold is smaller than the capacity reduction threshold, and the capacity expansion threshold and the capacity reduction threshold are non-zero real numbers;
the comparison unit is used for comparing the numerical relation between the business effect scores and the capacity expansion threshold and the capacity contraction threshold respectively;
and the adjusting unit is used for controlling the number of the service containers through the container cluster according to the numerical relation and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
Further, the adjusting unit includes:
the first judging subunit is used for judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value;
the second judging subunit is configured to increase a GPU resource duty ratio corresponding to the specified model by creating a specified service container if the service effect score is smaller than the capacity expansion threshold, and judge whether the numerical relationship is that the service effect score is greater than the capacity reduction threshold if the service effect score is not smaller than the capacity expansion threshold;
and the adjustment subunit is used for reducing the GPU resource duty ratio corresponding to the appointed model by destroying the appointed service container if the service effect score is larger than the capacity reduction threshold, otherwise, not adjusting the GPU resource duty ratio corresponding to the appointed model.
Further, the prediction module 2 includes:
the composition unit is used for composing the history dates, the working day states corresponding to the history dates and the service request quantity which is positioned on the day before the history dates into a training set of the XGBoost model;
the training unit is used for training the XGBoost model under an objective function by utilizing the training set of the XGBoost model;
the judging unit is used for judging whether the objective function of the XGBoost model is converged or not;
the input unit is used for inputting the response time of the specified model on the statistical termination day of the historical service data into the XGBoost model if the objective function converges;
and a fourth obtaining unit, configured to obtain the output of the XGBoost model for the response time of the statistical termination day as the predicted response time of the specified model on the specified date.
Referring to fig. 3, a computer device is further provided in the embodiment of the present application; the computer device may be a server, and its internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and the computer programs in the non-volatile storage medium. The database of the computer device is used to store all the data needed in the process of dynamically deploying GPU resources. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for dynamically deploying GPU resources.
The method for dynamically deploying GPU resources by the processor comprises the following steps: acquiring historical service data corresponding to specified models in a system to be matched, wherein the historical service data comprises statistical termination days corresponding to the historical service data, historical dates, working day states corresponding to the historical dates and service request amounts located on the day before the historical dates, respectively, the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of all models in the system to be matched; predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on a day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data; calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date; acquiring monitoring data of the specified model on a statistical termination day corresponding to the historical service data; calculating a business effect score of the appointed model corresponding to the appointed date according to the response efficiency of the appointed model corresponding to the appointed date and the monitoring data of the statistical termination date; and controlling the number of service containers through the container cluster according to the service effect scores corresponding to the specified model, and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
According to the computer device, an application layer resource monitoring function module is designed at the service application layer; through this module, the deployment state of the GPU resources is comprehensively and dynamically adjusted according to the service priority and the response efficiency of each model, thereby achieving a reasonable distribution of the GPU resources and meeting the requirement of running each service efficiently.
In one embodiment, the step of calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date by the processor includes: acquiring a response time threshold corresponding to the specified model; and calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date and the response time threshold through a first calculation formula, where the first calculation formula is P= (Tm-T)/Tm, P represents the response efficiency, P belongs to (0, 1), Tm represents the response time threshold, Tm belongs to (0, 1), T represents the response time of the specified date, and T belongs to (0, 1).
In one embodiment, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the step of calculating, by the processor, a service effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination date includes: calculating a GPU load state quantity according to the graphics card usage rate, the GPU usage rate, and the temperature duty ratio through a second calculation formula, where the second calculation formula is F= (a×Wa+b×Wb+c×Wc)/(Wa+Wb+Wc), F represents the GPU load state quantity, a represents the graphics card usage rate, a belongs to (0, 1), b represents the GPU usage rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), Wa represents the weight corresponding to the graphics card usage rate, Wb represents the weight corresponding to the GPU usage rate, Wc represents the weight corresponding to the temperature duty ratio, and Wa, Wb, and Wc are non-zero real numbers; acquiring a preset priority corresponding to the specified model; and calculating a service effect score corresponding to the specified model through a third calculation formula according to the GPU load state quantity, the preset priority, and the response efficiency, where the third calculation formula is Y= (P×Wp+U×Wu)/F, Y represents the service effect score, U represents the preset priority, U belongs to (0, 1), Wp represents the weight corresponding to the response efficiency, Wu represents the weight corresponding to the preset priority, and Wp and Wu are non-zero real numbers.
In one embodiment, the step of dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date by controlling the number of service containers through the container cluster according to the business effect score corresponding to the specified model includes: acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and the capacity expansion threshold and the capacity contraction threshold are non-zero real numbers; comparing the numerical relation between the business effect score and the capacity expansion threshold and the capacity contraction threshold respectively; and according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
In one embodiment, the step of dynamically adjusting the GPU resource occupancy ratio corresponding to the specified model by the processor according to the numerical relation and by controlling the number of service containers through the container cluster includes: judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value; if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the appointed model by creating an appointed service container, and if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold; if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
In one embodiment, the step of predicting, by the processor, a response time of a specified date corresponding to the specified model according to each of the history dates, a working day status corresponding to each of the history dates, and a service request amount on a day before each of the history dates, includes: forming a training set of the XGBoost model by the historical dates, working day states corresponding to the historical dates and service request quantity on the day before the historical dates; training the XGBoost model under an objective function by utilizing a training set of the XGBoost model; judging whether an objective function of the XGBoost model is converged or not; if yes, the response time of the historical service data statistics termination day of the appointed model is input into the XGBoost model; and acquiring response time of the XGBoost model on the expiration date according to historical service data statistics of the specified model, and predicting the obtained response time of the specified model on the specified date.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present application and is not intended to limit the computer device to which the present application is applied.
An embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for dynamically deploying GPU resources, comprising: acquiring historical service data corresponding to specified models in a system to be matched, wherein the historical service data comprises statistical termination days corresponding to the historical service data, historical dates, working day states corresponding to the historical dates and service request amounts located on the day before the historical dates, respectively, the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of all models in the system to be matched; predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on a day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data; calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date; acquiring monitoring data of the specified model on a statistical termination day corresponding to the historical service data; calculating a business effect score of the appointed model corresponding to the appointed date according to the response efficiency of the appointed model corresponding to the appointed date and the monitoring data of the statistical termination date; and controlling the number of service containers through the container cluster according to the service effect scores corresponding to the specified model, and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
According to the computer readable storage medium, an application layer resource monitoring function module is designed at the service application layer; through this module, the deployment state of the GPU resources is comprehensively and dynamically adjusted according to the service priority and the response efficiency of each model, thereby achieving a reasonable distribution of the GPU resources and meeting the requirement of running each service efficiently.
In one embodiment, the step of calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date by the processor includes: acquiring a response time threshold corresponding to the specified model; and calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date and the response time threshold through a first calculation formula, where the first calculation formula is P= (Tm-T)/Tm, P represents the response efficiency, P belongs to (0, 1), Tm represents the response time threshold, Tm belongs to (0, 1), T represents the response time of the specified date, and T belongs to (0, 1).
In one embodiment, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the step of calculating, by the processor, a service effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination date includes: calculating a GPU load state quantity according to the graphics card usage rate, the GPU usage rate, and the temperature duty ratio through a second calculation formula, where the second calculation formula is F= (a×Wa+b×Wb+c×Wc)/(Wa+Wb+Wc), F represents the GPU load state quantity, a represents the graphics card usage rate, a belongs to (0, 1), b represents the GPU usage rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), Wa represents the weight corresponding to the graphics card usage rate, Wb represents the weight corresponding to the GPU usage rate, Wc represents the weight corresponding to the temperature duty ratio, and Wa, Wb, and Wc are non-zero real numbers; acquiring a preset priority corresponding to the specified model; and calculating a service effect score corresponding to the specified model through a third calculation formula according to the GPU load state quantity, the preset priority, and the response efficiency, where the third calculation formula is Y= (P×Wp+U×Wu)/F, Y represents the service effect score, U represents the preset priority, U belongs to (0, 1), Wp represents the weight corresponding to the response efficiency, Wu represents the weight corresponding to the preset priority, and Wp and Wu are non-zero real numbers.
In one embodiment, the step of dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date by controlling the number of service containers through the container cluster according to the business effect score corresponding to the specified model includes: acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and the capacity expansion threshold and the capacity contraction threshold are non-zero real numbers; comparing the numerical relation between the business effect score and the capacity expansion threshold and the capacity contraction threshold respectively; and according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
In one embodiment, the step of dynamically adjusting the GPU resource occupancy ratio corresponding to the specified model by the processor according to the numerical relation and by controlling the number of service containers through the container cluster includes: judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value; if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the appointed model by creating an appointed service container, and if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold; if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
In one embodiment, the step of predicting, by the processor, a response time of a specified date corresponding to the specified model according to each of the history dates, a working day status corresponding to each of the history dates, and a service request amount on a day before each of the history dates, includes: forming a training set of the XGBoost model by the historical dates, working day states corresponding to the historical dates and service request quantity on the day before the historical dates; training the XGBoost model under an objective function by utilizing a training set of the XGBoost model; judging whether an objective function of the XGBoost model is converged or not; if yes, the response time of the historical service data statistics termination day of the appointed model is input into the XGBoost model; and acquiring response time of the XGBoost model on the expiration date according to historical service data statistics of the specified model, and predicting the obtained response time of the specified model on the specified date.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in embodiments may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual speed data rate SDRAM (SSRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims (9)

1. A method for dynamically deploying GPU resources, comprising:
acquiring historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises a statistical termination day corresponding to the historical service data, historical dates, working day states respectively corresponding to the historical dates, and service request amounts on the day before each historical date, the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of the models in the system to be matched;
predicting a response time of a specified date corresponding to the specified model according to each historical date, the working day state corresponding to each historical date, and the service request amount on the day before each historical date, wherein the specified date is the date adjacent to and chronologically after the statistical termination day corresponding to the historical service data;
calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
acquiring monitoring data of the specified model on a statistical termination day corresponding to the historical service data;
calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day;
according to the business effect score corresponding to the specified model, controlling the number of service containers through a container cluster to dynamically match the GPU resource duty ratio of the specified model on the specified date;
wherein the monitoring data comprises a graphics card usage, a GPU usage and a temperature ratio, and the step of calculating the business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day comprises:
calculating a GPU load state quantity from the graphics card usage, the GPU usage and the temperature ratio according to a second calculation formula, wherein the second calculation formula is F = (a·Wa + b·Wb + c·Wc) / (Wa + Wb + Wc), where F represents the GPU load state quantity, a represents the graphics card usage and a belongs to (0, 1], b represents the GPU usage and b belongs to (0, 1], c represents the temperature ratio and c belongs to (0, 1], Wa represents the weight corresponding to the graphics card usage, Wb represents the weight corresponding to the GPU usage, Wc represents the weight corresponding to the temperature ratio, and Wa, Wb and Wc are non-zero real numbers;
acquiring a preset priority corresponding to the specified model;
calculating the business effect score corresponding to the specified model from the GPU load state quantity, the preset priority and the response efficiency according to a third calculation formula, wherein the third calculation formula is Y = (P·Wp + U·Wu) / F, where Y represents the business effect score, P represents the response efficiency, U represents the preset priority and U belongs to (0, 1], Wp represents the weight corresponding to the response efficiency, Wu represents the weight corresponding to the preset priority, and Wp and Wu are non-zero real numbers.
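As a non-authoritative reading of the two formulas in this claim, the GPU load state quantity F can be taken as a weighted average of the three monitored quantities, and the business effect score Y as the weighted combination of response efficiency and priority divided by F. The weights and sample values below are illustrative assumptions:

```python
# Hedged reading of the second and third calculation formulas.
# All weights and sample values are illustrative assumptions.

def gpu_load_state(a, b, c, wa, wb, wc):
    """Second formula: F = (a*Wa + b*Wb + c*Wc) / (Wa + Wb + Wc)."""
    return (a * wa + b * wb + c * wc) / (wa + wb + wc)

def business_effect_score(p, u, f, wp, wu):
    """Third formula: Y = (P*Wp + U*Wu) / F."""
    return (p * wp + u * wu) / f

f = gpu_load_state(a=0.8, b=0.6, c=0.5, wa=0.5, wb=0.3, wc=0.2)
y = business_effect_score(p=0.4, u=0.9, f=f, wp=0.7, wu=0.3)
# A heavily loaded GPU (large F) drags the score down, which is what
# later triggers capacity expansion once Y falls below the threshold.
```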
2. The method of dynamically deploying GPU resources according to claim 1, wherein the step of calculating the response efficiency of the specified model for the specified date based on the response time of the specified date comprises:
Acquiring a response time threshold corresponding to the specified model;
calculating the response efficiency of the specified model corresponding to the specified date from the response time of the specified date and the response time threshold according to a first calculation formula, wherein the first calculation formula is P = (Tm - T) / Tm, where P represents the response efficiency and P belongs to (0, 1), Tm represents the response time threshold and Tm belongs to (0, 1), and T represents the response time of the specified date and T belongs to (0, 1).
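A minimal sketch of the first calculation formula: the response efficiency is the fraction of the response-time budget Tm left after the predicted response time T is spent. The sample values are illustrative assumptions:

```python
# Minimal sketch of the first calculation formula; sample values are
# illustrative assumptions.

def response_efficiency(t, tm):
    """P = (Tm - T) / Tm, with 0 < T < Tm so that P lies in (0, 1)."""
    return (tm - t) / tm

p = response_efficiency(t=0.3, tm=0.5)  # 40% of the budget left
```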
3. The method for dynamically deploying GPU resources according to claim 1, wherein the step of dynamically matching the GPU resource duty ratio of the specified model to the specified date by controlling the number of service containers through the container cluster according to the business effect score corresponding to the specified model comprises:
acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and the capacity expansion threshold and the capacity contraction threshold are non-zero real numbers;
comparing the numerical relation between the business effect score and the capacity expansion threshold and the capacity contraction threshold respectively;
and according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
4. A method for dynamically deploying GPU resources according to claim 3, wherein the step of dynamically adjusting the GPU resource duty ratio corresponding to the specified model by controlling the number of service containers through a container cluster according to the numerical relationship comprises:
judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value;
if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the specified model by creating a specified service container; if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold;
if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
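The expansion/contraction decision of claims 3 and 4 can be sketched as a simple three-way comparison. The threshold values and action names below are illustrative assumptions, and the actual creation or destruction of service containers would be delegated to the container cluster's API:

```python
# Hedged sketch of the three-way scaling decision; thresholds and action
# names are illustrative assumptions.

def scaling_action(score, expand_threshold, shrink_threshold):
    """Expansion threshold < contraction threshold, both non-zero."""
    if score < expand_threshold:
        return "create_container"    # raise the model's GPU resource share
    if score > shrink_threshold:
        return "destroy_container"   # lower the model's GPU resource share
    return "no_change"               # leave the GPU resource share as is

action = scaling_action(score=0.25, expand_threshold=0.3,
                        shrink_threshold=0.8)
```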
5. The method for dynamically deploying GPU resources according to claim 1, wherein the step of predicting the response time of the specified date corresponding to the specified model according to each of the history dates, the weekday status corresponding to each of the history dates, and the service request amount on the day before each of the history dates comprises:
forming a training set for the XGBoost model from the historical dates, the working day states corresponding to the historical dates, and the service request amounts on the day before each historical date;
training the XGBoost model under an objective function by utilizing a training set of the XGBoost model;
judging whether the objective function of the XGBoost model has converged;
if so, inputting the data of the statistical termination day of the historical service data of the specified model into the XGBoost model; and
acquiring the response time output by the XGBoost model for the statistical termination day of the historical service data as the predicted response time of the specified model on the specified date.
6. An apparatus for dynamically deploying GPU resources, comprising:
a first acquisition module, configured to acquire historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises a statistical termination day, historical dates, working day states respectively corresponding to the historical dates, and service request amounts on the day before each historical date, the specified model is any one of the models in the system to be matched, and the models in the system to be matched share GPU resources;
a prediction module, configured to predict the response time of the specified date corresponding to the specified model according to each historical date, the working day state corresponding to each historical date, and the service request amount on the day before each historical date, wherein the specified date is the date adjacent to and chronologically after the statistical termination day corresponding to the historical service data;
a first calculation module, configured to calculate the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
a second acquisition module, configured to acquire the monitoring data of the specified model on the statistical termination day corresponding to the historical service data;
a second calculation module, configured to calculate the business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day; and
a matching module, configured to control the number of service containers through the container cluster according to the business effect score corresponding to the specified model, so as to dynamically match the GPU resource duty ratio of the specified model on the specified date.
7. The apparatus for dynamically deploying GPU resources according to claim 6, wherein the first computing module comprises:
the first acquisition unit is used for acquiring a response time threshold value corresponding to the specified model;
a first calculation unit, configured to calculate the response efficiency of the specified model corresponding to the specified date from the response time of the specified date and the response time threshold according to a first calculation formula, wherein the first calculation formula is P = (Tm - T) / Tm, where P represents the response efficiency and P belongs to (0, 1), Tm represents the response time threshold and Tm belongs to (0, 1), and T represents the response time of the specified date and T belongs to (0, 1).
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
CN202011538689.5A 2020-12-23 2020-12-23 Method and device for dynamically deploying GPU resources and computer equipment Active CN112559191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011538689.5A CN112559191B (en) 2020-12-23 2020-12-23 Method and device for dynamically deploying GPU resources and computer equipment

Publications (2)

Publication Number Publication Date
CN112559191A CN112559191A (en) 2021-03-26
CN112559191B true CN112559191B (en) 2023-04-25

Family

ID=75030960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011538689.5A Active CN112559191B (en) 2020-12-23 2020-12-23 Method and device for dynamically deploying GPU resources and computer equipment

Country Status (1)

Country Link
CN (1) CN112559191B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568741A (en) * 2021-07-19 2021-10-29 咪咕文化科技有限公司 Service expansion and contraction method, device, equipment and storage medium of distributed system

Citations (1)

Publication number Priority date Publication date Assignee Title
CN109766182A (en) * 2018-12-18 2019-05-17 平安科技(深圳)有限公司 The scalable appearance method, apparatus of system resource dynamic, computer equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
EP2887219A1 (en) * 2013-12-23 2015-06-24 Deutsche Telekom AG System and method for mobile augmented reality task scheduling
CN107077385B (en) * 2014-09-10 2019-10-25 亚马逊技术公司 For reducing system, method and the storage medium of calculated examples starting time
CN106549772B (en) * 2015-09-16 2019-11-19 华为技术有限公司 Resource prediction method, system and capacity management device
US10942776B2 (en) * 2016-09-21 2021-03-09 Accenture Global Solutions Limited Dynamic resource allocation for application containers
CN109714395B (en) * 2018-12-10 2021-10-26 平安科技(深圳)有限公司 Cloud platform resource use prediction method and terminal equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant