CN112559191B - Method and device for dynamically deploying GPU resources and computer equipment - Google Patents


Info

Publication number
CN112559191B
CN112559191B (application CN202011538689.5A)
Authority
CN
China
Prior art keywords: date, specified, model, historical, service
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202011538689.5A
Other languages
Chinese (zh)
Other versions
CN112559191A (en)
Inventor
孙浩鑫
王晟宇
赖众程
李会璟
李骁
Current Assignee
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202011538689.5A
Publication of CN112559191A
Application granted
Publication of CN112559191B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/505 Allocation of resources to service a request, the resource being a machine, considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/50 Indexing scheme relating to G06F9/50
    • G06F 2209/5021 Priority
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/50 Indexing scheme relating to G06F9/50
    • G06F 2209/508 Monitor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to the field of big data and discloses a method for dynamically deploying GPU resources, which comprises the following steps: acquiring historical service data corresponding to a specified model in a system to be matched; predicting the response time of a specified date corresponding to the specified model according to each historical date, the working day state corresponding to each historical date, and the service request amount of the day before each historical date; calculating the response efficiency of the specified model corresponding to the specified date according to the predicted response time; acquiring monitoring data of the specified model on the statistical termination day corresponding to the historical service data; calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency and the monitoring data of the statistical termination day; and controlling the number of service containers through the container cluster according to the business effect score, thereby dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date. By dynamically adjusting the deployment state of the GPU resources in this way, the GPU resources are allocated reasonably.

Description

Method and device for dynamically deploying GPU resources and computer equipment
Technical Field
The present application relates to the field of big data, and in particular, to a method, an apparatus, and a computer device for dynamically deploying GPU resources.
Background
With the rapid growth of internet services, service access volume and data traffic increase rapidly, and the demand for system computing resources grows accordingly; the deployment mode of the GPU graphics card, a key resource for application computing, directly influences how services progress. With AI engineering still at a primary stage, the deployment of GPU resources in AI application computation generally depends on manual adjustment: the deployment is solidified, must be manually expanded when application access traffic increases, and easily wastes GPU resources when the traffic falls again. The deployment scheme of GPU resources therefore cannot be adjusted in real time according to dynamic changes of application access traffic, excess computing resources cannot be released in time, and the service matching requirement cannot be met.
Disclosure of Invention
The main purpose of the application is to provide a method for dynamically deploying GPU resources, which aims to solve the technical problem that GPU resources cannot be adjusted in real time according to dynamic changes of application access traffic.
The application provides a method for dynamically deploying GPU resources, which comprises the following steps:
acquiring historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises the statistical termination day corresponding to the historical service data, each historical date, the working day state corresponding to each historical date, and the service request amount of the day before each historical date; the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of all models in the system to be matched;
Predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on a day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data;
calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
acquiring monitoring data of the specified model on a statistical termination day corresponding to the historical service data;
calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day;
and controlling the number of service containers through the container cluster according to the service effect score corresponding to the specified model, and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
Preferably, the step of calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date includes:
Acquiring a response time threshold corresponding to the specified model;
calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date and the response time threshold through a first calculation formula, wherein the first calculation formula is P = (Tm - T)/Tm, where P represents the response efficiency, P ∈ (0, 1); Tm represents the response time threshold, Tm ∈ (0, 1); and T represents the response time of the specified date, T ∈ (0, 1).
Preferably, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the step of calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination date includes:
calculating a GPU load state quantity according to the display card utilization rate, the GPU utilization rate and the temperature duty ratio through a second calculation formula, wherein the second calculation formula is F= (a is Wa+b is Wb+c is Wc)/(Wa+Wb+wc), F represents the GPU load state quantity, a represents the display card utilization rate, a belongs to (0, 1, b represents the GPU utilization rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), wa represents the weight corresponding to the display card utilization rate, wb represents the weight corresponding to the GPU utilization rate, wc represents the weight corresponding to the temperature duty ratio, and Wa, wb and Wc are non-zero real numbers;
Acquiring a preset priority corresponding to the specified model;
and calculating a service effect score corresponding to the specified model according to the GPU load state quantity, the preset priority and the response efficiency through a third calculation formula, wherein the third calculation formula is Y = (P·Wp + U·Wu)/F, where Y represents the service effect score; U represents the preset priority, U ∈ (0, 1); Wp represents the weight corresponding to the response efficiency; Wu represents the weight corresponding to the preset priority; and Wp and Wu are non-zero real numbers.
Preferably, the step of dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date by controlling the number of service containers through the container cluster according to the service effect score corresponding to the specified model includes:
acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and the capacity expansion threshold and the capacity contraction threshold are non-zero real numbers;
comparing the numerical relation between the business effect score and the capacity expansion threshold and the capacity contraction threshold respectively;
and according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
Preferably, the step of dynamically adjusting the GPU resource duty ratio corresponding to the specified model according to the numerical relation by controlling the number of service containers through a container cluster includes:
judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value;
if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the specified model by creating a specified service container, and if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold;
if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
Preferably, the step of predicting the response time of the specified date corresponding to the specified model according to each of the history dates, the working day status corresponding to each of the history dates, and the service request amount on the day before each of the history dates, includes:
forming a training set of the XGBoost model by the historical dates, working day states corresponding to the historical dates and service request quantity on the day before the historical dates;
Training the XGBoost model under an objective function by utilizing a training set of the XGBoost model;
judging whether an objective function of the XGBoost model is converged or not;
if so, inputting the data of the specified model on the statistical termination day of the historical service data into the trained XGBoost model;
and taking the output of the XGBoost model for that input as the predicted response time of the specified model on the specified date.
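As a sketch, the training set described in the steps above can be assembled as follows. The feature encoding (weekday plus day of month) is an assumption, since the patent does not specify how dates are encoded, and the sample rows are hypothetical; in practice the resulting matrix would be fed to an XGBoost regressor.

```python
from datetime import date

def build_training_set(records):
    """Assemble (features, labels) for the response-time model.

    records: iterable of (history_date, workday_state, prev_day_requests,
    response_time) tuples, mirroring the fields named above. workday_state
    is 0 for a working day and 1 otherwise, as in the embodiment.
    """
    features, labels = [], []
    for day, workday, prev_requests, response_time in records:
        # Date encoding (weekday + day of month) is illustrative only.
        features.append([day.weekday(), day.day, workday, prev_requests])
        labels.append(response_time)
    return features, labels

# Hypothetical rows in the shape of Table 1.
rows = [
    (date(2020, 9, 1), 0, 1200, 0.15),
    (date(2020, 9, 2), 0, 1350, 0.17),
]
X, y = build_training_set(rows)
```

With a real data set, `X` and `y` would then be passed to an XGBoost regressor's `fit` method and training continued until the objective function converges, as the steps describe.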
The application also provides a device for dynamically deploying GPU resources, which comprises:
the first acquisition module is used for acquiring historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises a statistical termination day, each historical date, a working day state corresponding to each historical date, and a service request amount of the day before each historical date; the specified model is any one of all models in the system to be matched, and the plurality of models in the system to be matched share the GPU resources;
the prediction module is used for predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on the day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data;
The first calculation module is used for calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
the second acquisition module is used for acquiring the monitoring data of the specified model on the statistical termination day corresponding to the historical service data;
the second calculation module is used for calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day;
and the matching module is used for dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date by controlling the number of service containers through the container cluster according to the service effect score corresponding to the specified model.
The present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method.
According to the method and the device, an application layer resource monitoring function module is designed at the service application layer, and the deployment state of the GPU resources is comprehensively and dynamically adjusted according to the service priority and the response efficiency of each model, so that GPU resources are allocated reasonably and each service runs efficiently.
Drawings
FIG. 1 is a flow chart of a method for dynamically deploying GPU resources according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a system flow for dynamically deploying GPU resources according to one embodiment of the present application;
FIG. 3 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Referring to fig. 1, a method for dynamically deploying GPU resources according to an embodiment of the present application includes:
S1: acquiring historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises the statistical termination day corresponding to the historical service data, each historical date, the working day state corresponding to each historical date, and the service request amount of the day before each historical date; the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of all models in the system to be matched;
S2: predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on a day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data;
S3: calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
S4: acquiring monitoring data of the specified model on the statistical termination day corresponding to the historical service data;
S5: calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day;
S6: and controlling the number of service containers through the container cluster according to the service effect score corresponding to the specified model, and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
In this embodiment of the application, an application layer resource management function module is provided to analyze in real time the running states of the plurality of models sharing GPU resources in the system to be matched, and to adjust each model's share of the GPU resources in real time. The application layer resource management function module comprises an operation effect scoring function block, a GPU resource management function block, a response time prediction function block and a dynamic resource adjustment function block. The dynamic resource adjustment function block collects real-time data from the operation effect scoring, GPU resource management and response time prediction function blocks, performs logic analysis, and transmits the analysis result to the Kubernetes scheduling function block of the Kubernetes container cluster, which adjusts the GPU resource duty ratio of each model in the GPU resource pool in real time. The adjusted GPU resource pool feeds adjustment service information back to the operation effect scoring function block in real time through service monitoring and hardware monitoring, thereby realizing real-time dynamic adjustment of the GPU resource allocation of each model.
The GPU resource management function block maintains a GPU resource pool, ensures that multiple models run smoothly at the same time, and allocates suitable GPU resources to the Kubernetes cluster as far as possible. The response time prediction function block predicts the response time of a future day from the historical response time data of the model. The dynamic resource adjustment function block allocates more GPU resources to tasks that urgently need resources and reduces the GPU resources of tasks that are neither important nor urgent. The Kubernetes scheduling function block calls the Kubernetes API to perform server resource monitoring and resource matching operations. Kubernetes can realize HPA (Horizontal Pod Autoscaling) based on the current container service conditions, and through the HPA settings Kubernetes provides the elastic scaling function for containers. For a container cluster in Kubernetes, the HPA can implement many automated behaviors: for example, when the traffic load in the container cluster increases, new service containers can be created to keep the service system running stably, and when the traffic load decreases, some service containers can be destroyed to reduce resource waste. Current elastic scaling indicators include: CPU, memory, concurrency, and packet transfer size.
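The HPA behavior described above can be stated concretely; per the Kubernetes documentation, the controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue):

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Kubernetes HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).
    """
    return math.ceil(current_replicas * current_metric / target_metric)
```

So if the observed load per metric doubles, the replica count doubles (containers are created); if it halves, the count shrinks (containers are destroyed), which is the expansion/reduction behavior the text relies on.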
The service monitoring and hardware monitoring feedback services feed back the service state of the deep model, the health state of the GPU resources, the allocation state of the GPU resources and the utilization efficiency of the GPU card to the resource monitoring service, and record the monitoring data of each feedback.
The operation effect scoring function block comprehensively scores the operation effect of each model in its service scenario according to the model's response time and the load condition of the hardware resources, which reflects each model's running state more concretely. The response time prediction function block predicts, from the historical response time data of each model, the model's response time on the date immediately following the statistical termination day, providing important reference data on each model's future operation effect so that capacity expansion or reduction of each model's GPU resources can be planned in a more timely manner. The future date here refers to the day after the statistical termination day. In this embodiment of the application, the application layer resource management function module is decoupled from the Kubernetes resource scheduling service module, which reduces the dependence between function modules and increases the extensibility and maintainability of the resource adjustment mechanism.
In this embodiment of the application, the historical service data includes the statistical termination day corresponding to the historical service data, each historical date, the working day state corresponding to each historical date, and the service request amount of the day before each historical date. The working day state indicates whether the date is a working day: a working day is marked as 0, otherwise 1. The prior-day service request amount is the request volume of the day immediately before a given historical date, and the historical service data covers consecutive dates, as shown in Table 1 below. The historical dates in Table 1 run from 2020/9/1 to 2020/9/15, the statistical termination day corresponding to the historical service data is 2020/9/15, and the specified date is the adjacent date after the statistical termination day in time sequence, namely 2020/9/16.
TABLE 1
[Table 1 is an image in the source and is not reproduced; per the text, it lists the historical dates 2020/9/1 through 2020/9/15 with their working day states and prior-day service request amounts.]
According to the method and the device, an application layer resource monitoring function module is designed at the service application layer, and the deployment state of the GPU resources is comprehensively and dynamically adjusted according to the service priority and the response efficiency of each model, so that GPU resources are allocated reasonably and each service runs efficiently.
Further, the step S3 of calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date includes:
S31: acquiring a response time threshold corresponding to the specified model;
S32: calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date and the response time threshold through the first calculation formula P = (Tm - T)/Tm, where P represents the response efficiency, P ∈ (0, 1); Tm represents the response time threshold, Tm ∈ (0, 1); and T represents the response time of the specified date, T ∈ (0, 1).
In this embodiment of the application, the response efficiency of the model is calculated as (response time threshold - response time)/response time threshold. The response efficiency of a model directly influences its operation effect score, and in turn the allocation of GPU resources, so that the allocation of GPU resources is kept in a normal state according to the response efficiency of each model. The response time threshold in this example is 0.22, which can be obtained by statistical analysis. If the response time exceeds the response time threshold, the response efficiency is set to 0.1.
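A sketch of the first calculation formula, using the embodiment's 0.22 threshold and 0.1 over-threshold floor as defaults:

```python
def response_efficiency(response_time, threshold=0.22):
    """First calculation formula: P = (Tm - T) / Tm.

    threshold (Tm) defaults to the 0.22 value of this example; if the
    response time exceeds the threshold, P is set to 0.1 as described.
    """
    if response_time > threshold:
        return 0.1
    return (threshold - response_time) / threshold
```

For example, a predicted response time of 0.11 against the 0.22 threshold yields a response efficiency of 0.5.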
Further, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the step S5 of calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination date includes:
S51: calculating a GPU load state quantity according to the display card utilization rate, the GPU utilization rate and the temperature duty ratio through a second calculation formula, wherein the second calculation formula is F= (a is Wa+b is Wb+c is Wc)/(Wa+Wb+wc), F represents the GPU load state quantity, a represents the display card utilization rate, a belongs to (0, 1, b represents the GPU utilization rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), wa represents the weight corresponding to the display card utilization rate, wb represents the weight corresponding to the GPU utilization rate, wc represents the weight corresponding to the temperature duty ratio, and Wa, wb and Wc are non-zero real numbers;
S52: acquiring a preset priority corresponding to the specified model;
S53: calculating a service effect score corresponding to the specified model according to the GPU load state quantity, the preset priority and the response efficiency through the third calculation formula Y = (P·Wp + U·Wu)/F, where Y represents the service effect score; U represents the preset priority, U ∈ (0, 1); Wp represents the weight corresponding to the response efficiency; Wu represents the weight corresponding to the preset priority; and Wp and Wu are non-zero real numbers.
In this embodiment of the application, the service effect score corresponding to the model is calculated from the GPU load state quantity, the preset priority and the response efficiency so as to evaluate the running state accurately. The GPU load state quantity depends on the graphics card (video memory) usage rate, the GPU usage rate and the temperature duty ratio. The video memory usage rate and the GPU usage rate are both hardware monitoring data, taken as statistical averages over a period of time, and do not exceed the value 1. The temperature duty ratio equals the GPU hardware temperature divided by the GPU hardware temperature threshold and is a non-zero real number between 0 and 1. The preset priority is set by the service layer according to the urgency of the service, with 1 the highest priority; in this embodiment the priority is 0.6. The weight values can be obtained by experiment for specific service scenarios; for example, in this embodiment of the application, Wa is 2, Wb is 3, Wc is 1, Wp is 6, and Wu is 4. All parameter values except the business effect score are non-zero real numbers between 0 and 1, and any value exceeding 1 is counted as 1. The higher the business effect score, the better the business effect; the score is out of 10, and any value above 10 is set to 10.
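The second and third calculation formulas, taking the embodiment's example weights (Wa=2, Wb=3, Wc=1, Wp=6, Wu=4) as defaults, can be sketched as:

```python
def gpu_load_state(a, b, c, wa=2, wb=3, wc=1):
    """Second formula: F = (a*Wa + b*Wb + c*Wc) / (Wa + Wb + Wc).

    a: graphics card (video memory) usage, b: GPU usage,
    c: temperature duty ratio; weights default to the example values.
    """
    return (a * wa + b * wb + c * wc) / (wa + wb + wc)

def business_effect_score(p, u, f, wp=6, wu=4):
    """Third formula: Y = (P*Wp + U*Wu) / F, out of 10 and capped at 10."""
    return min((p * wp + u * wu) / f, 10.0)
```

With the worked numbers used later in the text (P = 0.36, U = 0.6, F = 0.8), this gives Y = 5.7.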
Further, the step S6 of dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date according to the service effect score corresponding to the specified model by controlling the number of service containers through the container cluster includes:
S61: acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and both thresholds are non-zero real numbers;
S62: comparing the numerical relation of the business effect score with the capacity expansion threshold and the capacity contraction threshold respectively;
S63: according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
According to the embodiment of the application, the relation between the service effect score and the preset capacity expansion and capacity reduction thresholds determines how the GPU resource duty ratio corresponding to the specified model is dynamically adjusted, namely whether the GPU resource duty ratio is expanded or reduced.
Further, the step S63 of dynamically adjusting the GPU resource duty ratio corresponding to the specified model according to the numerical relationship by controlling the number of service containers through the container cluster includes:
s631: judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value;
S632: if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the appointed model by creating an appointed service container, and if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold;
s633: if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
According to the embodiment of the application, the GPU resources are dynamically adjusted by comparing the service effect score with the preset capacity expansion and capacity reduction thresholds. The adjustment rule is as follows: if the service effect score is smaller than the capacity expansion threshold, GPU resources are added to the model; if the service effect score is larger than the capacity reduction threshold, GPU resources are reduced for the model; if the service effect score lies between the capacity expansion threshold and the capacity reduction threshold, no GPU resource adjustment is performed. For example, in the embodiment of the present application, the service effect score = (0.36×6+0.6×4)/0.8 = 5.7, the capacity expansion threshold is 7, and the capacity reduction threshold is 8; since the service effect score is smaller than the expansion threshold, GPU resources are added to the model of this embodiment, that is, the capacity is expanded. Through the capacity expansion of GPU resources, the model runs more smoothly and effectively.
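The adjustment rule of steps S631 to S633 can be sketched as follows. The thresholds default to the embodiment's example values 7 and 8, and the container-cluster operations are represented only by returned action names, since the embodiment does not name a concrete orchestration API:

```python
def scaling_decision(score, expand_threshold=7.0, shrink_threshold=8.0):
    """Map a business effect score to a container-cluster action.

    A score below the expansion threshold means the service is under
    pressure, so a service container is created (GPU ratio grows); a
    score above the reduction threshold means resources are idle, so a
    container is destroyed (GPU ratio shrinks); otherwise nothing changes.
    """
    if score < expand_threshold:
        return "create_container"   # expand GPU resource ratio
    if score > shrink_threshold:
        return "destroy_container"  # shrink GPU resource ratio
    return "no_change"

# The embodiment's example: score 5.7 < expansion threshold 7, so expand
action = scaling_decision(5.7)
```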
Further, a step S2 of predicting a response time of a specified date corresponding to the specified model according to each of the history dates, a working day status corresponding to each of the history dates, and a service request amount located on a day before each of the history dates, includes:
s21: forming a training set of the XGBoost model by the historical dates, working day states corresponding to the historical dates and service request quantity on the day before the historical dates;
s22: training the XGBoost model under an objective function by utilizing a training set of the XGBoost model;
s23: judging whether an objective function of the XGBoost model is converged or not;
s24: if yes, inputting the response time of the specified model on the statistical termination day of the historical service data into the XGBoost model;
s25: acquiring the output of the XGBoost model for the response time of the statistical termination day, obtaining the predicted response time of the specified model on the specified date.
In the embodiment of the application, the future trend of model data is predicted from the historical service data, so that the load condition of the model is predicted and an allocation plan for the GPU resources occupied by the model, namely capacity expansion or capacity reduction, is obtained. According to the embodiment of the application, training data for an XGBoost (Extreme Gradient Boosting) model used for time-series prediction are formed by collecting and processing the historical service data. For example, for the data in Table 1, each history date, the working day status corresponding to each history date, and the service request amount of the previous day are recorded as x1, x2, and x3, and the response time of the service request is recorded as y. The XGBoost model is trained with these training data, so that the trained model can predict future response time from historical service data and thereby predict the load condition, in order to determine whether to expand or reduce the GPU resources. In the embodiment of the application, for simplicity, x3 is normalized. The normalization formula is as follows:
Xnorm = (X - Xmin) / (Xmax - Xmin)
wherein Xnorm represents the normalized value of x3, X represents the x3 value to be normalized, and Xmin and Xmax are respectively the minimum and the maximum of all x3 values.
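A minimal sketch of this min-max normalization; the request counts below are hypothetical sample values:

```python
def min_max_normalize(values):
    """Scale each value to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

requests = [39064, 21000, 52000]  # hypothetical previous-day request counts
normalized = min_max_normalize(requests)
```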
According to the embodiment of the application, for the AI application scenario, the objective function of the XGBoost model is set as follows:

Obj = Σi l(yi, ŷi) + Σk Ω(fk)

wherein yi is the true value, ŷi is the predicted value, and ŷi = Σk fk(xi) is the accumulated output of the entire model. The objective function is divided into two parts: a loss function l, which reveals the model training error, i.e. the difference between the predicted and true values; and a regularization term Ω, a function representing the complexity of the trees, where the smaller its value, the lower the complexity and the stronger the generalization capability. Its expression is

Ω(f) = γT + (λ/2) Σj ωj²

wherein T represents the number of leaf nodes, γ controls the number of leaf nodes, and ω represents the score of a leaf node. The training targets are that the prediction error is as small as possible, the number of leaf nodes T is as small as possible, and the leaf node values ω are as low as possible; that is, λ controls the leaf node scores so that they do not become too large, thereby preventing overfitting. According to the embodiment of the application, through repeated iterative training, the optimal parameters of the XGBoost model obtained at training convergence are: learning_rate: 0.085; n_estimators: 500; max_depth: 5; min_child_weight: 1; subsample: 0.75; colsample_bytree: 0.8; gamma: 0; reg_alpha: 0; reg_lambda: 1.
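For reference, the converged hyperparameters listed above correspond one-to-one to keyword arguments of xgboost's scikit-learn style regressor; a sketch of collecting them (constructing the actual model, e.g. xgboost.XGBRegressor(**XGB_PARAMS), is assumed and not performed here):

```python
# Hyperparameters reported at training convergence in this embodiment.
# They match xgboost.XGBRegressor keyword arguments; xgboost itself is
# assumed to be installed and is deliberately not imported in this sketch.
XGB_PARAMS = {
    "learning_rate": 0.085,
    "n_estimators": 500,
    "max_depth": 5,
    "min_child_weight": 1,
    "subsample": 0.75,
    "colsample_bytree": 0.8,
    "gamma": 0,
    "reg_alpha": 0,
    "reg_lambda": 1,
}
```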
For example, the XGBoost model trained on the historical service data of September 1 to 15, 2020 in Table 1 above predicts a model response time of 0.3 s for September 16, 2020; the specific result is shown in Table 2 below. The predicted response time is then substituted into the first calculation formula and, combined with the second and third calculation formulas, the service effect score for September 16, 2020 is calculated, so that the GPU resource ratio can be adjusted according to the service effect score.
TABLE 2

Date        Working day    Previous-day requests    RT (s)
2020/9/16   0              39064                    0.30
Referring to fig. 2, an apparatus for dynamically deploying GPU resources according to an embodiment of the present application includes:
a first obtaining module 1, configured to obtain historical service data corresponding to a specified model in a system to be matched, where the historical service data includes a statistical termination day corresponding to the historical service data, each historical date, a working day state corresponding to each historical date, and a service request amount located on a day before each historical date, the system to be matched includes multiple models sharing GPU resources, and the specified model is any one of all models in the system to be matched;
a prediction module 2, configured to predict a response time of a specified date corresponding to the specified model according to each of the historical dates, a working day state corresponding to each of the historical dates, and a service request amount located on a day before each of the historical dates, where the specified date is a date adjacent to a statistical termination day corresponding to the historical service data and located after a time sequence of the statistical termination day corresponding to the historical service data;
A first calculation module 3, configured to calculate response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
the second obtaining module 4 is configured to obtain monitoring data of the specified model on a statistical termination day corresponding to the historical service data;
a second calculation module 5, configured to calculate a business effect score corresponding to the specified date by the specified model according to the response efficiency corresponding to the specified date by the specified model and the monitoring data of the statistical termination date;
and the matching module 6 is used for controlling the number of service containers through the container cluster according to the service effect scores corresponding to the specified model and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
The relevant explanation of the embodiments of the present application refers to the corresponding method parts and is not repeated.
Further, the first computing module 3 includes:
the first acquisition unit is used for acquiring a response time threshold value corresponding to the specified model;
a first calculation unit configured to calculate, according to the response time of the specified date and the response time threshold, the response efficiency of the specified model corresponding to the specified date through a first calculation formula, where the first calculation formula is P= (Tm-T)/Tm, P represents the response efficiency, P belongs to (0, 1), Tm represents the response time threshold, Tm belongs to (0, 1), T represents the response time of the specified date, and T belongs to (0, 1).
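A minimal sketch of this first calculation formula; the threshold and response-time values used in the example are hypothetical:

```python
def response_efficiency(t, t_max):
    """First calculation formula: P = (Tm - T) / Tm.

    The closer the response time t gets to the threshold t_max, the lower
    the efficiency: t == t_max yields 0, while t == 0 yields 1.
    """
    return (t_max - t) / t_max

# Hypothetical values: threshold 0.5 s, predicted response time 0.3 s
p = response_efficiency(0.3, 0.5)
```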
Further, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the second computing module 5 includes:
the second calculation unit is configured to calculate a GPU load state quantity according to the graphics card usage rate, the GPU usage rate and the temperature duty ratio according to a second calculation formula, where the second calculation formula is f= (a×wa+b×wb+c×wc)/(wa+wb+wc), F represents the GPU load state quantity, a represents the graphics card usage rate, a belongs to (0, 1), b represents the GPU usage rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), wa represents a weight corresponding to the graphics card usage rate, wb represents a weight corresponding to the GPU usage rate, wc represents a weight corresponding to the temperature duty ratio, wa, wb and Wc are non-zero real numbers;
the second acquisition unit is used for acquiring a preset priority corresponding to the specified model;
and a third calculation unit, configured to calculate, according to the GPU load state quantity, the preset priority, and the response efficiency, a service effect score corresponding to the specified model through a third calculation formula, where the third calculation formula is Y= (P×Wp+U×Wu)/F, Y represents the service effect score, U represents the preset priority, U belongs to (0, 1), Wp represents the weight corresponding to the response efficiency, Wu represents the weight corresponding to the preset priority, and Wp and Wu are non-zero real numbers.
Further, the matching module 6 includes:
the third acquisition unit is used for acquiring a preset capacity expansion threshold and a preset capacity reduction threshold, wherein the capacity expansion threshold is smaller than the capacity reduction threshold, and the capacity expansion threshold and the capacity reduction threshold are non-zero real numbers;
the comparison unit is used for comparing the numerical relation between the business effect scores and the capacity expansion threshold and the capacity contraction threshold respectively;
and the adjusting unit is used for controlling the number of the service containers through the container cluster according to the numerical relation and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
Further, the adjusting unit includes:
the first judging subunit is used for judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value;
the second judging subunit is configured to increase a GPU resource duty ratio corresponding to the specified model by creating a specified service container if the service effect score is smaller than the capacity expansion threshold, and judge whether the numerical relationship is that the service effect score is greater than the capacity reduction threshold if the service effect score is not smaller than the capacity expansion threshold;
and the adjustment subunit is used for reducing the GPU resource duty ratio corresponding to the appointed model by destroying the appointed service container if the service effect score is larger than the capacity reduction threshold, otherwise, not adjusting the GPU resource duty ratio corresponding to the appointed model.
Further, the prediction module 2 includes:
the composition unit is used for composing the history dates, the working day states corresponding to the history dates and the service request quantity which is positioned on the day before the history dates into a training set of the XGBoost model;
the training unit is used for training the XGBoost model under an objective function by utilizing the training set of the XGBoost model;
the judging unit is used for judging whether the objective function of the XGBoost model is converged or not;
the input unit is used for inputting the response time of the specified model on the statistical termination day of the historical service data into the XGBoost model if the objective function converges;
and a fourth obtaining unit, configured to obtain the output of the XGBoost model for the response time of the statistical termination day as the predicted response time of the specified model on the specified date.
Referring to fig. 3, a computer device is further provided in the embodiment of the present application; the computer device may be a server, and its internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and the computer programs in the non-volatile storage medium. The database of the computer device is used to store all the data needed in the process of dynamically deploying GPU resources. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for dynamically deploying GPU resources.
The method for dynamically deploying GPU resources by the processor comprises the following steps: acquiring historical service data corresponding to specified models in a system to be matched, wherein the historical service data comprises statistical termination days corresponding to the historical service data, historical dates, working day states corresponding to the historical dates and service request amounts located on the day before the historical dates, respectively, the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of all models in the system to be matched; predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on a day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data; calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date; acquiring monitoring data of the specified model on a statistical termination day corresponding to the historical service data; calculating a business effect score of the appointed model corresponding to the appointed date according to the response efficiency of the appointed model corresponding to the appointed date and the monitoring data of the statistical termination date; and controlling the number of service containers through the container cluster according to the service effect scores corresponding to the specified model, and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
According to the computer device, an application layer resource monitoring function module is designed at the service application layer; through this module, the deployment state of the GPU resources is comprehensively and dynamically adjusted according to the service priority and the response efficiency of each model, thereby achieving a reasonable distribution of the GPU resources and meeting the requirement of running each service efficiently.
In one embodiment, the step of calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date by the processor includes: acquiring a response time threshold corresponding to the specified model; and calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date and the response time threshold through a first calculation formula, where the first calculation formula is P= (Tm-T)/Tm, P represents the response efficiency, P belongs to (0, 1), Tm represents the response time threshold, Tm belongs to (0, 1), T represents the response time of the specified date, and T belongs to (0, 1).
In one embodiment, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the step of calculating, by the processor, a service effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination date includes: calculating a GPU load state quantity according to the graphics card usage rate, the GPU usage rate, and the temperature duty ratio through a second calculation formula, where the second calculation formula is F= (a×Wa+b×Wb+c×Wc)/(Wa+Wb+Wc), F represents the GPU load state quantity, a represents the graphics card usage rate, a belongs to (0, 1), b represents the GPU usage rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), Wa represents the weight corresponding to the graphics card usage rate, Wb represents the weight corresponding to the GPU usage rate, Wc represents the weight corresponding to the temperature duty ratio, and Wa, Wb, and Wc are non-zero real numbers; acquiring a preset priority corresponding to the specified model; and calculating a service effect score corresponding to the specified model through a third calculation formula according to the GPU load state quantity, the preset priority, and the response efficiency, where the third calculation formula is Y= (P×Wp+U×Wu)/F, Y represents the service effect score, U represents the preset priority, U belongs to (0, 1), Wp represents the weight corresponding to the response efficiency, Wu represents the weight corresponding to the preset priority, and Wp and Wu are non-zero real numbers.
In one embodiment, the step of dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date by controlling the number of service containers through the container cluster according to the business effect score corresponding to the specified model includes: acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and the capacity expansion threshold and the capacity contraction threshold are non-zero real numbers; comparing the numerical relation between the business effect score and the capacity expansion threshold and the capacity contraction threshold respectively; and according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
In one embodiment, the step of dynamically adjusting the GPU resource occupancy ratio corresponding to the specified model by the processor according to the numerical relation and by controlling the number of service containers through the container cluster includes: judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value; if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the appointed model by creating an appointed service container, and if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold; if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
In one embodiment, the step of predicting, by the processor, a response time of a specified date corresponding to the specified model according to each of the history dates, a working day status corresponding to each of the history dates, and a service request amount on a day before each of the history dates, includes: forming a training set of the XGBoost model by the historical dates, working day states corresponding to the historical dates and service request quantity on the day before the historical dates; training the XGBoost model under an objective function by utilizing a training set of the XGBoost model; judging whether an objective function of the XGBoost model is converged or not; if yes, the response time of the historical service data statistics termination day of the appointed model is input into the XGBoost model; and acquiring response time of the XGBoost model on the expiration date according to historical service data statistics of the specified model, and predicting the obtained response time of the specified model on the specified date.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present application and is not intended to limit the computer device to which the present application is applied.
An embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for dynamically deploying GPU resources, comprising: acquiring historical service data corresponding to specified models in a system to be matched, wherein the historical service data comprises statistical termination days corresponding to the historical service data, historical dates, working day states corresponding to the historical dates and service request amounts located on the day before the historical dates, respectively, the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of all models in the system to be matched; predicting response time of a specified date corresponding to the specified model according to each historical date, a working day state corresponding to each historical date and a service request amount located on a day before each historical date, wherein the specified date is a date adjacent to a statistical ending date corresponding to the historical service data and located after a time sequence of the statistical ending date corresponding to the historical service data; calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date; acquiring monitoring data of the specified model on a statistical termination day corresponding to the historical service data; calculating a business effect score of the appointed model corresponding to the appointed date according to the response efficiency of the appointed model corresponding to the appointed date and the monitoring data of the statistical termination date; and controlling the number of service containers through the container cluster according to the service effect scores corresponding to the specified model, and dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date.
According to the computer readable storage medium, an application layer resource monitoring function module is designed at the service application layer; through this module, the deployment state of the GPU resources is comprehensively and dynamically adjusted according to the service priority and the response efficiency of each model, thereby achieving a reasonable distribution of the GPU resources and meeting the requirement of running each service efficiently.
In one embodiment, the step of calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date by the processor includes: acquiring a response time threshold corresponding to the specified model; and calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date and the response time threshold through a first calculation formula, where the first calculation formula is P= (Tm-T)/Tm, P represents the response efficiency, P belongs to (0, 1), Tm represents the response time threshold, Tm belongs to (0, 1), T represents the response time of the specified date, and T belongs to (0, 1).
In one embodiment, the monitoring data includes a graphics card usage rate, a GPU usage rate, and a temperature duty ratio, and the step of calculating, by the processor, a service effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination date includes: calculating a GPU load state quantity according to the graphics card usage rate, the GPU usage rate, and the temperature duty ratio through a second calculation formula, where the second calculation formula is F= (a×Wa+b×Wb+c×Wc)/(Wa+Wb+Wc), F represents the GPU load state quantity, a represents the graphics card usage rate, a belongs to (0, 1), b represents the GPU usage rate, b belongs to (0, 1), c represents the temperature duty ratio, c belongs to (0, 1), Wa represents the weight corresponding to the graphics card usage rate, Wb represents the weight corresponding to the GPU usage rate, Wc represents the weight corresponding to the temperature duty ratio, and Wa, Wb, and Wc are non-zero real numbers; acquiring a preset priority corresponding to the specified model; and calculating a service effect score corresponding to the specified model through a third calculation formula according to the GPU load state quantity, the preset priority, and the response efficiency, where the third calculation formula is Y= (P×Wp+U×Wu)/F, Y represents the service effect score, U represents the preset priority, U belongs to (0, 1), Wp represents the weight corresponding to the response efficiency, Wu represents the weight corresponding to the preset priority, and Wp and Wu are non-zero real numbers.
In one embodiment, the step of dynamically matching the GPU resource duty ratio of the specified model corresponding to the specified date by controlling the number of service containers through the container cluster according to the business effect score corresponding to the specified model includes: acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and the capacity expansion threshold and the capacity contraction threshold are non-zero real numbers; comparing the numerical relation between the business effect score and the capacity expansion threshold and the capacity contraction threshold respectively; and according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
In one embodiment, the step of dynamically adjusting the GPU resource occupancy ratio corresponding to the specified model by the processor according to the numerical relation and by controlling the number of service containers through the container cluster includes: judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value; if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the appointed model by creating an appointed service container, and if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold; if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
In one embodiment, the step of predicting, by the processor, a response time of a specified date corresponding to the specified model according to each of the history dates, a working day status corresponding to each of the history dates, and a service request amount on a day before each of the history dates, includes: forming a training set of the XGBoost model by the historical dates, working day states corresponding to the historical dates and service request quantity on the day before the historical dates; training the XGBoost model under an objective function by utilizing a training set of the XGBoost model; judging whether an objective function of the XGBoost model is converged or not; if yes, the response time of the historical service data statistics termination day of the appointed model is input into the XGBoost model; and acquiring response time of the XGBoost model on the expiration date according to historical service data statistics of the specified model, and predicting the obtained response time of the specified model on the specified date.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in embodiments may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual speed data rate SDRAM (SSRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims (9)

1. A method for dynamically deploying GPU resources, comprising:
acquiring historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises a statistical termination day corresponding to the historical service data, historical dates, working day states respectively corresponding to the historical dates, and service request amounts on the day before each historical date, the system to be matched comprises a plurality of models sharing GPU resources, and the specified model is any one of the models in the system to be matched;
predicting a response time of a specified date corresponding to the specified model according to each historical date, the working day state corresponding to each historical date, and the service request amount on the day before each historical date, wherein the specified date is the date adjacent to and chronologically after the statistical termination day corresponding to the historical service data;
calculating the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
acquiring monitoring data of the specified model on a statistical termination day corresponding to the historical service data;
calculating a business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day;
according to the business effect score corresponding to the specified model, controlling the number of service containers through a container cluster to dynamically match the GPU resource duty ratio of the specified model on the specified date;
wherein the monitoring data comprises a graphics card usage, a GPU usage and a temperature ratio, and the step of calculating the business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day comprises:
calculating a GPU load state quantity from the graphics card usage, the GPU usage and the temperature ratio according to a second calculation formula, wherein the second calculation formula is F = (a·Wa + b·Wb + c·Wc) / (Wa + Wb + Wc), where F represents the GPU load state quantity, a represents the graphics card usage and a belongs to (0, 1], b represents the GPU usage and b belongs to (0, 1], c represents the temperature ratio and c belongs to (0, 1], Wa represents the weight corresponding to the graphics card usage, Wb represents the weight corresponding to the GPU usage, Wc represents the weight corresponding to the temperature ratio, and Wa, Wb and Wc are non-zero real numbers;
acquiring a preset priority corresponding to the specified model;
calculating the business effect score corresponding to the specified model from the GPU load state quantity, the preset priority and the response efficiency according to a third calculation formula, wherein the third calculation formula is Y = (P·Wp + U·Wu) / F, where Y represents the business effect score, P represents the response efficiency, U represents the preset priority and U belongs to (0, 1], Wp represents the weight corresponding to the response efficiency, Wu represents the weight corresponding to the preset priority, and Wp and Wu are non-zero real numbers.
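As a non-authoritative reading of the two formulas in this claim, the GPU load state quantity F can be taken as a weighted average of the three monitored quantities, and the business effect score Y as the weighted combination of response efficiency and priority divided by F. The weights and sample values below are illustrative assumptions:

```python
# Hedged reading of the second and third calculation formulas.
# All weights and sample values are illustrative assumptions.

def gpu_load_state(a, b, c, wa, wb, wc):
    """Second formula: F = (a*Wa + b*Wb + c*Wc) / (Wa + Wb + Wc)."""
    return (a * wa + b * wb + c * wc) / (wa + wb + wc)

def business_effect_score(p, u, f, wp, wu):
    """Third formula: Y = (P*Wp + U*Wu) / F."""
    return (p * wp + u * wu) / f

f = gpu_load_state(a=0.8, b=0.6, c=0.5, wa=0.5, wb=0.3, wc=0.2)
y = business_effect_score(p=0.4, u=0.9, f=f, wp=0.7, wu=0.3)
# A heavily loaded GPU (large F) drags the score down, which is what
# later triggers capacity expansion once Y falls below the threshold.
```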
2. The method of dynamically deploying GPU resources according to claim 1, wherein the step of calculating the response efficiency of the specified model for the specified date based on the response time of the specified date comprises:
Acquiring a response time threshold corresponding to the specified model;
calculating the response efficiency of the specified model corresponding to the specified date from the response time of the specified date and the response time threshold according to a first calculation formula, wherein the first calculation formula is P = (Tm - T) / Tm, where P represents the response efficiency and P belongs to (0, 1), Tm represents the response time threshold and Tm belongs to (0, 1), and T represents the response time of the specified date and T belongs to (0, 1).
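A minimal sketch of the first calculation formula: the response efficiency is the fraction of the response-time budget Tm left after the predicted response time T is spent. The sample values are illustrative assumptions:

```python
# Minimal sketch of the first calculation formula; sample values are
# illustrative assumptions.

def response_efficiency(t, tm):
    """P = (Tm - T) / Tm, with 0 < T < Tm so that P lies in (0, 1)."""
    return (tm - t) / tm

p = response_efficiency(t=0.3, tm=0.5)  # 40% of the budget left
```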
3. The method for dynamically deploying GPU resources according to claim 1, wherein the step of dynamically matching the GPU resource duty ratio of the specified model to the specified date by controlling the number of service containers through the container cluster according to the business effect score corresponding to the specified model comprises:
acquiring a preset capacity expansion threshold and a preset capacity contraction threshold, wherein the capacity expansion threshold is smaller than the capacity contraction threshold, and the capacity expansion threshold and the capacity contraction threshold are non-zero real numbers;
comparing the numerical relation between the business effect score and the capacity expansion threshold and the capacity contraction threshold respectively;
and according to the numerical relation, controlling the number of service containers through the container cluster, and dynamically adjusting the GPU resource duty ratio corresponding to the specified model.
4. A method for dynamically deploying GPU resources according to claim 3, wherein the step of dynamically adjusting the GPU resource duty ratio corresponding to the specified model by controlling the number of service containers through a container cluster according to the numerical relationship comprises:
judging whether the numerical relation is that the service effect score is smaller than the capacity expansion threshold value;
if the business effect score is smaller than the capacity expansion threshold, increasing the GPU resource duty ratio corresponding to the specified model by creating a specified service container; if the business effect score is not smaller than the capacity expansion threshold, judging whether the numerical relation is that the business effect score is larger than the capacity reduction threshold;
if the service effect score is larger than the capacity reduction threshold, reducing the GPU resource duty ratio corresponding to the specified model by destroying the specified service container, otherwise, not adjusting the GPU resource duty ratio corresponding to the specified model.
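The expansion/contraction decision of claims 3 and 4 can be sketched as a simple three-way comparison. The threshold values and action names below are illustrative assumptions, and the actual creation or destruction of service containers would be delegated to the container cluster's API:

```python
# Hedged sketch of the three-way scaling decision; thresholds and action
# names are illustrative assumptions.

def scaling_action(score, expand_threshold, shrink_threshold):
    """Expansion threshold < contraction threshold, both non-zero."""
    if score < expand_threshold:
        return "create_container"    # raise the model's GPU resource share
    if score > shrink_threshold:
        return "destroy_container"   # lower the model's GPU resource share
    return "no_change"               # leave the GPU resource share as is

action = scaling_action(score=0.25, expand_threshold=0.3,
                        shrink_threshold=0.8)
```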
5. The method for dynamically deploying GPU resources according to claim 1, wherein the step of predicting the response time of the specified date corresponding to the specified model according to each of the history dates, the weekday status corresponding to each of the history dates, and the service request amount on the day before each of the history dates comprises:
forming a training set for the XGBoost model from the historical dates, the working day states corresponding to the historical dates, and the service request amounts on the day before each historical date;
training the XGBoost model under an objective function by utilizing a training set of the XGBoost model;
judging whether the objective function of the XGBoost model has converged;
if so, inputting the data of the statistical termination day of the historical service data of the specified model into the XGBoost model; and
acquiring the response time output by the XGBoost model for the statistical termination day of the historical service data as the predicted response time of the specified model on the specified date.
6. An apparatus for dynamically deploying GPU resources, comprising:
a first acquisition module, configured to acquire historical service data corresponding to a specified model in a system to be matched, wherein the historical service data comprises a statistical termination day, historical dates, working day states respectively corresponding to the historical dates, and service request amounts on the day before each historical date, the specified model is any one of the models in the system to be matched, and the models in the system to be matched share GPU resources;
a prediction module, configured to predict the response time of the specified date corresponding to the specified model according to each historical date, the working day state corresponding to each historical date, and the service request amount on the day before each historical date, wherein the specified date is the date adjacent to and chronologically after the statistical termination day corresponding to the historical service data;
a first calculation module, configured to calculate the response efficiency of the specified model corresponding to the specified date according to the response time of the specified date;
a second acquisition module, configured to acquire the monitoring data of the specified model on the statistical termination day corresponding to the historical service data;
a second calculation module, configured to calculate the business effect score of the specified model corresponding to the specified date according to the response efficiency of the specified model corresponding to the specified date and the monitoring data of the statistical termination day; and
a matching module, configured to control the number of service containers through the container cluster according to the business effect score corresponding to the specified model, so as to dynamically match the GPU resource duty ratio of the specified model on the specified date.
7. The apparatus for dynamically deploying GPU resources according to claim 6, wherein the first computing module comprises:
the first acquisition unit is used for acquiring a response time threshold value corresponding to the specified model;
a first calculation unit, configured to calculate the response efficiency of the specified model corresponding to the specified date from the response time of the specified date and the response time threshold according to a first calculation formula, wherein the first calculation formula is P = (Tm - T) / Tm, where P represents the response efficiency and P belongs to (0, 1), Tm represents the response time threshold and Tm belongs to (0, 1), and T represents the response time of the specified date and T belongs to (0, 1).
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
CN202011538689.5A 2020-12-23 2020-12-23 Method and device for dynamically deploying GPU resources and computer equipment Active CN112559191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011538689.5A CN112559191B (en) 2020-12-23 2020-12-23 Method and device for dynamically deploying GPU resources and computer equipment

Publications (2)

Publication Number Publication Date
CN112559191A CN112559191A (en) 2021-03-26
CN112559191B true CN112559191B (en) 2023-04-25

Family

ID=75030960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011538689.5A Active CN112559191B (en) 2020-12-23 2020-12-23 Method and device for dynamically deploying GPU resources and computer equipment

Country Status (1)

Country Link
CN (1) CN112559191B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568741A (en) * 2021-07-19 2021-10-29 咪咕文化科技有限公司 Service expansion and contraction method, device, equipment and storage medium of distributed system

Citations (1)

Publication number Priority date Publication date Assignee Title
CN109766182A (en) * 2018-12-18 2019-05-17 平安科技(深圳)有限公司 The scalable appearance method, apparatus of system resource dynamic, computer equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
EP2887219A1 (en) * 2013-12-23 2015-06-24 Deutsche Telekom AG System and method for mobile augmented reality task scheduling
CN107077385B (en) * 2014-09-10 2019-10-25 亚马逊技术公司 For reducing system, method and the storage medium of calculated examples starting time
CN106549772B (en) * 2015-09-16 2019-11-19 华为技术有限公司 Resource prediction method, system and capacity management device
US10942776B2 (en) * 2016-09-21 2021-03-09 Accenture Global Solutions Limited Dynamic resource allocation for application containers
CN109714395B (en) * 2018-12-10 2021-10-26 平安科技(深圳)有限公司 Cloud platform resource use prediction method and terminal equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant