CN111158908A - Kubernetes-based scheduling method and device for improving GPU utilization rate - Google Patents

Kubernetes-based scheduling method and device for improving GPU utilization rate

Info

Publication number
CN111158908A
CN111158908A (application CN201911374630.4A)
Authority
CN
China
Prior art keywords
time
period
utilization rate
resource utilization
pod
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911374630.4A
Other languages
Chinese (zh)
Other versions
CN111158908B (en)
Inventor
王亚东
曹稳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN201911374630.4A
Publication of CN111158908A
Application granted
Publication of CN111158908B
Active legal-status Current
Anticipated expiration legal-status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a Kubernetes-based scheduling method for improving GPU utilization rate, which comprises the following steps: acquiring the real-time resource utilization within a period of time; sampling the real-time resource utilization within the period to obtain the average resource utilization over the period; predicting the number of pods required in the next period according to the average resource utilization over the period, the number of pods already used by a certain algorithm, the maximum utilization and the minimum utilization; and adjusting the number of pods according to the number of pods required by the algorithm in the next period and the number of pods it already uses. The invention replaces the utilization of the underlying hardware resources with the load rate of the algorithm capacity, which is more adaptable and better reflects how the artificial intelligence algorithms use the resources currently allocated to them; scheduling according to this method utilizes hardware resources more reasonably and improves resource utilization.

Description

Kubernetes-based scheduling method and device for improving GPU utilization rate
Technical Field
The invention belongs to the field of resource allocation, and particularly relates to a Kubernetes-based scheduling method and device for improving GPU utilization rate.
Background
With the steady advance of technologies such as cloud computing, artificial intelligence and big data, security systems have grown increasingly large, and hardware resource scheduling and software deployment management are increasingly performed with Kubernetes. Kubernetes is an open-source container cluster management system that provides a complete set of functions for containerized applications, such as deployment, operation and resource scheduling; the pod is its minimum unit of management and scheduling, and each pod can be regarded as a virtual machine.
For the raw picture data generated in a security system, the information extracted by artificial intelligence algorithms is required to become increasingly multidimensional, so artificial intelligence algorithms from different manufacturers and for different service types often run in the same system at the same time.
An artificial intelligence algorithm generally runs in an independent pod, and most of these algorithms are accelerated by GPU hardware. GPU hardware resources in the system are limited, and algorithms from different manufacturers optimize GPU utilization to different degrees, so an algorithm may already have reached its maximum performance while the GPU hardware utilization remains low.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a Kubernetes-based scheduling method and apparatus for improving GPU utilization rate, so as to overcome the drawbacks of the prior art.
In order to achieve the above objects and other related objects, the present invention provides a Kubernetes-based scheduling method for improving GPU utilization rate, including:
acquiring the real-time resource utilization within a period of time;
sampling the real-time resource utilization within the period to obtain the average resource utilization over the period;
predicting the number of pods required in the next period according to the average resource utilization over the period, the number of pods already used by a certain algorithm, the maximum utilization and the minimum utilization;
and adjusting the number of pods according to the number of pods required by the algorithm in the next period and the number of pods it already uses.
Optionally, the acquiring of the real-time resource utilization within a period of time includes:
respectively acquiring the number of pictures analyzed by a certain algorithm in a first time period and a second time period;
and obtaining the real-time resource utilization according to the number of pictures the algorithm analyzed in the first time period, the number it analyzed in the second time period, and the maximum number of pictures it can analyze in 1 second.
Optionally, the real-time resource utilization is expressed as:

Uij = (Ci - Cj) / (M × (i - j))    (1)

wherein Uij represents the real-time resource utilization of the GPU; i and j are respectively the first time period and the second time period, with i > j; Ci represents the number of pictures analyzed by the algorithm in the first time period; Cj represents the number of pictures analyzed by the algorithm in the second time period; and M represents the maximum number of pictures the algorithm can analyze in 1 second.
Optionally, the average resource utilization is calculated by the following formula:

p̄ = ( Σi=1..I Σj=1..J Uij ) / (I × J)    (2)

wherein p̄ represents the average resource utilization, Uij is the real-time resource utilization sampled for the j-th pod at the i-th sampling, I represents the number of times the real-time resource utilization is sampled within the period, and J represents the number of pods the algorithm runs.
Optionally, the number of pods required by the algorithm in the next period is calculated by formula (3) (given only as an image in the source document), wherein p̄ represents the average resource utilization, Z represents the number of pods required by the algorithm in the next period, Zo is the number of pods already used by the algorithm, pmax denotes the maximum utilization, and pmin denotes the minimum utilization.
To achieve the above and other related objects, the present invention provides a Kubernetes-based scheduling apparatus for improving GPU utilization rate, including:
a real-time resource utilization obtaining module, configured to obtain the real-time resource utilization within a period of time;
an average resource utilization calculating module, configured to sample the real-time resource utilization within the period to obtain the average resource utilization over the period;
a prediction module, configured to predict the number of pods required in the next period according to the average resource utilization over the period, the number of pods already used by a certain algorithm, the maximum utilization and the minimum utilization;
and a scheduling module, configured to adjust the number of pods according to the number of pods required by the algorithm in the next period and the number of pods it already uses.
Optionally, the real-time resource utilization obtaining module includes:
a first picture obtaining sub-module, configured to obtain the number of pictures analyzed by a certain algorithm in a first time period;
a second picture obtaining sub-module, configured to obtain the number of pictures analyzed by the algorithm in a second time period;
and a real-time resource utilization calculating sub-module, configured to obtain the real-time resource utilization according to the number of pictures analyzed by the algorithm in the first time period, the number analyzed in the second time period, and the maximum number of pictures the algorithm can analyze in 1 second.
Optionally, the real-time resource utilization obtaining module obtains the real-time resource utilization by the following formula:

Uij = (Ci - Cj) / (M × (i - j))    (1)

wherein Uij represents the real-time resource utilization of the GPU; i and j are respectively the first time period and the second time period, with i > j; Ci represents the number of pictures analyzed by the algorithm in the first time period; Cj represents the number of pictures analyzed by the algorithm in the second time period; and M represents the maximum number of pictures the algorithm can analyze in 1 second.
Optionally, the average resource utilization calculating module calculates the average resource utilization by the following formula:

p̄ = ( Σi=1..I Σj=1..J Uij ) / (I × J)    (2)

wherein p̄ represents the average resource utilization, I represents the number of times the real-time resource utilization is sampled within the period, and J represents the number of pods the algorithm runs.
Optionally, the prediction module calculates the number of pods required by the algorithm in the next period by formula (3) (given only as an image in the source document), wherein p̄ represents the average resource utilization, Z represents the number of pods required by the algorithm in the next period, Zo is the number of pods already used by the algorithm, pmax denotes the maximum utilization, and pmin denotes the minimum utilization.
As described above, the Kubernetes-based scheduling method and apparatus for improving GPU utilization rate of the present invention have the following beneficial effects:
the invention replaces the utilization of the underlying hardware resources with the load rate of the algorithm capacity, which is more adaptable and better reflects how the artificial intelligence algorithms use the resources currently allocated to them; scheduling according to this method utilizes hardware resources more reasonably and improves resource utilization.
Drawings
Fig. 1 is a flowchart of a Kubernetes-based scheduling method for improving GPU utilization rate according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating a method for obtaining the real-time resource utilization according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a Kubernetes-based scheduling apparatus for improving GPU utilization rate according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a data acquisition module according to an embodiment of the invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Based on the Kubernetes cloud platform, the invention provides a scheduling method and device for improving the utilization of GPUs (graphics processing units) in a cloud platform, aimed at scenarios where picture data are analyzed by multiple algorithms.
Although different artificial intelligence algorithms for analyzing picture data optimize their hardware and processes differently, so that the GPU hardware utilization they actually reach at their maximum analysis throughput differs, the maximum number of pictures an algorithm can analyze can be determined and given. The utilization of the hardware resources allocated to an algorithm can therefore be obtained as the ratio of the number of pictures the algorithm analyzes at the current moment to that peak value. Based on this idea, as shown in Fig. 1, a Kubernetes-based scheduling method for improving GPU utilization rate includes the following steps:
S11, acquiring the real-time resource utilization within a period of time;
S12, sampling the real-time resource utilization within the period to obtain the average resource utilization over the period;
S13, predicting the number of pods required in the next period according to the average resource utilization over the period, the number of pods already used by a certain algorithm, the maximum utilization and the minimum utilization;
S14, adjusting the number of pods according to the number of pods required by the algorithm in the next period and the number of pods it already uses.
Here, adjusting means that if the number of pods currently in use is larger than the number of pods required in the next period, the number of pods is reduced, i.e. the pods are scaled in; if the number of pods currently in use is smaller than the number required in the next period, the number of pods is increased, i.e. the pods are scaled out.
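A minimal sketch of this adjustment step is given below. It assumes (these assumptions are not part of the patent) that each algorithm's pods are backed by a Kubernetes Deployment, that the official Python client is available, and that the hypothetical names deployment_name, namespace, current_pods and required_pods are supplied by the surrounding scheduler:

```python
# Sketch of step S14: scale an algorithm's Deployment to the predicted pod count.
# Assumptions (not from the patent): the pods are managed by a Deployment, and the
# predicted count "required_pods" has already been computed for the next period.
from kubernetes import client, config


def adjust_pods(deployment_name: str, namespace: str,
                current_pods: int, required_pods: int) -> None:
    if required_pods == current_pods:
        return  # nothing to do: predicted demand matches the pods already in use
    # load_incluster_config() would be used instead when running inside the cluster
    config.load_kube_config()
    apps = client.AppsV1Api()
    # Fewer pods than predicted -> scale out; more pods than predicted -> scale in.
    apps.patch_namespaced_deployment_scale(
        name=deployment_name,
        namespace=namespace,
        body={"spec": {"replicas": required_pods}},
    )
```

Using the Deployment scale subresource keeps the adjustment declarative and leaves pod placement to the Kubernetes scheduler.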
The invention replaces the utilization of the underlying hardware resources with the load rate of the algorithm capacity, which is more adaptable and better reflects how the artificial intelligence algorithms use the resources currently allocated to them; scheduling according to this method utilizes hardware resources more reasonably and improves resource utilization.
In step S11, as shown in Fig. 2, the acquiring of the real-time resource utilization within a period of time includes:
S21, respectively acquiring the number of pictures analyzed by the algorithm in a first time period and a second time period;
S22, obtaining the real-time resource utilization according to the number of pictures the algorithm analyzed in the first time period, the number it analyzed in the second time period, and the maximum number of pictures it can analyze in 1 second.
Specifically, the real-time resource utilization may be calculated by formula (1):

Uij = (Ci - Cj) / (M × (i - j))    (1)

wherein Uij represents the real-time resource utilization of the GPU; i and j are respectively the first time period and the second time period, with i > j; Ci represents the number of pictures analyzed by the algorithm in the first time period; Cj represents the number of pictures analyzed by the algorithm in the second time period; and M represents the maximum number of pictures the algorithm can analyze in 1 second.
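A minimal sketch of this computation, assuming formula (1) is read as above (Ci and Cj are the picture counts taken at times i and j with i > j, expressed in seconds, and M is the per-pod per-second peak), could look like:

```python
def realtime_utilization(c_i: int, c_j: int, i: float, j: float, m: int) -> float:
    """Formula (1) as read above: pictures analyzed between time j and time i,
    divided by the maximum the pod could have analyzed in the same interval."""
    if i <= j:
        raise ValueError("formula (1) requires i > j")
    if m <= 0:
        raise ValueError("the per-second peak M must be positive")
    return (c_i - c_j) / (m * (i - j))
```

For example, a pod that analyzed 300 pictures between j = 0 s and i = 10 s against a peak of M = 100 pictures per second would report a real-time utilization of 0.3.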
In step S12, the real-time resource utilization rate in the period of time is sampled to obtain the average resource utilization rate in the period of time.
Kubernetes is a Docker-based container orchestration engine that makes cloud management of resources convenient; its basic unit is the pod. In security applications, the number of pictures generated in real time fluctuates, and the streams of pictures to be analyzed by the artificial intelligence algorithms of different services often show peaks, troughs, staggered peaks and the like. The number of pods required by an artificial intelligence algorithm can therefore be calculated and predicted from the average resource utilization p̄ of its current pods over a period of time.
Specifically, the average resource utilization of the current pods over a period of time is calculated by formula (2):

p̄ = ( Σi=1..I Σj=1..J Uij ) / (I × J)    (2)

wherein p̄ represents the average resource utilization, I represents the number of times the real-time resource utilization is sampled within the period (one sample every N seconds; I > 30 is recommended), and J denotes the number of pods the algorithm runs.
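A small sketch of this sampling step, assuming formula (2) is the plain average of the I × J sampled values (one sample per pod every N seconds, as described above):

```python
from typing import Sequence


def average_utilization(samples: Sequence[Sequence[float]]) -> float:
    """Formula (2) as read above: samples[i][j] is the real-time utilization of
    pod j at sampling round i; the result is the mean over all I * J values."""
    i_rounds = len(samples)
    if i_rounds == 0 or len(samples[0]) == 0:
        raise ValueError("need at least one sampling round and one pod")
    j_pods = len(samples[0])
    total = sum(value for round_values in samples for value in round_values)
    return total / (i_rounds * j_pods)
```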
In step S13, the number of pods required in the next period is predicted according to the average resource utilization over the period, the number of pods already used by the algorithm, the maximum utilization and the minimum utilization.
Specifically, the number of pods required by the algorithm in the next period is calculated by formula (3) (given only as an image in the source document), wherein p̄ represents the average resource utilization, Z represents the number of pods required by the algorithm in the next period, Zo is the number of pods already used by the algorithm, pmax denotes the maximum utilization (an empirical value), and pmin denotes the minimum utilization (an empirical value). In this embodiment, a relatively high minimum utilization is set to prevent the scaling from jittering up and down within a short time during scheduling.
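Because formula (3) itself is only given as an image in the source, the sketch below is not the patented formula: it substitutes the standard Kubernetes-HPA-style proportional rule desired = ceil(current × average / target), using pmax as the target when the average load is above it and pmin when it is below. This stand-in reproduces the scale-out/scale-in behaviour and the anti-jitter dead band described above, but the original formula may differ:

```python
import math


def predict_pods(avg_util: float, current_pods: int,
                 p_max: float, p_min: float) -> int:
    """Hedged stand-in for formula (3). While the average utilization stays in
    [p_min, p_max] the pod count is left alone (the dead band that prevents
    short-term jitter); outside that band the count is scaled proportionally
    towards the violated bound, HPA-style."""
    if avg_util > p_max:
        return math.ceil(current_pods * avg_util / p_max)          # scale out
    if avg_util < p_min:
        return max(1, math.ceil(current_pods * avg_util / p_min))  # scale in
    return current_pods
```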
Assume that the current system has 10 GPU graphics cards and is currently running 3 face recognition algorithm pods, that the maximum number of pictures 1 face recognition pod can analyze per second is 100, the maximum utilization is 90% and the minimum utilization is 40%; and that 2 vehicle recognition algorithm pods are running, each also with a maximum of 100 pictures analyzed per second, a maximum utilization of 90% and a minimum utilization of 40%.
Assume that within a period of time each face recognition pod analyzes only 10 pictures per second; calculating according to formulas (1) to (3) shows that only 1 GPU graphics card is actually needed. Vehicle recognition, however, has reached 95 pictures per second per pod, indicating that the algorithm is continuously under high load, which may cause picture data to pile up, so vehicle recognition algorithm pods should be added.
A scaling instruction is issued according to the calculation result: the face recognition pods are scaled in to 1 and the vehicle recognition pods are increased to 4, improving the utilization of the GPU graphics card resources as a whole.
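Running the numbers of this example through the sketches above (still under the stand-in for formula (3)) reproduces the shrink decision for face recognition and the expansion decision for vehicle recognition; the exact target of 4 vehicle pods depends on the original formula (3):

```python
# Face recognition: 3 pods, each analyzing ~10 of a possible 100 pictures per second.
face_avg = average_utilization([[10 / 100] * 3])
assert predict_pods(face_avg, current_pods=3, p_max=0.9, p_min=0.4) == 1   # shrink to 1

# Vehicle recognition: 2 pods, each near its 100 pictures-per-second peak.
vehicle_avg = average_utilization([[95 / 100] * 2])
assert predict_pods(vehicle_avg, current_pods=2, p_max=0.9, p_min=0.4) >= 3  # expand
```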
As shown in Fig. 3, a Kubernetes-based scheduling apparatus for improving GPU utilization rate includes:
a real-time resource utilization obtaining module 31, configured to obtain the real-time resource utilization within a period of time;
an average resource utilization calculating module 32, configured to sample the real-time resource utilization within the period to obtain the average resource utilization over the period;
a prediction module 33, configured to predict the number of pods required in the next period according to the average resource utilization over the period, the number of pods already used by a certain algorithm, the maximum utilization and the minimum utilization;
and a scheduling module 34, configured to adjust the number of pods according to the number of pods required by the algorithm in the next period and the number of pods it already uses.
Here, adjusting means that if the number of pods currently in use is larger than the number of pods required in the next period, the number of pods is reduced, i.e. the pods are scaled in; if the number of pods currently in use is smaller than the number required in the next period, the number of pods is increased, i.e. the pods are scaled out.
According to the invention, the resource utilization is reported through a quantitative formula, the number of pods required in the next period is calculated and predicted from the resource information of the current period, and scaling is carried out according to the predicted number of pods; repeating these steps continuously balances the hardware resource utilization of the whole system and improves its overall performance.
In one embodiment, as shown in Fig. 4, the data acquisition module includes:
a first picture obtaining sub-module 41, configured to obtain the number of pictures analyzed by a certain algorithm in a first time period;
a second picture obtaining sub-module 42, configured to obtain the number of pictures analyzed by the algorithm in a second time period;
and a real-time resource utilization calculating sub-module 43, configured to obtain the real-time resource utilization according to the number of pictures analyzed by the algorithm in the first time period, the number analyzed in the second time period, and the maximum number of pictures the algorithm can analyze in 1 second.
Specifically, the real-time resource utilization obtaining module obtains the real-time resource utilization according to formula (1), expressed as:

Uij = (Ci - Cj) / (M × (i - j))    (1)

wherein Uij represents the real-time resource utilization of the GPU; i and j are respectively the first time period and the second time period, with i > j; Ci represents the number of pictures analyzed by the algorithm in the first time period; Cj represents the number of pictures analyzed by the algorithm in the second time period; and M represents the maximum number of pictures the algorithm can analyze in 1 second.
The average resource utilization calculating module is configured to sample the real-time resource utilization within the period to obtain the average resource utilization over the period.
Kubernetes is a Docker-based container orchestration engine that makes cloud management of resources convenient; its basic unit is the pod. In security applications, the number of pictures generated in real time fluctuates, and the streams of pictures to be analyzed by the artificial intelligence algorithms of different services often show peaks, troughs, staggered peaks and the like. The number of pods required by an artificial intelligence algorithm can therefore be calculated and predicted from the average resource utilization p̄ of its current pods over a period of time.
Specifically, the average resource utilization calculating module calculates the average resource utilization of the current pods over a period of time by formula (2):

p̄ = ( Σi=1..I Σj=1..J Uij ) / (I × J)    (2)

wherein p̄ represents the average resource utilization, I represents the number of times the real-time resource utilization is sampled within the period (one sample every N seconds; I > 30 is recommended), and J denotes the number of pods the algorithm runs.
In one embodiment, the prediction module is configured to predict the pod number required for the next period of time according to the average resource utilization rate of the period of time, the pod number used by a certain algorithm, the maximum utilization rate, and the minimum utilization rate.
Specifically, the prediction module calculates the number of pods required by the algorithm in the next period by formula (3) (given only as an image in the source document), wherein p̄ represents the average resource utilization, Z represents the number of pods required by the algorithm in the next period, Zo is the number of pods already used by the algorithm, pmax denotes the maximum utilization (an empirical value), and pmin denotes the minimum utilization (an empirical value). In this embodiment, a relatively high minimum utilization is set to prevent the scaling from jittering up and down within a short time during scheduling.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may comprise any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.
The foregoing embodiments merely illustrate the principles and utilities of the present invention and are not intended to limit it. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (10)

1. A Kubernetes-based scheduling method for improving GPU utilization rate, characterized by comprising the following steps:
acquiring the real-time resource utilization within a period of time;
sampling the real-time resource utilization within the period to obtain the average resource utilization over the period;
predicting the number of pods required in the next period according to the average resource utilization over the period, the number of pods already used by a certain algorithm, the maximum utilization and the minimum utilization;
and adjusting the number of pods according to the number of pods required by the algorithm in the next period and the number of pods it already uses.
2. The Kubernetes-based scheduling method for improving GPU utilization rate according to claim 1, wherein the acquiring of the real-time resource utilization within a period of time comprises:
respectively acquiring the number of pictures analyzed by a certain algorithm in a first time period and a second time period;
and obtaining the real-time resource utilization according to the number of pictures the algorithm analyzed in the first time period, the number it analyzed in the second time period, and the maximum number of pictures it can analyze in 1 second.
3. The Kubernetes-based scheduling method for improving GPU utilization rate according to claim 2, wherein the real-time resource utilization is expressed as:

Uij = (Ci - Cj) / (M × (i - j))    (1)

wherein Uij represents the real-time resource utilization of the GPU; i and j are respectively the first time period and the second time period, with i > j; Ci represents the number of pictures analyzed by the algorithm in the first time period; Cj represents the number of pictures analyzed by the algorithm in the second time period; and M represents the maximum number of pictures the algorithm can analyze in 1 second.
4. The Kubernetes-based scheduling method for improving GPU utilization rate according to claim 3, wherein the average resource utilization is calculated by the following formula:

p̄ = ( Σi=1..I Σj=1..J Uij ) / (I × J)    (2)

wherein p̄ represents the average resource utilization, I represents the number of times the real-time resource utilization is sampled within the period, and J represents the number of pods the algorithm runs.
5. The Kubernetes-based scheduling method for improving GPU utilization rate according to claim 1, wherein the number of pods required by the algorithm in the next period is calculated by formula (3) (given only as an image in the original document), wherein p̄ represents the average resource utilization, Z represents the number of pods required by the algorithm in the next period, Zo is the number of pods already used by the algorithm, pmax denotes the maximum utilization, and pmin denotes the minimum utilization.
6. A Kubernetes-based scheduling apparatus for improving GPU utilization rate, characterized in that the apparatus comprises:
a real-time resource utilization obtaining module, configured to obtain the real-time resource utilization within a period of time;
an average resource utilization calculating module, configured to sample the real-time resource utilization within the period to obtain the average resource utilization over the period;
a prediction module, configured to predict the number of pods required in the next period according to the average resource utilization over the period, the number of pods already used by a certain algorithm, the maximum utilization and the minimum utilization;
and a scheduling module, configured to adjust the number of pods according to the number of pods required by the algorithm in the next period and the number of pods it already uses.
7. The Kubernetes-based scheduling apparatus for improving GPU utilization rate according to claim 6, wherein the real-time resource utilization obtaining module comprises:
a first picture obtaining sub-module, configured to obtain the number of pictures analyzed by a certain algorithm in a first time period;
a second picture obtaining sub-module, configured to obtain the number of pictures analyzed by the algorithm in a second time period;
and a real-time resource utilization calculating sub-module, configured to obtain the real-time resource utilization according to the number of pictures analyzed by the algorithm in the first time period, the number analyzed in the second time period, and the maximum number of pictures the algorithm can analyze in 1 second.
8. The Kubernetes-based scheduling apparatus for improving GPU utilization rate according to claim 7, wherein the real-time resource utilization obtaining module obtains the real-time resource utilization by the following formula:

Uij = (Ci - Cj) / (M × (i - j))    (1)

wherein Uij represents the real-time resource utilization of the GPU; i and j are respectively the first time period and the second time period, with i > j; Ci represents the number of pictures analyzed by the algorithm in the first time period; Cj represents the number of pictures analyzed by the algorithm in the second time period; and M represents the maximum number of pictures the algorithm can analyze in 1 second.
9. The Kubernetes-based scheduling apparatus for improving GPU utilization rate according to claim 8, wherein the average resource utilization calculating module calculates the average resource utilization by the following formula:

p̄ = ( Σi=1..I Σj=1..J Uij ) / (I × J)    (2)

wherein p̄ represents the average resource utilization, I represents the number of times the real-time resource utilization is sampled within the period, and J represents the number of pods the algorithm runs.
10. The Kubernetes-based scheduling apparatus for improving GPU utilization rate according to claim 6, wherein the prediction module calculates the number of pods required by the algorithm in the next period by formula (3) (given only as an image in the original document), wherein p̄ represents the average resource utilization, Z represents the number of pods required by the algorithm in the next period, Zo is the number of pods already used by the algorithm, pmax denotes the maximum utilization, and pmin denotes the minimum utilization.
CN201911374630.4A 2019-12-27 2019-12-27 Kubernetes-based scheduling method and device for improving GPU utilization rate Active CN111158908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911374630.4A CN111158908B (en) 2019-12-27 2019-12-27 Kubernetes-based scheduling method and device for improving GPU utilization rate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911374630.4A CN111158908B (en) 2019-12-27 2019-12-27 Kubernetes-based scheduling method and device for improving GPU utilization rate

Publications (2)

Publication Number Publication Date
CN111158908A (en) 2020-05-15
CN111158908B (en) 2021-05-25

Family

ID=70557078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911374630.4A Active CN111158908B (en) 2019-12-27 2019-12-27 Kubernetes-based scheduling method and device for improving GPU utilization rate

Country Status (1)

Country Link
CN (1) CN111158908B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625358A (en) * 2020-05-25 2020-09-04 浙江大华技术股份有限公司 Resource allocation method and device, electronic equipment and storage medium
CN113674137A (en) * 2021-08-30 2021-11-19 浩鲸云计算科技股份有限公司 Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562545A (en) * 2017-09-11 2018-01-09 南京奥之云信息技术有限公司 A kind of container dispatching method based on Docker technologies
CN107908457A (en) * 2017-11-08 2018-04-13 河海大学 A kind of containerization cloud resource distribution method based on stable matching
CN108293127A (en) * 2015-09-25 2018-07-17 诺基亚技术有限公司 For Video coding and decoded device, method and computer program
CN108874542A (en) * 2018-06-07 2018-11-23 桂林电子科技大学 Kubernetes method for optimizing scheduling neural network based
CN109117248A (en) * 2018-07-19 2019-01-01 郑州云海信息技术有限公司 A kind of deep learning task elastic telescopic system and method based on kubernetes platform
US10191778B1 (en) * 2015-11-16 2019-01-29 Turbonomic, Inc. Systems, apparatus and methods for management of software containers
CN109918184A (en) * 2019-03-01 2019-06-21 腾讯科技(深圳)有限公司 Picture processing system, method and relevant apparatus and equipment
CN110287029A (en) * 2019-06-27 2019-09-27 中国—东盟信息港股份有限公司 A method of it is adjusted based on kubernetes container resource dynamic
CN110515730A (en) * 2019-08-22 2019-11-29 北京宝兰德软件股份有限公司 Resource secondary dispatching method and device based on kubernetes container arranging system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108293127A (en) * 2015-09-25 2018-07-17 诺基亚技术有限公司 For Video coding and decoded device, method and computer program
US10191778B1 (en) * 2015-11-16 2019-01-29 Turbonomic, Inc. Systems, apparatus and methods for management of software containers
CN107562545A (en) * 2017-09-11 2018-01-09 南京奥之云信息技术有限公司 A kind of container dispatching method based on Docker technologies
CN107908457A (en) * 2017-11-08 2018-04-13 河海大学 A kind of containerization cloud resource distribution method based on stable matching
CN108874542A (en) * 2018-06-07 2018-11-23 桂林电子科技大学 Kubernetes method for optimizing scheduling neural network based
CN109117248A (en) * 2018-07-19 2019-01-01 郑州云海信息技术有限公司 A kind of deep learning task elastic telescopic system and method based on kubernetes platform
CN109918184A (en) * 2019-03-01 2019-06-21 腾讯科技(深圳)有限公司 Picture processing system, method and relevant apparatus and equipment
CN110287029A (en) * 2019-06-27 2019-09-27 中国—东盟信息港股份有限公司 A method of it is adjusted based on kubernetes container resource dynamic
CN110515730A (en) * 2019-08-22 2019-11-29 北京宝兰德软件股份有限公司 Resource secondary dispatching method and device based on kubernetes container arranging system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈雁, 黄嘉鑫 (Chen Yan, Huang Jiaxin): "基于Kubernetes应用的弹性伸缩策略" (Elastic scaling strategy based on Kubernetes applications), 《计算机系统应用》 (Computer Systems & Applications) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625358A (en) * 2020-05-25 2020-09-04 浙江大华技术股份有限公司 Resource allocation method and device, electronic equipment and storage medium
CN111625358B (en) * 2020-05-25 2023-06-20 浙江大华技术股份有限公司 Resource allocation method and device, electronic equipment and storage medium
CN113674137A (en) * 2021-08-30 2021-11-19 浩鲸云计算科技股份有限公司 Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy

Also Published As

Publication number Publication date
CN111158908B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
US8638937B2 (en) Methods and systems for distributed processing on consumer devices
CN111045814B (en) Resource scheduling method and terminal equipment
CN111158908B (en) Kubernetes-based scheduling method and device for improving GPU utilization rate
CN111260037B (en) Convolution operation method and device of image data, electronic equipment and storage medium
CN111950723A (en) Neural network model training method, image processing method, device and terminal equipment
CN108012156A (en) A kind of method for processing video frequency and control platform
US11153526B2 (en) Detection of photosensitive triggers in video content
CN116188808B (en) Image feature extraction method and system, storage medium and electronic device
CN111858827A (en) Map point position rarefying display method and device and computer equipment
CN108595211A (en) Method and apparatus for output data
CN116761018B (en) Real-time rendering system based on cloud platform
CN111369557B (en) Image processing method, device, computing equipment and storage medium
CN116168045A (en) Method and system for dividing sweeping lens, storage medium and electronic equipment
CN112040090A (en) Video stream processing method and device, electronic equipment and storage medium
CN113362090A (en) User behavior data processing method and device
CN115827236A (en) Method and system for optimizing load performance of live-action three-dimensional cloud release process
CN110348367A (en) Video classification methods, method for processing video frequency, device, mobile terminal and medium
CN111191612B (en) Video image matching method, device, terminal equipment and readable storage medium
CN114969409A (en) Image display method and device and readable medium
CN111160283B (en) Data access method, device, equipment and medium
CN114356712A (en) Data processing method, device, equipment, readable storage medium and program product
CN117197706B (en) Method and system for dividing progressive lens, storage medium and electronic device
CN113542808B (en) Video processing method, apparatus, device and computer readable medium
CN112668474B (en) Plane generation method and device, storage medium and electronic equipment
CN112905351B (en) GPU and CPU load scheduling method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant