CN113674137A - Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy - Google Patents
- Publication number
- CN113674137A CN113674137A CN202111001401.5A CN202111001401A CN113674137A CN 113674137 A CN113674137 A CN 113674137A CN 202111001401 A CN202111001401 A CN 202111001401A CN 113674137 A CN113674137 A CN 113674137A
- Authority
- CN
- China
- Prior art keywords
- time
- utilization rate
- model
- period
- video memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000015654 memory Effects 0.000 title claims abstract description 67
- 238000011068 loading method Methods 0.000 title claims abstract description 17
- 238000004458 analytical method Methods 0.000 claims abstract description 21
- 238000004364 calculation method Methods 0.000 claims description 13
- 238000012544 monitoring process Methods 0.000 claims description 12
- 238000000034 method Methods 0.000 claims description 6
- 238000005070 sampling Methods 0.000 claims description 4
- 230000008676 import Effects 0.000 description 11
- 238000004519 manufacturing process Methods 0.000 description 3
- 230000004069 differentiation Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000003068 static effect Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 239000002699 waste material Substances 0.000 description 1
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a model loading method that maximizes video memory utilization based on an LRU (least recently used) strategy, comprising the following steps: construct and deploy three models — face recognition, portrait comparison, and human body analysis — and configure instances for each; start a timed task that obtains the real-time GPU utilization every 10 minutes and computes the average GPU utilization over that period; compute the moving-average video memory usage through the optimal resource scheduling strategy; from the data collected over the period, predict the number of instances required in the next period through the optimal resource scheduling strategy; and adjust the instance count according to the number of instances the model needs in the next period versus the number it currently uses. Beneficial effects: by dynamically starting and stopping models through the LRU scheduling strategy, the method addresses the pain point of low video memory utilization when multiple models share a video memory, improving utilization and saving resources.
Description
Technical Field
The invention relates to the technical field of video memory, in particular to a model loading method for maximizing the utilization rate of the video memory based on an LRU (least recently used) strategy.
Background
When large enterprises undergo digital transformation, AI scenarios are unavoidable, along with demands for AI applications and AI capabilities. In real AI production, AI capabilities must be called; typically an AI capability open platform exposes them as APIs, and capabilities are uploaded and deployed by model version. Deployments may contain a single model or a combination of models; combined multi-model deployment clearly reflects the value of resource utilization better, but it requires solving the problem of sharing CPU, GPU, memory, and video memory resources. In daily AI production with multiple models, call volumes inevitably differ between models across time periods: within the same AI capability, model A may be called intensively while model B is called sparsely or not at all, leaving model A short of resources while model B's resources are wasted. There is also a need to swap models at runtime: one capability may include several models (A, B, C), each running several instances. Resources may initially only support A and B receiving call volume, while later in production B receives no calls and C does, causing resources to be occupied and wasted.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a model loading method for maximizing the utilization rate of the video memory based on the LRU strategy, so as to overcome the technical problems in the prior related art.
Therefore, the invention adopts the following specific technical scheme:
the model loading method for maximizing and improving the utilization rate of the video memory based on the LRU strategy comprises the following steps:
constructing and deploying three models of face recognition, portrait comparison and human body analysis, and configuring an example;
starting a timing task, acquiring the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period;
calculating the moving-average video memory usage through the optimal resource scheduling strategy;
according to the data information in the period of time, predicting the number of instances required by the next period of time through an optimal resource scheduling strategy;
adjusting the number of the examples according to the number of the examples needed by the model in the next period of time and the number of the examples used by the model;
and finally realizing the maximization of the video memory utilization rate through the optimal resource scheduling strategy.
Further, the construction and deployment of the three models of face recognition, portrait comparison and human body analysis and the configuration of the example comprise the following steps:
three model capabilities of face recognition, portrait comparison and human body analysis are configured through an AI platform;
six elastically scalable instances are respectively configured for the three models of face recognition, portrait comparison and human body analysis;
configuring three models of face recognition, portrait comparison and human body analysis to the same display card;
and deploying and starting three models of face recognition, portrait comparison and human body analysis through a container management platform.
Further, the starting of the timing task, obtaining the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period includes the following steps:
starting a timing task, and acquiring the real-time resource utilization rate of the GPU in the period of time by a resource monitoring tool every 10 minutes;
storing the acquired GPU real-time utilization rate for the scheduling of the optimal resource scheduling policy (LRU);
the optimal resource scheduling strategy scheduling center circularly obtains data of a certain period of time from the remote dictionary service, samples the real-time utilization rate of the GPU in the period of time, and obtains the average GPU utilization rate in the period of time through calculation.
Further, the step of obtaining the real-time resource utilization rate of the GPU in the period of time by the resource monitoring tool every 10 minutes includes the following steps:
respectively acquiring the number of pictures analyzed by the three models in a first time period and a second time period;
and respectively obtaining the number of the pictures analyzed by the three models in the first time period, the number of the pictures analyzed by the three models in the second time period and the maximum number of the pictures analyzed by the three models in 1 second, and calculating to obtain the GPU real-time resource utilization rate.
Further, the formula for calculating the real-time resource utilization rate of the GPU is as follows:
where A represents the real-time resource utilization rate of the GPU; i and j are the first and second time periods respectively, with i > j; $C_i$ represents the number of pictures the model analyzed during the first time period; $C_j$ represents the number of pictures the model analyzed during the second time period; and M represents the maximum number of pictures the model can analyze in 1 second.
Further, the calculation formula for obtaining the average GPU utilization within the period of time through calculation is as follows:
where $\bar{U}$ represents the average GPU utilization rate, I represents the number of real-time GPU utilization samples taken over the period, and J represents the number of model running instances.
Further, the formula for calculating the running average video memory usage rate by scheduling the optimal resource scheduling policy is as follows:
$$U_t = \beta \cdot U_{t-1} + (1-\beta)\cdot \bar{U}_t$$

where $U_t$ is the moving-average video memory usage of the model in period t, $\bar{U}_t$ is the average GPU utilization of the model in period t ($U_t = \bar{U}_t$ when the moving-average model is not yet in use), and β is a weighting coefficient between 0 and 1, here set to 0.9.

The above formula can be expanded as follows:

$$U_t = (1-\beta)\left(\bar{U}_t + \beta\,\bar{U}_{t-1} + \cdots + \beta^{t-2}\,\bar{U}_2\right) + \beta^{t-1}\,\bar{U}_1$$

Substituting the usage rate at each time from t down to 1 into the formula yields $U_t$, the moving-average video memory usage over times t to 1.
Further, the data information includes an average resource utilization rate, a number of used instances of each model, a maximum GPU utilization rate, and a minimum GPU utilization rate.
Further, the calculation formula for predicting the number of instances required for the next period of time by the optimal resource scheduling policy (LRU policy) is as follows:
where Z represents the number of instances the model requires in the next period of time, $U_t$ indicates the moving-average video memory usage, $Z_o$ is the number of pods the model already uses, $p_{max}$ represents the maximum utilization rate, and $p_{min}$ represents the minimum utilization rate.
The invention has the following beneficial effects: for the scenario of multiple models sharing a video memory, models are dynamically started and stopped through an LRU (least recently used) scheduling strategy, solving the pain point of low utilization when multiple models share the memory; that is, video memory occupation is allocated effectively, with fewer memory resources given to models with low utilization and more given to models with high utilization, thereby improving video memory utilization and saving resources. Real-time monitoring with Glances improves the responsiveness of container switching, and the fast Redis cache speeds up model switching.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a flowchart of a model loading method for maximizing utilization of a graphics memory based on an LRU policy according to an embodiment of the present invention;
fig. 2 is a flowchart of a technical implementation of a model loading method for maximally increasing utilization of a video memory based on an LRU policy according to an embodiment of the present invention.
Detailed Description
For further explanation of the various embodiments, reference is made to the accompanying drawings, which form a part of the disclosure and illustrate embodiments that, together with the description, explain their principles of operation and enable others of ordinary skill in the art to understand the embodiments and advantages of the invention. The figures are not to scale, and like reference numerals generally designate like elements.
According to the embodiment of the invention, a model loading method for maximally improving the utilization rate of a video memory based on an LRU (least recently used) strategy is provided.
Referring to the drawings and the detailed description, the present invention is further described, as shown in fig. 1, a model loading method for maximally increasing utilization rate of a video memory based on an LRU policy according to an embodiment of the present invention includes the following steps:
s1, constructing and deploying three models of face recognition, portrait comparison and human body analysis, and configuring an example;
wherein, step S1 includes the following steps:
s11, configuring three model capabilities of face recognition, portrait comparison and human body analysis through an AI platform;
s12, respectively configuring 6 elastically telescopic examples for three models of face recognition, portrait comparison and human body analysis;
s13, configuring three models of face recognition, portrait comparison and human body analysis to the same display card;
and S14, deploying and starting the three models of face recognition, portrait comparison and human body analysis through a container management platform (Rancher).
S2, starting a timing task, acquiring the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period;
wherein, step S2 includes the following steps:
s21, starting timing tasks, and acquiring the real-time resource utilization rate of the GPU in the period of time by a resource monitoring tool (Glances) every 10 minutes;
further, step S21 includes the steps of:
s211, respectively obtaining the number of the pictures analyzed by the three models in a first time period and a second time period;
the number of pictures processed in 1-10 minutes in the face recognition model is C1: 12021, number of pictures processed in 10-20 minutes C2: 8782 sheets;
figure contrast model, number of pictures processed in 1-10 minutes C1: 49389, number of pictures processed in 10-20 min C2: 30287 sheets of paper;
human analytical model, number of pictures processed in 1-10 minutes C1: 120789 sheets, number of pictures processed in 10-20 minutes C2: 152573 pieces.
S212, respectively obtaining the number of the pictures analyzed by the three models in the first time period, the number of the pictures analyzed by the three models in the second time period and the maximum number of the pictures analyzed by the three models in 1 second, and calculating to obtain the GPU real-time resource utilization rate, wherein the calculation formula is as follows:
where A represents the real-time resource utilization rate of the GPU; i and j are the first and second time periods respectively, with i > j; $C_i$ represents the number of pictures the model analyzed during the first time period; $C_j$ represents the number of pictures the model analyzed during the second time period; and M represents the maximum number of pictures the model can analyze in 1 second.
Furthermore, the maximum number of pictures processed per second, M, is: 50 for the face recognition model;
112 for the portrait comparison model;
258 for the human body analysis model.
S22, storing the acquired GPU real-time utilization rate for the scheduling of the following optimal resource scheduling policy (LRU);
s23, circularly obtaining data of a certain period of time from a remote dictionary service (redis) by an optimal resource scheduling policy (LRU) scheduling center, sampling the real-time utilization rate of the GPU in the period of time, and obtaining the average GPU utilization rate in the period of time through calculation, wherein the calculation formula is as follows:
where $\bar{U}$ represents the average GPU utilization rate, I represents the number of real-time GPU utilization samples taken over the period, and J represents the number of model running instances.
In addition, the average GPU resource utilization U of the face recognition model: 35.20 percent;
average GPU resource utilization U of the portrait comparison model: 81.67 percent;
average GPU resource utilization U of the human body analysis model: 88.29 percent.
S3, calculating the running average video memory utilization rate through the optimal resource scheduling strategy scheduling, wherein the calculation formula is as follows:
$$U_t = \beta \cdot U_{t-1} + (1-\beta)\cdot \bar{U}_t$$

where $U_t$ is the moving-average video memory usage of the model in period t, $\bar{U}_t$ is the average GPU utilization of the model in period t ($U_t = \bar{U}_t$ when the moving-average model is not yet in use), and β is a weighting coefficient between 0 and 1, here set to 0.9.

The above formula can be expanded as follows:

$$U_t = (1-\beta)\left(\bar{U}_t + \beta\,\bar{U}_{t-1} + \cdots + \beta^{t-2}\,\bar{U}_2\right) + \beta^{t-1}\,\bar{U}_1$$

Substituting the usage rate at each time from t down to 1 into the formula yields $U_t$, the moving-average video memory usage over times t to 1.
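The moving-average recurrence above can be sketched in a few lines of Java, with β = 0.9 as in the text (`movingAverage` is an illustrative helper, not code from the patent):

```java
// Exponential moving average of per-period GPU utilization, beta = 0.9:
// U_t = beta * U_{t-1} + (1 - beta) * V_t, seeded with U_1 = V_1.
public class EmaSketch {
    static double[] movingAverage(double[] avgUtil, double beta) {
        double[] u = new double[avgUtil.length];
        u[0] = avgUtil[0]; // before the moving average is in use, U_1 = V_1
        for (int t = 1; t < avgUtil.length; t++) {
            u[t] = beta * u[t - 1] + (1 - beta) * avgUtil[t];
        }
        return u;
    }

    public static void main(String[] args) {
        // Three 10-minute samples of average utilization
        double[] u = movingAverage(new double[]{0.40, 0.29, 0.35}, 0.9);
        System.out.printf("U_3 = %.4f%n", u[2]); // 0.3851
    }
}
```

With β = 0.9 the moving average reacts slowly to a single noisy sample, which is what makes it suitable for smoothing 10-minute utilization readings before scheduling decisions.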
S4, predicting the number of instances required by the next period of time through an optimal resource scheduling policy (LRU policy) according to the data information in the period of time;
the data information comprises average resource utilization rate, the number of used examples of each model, GPU maximum utilization rate and GPU minimum utilization rate.
The calculation formula for predicting the number of instances required for the next period of time by the optimal resource scheduling policy (LRU policy) is as follows:
where Z represents the number of instances the model requires in the next period of time, $U_t$ indicates the moving-average video memory usage, $Z_o$ is the number of pods the model already uses, $p_{max}$ represents the maximum utilization rate, and $p_{min}$ represents the minimum utilization rate.
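Since the prediction formula image is not reproduced in this text, the following Java fragment is only a hypothetical sketch of such a prediction: it assumes the common autoscaling form Z = ⌈Z_o · U_t / p_target⌉, with the target utilization taken midway between the maximum and minimum rates (all names here are illustrative):

```java
// Hypothetical instance-count prediction: scale the pods in use by the ratio
// of the moving-average usage to an assumed target utilization.
public class InstancePredictSketch {
    static int predictInstances(int podsUsed, double movingAvgUsage,
                                double pMax, double pMin) {
        double target = (pMax + pMin) / 2.0;   // assumed target utilization
        int z = (int) Math.ceil(podsUsed * movingAvgUsage / target);
        return Math.max(z, 1);                 // keep at least one instance
    }

    public static void main(String[] args) {
        // 6 pods at 88% moving-average usage, thresholds 90% / 30%
        System.out.println(predictInstances(6, 0.88, 0.90, 0.30)); // 9
    }
}
```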
S5, adjusting the number of the instances according to the number of the instances required by the model in the next period of time and the number of the instances used by the model;
and S6, finally realizing the maximization of the utilization rate of the video memory through an optimal resource scheduling strategy (LRU).
As shown in fig. 2, the method is further explained and explained by the following specific technical means and procedures:
and calling the Glances interface every 10 minutes through the timing task to obtain the video memory use condition of each model. Glanches can well monitor the use condition of the model video memory and provide an interface for real-time feedback to an application end.
The Glances return value is obtained and written into the Redis cache. Java's LinkedHashMap can implement an LRU algorithm: it maintains a doubly linked list that is reordered on insertion and access. LinkedHashMap orders by insertion by default, but setting accessOrder to true makes it order entries by access; internally this is implemented mainly by overriding newNode and afterNodeAccess to operate on the doubly linked list, moving an entry to the tail of the list when it is inserted or accessed, so the head of the list always holds the least recently used entry.
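The access-order behaviour of LinkedHashMap can be demonstrated with a few lines (an illustrative example, not code from the patent):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// With accessOrder = true, get() moves an entry to the tail of the internal
// doubly linked list, so iteration runs from least to most recently used.
public class AccessOrderDemo {
    static String demoOrder() {
        Map<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true);
        lru.put("A", 1);
        lru.put("B", 2);
        lru.put("C", 3);
        lru.get("A"); // "A" becomes the most recently used entry
        return lru.keySet().toString(); // head (LRU) first, tail (MRU) last
    }

    public static void main(String[] args) {
        System.out.println(demoOrder()); // [B, C, A]
    }
}
```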
A timed task obtains the video memory occupancy of each model from the LRU cache every minute and calls the Rancher interface to reduce the number of instances of — or even stop — the model whose video memory was least recently or least frequently used, achieving optimal video memory utilization. Rancher itself forms a set of container modules comprising networking, storage, load balancing, and DNS, which run on Linux, provide unified infrastructure services to upper layers, and conveniently expose an interface and UI to manage containers.
The monitoring task code is implemented as follows:
package com.iwhalecloud.aiFactory.aiinference;
import com.iwhalecloud.aiFactory.aiGateway.common.RancherUtil;
import com.iwhalecloud.aiFactory.aiGateway.common.interceptor.GpuUseInfo;
import com.iwhalecloud.aiFactory.aiResource.aiCmdb.host.vo.GpuData;
import com.iwhalecloud.aiFactory.aiinference.AirModelService;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import java.util.List;
/**
* @author zj
* @Description: periodically monitor model video memory usage and start or stop models according to video memory occupancy
* @since 2021/5/20 14:24
*/
public class LRUJob implements Job {
/**
* Periodically monitor model video memory usage and start or stop models according to video memory occupancy.
**/
@Override
public void execute(JobExecutionContext context) throws JobExecutionException {
// 1. Query all video memories (GPUs) in use
List<GpuData> gpuDataList = getGpuList();
for (GpuData gpuData : gpuDataList) {
// 2. Query the list of models sharing the same video memory
List<AirModelService> airModelServiceList = getModelByGpu(gpuData);
for (AirModelService airModelService : airModelServiceList) {
// 3. Call the Glances interface to query the model's video memory occupancy
GpuUseInfo gpuUseInfo = getModelGpuInfoByGlances(airModelService);
// 4. Write the model's video memory occupancy into the Redis cache
putModelGpuUseInfo(gpuData.getId().toString() + "-" + airModelService.getId().toString(), gpuUseInfo);
}
// 5. Start and stop models according to their recent usage
dealModelByGpu(gpuData, airModelServiceList);
}
}
/**
Start and stop models according to their recent usage
**/
private void dealModelByGpu(GpuData gpuData, List<AirModelService> airModelServiceList) {
for (AirModelService airModelService : airModelServiceList) {
if (!isStart(airModelService) && isLRUStart(gpuData, airModelService)) { // model is stopped and the start condition is reached
//5.1 Start-Up model
RancherUtil.start(airModelService);
}
else if (isStart(airModelService) && isLRUStop(gpuData, airModelService)) { // model is started and the stop condition is reached
//5.2 stop model
RancherUtil.stop(airModelService);
}
}
}
}
Glances monitoring data and interfaces are shown in Table 1:
TABLE 1
Glances provides a monitoring data acquisition interface; calling it stores container video memory usage into the Redis cache, providing data support for the subsequent LRU scheduling.
The LRU cache is implemented as follows:
package com.iwhalecloud.aiFactory.aiinference;
import java.util.LinkedHashMap;
import java.util.Map;
/**
* @author zj
description @ Description: LRU cache
* @since 2021/5/20 15:11
*/
public class LRUCache {
private int cacheSize;
private LinkedHashMap<Integer,Integer> linkedHashMap;
public LRUCache(int capacity) {
this.cacheSize = capacity;
linkedHashMap = new LinkedHashMap<Integer,Integer>(capacity,0.75F,true){
@Override
protected boolean removeEldestEntry(Map.Entry eldest) {
return size()>cacheSize;
}
};
}
public int get(int key) {
return this.linkedHashMap.getOrDefault(key,-1);
}
public void put(int key,int value) {
this.linkedHashMap.put(key,value);
}
}
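A usage sketch for the LRUCache above (its class body is repeated here so the example is self-contained): with capacity 2, inserting a third key evicts the least recently used entry, and a get() refreshes an entry's recency:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Usage demonstration of the access-ordered LRU cache.
public class LRUCacheDemo {
    static class LRUCache {
        private final int cacheSize;
        private final LinkedHashMap<Integer, Integer> map;

        LRUCache(int capacity) {
            this.cacheSize = capacity;
            this.map = new LinkedHashMap<Integer, Integer>(capacity, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                    return size() > cacheSize; // evict once capacity is exceeded
                }
            };
        }

        int get(int key) { return map.getOrDefault(key, -1); }
        void put(int key, int value) { map.put(key, value); }
    }

    public static void main(String[] args) {
        LRUCache cache = new LRUCache(2);
        cache.put(1, 100);
        cache.put(2, 200);
        cache.get(1);      // key 1 becomes the most recently used
        cache.put(3, 300); // evicts key 2, the least recently used
        System.out.println(cache.get(2)); // -1: evicted
        System.out.println(cache.get(1)); // 100: still cached
    }
}
```

The same recency rule is what the scheduling task relies on: models whose occupancy entries sink to the head of the list are the first candidates for instance reduction.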
The start/stop decision based on video memory utilization, using the LRU strategy cache, is implemented as follows:
package com.iwhalecloud.aiFactory.aiinference;
import com.iwhalecloud.aiFactory.aiinference.AirModelService;
public class RancherUtil {
// Start model
public static boolean start(AirModelService airModelService) {
// Call the Rancher interface to start the model
return startModelByRancher(airModelService);
}
// stop model
public static boolean stop(AirModelService airModelService) {
// Call the Rancher interface to stop the model
return stopModelByRancher(airModelService);
}
}
In summary, through the above technical solution of the present invention, for the scenario of multiple models sharing a video memory, models are dynamically started and stopped by the LRU scheduling policy, solving the pain point of low utilization of the shared video memory; that is, video memory occupation is allocated effectively, with fewer memory resources allocated to models with low utilization and more provided to models with high utilization, thereby improving video memory utilization and saving resources. Real-time monitoring with Glances improves the responsiveness of container switching, and the fast Redis cache speeds up model switching.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (9)
1. The model loading method for maximizing and improving the utilization rate of the video memory based on the LRU strategy is characterized by comprising the following steps of:
constructing and deploying three models of face recognition, portrait comparison and human body analysis, and configuring an example;
starting a timing task, acquiring the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period;
calculating the moving-average video memory usage through the optimal resource scheduling strategy;
according to the data information in the period of time, predicting the number of instances required by the next period of time through an optimal resource scheduling strategy;
adjusting the number of the examples according to the number of the examples needed by the model in the next period of time and the number of the examples used by the model;
and finally realizing the maximization of the video memory utilization rate through the optimal resource scheduling strategy.
2. The model loading method for maximally improving video memory utilization rate based on LRU policy of claim 1, wherein the constructing and deploying three models of face recognition, portrait comparison and human body analysis and configuring the instance comprises the following steps:
three model capabilities of face recognition, portrait comparison and human body analysis are configured through an AI platform;
six elastically scalable instances are respectively configured for the three models of face recognition, portrait comparison and human body analysis;
configuring three models of face recognition, portrait comparison and human body analysis to the same display card;
and deploying and starting three models of face recognition, portrait comparison and human body analysis through a container management platform.
3. A model loading method for maximizing and improving video memory utilization rate based on LRU strategy as claimed in claim 2, wherein said starting the timing task to obtain the real-time GPU utilization rate in the time period every 10 minutes and calculating the average GPU utilization rate in the time period comprises the following steps:
starting a timing task, and acquiring the real-time resource utilization rate of the GPU in the period of time by a resource monitoring tool every 10 minutes;
storing the acquired GPU real-time utilization rate for scheduling and using a subsequent optimal resource scheduling strategy;
the optimal resource scheduling strategy scheduling center circularly obtains data of a certain period of time from the remote dictionary service, samples the real-time utilization rate of the GPU in the period of time, and obtains the average GPU utilization rate in the period of time through calculation.
4. A model loading method for maximally improving utilization rate of a video memory based on an LRU policy according to claim 3, wherein the step of obtaining real-time resource utilization rate of the GPU in the period of time by the resource monitoring tool every 10 minutes comprises the following steps:
respectively acquiring the number of pictures analyzed by the three models in a first time period and a second time period;
and respectively obtaining the number of the pictures analyzed by the three models in the first time period, the number of the pictures analyzed by the three models in the second time period and the maximum number of the pictures analyzed by the three models in 1 second, and calculating to obtain the GPU real-time resource utilization rate.
5. A model loading method for maximally improving video memory utilization rate based on LRU policy according to claim 4, wherein the formula for calculating the real-time GPU resource utilization rate is as follows:

A = (C_i − C_j) / ((i − j) × M)

wherein A represents the real-time GPU resource utilization rate; i and j are the first time and the second time respectively, with i > j; C_i represents the number of pictures the model has analyzed at the first time, C_j represents the number of pictures the model has analyzed at the second time; and M represents the maximum number of pictures the model can analyze in 1 second.
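Reading the claim's variables literally, the real-time utilization is the number of pictures analyzed between the two sampling times divided by the maximum the model could have analyzed in that interval. A minimal sketch under that assumption (the function name is hypothetical):

```python
def gpu_realtime_utilization(c_i, c_j, i, j, m):
    """Real-time GPU resource utilization A.

    c_i, c_j: cumulative pictures analyzed at times i and j (i > j, in seconds)
    m:        maximum pictures the model can analyze per second
    """
    if i <= j or m <= 0:
        raise ValueError("require i > j and m > 0")
    return (c_i - c_j) / ((i - j) * m)

# 300 pictures analyzed over a 60 s window, capacity 10 pictures/s
print(gpu_realtime_utilization(c_i=1300, c_j=1000, i=120, j=60, m=10))  # 0.5
```

A value of 1.0 means the model analyzed pictures at its maximum rate M for the entire interval.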
6. A model loading method for maximally improving video memory utilization rate based on LRU policy according to claim 5, wherein the average GPU utilization rate over the period is calculated by the following formula:
7. The model loading method for maximizing and improving video memory utilization rate based on the LRU policy of claim 6, wherein the formula for calculating the moving average video memory utilization rate by scheduling the optimal resource scheduling policy is as follows:
U_t = β · U_{t−1} + (1 − β) · A_t

wherein U_t is the moving average video memory utilization rate of the model in period t, A_t is the average GPU utilization rate of the model in period t, and β is a weight between 0 and 1, here set to 0.9; when the moving average model is not used, U_t = A_t;
and the above formula can be expanded as follows:

U_t = (1 − β) · (A_t + β · A_{t−1} + β² · A_{t−2} + … + β^(t−1) · A_1)

substituting the average utilization rate of each period from t down to 1 into the expanded formula yields U_t, the moving average video memory utilization rate over periods t to 1.
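The update in claim 7 is a standard exponential moving average. A minimal sketch (function names are hypothetical), using β = 0.9 as in the claim, showing that the recursive and expanded forms agree:

```python
def moving_average(utilizations, beta=0.9):
    """Recursive form: U_t = beta * U_{t-1} + (1 - beta) * A_t, with U_0 = 0,
    applied over the per-period average utilizations A_1..A_t in order."""
    u = 0.0
    for a in utilizations:
        u = beta * u + (1.0 - beta) * a
    return u

def moving_average_expanded(utilizations, beta=0.9):
    """Expanded form: U_t = (1 - beta) * sum_k beta**k * A_{t-k}."""
    t = len(utilizations)
    return (1.0 - beta) * sum(
        beta ** k * a for k, a in zip(range(t), reversed(utilizations))
    )

samples = [0.5, 0.6, 0.7]
print(moving_average(samples))           # recursive form
print(moving_average_expanded(samples))  # same value via the expanded form
```

The weight β = 0.9 makes the average favor recent periods while still smoothing short-lived utilization spikes.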
8. A method as recited in claim 7, wherein the data information includes average resource utilization, number of instances used per model, maximum GPU utilization, and minimum GPU utilization.
9. The method of claim 8, wherein the calculation formula for predicting the number of instances required for the next period of time by the optimal resource scheduling policy is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111001401.5A CN113674137A (en) | 2021-08-30 | 2021-08-30 | Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113674137A true CN113674137A (en) | 2021-11-19 |
Family
ID=78547341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111001401.5A Pending CN113674137A (en) | 2021-08-30 | 2021-08-30 | Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113674137A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117687802A (en) * | 2024-02-02 | 2024-03-12 | 湖南马栏山视频先进技术研究院有限公司 | Deep learning parallel scheduling method and device based on cloud platform and cloud platform |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170195247A1 (en) * | 2015-12-31 | 2017-07-06 | EMC IP Holding Company LLC | Method and apparatus for cloud system |
CN111158908A (en) * | 2019-12-27 | 2020-05-15 | 重庆紫光华山智安科技有限公司 | Kubernetes-based scheduling method and device for improving GPU utilization rate |
CN111506404A (en) * | 2020-04-07 | 2020-08-07 | 上海德拓信息技术股份有限公司 | Kubernetes-based shared GPU (graphics processing Unit) scheduling method |
CN113051060A (en) * | 2021-04-10 | 2021-06-29 | 作业帮教育科技(北京)有限公司 | GPU dynamic scheduling method and device based on real-time load and electronic equipment |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211119 |