CN113674137A - Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy - Google Patents

Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy Download PDF

Info

Publication number
CN113674137A
CN113674137A CN202111001401.5A CN202111001401A CN113674137A CN 113674137 A CN113674137 A CN 113674137A CN 202111001401 A CN202111001401 A CN 202111001401A CN 113674137 A CN113674137 A CN 113674137A
Authority
CN
China
Prior art keywords
time
utilization rate
model
period
video memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111001401.5A
Other languages
Chinese (zh)
Inventor
钟靖
吴小炎
吴名朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Whale Cloud Technology Co Ltd
Original Assignee
Whale Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Whale Cloud Technology Co Ltd filed Critical Whale Cloud Technology Co Ltd
Priority to CN202111001401.5A priority Critical patent/CN113674137A/en
Publication of CN113674137A publication Critical patent/CN113674137A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model loading method for maximally improving the utilization rate of a video memory based on an LRU (least recently used) strategy, which comprises the following steps of: constructing and deploying three models of face recognition, portrait comparison and human body analysis, and configuring an example; starting a timing task, acquiring the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period; calculating the use rate of the moving average video memory by scheduling the optimal resource scheduling strategy; according to the data information in the period of time, predicting the number of instances required by the next period of time through an optimal resource scheduling strategy; the number of instances is adjusted according to the number of instances required by the model in the next period of time and the number of instances used by the model. Has the advantages that: through the LRU scheduling strategy, the models are dynamically started and stopped, the pain point of low utilization rate of the multi-model shared video memory is solved, the utilization rate of the video memory is improved, and resources are saved.

Description

Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy
Technical Field
The invention relates to the technical field of video memory, in particular to a model loading method for maximizing the utilization rate of the video memory based on an LRU (least recently used) strategy.
Background
When a large enterprise carries out digital transformation, an AI scene is bound to face, the requirements of AI application and AI capacity are met, in the production process of real AI capacity, the calling of AI capacity is bound to exist, usually, API realization is provided for the outside based on an AI capacity open platform, the uploading and the deployment of AI capacity are carried out based on a model version, in the capacity deployment, single model and multi-model combined deployment exist, obviously, the value of resource utilization can be better reflected by the multi-model combined deployment, and on the basis of the multi-model deployment, the problem of resource sharing of a CPU, a GPU, a memory and a video memory needs to be solved. In the daily production process of AI (multiple models), the differentiation demands on the model calling amount in different time periods in application certainly exist, and the differentiation between intensive calling of an A model and scattered calling of a B model in the same AI capacity or even zero calling needs to be solved, so that the resources of the A model are insufficient, and the resources of the B model are wasted; and the requirement of model replacement in a running state exists, namely the same capability comprises a plurality of models (A, B, C), each model starts a plurality of instances, the former resources can only support the A and B non-calling requests with calling quantity, and the later production operation can have the requirement that the B non-calling quantity and the C have the calling quantity, so that the occupation and the waste of resources are caused.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a model loading method for maximizing the utilization rate of the video memory based on the LRU strategy, so as to overcome the technical problems in the prior related art.
Therefore, the invention adopts the following specific technical scheme:
the model loading method for maximizing and improving the utilization rate of the video memory based on the LRU strategy comprises the following steps:
constructing and deploying three models of face recognition, portrait comparison and human body analysis, and configuring an example;
starting a timing task, acquiring the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period;
calculating the use rate of the moving average video memory by scheduling the optimal resource scheduling strategy;
according to the data information in the period of time, predicting the number of instances required by the next period of time through an optimal resource scheduling strategy;
adjusting the number of the examples according to the number of the examples needed by the model in the next period of time and the number of the examples used by the model;
and finally realizing the maximization of the video memory utilization rate through the optimal resource scheduling strategy.
Further, the construction and deployment of the three models of face recognition, portrait comparison and human body analysis and the configuration of the example comprise the following steps:
three model capabilities of face recognition, portrait comparison and human body analysis are configured through an AI platform;
6 elastically telescopic examples are respectively configured for three models of face recognition, portrait comparison and human body analysis;
configuring three models of face recognition, portrait comparison and human body analysis to the same display card;
and deploying and starting three models of face recognition, portrait comparison and human body analysis through a container management platform.
Further, the starting of the timing task, obtaining the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period includes the following steps:
starting a timing task, and acquiring the real-time resource utilization rate of the GPU in the period of time by a resource monitoring tool every 10 minutes;
storing the acquired GPU real-time utilization rate for the scheduling of the optimal resource scheduling policy (LRU);
the optimal resource scheduling strategy scheduling center circularly obtains data of a certain period of time from the remote dictionary service, samples the real-time utilization rate of the GPU in the period of time, and obtains the average GPU utilization rate in the period of time through calculation.
Further, the step of obtaining the real-time resource utilization rate of the GPU in the period of time by the resource monitoring tool every 10 minutes includes the following steps:
respectively acquiring the number of pictures analyzed by the three models in a first time period and a second time period;
and respectively obtaining the number of the pictures analyzed by the three models in the first time period, the number of the pictures analyzed by the three models in the second time period and the maximum number of the pictures analyzed by the three models in 1 second, and calculating to obtain the GPU real-time resource utilization rate.
Further, the formula for calculating the real-time resource utilization rate of the GPU is as follows:
Figure 117371DEST_PATH_IMAGE001
wherein A represents the real-time resource utilization rate of the GPU, i, j are respectively a first time period and a second time period, and i>j,CiRepresenting the number of pictures, C, analyzed by the model during a first time periodjRepresenting the number of pictures j that the model analyzed during the second time period, and M representing the maximum number of pictures that the model can analyze in 1 second.
Further, the calculation formula for obtaining the average GPU utilization within the period of time through calculation is as follows:
Figure 726207DEST_PATH_IMAGE002
wherein the content of the first and second substances,
Figure 837382DEST_PATH_IMAGE003
the average GPU utilization rate is represented, I represents the sampling times of the real-time GPU utilization rate in a period of time, and J represents the number of model operation instances.
Further, the formula for calculating the running average video memory usage rate by scheduling the optimal resource scheduling policy is as follows:
Figure 785747DEST_PATH_IMAGE004
wherein the content of the first and second substances,
Figure 753703DEST_PATH_IMAGE005
for moving average display memory of model in t periodThe utilization rate of the water-based paint is improved,
Figure 286053DEST_PATH_IMAGE006
average GPU utilization for the model over a period of t, and when a moving average model is not used
Figure 997657DEST_PATH_IMAGE005
=
Figure 800528DEST_PATH_IMAGE006
Beta is a weighted random number of 0 to 1, where beta is set to 0.9;
and the above formula can be expanded as follows:
Figure 939385DEST_PATH_IMAGE007
filling the usage rate of each time from t to 1 into a formula to calculate UtThe running average video memory usage at time t to 1.
Further, the data information includes an average resource utilization rate, a number of used instances of each model, a maximum GPU utilization rate, and a minimum GPU utilization rate.
Further, the calculation formula for predicting the number of instances required for the next period of time by the optimal resource scheduling policy (LRU policy) is as follows:
Figure 460496DEST_PATH_IMAGE008
wherein Z represents the number of instances required for the next period of time of the model,
Figure 710212DEST_PATH_IMAGE009
indicating the running average video memory usage, ZoFor the number of pod that the model has used, p represents the maximum utilization and p represents the minimum utilization.
The invention has the beneficial effects that: aiming at the scene of multi-model shared video memory, the models are dynamically started and stopped through an LRU (least recently used) scheduling strategy, so that the problem that the multi-model shared video memory has low utilization rate is solved, namely, the video memory occupation of the multi-model is effectively distributed, less video memory resources are distributed to the models with low utilization rate, more video memory resources are provided for the models with high utilization rate, the utilization rate of the video memory is improved, and the resources are saved; the real-time performance of container switching is improved through the real-time monitoring of the glanches; and the high speed of model switching is improved through redis fast cache.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a flowchart of a model loading method for maximizing utilization of a graphics memory based on an LRU policy according to an embodiment of the present invention;
fig. 2 is a flowchart of a technical implementation of a model loading method for maximally increasing utilization of a video memory based on an LRU policy according to an embodiment of the present invention.
Detailed Description
For further explanation of the various embodiments, the drawings which form a part of the disclosure and which are incorporated in and constitute a part of this specification, illustrate embodiments and, together with the description, serve to explain the principles of operation of the embodiments, and to enable others of ordinary skill in the art to understand the various embodiments and advantages of the invention, and, by reference to these figures, reference is made to the accompanying drawings, which are not to scale and wherein like reference numerals generally refer to like elements.
According to the embodiment of the invention, a model loading method for maximally improving the utilization rate of a video memory based on an LRU (least recently used) strategy is provided.
Referring to the drawings and the detailed description, the present invention is further described, as shown in fig. 1, a model loading method for maximally increasing utilization rate of a video memory based on an LRU policy according to an embodiment of the present invention includes the following steps:
s1, constructing and deploying three models of face recognition, portrait comparison and human body analysis, and configuring an example;
wherein, step S1 includes the following steps:
s11, configuring three model capabilities of face recognition, portrait comparison and human body analysis through an AI platform;
s12, respectively configuring 6 elastically telescopic examples for three models of face recognition, portrait comparison and human body analysis;
s13, configuring three models of face recognition, portrait comparison and human body analysis to the same display card;
and S14, deploying and starting three models of face recognition, portrait comparison and human body analysis through a container management platform (rancher).
S2, starting a timing task, acquiring the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period;
wherein, step S2 includes the following steps:
s21, starting timing tasks, and acquiring the real-time resource utilization rate of the GPU in the period of time by a resource monitoring tool (Glances) every 10 minutes;
further, step S21 includes the steps of:
s211, respectively obtaining the number of the pictures analyzed by the three models in a first time period and a second time period;
the number of pictures processed in 1-10 minutes in the face recognition model is C1: 12021, number of pictures processed in 10-20 minutes C2: 8782 sheets;
figure contrast model, number of pictures processed in 1-10 minutes C1: 49389, number of pictures processed in 10-20 min C2: 30287 sheets of paper;
human analytical model, number of pictures processed in 1-10 minutes C1: 120789 sheets, number of pictures processed in 10-20 minutes C2: 152573 pieces.
S212, respectively obtaining the number of the pictures analyzed by the three models in the first time period, the number of the pictures analyzed by the three models in the second time period and the maximum number of the pictures analyzed by the three models in 1 second, and calculating to obtain the GPU real-time resource utilization rate, wherein the calculation formula is as follows:
Figure 633169DEST_PATH_IMAGE001
wherein A represents the real-time resource utilization rate of the GPU, i, j are respectively a first time period and a second time period, and i>j,CiRepresenting the number of pictures, C, analyzed by the model during a first time periodjRepresenting the number of pictures j that the model analyzed during the second time period, and M representing the maximum number of pictures that the model can analyze in 1 second.
Furthermore, the face recognition model processes a maximum number of pictures M (per second): 50 sheets;
portrait contrast model maximum number of picture processing M (per second): 112 sheets;
human analysis model maximum number of picture processing M (per second): 258 pieces.
S22, storing the acquired GPU real-time utilization rate for the scheduling of the following optimal resource scheduling policy (LRU);
s23, circularly obtaining data of a certain period of time from a remote dictionary service (redis) by an optimal resource scheduling policy (LRU) scheduling center, sampling the real-time utilization rate of the GPU in the period of time, and obtaining the average GPU utilization rate in the period of time through calculation, wherein the calculation formula is as follows:
Figure 138667DEST_PATH_IMAGE002
wherein the content of the first and second substances,
Figure 209391DEST_PATH_IMAGE003
the average GPU utilization rate is represented, I represents the sampling times of the real-time GPU utilization rate in a period of time, and J represents the number of model operation instances.
In addition, the average GPU resource utilization U of the face recognition model: 35.20 percent;
average GPU resource utilization U of the portrait comparison model: 81.67 percent;
average GPU resource utilization U of the human body analysis model: 88.29 percent.
S3, calculating the running average video memory utilization rate through the optimal resource scheduling strategy scheduling, wherein the calculation formula is as follows:
Figure 200481DEST_PATH_IMAGE004
wherein the content of the first and second substances,
Figure 977944DEST_PATH_IMAGE005
is the running average video memory usage rate of the model in the t period,
Figure 87633DEST_PATH_IMAGE006
average GPU utilization for the model over a period of t, and when a moving average model is not used
Figure 317757DEST_PATH_IMAGE005
=
Figure 174854DEST_PATH_IMAGE006
Beta is a weighted random number of 0 to 1, where beta is set to 0.9;
and the above formula can be expanded as follows:
Figure 806824DEST_PATH_IMAGE007
filling the usage rate of each time from t to 1 into a formula to calculate UtThe running average video memory usage at time t to 1.
S4, predicting the number of instances required by the next period of time through an optimal resource scheduling policy (LRU policy) according to the data information in the period of time;
the data information comprises average resource utilization rate, the number of used examples of each model, GPU maximum utilization rate and GPU minimum utilization rate.
The calculation formula for predicting the number of instances required for the next period of time by the optimal resource scheduling policy (LRU policy) is as follows:
Figure 701793DEST_PATH_IMAGE008
wherein Z represents the number of instances required for the next period of time of the model,
Figure 684793DEST_PATH_IMAGE009
indicating the running average video memory usage, ZoFor the number of pod that the model has used, p represents the maximum utilization and p represents the minimum utilization.
S5, adjusting the number of the instances according to the number of the instances required by the model in the next period of time and the number of the instances used by the model;
and S6, finally realizing the maximization of the utilization rate of the video memory through an optimal resource scheduling strategy (LRU).
As shown in fig. 2, the method is further explained and explained by the following specific technical means and procedures:
and calling the Glances interface every 10 minutes through the timing task to obtain the video memory use condition of each model. Glanches can well monitor the use condition of the model video memory and provide an interface for real-time feedback to an application end.
And obtaining Glances return and writing the Glances return into a redis cache. The LinkedHashMap of Java realizes an LRU algorithm, and the principle is that a linked list is transformed when elements are inserted and accessed based on a rule of inserting and accessing records of a bi-directional linked list. Linked HashMap defaults to insert as sorting, accessOrder can be set as True, so that the sorting is similar to HashMap according to access conditions, specific internal implementation logic can be mainly rewritten access of newNode and afternode access according to the insertion and access sorting, the method realizes operation on a double linked list, elements are updated to the tail of the linked list when the elements are inserted, and data are updated to the head of the linked list when the elements are accessed.
And the timing task acquires the video memory occupancy rate of each model in the LRU cache every minute, calls a ran cher interface, reduces the number of instances of the model which uses the video memory least recently or less frequently, and even stops the model, so as to achieve the optimal utilization of the video memory. The ranker self-forms a set of container modules comprising a network, a storage, a load balance and dns, which run on the Linux, provide unified infrastructure services for the upper layers, and very conveniently provide an interface and an interface to manage the containers.
The monitoring task code is implemented as follows:
package com.iwhalecloud.aiFactory.aiinference;
import com.iwhalecloud.aiFactory.aiGateway.common.RancherUtil;
import com.iwhalecloud.aiFactory.aiGateway.common.interceptor.GpuUseInfo;
import com.iwhalecloud.aiFactory.aiResource.aiCmdb.host.vo.GpuData;
import com.iwhalecloud.aiFactory.aiinference.AirModelService;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import java.util.List;
/**
* @author zj
description @ Description: monitoring the use condition of the model video memory at regular time, and starting and stopping the model according to the video memory occupancy rate
* @since 2021/5/20 14:24
*/
public class LRUJob implements Job {
/**
Monitoring the use condition of the model video memory at fixed time, and starting and stopping the model according to the video memory occupancy rate
**/
@Override
public void execute(JobExecutionContext context) throws JobExecutionException {
I/1 query all video memories in use
List<GpuData> gpuDataList = getGpuList();
for (GpuData gpuData : gpuDataList) {
I/2 query model lists sharing the same video memory
List<AirModelService> airModelServiceList = getModelByGpu(gpuData);
for (AirModelService airModelService : airModelServiceList) {
V/3 calling Glances interface to inquire the video memory occupancy rate of the model
GpuUseInfo gpuUseInfo = getModelGpuInfoByGlances(airModelService);
V/4, writing the model video memory temporary rate into the redis cache
putModelGpuUseInfo(gpuData.getId().toString() + "-" + airModelService.getId().toString(), gpuUseInfo);
}
V/5 starting and stopping the model according to the recent use condition of the model
dealModelByGpu(gpuData, airModelServiceList);
}
}
/**
Starting and stopping model according to recent use condition of model
**/
private void dealModelByGpu(GpuData gpuData, List<AirModelService> airModelServiceList) {
for (AirModelService airModelService : airModelServiceList) {
if (! isstart & & isLRUStart (gpuData, airModeservice)) {// model is in the stopped state and the start condition is reached
//5.1 Start-Up model
RancherUtil.start(airModelService);
}
else if (isStart (airModelservice) & & isLRUStop (gpuData, airModelservice)) {// model is in the startup state and the stop condition is reached
//5.2 stop model
RancherUtil.stop(airModelService);
}
}
}
}
Glanches monitoring data and interfaces are shown in Table 1:
TABLE 1
Figure 80002DEST_PATH_IMAGE010
Glances provides a monitoring data acquisition interface, calls the Glances interface to store the use condition of the container video memory into a redis cache, and provides data support for the following LRU scheduling.
The LRU cache realizes that:
package com.iwhalecloud.aiFactory.aiinference;
import java.util.LinkedHashMap;
import java.util.Map;
/**
* @author zj
description @ Description: LRU cache
* @since 2021/5/20 15:11
*/
public class LRUCache {
private int cacheSize;
private LinkedHashMap<Integer,Integer> linkedHashMap;
public LRUCache(int capacity) {
this.cacheSize = capacity;
linkedHashMap = new LinkedHashMap<Integer,Integer>(capacity,0.75F,true){
@Override
protected boolean removeEldestEntry(Map.Entry eldest) {
return size()>cacheSize;
}
};
}
public int get(int key) {
return this.linkedHashMap.getOrDefault(key,-1);
}
public void put(int key,int value) {
this.linkedHashMap.put(key,value);
}
}
According to the utilization rate of the display memory, judging the start-stop code implementation by using LRU strategy cache:
package com.iwhalecloud.aiFactory.aiinference;
import com.iwhalecloud.aiFactory.aiinference.AirModelService;
public class RancherUtil {
// Start model
public static boolean start(AirModelService airModelService) {
v/Call ran cher interface, Start model
return startModelByRancher(airModelService);
}
// stop model
public static boolean stop(AirModelService airModelService) {
// Call ran interface, stop model
return sotpModelByRancher(airModelService);
}
}
In summary, by means of the above technical solution of the present invention, for a scenario of sharing a video memory with multiple models, a model is dynamically started and stopped by an LRU scheduling policy, so as to solve a pain point of low utilization rate of the video memory shared by multiple models, that is, occupation of the video memory of multiple models is effectively allocated, less video memory resources are allocated to the model with low utilization rate, and more video memory resources are provided to the model with high utilization rate, thereby improving utilization rate of the video memory and further saving resources; the real-time performance of container switching is improved through the real-time monitoring of the glanches; and the high speed of model switching is improved through redis fast cache.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. The model loading method for maximizing and improving the utilization rate of the video memory based on the LRU strategy is characterized by comprising the following steps of:
constructing and deploying three models of face recognition, portrait comparison and human body analysis, and configuring an example;
starting a timing task, acquiring the real-time utilization rate of the GPU in the time period every 10 minutes, and calculating the average GPU utilization rate in the time period;
calculating the use rate of the moving average video memory by scheduling the optimal resource scheduling strategy;
according to the data information in the period of time, predicting the number of instances required by the next period of time through an optimal resource scheduling strategy;
adjusting the number of the examples according to the number of the examples needed by the model in the next period of time and the number of the examples used by the model;
and finally realizing the maximization of the video memory utilization rate through the optimal resource scheduling strategy.
2. The model loading method for maximally improving video memory utilization rate based on LRU policy of claim 1, wherein the constructing and deploying three models of face recognition, portrait comparison and human body analysis and configuring the instance comprises the following steps:
three model capabilities of face recognition, portrait comparison and human body analysis are configured through an AI platform;
6 elastically telescopic examples are respectively configured for three models of face recognition, portrait comparison and human body analysis;
configuring three models of face recognition, portrait comparison and human body analysis to the same display card;
and deploying and starting three models of face recognition, portrait comparison and human body analysis through a container management platform.
3. A model loading method for maximizing and improving video memory utilization rate based on LRU strategy as claimed in claim 2, wherein said starting the timing task to obtain the real-time GPU utilization rate in the time period every 10 minutes and calculating the average GPU utilization rate in the time period comprises the following steps:
starting a timing task, and acquiring the real-time resource utilization rate of the GPU in the period of time by a resource monitoring tool every 10 minutes;
storing the acquired GPU real-time utilization rate for scheduling and using a subsequent optimal resource scheduling strategy;
the optimal resource scheduling strategy scheduling center circularly obtains data of a certain period of time from the remote dictionary service, samples the real-time utilization rate of the GPU in the period of time, and obtains the average GPU utilization rate in the period of time through calculation.
4. A model loading method for maximally improving utilization rate of a video memory based on an LRU policy according to claim 3, wherein the step of obtaining real-time resource utilization rate of the GPU in the period of time by the resource monitoring tool every 10 minutes comprises the following steps:
respectively acquiring the number of pictures analyzed by the three models in a first time period and a second time period;
and respectively obtaining the number of the pictures analyzed by the three models in the first time period, the number of the pictures analyzed by the three models in the second time period and the maximum number of the pictures analyzed by the three models in 1 second, and calculating to obtain the GPU real-time resource utilization rate.
5. A model loading method for maximally improving utilization rate of video memory based on LRU policy according to claim 4, wherein the formula for calculating real-time resource utilization rate of GPU is as follows:
Figure 31661DEST_PATH_IMAGE001
wherein A represents the real-time resource utilization rate of the GPU, i, j are respectively a first time period and a second time period, and i>j,CiGraph representing analysis of a model over a first time periodNumber of pieces, CjRepresenting the number of pictures j that the model analyzed during the second time period, and M representing the maximum number of pictures that the model can analyze in 1 second.
6. A model loading method for maximally improving video memory utilization rate based on LRU policy according to claim 5, wherein the calculation formula for obtaining the average GPU utilization rate in the period of time through calculation is as follows:
Figure 997343DEST_PATH_IMAGE002
wherein the content of the first and second substances,
Figure 116609DEST_PATH_IMAGE003
the average GPU utilization rate is represented, I represents the sampling times of the real-time GPU utilization rate in a period of time, and J represents the number of model operation instances.
7. The model loading method for maximizing and improving video memory utilization rate based on the LRU policy of claim 6, wherein the formula for calculating the moving average video memory utilization rate by scheduling the optimal resource scheduling policy is as follows:
Figure 306282DEST_PATH_IMAGE004
wherein the content of the first and second substances,
Figure 143788DEST_PATH_IMAGE005
is the running average video memory usage rate of the model in the t period,
Figure 146117DEST_PATH_IMAGE006
average GPU utilization for the model over a period of t, and when a moving average model is not used
Figure 447785DEST_PATH_IMAGE005
=
Figure 746042DEST_PATH_IMAGE006
Beta is a weighted random number of 0 to 1, where beta is set to 0.9;
and the above formula can be expanded as follows:
Figure 133161DEST_PATH_IMAGE007
filling the usage rate of each time from t to 1 into a formula to calculate UtThe running average video memory usage at time t to 1.
8. A method as recited in claim 7, wherein the data information includes average resource utilization, number of instances used per model, maximum GPU utilization, and minimum GPU utilization.
9. The method of claim 8, wherein the calculation formula for predicting the number of instances required for the next period of time by the optimal resource scheduling policy is as follows:
Figure 440646DEST_PATH_IMAGE008
wherein Z represents the number of instances required for the next period of time of the model,
Figure 268925DEST_PATH_IMAGE009
represents the running average video memory usage, ZoFor the number of pod that the model has used, pmax represents the maximum utilization and pmin represents the minimum utilization.
CN202111001401.5A 2021-08-30 2021-08-30 Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy Pending CN113674137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111001401.5A CN113674137A (en) 2021-08-30 2021-08-30 Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111001401.5A CN113674137A (en) 2021-08-30 2021-08-30 Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy

Publications (1)

Publication Number Publication Date
CN113674137A true CN113674137A (en) 2021-11-19

Family

ID=78547341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111001401.5A Pending CN113674137A (en) 2021-08-30 2021-08-30 Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy

Country Status (1)

Country Link
CN (1) CN113674137A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117687802A (en) * 2024-02-02 2024-03-12 湖南马栏山视频先进技术研究院有限公司 Deep learning parallel scheduling method and device based on cloud platform and cloud platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170195247A1 (en) * 2015-12-31 2017-07-06 EMC IP Holding Company LLC Method and apparatus for cloud system
CN111158908A (en) * 2019-12-27 2020-05-15 重庆紫光华山智安科技有限公司 Kubernetes-based scheduling method and device for improving GPU utilization rate
CN111506404A (en) * 2020-04-07 2020-08-07 上海德拓信息技术股份有限公司 Kubernetes-based shared GPU (graphics processing Unit) scheduling method
CN113051060A (en) * 2021-04-10 2021-06-29 作业帮教育科技(北京)有限公司 GPU dynamic scheduling method and device based on real-time load and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170195247A1 (en) * 2015-12-31 2017-07-06 EMC IP Holding Company LLC Method and apparatus for cloud system
CN111158908A (en) * 2019-12-27 2020-05-15 重庆紫光华山智安科技有限公司 Kubernetes-based scheduling method and device for improving GPU utilization rate
CN111506404A (en) * 2020-04-07 2020-08-07 上海德拓信息技术股份有限公司 Kubernetes-based shared GPU (graphics processing Unit) scheduling method
CN113051060A (en) * 2021-04-10 2021-06-29 作业帮教育科技(北京)有限公司 GPU dynamic scheduling method and device based on real-time load and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117687802A (en) * 2024-02-02 2024-03-12 湖南马栏山视频先进技术研究院有限公司 Deep learning parallel scheduling method and device based on cloud platform and cloud platform
CN117687802B (en) * 2024-02-02 2024-04-30 湖南马栏山视频先进技术研究院有限公司 Deep learning parallel scheduling method and device based on cloud platform and cloud platform

Similar Documents

Publication Publication Date Title
US10990540B2 (en) Memory management method and apparatus
US7665090B1 (en) System, method, and computer program product for group scheduling of computer resources
CN108848039B (en) Server, message distribution method and storage medium
US6442661B1 (en) Self-tuning memory management for computer systems
US8195798B2 (en) Application server scalability through runtime restrictions enforcement in a distributed application execution system
US8078574B1 (en) Network acceleration device cache supporting multiple historical versions of content
CN113674133B (en) GPU cluster shared video memory system, method, device and equipment
CN105512053B (en) The mirror cache method of mobile transparent computing system server end multi-user access
US9086920B2 (en) Device for managing data buffers in a memory space divided into a plurality of memory elements
EP1782205A2 (en) Autonomically tuning the virtual memory subsystem of a computer operating system
US6286088B1 (en) Memory management system and method for relocating memory
CN100361094C (en) Method for saving global varible internal memory space
US7904688B1 (en) Memory management unit for field programmable gate array boards
CN113674137A (en) Model loading method for maximizing and improving video memory utilization rate based on LRU (least recently used) strategy
CN111984425A (en) Memory management method, device and equipment for operating system
CN108038062B (en) Memory management method and device of embedded system
US6631446B1 (en) Self-tuning buffer management
CN111857992A (en) Thread resource allocation method and device in Radosgw module
US6807588B2 (en) Method and apparatus for maintaining order in a queue by combining entry weights and queue weights
CN117271137A (en) Multithreading data slicing parallel method
US20090019097A1 (en) System and method for memory allocation management
CN114327862B (en) Memory allocation method and device, electronic equipment and storage medium
CN107924363A (en) Use the automated storing device management of memory management unit
CN117435343A (en) Memory management method and device
CN109408412B (en) Memory prefetch control method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211119

RJ01 Rejection of invention patent application after publication