CN114025163A - Image decoding method, image decoding system and related device - Google Patents

Image decoding method, image decoding system and related device

Info

Publication number
CN114025163A
CN114025163A (application CN202111143450.2A)
Authority
CN
China
Prior art keywords
load
gpu
image decoding
optimal
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111143450.2A
Other languages
Chinese (zh)
Inventor
王鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN202111143450.2A priority Critical patent/CN114025163A/en
Publication of CN114025163A publication Critical patent/CN114025163A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides an image decoding method, comprising: monitoring GPU load and CPU load at a preset monitoring frequency; judging whether the average load of the GPU load and the CPU load is lower than a load threshold value; if yes, judging whether the GPU load is higher than the load threshold value; if the GPU load is higher than the load threshold value, using the CPU to perform image decoding; and if the average load of the GPU load and the CPU load is higher than the load threshold value, or the average load is lower than the load threshold value and the GPU load is also lower than the load threshold value, selecting the device with the lower load between the GPU and the CPU to perform image decoding. In the method and device, the decoding device is switched according to the actual load conditions of the GPU and the CPU, so that adaptive adjustment according to the GPU load and the CPU load is realized, and the image decoding efficiency is improved. The present application also provides an image decoding system, a computer-readable storage medium, and an electronic device, which have the above-mentioned advantageous effects.

Description

Image decoding method, image decoding system and related device
Technical Field
The present application relates to the field of image processing, and in particular, to an image decoding method, an image decoding system, and a related apparatus.
Background
Video structuring refers to processing unstructured two-dimensional image stream data in a certain manner to obtain structured information about the image content in a video. At present, video structuring applications in the industry are generally deployed in two modes, edge-side deployment and center-side deployment, and in either mode a heterogeneous CPU/GPU platform has almost become the standard platform for video structuring applications. However, the decoding process is done entirely by a single hard decoder or soft decoder, which makes the video structuring application lose flexibility. The pressure on the CPU side and the GPU side is not constant throughout the video structuring process; as components such as detection, tracking, encoding and stream pushing are added, the pressure on the two sides keeps changing. In addition, the current decoding process is unconstrained, that is, decoding proceeds as fast as the hardware allows, and decoding that is too fast increases memory pressure and seizes system resources, so that the overall application efficiency is reduced.
Disclosure of Invention
An object of the present application is to provide an image decoding method, an image decoding system, a computer-readable storage medium, and an electronic device, which can improve decoding efficiency.
In order to solve the above technical problem, the present application provides an image decoding method, which has the following specific technical scheme:
monitoring GPU load and CPU load at a preset monitoring frequency;
judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
if yes, judging whether the GPU load is higher than the load threshold value;
if the GPU load is higher than the load threshold value, using the CPU to perform image decoding;
and if the average load of the GPU load and the CPU load is higher than the load threshold value, or the average load of the GPU load and the CPU load is lower than the load threshold value and the GPU load is lower than the load threshold value, selecting the device with the lower load between the GPU and the CPU to perform image decoding.
Optionally, the method further includes:
when the queue length of the decoded data reaches a queue length threshold, pausing image decoding;
and resuming image decoding once the queue length falls below the queue length threshold value.
Optionally, after the image decoding is performed by the CPU or the GPU, the method further includes:
initializing the number of model instances, and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass;
increasing the BS of a detection model by a hill-climbing method, and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied;
increasing the number of instantiations of the detection model one by one, and obtaining a first optimal instantiation number according to a second constraint condition;
determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition; the detection model and the attribute model are both decoding models in the image decoding process;
and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
Optionally, the first constraint condition is ND*dec_mem + NAMi*att_mem_i < a first preset percentage of the GPU video memory;
wherein ND is the number of instantiations of the detection model, dec_mem is the video memory occupied by the detection model, NAMi is the number of instantiations of the attribute model, and att_mem_i is the video memory occupied by the attribute model;
the second constraint condition is dec_f*dec_u_BSD + max(0, 1 - ND*dec_u_BSD) + att_f*att_u_BSAMi + max(0, 1 - NAMi*att_u_BSAMi) < decoding fps * a second preset percentage;
wherein dec_f is the detection frequency of the detection model, dec_u_BSD is the utilization rate of the detection model at batch size BSD, att_f is the recognition rate of the attribute model, and att_u_BSAMi is the utilization rate of the attribute model at batch size BSAMi.
Optionally, the detection model adopts a YOLOv5 structure, and the attribute model adopts an Inception multi-branch classification structure.
The present application also provides an image decoding system, comprising:
the monitoring module is used for monitoring the GPU load and the CPU load at a preset monitoring frequency;
the first judgment module is used for judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
the second judgment module is used for judging whether the GPU load is higher than the load threshold value or not when the judgment result of the first judgment module is yes;
the decoding module is used for performing image decoding with the CPU when the judgment result of the second judgment module is yes, and for selecting the device with the lower load between the GPU and the CPU to perform image decoding when the judgment result of the first judgment module is no or the judgment result of the second judgment module is no.
Optionally, the method further includes:
the queue length control module is used for pausing image decoding when the queue length of the decoded data reaches a queue length threshold value, and resuming image decoding once the queue length falls below the queue length threshold value.
Optionally, the method further includes:
a decoding parameter setting module, for initializing the number of model instances and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass; increasing the BS of a detection model by a hill-climbing method and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied; increasing the number of instantiations of the detection model one by one and obtaining a first optimal instantiation number according to a second constraint condition; determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition; and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number; the detection model and the attribute model are both decoding models in the image decoding process.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method as set forth above.
The present application further provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method described above when calling the computer program in the memory.
The present application provides an image decoding method, comprising: monitoring GPU load and CPU load at a preset monitoring frequency; judging whether the average load of the GPU load and the CPU load is lower than a load threshold value; if yes, judging whether the GPU load is higher than the load threshold value; if the GPU load is higher than the load threshold value, using the CPU to perform image decoding; and if the average load of the GPU load and the CPU load is higher than the load threshold value, or the average load of the GPU load and the CPU load is lower than the load threshold value and the GPU load is lower than the load threshold value, selecting the device with the lower load between the GPU and the CPU to perform image decoding.
The present application uses the CPU and the GPU for joint heterogeneous decoding, which solves the problem that application performance is reduced when a single device performs all decoding. During joint heterogeneous decoding, the actual GPU load and CPU load are fully taken into account, and the decoding device is switched according to these actual load conditions, so that adaptive adjustment according to the GPU load and the CPU load is realized and the image decoding efficiency is improved.
The present application further provides an image decoding system, a computer-readable storage medium, and an electronic device, which have the above-mentioned advantages and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a flowchart of an image decoding method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an image decoding system according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of an image decoding method according to an embodiment of the present disclosure, the method including:
S101: monitoring GPU load and CPU load at a preset monitoring frequency;
S102: judging whether the average load of the GPU load and the CPU load is lower than a load threshold value; if not, entering S105; if yes, entering S103;
S103: judging whether the GPU load is higher than the load threshold value; if yes, entering S104; if not, entering S105;
S104: performing image decoding with the CPU;
S105: selecting the device with the lower load between the GPU and the CPU to perform image decoding.
The GPU load and the CPU load need to be monitored periodically in this embodiment; the preset monitoring frequency is not limited herein. It is easy to understand that the monitoring frequencies for the GPU and the CPU may be set to be the same, or the monitoring instants of one may be a subset of those of the other. For example, when the GPU load is heavy, the monitoring frequency for the GPU load may be set higher and that for the CPU load lower, for example monitoring the GPU once per minute and the CPU once every two minutes; this ensures that a corresponding GPU load sample exists for every CPU load sample, so that the subsequent steps can be executed. A minimal monitoring loop along these lines is sketched below.
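For illustration only, the following is a minimal sketch of such a periodic monitor. It assumes a Python environment with the psutil and pynvml libraries for sampling CPU and GPU utilization; the library choice, the sampling periods and the printed output are assumptions of this sketch and are not part of the disclosed method.

```python
# Illustrative monitoring sketch (assumed libraries: psutil for CPU load,
# pynvml/NVML for GPU load). Periods follow the example above: GPU once per
# minute, CPU once every two minutes.
import time
import psutil
import pynvml

pynvml.nvmlInit()
gpu_handle = pynvml.nvmlDeviceGetHandleByIndex(0)

GPU_PERIOD_S = 60
CPU_PERIOD_S = 120

cpu_load = psutil.cpu_percent(interval=None)   # prime the CPU counter
last_cpu_sample = time.time()

while True:
    # GPU utilization (percent) over the last sampling window.
    gpu_load = pynvml.nvmlDeviceGetUtilizationRates(gpu_handle).gpu
    now = time.time()
    if now - last_cpu_sample >= CPU_PERIOD_S:
        cpu_load = psutil.cpu_percent(interval=None)   # percent since last call
        last_cpu_sample = now
    # Every GPU sample is paired with the most recent CPU sample, so the
    # comparison of steps S102-S105 can always be performed.
    print(f"GPU load: {gpu_load}%  CPU load: {cpu_load}%")
    time.sleep(GPU_PERIOD_S)
```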
In step S102, it is judged whether the average of the GPU load and the CPU load is lower than the load threshold value; the threshold value itself is not specifically limited herein. It is easy to understand that, when S102 is executed, the GPU load and the CPU load should be sampled at the same time so that the comparison is meaningful, and both should be expressed as percentages computed in the same way.
The decoding device is then selected according to the comparison results, following the rules below (a code sketch of this selection logic is given after the list):
if the average load is greater than the load threshold value, the device with the smaller current load is selected from the GPU and the CPU to perform decoding;
if the average load is smaller than the load threshold value but the GPU load is higher than the load threshold value, then the CPU load is clearly lower than the load threshold value, i.e. the CPU has more headroom relative to the threshold than the GPU, so the CPU is selected to perform decoding;
if the average load is smaller than the load threshold value and the GPU load is also lower than the load threshold value, the device with the smaller current load is selected from the GPU and the CPU to perform decoding.
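A compact sketch of this selection rule follows. The function name and the default threshold of 70% (taken from the worked example later in the description) are illustrative assumptions; loads are expressed as percentages.

```python
def select_decoder(gpu_load: float, cpu_load: float, load_threshold: float = 70.0) -> str:
    """Choose the decoding device according to the rules above (loads in percent)."""
    average_load = (gpu_load + cpu_load) / 2.0
    if average_load < load_threshold and gpu_load > load_threshold:
        # Average below the threshold but GPU above it: the CPU is necessarily
        # below the threshold, so decoding is performed on the CPU.
        return "cpu"
    # Average above the threshold, or both average and GPU below it:
    # pick whichever device currently carries the lower load.
    return "gpu" if gpu_load < cpu_load else "cpu"

# Example: GPU at 85 %, CPU at 40 % -> average 62.5 % < 70 % while the GPU
# exceeds 70 %, so decoding is switched to the CPU.
assert select_decoder(85.0, 40.0) == "cpu"
```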
The embodiment of the application uses the CPU and the GPU for joint heterogeneous decoding, which solves the problem that application performance is reduced when a single device performs all decoding. During joint heterogeneous decoding, the actual GPU load and CPU load are fully taken into account, and the decoding device is switched according to these actual load conditions, so that adaptive adjustment according to the GPU load and the CPU load is realized and the image decoding efficiency is improved.
On the basis of the above embodiment, a decoding queue may also be set for the decoding process to buffer the decoded data. When the queue length of the decoded data reaches the queue length threshold, image decoding is suspended, and it is resumed once the queue length falls below the queue length threshold. Setting the decoding queue in this way relieves the decoding pressure when the load is heavy; a minimal sketch of such a bounded queue follows.
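The sketch below is an illustration under stated assumptions: the queue length of 30 follows the single-channel example given later (25 fps × 1.2 = 30), and the function names are hypothetical.

```python
import queue

# Bounded buffer of decoded frames; 30 = 25 fps * 1.2 per the later example.
QUEUE_LEN_THRESHOLD = 30
decoded_frames = queue.Queue(maxsize=QUEUE_LEN_THRESHOLD)

def decoding_worker(decode_next_frame):
    """Decode frames and pause automatically while the queue is full."""
    while True:
        frame = decode_next_frame()
        if frame is None:          # end of stream
            break
        # put() blocks while the queue already holds QUEUE_LEN_THRESHOLD frames,
        # i.e. image decoding is suspended and resumes only once consumers have
        # drained the queue below the threshold.
        decoded_frames.put(frame)
```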
A video structuring application based on a heterogeneous platform at the present stage is generally divided into the following steps: (1) video access, in which video data is imported through a camera or a video access platform; (2) decoding, in which the video code stream is decoded into pictures of a specified format, typically the RGB format that is currently dominant in image processing; (3) target detection, also called primary inference, in which a deep learning method is used to detect and locate objects of interest in the decoded image frames; (4) target tracking, in which the detected objects are located and tracked; (5) attribute recognition, which generally comprises the recognition of multiple attributes of multiple targets using several deep learning models; (6) data analysis, in which logical analysis such as trajectory analysis and behavior analysis is performed on the targets according to the results of detection, tracking and attribute recognition; (7) image overlay, in which the processing results are superimposed on the original image for later viewing; (8) encoding, in which the image stream is encoded into a video stream; (9) RTMP forwarding, in which the encoded video stream is forwarded to devices such as a streaming media server. This pipeline includes an inference process that employs an attribute model and a detection model. On the basis of the above embodiment, as a preferred embodiment, parameter configuration and optimization can be performed for the attribute model and the detection model, and the specific process is as follows:
firstly, initializing the number of model instances and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass;
secondly, increasing the BS of the detection model by a hill-climbing method, and determining the first optimal BS of the detection model under the condition that the first constraint condition is satisfied;
thirdly, increasing the number of instantiations of the detection model one by one, and obtaining the first optimal instantiation number according to the second constraint condition;
fourthly, determining the second optimal BS and the second optimal instantiation number corresponding to the attribute model according to the first constraint condition and the second constraint condition;
and fifthly, running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
In the above, BS refers to the batch size, i.e. the number of samples processed in a single pass. Neither the first constraint condition nor the second constraint condition is specifically limited herein. Preferably, the first constraint condition may be ND*dec_mem + NAMi*att_mem_i < a first preset percentage of the GPU video memory;
wherein ND is the number of instantiations of the detection model, dec_mem is the video memory occupied by the detection model, NAMi is the number of instantiations of the attribute model, and att_mem_i is the video memory occupied by the attribute model;
the second constraint condition is dec_f*dec_u_BSD + max(0, 1 - ND*dec_u_BSD) + att_f*att_u_BSAMi + max(0, 1 - NAMi*att_u_BSAMi) < decoding fps * a second preset percentage;
wherein dec_f is the detection frequency of the detection model, dec_u_BSD is the utilization rate of the detection model at batch size BSD, att_f is the recognition rate of the attribute model, and att_u_BSAMi is the utilization rate of the attribute model at batch size BSAMi.
The detection model may adopt a YOLOv5 structure, and the attribute model may adopt an Inception multi-branch classification structure.
The embodiment of the application combines a local search algorithm with hard-set values of the related parameters to determine the relevant parameters of the model pipeline, so that the parameters can be adjusted and the detection model and the attribute model run with the highest efficiency. An illustrative sketch of this constrained search is given below.
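The following sketch illustrates such a constrained local search, using the memory and utilization figures from the worked example below. It is an assumption-laden illustration: the candidate value lists, the function names and the greedy coordinate-wise scan (which the text refers to as hill climbing) are choices made here for clarity, not a definitive implementation of the claimed method.

```python
# Illustrative constrained search over batch sizes and instance counts.
# Memory/utilization numbers follow the worked example in the description.
GPU_MEM_MB   = 15109     # total GPU video memory (example value)
MEM_BUDGET   = 0.8       # first preset percentage
DEC_FPS      = 25        # decoding fps
FRAME_STRIDE = 3         # frame extraction frequency s

DET_MEM_MB, ATT_MEM_MB = 800, 700
DET_UTIL = {1: 0.626, 2: 0.61, 4: 0.62, 8: 0.77}    # detection model GPU utilization per BS
ATT_UTIL = {1: 0.64, 2: 0.96, 4: 1.89, 8: 11.30}    # attribute model GPU utilization per BS

def memory_ok(nd: int, nam: int) -> bool:
    # First constraint: total model video memory stays under the budgeted share.
    return nd * DET_MEM_MB + nam * ATT_MEM_MB < GPU_MEM_MB * MEM_BUDGET

def objective(nd: int, bsd: int, nam: int, bsam: int) -> float:
    # Left-hand side of formula (2) from the worked example.
    det = DEC_FPS / (FRAME_STRIDE * bsd) * DET_UTIL[bsd] + max(0.0, 1 - nd * DET_UTIL[bsd])
    att = (FRAME_STRIDE * 1 / 5) / (nam * bsam) * ATT_UTIL[bsam] + max(0.0, 1 - nam * ATT_UTIL[bsam])
    return det + att

def feasible(p: dict) -> bool:
    return memory_ok(p["nd"], p["nam"]) and objective(**p) < DEC_FPS

def search_one(params: dict, key: str, candidates) -> dict:
    """Coordinate-wise greedy step: keep the feasible candidate with the largest objective."""
    best = dict(params)
    for value in candidates:
        trial = dict(params, **{key: value})
        if feasible(trial) and objective(**trial) > objective(**best):
            best = trial
    return best

params = {"nd": 1, "bsd": 1, "nam": 1, "bsam": 1}       # initialize everything to 1
params = search_one(params, "bsd", [1, 2, 4, 8])        # first optimal BS (detection model)
params = search_one(params, "nd", [1, 2, 3, 4])         # first optimal instantiation number
params = search_one(params, "bsam", [1, 2, 4, 8])       # second optimal BS (attribute model)
params = search_one(params, "nam", [1, 2, 3, 4])        # second optimal instantiation number
print(params)   # with these example figures: nd=1, bsd=1, nam=1, bsam=8
```

With the example figures, this search reproduces the result obtained in the worked example below (ND = 1, BSD = 1, NAM = 1, BSAM = 8).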
The following is a description of a specific application process for configuring and optimizing parameters of the attribute model and the detection model:
for the decoding module, the length of the decoding queue of the single-channel video is set to be 25 × 1.2-30, a certain threshold space is reserved when the length of the decoding fps is satisfied, and the decoding speed of the single-channel video is limited to be 25 fps. The threshold value of the utilization rate of the decoding core is set to 70%, the monitoring frequency is 1s, namely the GPU or the GPU utilization rate exceeds 70%, the decoding device is switched according to the frequency of once per second monitoring, and the switching rule is determined according to the above embodiment. In this example, the optimal state of the overall performance cannot be obtained by using the GPU alone or by using the CPU alone for decoding, the decoding process robs the computational resources of the CPU processing (tracking) or the GPU processing (detection + attribute) during the task running, and the overall efficiency can be improved by nearly 5% by using the scheme.
For the inference process of the detection model and the attribute model, the decoding fps is 25. The video memory occupied by a single detection model is 900M; when one batch is inferred with a batch size of 1, 2, 4 or 8, the GPU utilization is 62.6%, 61%, 62% and 77%, respectively.
There is one type of attribute model, and a single attribute model occupies 700M of video memory; when one batch is inferred with a batch size of 1, 2, 4 or 8, the GPU utilization is 64%, 96%, 189% and 1130%, respectively.
The frame extraction frequency s is 3. According to the above formulas, we obtain:
(1) ND*800 + NAM*700 < 15109*0.8
(2) 25/(3*BSD)*dec_u_BS + max(0, 1 - ND*dec_u_BS) + (3*1/5)/(NAM*BSAM)*att_u_BSAM + max(0, 1 - NAM*att_u_BSAM) < 25*100%
The optimal ND, NAM, BSD and BSAM are obtained by maximizing the left-hand side of formula (2) subject to the constraints. The calculation proceeds as follows: all variables are first set to 1; then, keeping the other variables unchanged, the BS of the detection model is increased first, and the best BS of the detection model is determined from the utilization corresponding to each BS.
The detection model BS = 1:
1*800+1*700<12087
25/(3*1)*0.626+(1-1*0.626)+(3*1/5)/(1)*0.64+(1-1*0.64)<25
obtaining:
1500<12087
6.33<25
The detection model BS = 2:
1*800+1*700<12087
25/(3*2)*0.61+(1-1*0.61)+(3*1/5)/(1)*0.64+(1-1*0.64)<25
obtaining:
1500<12087
3.68<25
The detection model BS = 4 and BS = 8, with the corresponding utilization values:
25/(3*4)*0.62+(1-1*0.62)+(3*1/5)/(1)*0.64+(1-1*0.64)<25
25/(3*8)*0.77+(1-1*0.77)+(3*1/5)/(1)*0.64+(1-1*0.64)<25
In the same way, the best result can be obtained when the BS is equal to 1.
Then, keeping the detection model BS at 1 and increasing the number of model instances, the optimal number of detection model instances is still obtained as 1 by the same steps.
Using 25/(3*1)*0.626 + max(0, 1 - ND*0.626) + (3*1/5)/(1)*0.64 + (1 - 0.64) < 25, the optimal number of detection model instances is obtained as 1. Continuing to search the BS of the attribute model with 25/(3*1)*0.626 + (1 - 1*0.626) + (3*1/5)/(1*BSAM)*att_u_BSAM + max(0, 1 - 1*att_u_BSAM) < 25, where att_u_BSAM takes the values 64%, 96%, 189% and 1130% for BSAM = 1, 2, 4 and 8, the highest efficiency is obtained at BSAM = 8. With BSAM fixed at 8, the optimal number of attribute model instances is obtained as 1. The calculation above therefore gives ND = 1, BSD = 1, NAM = 1 and BSAM = 8; with this final parameter setting, the actual GPU utilization is improved by 10% compared with the initial parameter setting. This arithmetic can be checked with the short sketch below.
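As a quick check of the arithmetic above, the following short sketch (an illustration, not part of the disclosure) evaluates the left-hand side of formula (2) for the detection-model batch sizes considered, with the attribute model fixed at BS = 1; it reproduces the values 6.33 and 3.68 quoted above.

```python
# Evaluate formula (2) for detection-model batch sizes 1, 2, 4, 8 with the
# attribute model fixed at BS = 1 (utilization 0.64), as in the example above.
for bs, util in [(1, 0.626), (2, 0.61), (4, 0.62), (8, 0.77)]:
    lhs = 25 / (3 * bs) * util + (1 - 1 * util) + (3 * 1 / 5) / 1 * 0.64 + (1 - 1 * 0.64)
    print(f"BS = {bs}: {lhs:.2f} < 25")
# Prints 6.33, 3.68, 2.42 and 1.78, so BS = 1 maximizes the left-hand side.
```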
In the following, an image decoding system provided by an embodiment of the present application is introduced, and the image decoding system described below and the image decoding method described above may be referred to correspondingly.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an image decoding system according to an embodiment of the present application, and the present application further provides an image decoding system, including:
the monitoring module is used for monitoring the GPU load and the CPU load at a preset monitoring frequency;
the first judgment module is used for judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
the second judgment module is used for judging whether the GPU load is higher than the load threshold value or not when the judgment result of the first judgment module is yes;
the decoding module is used for performing image decoding with the CPU when the judgment result of the second judgment module is yes, and for selecting the device with the lower load between the GPU and the CPU to perform image decoding when the judgment result of the first judgment module is no or the judgment result of the second judgment module is no.
Based on the above embodiment, as a preferred embodiment, the method further includes:
the queue length control module is used for pausing image decoding when the queue length of the decoded data reaches a queue length threshold value, and resuming image decoding once the queue length falls below the queue length threshold value.
Optionally, the method further includes:
a decoding parameter setting module, for initializing the number of model instances and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass; increasing the BS of a detection model by a hill-climbing method and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied; increasing the number of instantiations of the detection model one by one and obtaining a first optimal instantiation number according to a second constraint condition; determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition; and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed, may implement the steps provided by the above-described embodiments. The storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The application further provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the electronic device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system provided by the embodiment, the description is relatively simple because the system corresponds to the method provided by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. An image decoding method, comprising:
monitoring GPU load and CPU load at a preset monitoring frequency;
judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
if yes, judging whether the GPU load is higher than the load threshold value;
if the GPU load is higher than the load threshold value, using the CPU to perform image decoding;
and if the average load of the GPU load and the CPU load is higher than the load threshold value, or the average load of the GPU load and the CPU load is lower than the load threshold value and the GPU load is lower than the load threshold value, selecting the device with the lower load between the GPU and the CPU to perform image decoding.
2. The image decoding method according to claim 1, further comprising:
when the queue length of the decoded data reaches a queue length threshold, pausing image decoding;
and resuming image decoding once the queue length falls below the queue length threshold value.
3. The image decoding method according to claim 1, wherein after the image decoding is performed by the CPU or the GPU, the method further comprises:
initializing the number of model instances, and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass;
increasing the BS of a detection model by a hill-climbing method, and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied;
increasing the number of instantiations of the detection model one by one, and obtaining a first optimal instantiation number according to a second constraint condition;
determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition; the detection model and the attribute model are both decoding models in the image decoding process;
and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
4. The image decoding method according to claim 3, wherein the first constraint condition is ND*dec_mem + NAMi*att_mem_i < a first preset percentage of the GPU video memory;
wherein ND is the number of instantiations of the detection model, dec_mem is the video memory occupied by the detection model, NAMi is the number of instantiations of the attribute model, and att_mem_i is the video memory occupied by the attribute model;
the second constraint condition is dec_f*dec_u_BSD + max(0, 1 - ND*dec_u_BSD) + att_f*att_u_BSAMi + max(0, 1 - NAMi*att_u_BSAMi) < decoding fps * a second preset percentage;
wherein dec_f is the detection frequency of the detection model, dec_u_BSD is the utilization rate of the detection model at batch size BSD, att_f is the recognition rate of the attribute model, and att_u_BSAMi is the utilization rate of the attribute model at batch size BSAMi.
5. The image decoding method according to claim 1, wherein the detection model adopts a YOLOv5 structure, and the attribute model adopts an Inception multi-branch classification structure.
6. An image decoding system, comprising:
the monitoring module is used for monitoring the GPU load and the CPU load at a preset monitoring frequency;
the first judgment module is used for judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
the second judgment module is used for judging whether the GPU load is higher than the load threshold value or not when the judgment result of the first judgment module is yes;
the decoding module is used for performing image decoding with the CPU when the judgment result of the second judgment module is yes, and for selecting the device with the lower load between the GPU and the CPU to perform image decoding when the judgment result of the first judgment module is no or the judgment result of the second judgment module is no.
7. The image decoding system according to claim 6, further comprising:
the queue length control module is used for pausing image decoding when the queue length of the decoded data reaches a queue length threshold value, and resuming image decoding once the queue length falls below the queue length threshold value.
8. The image decoding system according to claim 6, further comprising:
a decoding parameter setting module, for initializing the number of model instances and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass; increasing the BS of a detection model by a hill-climbing method and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied; increasing the number of instantiations of the detection model one by one and obtaining a first optimal instantiation number according to a second constraint condition; determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition, wherein the detection model and the attribute model are both decoding models in the image decoding process; and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image decoding method according to any one of claims 1 to 5.
10. An electronic device, comprising a memory in which a computer program is stored and a processor which, when called into the memory, implements the steps of the image decoding method according to any one of claims 1 to 5.
CN202111143450.2A 2021-09-28 2021-09-28 Image decoding method, image decoding system and related device Pending CN114025163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111143450.2A CN114025163A (en) 2021-09-28 2021-09-28 Image decoding method, image decoding system and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111143450.2A CN114025163A (en) 2021-09-28 2021-09-28 Image decoding method, image decoding system and related device

Publications (1)

Publication Number Publication Date
CN114025163A true CN114025163A (en) 2022-02-08

Family

ID=80055009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111143450.2A Pending CN114025163A (en) 2021-09-28 2021-09-28 Image decoding method, image decoding system and related device

Country Status (1)

Country Link
CN (1) CN114025163A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402674A (en) * 2023-04-03 2023-07-07 摩尔线程智能科技(北京)有限责任公司 GPU command processing method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN111241985A (en) Video content identification method and device, storage medium and electronic equipment
CN114025163A (en) Image decoding method, image decoding system and related device
CN116777730B (en) GPU efficiency improvement method based on resource scheduling
CN112163468A (en) Image processing method and device based on multiple threads
WO2017072854A1 (en) Monitoring device, monitoring system and monitoring method
Polakovič et al. Adaptive multimedia content delivery in 5G networks using DASH and saliency information
CN115062709A (en) Model optimization method, device, equipment, storage medium and program product
Tsai et al. Intelligent moving objects detection via adaptive frame differencing method
US20230085979A1 (en) Method and apparatus for dynamically managing shared memory pool
CN115617532B (en) Target tracking processing method, system and related device
CN116485188A (en) Risk identification method, device and equipment
CN115130569A (en) Audio processing method and device, computer equipment, storage medium and program product
CN114003370A (en) Computing power scheduling method and related device
CN112910732A (en) Method and equipment for resetting edge computing server
CN109685101B (en) Multi-dimensional data self-adaptive acquisition method and system
KR101932130B1 (en) Apparatus and method for improving quality of experience of remote display
CN112084371A (en) Film multi-label classification method and device, electronic equipment and storage medium
CN112492379A (en) Audio and video multi-path concurrent decoding method and device and computer equipment
CN113194298B (en) Method, apparatus, system and medium for realizing image structuring of non-intelligent camera
Mazinani et al. An Adaptive Porn Video Detection Based on Consecutive Frames Using Deep Learning.
CN115617421B (en) Intelligent process scheduling method and device, readable storage medium and embedded equipment
CN112163985B (en) Image processing method, image processing device, storage medium and electronic equipment
CN117336548A (en) Video coding processing method, device, equipment and storage medium
CN111047042B (en) Operation method and device of reasoning service model
CN110458009B (en) Processing method for picture information, face detection and picture searching by picture and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination