CN114025163A - Image decoding method, image decoding system and related device - Google Patents

Image decoding method, image decoding system and related device

Info

Publication number
CN114025163A
CN114025163A (application CN202111143450.2A)
Authority
CN
China
Prior art keywords
load
gpu
image decoding
optimal
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111143450.2A
Other languages
Chinese (zh)
Inventor
王鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN202111143450.2A priority Critical patent/CN114025163A/en
Publication of CN114025163A publication Critical patent/CN114025163A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides an image decoding method, comprising: monitoring GPU load and CPU load at a preset monitoring frequency; judging whether the average load of the GPU load and the CPU load is lower than a load threshold value; if yes, judging whether the GPU load is higher than the load threshold value; if the GPU load is higher than the load threshold value, using the CPU to perform image decoding; and if the average load of the GPU load and the CPU load is higher than the load threshold value, or the average load is lower than the load threshold value and the GPU load is also lower than the load threshold value, selecting the device with the lower load between the GPU and the CPU to perform image decoding. In the method and device, the decoding device is switched according to the actual load conditions of the GPU and the CPU, so that adaptive adjustment according to the GPU load and the CPU load is realized, and the image decoding efficiency is improved. The present application also provides an image decoding system, a computer-readable storage medium, and an electronic device, which have the above-mentioned advantageous effects.

Description

Image decoding method, image decoding system and related device
Technical Field
The present application relates to the field of image processing, and in particular, to an image decoding method, an image decoding system, and a related apparatus.
Background
Video structuring refers to processing unstructured two-dimensional image stream data in a certain manner to obtain structured information about the image content in a video. At present, video structuring applications in the industry are generally deployed in two modes, edge-side deployment and center-side deployment, and in either mode a heterogeneous CPU/GPU platform has almost become the standard platform for video structuring applications. However, the decoding process is done entirely by a single hard decoder or soft decoder, which makes the video structuring application lose flexibility. The pressure on the CPU side and the GPU side is not constant throughout the video structuring process; as components such as detection, tracking, encoding and stream pushing are added, the pressure on the two sides keeps changing. In addition, the current decoding process is unconstrained, that is, decoding proceeds as fast as the hardware allows, and decoding that is too fast increases memory pressure and seizes system resources, so that the overall application efficiency is reduced.
Disclosure of Invention
An object of the present application is to provide an image decoding method, an image decoding system, a computer-readable storage medium, and an electronic device, which can improve decoding efficiency.
In order to solve the above technical problem, the present application provides an image decoding method, which has the following specific technical scheme:
monitoring GPU load and CPU load at a preset monitoring frequency;
judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
if yes, judging whether the GPU load is higher than the load threshold value;
if the GPU load is higher than the load threshold value, using the CPU to perform image decoding;
and if the average load of the GPU load and the CPU load is higher than the load threshold value, or the average load of the GPU load and the CPU load is lower than the load threshold value and the GPU load is lower than the load threshold value, selecting the device with the lower load between the GPU and the CPU to perform image decoding.
Optionally, the method further includes:
when the queue length of the decoded data reaches a queue length threshold, pausing image decoding;
and resuming image decoding once the queue length falls below the queue length threshold value.
Optionally, after the image decoding is performed by the CPU or the GPU, the method further includes:
initializing the number of model instances, and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass;
increasing the BS of a detection model by a hill-climbing method, and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied;
increasing the number of instantiations of the detection model one by one, and obtaining a first optimal instantiation number according to a second constraint condition;
determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition; the detection model and the attribute model are both decoding models in the image decoding process;
and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
Optionally, the first constraint condition is ND*dec_mem + NAMi*att_mem_i < a first preset percentage of the GPU video memory;
wherein ND is the number of instantiations of the detection model, dec_mem is the video memory occupied by the detection model, NAMi is the number of instantiations of the attribute model, and att_mem_i is the video memory occupied by the attribute model;
the second constraint condition is dec_f*dec_u_BSD + max(0, 1 - ND*dec_u_BSD) + att_f*att_u_BSAMi + max(0, 1 - NAMi*att_u_BSAMi) < decoding fps * a second preset percentage;
wherein dec_f is the detection frequency of the detection model, dec_u_BSD is the utilization rate of the detection model at batch size BSD, att_f is the recognition rate of the attribute model, and att_u_BSAMi is the utilization rate of the attribute model at batch size BSAMi.
Optionally, the detection model adopts a YOLOv5 structure, and the attribute model adopts an Inception multi-branch classification structure.
The present application also provides an image decoding system, comprising:
the monitoring module is used for monitoring the GPU load and the CPU load at a preset monitoring frequency;
the first judgment module is used for judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
the second judgment module is used for judging whether the GPU load is higher than the load threshold value or not when the judgment result of the first judgment module is yes;
the decoding module is used for performing image decoding with the CPU when the judgment result of the second judgment module is yes, and for selecting the device with the lower load between the GPU and the CPU to perform image decoding when the judgment result of the first judgment module is no or the judgment result of the second judgment module is no.
Optionally, the method further includes:
the queue length control module is used for pausing image decoding when the queue length of the decoded data reaches a queue length threshold value, and resuming image decoding once the queue length falls below the queue length threshold value.
Optionally, the method further includes:
a decoding parameter setting module, for initializing the number of model instances and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass; increasing the BS of a detection model by a hill-climbing method and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied; increasing the number of instantiations of the detection model one by one and obtaining a first optimal instantiation number according to a second constraint condition; determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition; and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number; the detection model and the attribute model are both decoding models in the image decoding process.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method as set forth above.
The present application further provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method described above when calling the computer program in the memory.
The present application provides an image decoding method, comprising: monitoring GPU load and CPU load at a preset monitoring frequency; judging whether the average load of the GPU load and the CPU load is lower than a load threshold value; if yes, judging whether the GPU load is higher than the load threshold value; if the GPU load is higher than the load threshold value, using the CPU to perform image decoding; and if the average load of the GPU load and the CPU load is higher than the load threshold value, or the average load of the GPU load and the CPU load is lower than the load threshold value and the GPU load is lower than the load threshold value, selecting the device with the lower load between the GPU and the CPU to perform image decoding.
The present application uses the CPU and the GPU for joint heterogeneous decoding, which solves the problem that application performance is reduced when a single device performs all decoding. During joint heterogeneous decoding, the actual GPU load and CPU load are fully taken into account, and the decoding device is switched according to these actual load conditions, so that adaptive adjustment according to the GPU load and the CPU load is realized and the image decoding efficiency is improved.
The present application further provides an image decoding system, a computer-readable storage medium, and an electronic device, which have the above-mentioned advantages and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a flowchart of an image decoding method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an image decoding system according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of an image decoding method according to an embodiment of the present disclosure, the method including:
S101: monitoring GPU load and CPU load at a preset monitoring frequency;
S102: judging whether the average load of the GPU load and the CPU load is lower than a load threshold value; if not, entering S105; if yes, entering S103;
S103: judging whether the GPU load is higher than the load threshold value; if yes, entering S104; if not, entering S105;
S104: performing image decoding with the CPU;
S105: selecting the device with the lower load between the GPU and the CPU to perform image decoding.
The GPU load and the CPU load need to be monitored periodically in this embodiment; the preset monitoring frequency is not limited herein. It is easy to understand that the monitoring frequencies for the GPU and the CPU may be set to be the same, or the monitoring instants of one may be a subset of those of the other. For example, when the GPU load is heavy, the monitoring frequency for the GPU load may be set higher and that for the CPU load lower, for example monitoring the GPU once per minute and the CPU once every two minutes; this ensures that a corresponding GPU load sample exists for every CPU load sample, so that the subsequent steps can be executed. A minimal monitoring loop along these lines is sketched below.
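For illustration only, the following is a minimal sketch of such a periodic monitor. It assumes a Python environment with the psutil and pynvml libraries for sampling CPU and GPU utilization; the library choice, the sampling periods and the printed output are assumptions of this sketch and are not part of the disclosed method.

```python
# Illustrative monitoring sketch (assumed libraries: psutil for CPU load,
# pynvml/NVML for GPU load). Periods follow the example above: GPU once per
# minute, CPU once every two minutes.
import time
import psutil
import pynvml

pynvml.nvmlInit()
gpu_handle = pynvml.nvmlDeviceGetHandleByIndex(0)

GPU_PERIOD_S = 60
CPU_PERIOD_S = 120

cpu_load = psutil.cpu_percent(interval=None)   # prime the CPU counter
last_cpu_sample = time.time()

while True:
    # GPU utilization (percent) over the last sampling window.
    gpu_load = pynvml.nvmlDeviceGetUtilizationRates(gpu_handle).gpu
    now = time.time()
    if now - last_cpu_sample >= CPU_PERIOD_S:
        cpu_load = psutil.cpu_percent(interval=None)   # percent since last call
        last_cpu_sample = now
    # Every GPU sample is paired with the most recent CPU sample, so the
    # comparison of steps S102-S105 can always be performed.
    print(f"GPU load: {gpu_load}%  CPU load: {cpu_load}%")
    time.sleep(GPU_PERIOD_S)
```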
In step S102, it is judged whether the average of the GPU load and the CPU load is lower than the load threshold value; the threshold value itself is not specifically limited herein. It is easy to understand that, when S102 is executed, the GPU load and the CPU load should be sampled at the same time so that the comparison is meaningful, and both should be expressed as percentages computed in the same way.
The decoding device is then selected according to the comparison results, following the rules below (a code sketch of this selection logic is given after the list):
if the average load is greater than the load threshold value, the device with the smaller current load is selected from the GPU and the CPU to perform decoding;
if the average load is smaller than the load threshold value but the GPU load is higher than the load threshold value, then the CPU load is clearly lower than the load threshold value, i.e. the CPU has more headroom relative to the threshold than the GPU, so the CPU is selected to perform decoding;
if the average load is smaller than the load threshold value and the GPU load is also lower than the load threshold value, the device with the smaller current load is selected from the GPU and the CPU to perform decoding.
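A compact sketch of this selection rule follows. The function name and the default threshold of 70% (taken from the worked example later in the description) are illustrative assumptions; loads are expressed as percentages.

```python
def select_decoder(gpu_load: float, cpu_load: float, load_threshold: float = 70.0) -> str:
    """Choose the decoding device according to the rules above (loads in percent)."""
    average_load = (gpu_load + cpu_load) / 2.0
    if average_load < load_threshold and gpu_load > load_threshold:
        # Average below the threshold but GPU above it: the CPU is necessarily
        # below the threshold, so decoding is performed on the CPU.
        return "cpu"
    # Average above the threshold, or both average and GPU below it:
    # pick whichever device currently carries the lower load.
    return "gpu" if gpu_load < cpu_load else "cpu"

# Example: GPU at 85 %, CPU at 40 % -> average 62.5 % < 70 % while the GPU
# exceeds 70 %, so decoding is switched to the CPU.
assert select_decoder(85.0, 40.0) == "cpu"
```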
The embodiment of the application uses the CPU and the GPU for joint heterogeneous decoding, which solves the problem that application performance is reduced when a single device performs all decoding. During joint heterogeneous decoding, the actual GPU load and CPU load are fully taken into account, and the decoding device is switched according to these actual load conditions, so that adaptive adjustment according to the GPU load and the CPU load is realized and the image decoding efficiency is improved.
On the basis of the above embodiment, a decoding queue may also be set for the decoding process to buffer the decoded data. When the queue length of the decoded data reaches the queue length threshold, image decoding is suspended, and it is resumed once the queue length falls below the queue length threshold. Setting the decoding queue in this way relieves the decoding pressure when the load is heavy; a minimal sketch of such a bounded queue follows.
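The sketch below is an illustration under stated assumptions: the queue length of 30 follows the single-channel example given later (25 fps × 1.2 = 30), and the function names are hypothetical.

```python
import queue

# Bounded buffer of decoded frames; 30 = 25 fps * 1.2 per the later example.
QUEUE_LEN_THRESHOLD = 30
decoded_frames = queue.Queue(maxsize=QUEUE_LEN_THRESHOLD)

def decoding_worker(decode_next_frame):
    """Decode frames and pause automatically while the queue is full."""
    while True:
        frame = decode_next_frame()
        if frame is None:          # end of stream
            break
        # put() blocks while the queue already holds QUEUE_LEN_THRESHOLD frames,
        # i.e. image decoding is suspended and resumes only once consumers have
        # drained the queue below the threshold.
        decoded_frames.put(frame)
```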
A video structuring application based on a heterogeneous platform at the present stage is generally divided into the following steps: (1) video access, in which video data is imported through a camera or a video access platform; (2) decoding, in which the video code stream is decoded into pictures of a specified format, typically the RGB format that is currently dominant in image processing; (3) target detection, also called primary inference, in which a deep learning method is used to detect and locate objects of interest in the decoded image frames; (4) target tracking, in which the detected objects are located and tracked; (5) attribute recognition, which generally comprises the recognition of multiple attributes of multiple targets using several deep learning models; (6) data analysis, in which logical analysis such as trajectory analysis and behavior analysis is performed on the targets according to the results of detection, tracking and attribute recognition; (7) image overlay, in which the processing results are superimposed on the original image for later viewing; (8) encoding, in which the image stream is encoded into a video stream; (9) RTMP forwarding, in which the encoded video stream is forwarded to devices such as a streaming media server. This pipeline includes an inference process that employs an attribute model and a detection model. On the basis of the above embodiment, as a preferred embodiment, parameter configuration and optimization can be performed for the attribute model and the detection model, and the specific process is as follows:
firstly, initializing the number of model instances and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass;
secondly, increasing the BS of the detection model by a hill-climbing method, and determining the first optimal BS of the detection model under the condition that the first constraint condition is satisfied;
thirdly, increasing the number of instantiations of the detection model one by one, and obtaining the first optimal instantiation number according to the second constraint condition;
fourthly, determining the second optimal BS and the second optimal instantiation number corresponding to the attribute model according to the first constraint condition and the second constraint condition;
and fifthly, running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
In the above, BS refers to the batch size, i.e. the number of samples processed in a single pass. Neither the first constraint condition nor the second constraint condition is specifically limited herein. Preferably, the first constraint condition may be ND*dec_mem + NAMi*att_mem_i < a first preset percentage of the GPU video memory;
wherein ND is the number of instantiations of the detection model, dec_mem is the video memory occupied by the detection model, NAMi is the number of instantiations of the attribute model, and att_mem_i is the video memory occupied by the attribute model;
the second constraint condition is dec_f*dec_u_BSD + max(0, 1 - ND*dec_u_BSD) + att_f*att_u_BSAMi + max(0, 1 - NAMi*att_u_BSAMi) < decoding fps * a second preset percentage;
wherein dec_f is the detection frequency of the detection model, dec_u_BSD is the utilization rate of the detection model at batch size BSD, att_f is the recognition rate of the attribute model, and att_u_BSAMi is the utilization rate of the attribute model at batch size BSAMi.
The detection model may adopt a YOLOv5 structure, and the attribute model may adopt an Inception multi-branch classification structure.
The embodiment of the application combines a local search algorithm with hard-set values of the related parameters to determine the relevant parameters of the model pipeline, so that the parameters can be adjusted and the detection model and the attribute model run with the highest efficiency. An illustrative sketch of this constrained search is given below.
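The following sketch illustrates such a constrained local search, using the memory and utilization figures from the worked example below. It is an assumption-laden illustration: the candidate value lists, the function names and the greedy coordinate-wise scan (which the text refers to as hill climbing) are choices made here for clarity, not a definitive implementation of the claimed method.

```python
# Illustrative constrained search over batch sizes and instance counts.
# Memory/utilization numbers follow the worked example in the description.
GPU_MEM_MB   = 15109     # total GPU video memory (example value)
MEM_BUDGET   = 0.8       # first preset percentage
DEC_FPS      = 25        # decoding fps
FRAME_STRIDE = 3         # frame extraction frequency s

DET_MEM_MB, ATT_MEM_MB = 800, 700
DET_UTIL = {1: 0.626, 2: 0.61, 4: 0.62, 8: 0.77}    # detection model GPU utilization per BS
ATT_UTIL = {1: 0.64, 2: 0.96, 4: 1.89, 8: 11.30}    # attribute model GPU utilization per BS

def memory_ok(nd: int, nam: int) -> bool:
    # First constraint: total model video memory stays under the budgeted share.
    return nd * DET_MEM_MB + nam * ATT_MEM_MB < GPU_MEM_MB * MEM_BUDGET

def objective(nd: int, bsd: int, nam: int, bsam: int) -> float:
    # Left-hand side of formula (2) from the worked example.
    det = DEC_FPS / (FRAME_STRIDE * bsd) * DET_UTIL[bsd] + max(0.0, 1 - nd * DET_UTIL[bsd])
    att = (FRAME_STRIDE * 1 / 5) / (nam * bsam) * ATT_UTIL[bsam] + max(0.0, 1 - nam * ATT_UTIL[bsam])
    return det + att

def feasible(p: dict) -> bool:
    return memory_ok(p["nd"], p["nam"]) and objective(**p) < DEC_FPS

def search_one(params: dict, key: str, candidates) -> dict:
    """Coordinate-wise greedy step: keep the feasible candidate with the largest objective."""
    best = dict(params)
    for value in candidates:
        trial = dict(params, **{key: value})
        if feasible(trial) and objective(**trial) > objective(**best):
            best = trial
    return best

params = {"nd": 1, "bsd": 1, "nam": 1, "bsam": 1}       # initialize everything to 1
params = search_one(params, "bsd", [1, 2, 4, 8])        # first optimal BS (detection model)
params = search_one(params, "nd", [1, 2, 3, 4])         # first optimal instantiation number
params = search_one(params, "bsam", [1, 2, 4, 8])       # second optimal BS (attribute model)
params = search_one(params, "nam", [1, 2, 3, 4])        # second optimal instantiation number
print(params)   # with these example figures: nd=1, bsd=1, nam=1, bsam=8
```

With the example figures, this search reproduces the result obtained in the worked example below (ND = 1, BSD = 1, NAM = 1, BSAM = 8).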
The following is a description of a specific application process for configuring and optimizing parameters of the attribute model and the detection model:
for the decoding module, the length of the decoding queue of the single-channel video is set to be 25 × 1.2-30, a certain threshold space is reserved when the length of the decoding fps is satisfied, and the decoding speed of the single-channel video is limited to be 25 fps. The threshold value of the utilization rate of the decoding core is set to 70%, the monitoring frequency is 1s, namely the GPU or the GPU utilization rate exceeds 70%, the decoding device is switched according to the frequency of once per second monitoring, and the switching rule is determined according to the above embodiment. In this example, the optimal state of the overall performance cannot be obtained by using the GPU alone or by using the CPU alone for decoding, the decoding process robs the computational resources of the CPU processing (tracking) or the GPU processing (detection + attribute) during the task running, and the overall efficiency can be improved by nearly 5% by using the scheme.
For the inference process of the detection model and the attribute model, the decoding fps is 25. The video memory occupied by a single detection model is 900M; when one batch is inferred with a batch size of 1, 2, 4 or 8, the GPU utilization is 62.6%, 61%, 62% and 77%, respectively.
There is one type of attribute model, and a single attribute model occupies 700M of video memory; when one batch is inferred with a batch size of 1, 2, 4 or 8, the GPU utilization is 64%, 96%, 189% and 1130%, respectively.
The frame extraction frequency s is 3. According to the above formulas, we obtain:
(1) ND*800 + NAM*700 < 15109*0.8
(2) 25/(3*BSD)*dec_u_BS + max(0, 1 - ND*dec_u_BS) + (3*1/5)/(NAM*BSAM)*att_u_BSAM + max(0, 1 - NAM*att_u_BSAM) < 25*100%
The optimal ND, NAM, BSD and BSAM are obtained by maximizing the left-hand side of formula (2) subject to the constraints. The calculation proceeds as follows: all variables are first set to 1; then, keeping the other variables unchanged, the BS of the detection model is increased first, and the best BS of the detection model is determined from the utilization corresponding to each BS.
The detection model BS = 1:
1*800+1*700<12087
25/(3*1)*0.626+(1-1*0.626)+(3*1/5)/(1)*0.64+(1-1*0.64)<25
obtaining:
1500<12087
6.33<25
The detection model BS = 2:
1*800+1*700<12087
25/(3*2)*0.61+(1-1*0.61)+(3*1/5)/(1)*0.64+(1-1*0.64)<25
obtaining:
1500<12087
3.68<25
The detection model BS = 4 and BS = 8, with the corresponding utilization values:
25/(3*4)*0.62+(1-1*0.62)+(3*1/5)/(1)*0.64+(1-1*0.64)<25
25/(3*8)*0.77+(1-1*0.77)+(3*1/5)/(1)*0.64+(1-1*0.64)<25
In the same way, the best result can be obtained when the BS is equal to 1.
Then, keeping the detection model BS at 1 and increasing the number of model instances, the optimal number of detection model instances is still obtained as 1 by the same steps.
Using 25/(3*1)*0.626 + max(0, 1 - ND*0.626) + (3*1/5)/(1)*0.64 + (1 - 0.64) < 25, the optimal number of detection model instances is obtained as 1. Continuing to search the BS of the attribute model with 25/(3*1)*0.626 + (1 - 1*0.626) + (3*1/5)/(1*BSAM)*att_u_BSAM + max(0, 1 - 1*att_u_BSAM) < 25, where att_u_BSAM takes the values 64%, 96%, 189% and 1130% for BSAM = 1, 2, 4 and 8, the highest efficiency is obtained at BSAM = 8. With BSAM fixed at 8, the optimal number of attribute model instances is obtained as 1. The calculation above therefore gives ND = 1, BSD = 1, NAM = 1 and BSAM = 8; with this final parameter setting, the actual GPU utilization is improved by 10% compared with the initial parameter setting. This arithmetic can be checked with the short sketch below.
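As a quick check of the arithmetic above, the following short sketch (an illustration, not part of the disclosure) evaluates the left-hand side of formula (2) for the detection-model batch sizes considered, with the attribute model fixed at BS = 1; it reproduces the values 6.33 and 3.68 quoted above.

```python
# Evaluate formula (2) for detection-model batch sizes 1, 2, 4, 8 with the
# attribute model fixed at BS = 1 (utilization 0.64), as in the example above.
for bs, util in [(1, 0.626), (2, 0.61), (4, 0.62), (8, 0.77)]:
    lhs = 25 / (3 * bs) * util + (1 - 1 * util) + (3 * 1 / 5) / 1 * 0.64 + (1 - 1 * 0.64)
    print(f"BS = {bs}: {lhs:.2f} < 25")
# Prints 6.33, 3.68, 2.42 and 1.78, so BS = 1 maximizes the left-hand side.
```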
In the following, an image decoding system provided by an embodiment of the present application is introduced, and the image decoding system described below and the image decoding method described above may be referred to correspondingly.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an image decoding system according to an embodiment of the present application, and the present application further provides an image decoding system, including:
the monitoring module is used for monitoring the GPU load and the CPU load at a preset monitoring frequency;
the first judgment module is used for judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
the second judgment module is used for judging whether the GPU load is higher than the load threshold value or not when the judgment result of the first judgment module is yes;
the decoding module is used for performing image decoding with the CPU when the judgment result of the second judgment module is yes, and for selecting the device with the lower load between the GPU and the CPU to perform image decoding when the judgment result of the first judgment module is no or the judgment result of the second judgment module is no.
Based on the above embodiment, as a preferred embodiment, the method further includes:
the queue length control module is used for pausing image decoding when the queue length of the decoded data reaches a queue length threshold value, and resuming image decoding once the queue length falls below the queue length threshold value.
Optionally, the method further includes:
a decoding parameter setting module, for initializing the number of model instances and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass; increasing the BS of a detection model by a hill-climbing method and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied; increasing the number of instantiations of the detection model one by one and obtaining a first optimal instantiation number according to a second constraint condition; determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition; and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed, may implement the steps provided by the above-described embodiments. The storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The application further provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the electronic device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system provided by the embodiment, the description is relatively simple because the system corresponds to the method provided by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. An image decoding method, comprising:
monitoring GPU load and CPU load at a preset monitoring frequency;
judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
if yes, judging whether the GPU load is higher than the load threshold value;
if the GPU load is higher than the load threshold value, using the CPU to perform image decoding;
and if the average load of the GPU load and the CPU load is higher than the load threshold value, or the average load of the GPU load and the CPU load is lower than the load threshold value and the GPU load is lower than the load threshold value, selecting the device with the lower load between the GPU and the CPU to perform image decoding.
2. The image decoding method according to claim 1, further comprising:
when the queue length of the decoded data reaches a queue length threshold, pausing image decoding;
and resuming image decoding once the queue length falls below the queue length threshold value.
3. The image decoding method according to claim 1, wherein after the image decoding is performed by the CPU or the GPU, the method further comprises:
initializing the number of model instances, and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass;
increasing the BS of a detection model by a hill-climbing method, and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied;
increasing the number of instantiations of the detection model one by one, and obtaining a first optimal instantiation number according to a second constraint condition;
determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition; the detection model and the attribute model are both decoding models in the image decoding process;
and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
4. The image decoding method according to claim 3, wherein the first constraint condition is ND*dec_mem + NAMi*att_mem_i < a first preset percentage of the GPU video memory;
wherein ND is the number of instantiations of the detection model, dec_mem is the video memory occupied by the detection model, NAMi is the number of instantiations of the attribute model, and att_mem_i is the video memory occupied by the attribute model;
the second constraint condition is dec_f*dec_u_BSD + max(0, 1 - ND*dec_u_BSD) + att_f*att_u_BSAMi + max(0, 1 - NAMi*att_u_BSAMi) < decoding fps * a second preset percentage;
wherein dec_f is the detection frequency of the detection model, dec_u_BSD is the utilization rate of the detection model at batch size BSD, att_f is the recognition rate of the attribute model, and att_u_BSAMi is the utilization rate of the attribute model at batch size BSAMi.
5. The image decoding method according to claim 1, wherein the detection model adopts a YOLOv5 structure, and the attribute model adopts an Inception multi-branch classification structure.
6. An image decoding system, comprising:
the monitoring module is used for monitoring the GPU load and the CPU load at a preset monitoring frequency;
the first judgment module is used for judging whether the average load of the GPU load and the CPU load is lower than a load threshold value or not;
the second judgment module is used for judging whether the GPU load is higher than the load threshold value or not when the judgment result of the first judgment module is yes;
the decoding module is used for performing image decoding with the CPU when the judgment result of the second judgment module is yes, and for selecting the device with the lower load between the GPU and the CPU to perform image decoding when the judgment result of the first judgment module is no or the judgment result of the second judgment module is no.
7. The image decoding system according to claim 6, further comprising:
the queue length control module is used for pausing image decoding when the queue length of the decoded data reaches a queue length threshold value, and resuming image decoding once the queue length falls below the queue length threshold value.
8. The image decoding system according to claim 6, further comprising:
a decoding parameter setting module, for initializing the number of model instances and performing a local search with BS = 1, wherein BS is the batch size, i.e. the number of samples processed in a single pass; increasing the BS of a detection model by a hill-climbing method and determining a first optimal BS of the detection model under the condition that a first constraint condition is satisfied; increasing the number of instantiations of the detection model one by one and obtaining a first optimal instantiation number according to a second constraint condition; determining a second optimal BS and a second optimal instantiation number corresponding to an attribute model according to the first constraint condition and the second constraint condition, wherein the detection model and the attribute model are both decoding models in the image decoding process; and running the detection model with the first optimal BS and the first optimal instantiation number, and running the attribute model with the second optimal BS and the second optimal instantiation number.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image decoding method according to any one of claims 1 to 5.
10. An electronic device, comprising a memory in which a computer program is stored and a processor which, when called into the memory, implements the steps of the image decoding method according to any one of claims 1 to 5.
CN202111143450.2A 2021-09-28 2021-09-28 Image decoding method, image decoding system and related device Pending CN114025163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111143450.2A CN114025163A (en) 2021-09-28 2021-09-28 Image decoding method, image decoding system and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111143450.2A CN114025163A (en) 2021-09-28 2021-09-28 Image decoding method, image decoding system and related device

Publications (1)

Publication Number Publication Date
CN114025163A true CN114025163A (en) 2022-02-08

Family

ID=80055009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111143450.2A Pending CN114025163A (en) 2021-09-28 2021-09-28 Image decoding method, image decoding system and related device

Country Status (1)

Country Link
CN (1) CN114025163A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402674A (en) * 2023-04-03 2023-07-07 摩尔线程智能科技(北京)有限责任公司 GPU command processing method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN111241985A (en) Video content identification method and device, storage medium and electronic equipment
CN114025163A (en) Image decoding method, image decoding system and related device
CN116777730B (en) GPU efficiency improvement method based on resource scheduling
CN112163468A (en) Image processing method and device based on multiple threads
WO2017072854A1 (en) Monitoring device, monitoring system and monitoring method
Polakovič et al. Adaptive multimedia content delivery in 5G networks using DASH and saliency information
CN115062709A (en) Model optimization method, device, equipment, storage medium and program product
Tsai et al. Intelligent moving objects detection via adaptive frame differencing method
US20230085979A1 (en) Method and apparatus for dynamically managing shared memory pool
CN115617532B (en) Target tracking processing method, system and related device
CN116485188A (en) Risk identification method, device and equipment
CN115130569A (en) Audio processing method and device, computer equipment, storage medium and program product
CN114003370A (en) Computing power scheduling method and related device
CN112910732A (en) Method and equipment for resetting edge computing server
CN109685101B (en) Multi-dimensional data self-adaptive acquisition method and system
KR101932130B1 (en) Apparatus and method for improving quality of experience of remote display
CN112084371A (en) Film multi-label classification method and device, electronic equipment and storage medium
CN112492379A (en) Audio and video multi-path concurrent decoding method and device and computer equipment
CN113194298B (en) Method, apparatus, system and medium for realizing image structuring of non-intelligent camera
Mazinani et al. An Adaptive Porn Video Detection Based on Consecutive Frames Using Deep Learning.
CN115617421B (en) Intelligent process scheduling method and device, readable storage medium and embedded equipment
CN112163985B (en) Image processing method, image processing device, storage medium and electronic equipment
CN117336548A (en) Video coding processing method, device, equipment and storage medium
CN111047042B (en) Operation method and device of reasoning service model
CN110458009B (en) Processing method for picture information, face detection and picture searching by picture and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination