CN113453010B - Processing method based on high-performance concurrent video real-time processing framework - Google Patents

Processing method based on high-performance concurrent video real-time processing framework

Info

Publication number
CN113453010B
CN113453010B (application CN202111009878.8A)
Authority
CN
China
Prior art keywords
video
processing
cpu
reasoning
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111009878.8A
Other languages
Chinese (zh)
Other versions
CN113453010A (en)
Inventor
刘必振
丁皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Liangjie Data Technology Co ltd
Original Assignee
Zhijian Technology Jiangsu Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhijian Technology Jiangsu Co ltd filed Critical Zhijian Technology Jiangsu Co ltd
Priority to CN202111009878.8A priority Critical patent/CN113453010B/en
Publication of CN113453010A publication Critical patent/CN113453010A/en
Application granted granted Critical
Publication of CN113453010B publication Critical patent/CN113453010B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a processing method based on a high-performance concurrent video real-time processing framework, and belongs to the technical field of video analysis. The method comprises the following steps. a: hardware-accelerated decoding is carried out on the high-performance concurrent video streams; b: CPU preprocessing is carried out on the decoded video frames; c: the preprocessing results are accumulated, and GPU model batch inference is performed after a set number of preprocessing results have been obtained; d: CPU post-processing is performed on the model inference results; e: result analysis: the series of model inference results is finally analyzed and processed according to the business logic and rules. If the analysis task of a video channel is complete, the video management submodule in the video decoding module is notified to change or terminate that video source and release the shared memory it occupies; otherwise hardware decoding continues. The video decoding, preprocessing, model inference and post-processing modules of the invention are all decoupled, which makes dynamic addition/deletion/modification convenient; module functions can be flexibly modified according to business requirements, and the invention is simple to use.

Description

Processing method based on high-performance concurrent video real-time processing framework
Technical Field
The invention relates to a processing method based on a high-performance concurrent video real-time processing framework, and belongs to the technical field of video analysis.
Background
Video analysis technology extracts specific events or specific behaviors of monitored targets occurring in video scenes by processing, analyzing and understanding the content of video signals, and is widely applied in numerous scenarios such as intelligent security and live-streaming entertainment. From an application point of view, two problems must be solved with emphasis: first, the real-time performance of processing and analysis, and second, the number of video channels processed simultaneously. NVIDIA provides a data stream analysis tool, DeepStream, which perceives a scene through multi-sensor data processing and intelligent video analysis. Developers need not design an end-to-end solution from scratch; they only need to concentrate on building the core deep learning network, and the framework provides hardware acceleration modules. However, DeepStream also has some shortcomings: if the standard plug-in functions cannot meet a requirement, writing one's own plug-in is extremely cumbersome, and dynamically deleting/adding/replacing plug-ins in a Pipeline is troublesome.
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provide a processing method based on a high-performance concurrent video real-time processing framework in which the video decoding, preprocessing, model inference and post-processing modules are all decoupled, facilitating dynamic addition/deletion/modification; module functions can be flexibly modified according to business requirements, and the method is simple to use.
In order to solve the above problems, the processing method based on the high-performance concurrent video real-time processing framework of the present invention comprises the following steps:
a: hardware accelerated decoding is carried out on the high-performance concurrent video stream;
b: CPU preprocessing is carried out on the decoded video frame;
c: accumulating the pretreatment results, and performing GPU model batch reasoning after a certain number of pretreatment results are obtained;
d: performing CPU post-processing on the result after model batch reasoning;
e: and (4) analyzing results: and finally analyzing and processing a series of model reasoning results according to the business logic and rules.
Further, the step A specifically includes the following steps:
a1: performing video management on the videos, wherein video management comprises the access, change, termination and scheduling distribution of video source data, and distributing each video source to the corresponding GPU decoding process/thread according to a set rule;
a2: performing GPU decoding on the video, wherein tools for GPU decoding include FFmpeg and the Video Processing Framework;
a3: dispatching and distributing the video frames to the CPU preprocessing module: the scheduling and distribution submodule performs quality assurance and out-of-order frame handling on the data generated by the decoding processes/threads, and distributes the data to the subsequent GPU inference module according to the modulo-remainder rule.
Further, the CPU preprocessing includes the following steps:
b1: dispatching and distributing the video decoded in step A to multiple CPU preprocessing input queue submodules;
b2: passing the data through the preprocessing input queue submodule to the corresponding CPU preprocessing process/thread, where preprocessing including image scaling, image graying and pixel-value standardization is performed before the data enter the CPU scheduling and distribution submodule;
b3: the CPU scheduling and distribution submodule distributes the results generated by the preprocessing processes to the subsequent GPU inference submodule according to the modulo-remainder rule.
Further, the GPU model batch inference specifically includes the following steps:
c1: establishing a temporary list, and setting the batch inference data size batch_size;
c2: reading data from the CPU input queue in non-waiting mode; if the read succeeds, saving the data to the temporary list and performing step C3, otherwise jumping to step C4;
c3: judging whether the length of the list equals batch_size; if so, performing step C4, otherwise repeating step C2;
c4: performing model batch inference on the temporary list data;
c5: the GPU scheduling and distribution module stores the GPU model batch inference results and pushes them to the CPU post-processing module;
c6: the temporary list is emptied and step C2 is repeated.
Further, the CPU post-processing specifically includes the following steps:
d1: dispatching and distributing the data produced by GPU batch inference to multiple CPU post-processing input queue submodules;
d2: passing the model inference results through the CPU post-processing input queue submodule to the CPU post-processing process submodule for post-processing;
d3: distributing the results generated by the CPU post-processing processes to the next-stage model inference module according to the modulo-remainder rule, or sending them to the final result analysis module.
Further, the specific step of result analysis is as follows: if the analysis task of a video channel is complete, the video management submodule in the video decoding module is notified to change or terminate that video source and release the shared memory it occupies; otherwise hardware decoding continues.
Further, the video formats include MPEG, AVI, MOV, WMV, 3GP, RM/RMVB, FLV/F4V, and H.264/H.265.
The invention has the beneficial effects that: 1) each processing module is decoupled, making operation convenient. The video decoding, preprocessing, model inference and post-processing modules involved in the processing framework are relatively independent, and data flows between them through shared queues, so adding, deleting and modifying modules is easy to realize.
2) Multi-process/multi-thread modes are flexibly selected and combined, maximizing processing efficiency. Experimental comparison shows that running the decoding module in multi-thread mode and the preprocessing, model inference and post-processing modules in lock-free multi-process mode significantly improves processing speed, and the processes/threads started by each module can be flexibly matched to different business requirements.
3) The framework is highly extensible. Its implementation is not bound to a specific language: the user-friendly, low-threshold Python language can be chosen, the highly efficient C/C++ language can be chosen, or a combination of the two.
Drawings
FIG. 1 is a flow chart of a processing method based on a high-performance concurrent video real-time processing framework according to the present invention;
FIG. 2 is a block diagram of a video decoding process according to the present invention;
FIG. 3 is a block diagram of a CPU preprocessing flow of the present invention;
FIG. 4 is a block diagram of a process for GPU batch inference in the present invention;
FIG. 5 is a flowchart of the GPU model batch inference operation steps of the present invention;
FIG. 6 is a block diagram of the flow of the CPU post-processing of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
As shown in fig. 1, the processing method based on the high-performance concurrent video real-time processing framework of the present invention includes the following steps:
a: hardware accelerated decoding is carried out on the high-performance concurrent video stream; video formats include MPEG, AVI, MOV, WMV, 3GP, RM/RMVB, FLV/F4V, and H.264/H.265.
As shown in fig. 2, step a specifically includes the following steps:
a1: performing video management on the video, wherein the video management comprises the access, change, termination and scheduling distribution of video source data, and distributing the video source to a corresponding GPU decoding process/thread according to a set rule;
a2: performing GPU decoding on the Video, wherein tools for the GPU decoding comprise FFmpeg, Video Processing Framework (VPF);
VPF is an open source video processing framework suitable for Python by NVIDIA corporation, and can handle tasks such as video encoding, decoding, color space and pixel format conversion. Whether FFmpeg or VPF, hardware accelerated decoding of input video stream data by the GPU is supported. To meet the real-time requirement of high-performance concurrent video processing, the FFmpeg needs to start a multi-process mode, and the VPF supports multi-thread calling. The VPF tool is more advantageous from the perspective of system resource occupation and consumption.
A3: and dispatching and distributing the video frames to a CPU pretreatment module: and the scheduling and distributing submodule carries out quality assurance and frame disorder processing on the data generated by the decoding process/thread and distributes the data to a subsequent GPU reasoning module according to a modulo remainder rule.
If the number of video channels to be processed exceeds the system's processing capacity, priorities are determined according to business requirements and video sources are admitted in priority order. After one video channel has been fully analyzed, the input-channel resources it occupies must be released and another pending video switched in. In addition, video sources are distributed to the corresponding GPU decoding processes/threads according to a fixed rule, such as a modulo-remainder strategy: suppose the number of video source channels is N and the number of GPU decoding processes/threads is M; letting j = i mod M, the i-th video channel is distributed to the j-th decoding process/thread. Here one decoding process/thread can handle several concurrent videos.
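The modulo-remainder assignment just described can be sketched in Python as follows (the function and variable names are illustrative, not from the patent):

```python
def assign_decoder(video_index: int, num_decoders: int) -> int:
    """Distribute the i-th video source to decoder i mod M (modulo-remainder rule)."""
    return video_index % num_decoders

# Example: spread 8 concurrent video sources over 3 GPU decoding processes/threads.
assignment = {i: assign_decoder(i, 3) for i in range(8)}
```

With this rule each decoding process/thread receives roughly N/M channels, which is what lets one decoder serve several concurrent videos.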
As shown in fig. 3, B: CPU preprocessing is carried out on the decoded video frames. CPU preprocessing processes the video frames obtained in the previous step according to the business and model inference requirements, and specifically comprises the following steps:
b1: dispatching and distributing the video decoded in step A to multiple CPU preprocessing input queue submodules;
b2: passing the data through the preprocessing input queue submodule to the corresponding CPU preprocessing process/thread, where preprocessing including image scaling, image graying and pixel-value standardization is performed before the data enter the CPU scheduling and distribution submodule;
each preprocessing process reads the video frame from the special input queue, and the contention waiting is avoided. To further improve the data reading efficiency, a non-waiting mode is adopted, such as Python language:
while True:
try:
input_data = queue_decode.get_nowait()
except:
pass
b3: and the scheduling and distributing submodule of the CPU distributes the result generated by the preprocessing process to the subsequent GPU reasoning submodule according to the modulo remainder rule. The GPU model inference comprises 3 submodules of input queue, GPU model inference, scheduling and distribution, and a multi-process mode is started, the processing flow is shown in figure 4,
c: accumulating the pretreatment results, and performing GPU model batch reasoning after a certain number of pretreatment results are obtained;
as shown in fig. 5, the specific operation steps are as follows:
c1: establishing a temporary list, and setting batch inference data size batch _ size;
c2: reading data from an input queue of the CPU in a non-waiting mode, if the data is successful, saving the data to a temporary list, and performing the step C3, otherwise, jumping to the step C4;
c3: judging whether the length of the list is equal to the batch _ size, if so, performing the step C4, otherwise, repeating the step C2;
c4: performing model batch reasoning on the temporary list data;
c5: emptying the temporary list data, and repeating the step C2;
c6: and the scheduling distribution module of the GPU stores the batch reasoning result of the GPU model and pushes the batch reasoning result to the CPU post-processing module.
As shown in fig. 6, D: performing CPU post-processing on the result after model batch reasoning;
d1: dispatching and distributing the data subjected to GPU batch reasoning to a plurality of input queue sub-modules processed by CPUs;
d2: the model reasoning result is transmitted to a CPU post-processing process submodule for post-processing through an input queue submodule of the CPU post-processing;
d3: and distributing the result generated by the CPU post-processing process to a model reasoning module of the next stage according to a modulo remainder rule or sending the result to a final result analysis module.
E: and (4) analyzing results: performing final analysis processing on a series of model reasoning results according to business logic and rules;
and the specific step of result analysis is as follows, if a certain path of video analysis task is completed, the video management submodule in the video decoding module is informed to change or terminate the path of video source and release the shared memory occupied by the path of video source, otherwise, hard decoding is continued.
The invention uses a general controlling process with multiple threads to poll and call the hardware-accelerated decoding module, so that CPU occupation is minimal; CPU preprocessing, GPU model inference and CPU post-processing of the decoded video data run in a lock-free multi-process mode, which prevents the operating system from frequently switching processes in and out and achieves true parallelism.
The number of processes/threads for each module can be flexibly matched to achieve maximum efficiency; the video decoding, CPU preprocessing, GPU inference and CPU post-processing submodules are all decoupled and can be invoked repeatedly according to business requirements; a shared cache is established to store intermediate processing results, preventing inter-process data copying from reducing frame-processing efficiency.
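The decoupling of stages through shared queues can be sketched with standard-library threads and queues. This is a toy two-stage pipeline under the assumption that each stage is a generic worker; in the framework described above the stages would be the decode, preprocessing, inference and post-processing processes/threads, and the queues would be the shared caches between them:

```python
import queue
import threading

def stage(in_q, out_q, work):
    """Generic pipeline stage: read from a shared queue, process, forward.
    A None sentinel shuts the stage down and is propagated downstream."""
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)
            break
        out_q.put(work(item))

# Wire three shared queues into a two-stage pipeline ("infer" doubles, "post" adds 1).
decode_q, infer_q, post_q = queue.Queue(), queue.Queue(), queue.Queue()
workers = [
    threading.Thread(target=stage, args=(decode_q, infer_q, lambda x: x * 2)),
    threading.Thread(target=stage, args=(infer_q, post_q, lambda x: x + 1)),
]
for w in workers:
    w.start()
for frame in (1, 2, 3):
    decode_q.put(frame)
decode_q.put(None)          # signal end of stream
for w in workers:
    w.join()
```

Adding, deleting or replacing a stage only means rewiring queues, which is the decoupling property the patent emphasizes.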
The invention covers the video decoding, preprocessing, model inference and post-processing modules, and a method for transferring data between the different modules. By editing module functions, video content analysis such as object detection and pose estimation can be carried out.
By constructing each processing module of the framework with multi-process/multi-thread methods and flexibly matching the number of processes/threads each module starts, the high-performance concurrent video processing speed is raised as far as possible to meet real-time application requirements. Meanwhile, the framework is not limited to a specific language and is highly extensible.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (3)

1. A processing method based on a high-performance concurrent video real-time processing framework is characterized by comprising the following steps:
a: hardware accelerated decoding is carried out on the high-performance concurrent video stream;
the step A specifically comprises the following steps:
a1: performing video management on the videos, wherein video management comprises the access, change, termination and scheduling distribution of video source data, and distributing each video source to the corresponding GPU decoding process/thread according to a set rule;
a2: performing GPU decoding on the video, wherein tools for GPU decoding include FFmpeg and the Video Processing Framework;
a3: dispatching and distributing the video frames to the CPU preprocessing module: the scheduling and distribution submodule performs quality assurance and out-of-order frame handling on the data generated by the decoding processes/threads, and distributes the data to the subsequent GPU inference module according to the modulo-remainder rule;
b: CPU preprocessing is carried out on the decoded video frames;
the CPU preprocessing comprises the following steps:
b1: dispatching and distributing the video decoded in step A to multiple CPU preprocessing input queue submodules;
b2: passing the data through the preprocessing input queue submodule to the corresponding CPU preprocessing process/thread, where preprocessing including image scaling, image graying and pixel-value standardization is performed before the data enter the CPU scheduling and distribution submodule;
b3: the CPU scheduling and distribution submodule distributes the results generated by the preprocessing processes to the subsequent GPU inference submodule according to the modulo-remainder rule;
c: accumulating the pretreatment results, and performing GPU model batch reasoning after a certain number of pretreatment results are obtained;
the GPU model batch reasoning specifically comprises the following steps:
c1: establishing a temporary list, and setting batch inference data size batch _ size;
c2: reading data from an input queue of the CPU in a non-waiting mode, if the data is successful, saving the data to a temporary list, and performing the step C3, otherwise, jumping to the step C4;
c3: judging whether the length of the list is equal to the batch _ size, if so, performing the step C4, otherwise, repeating the step C2;
c4: performing model batch reasoning on the temporary list data;
c5: the scheduling distribution module of the GPU stores the batch reasoning result of the GPU model and pushes the batch reasoning result to the CPU post-processing module;
c6: emptying the temporary list data, and repeating the step C2;
d: performing CPU post-processing on the result after model batch reasoning;
d1: dispatching and distributing the data subjected to GPU batch reasoning to a plurality of input queue sub-modules processed by CPUs;
d2: the model reasoning result is transmitted to a CPU post-processing process submodule for post-processing through an input queue submodule of the CPU post-processing;
d3: distributing a result generated by the CPU post-processing process to a model reasoning module of the next stage according to a modulo remainder rule or sending the result to a final result analysis module;
e: and (4) analyzing results: and finally analyzing and processing a series of model reasoning results according to the business logic and rules.
2. The processing method based on the high-performance concurrent video real-time processing framework according to claim 1, wherein the procedure for result analysis is as follows: if the analysis task of a video channel is complete, the video management submodule in the video decoding module is notified to change or terminate that video source and release the shared memory it occupies; otherwise hardware decoding continues.
3. The processing method based on the high-performance concurrent video real-time processing framework according to claim 1, wherein: video formats include MPEG, AVI, MOV, WMV, 3GP, RM/RMVB, FLV/F4V, and H.264/H.265.
CN202111009878.8A 2021-08-31 2021-08-31 Processing method based on high-performance concurrent video real-time processing framework Active CN113453010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111009878.8A CN113453010B (en) 2021-08-31 2021-08-31 Processing method based on high-performance concurrent video real-time processing framework


Publications (2)

Publication Number Publication Date
CN113453010A CN113453010A (en) 2021-09-28
CN113453010B true CN113453010B (en) 2021-12-10

Family

ID=77819275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111009878.8A Active CN113453010B (en) 2021-08-31 2021-08-31 Processing method based on high-performance concurrent video real-time processing framework

Country Status (1)

Country Link
CN (1) CN113453010B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008003833A1 (en) * 2006-07-07 2008-01-10 Linkotec Oy Media content transcoding
CN112087632A (en) * 2019-06-12 2020-12-15 阿里巴巴集团控股有限公司 Video processing system, method, storage medium and computer device
CN110493626B (en) * 2019-09-10 2020-12-01 海信集团有限公司 Video data processing method and device
CN112637634B (en) * 2020-12-24 2022-08-05 北京睿芯高通量科技有限公司 High-concurrency video processing method and system for multi-process shared data


Similar Documents

Publication Publication Date Title
CN105163127B (en) video analysis method and device
EP3244621B1 (en) Video encoding method, system and server
CN109711323B (en) Real-time video stream analysis acceleration method, device and equipment
CN113221706A (en) Multi-process-based multi-channel video stream AI analysis method and system
CN109769115A (en) A kind of method, apparatus and equipment of Intelligent Optimal video analysis performance
CN110851255B (en) Method for processing video stream based on cooperation of terminal equipment and edge server
EP4262205A1 (en) Video predictive coding method and apparatus
CN102802024A (en) Transcoding method and transcoding system realized in server
CN113535366A (en) High-performance distributed combined multi-channel video real-time processing method
CN113286175A (en) Video stream processing method, device and storage medium
CN101682761B (en) A system and method for time optimized encoding
CN113905196B (en) Video frame management method, video recorder, and computer-readable storage medium
CN114222166B (en) Multi-channel video code stream real-time processing and on-screen playing method and related system
CN113453010B (en) Processing method based on high-performance concurrent video real-time processing framework
WO2017162015A1 (en) Data processing method and apparatus, and storage medium
CN112839239B (en) Audio and video processing method and device and server
CN105323593A (en) Multimedia transcoding scheduling method and multimedia transcoding scheduling device
CN112543374A (en) Transcoding control method and device and electronic equipment
US6369848B1 (en) Picture data transmission device and picture signal coding method thereof
WO2008031039A2 (en) Audio/video recording and encoding
CN105338371A (en) Multimedia transcoding scheduling method and apparatus
CN112637538B (en) Smart tag method, system, medium, and terminal for optimizing video analysis
CN114640854A (en) Real-time high-speed decoding method for multi-channel video stream
CN102833620B (en) System and method for time optimized encoding
CN110969672A (en) Image compression method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230810

Address after: L6069, Floor 6, Youzhi Building, 29 Kejian Road, Jiangning District, Nanjing, Jiangsu Province, 210000

Patentee after: Jiangsu Liangjie Data Technology Co.,Ltd.

Address before: 211100 l6035, 6 / F, Youzhi building, No. 29, Kejian Road, Jiangning District, Nanjing City, Jiangsu Province

Patentee before: Zhijian Technology (Jiangsu) Co.,Ltd.

TR01 Transfer of patent right