CN113453010B - Processing method based on high-performance concurrent video real-time processing framework - Google Patents

Processing method based on high-performance concurrent video real-time processing framework

Info

Publication number
CN113453010B
CN113453010B (application CN202111009878.8A)
Authority
CN
China
Prior art keywords
video
processing
cpu
reasoning
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111009878.8A
Other languages
Chinese (zh)
Other versions
CN113453010A (en)
Inventor
刘必振
丁皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Liangjie Data Technology Co ltd
Original Assignee
Zhijian Technology Jiangsu Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhijian Technology Jiangsu Co ltd filed Critical Zhijian Technology Jiangsu Co ltd
Priority to CN202111009878.8A priority Critical patent/CN113453010B/en
Publication of CN113453010A publication Critical patent/CN113453010A/en
Application granted granted Critical
Publication of CN113453010B publication Critical patent/CN113453010B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a processing method based on a high-performance concurrent video real-time processing framework, and belongs to the technical field of video analysis. The method comprises the following steps. a: hardware-accelerated decoding is carried out on the high-performance concurrent video streams; b: CPU preprocessing is carried out on the decoded video frames; c: the preprocessing results are accumulated, and GPU model batch inference is performed after a set number of preprocessing results have been obtained; d: CPU post-processing is performed on the model inference results; e: result analysis: the series of model inference results is finally analyzed and processed according to the business logic and rules. If the analysis task of a video channel is complete, the video management submodule in the video decoding module is notified to change or terminate that video source and release the shared memory it occupies; otherwise hardware decoding continues. The video decoding, preprocessing, model inference and post-processing modules of the invention are all decoupled, which makes dynamic addition/deletion/modification convenient; module functions can be flexibly modified according to business requirements, and the invention is simple to use.

Description

Processing method based on high-performance concurrent video real-time processing framework
Technical Field
The invention relates to a processing method based on a high-performance concurrent video real-time processing framework, and belongs to the technical field of video analysis.
Background
Video analysis technology extracts specific events or specific behaviors of monitored targets occurring in video scenes by processing, analyzing and understanding the content of video signals, and is widely applied in numerous scenarios such as intelligent security and live-streaming entertainment. From an application point of view, two problems must be solved with emphasis: first, the real-time performance of processing and analysis, and second, the number of video channels processed simultaneously. NVIDIA provides a data stream analysis tool, DeepStream, which perceives a scene through multi-sensor data processing and intelligent video analysis. Developers need not design an end-to-end solution from scratch; they only need to concentrate on building the core deep learning network, and the framework provides hardware acceleration modules. However, DeepStream also has some shortcomings: if the standard plug-in functions cannot meet a requirement, writing one's own plug-in is extremely cumbersome, and dynamically deleting/adding/replacing plug-ins in a Pipeline is troublesome.
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provide a processing method based on a high-performance concurrent video real-time processing framework in which the video decoding, preprocessing, model inference and post-processing modules are all decoupled, facilitating dynamic addition/deletion/modification; module functions can be flexibly modified according to business requirements, and the method is simple to use.
In order to solve the above problems, the processing method based on the high-performance concurrent video real-time processing framework of the present invention comprises the following steps:
a: hardware accelerated decoding is carried out on the high-performance concurrent video stream;
b: CPU preprocessing is carried out on the decoded video frame;
c: accumulating the pretreatment results, and performing GPU model batch reasoning after a certain number of pretreatment results are obtained;
d: performing CPU post-processing on the result after model batch reasoning;
e: and (4) analyzing results: and finally analyzing and processing a series of model reasoning results according to the business logic and rules.
Further, the step A specifically includes the following steps:
a1: performing video management on the videos, wherein video management comprises the access, change, termination and scheduling distribution of video source data, and distributing each video source to the corresponding GPU decoding process/thread according to a set rule;
a2: performing GPU decoding on the video, wherein tools for GPU decoding include FFmpeg and the Video Processing Framework;
a3: dispatching and distributing the video frames to the CPU preprocessing module: the scheduling and distribution submodule performs quality assurance and out-of-order frame handling on the data generated by the decoding processes/threads, and distributes the data to the subsequent GPU inference module according to the modulo-remainder rule.
Further, the CPU preprocessing includes the following steps:
b1: dispatching and distributing the video decoded in step A to multiple CPU preprocessing input queue submodules;
b2: passing the data through the preprocessing input queue submodule to the corresponding CPU preprocessing process/thread, where preprocessing including image scaling, image graying and pixel-value standardization is performed before the data enter the CPU scheduling and distribution submodule;
b3: the CPU scheduling and distribution submodule distributes the results generated by the preprocessing processes to the subsequent GPU inference submodule according to the modulo-remainder rule.
Further, the GPU model batch inference specifically includes the following steps:
c1: establishing a temporary list, and setting the batch inference data size batch_size;
c2: reading data from the CPU input queue in non-waiting mode; if the read succeeds, saving the data to the temporary list and performing step C3, otherwise jumping to step C4;
c3: judging whether the length of the list equals batch_size; if so, performing step C4, otherwise repeating step C2;
c4: performing model batch inference on the temporary list data;
c5: the GPU scheduling and distribution module stores the GPU model batch inference results and pushes them to the CPU post-processing module;
c6: the temporary list is emptied and step C2 is repeated.
Further, the CPU post-processing specifically includes the following steps:
d1: dispatching and distributing the data produced by GPU batch inference to multiple CPU post-processing input queue submodules;
d2: passing the model inference results through the CPU post-processing input queue submodule to the CPU post-processing process submodule for post-processing;
d3: distributing the results generated by the CPU post-processing processes to the next-stage model inference module according to the modulo-remainder rule, or sending them to the final result analysis module.
Further, the specific step of result analysis is as follows: if the analysis task of a video channel is complete, the video management submodule in the video decoding module is notified to change or terminate that video source and release the shared memory it occupies; otherwise hardware decoding continues.
Further, the video formats include MPEG, AVI, MOV, WMV, 3GP, RM/RMVB, FLV/F4V, and H.264/H.265.
The invention has the beneficial effects that: 1) each processing module is decoupled, making operation convenient. The video decoding, preprocessing, model inference and post-processing modules involved in the processing framework are relatively independent, and data flows between them through shared queues, so adding, deleting and modifying modules is easy to realize.
2) Multi-process/multi-thread modes are flexibly selected and combined, maximizing processing efficiency. Experimental comparison shows that running the decoding module in multi-thread mode and the preprocessing, model inference and post-processing modules in lock-free multi-process mode significantly improves processing speed, and the processes/threads started by each module can be flexibly matched to different business requirements.
3) The framework is highly extensible. Its implementation is not bound to a specific language: the user-friendly, low-threshold Python language can be chosen, the highly efficient C/C++ language can be chosen, or a combination of the two.
Drawings
FIG. 1 is a flow chart of a processing method based on a high-performance concurrent video real-time processing framework according to the present invention;
FIG. 2 is a block diagram of a video decoding process according to the present invention;
FIG. 3 is a block diagram of a CPU preprocessing flow of the present invention;
FIG. 4 is a block diagram of a process for GPU batch inference in the present invention;
FIG. 5 is a flowchart of the GPU model batch inference operation steps of the present invention;
FIG. 6 is a block diagram of the flow of the CPU post-processing of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
As shown in fig. 1, the processing method based on the high-performance concurrent video real-time processing framework of the present invention includes the following steps:
a: hardware accelerated decoding is carried out on the high-performance concurrent video stream; video formats include MPEG, AVI, MOV, WMV, 3GP, RM/RMVB, FLV/F4V, and H.264/H.265.
As shown in fig. 2, step a specifically includes the following steps:
a1: performing video management on the video, wherein the video management comprises the access, change, termination and scheduling distribution of video source data, and distributing the video source to a corresponding GPU decoding process/thread according to a set rule;
a2: performing GPU decoding on the Video, wherein tools for the GPU decoding comprise FFmpeg, Video Processing Framework (VPF);
VPF is an open source video processing framework suitable for Python by NVIDIA corporation, and can handle tasks such as video encoding, decoding, color space and pixel format conversion. Whether FFmpeg or VPF, hardware accelerated decoding of input video stream data by the GPU is supported. To meet the real-time requirement of high-performance concurrent video processing, the FFmpeg needs to start a multi-process mode, and the VPF supports multi-thread calling. The VPF tool is more advantageous from the perspective of system resource occupation and consumption.
A3: and dispatching and distributing the video frames to a CPU pretreatment module: and the scheduling and distributing submodule carries out quality assurance and frame disorder processing on the data generated by the decoding process/thread and distributes the data to a subsequent GPU reasoning module according to a modulo remainder rule.
If the number of video channels to be processed exceeds the system's processing capacity, priorities are determined according to business requirements and video sources are admitted in priority order. After one video channel has been fully analyzed, the input-channel resources it occupies must be released and another pending video switched in. In addition, video sources are distributed to the corresponding GPU decoding processes/threads according to a fixed rule, such as a modulo-remainder strategy: suppose the number of video source channels is N and the number of GPU decoding processes/threads is M; letting j = i mod M, the i-th video channel is distributed to the j-th decoding process/thread. Here one decoding process/thread can handle several concurrent videos.
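The modulo-remainder assignment just described can be sketched in Python as follows (the function and variable names are illustrative, not from the patent):

```python
def assign_decoder(video_index: int, num_decoders: int) -> int:
    """Distribute the i-th video source to decoder i mod M (modulo-remainder rule)."""
    return video_index % num_decoders

# Example: spread 8 concurrent video sources over 3 GPU decoding processes/threads.
assignment = {i: assign_decoder(i, 3) for i in range(8)}
```

With this rule each decoding process/thread receives roughly N/M channels, which is what lets one decoder serve several concurrent videos.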
As shown in fig. 3, B: CPU preprocessing is carried out on the decoded video frames. CPU preprocessing processes the video frames obtained in the previous step according to the business and model inference requirements, and specifically comprises the following steps:
b1: dispatching and distributing the video decoded in step A to multiple CPU preprocessing input queue submodules;
b2: passing the data through the preprocessing input queue submodule to the corresponding CPU preprocessing process/thread, where preprocessing including image scaling, image graying and pixel-value standardization is performed before the data enter the CPU scheduling and distribution submodule;
each preprocessing process reads the video frame from the special input queue, and the contention waiting is avoided. To further improve the data reading efficiency, a non-waiting mode is adopted, such as Python language:
while True:
try:
input_data = queue_decode.get_nowait()
except:
pass
b3: and the scheduling and distributing submodule of the CPU distributes the result generated by the preprocessing process to the subsequent GPU reasoning submodule according to the modulo remainder rule. The GPU model inference comprises 3 submodules of input queue, GPU model inference, scheduling and distribution, and a multi-process mode is started, the processing flow is shown in figure 4,
c: accumulating the pretreatment results, and performing GPU model batch reasoning after a certain number of pretreatment results are obtained;
as shown in fig. 5, the specific operation steps are as follows:
c1: establishing a temporary list, and setting batch inference data size batch _ size;
c2: reading data from an input queue of the CPU in a non-waiting mode, if the data is successful, saving the data to a temporary list, and performing the step C3, otherwise, jumping to the step C4;
c3: judging whether the length of the list is equal to the batch _ size, if so, performing the step C4, otherwise, repeating the step C2;
c4: performing model batch reasoning on the temporary list data;
c5: emptying the temporary list data, and repeating the step C2;
c6: and the scheduling distribution module of the GPU stores the batch reasoning result of the GPU model and pushes the batch reasoning result to the CPU post-processing module.
As shown in fig. 6, D: performing CPU post-processing on the result after model batch reasoning;
d1: dispatching and distributing the data subjected to GPU batch reasoning to a plurality of input queue sub-modules processed by CPUs;
d2: the model reasoning result is transmitted to a CPU post-processing process submodule for post-processing through an input queue submodule of the CPU post-processing;
d3: and distributing the result generated by the CPU post-processing process to a model reasoning module of the next stage according to a modulo remainder rule or sending the result to a final result analysis module.
E: and (4) analyzing results: performing final analysis processing on a series of model reasoning results according to business logic and rules;
and the specific step of result analysis is as follows, if a certain path of video analysis task is completed, the video management submodule in the video decoding module is informed to change or terminate the path of video source and release the shared memory occupied by the path of video source, otherwise, hard decoding is continued.
The invention uses a general controlling process with multiple threads to poll and call the hardware-accelerated decoding module, so that CPU occupation is minimal; CPU preprocessing, GPU model inference and CPU post-processing of the decoded video data run in a lock-free multi-process mode, which prevents the operating system from frequently switching processes in and out and achieves true parallelism.
The number of processes/threads for each module can be flexibly matched to achieve maximum efficiency; the video decoding, CPU preprocessing, GPU inference and CPU post-processing submodules are all decoupled and can be invoked repeatedly according to business requirements; a shared cache is established to store intermediate processing results, preventing inter-process data copying from reducing frame-processing efficiency.
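The decoupling of stages through shared queues can be sketched with standard-library threads and queues. This is a toy two-stage pipeline under the assumption that each stage is a generic worker; in the framework described above the stages would be the decode, preprocessing, inference and post-processing processes/threads, and the queues would be the shared caches between them:

```python
import queue
import threading

def stage(in_q, out_q, work):
    """Generic pipeline stage: read from a shared queue, process, forward.
    A None sentinel shuts the stage down and is propagated downstream."""
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)
            break
        out_q.put(work(item))

# Wire three shared queues into a two-stage pipeline ("infer" doubles, "post" adds 1).
decode_q, infer_q, post_q = queue.Queue(), queue.Queue(), queue.Queue()
workers = [
    threading.Thread(target=stage, args=(decode_q, infer_q, lambda x: x * 2)),
    threading.Thread(target=stage, args=(infer_q, post_q, lambda x: x + 1)),
]
for w in workers:
    w.start()
for frame in (1, 2, 3):
    decode_q.put(frame)
decode_q.put(None)          # signal end of stream
for w in workers:
    w.join()
```

Adding, deleting or replacing a stage only means rewiring queues, which is the decoupling property the patent emphasizes.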
The invention covers the video decoding, preprocessing, model inference and post-processing modules, and a method for transferring data between the different modules. By editing module functions, video content analysis such as object detection and pose estimation can be carried out.
By constructing each processing module of the framework with multi-process/multi-thread methods and flexibly matching the number of processes/threads each module starts, the high-performance concurrent video processing speed is raised as far as possible to meet real-time application requirements. Meanwhile, the framework is not limited to a specific language and is highly extensible.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (3)

1. A processing method based on a high-performance concurrent video real-time processing framework is characterized by comprising the following steps:
a: hardware accelerated decoding is carried out on the high-performance concurrent video stream;
the step A specifically comprises the following steps:
a1: performing video management on the videos, wherein video management comprises the access, change, termination and scheduling distribution of video source data, and distributing each video source to the corresponding GPU decoding process/thread according to a set rule;
a2: performing GPU decoding on the video, wherein tools for GPU decoding include FFmpeg and the Video Processing Framework;
a3: dispatching and distributing the video frames to the CPU preprocessing module: the scheduling and distribution submodule performs quality assurance and out-of-order frame handling on the data generated by the decoding processes/threads, and distributes the data to the subsequent GPU inference module according to the modulo-remainder rule;
b: CPU preprocessing is carried out on the decoded video frames;
the CPU preprocessing comprises the following steps:
b1: dispatching and distributing the video decoded in step A to multiple CPU preprocessing input queue submodules;
b2: passing the data through the preprocessing input queue submodule to the corresponding CPU preprocessing process/thread, where preprocessing including image scaling, image graying and pixel-value standardization is performed before the data enter the CPU scheduling and distribution submodule;
b3: the CPU scheduling and distribution submodule distributes the results generated by the preprocessing processes to the subsequent GPU inference submodule according to the modulo-remainder rule;
c: accumulating the pretreatment results, and performing GPU model batch reasoning after a certain number of pretreatment results are obtained;
the GPU model batch reasoning specifically comprises the following steps:
c1: establishing a temporary list, and setting batch inference data size batch _ size;
c2: reading data from an input queue of the CPU in a non-waiting mode, if the data is successful, saving the data to a temporary list, and performing the step C3, otherwise, jumping to the step C4;
c3: judging whether the length of the list is equal to the batch _ size, if so, performing the step C4, otherwise, repeating the step C2;
c4: performing model batch reasoning on the temporary list data;
c5: the scheduling distribution module of the GPU stores the batch reasoning result of the GPU model and pushes the batch reasoning result to the CPU post-processing module;
c6: emptying the temporary list data, and repeating the step C2;
d: performing CPU post-processing on the result after model batch reasoning;
d1: dispatching and distributing the data subjected to GPU batch reasoning to a plurality of input queue sub-modules processed by CPUs;
d2: the model reasoning result is transmitted to a CPU post-processing process submodule for post-processing through an input queue submodule of the CPU post-processing;
d3: distributing a result generated by the CPU post-processing process to a model reasoning module of the next stage according to a modulo remainder rule or sending the result to a final result analysis module;
e: and (4) analyzing results: and finally analyzing and processing a series of model reasoning results according to the business logic and rules.
2. The processing method based on the high-performance concurrent video real-time processing framework according to claim 1, wherein the procedure for result analysis is as follows: if the analysis task of a video channel is complete, the video management submodule in the video decoding module is notified to change or terminate that video source and release the shared memory it occupies; otherwise hardware decoding continues.
3. The processing method based on the high-performance concurrent video real-time processing framework according to claim 1, wherein: video formats include MPEG, AVI, MOV, WMV, 3GP, RM/RMVB, FLV/F4V, and H.264/H.265.
CN202111009878.8A 2021-08-31 2021-08-31 Processing method based on high-performance concurrent video real-time processing framework Active CN113453010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111009878.8A CN113453010B (en) 2021-08-31 2021-08-31 Processing method based on high-performance concurrent video real-time processing framework


Publications (2)

Publication Number Publication Date
CN113453010A CN113453010A (en) 2021-09-28
CN113453010B true CN113453010B (en) 2021-12-10

Family

ID=77819275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111009878.8A Active CN113453010B (en) 2021-08-31 2021-08-31 Processing method based on high-performance concurrent video real-time processing framework

Country Status (1)

Country Link
CN (1) CN113453010B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008003833A1 (en) * 2006-07-07 2008-01-10 Linkotec Oy Media content transcoding
CN112087632A (en) * 2019-06-12 2020-12-15 阿里巴巴集团控股有限公司 Video processing system, method, storage medium and computer device
CN110493626B (en) * 2019-09-10 2020-12-01 海信集团有限公司 Video data processing method and device
CN112637634B (en) * 2020-12-24 2022-08-05 北京睿芯高通量科技有限公司 High-concurrency video processing method and system for multi-process shared data


Similar Documents

Publication Publication Date Title
CN105163127B (en) video analysis method and device
EP3244621B1 (en) Video encoding method, system and server
CN109711323B (en) Real-time video stream analysis acceleration method, device and equipment
CN113221706A (en) Multi-process-based multi-channel video stream AI analysis method and system
CN109769115A (en) A kind of method, apparatus and equipment of Intelligent Optimal video analysis performance
CN110851255B (en) Method for processing video stream based on cooperation of terminal equipment and edge server
EP4262205A1 (en) Video predictive coding method and apparatus
CN102802024A (en) Transcoding method and transcoding system realized in server
CN113535366A (en) High-performance distributed combined multi-channel video real-time processing method
CN113286175A (en) Video stream processing method, device and storage medium
CN101682761B (en) A system and method for time optimized encoding
CN113905196B (en) Video frame management method, video recorder, and computer-readable storage medium
CN114222166B (en) Multi-channel video code stream real-time processing and on-screen playing method and related system
CN113453010B (en) Processing method based on high-performance concurrent video real-time processing framework
WO2017162015A1 (en) Data processing method and apparatus, and storage medium
CN112839239B (en) Audio and video processing method and device and server
CN105323593A (en) Multimedia transcoding scheduling method and multimedia transcoding scheduling device
CN112543374A (en) Transcoding control method and device and electronic equipment
US6369848B1 (en) Picture data transmission device and picture signal coding method thereof
WO2008031039A2 (en) Audio/video recording and encoding
CN105338371A (en) Multimedia transcoding scheduling method and apparatus
CN112637538B (en) Smart tag method, system, medium, and terminal for optimizing video analysis
CN114640854A (en) Real-time high-speed decoding method for multi-channel video stream
CN102833620B (en) System and method for time optimized encoding
CN110969672A (en) Image compression method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230810

Address after: L6069, Floor 6, Youzhi Building, 29 Kejian Road, Jiangning District, Nanjing, Jiangsu Province, 210000

Patentee after: Jiangsu Liangjie Data Technology Co.,Ltd.

Address before: 211100 l6035, 6 / F, Youzhi building, No. 29, Kejian Road, Jiangning District, Nanjing City, Jiangsu Province

Patentee before: Zhijian Technology (Jiangsu) Co.,Ltd.

TR01 Transfer of patent right