CN112637634B - High-concurrency video processing method and system for multi-process shared data - Google Patents
- Publication number
- CN112637634B CN112637634B CN202011554999.6A CN202011554999A CN112637634B CN 112637634 B CN112637634 B CN 112637634B CN 202011554999 A CN202011554999 A CN 202011554999A CN 112637634 B CN112637634 B CN 112637634B
- Authority
- CN
- China
- Prior art keywords
- video
- data
- gop
- decoding
- queue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234309—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23406—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving management of server-side video buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234345—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44004—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving video buffer management, e.g. video decoder buffer or video display buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440209—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display for formatting on an optical medium, e.g. DVD
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440245—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/443—OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Abstract
The invention discloses a high-concurrency video processing method and system for multi-process shared data. The processing method comprises the following steps: S1: inputting video segments of different sizes into a session manager that manages different video programs; S2: using a fragment-based load-balancing method for multi-process decoding task scheduling, distributing the input video segment, in units of minimum-weight video fragments, to queues corresponding to different decoding processes, and then sending the corresponding video fragment data in the queues to the corresponding decoding processes via sockets; S3: each decoding process decodes the received video fragments and places the decoded data into its corresponding shared memory queue; S4: the main process acquires YUV data from the different shared memory queues, processes each acquired frame of YUV data accordingly, and returns the YUV buffer to its corresponding queue after processing; S5: returning the processing result to the session manager.
Description
Technical Field
The invention relates to the field of data processing, in particular to a method and system for high-concurrency video processing with multi-process shared data, and more particularly to a method and system that balance high-concurrency video decoding load and realize shared-memory communication of data among multiple processes.
Background
In recent years, global internet traffic has kept growing rapidly. According to statistics from China Telecom, video traffic will dominate internet traffic in the future, and was expected to account for more than 82% of total internet traffic by 2021. Online video material is usually compressed with a codec. On the device side, a user can simply download and install a codec package to decompress and play videos; a cloud video processing center, by contrast, must decode and analyze a large number of different videos every day, yet relatively few video decoding schemes and technologies exist for the data center.
The video decoding scheme of an existing video processing data center usually adopts a single-process, multi-channel method: video session management, decoding of different videos, and processing and analysis of decoded data are all implemented in a single process. The memory accessed by each module during video processing can be shared within the process, which is convenient to use and easy to implement. In addition, many data centers use hardware acceleration devices such as GPUs to decode and analyze multiple videos. However, in an existing single-process video processing scheme, when the process errors out or crashes, or the hardware device fails, the entire video processing service stops; and if a single-process video pipeline is simply divided into multiple processes by module or by acceleration device, the data-copy overhead of inter-process communication increases. A method is therefore needed that can balance the processing load across multiple decoding devices or decoding processes and divide the work into processes by module, so that different processes cooperate and YUV data is transferred between them. (YUV is a color coding format; decoded video data is usually in YUV format.)
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a method and system for high-concurrency video processing with multi-process shared data. By distributing and scheduling multiple video channels across different processes, it achieves load balancing of multi-process decoding tasks, reduces the impact on the main process when a decoding process exits abnormally or its hardware acceleration device fails, and enhances the fault tolerance of the system. At the same time, by sharing data between the multiple decoding processes and the process that handles the decoded YUV data, the coupling between different processes of the system is reduced and zero-copy transfer of YUV data between processes is realized, lowering the data-communication overhead of multi-process cooperation.
In order to achieve the above object, the present invention provides a method for processing highly concurrent video with shared data by multiple processes, which comprises the following steps:
s1: inputting video segments with different sizes into a session manager for managing different video programs;
s2: using a fragment-based load-balancing method for multi-process decoding task scheduling, distributing the input video segment, in units of the minimum-weight video fragment, to queues corresponding to different decoding processes, and then sending the corresponding video fragment data in the queues to the corresponding decoding processes via sockets;
s3: the decoding process decodes the received video clip and places the decoded data into the corresponding shared memory queue;
s4: the main process acquires YUV data from different shared memory queues, correspondingly processes each acquired frame of YUV data, and returns the YUV data to a corresponding buffer area after the YUV data is processed;
s5: and returning the processing result to the session manager.
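The overall flow of steps S1-S5 can be sketched as follows. This is a minimal single-process simulation in Python, using plain queues to stand in for sockets and shared memory; the names (`segments`, `decode_queues`, `shared_queues`) and the two-decoder setup are illustrative assumptions, not part of the patent.

```python
from collections import deque

# S1: video segments of different sizes arrive at the session manager
segments = [{"id": i, "frames": n} for i, n in enumerate([30, 120, 60])]

NUM_DECODERS = 2
decode_queues = [deque() for _ in range(NUM_DECODERS)]  # per-decoding-process input queues
shared_queues = [deque() for _ in range(NUM_DECODERS)]  # stand-ins for shared memory queues

# S2: distribute each segment to the currently least-loaded decode queue
for seg in segments:
    target = min(decode_queues, key=lambda q: sum(s["frames"] for s in q))
    target.append(seg)

# S3: each "decoding process" decodes its segments into its shared queue
for dq, sq in zip(decode_queues, shared_queues):
    while dq:
        seg = dq.popleft()
        sq.append({"id": seg["id"], "yuv_frames": seg["frames"]})  # fake decode

# S4/S5: the main process drains the shared queues and returns results
results = sorted(y["id"] for sq in shared_queues for y in sq)
```

In the real system, S2 uses the weight-based scheduling detailed below rather than a raw frame count, and S3/S4 run in separate processes over true shared memory.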
In an embodiment of the present invention, the specific steps of performing data distribution in S2 are as follows:
s21, acquiring any video segment and obtaining the current video segment information, wherein the video segment information comprises: the occupied memory size, the video session information, the video width and the video height;
s22, calculating the frame count (frame_num) and the number of key frames (I frames) of the current video segment through multimedia processing software, and calculating the weight W of the current video segment from the frame count, the video width and the video height:
W=frame_num×width×height
wherein frame_num is the frame count of the current video segment, width is the video width, and height is the video height;
s23, splitting the video segment into a plurality of video segments with the minimum weight unit according to the weight W of the current video segment;
s24, distributing the split video clips, and the specific process is as follows: comparing the total weight of the queues corresponding to all the processes to obtain a minimum total weight queue, adding the video segment of the 1 st minimum weight unit split in the S23 into the minimum total weight queue, and completing the distribution of the video segment of the 1 st minimum weight unit to the decoding process corresponding to the minimum total weight queue;
and S25, repeating the step S24 until all the video clips with the minimum weight unit are distributed to the decoding process corresponding to the queue with the minimum total weight.
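Steps S22, S24 and S25 can be sketched in a few lines of Python. The helper name `segment_weight`, the 1920×1080 resolution, the frame counts, and the three-queue setup are hypothetical values chosen for illustration only.

```python
def segment_weight(frame_num, width, height):
    # S22: W = frame_num x width x height
    return frame_num * width * height

# hypothetical minimum-weight-unit clips, already produced by the S23 split
clips = [segment_weight(f, 1920, 1080) for f in (25, 50, 25, 100, 25)]

queues = [[] for _ in range(3)]  # one queue per decoding process

# S24/S25: each clip goes to the queue with the smallest total weight
for w in clips:
    target = min(queues, key=sum)
    target.append(w)

totals = [sum(q) for q in queues]
```

Because every clip is assigned to the currently lightest queue, the final totals stay close to each other, which is the load-balancing property the method aims for.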
In an embodiment of the present invention, the specific process of S23 is as follows:
s2301: checking the number of I frames (key frames) of the current video segment: if the number of I frames is 1, the current video segment is already a minimum-weight-unit video fragment; if the number of I frames is greater than 1, comparing the weight W of the current video segment calculated in S22 with the preset minimum weight unit;
s2302: if the weight W of the current video segment is larger than the minimum weight unit, calculating the weight of a GOP (Group of Pictures) from its size, wherein a GOP comprises all video frames from the current I frame (inclusive) up to the next I frame, and the size of a GOP is the number of video frames it contains. The weight W_GOP of the current GOP is calculated from its size as:
W_GOP = GOP size × width × height
wherein width is the video width and height is the video height;
s2303: before calculating the weight W_GOP1 of the 1st GOP of the current video segment, the running total weight sumW_0 is 0;
s2304: before calculating the weight W_GOP2 of the 2nd GOP, the running total is sumW_1 = W_GOP1 + sumW_0; likewise, before calculating the weight W_GOPx of the x-th GOP, sumW_(x-1) = W_GOP(x-1) + sumW_(x-2);
S2305: when W_GOPx + sumW_(x-1) is greater than the minimum weight unit, splitting off the video frames from the 1st GOP through the x-th GOP as a single video fragment, resetting sumW to 0, and repeating S2303 and S2304 with the next GOP treated as the 1st GOP; otherwise, proceeding to the next step;
s2306: if the n-th GOP has been processed and n < x, i.e. no further GOP remains for calculation, the video frames from the 1st GOP through the n-th GOP are split off directly as a single video fragment.
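The accumulation loop of S2301-S2306 can be sketched as follows. This is a simplified Python rendering under the assumption that the segment is given as a list of GOP sizes in frames; the function name `split_by_gop` and the example values are hypothetical.

```python
def split_by_gop(gop_sizes, width, height, min_weight_unit):
    """Split a segment (list of GOP sizes in frames) into clips: GOP weights
    are accumulated until the total exceeds min_weight_unit (S2303-S2305);
    leftover GOPs form the final clip (S2306)."""
    clips, current, sum_w = [], [], 0
    for size in gop_sizes:
        w_gop = size * width * height   # S2302: W_GOP = GOP size x width x height
        current.append(size)
        sum_w += w_gop
        if sum_w > min_weight_unit:     # S2305: cut here and reset the running total
            clips.append(current)
            current, sum_w = [], 0
    if current:                         # S2306: remaining GOPs become the last clip
        clips.append(current)
    return clips

clips = split_by_gop([10, 10, 10, 10, 10], width=100, height=100,
                     min_weight_unit=250_000)
```

With these example numbers each GOP weighs 100,000, so the first cut falls after the third GOP and the remaining two GOPs form the final, smaller clip.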
In an embodiment of the present invention, the specific process of S3 is as follows:
s31: the main process allocates a shared memory according to the name of each decoding process, and divides the shared memory into two queues, namely a busy queue and an empty queue, wherein each queue comprises a plurality of shared memory blocks and is used for storing the decoded YUV data, the empty queue is a queue for managing an idle YUV buffer area, and the busy queue is a queue for managing a buffer area for caching the YUV data after decoding;
s32: when the decoding process decodes new YUV data, the decoding process acquires an idle YUV buffer area from the empty queue;
s33: and the decoding process fills the YUV data information into a YUV buffer area according to the video segment information corresponding to the current YUV data, wherein the YUV data information comprises: program information, image width, image height, image size, image format;
s34: and the decoding process puts the YUV buffer area filled with the data into a busy queue to finish the storage of the YUV data in the shared memory.
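The producer side of S31-S34 can be sketched with two queues per decoding process. Here Python deques and plain dicts stand in for the shared memory block queues and YUV buffers; the names (`empty_q`, `busy_q`, `on_frame_decoded`) and the buffer count are illustrative assumptions.

```python
from collections import deque

NUM_BUFFERS = 4
# S31: the per-decoding-process shared memory is split into an empty queue
# (free buffers) and a busy queue (buffers holding decoded YUV data)
empty_q = deque({"data": None, "meta": None} for _ in range(NUM_BUFFERS))
busy_q = deque()

def on_frame_decoded(yuv_bytes, meta):
    """S32-S34: take a free buffer, fill it, hand it to the main process."""
    if not empty_q:
        return False                 # back-pressure: no free buffer available
    buf = empty_q.popleft()          # S32: grab an idle YUV buffer
    buf["data"] = yuv_bytes          # S33: fill the YUV data ...
    buf["meta"] = meta               # ... and program/width/height/size/format info
    busy_q.append(buf)               # S34: publish the buffer on the busy queue
    return True

ok = on_frame_decoded(b"\x00" * 8, {"width": 4, "height": 2, "format": "I420"})
```

Because the buffer itself never leaves the shared memory region, only queue pointers move between processes, which is what gives the scheme its zero-copy property.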
In an embodiment of the present invention, the specific process of S4 is as follows:
s41: the main process uses a number of image-processing threads to query, for each decoding process, the state of the busy queue corresponding to that decoding process; when the query finds that the busy queue contains data, the corresponding YUV data node is taken directly from the head of the busy queue;
s42: the main process directly performs corresponding analysis processing on the YUV data in the YUV data node taken out in the S41 according to the specific service requirement of the obtained YUV data node;
s43: and after the analysis processing of the YUV data in the S42 is completed by the main process, the YUV buffer area is placed in the empty queue corresponding to the decoding process of the YUV data again, and the YUV buffer area is reused by the decoding process.
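The consumer side of S41-S43 can be sketched as the mirror image of the producer loop. Again, deques stand in for the shared memory queues; `drain_busy_queue` and the pre-filled sample buffer are hypothetical illustrations.

```python
from collections import deque

# a busy queue already holding one decoded frame, plus its empty queue
busy_q = deque([{"data": b"\x10" * 8, "meta": {"width": 4, "height": 2}}])
empty_q = deque()

def drain_busy_queue(analyze):
    """S41-S43: take nodes from the busy-queue head, analyze, recycle buffers."""
    results = []
    while busy_q:
        buf = busy_q.popleft()                 # S41: node from the head of the busy queue
        results.append(analyze(buf["data"], buf["meta"]))  # S42: service-specific analysis
        buf["data"] = None                     # S43: clear and return the buffer
        empty_q.append(buf)                    #      so the decoding process can reuse it
    return results

out = drain_busy_queue(lambda data, meta: len(data))
```

The `analyze` callback is where the "specific service requirement" of S42 would plug in; the buffer recycling in S43 closes the empty/busy cycle.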
In an embodiment of the present invention, the main process implements state query on each decoding process through the monitoring thread, and when any decoding process exits abnormally, the main process directly destroys the shared memory corresponding to the decoding process.
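The monitoring behavior above can be sketched with Python's `multiprocessing`. This simplified sketch joins each child and inspects its exit code instead of running a dedicated monitoring thread; the process names, the `shm_registry` dict standing in for real shared memory, and the deliberately crashing worker are all illustrative assumptions.

```python
import multiprocessing as mp

def well_behaved():
    pass                              # decodes and exits cleanly (exit code 0)

def crashing():
    raise SystemExit(1)               # simulate an abnormal decoder exit

# per-decoding-process "shared memory", keyed by process name (S31 naming)
shm_registry = {"dec_a": bytearray(16), "dec_b": bytearray(16)}
procs = {"dec_a": mp.Process(target=well_behaved),
         "dec_b": mp.Process(target=crashing)}

for p in procs.values():
    p.start()

# monitoring loop (a dedicated thread in the real system): destroy the
# shared memory of any decoding process that exited abnormally
for name, p in procs.items():
    p.join()
    if p.exitcode != 0:
        shm_registry.pop(name, None)
```

In a production system the registry entries would be `multiprocessing.shared_memory.SharedMemory` objects whose `unlink()` performs the actual destruction.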
In order to achieve the above object, the present invention further provides a system for highly concurrent video processing with shared data by multiple processes, configured to perform the foregoing method, including:
the session management module comprises a session manager and a video processing module, wherein the session manager is used for managing video segment data input into the system and decoded YUV data;
the data distribution module is in data connection with the session management module and is used for distributing different video segments to queues corresponding to different decoding processes by executing a multi-process decoding task scheduling load balancing method;
the decoding module comprises a plurality of decoding processes, is in data connection with the data distribution module and is used for decoding the distributed corresponding video clips through different decoding processes;
the shared memory module comprises a plurality of shared memories, each shared memory is distributed by the main process according to the name of each decoding process, is in data connection with the decoding module and is used for managing the decoded YUV data in a queue mode;
and the image processing module comprises an image processor, is in data connection with the shared memory module and the session management module, and is used for correspondingly processing the YUV data acquired from the shared memory module according to service requirements and returning a processing result to the session management module.
In an embodiment of the present invention, each shared memory includes two queues, which are a busy queue and an empty queue, respectively, each queue includes a plurality of shared memory blocks, each shared memory block is used as a buffer for storing decoded YUV data, the empty queue is used to manage a buffer for idle YUV data, and the busy queue is used to manage a buffer for buffering decoded YUV data.
Compared with the prior art, the invention has at least the following advantages:
(1) the fragment-based load-balancing method for multi-process decoding task scheduling provides an effective balanced scheduling method for multi-channel video decoding under high-concurrency conditions, achieving load balancing across all decoding processes;
(2) the multi-queue shared-memory management mode reduces the coupling between different processes of the system, realizes zero copy of decoded data, and reduces the data-communication overhead of multi-process cooperation;
(3) through cooperative decoding by multiple processes, the abnormal exit of one decoding process, or a fault in the hardware acceleration device it uses, does not affect the normal operation of the main process, thereby enhancing the fault tolerance of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a block diagram of a video processing system architecture according to the present invention;
FIG. 3 is a flowchart of a multi-process decode task scheduling process of the present invention;
description of reference numerals: 10-a session management module; 20-a data distribution module; 30-a decoding module; 40-shared memory module; 50-image processing module.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Fig. 1 is a flowchart of a method of the present invention, and as shown in fig. 1, the present invention provides a method for processing a high-concurrency video with shared data by multiple processes, which includes the following steps:
s1: inputting video segments of various sizes into a session manager, which manages the different video programs;
s2: using the fragment-based load-balancing method for multi-process decoding task scheduling, distributing the input video segment, in units of the minimum-weight video fragment, to queues corresponding to different decoding processes, and then sending the corresponding video fragment data in the queues to the corresponding decoding processes via sockets, wherein a socket is a programming abstraction that encapsulates transport-layer (TCP) communication, and a socket connection is used here to maintain a long-lived connection between the server and the client;
for decoding and scheduling of multi-channel videos among multiple processes, the embodiment of the invention adopts a method for dividing according to video segments, each video segment is divided into video segments with different sizes according to the preset minimum weight unit of the video segment, different queues are adopted for managing different video segments, and then the different queues are bound to different processes, so that data distribution of different processes is realized. Meanwhile, in order to ensure the load balance of the system, each video segment has a corresponding weight, and the queue of each video segment has a corresponding total weight, so that the total weight of the tasks distributed by each video decoding process can be quickly inquired and compared during the load balance, and the data distribution is performed on the new decoding tasks according to the total weight, thereby realizing the load balance of the task data distribution.
The specific steps of data distribution in S2 are as follows:
s21, acquiring any video segment and obtaining the current video segment information, wherein the video segment information comprises: the occupied memory size, the video session information, the video width, the video height, and the like;
s22, calculating the frame count (frame_num) and the number of key frames (I frames) of the current video segment through multimedia processing software (ffmpeg or other video decoding software), and calculating the weight W of the current video segment from the frame count (frame_num), the video width (width) and the video height (height):
W=frame_num×width×height;
s23, splitting the video segment into a plurality of video segments with the minimum weight unit according to the weight W of the current video segment;
the specific process of splitting the video segment into a plurality of video segments with the minimum weight unit is as follows:
s2301: checking the number of I frames (key frames) of the current video segment: if the number of I frames is 1, the current video segment is already a minimum-weight-unit video fragment; if the number of I frames is greater than 1, comparing the weight W of the current video segment calculated in S22 with the preset minimum weight unit;
s2302: if the weight W of the current video segment is larger than the minimum weight unit, calculating the weight of a GOP (Group of Pictures) from its size, wherein a GOP comprises all video frames from the current I frame (inclusive) up to the next I frame, and the size of a GOP is the number of video frames it contains. The weight of the current GOP is calculated from its size as:
W_GOP = GOP size × width × height
wherein width is the video width and height is the video height;
s2303: calculating the current GOP weight W of the 1 st GOP of the current video segment GOP1 Previously, the total weight sumW of the current video segment 0 Is 0;
s2304: calculating the current GOP weight W of the 2 nd GOP of the current video segment GOP2 Previously, the total weight sumW of the current video segment 1 =W GOP1 +sumW 0 The same applies to the calculation of the current GOP weight W of the xth GOP GOPx Previously, the total weight sumW of the current video segment x-1 =W GOPx-1 +sumW x-2 ;
S2305: when W_GOPx + sumW_(x-1) is greater than the minimum weight unit, the video frames from the 1st GOP to the xth GOP are split off as a single video segment, sumW is reset to 0, and S2303 and S2304 are repeated with the next ((x+1)th) GOP treated as the 1st GOP; otherwise, proceed to the next step;
S2306: if the nth GOP has been calculated and n is less than x, i.e. no subsequent GOP remains for calculation, the video frames from the 1st GOP to the nth GOP are split off directly as a single video segment;
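The splitting loop in S2301–S2306 can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the names (`Gop`, `split_by_weight`, `min_weight`) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Gop:
    size: int  # number of frames from this I frame up to (not including) the next

def split_by_weight(gops, width, height, min_weight):
    """Split a GOP sequence into segments whose accumulated weight
    (frames x width x height) just exceeds min_weight (S2303-S2306)."""
    segments, current, sum_w = [], [], 0
    for gop in gops:
        w_gop = gop.size * width * height  # current GOP weight (S2302)
        if sum_w + w_gop > min_weight:
            # S2305: GOPs 1..x (including the xth) become one segment; reset sumW
            current.append(gop)
            segments.append(current)
            current, sum_w = [], 0
        else:
            current.append(gop)
            sum_w += w_gop
    if current:  # S2306: trailing GOPs form the last segment directly
        segments.append(current)
    return segments
```

With a minimum weight of 50 and 2×2 frames, GOP sizes [10, 10, 10, 5] split into two segments of two GOPs each.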
S24: distributing the split video segments. The specific process is as follows: the total weights of the queues corresponding to all decoding processes are compared to find the queue with the minimum total weight; the 1st minimum-weight-unit video segment split in step S23 is added to this queue, completing the distribution of that segment to the decoding process corresponding to the minimum-total-weight queue;
S25: repeating S24 until all video segments of the minimum weight unit have been distributed to the decoding processes corresponding to the minimum-total-weight queues.
Through the above steps, the video segments are split and distributed so that the total weight of the decoding tasks loaded by each decoding process is essentially equal, balancing the load across all decoding processes.
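The greedy distribution in S24/S25 — always appending the next segment to the queue with the smallest accumulated weight — can be sketched as below; the function and parameter names are illustrative, not from the patent.

```python
def dispatch(segment_weights, num_processes):
    """Greedy load balancing (S24/S25): each minimum-weight-unit segment
    goes to the decoding-process queue with the smallest total weight."""
    totals = [0] * num_processes           # accumulated weight per queue
    queues = [[] for _ in range(num_processes)]
    for i, w in enumerate(segment_weights):
        q = totals.index(min(totals))      # queue with minimum total weight
        queues[q].append(i)                # assign segment i to that queue
        totals[q] += w
    return totals, queues
```

For example, weights [5, 4, 3, 2, 1] over two processes end up with totals 8 and 7, i.e. the loads are essentially balanced, as the text above claims.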
S3: the decoding process decodes the received video clips and places the decoded data into corresponding shared memory queues;
wherein, the specific process of S3 is as follows:
S31: the main process allocates a shared memory region for each decoding process according to its name and divides it into two queues, a busy queue and an empty queue; each queue contains a plurality of shared memory blocks used to store decoded YUV data, wherein the empty queue manages the free YUV buffers and the busy queue manages the buffers holding decoded YUV data;
s32: when the decoding process decodes new YUV data, the decoding process acquires an idle YUV buffer area from an empty queue (empty queue);
s33: and the decoding process fills the YUV data information into a YUV buffer area according to the video segment information corresponding to the current YUV data, wherein the YUV data information comprises: program information, image width, image height, image size, image format, and the like;
s34: and the decoding process puts the YUV buffer area filled with the data into a busy queue (busy queue) to finish the storage of the YUV data into the shared memory.
In this embodiment, the decoding process implements management of the shared memory in a queue manner.
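A minimal single-process sketch of the two-queue recycling scheme in S31–S34: `queue.Queue` objects stand in for the real shared memory blocks (a production version would use OS shared memory between processes), and all names here are assumptions, not the patent's API.

```python
import queue

class BufferPool:
    def __init__(self, num_buffers, buf_size):
        self.empty = queue.Queue()  # manages free YUV buffers (S31)
        self.busy = queue.Queue()   # manages filled buffers awaiting the main process
        for _ in range(num_buffers):
            self.empty.put(bytearray(buf_size))

    def produce(self, data):
        """Decoder side (S32-S34): take a free buffer, fill it, mark it busy."""
        buf = self.empty.get()
        buf[:len(data)] = data
        self.busy.put(buf)

    def consume(self):
        """Main-process side (S41/S43): take a filled buffer, return it after use."""
        buf = self.busy.get()
        data = bytes(buf)        # processing happens here; the buffer itself is not copied between queues
        self.empty.put(buf)      # recycle the buffer for the decoder to reuse
        return data
```

After one produce/consume round trip, both buffers are back in the empty queue, matching the reuse described in S4.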
S4: the main process acquires YUV data from different shared memory queues, correspondingly processes each frame of acquired YUV data, and returns a corresponding buffer area to an empty queue (empty queue) after the processing is finished, so that the shared memory buffer area is reused;
wherein, the specific process of S4 is as follows:
s41: the main process sets a plurality of image processing threads for each decoding process to continuously inquire the state of a busy queue (busy queue) corresponding to the corresponding decoding process, and when the busy queue (busy queue) is inquired to have data, a corresponding YUV data node is directly taken out from the head of the busy queue (busy queue);
s42: the main process directly performs corresponding analysis processing on the YUV data in the YUV data node taken out in the S41 according to the specific service requirement of the obtained YUV data node;
s43: after the YUV data analysis processing in S42 is completed by the host process, the YUV buffer is placed again in an empty queue (empty queue) corresponding to the YUV data decoding process, and the YUV buffer is reused by the decoding process.
In an embodiment of the present invention, the main process queries the state of the decoding process through the monitoring thread, and when a certain decoding process exits abnormally, the main process directly destroys the shared memory corresponding to the decoding process, so as to enhance the fault tolerance of the system.
In this embodiment, the main process and the decoding processes continuously exchange nodes through the empty and busy queues, so that YUV data is passed between two different processes. For a given decoding process, the YUV data always remains in the same buffer, and no copy of the YUV data is made between the main process and the decoding process, reducing the data-communication overhead of multi-process cooperative work. Meanwhile, by providing two queues for each decoding process, both buffering of the YUV data and identification of its state are achieved.
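The zero-copy property described above can be illustrated in a few lines: the same buffer object circulates between the empty and busy queues, so the YUV payload is never copied between producer and consumer. This is a hypothetical single-process illustration, not the patent's implementation.

```python
import queue

empty_q, busy_q = queue.Queue(), queue.Queue()
buf = bytearray(8)
empty_q.put(buf)

# decoder side: fetch a free buffer, fill it in place, mark it busy
b = empty_q.get()
b[:3] = b"yuv"
busy_q.put(b)

# main-process side: take the filled buffer, then return it for reuse
out = busy_q.get()
same_object = out is buf   # True: only the reference moved, not the data
empty_q.put(out)
```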
S5: the processing result is returned to the Session manager.
Fig. 2 is a schematic diagram of a video processing system according to the present invention. As shown in fig. 2, the present invention further provides a high-concurrency video processing system with multi-process shared data, which includes:
a Session management module (10) including a Session manager (Session manager) for managing video segment data input into the system and decoded YUV data;
the data distribution module (20) (Dispatch) is in data connection with the session management module and is used for distributing different video segments to queues corresponding to different decoding processes by executing a multi-process decoding task scheduling load balancing method;
a decoding module (30) which comprises a plurality of decoding processes, is in data connection with the data distribution module (Dispatch), and is used for decoding the distributed corresponding video clips through different decoding processes;
the shared memory module (40) comprises a plurality of shared memories, each shared memory is distributed by the main process according to the name of each decoding process, is in data connection with the decoding module, and is used for managing the decoded YUV data in a queue mode;
each shared memory comprises two queues, namely a busy queue (busy queue) and an empty queue (empty queue), wherein each queue comprises a plurality of shared memory blocks and is used as a buffer area for storing decoded YUV data, the empty queue (empty queue) is used for managing the buffer area of the idle YUV data, and the busy queue (busy queue) is used for managing the buffer area for caching the YUV data after decoding;
and the Image processing module (50) comprises an Image processor (Image processor), is in data connection with the shared memory module and the session management module, and is used for correspondingly processing the YUV data acquired from different shared memory queues of the shared memory module according to business requirements and returning a processing result to the session management module.
According to the invention, by distributing and scheduling the tasks of multiple video streams across different processes, load balancing of the multi-process decoding tasks is achieved, the impact on the main process of a decoding process exiting abnormally or a hardware acceleration device failing is reduced, and the fault tolerance of the system is enhanced. Meanwhile, by sharing data between the multiple decoding processes and the process that handles the decoded YUV data, the coupling between different processes of the system is reduced and zero-copy transfer of YUV data between processes is achieved, thereby reducing the data-communication overhead of multi-process cooperative work.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (5)
1. A high-concurrency video processing method for multi-process shared data is characterized by comprising the following steps:
s1: inputting video segments with different sizes into a session manager for managing different video programs;
s2: the method comprises the steps of using a fragment-based multi-process decoding task scheduling load balancing method, carrying out data distribution on an input video segment to queues corresponding to different decoding processes according to a video fragment with a minimum weight unit, and then sending corresponding video segment data in the queues to the corresponding decoding processes in a socket mode, wherein the specific steps of carrying out data distribution are as follows:
S21: acquiring any video segment and acquiring current video segment information, wherein the video segment information comprises: the occupied memory size, the video session information, the video width and the video height;
S22: calculating the frame number and the number of key I frames of the current video segment through multimedia processing software, and calculating the weight W of the current video segment from the frame number, the video width and the video height:
W=frame_num×width×height
wherein, frame _ num is the frame number of the current video segment, width is the video width, and height is the video height;
s23: according to the weight W of the current video segment, splitting the video segment into a plurality of video segments with the minimum weight unit, which comprises the following specific processes:
s2301: checking the number of key frames I of the current video segment, wherein if the number of the I frames is 1, the current video segment is the video segment with the minimum weight unit; if the number of I frames is greater than 1, comparing the weight W of the current video segment calculated in S22 with a preset minimum weight unit;
S2302: if the weight W of the current video segment is greater than the minimum weight unit, the weight of a GOP (Group of Pictures) is calculated according to the size of the GOP in which any I frame of the video segment is located, wherein a GOP comprises all video frames from the current I frame (inclusive) up to the next I frame, the size of a GOP is the number of video frames it contains, and the weight W_GOP of the current GOP is calculated from its size as:
W_GOP = GOP_size × width × height
wherein width is the video width and height is the video height;
S2303: before calculating the weight W_GOP1 of the 1st GOP of the current video segment, the total weight sumW_0 of the current video segment is 0;
S2304: before calculating the weight W_GOP2 of the 2nd GOP of the current video segment, the total weight of the current video segment is sumW_1 = W_GOP1 + sumW_0; likewise, before calculating the weight W_GOPx of the xth GOP, the total weight of the current video segment is sumW_(x-1) = W_GOP(x-1) + sumW_(x-2);
S2305: when W_GOPx + sumW_(x-1) is greater than the minimum weight unit, the video frames from the 1st GOP to the xth GOP are split off as a single video segment, sumW is reset to 0, and S2303 and S2304 are repeated with the next GOP treated as the 1st GOP; otherwise, proceed to the next step;
S2306: if the nth GOP has been calculated and n is less than x, i.e. no subsequent GOP remains for calculation, the video frames from the 1st GOP to the nth GOP are split off directly as a single video segment;
s24: distributing the split video clips;
s3: the decoding process decodes the received video segments and places the decoded data into the corresponding shared memory queue, and the specific process is as follows:
s31: the main process allocates a shared memory according to the name of each decoding process, and divides the shared memory into two queues, namely a busy queue and an empty queue, wherein each queue comprises a plurality of shared memory blocks and is used for storing the decoded YUV data, the empty queue is a queue for managing an idle YUV buffer area, and the busy queue is a queue for managing a buffer area for caching the YUV data after decoding;
s32: when the decoding process decodes new YUV data, the decoding process acquires an idle YUV buffer area from the empty queue;
s33: and the decoding process fills the YUV data information into a YUV buffer area according to the video segment information corresponding to the current YUV data, wherein the YUV data information comprises: program information, image width, image height, image size, image format;
s34: the decoding process puts the YUV buffer area filled with the data into a busy queue to finish the storage of the YUV data in the shared memory;
s4: the main process acquires YUV data from different shared memory queues, correspondingly processes each acquired frame of YUV data, and returns the YUV data to a corresponding buffer area after the YUV data is processed, and the specific process comprises the following steps:
S41: the main process sets a plurality of image processing threads for each decoding process to continuously query the state of the busy queue corresponding to that decoding process; when data is found in the busy queue, the corresponding YUV data node is taken directly from the head of the busy queue;
s42: the main process directly performs corresponding analysis processing on the YUV data in the YUV data node taken out in the S41 according to the specific service requirement of the obtained YUV data node;
s43: after the analysis processing of the YUV data in the S42 is completed by the main process, the YUV buffer area is placed in the empty queue corresponding to the decoding process of the YUV data again, and the YUV buffer area is reused by the decoding process;
s5: and returning the processing result to the session manager.
2. The video processing method according to claim 1, wherein the specific process of S24 is: comparing the total weight of the queues corresponding to all the processes to obtain a minimum total weight queue, adding the video segment of the 1 st minimum weight unit split in the S23 into the minimum total weight queue, and completing the distribution of the video segment of the 1 st minimum weight unit to the decoding process corresponding to the minimum total weight queue;
then, execution of S25: and repeating the step S24 until all the video clips with the minimum weight unit are distributed to the decoding process corresponding to the minimum total weight queue.
3. The video processing method according to claim 1, wherein the main process queries the state of each decoding process through the monitoring thread, and when any decoding process exits abnormally, the main process directly destroys the shared memory corresponding to the decoding process.
4. A multi-process data sharing high-concurrency video processing system for performing the method of any one of claims 1 to 3, comprising:
the session management module comprises a session manager and a video processing module, wherein the session manager is used for managing video segment data input into the system and decoded YUV data;
the data distribution module is in data connection with the session management module and is used for distributing different video segments to queues corresponding to different decoding processes by executing a multi-process decoding task scheduling load balancing method;
the decoding module comprises a plurality of decoding processes, is in data connection with the data distribution module and is used for decoding the distributed corresponding video clips through different decoding processes;
the shared memory module comprises a plurality of shared memories, each shared memory is distributed by the main process according to the name of each decoding process, is in data connection with the decoding module and is used for managing the decoded YUV data in a queue mode;
and the image processing module comprises an image processor, is in data connection with the shared memory module and the session management module, and is used for correspondingly processing the YUV data acquired from the shared memory module according to service requirements and returning a processing result to the session management module.
5. The video processing system of claim 4, wherein each shared memory comprises two queues, namely a busy queue and an empty queue, each queue comprises a plurality of shared memory blocks, each shared memory block is used as a buffer for storing the decoded YUV data, wherein the empty queue is used for managing a free YUV data buffer, and the busy queue is used for managing a decoded YUV data buffer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011554999.6A CN112637634B (en) | 2020-12-24 | 2020-12-24 | High-concurrency video processing method and system for multi-process shared data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112637634A CN112637634A (en) | 2021-04-09 |
CN112637634B true CN112637634B (en) | 2022-08-05 |
Family
ID=75324681
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113453010B (en) * | 2021-08-31 | 2021-12-10 | 知见科技(江苏)有限公司 | Processing method based on high-performance concurrent video real-time processing framework |
CN115499665A (en) * | 2022-09-14 | 2022-12-20 | 北京睿芯高通量科技有限公司 | High-concurrency coding and decoding system for multi-channel videos |
CN116055664B (en) * | 2023-03-28 | 2023-06-02 | 北京睿芯通量科技发展有限公司 | Method, device and storage medium for sharing memory for video processing process |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7227589B1 (en) * | 1999-12-22 | 2007-06-05 | Intel Corporation | Method and apparatus for video decoding on a multiprocessor system |
JP4476261B2 (en) * | 2006-09-13 | 2010-06-09 | 株式会社ソニー・コンピュータエンタテインメント | Decoding device and decoding method |
US20110274178A1 (en) * | 2010-05-06 | 2011-11-10 | Canon Kabushiki Kaisha | Method and device for parallel decoding of video data units |
CN101916219A (en) * | 2010-07-05 | 2010-12-15 | 南京大学 | Streaming media display platform of on-chip multi-core network processor |
US8948269B1 (en) * | 2011-03-23 | 2015-02-03 | Marvell International Ltd. | Processor implemented systems and methods for optimized video decoding using adaptive thread priority adjustment |
US20130077690A1 (en) * | 2011-09-23 | 2013-03-28 | Qualcomm Incorporated | Firmware-Based Multi-Threaded Video Decoding |
CN103442204A (en) * | 2013-08-08 | 2013-12-11 | 浙江工业大学 | Network video transmission system and method based on DM365 |
CN103974333B (en) * | 2014-05-16 | 2017-07-28 | 西安电子科技大学 | For SVC video traffics and the load-balancing method of translational speed |
CN104394353B (en) * | 2014-10-14 | 2018-03-09 | 浙江宇视科技有限公司 | Video concentration method and device |
CN105992005A (en) * | 2015-03-04 | 2016-10-05 | 广州市动景计算机科技有限公司 | Video decoding method and device and terminal device |
CN104850456A (en) * | 2015-05-27 | 2015-08-19 | 苏州科达科技股份有限公司 | Multi-process decoding method and multi-process decoding system |
CN106488257A (en) * | 2015-08-27 | 2017-03-08 | 阿里巴巴集团控股有限公司 | A kind of generation method of video file index information and equipment |
CN107566843B (en) * | 2017-10-09 | 2019-07-09 | 武汉斗鱼网络科技有限公司 | A kind of video decoding process guard method and device |
CN108848384A (en) * | 2018-06-19 | 2018-11-20 | 复旦大学 | A kind of efficient parallel code-transferring method towards multi-core platform |
CN109413432B (en) * | 2018-07-03 | 2023-01-13 | 北京中科睿芯智能计算产业研究院有限公司 | Multi-process coding method, system and device based on event and shared memory mechanism |
CN110381322B (en) * | 2019-07-15 | 2023-03-14 | 腾讯科技(深圳)有限公司 | Video stream decoding method and device, terminal equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | ||
Address after: Room 711c, 7 / F, block a, building 1, yard 19, Ronghua Middle Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing 102600 Patentee after: Beijing Zhongke Flux Technology Co.,Ltd. Address before: Room 711c, 7 / F, block a, building 1, yard 19, Ronghua Middle Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing 102600 Patentee before: Beijing Ruixin high throughput technology Co.,Ltd. |