CN115695883A - Video data processing method, device, equipment and storage medium - Google Patents

Video data processing method, device, equipment and storage medium

Info

Publication number
CN115695883A
CN115695883A
Authority
CN
China
Prior art keywords
video stream
frame
target video
path
video
Prior art date
Legal status
Pending
Application number
CN202211182780.7A
Other languages
Chinese (zh)
Inventor
孙鹏飞
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211182780.7A priority Critical patent/CN115695883A/en
Publication of CN115695883A publication Critical patent/CN115695883A/en
Pending legal-status Critical Current

Abstract

An embodiment of the invention provides a video data processing method, apparatus, device, and storage medium, relating to the technical field of multimedia. The scheme is as follows: acquiring multiple target video streams captured for a target scene; for each target video stream, obtaining the synchronization time information corresponding to that stream from a specified frame of the stream; aligning the multiple target video streams by using the synchronization time information corresponding to each stream, where the alignment aligns video frames that correspond to the same synchronization time information in different target video streams; and merging the aligned target video streams to obtain merged-stream data, where the merging composes the aligned video frames of each target video stream into video frames of a multi-picture view. The scheme can thereby achieve accurate synchronization of multiple pictures.

Description

Video data processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing video data.
Background
With the development of 5G technology, online cloud performances and online live-stream viewing have become commonplace. To break through the viewing-angle limitation of traditional single-view live broadcasting, multi-picture live broadcasting has emerged.
In the related art, multi-view live broadcasting often simply composes the video pictures corresponding to multiple viewing angles into one interface for multi-picture display.
The inventor found in research that, with the related art, the video frames corresponding to the respective viewing angles are not synchronized when displayed; that is, video frames captured at the same moment may be displayed at different times in the client because of differences in transmission speed among the video streams, among other factors. The related art therefore yields poor correlation among the video pictures displayed at the same moment in the client, seriously affecting the user's viewing experience.
A video data processing method is therefore needed to achieve accurate synchronization of multiple pictures.
Disclosure of Invention
Embodiments of the present invention provide a video data processing method, apparatus, device, and storage medium to achieve accurate synchronization of multiple pictures. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a video data processing method, which is applied to a server, and the method includes:
acquiring multiple target video streams captured for a target scene, where each target video stream corresponds to the video picture of one viewing angle;
for each target video stream, obtaining the synchronization time information corresponding to that stream from a specified frame of the stream, where the specified frame is a frame carrying temporal supplemental information, and the synchronization time information of every target video stream refers to the same time standard;
aligning the multiple target video streams by using the synchronization time information corresponding to each stream, where the alignment aligns video frames that correspond to the same synchronization time information in different target video streams;
merging the aligned target video streams to obtain merged-stream data, where the merging composes the aligned video frames of each target video stream into video frames of a multi-picture view.
Optionally, the aligning the multiple target video streams by using the synchronization time information corresponding to each target video stream includes:
determining a reference video stream from the multiple target video streams;
for each target video stream other than the reference video stream, determining a time offset of that stream relative to the reference video stream based on the synchronization time information corresponding to that stream and the synchronization time information corresponding to the reference video stream, where the time offset characterizes the difference in capture time between video frames of the same frame order;
performing alignment processing on the multiple target video streams by using the determined time offsets.
Optionally, the determining a reference video stream from the multiple target video streams includes:
determining, as the reference video stream, the target video stream whose first video frame has the earliest synchronization time information among the multiple target video streams.
Optionally, the determining, based on the synchronization time information corresponding to that target video stream and the synchronization time information corresponding to the reference video stream, a time offset of that target video stream relative to the reference video stream includes:
calculating the difference between the synchronization time information corresponding to the first video frame of that target video stream and the synchronization time information corresponding to the first video frame of the reference video stream to obtain the time offset of that target video stream relative to the reference video stream.
Optionally, the performing, by using the determined time offset, alignment processing on the multiple target video streams includes:
for each target video stream other than the reference video stream, aligning the first video frame of that stream with a reference frame in the reference video stream, so that every video frame of that stream is aligned with a video frame of the reference video stream;
where the reference frame is the video frame whose synchronization time information equals the synchronization time information of the first video frame of the reference video stream offset by a specified time offset, the specified time offset being the time offset of that target video stream relative to the reference video stream.
Optionally, the method further comprises:
encoding the merged-stream data at a plurality of preset bit rates to obtain a plurality of encoded data streams;
aligning the video frames of the plurality of encoded data streams frame by frame, and marking the aligned video frames with the same timestamp information, where frame-by-frame alignment aligns the video frames at the same ordering position in each of the encoded data streams;
and slicing the plurality of encoded data streams according to the timestamp information corresponding to each video frame and outputting the slices to a specified receiving end.
Optionally, the slicing the plurality of encoded data streams and outputting the slices to a specified receiving end includes:
for each encoded data stream, after slicing that stream to obtain at least two slices, outputting the slices to the specified receiving end starting from the slice with the smallest timestamp information.
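As a minimal, hypothetical sketch of this slice-output order (the data layout is illustrative, not defined by the patent), emitting slices in ascending start-timestamp order can be written as:

```python
def output_slices(slices):
    """Emit slice payloads in ascending order of their start timestamp.

    Illustrative sketch only: each slice is modeled as a
    (start_timestamp, payload) pair; the patent does not define
    this data layout.
    """
    return [payload for start, payload in sorted(slices, key=lambda s: s[0])]
```

In a real delivery pipeline the payloads would be media segments; the point of the sketch is only that output begins from the slice with the smallest timestamp.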
In a second aspect, an embodiment of the present invention provides a video data processing apparatus, which is applied to a server, where the apparatus includes:
the first acquisition module is used for acquiring a plurality of paths of target video streams acquired aiming at a target scene; each path of target video stream corresponds to a video picture of a visual angle;
the second acquisition module is used for acquiring the synchronous time information corresponding to each path of target video stream from the specified frame of the path of target video stream; wherein the designated frame is a frame representing temporal supplementary information; the synchronous time information corresponding to each path of target video stream is the time information corresponding to the same time standard;
the alignment module is used for aligning the multiple paths of target video streams by utilizing the synchronous time information corresponding to each path of target video stream; the alignment processing is used for aligning video frames corresponding to the same synchronization time information in different target video streams;
the merging module, configured to merge the aligned target video streams to obtain merged-stream data, where the merging composes the aligned video frames of each target video stream into video frames of a multi-picture view.
Optionally, the alignment module comprises:
a first determining submodule, configured to determine a reference video stream from the multiple target video streams;
a second determining submodule configured to determine, for each target video stream other than the reference video stream, a time offset of the target video stream with respect to the reference video stream based on the synchronization time information corresponding to the target video stream and the synchronization time information corresponding to the reference video stream; wherein the time offset is used for representing the acquisition time difference of video frames aiming at the same frame sequence;
and the alignment submodule is used for performing alignment processing on the multi-path target video stream by using the determined time offset.
Optionally, the first determining submodule is specifically configured to:
and determining the target video stream with the earliest synchronization time information corresponding to the first frame of video frame in the multi-path target video streams as a reference video stream.
Optionally, the second determining submodule is specifically configured to:
calculating the difference between the synchronization time information corresponding to the first video frame of that target video stream and the synchronization time information corresponding to the first video frame of the reference video stream to obtain the time offset of that target video stream relative to the reference video stream.
Optionally, the alignment sub-module is specifically configured to:
for each target video stream other than the reference video stream, aligning the first video frame of that stream with a reference frame in the reference video stream, so that every video frame of that stream is aligned with a video frame of the reference video stream;
where the reference frame is the video frame whose synchronization time information equals the synchronization time information of the first video frame of the reference video stream offset by a specified time offset, the specified time offset being the time offset of that target video stream relative to the reference video stream.
Optionally, the apparatus further comprises:
the encoding module, configured to encode the merged-stream data at a plurality of preset bit rates to obtain a plurality of encoded data streams;
the marking module, configured to align the video frames of the plurality of encoded data streams frame by frame and to mark the aligned video frames with the same timestamp information, where frame-by-frame alignment aligns the video frames at the same ordering position in each of the encoded data streams;
and the slicing module, configured to slice the plurality of encoded data streams according to the timestamp information corresponding to each video frame and to output the slices to a specified receiving end.
Optionally, the slicing module is specifically configured to:
for each encoded data stream, after slicing that stream to obtain at least two slices, outputting the slices to the specified receiving end starting from the slice with the smallest timestamp information.
In a third aspect, an embodiment of the present invention provides an electronic device including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the video data processing method when executing the program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the video data processing method described above.
The embodiment of the invention has the following beneficial effects:
according to the scheme provided by the embodiment of the invention, multiple paths of target video streams are obtained, and the synchronization time information corresponding to each path of target video stream is obtained from the obtained specified frame of the path of target video stream aiming at each path of target video stream. Therefore, after the multiple target video streams after the alignment process are subjected to the merging process, each video frame in the obtained merged data is composed of the video frames corresponding to the same time in each target video stream. And then, when the confluence data is subsequently output to a multi-picture display interface for playing, the played multi-picture can be accurately synchronized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a video data processing method according to an embodiment of the present invention;
fig. 2 is another flow chart of a video data processing method according to an embodiment of the present invention;
fig. 3 is a system block diagram of a specific example of a video data processing method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a video data processing apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In the following, terms of art to which embodiments of the present invention relate will be described first.
Timing SEI (Supplemental Enhancement Information) frame: data inserted into a video stream to convey additional temporal information;
IDR (Instantaneous Decoding Refresh) frame: upon decoding an IDR frame, the decoder completely clears its reference buffer; subsequent frames cannot be decoded by referring to any frame preceding the IDR frame, and a new sequence starts encoding from the IDR frame;
Slicing: dividing a video into a plurality of segments, each segment being a slice.
Traditional multi-picture live broadcasting usually just composes the video streams corresponding to multiple viewing angles and outputs the result to the client for playback. When a user switches viewing angles, or the contents of several viewing angles overlap, the capture times of the video capture devices do not correspond, so the correlation among the pictures is poor, seriously affecting the user experience.
Based on the foregoing, in order to implement accurate synchronization of multiple pictures, embodiments of the present invention provide a video data processing method, apparatus, device, and storage medium.
First, a video data processing method according to an embodiment of the present invention will be described.
The video data processing method provided by the embodiment of the invention is applied to a server, and the server can communicate with the client and with the video capture devices. In practice, the server can acquire multiple video streams from the video capture devices corresponding to the multiple viewing angles, and output the merged-stream data obtained by processing those streams to the client for playback. Illustratively, a video capture device may be a video camera, a video recorder, or the like.
Specifically, the execution subject of the video data processing method may be a video data processing apparatus. Illustratively, when the method is applied to a server, the apparatus may be a computer program running in the server, and the computer program may be used to achieve accurate synchronization of multiple pictures.
The video data processing method provided by the embodiment of the disclosure may include the following steps:
acquiring multiple target video streams captured for a target scene, where each target video stream corresponds to the video picture of one viewing angle;
for each target video stream, obtaining the synchronization time information corresponding to that stream from a specified frame of the stream, where the specified frame is a frame carrying temporal supplemental information, and the synchronization time information of every target video stream refers to the same time standard;
aligning the multiple target video streams by using the synchronization time information corresponding to each stream, where the alignment aligns video frames that correspond to the same synchronization time information in different target video streams;
merging the aligned target video streams to obtain merged-stream data, where the merging composes the aligned video frames of each target video stream into video frames of a multi-picture view.
According to the scheme provided by the embodiment of the invention, multiple target video streams are acquired, and for each target video stream the synchronization time information corresponding to that stream is obtained from a specified frame of the stream; the streams are then aligned by using this synchronization time information. Thus, after the aligned target video streams are merged, each video frame in the resulting merged-stream data is composed of the video frames corresponding to the same moment in the respective target video streams. When the merged-stream data is subsequently output to a multi-picture display interface for playback, the multiple pictures played can be accurately synchronized.
The following describes a video data processing method according to an embodiment of the present invention with reference to the accompanying drawings.
As shown in fig. 1, the video data processing method provided by the embodiment of the present disclosure may include steps S101 to S104:
s101, acquiring a plurality of paths of target video streams collected aiming at a target scene; each path of target video stream corresponds to a video picture of a visual angle;
in this embodiment, a plurality of target video streams collected for a target scene are first obtained, and the obtained plurality of target video streams may be predetermined by a relevant worker. Illustratively, the target scene may be a scene to be subjected to multi-picture live broadcasting, such as a large evening party, a concert, and the like. For example, a worker may fill identification information of each video stream that needs to be displayed in a common-screen multi-picture mode in an operation interface of a live background in advance, and after receiving the identification information, a server side pulls each target video stream corresponding to the identification information of each video stream from a video acquisition device to complete acquisition of multiple target video streams.
It can be understood that each path of target video stream corresponds to a video picture at a viewing angle, that is, each path of target video stream is captured by a video capture device corresponding to a capturing viewing angle and output to one picture position of a plurality of picture positions set on a playing interface of the client for playing and displaying. In addition, when multiple target video streams are acquired, since the multiple target video streams are used for being displayed in the same playing interface, in this case, in order to ensure that multiple pictures in the playing interface have the same fluency when being displayed, the multiple target video streams may be acquired at the same frame rate. And the initial frame in each path of the acquired target video stream is an IDR frame, so that each path of the target video stream can be decoded and played smoothly.
S102, aiming at each path of target video stream, acquiring synchronous time information corresponding to the path of target video stream from a specified frame of the path of target video stream; wherein, the appointed frame is a frame representing the time supplementary information; the synchronous time information corresponding to each path of target video stream is the time information corresponding to the same time standard;
it can be understood that, since the designated frame is a frame that represents the temporal supplemental information of the target video stream, the synchronization time information corresponding to the target video stream can be obtained from the designated frame. Furthermore, since the synchronization time information is time information corresponding to the same time standard, when the alignment processing is subsequently performed on the multiple target video streams, the synchronization time information can be used to align the video frames in the target video streams corresponding to the same time.
For example, the synchronization time information may be NTP (Network Time Protocol) data, PTP (Precision Time Protocol) data, or the like contained in the specified frame. For instance, if the specified frame contains NTP data, parsing the specified frame yields the NTP data it contains; that is, the synchronization time information corresponding to the target video stream is obtained from the specified frame. Illustratively, the specified frame may be a timing SEI frame containing the NTP data corresponding to the target video stream. Timing SEI frames may be pushed by the video capture device when it generates the video stream; each video frame in the stream may correspond to one timing SEI frame, and the timing SEI frame following each video frame represents the capture time of that frame.
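As an illustrative sketch of extracting such synchronization time information, assuming (the patent does not specify the payload layout) that the timing SEI payload carries a 64-bit NTP timestamp in RFC 5905 format:

```python
import struct

def parse_timing_sei(payload: bytes) -> float:
    """Decode a sync time from a timing SEI payload.

    Assumption (not specified by the patent): the payload begins with a
    64-bit NTP timestamp in RFC 5905 format, i.e. a 32-bit big-endian
    seconds field followed by a 32-bit fraction-of-second field.
    """
    seconds, fraction = struct.unpack(">II", payload[:8])
    return seconds + fraction / 2**32
```

Any real deployment would have to match the layout the capture device actually pushes; the sketch only shows that the sync time is recovered by parsing the specified frame.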
S103, aligning the multiple paths of target video streams by using the synchronous time information corresponding to each path of target video stream; the alignment processing is used for aligning video frames corresponding to the same synchronization time information in different target video streams;
It can be understood that in multi-view live broadcasting, the multiple pictures correspond to multiple viewing angles of the target scene, and the video content of each viewing angle should be correlated. After the multiple target video streams are acquired, the streams can therefore be aligned, so that when the aligned streams are subsequently merged, each frame of the merged-stream data is formed from the video frames corresponding to the same moment in the respective target video streams; when the merged-stream data is played in the client, the multiple pictures in the playing interface can then be accurately synchronized.
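A minimal, hypothetical sketch of this align-then-merge idea (the class and function names, and the per-frame data layout, are illustrative, not from the patent):

```python
from dataclasses import dataclass

# Hypothetical model: each frame carries the sync time parsed from its
# timing SEI frame.
@dataclass
class Frame:
    sync_time: float   # seconds on the shared (e.g. NTP) clock
    view: str = ""     # which viewing angle produced it

def align_and_merge(streams):
    """Drop leading frames so all streams start at the same sync time,
    then group same-time frames into one multi-picture output frame."""
    start = max(s[0].sync_time for s in streams)       # latest first frame
    aligned = [[f for f in s if f.sync_time >= start] for s in streams]
    n = min(len(s) for s in aligned)                   # common length
    return [tuple(s[i] for s in aligned) for i in range(n)]
```

Each output tuple plays the role of one merged multi-picture video frame: every element comes from a different stream but the same capture moment.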
Optionally, in an implementation manner, performing alignment processing on the multiple target video streams by using synchronization time information corresponding to each target video stream may include steps A1 to A3:
a1, determining a reference video stream from the multiple paths of target video streams;
it is understood that a reference video stream is determined from the multiple target video streams, and then the multiple target video streams can be aligned by aligning each target video stream other than the reference video stream with the reference video stream. Wherein the reference video stream may be any one of the multiple target video streams.
For example, in a specific implementation, determining a reference video stream from the multiple target video streams may include:
determining, as the reference video stream, the target video stream whose first video frame has the earliest synchronization time information among the multiple target video streams.
Illustratively, if the multiple target video streams are an A video stream and a B video stream, the synchronization time information corresponding to the first video frame of the A stream is 18:00:00, and that of the B stream is 18:00:10, then the A stream is determined as the reference video stream. It can be understood that taking as the reference the target video stream whose first video frame has the earliest synchronization time information makes it convenient to subsequently determine the time offset of each target video stream relative to the reference video stream.
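As a sketch (illustrative names and data layout, not from the patent), picking the reference stream amounts to finding the stream whose first frame has the earliest sync time:

```python
def pick_reference(streams):
    """Index of the stream whose first frame has the earliest sync time.

    Illustrative sketch: each stream is a list of sync times (one per
    frame), in capture order.
    """
    return min(range(len(streams)), key=lambda i: streams[i][0])
```

With first-frame times of 18:00:10, 18:00:00, and 18:00:05 (expressed in seconds), the second stream would be chosen.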
A2, aiming at each path of target video stream except the reference video stream, determining the time offset of the path of target video stream relative to the reference video stream based on the synchronization time information corresponding to the path of target video stream and the synchronization time information corresponding to the reference video stream; the time offset is used for representing the acquisition time difference of video frames aiming at the same frame sequence;
In this implementation, the difference between the synchronization time information corresponding to a target video stream and the synchronization time information corresponding to the reference video stream may be determined as the time offset of that stream relative to the reference video stream. The two pieces of synchronization time information may be those corresponding to video frames of the same frame order. For example, the difference between the synchronization time information corresponding to the first video frame of the A video stream and that corresponding to the first video frame of the reference video stream may be determined as the time offset of the A video stream relative to the reference video stream.
For example, in a specific implementation, determining the time offset of that target video stream relative to the reference video stream based on the synchronization time information corresponding to that stream and the synchronization time information corresponding to the reference video stream may include:
calculating the difference between the synchronization time information corresponding to the first video frame of that target video stream and the synchronization time information corresponding to the first video frame of the reference video stream to obtain the time offset of that target video stream relative to the reference video stream.
It can be understood that, since the obtained multiple paths of target video streams have the same frame rate, the difference between the synchronization time information corresponding to the first frame of video frame in the path of target video stream and the synchronization time information corresponding to the first frame of video frame in the reference video stream is also the difference between the synchronization time information corresponding to any video frame in the path of target video stream and the reference video stream having the same frame order. Therefore, the alignment processing is subsequently performed on the multiple paths of target video streams by using the time offset obtained by calculating the difference, so that each video frame in the paths of target video streams can be aligned with each video frame in the reference video stream.
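This first-frame difference calculation can be sketched as follows (a hypothetical illustration; function and parameter names are not from the patent):

```python
def time_offsets(first_frame_times, ref_index):
    """Per-stream offset of the first frame relative to the reference
    stream's first frame, in the same units as the sync times.

    Sketch only: because all streams share the same frame rate, this
    first-frame difference equals the difference for any frame order.
    """
    ref_time = first_frame_times[ref_index]
    return [t - ref_time for t in first_frame_times]
```

The reference stream's own offset is zero by construction, which matches choosing the earliest first frame as the reference.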
And A3, performing alignment processing on the multiple paths of target video streams by using the determined time offset.
It can be understood that, since the synchronization time information is carried in the target video stream and corresponds to its capture time, the time offset of each non-reference target video stream relative to the reference video stream, determined from the difference of their synchronization time information, characterizes the difference in capture time among the acquired target video streams. To make the multiple pictures correlated during subsequent playback, the determined time offsets can be used to align the target video streams, that is, to align the video frames corresponding to the same synchronization time information across the streams, so that the discrepancy among the pictures can be reduced to within one frame.
It can be understood that there are various ways to align the multiple paths of target video streams using the determined time offsets. In an exemplary implementation, performing the alignment processing on the multiple paths of target video streams by using the determined time offsets may include:
for each path of target video stream other than the reference video stream, aligning the first video frame in that target video stream with a reference frame in the reference video stream, so that each video frame in that target video stream is aligned with the corresponding video frame in the reference video stream;
wherein the reference frame is the video frame whose synchronization time information equals the synchronization time information of the first video frame in the reference video stream shifted by a specified time offset; and the specified time offset is the time offset of that target video stream relative to the reference video stream.
It can be understood that, since the first frames of the respective video streams may correspond to different times, the multiple paths of target video streams can be aligned using the determined time offsets, so that when the multiple video streams are subsequently merged, each frame in the resulting confluence data is composed of video frames corresponding to the same time in the respective target video streams.
In this implementation, the first video frame in each target video stream other than the reference video stream is aligned with the reference frame in the reference video stream, so that each video frame in that target video stream is aligned with the corresponding video frame in the reference video stream. For example, suppose the video frame sequence of the A-path video stream is A1 to A10, and the synchronization time information corresponding to the A1 frame is 18:00:10; the video frame sequence of the B-path video stream is B1 to B10, and the synchronization time information corresponding to the B1 frame is 18:00:20; and the video frame sequence of the reference video stream is C1 to C10, with the synchronization time information corresponding to the C1 frame being 18:00:00. The time offset of the A-path video stream relative to the reference video stream is then 10 seconds, and the time offset of the B-path video stream relative to the reference video stream is 20 seconds.
For the A-path video stream, the video frame of the reference video stream whose synchronization time information equals that of the reference stream's first frame shifted by 10 seconds is taken as the reference frame, that is, the C2 frame is determined as the reference frame, and the first video frame of the A-path video stream is aligned with this reference frame, thereby aligning the video frames corresponding to the same synchronization time information. For the B-path video stream, the video frame of the reference video stream whose synchronization time information equals that of the reference stream's first frame shifted by 20 seconds is taken as the reference frame, that is, the C3 frame is determined as the reference frame, and the first video frame of the B-path video stream is aligned with this reference frame, thereby aligning the video frames corresponding to the same synchronization time information.
It can be understood that, for each target video stream other than the reference video stream, after aligning the first frame video frame in the target video stream with the reference frame in the reference video stream, since the frame rates of the respective video streams are the same, the other video frames in the target video stream are also aligned with the video frames in the reference video stream.
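The trimming described above, dropping leading frames so that index 0 of every stream falls on the same capture instant, can be sketched as follows (the data layout and the assumption that offsets are exact frame multiples are illustrative):

```python
def align_streams(streams, offsets, fps):
    """streams: name -> list of frames in capture order.
    offsets: name -> seconds relative to the reference stream (the
    reference has offset 0). A stream's first frame corresponds to
    reference frame index round(offset * fps); trimming each stream so
    its index 0 lands on the same instant aligns all later frames too,
    because the frame rates are equal."""
    start = max(round(off * fps) for off in offsets.values())
    trimmed = {name: frames[start - round(offsets[name] * fps):]
               for name, frames in streams.items()}
    n = min(len(frames) for frames in trimmed.values())  # common length
    return {name: frames[:n] for name, frames in trimmed.items()}
```

For instance, at 25 fps a stream that starts 0.04 s after the reference lines up with reference frame index 1, so one leading reference frame is dropped.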
S104, performing confluence processing on the aligned multiple paths of target video streams to obtain confluence data; wherein the confluence processing synthesizes the aligned video frames in the respective target video streams into video frames corresponding to multiple pictures.
In this embodiment, the aligned multiple paths of target video streams are merged, that is, the video frames in the aligned target video streams are superimposed according to a multi-picture layout. For example, a worker may set layout information for the multiple pictures in advance in an operation interface of the live-broadcast background, and after receiving the layout information, the server merges the multiple paths of target video streams according to the multi-picture layout style represented by that information.
It can be understood that, since the video frames corresponding to the same time in the multiple target video streams after the alignment process are aligned, each frame in the merged data obtained after the multiple target video streams after the alignment process are merged is composed of the video frames corresponding to the same time in the respective target video streams. Therefore, when the confluence data is subsequently output to a multi-picture display interface for playing, the multi-picture can be accurately synchronized.
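The merging step can be sketched as follows, assuming (purely for illustration) that frames are nested lists of pixel values and the layout is a simple side-by-side one; a real system would overlay decoded frames according to the configured layout:

```python
def merge_frame(frames):
    """Place equal-height frames (nested pixel lists) side by side into
    one multi-picture frame; a stand-in for the layout-driven overlay."""
    height = len(frames[0])
    return [sum((f[row] for f in frames), []) for row in range(height)]

def merge_streams(aligned_streams):
    """Zip the aligned streams frame by frame, so each merged frame is
    built only from input frames that share the same sync time."""
    return [merge_frame(group) for group in zip(*aligned_streams)]
```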
According to the scheme provided by the embodiment of the invention, multiple paths of target video streams are acquired; for each path of target video stream, the synchronization time information corresponding to that stream is obtained from its specified frame; and the multiple paths of target video streams are then aligned using this synchronization time information. Therefore, after the aligned target video streams are merged, each video frame in the resulting confluence data is composed of video frames corresponding to the same time in the respective target video streams, and when the confluence data is subsequently output to a multi-picture display interface for playing, the played multiple pictures can be accurately synchronized.
Optionally, in another embodiment of the present invention, on the basis of the embodiment shown in fig. 1, as shown in fig. 2, the method may further include:
S201, encoding the confluence data according to a plurality of preset code rates to obtain a plurality of encoded data;
It can be understood that, because code rate is proportional to definition at a given resolution, encoding the confluence data at multiple code rates yields encoded data corresponding to multiple code rates, and encoded data at different code rates can subsequently be output to a client for playing, so that the target video streams can be played at different definitions.
For example, a worker may set a plurality of code rates in an operation interface of a live broadcast background in advance according to a plurality of definitions required by multi-picture live broadcast, and submit the code rates to a server, so that the server may input the merged data into an encoder corresponding to the plurality of code rates for encoding according to the plurality of code rates to obtain a plurality of encoded data. The encoding format for encoding the merged data may be h.264, h.265, or other encoding formats, and the encoding format of the encoded data in the embodiment of the present invention is not limited.
S202, aligning each video frame in the plurality of encoded data frame by frame, and marking the aligned video frames with the same timestamp information; wherein the frame-by-frame alignment aligns video frames at the same ordering position in the respective pieces of encoded data;
It can be understood that, because encoded data at different code rates are output at inconsistent speeds, directly outputting the encoded data corresponding to the multiple code rates to a client would make the pictures before and after a definition switch inconsistent; that is, the continuity of the pictures across a code-rate switch would be poor, resulting in a poor user experience. To solve this problem, after the confluence data is encoded into multiple pieces of encoded data, the video frames in the multiple pieces of encoded data can be aligned frame by frame, that is, video frames with the same frame order are aligned and marked with the same timestamp information. In this way, when the multiple pieces of encoded data are subsequently sliced and output, they can be sliced according to the same timestamp information, so that slices of different encoded data that share the same start timestamp information also share the same slice duration.
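The frame-by-frame pts marking can be sketched like this (the frame-interval value and data layout are assumptions; 40 ms corresponds to 25 fps):

```python
def mark_common_pts(encoded_streams, frame_interval_ms=40):
    """encoded_streams: code rate -> list of encoded frames. Frames at
    the same ordering position get identical pts across all code rates,
    so later slicing cuts every code rate at the same boundaries."""
    return {rate: [(i * frame_interval_ms, frame) for i, frame in enumerate(frames)]
            for rate, frames in encoded_streams.items()}
```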
And S203, slicing the plurality of coded data according to the time stamp information corresponding to each video frame in the plurality of coded data, and outputting the plurality of coded data to a designated receiving end.
In this embodiment, each encoded data may be sliced according to the same timestamp information, that is, the encoded data is divided into a plurality of slices according to the same timestamp information, so that the slice durations of the slices corresponding to the same start timestamp information are the same. For example, if the video frames in the encoded data a are A1 frame to a10 frame, and the video frames in the encoded data B are B1 frame to B10 frame, the A1 frame and the B1 frame are labeled with the same timestamp information, the A2 frame and the B2 frame are labeled with the same timestamp information, and so on. It can be understood that, since the encoded data a and the encoded data B are obtained by encoding the same confluence data at different code rates, after the video frames with the same sequencing position are marked with the same timestamp information, the plurality of encoded data are sliced according to the timestamp information corresponding to each video frame in the plurality of encoded data, so that the start frame and the end frame in each slice corresponding to the same start timestamp information can be the same video frame, and the slice durations of the slices corresponding to the same start timestamp information are the same.
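A sketch of slicing by pts, assuming frames are (pts, payload) pairs and each slice is keyed by its start pts (layout and slice duration are illustrative):

```python
def slice_by_pts(stamped_frames, slice_ms=2000):
    """Group (pts, frame) pairs into slices keyed by start pts. Since
    every code rate carries identical pts, slices sharing a key hold
    the same video content, just encoded at different rates."""
    slices = {}
    for pts, frame in stamped_frames:
        start = (pts // slice_ms) * slice_ms  # start pts identifies the slice
        slices.setdefault(start, []).append((pts, frame))
    return slices
```

Applying this to every code rate's stamped frames yields slice sets whose same-key slices begin and end on the same video frames, which is the property the embodiment relies on.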
For example, the designated receiving end to which the sliced encoded data is output may be a client, a Content Delivery Network (CDN) end, or the like; if the designated receiving end is a CDN end, the client pulls each slice from the CDN end for playing. It can be understood that, after the multiple pieces of encoded data are sliced according to the same timestamp information, the slices corresponding to the same start timestamp information across the pieces of encoded data are slices of the same video content. Therefore, when a user switches definition at the client, a slice having the same start timestamp information as the slice at the pre-switch playing position can be found and played as the post-switch slice, achieving a seamless transition when switching code rates.
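The lookup performed on a definition switch can be sketched as follows (the slice store keyed by code rate and start pts is a hypothetical structure, not part of the patent):

```python
def find_switch_slice(slices_by_rate, new_rate, current_start_pts):
    """On a definition switch, fetch the slice in the target code rate
    whose start pts matches the slice currently being played, so
    playback resumes on the same content."""
    return slices_by_rate[new_rate].get(current_start_pts)
```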
Optionally, in an implementation manner, slicing the plurality of encoded data and outputting the sliced encoded data to a designated receiving end may include:
for each piece of encoded data, after that encoded data has been sliced and at least two slices have been obtained, outputting the slices to the designated receiving end starting from the slice with the smallest corresponding timestamp information.
It can be understood that if, after the encoded data is sliced, the slices were output directly to the client for playing, then when a client user switches definition, the slice corresponding to the post-switch code rate might not yet have been generated, because slices of different code rates are output at inconsistent speeds; this could cause playback interruption, stalling, and similar problems. Therefore, in this implementation, slices are output to the designated receiving end starting from the slice with the smallest timestamp information only after at least two slices have been obtained. Since at least one slice is cached on the server, the slice corresponding to the post-switch code rate is available when the client switches code rates, so the client can play the video picture smoothly after switching definition.
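Holding back at least one slice before output can be sketched as follows (the `hold` parameter and (start_pts, payload) tuple layout are illustrative assumptions):

```python
from collections import deque

def buffered_output(slices, emit, hold=1):
    """Emit slices in ascending start-pts order, but keep `hold` slices
    buffered: a slice is released only after `hold` newer slices exist,
    so its counterparts at every code rate are already generated when a
    client switches. slices: iterable of (start_pts, payload)."""
    pending = deque()
    for s in sorted(slices):
        pending.append(s)
        if len(pending) > hold:
            emit(pending.popleft())
    while pending:  # flush remaining slices at end of stream
        emit(pending.popleft())
```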
Therefore, by the scheme, seamless connection of the pictures before and after switching can be realized during switching among multiple code rates on the basis of realizing accurate synchronization of multiple pictures.
In order to clearly understand the contents of the embodiments of the present invention, the contents of the embodiments of the present invention are described below with reference to a specific example.
This example is developed and designed on the basis of a live-broadcast encoding system and a video distribution scheduling system. The live-broadcast encoding system starts buffering after aligning on a start frame via a cache module, that is, buffering starts from an IDR frame as the start frame; the buffered multiple video streams (corresponding to the multiple paths of target video streams above) are then aligned and corrected by an input synchronization module; next, the video frames of the multiple streams corresponding to the same time are input into a merging module, and each merged frame is copied and sent into encoders corresponding to multiple code rates to obtain encoded data for each code rate; finally, the encoded output video frames are acquired in sequence, aligned video frames are marked with the same pts (timestamp), and it is ensured that encoded data at different code rates are output synchronously at the same frame rate and pts. The video distribution scheduling system slices according to pts, so that the slice duration, frame count, and start pts of corresponding slices at different code rates are completely consistent. To ensure that slices with the same start pts exist simultaneously for all code rates, at least 1 slice can be buffered in advance; that is, after at least two slices are obtained, output to the client starts from the slice with the smallest corresponding timestamp information, so that the client can switch seamlessly between different code rates while watching.
Fig. 3 is a system block diagram illustrating a specific example of implementing the video data processing method, as shown in fig. 3, including a buffer module, an input synchronization module, a merge module, an encoding module, and an output synchronization module, and the functions of the respective modules are described below.
(1) Cache module
This module mainly performs format parsing and data caching on the input multiple video streams. As shown in fig. 3, input 0, input 1 through input n are the input multiple video streams, which the cache module obtains from the video capture devices for buffering. The buffering duration can be configured flexibly for different types of video streams and is usually 10 s. Buffering must start from an IDR frame, and the strict adjacency between the initial IDR frame and a timing SEI frame must be guaranteed to avoid frame-matching precision errors. The ntp data (corresponding to the synchronization time information above) is contained in a timing SEI frame (corresponding to the specified frame above), which is a frame containing ntp data that the video capture device inserts in real time while capturing the video stream.
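Starting the buffer at an IDR frame while keeping the adjacent timing SEI frame can be sketched as follows (the frame-type strings are illustrative; real code would inspect NAL unit types):

```python
def start_buffer_at_idr(frames):
    """frames: list of (frame_type, payload) in stream order. Buffering
    starts at the first IDR frame; a timing SEI frame immediately before
    it is kept so its ntp data stays strictly adjacent to the start
    frame, avoiding frame-matching precision errors."""
    for i, (ftype, _) in enumerate(frames):
        if ftype == "IDR":
            if i > 0 and frames[i - 1][0] == "SEI":
                return frames[i - 1:]
            return frames[i:]
    return []  # no decodable start point buffered yet
```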
(2) Input synchronization module
The module is a core module for realizing multi-picture synchronization and is used for aligning all video frames with the same time in a plurality of paths of input streams so as to complete the synchronization of all video frames in a plurality of paths of video streams before confluence. Wherein, the synchronous strategy is as follows:
The timing SEI frames in the video streams are acquired; the video stream whose first video frame has the smallest ntp data among the multiple video streams is taken as the reference stream, and the time offset of each other video stream relative to the reference stream is calculated from the ntp data of its first video frame. Output of the buffered data begins only after all video frames corresponding to the same ntp data have been buffered, so that the video frames corresponding to the same ntp data are synthesized into the video frames of the multiple pictures, achieving accurate frame synchronization. With this synchronization processing, the difference between the multiple pictures can be shortened to within 1 frame (40 ms).
(3) Converging module
The module is mainly used for superposing and converging the multi-path video streams according to the preset multi-picture layout to obtain converged data and finally performing multi-picture display in one interface.
(4) Coding module
The module is used for encoding and compressing the merged stream data, in this example, the live broadcast encoding system supports multiple encoding formats including h.264, h.265, and the like.
(5) Output synchronization module
Multi-picture synchronization relies on a live transcoding service, which is a whole set of services that reads and demultiplexes the input, transcodes it, applies the specified audio/video filter effects, and outputs a stream in a specified format. In this example, the output synchronization module is also integrated into the flow of the live transcoding service.
By default, the multi-code-rate encoded output is pushed in real time to a designated CDN end or a dedicated media server, that is, the encoded data is sent through a network interface as soon as it is produced. In this mode, because the output progress of the encoded data differs across code rates, a user may see an earlier or later picture when switching code rates, which leads to a poor experience.
The output synchronization module aligns the video frames of the multiple code rates frame by frame and marks aligned video frames with the same pts. During slicing, the data is sliced according to pts, and each slice is named by its start pts, so that slices with the same name are slices of the same video content at different code rates. In addition, at least one slice is buffered before output, so that whichever code rate a user switches to, the corresponding slices of the switched video stream can be accurately found in the pulled stream, and the video pictures played before and after the code-rate switch connect seamlessly. As shown in fig. 3, output 0, output 1 through output n are the plurality of output slices.
The specific implementation flow of this example is as follows:
(1) A user fills in, on the live-broadcast background interface, the multiple video streams to be displayed as multiple pictures on one screen, selects a suitable output layout and multi-code-rate output, and clicks the start button. The live-broadcast background then automatically pulls the video streams, parses their format information to obtain the ntp data, and uses the ntp data to align the video frames corresponding to the same time across the multiple video streams.
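If the ntp data is carried as a standard 64-bit NTP timestamp, converting it into directly comparable epoch seconds could look like this; the patent does not specify the on-wire encoding, so this is an assumption:

```python
def ntp64_to_unix(ntp64):
    """Convert a 64-bit NTP timestamp (upper 32 bits: seconds since
    1900-01-01; lower 32 bits: fractional second) to Unix-epoch
    seconds, one plausible way to turn the ntp data parsed from a
    timing SEI frame into comparable capture times."""
    NTP_UNIX_DELTA = 2_208_988_800  # seconds between the 1900 and 1970 epochs
    seconds = ntp64 >> 32
    fraction = (ntp64 & 0xFFFFFFFF) / 2**32
    return seconds - NTP_UNIX_DELTA + fraction
```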
(2) The aligned multiple video streams are simultaneously sent into a decoder for decoding, and the decoded raw frames of the multiple streams are sequentially sent into the merging module to complete the superposition and merging of the multiple pictures, obtaining the confluence data.
(3) The confluence data is encoded at multiple code rates, and the encoded data for each code rate is sequentially sent to the output synchronization module, where the video frames in the encoded data are aligned frame by frame and the aligned frames are marked with the same pts before being sent to the slice server. The slice server slices the encoded data according to pts and the slice duration and then distributes the slices to edge nodes through the CDN. At this point, the user sees fully synchronized pictures within each frame at the client, and switching among the multiple code rates connects seamlessly.
It can thus be seen that, in this scheme, the video capture device encapsulates the real-time ntp data corresponding to a video stream in the specified frame of that stream, so no additional out-of-band data transmission logic is needed, that is, no extra transmission of the time information corresponding to the video stream, which reduces system complexity. A plug-in deployment strategy allows the functional modules to be deployed into the existing system as plug-ins, so the intrusion into the existing system is small, the coupling is low, upgrading and deployment are convenient, and stability is higher. A real-time synchronized video-frame output technique ensures accurate synchronization of the pictures at multi-code-rate switch points; and naming slices by the pts of their video frames, together with the slice-output alignment mechanism, ensures the continuity of slice content.
Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides a video data processing apparatus, which is applied to a server, and as shown in fig. 4, the apparatus includes:
a first obtaining module 410, configured to obtain multiple paths of target video streams collected for a target scene; each path of target video stream corresponds to a video picture of a visual angle;
a second obtaining module 420, configured to, for each route of target video stream, obtain synchronization time information corresponding to the route of target video stream from a specified frame of the route of target video stream; wherein the designated frame is a frame representing temporal supplementary information; the synchronous time information corresponding to each path of target video stream is the time information corresponding to the same time standard;
an alignment module 430, configured to perform alignment processing on the multiple paths of target video streams by using synchronization time information corresponding to each path of target video stream; the alignment processing is used for aligning video frames corresponding to the same synchronization time information in different target video streams;
a merging module 440, configured to perform confluence processing on the aligned multiple paths of target video streams to obtain confluence data; wherein the confluence processing synthesizes the aligned video frames in the respective target video streams into video frames corresponding to multiple pictures.
Optionally, the alignment module comprises:
a first determining sub-module, configured to determine a reference video stream from the multiple target video streams;
a second determining submodule, configured to determine, for each path of target video stream other than the reference video stream, the time offset of that target video stream relative to the reference video stream based on the synchronization time information corresponding to that target video stream and the synchronization time information corresponding to the reference video stream; wherein the time offset characterizes the difference in capture time between video frames of the same frame order;
and the alignment submodule is used for performing alignment processing on the multi-path target video stream by using the determined time offset.
Optionally, the first determining submodule is specifically configured to:
and determining the target video stream with the earliest synchronous time information corresponding to the first frame of video frame in the multi-path target video streams as a reference video stream.
Optionally, the second determining submodule is specifically configured to:
and calculating the difference between the synchronization time information corresponding to the first video frame in that target video stream and the synchronization time information corresponding to the first video frame in the reference video stream, to obtain the time offset of that target video stream relative to the reference video stream.
Optionally, the alignment submodule is specifically configured to:
for each path of target video stream other than the reference video stream, aligning the first video frame in that target video stream with a reference frame in the reference video stream, so that each video frame in that target video stream is aligned with the corresponding video frame in the reference video stream;
wherein the reference frame is the video frame whose synchronization time information equals the synchronization time information of the first video frame in the reference video stream shifted by a specified time offset; and the specified time offset is the time offset of that target video stream relative to the reference video stream.
Optionally, the apparatus further comprises:
the encoding module is used for encoding the confluent data according to a plurality of preset code rates to obtain a plurality of encoded data;
the marking module is used for aligning each video frame in the plurality of encoded data frame by frame and marking the aligned video frames with the same timestamp information; wherein the frame-by-frame alignment aligns video frames at the same ordering position in the respective pieces of encoded data;
and the slicing module is used for slicing the plurality of coded data according to the timestamp information corresponding to each video frame in the plurality of coded data and outputting the sliced coded data to a specified receiving end.
Optionally, the slicing module is specifically configured to:
for each piece of encoded data, after that encoded data has been sliced and at least two slices have been obtained, outputting the slices to the designated receiving end starting from the slice with the smallest corresponding timestamp information.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, which includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 complete mutual communication through the communication bus 504,
a memory 503 for storing a computer program;
the processor 501 is configured to implement the steps of the video data processing method according to any of the above embodiments when executing the program stored in the memory 503.
The communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the Integrated Circuit may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, or discrete hardware components.
In still another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the video data processing method described in any of the above embodiments.
In a further embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the video data processing method of any of the above embodiments.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A video data processing method, applied to a server, the method comprising:
acquiring multiple paths of target video streams collected for a target scene; wherein each path of target video stream corresponds to a video picture of one viewing angle;
for each path of target video stream, acquiring synchronization time information corresponding to the path of target video stream from a designated frame of the path of target video stream; wherein the designated frame is a frame carrying time supplementary information, and the synchronization time information corresponding to each path of target video stream is time information under the same time standard;
performing alignment processing on the multiple paths of target video streams by using the synchronization time information corresponding to each path of target video stream; wherein the alignment processing aligns video frames corresponding to the same synchronization time information in different target video streams;
performing confluence processing on the aligned multiple paths of target video streams to obtain confluence data; wherein the confluence processing synthesizes the aligned video frames of each path of target video stream into video frames corresponding to multiple pictures.
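As a concrete illustration of the claimed align-then-merge steps (a hypothetical sketch only, not part of the claims — the tuple-based frame model and the helper function names below are assumptions), the pipeline can be sketched as:

```python
# Hypothetical sketch of the claimed align-and-merge pipeline.
# Each stream is modeled as a list of (sync_time_ms, frame_id) tuples,
# where sync_time_ms comes from a designated frame carrying time
# supplementary information and refers to one shared time standard.

def align_streams(streams):
    """Drop leading frames so every stream starts at the latest common sync time."""
    start = max(s[0][0] for s in streams)  # latest first-frame sync time
    return [[f for f in s if f[0] >= start] for s in streams]

def merge_streams(aligned):
    """Zip aligned frames into multi-picture composite frames."""
    return [tuple(frames) for frames in zip(*aligned)]

streams = [
    [(0, "a0"), (40, "a1"), (80, "a2")],    # camera A starts earlier
    [(40, "b0"), (80, "b1"), (120, "b2")],  # camera B starts 40 ms later
]
composites = merge_streams(align_streams(streams))
print(composites)  # frames sharing sync times 40 and 80 are composited together
```

Here each composite tuple stands in for one multi-picture video frame; a real implementation would decode and spatially compose the pixel data instead of grouping identifiers.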
2. The method according to claim 1, wherein performing the alignment processing on the multiple paths of target video streams by using the synchronization time information corresponding to each path of target video stream comprises:
determining a reference video stream from the multiple paths of target video streams;
for each path of target video stream other than the reference video stream, determining a time offset of the path of target video stream relative to the reference video stream based on the synchronization time information corresponding to the path of target video stream and the synchronization time information corresponding to the reference video stream; wherein the time offset represents the acquisition time difference between video frames at the same frame sequence position;
and performing the alignment processing on the multiple paths of target video streams by using the determined time offsets.
3. The method of claim 2, wherein determining the reference video stream from the multiple paths of target video streams comprises:
determining, as the reference video stream, the target video stream whose first video frame has the earliest corresponding synchronization time information among the multiple paths of target video streams.
4. The method of claim 2, wherein determining the time offset of the path of target video stream relative to the reference video stream based on the synchronization time information corresponding to the path of target video stream and the synchronization time information corresponding to the reference video stream comprises:
calculating the difference between the synchronization time information corresponding to the first video frame in the path of target video stream and the synchronization time information corresponding to the first video frame in the reference video stream, to obtain the time offset of the path of target video stream relative to the reference video stream.
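The offset computation of claim 4 reduces to a single subtraction of first-frame synchronization times; a minimal sketch (the millisecond unit and function name are assumptions, not taken from the patent):

```python
def time_offset(stream_first_frame_time_ms, reference_first_frame_time_ms):
    """Time offset of a target video stream relative to the reference stream:
    the difference between the sync time of its first video frame and the
    sync time of the reference stream's first video frame."""
    return stream_first_frame_time_ms - reference_first_frame_time_ms

# Because the reference stream is the one starting earliest (claim 3),
# every other stream's offset is non-negative.
print(time_offset(120, 40))  # → 80
```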
5. The method according to claim 3, wherein performing the alignment processing on the multiple paths of target video streams by using the determined time offsets comprises:
for each path of target video stream other than the reference video stream, aligning the first video frame in the path of target video stream with a reference frame in the reference video stream, so that each video frame in the path of target video stream is aligned with a corresponding video frame in the reference video stream;
wherein the reference frame is the video frame whose synchronization time information is offset by a specified time offset from the synchronization time information corresponding to the first video frame in the reference video stream, and the specified time offset is the time offset of the path of target video stream relative to the reference video stream.
6. The method according to any one of claims 1-5, further comprising:
encoding the confluence data at a plurality of preset code rates to obtain a plurality of pieces of encoded data;
aligning the video frames in the plurality of pieces of encoded data frame by frame, and marking the aligned video frames with the same timestamp information; wherein the frame-by-frame alignment aligns video frames at the same sequence position in the plurality of pieces of encoded data;
and slicing the plurality of pieces of encoded data according to the timestamp information corresponding to each video frame in the plurality of pieces of encoded data, and outputting the slices to a designated receiving end.
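The frame-by-frame stamping step above can be sketched as follows (hypothetical: the dictionary layout, the 40 ms frame interval, and the function name are assumptions for illustration — the point is only that frames at the same sequence position receive identical timestamps across bitrates, so slices cut on those timestamps remain switchable):

```python
# Hypothetical sketch: after encoding the merged stream at several preset
# bitrates, the i-th frame of every encoding is stamped with the same
# timestamp, so downstream slicing stays aligned across bitrates.

FRAME_INTERVAL_MS = 40  # assumed 25 fps

def stamp_timestamps(encodings):
    """encodings: {bitrate: [frame, ...]}; returns {bitrate: [(ts_ms, frame), ...]}."""
    return {
        rate: [(i * FRAME_INTERVAL_MS, f) for i, f in enumerate(frames)]
        for rate, frames in encodings.items()
    }

stamped = stamp_timestamps({"1080p": ["hA", "hB"], "720p": ["lA", "lB"]})
# Frames at the same sequence position share a timestamp across bitrates:
assert stamped["1080p"][1][0] == stamped["720p"][1][0] == 40
```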
7. The method of claim 6, wherein slicing the plurality of pieces of encoded data and outputting the slices to the designated receiving end comprises:
for each piece of encoded data, after slicing the piece of encoded data to obtain at least two slices, outputting the slices to the designated receiving end in order, starting from the slice with the smallest corresponding timestamp information.
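The slice-ordering rule of claim 7 amounts to emitting slices in ascending timestamp order; a minimal sketch (the `(timestamp, payload)` representation and function name are illustrative assumptions):

```python
# Hypothetical sketch of claim 7: once a piece of encoded data has been
# cut into at least two slices, the slices are sent to the receiving end
# in ascending order of their timestamp information.

def output_order(slices):
    """slices: list of (timestamp_ms, payload); returns payloads in send order."""
    return [payload for _, payload in sorted(slices, key=lambda s: s[0])]

print(output_order([(80, "seg2"), (0, "seg0"), (40, "seg1")]))
# → ['seg0', 'seg1', 'seg2']
```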
8. A video data processing apparatus, applied to a server, the apparatus comprising:
a first acquisition module configured to acquire multiple paths of target video streams collected for a target scene; wherein each path of target video stream corresponds to a video picture of one viewing angle;
a second acquisition module configured to, for each path of target video stream, acquire the synchronization time information corresponding to the path of target video stream from a designated frame of the path of target video stream; wherein the designated frame is a frame carrying time supplementary information, and the synchronization time information corresponding to each path of target video stream is time information under the same time standard;
an alignment module configured to perform alignment processing on the multiple paths of target video streams by using the synchronization time information corresponding to each path of target video stream; wherein the alignment processing aligns video frames corresponding to the same synchronization time information in different target video streams;
a confluence module configured to perform confluence processing on the aligned multiple paths of target video streams to obtain confluence data; wherein the confluence processing synthesizes the aligned video frames of each path of target video stream into video frames corresponding to multiple pictures.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.
10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 7.
CN202211182780.7A 2022-09-27 2022-09-27 Video data processing method, device, equipment and storage medium Pending CN115695883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211182780.7A CN115695883A (en) 2022-09-27 2022-09-27 Video data processing method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115695883A true CN115695883A (en) 2023-02-03

Family

ID=85062417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211182780.7A Pending CN115695883A (en) 2022-09-27 2022-09-27 Video data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115695883A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105430537A (en) * 2015-11-27 2016-03-23 刘军 Method and server for synthesis of multiple paths of data, and music teaching system
CN106303673A (en) * 2015-06-04 2017-01-04 中兴通讯股份有限公司 Code stream alignment, synchronization processing method and transmission, reception terminal and communication system
CN107682715A (en) * 2016-08-01 2018-02-09 腾讯科技(深圳)有限公司 Video synchronization method and device
CN112004102A (en) * 2020-08-03 2020-11-27 杭州当虹科技股份有限公司 Multi-camera picture synchronization method based on IP live stream
CN112492357A (en) * 2020-11-13 2021-03-12 北京安博盛赢教育科技有限责任公司 Method, device, medium and electronic equipment for processing multiple video streams
CN112929580A (en) * 2021-01-14 2021-06-08 北京奇艺世纪科技有限公司 Multi-view video playing method, device, system, server and client device
CN113873345A (en) * 2021-09-27 2021-12-31 中国电子科技集团公司第二十八研究所 Distributed ultrahigh-definition video synchronous processing method


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117156300A (en) * 2023-10-30 2023-12-01 北原科技(深圳)有限公司 Video stream synthesis method and device based on image sensor, equipment and medium
CN117156300B (en) * 2023-10-30 2024-02-02 北原科技(深圳)有限公司 Video stream synthesis method and device based on image sensor, equipment and medium

Similar Documents

Publication Publication Date Title
US10123070B2 (en) Method and system for central utilization of remotely generated large media data streams despite network bandwidth limitations
KR101828639B1 (en) Method for synchronizing multimedia flows and corresponding device
CN110933449B (en) Method, system and device for synchronizing external data and video pictures
EP3334175A1 (en) Streaming media and caption instant synchronization displaying and matching processing method, device and system
KR100972792B1 (en) Synchronizer and synchronizing method for stereoscopic image, apparatus and method for providing stereoscopic image
KR20170074866A (en) Receiving device, transmitting device, and data processing method
CN111182322B (en) Director control method and device, electronic equipment and storage medium
CN111031385B (en) Video playing method and device
US10979477B1 (en) Time synchronization between live video streaming and live metadata
KR101841313B1 (en) Methods for processing multimedia flows and corresponding devices
CN112738451B (en) Video conference recording and playing method, device, equipment and readable storage medium
CN112087642B (en) Cloud guide playing method, cloud guide server and remote management terminal
CN109756744B (en) Data processing method, electronic device and computer storage medium
CN112640479A (en) Method and apparatus for switching media service channel
KR101769353B1 (en) Augmented broadcasting stream transmission device and method, and augmented broadcasting service providing device and method
CN115695883A (en) Video data processing method, device, equipment and storage medium
CN115623264A (en) Live stream subtitle processing method and device and live stream playing method and device
US20240107087A1 (en) Server, terminal and non-transitory computer-readable medium
US10999621B2 (en) Technique for synchronizing rendering of video frames with rendering of auxiliary media
US11503385B2 (en) Live broadcast IP latency compensation
CN115426501A (en) Audio and video code stream time calibration method and electronic equipment
CN115767130A (en) Video data processing method, device, equipment and storage medium
CN115119009A (en) Video alignment method, video encoding device and storage medium
CN113938617A (en) Multi-channel video display method and equipment, network camera and storage medium
JP2021184641A (en) Method for transmission, transmitter, method for reception, and receiver

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination