WO2024098836A1 - Video alignment method and apparatus

Info

Publication number: WO2024098836A1
Application number: PCT/CN2023/108948
Authority: WO
Inventors: 冯宇飞; 汤然; 郑龙
Original assignee: 上海哔哩哔哩科技有限公司
Priority date: 2022-11-11
Filing date: 2023-07-24
Publication date: 2024-05-16
Also published as: CN115802054A

Abstract

Disclosed in the present disclosure is a video alignment method and apparatus. The method comprises: acquiring supplemental enhancement information (SEI) information in a source video; transcoding the source video, and copying and writing the SEI information to obtain a transcoded video; and, according to the same SEI information in the source video and the transcoded video, performing video alignment on the source video and the transcoded video. The SEI information of the source video is copied and written into the transcoded video, such that the SEI information in the obtained transcoded video and the SEI information in the source video are the same SEI information.

Description

Video alignment method and device

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to the Chinese patent application filed with the China Patent Office on November 11, 2022, with application number 2022114114271 and title “Video Alignment Method and Device”, the entire contents of which are incorporated by reference into this application.

Technical Field

The present disclosure relates to the field of live broadcast technology, and in particular to a video alignment method and device.

Background Art

In the live broadcast business, to make viewing easier for users and to reduce video stuttering, the host's source video can be transcoded, and the resulting transcoded video is pushed to users for viewing.

Transcoded videos require quality assessment to ensure that their quality does not affect the viewing experience of users. Quality assessment can be performed by visually comparing the same picture in the source video and the transcoded video. However, because the transcoding process introduces delay, the pictures of the transcoded video and the source video are not aligned. Therefore, a method for video alignment is needed.

Summary of the invention

In view of the above problems, the embodiments of the present disclosure are proposed to provide a video alignment method and device that overcome the above problems or at least partially solve the above problems.

According to a first aspect of an embodiment of the present disclosure, a video alignment method is provided, which includes:

Obtaining supplemental enhancement information SEI information in a source video;

Transcoding the source video, and copying and writing the SEI information to obtain a transcoded video;

Aligning the source video and the transcoded video according to the same SEI information in the source video and the transcoded video.

According to a second aspect of an embodiment of the present disclosure, a video alignment device is provided, comprising:

An acquisition module, adapted to acquire supplemental enhancement information SEI information in a source video;

A copy-write module, adapted to transcode the source video and to copy and write the SEI information to obtain a transcoded video;

An alignment module, adapted to align the source video and the transcoded video according to the same SEI information in the source video and the transcoded video.

According to a third aspect of an embodiment of the present disclosure, there is provided a computing device, including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other via the communication bus;

The memory is used to store at least one executable instruction, and the executable instruction enables the processor to execute operations corresponding to the above-mentioned video alignment method.

According to a fourth aspect of an embodiment of the present disclosure, a non-volatile computer-readable storage medium is provided, in which at least one executable instruction is stored, and the executable instruction enables a processor to perform operations corresponding to the above-mentioned video alignment method.

According to a fifth aspect of an embodiment of the present disclosure, a computer program product is provided, which includes a computer program stored on the above-mentioned non-volatile computer-readable storage medium.

According to the video alignment method and device provided by the present disclosure, the SEI information of the source video is copied and written into the transcoded video, that is, the SEI information in the obtained transcoded video is the same as that in the source video. According to the same SEI information in the source video and the transcoded video, the source video and the transcoded video can be aligned, and there is no need to decode the video images of the source video and the transcoded video, which saves resources and greatly improves the processing speed of video alignment.

The above description is only an overview of the technical solution of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other purposes, features and advantages of the present disclosure are more apparent, specific embodiments of the present disclosure are set out below.

BRIEF DESCRIPTION OF THE DRAWINGS

Various other advantages and benefits will become apparent to those of ordinary skill in the art by reading the detailed description of the preferred embodiments below. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the present disclosure. Also, the same reference symbols are used throughout the accompanying drawings to represent the same components. In the accompanying drawings:

FIG. 1 shows a flow chart of a video alignment method according to an embodiment of the present disclosure;

FIG. 2 shows a flow chart of a video alignment method according to another embodiment of the present disclosure;

FIG. 3 shows a schematic structural diagram of a video alignment device according to an embodiment of the present disclosure; and

FIG. 4 shows a schematic diagram of the structure of a computing device according to an embodiment of the present disclosure.

Preferred embodiments of the present disclosure

The exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the exemplary embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. On the contrary, these embodiments are provided in order to enable a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

First, the terms involved in one or more embodiments of the present disclosure are explained.

Transcoding: re-encoding audio and video;

Video alignment: The content of the video played before and after transcoding is the same;

Video codec: A program or device that can compress or decompress digital video;

Bitstream: a data structure that describes the properties of a video frame;

ffprobe: collects information from multimedia streams and prints it in a human- and machine-readable form;

ffmpeg: An open source computer program that can be used to record, convert, and stream digital audio and video;

SEI (Supplemental Enhancement Information): Supplemental enhancement information provides users with a method to add additional information to the video stream;

PTS (Presentation Time Stamp): The display timestamp, which tells the player when to display this frame of data.

FIG. 1 shows a flow chart of a video alignment method according to an embodiment of the present disclosure. As shown in FIG. 1 , the method includes the following steps:

Step S101, obtaining supplemental enhancement information SEI information in a source video.

SEI information is supplemental enhancement information that can be inserted into the audio and video stream to convey additional information. SEI information includes payloadType, which defines the type of the SEI message; payloadSize, which is the size of the SEI message; and uuid_iso_iec_11578, a 16-byte identifier after which the content is written, that is, the content starts at byte 16. The content can be customized. In this embodiment, SEI information is used to align the video, so the content can be data such as tags that are easy to compare, and the video alignment is completed based on the comparison results. The specific content can be set according to the implementation and is not limited here.
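By way of illustration only, the message layout described above can be sketched in Python. The sketch assumes the H.264 user_data_unregistered message (payloadType 5), in which payloadType and payloadSize are each coded as a run of 0xFF bytes followed by one final byte. The function names build_sei_message and parse_sei_message are hypothetical, and the NAL unit header, emulation prevention bytes and trailing bits are deliberately left out.

```python
import uuid
from typing import Tuple

USER_DATA_UNREGISTERED = 5  # assumed payloadType for custom (unregistered) user data


def _encode_ff_coded(value: int) -> bytes:
    """Code a value as a run of 0xFF bytes (255 each) plus one final byte below 255."""
    return b"\xff" * (value // 255) + bytes([value % 255])


def _decode_ff_coded(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a 0xFF-extended value starting at pos; return (value, next position)."""
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    return value + data[pos], pos + 1


def build_sei_message(tag_uuid: uuid.UUID, content: bytes) -> bytes:
    """Build payloadType + payloadSize + 16-byte UUID + custom content."""
    payload = tag_uuid.bytes + content
    return (_encode_ff_coded(USER_DATA_UNREGISTERED)
            + _encode_ff_coded(len(payload))
            + payload)


def parse_sei_message(message: bytes) -> Tuple[int, uuid.UUID, bytes]:
    """Return (payloadType, UUID, content); the content starts at byte 16 of the payload."""
    payload_type, pos = _decode_ff_coded(message, 0)
    payload_size, pos = _decode_ff_coded(message, pos)
    payload = message[pos:pos + payload_size]
    return payload_type, uuid.UUID(bytes=payload[:16]), payload[16:]
```

A tag such as b"sei_idx=1" written with build_sei_message can be read back unchanged with parse_sei_message, which is the property the comparison step relies on.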

Furthermore, SEI information is not a mandatory option in the decoding process, but can be used for fault tolerance and error correction in the decoding process, and can be integrated into the video bitstream. In other words, SEI information can be inserted in the video generation and video transmission process, and can be transmitted together with the video through the transmission link, and SEI information can be obtained without decoding, which reduces resource consumption and has a faster processing speed. Based on the above characteristics, before the edge computing node transcodes the source video, that is, after the edge computing node receives the source video, the SEI information in the source video can be directly and quickly obtained. Specifically, ffmpeg is used to decapsulate the source video, and the SEI information therein is obtained, and the SEI information is determined according to payloadType, payloadSize, etc., and the content contained in uuid_iso_iec_11578 is read.
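As a rough sketch of why no decoding is needed, the following Python code locates SEI units by inspecting only NAL unit headers. It is an assumption-laden stand-in for the ffmpeg-based decapsulation described above: it assumes an H.264 elementary stream in Annex B form (start codes 00 00 01), in which nal_unit_type 6 denotes SEI. The function names are hypothetical. Container formats that store length-prefixed NAL units would need a different splitter.

```python
from typing import Iterator, List

SEI_NAL_TYPE = 6  # assumed: H.264 nal_unit_type for SEI


def iter_nal_units(stream: bytes) -> Iterator[bytes]:
    """Yield NAL units from an Annex B byte stream by splitting on 00 00 01."""
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        starts.append(pos + 3)
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    for i, begin in enumerate(starts):
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(stream)
        # Trailing zero bytes belong to the next (4-byte) start code, not to the unit
        yield stream[begin:end].rstrip(b"\x00")


def strip_emulation_prevention(nal: bytes) -> bytes:
    """Remove the 0x03 byte that follows 00 00 inside a NAL unit."""
    out = bytearray()
    zeros = 0
    for byte in nal:
        if zeros >= 2 and byte == 3:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def extract_sei_nals(stream: bytes) -> List[bytes]:
    """Return the body (header byte removed) of every SEI NAL unit in the stream."""
    return [strip_emulation_prevention(nal)[1:]
            for nal in iter_nal_units(stream)
            if nal and nal[0] & 0x1F == SEI_NAL_TYPE]
```

Only the first byte of each unit is examined to decide whether it is SEI, so the picture data is never decoded.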

Step S102, transcoding the source video, and copying and writing the SEI information to obtain a transcoded video.

To transcode the source video, the source video needs to be decoded first and then re-encoded. The decoding and encoding can be performed using a video codec or the like, so that users can conveniently pull the stream and watch it. Because the SEI information is modified during the transcoding process, the SEI information in the final transcoded video would otherwise be inconsistent with the SEI information in the source video. Therefore, this embodiment first obtains the SEI information in the source video and then transcodes the source video. After the transcoding process, the obtained SEI information is copied and written into the transcoded video, so that the SEI information in the final transcoded video is consistent with the SEI information in the source video.

Step S103: align the source video and the transcoded video according to the same SEI information in the source video and the transcoded video.

After the transcoded video is obtained, the SEI information in the source video and in the transcoded video can be compared, for example by decapsulating the source video and the transcoded video to obtain the SEI information of each. Since SEI information is bound to video frames, the SEI information of the source video and that of the transcoded video can be compared frame by frame to locate the video frames that carry the same SEI information, that is, the video frames with the same playback content in the source video and the transcoded video. The source video and the transcoded video are then aligned on these video frames, so that the two videos play the same content.
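The frame-by-frame comparison can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the SEI content of each frame has already been extracted into a list (None where a frame carries no SEI) and that each SEI value is unique within a video. The function name match_frames_by_sei is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def match_frames_by_sei(source_sei: List[Optional[bytes]],
                        transcoded_sei: List[Optional[bytes]]) -> List[Tuple[int, int]]:
    """Return (source_index, transcoded_index) pairs whose SEI content is identical."""
    position_in_source: Dict[bytes, int] = {}
    for index, sei in enumerate(source_sei):
        if sei is not None:
            # Keep the first frame seen for a given SEI value
            position_in_source.setdefault(sei, index)
    pairs = []
    for index, sei in enumerate(transcoded_sei):
        if sei is not None and sei in position_in_source:
            pairs.append((position_in_source[sei], index))
    return pairs
```

For a source carrying tags 1 to 4 and a transcoded video that starts at tag 3, the first returned pair maps source frame index 2 to transcoded frame index 0, which gives the offset needed to play both videos in step.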

According to the video alignment method provided by the present disclosure, the SEI information of the source video is copied and written into the transcoded video, that is, the SEI information in the obtained transcoded video is the same as that in the source video. According to the same SEI information in the source video and the transcoded video, the source video and the transcoded video can be aligned, and there is no need to decode the video images of the source video and the transcoded video, which saves resources and greatly improves the processing speed of video alignment.

FIG. 2 shows a flow chart of a video alignment method according to another embodiment of the present disclosure. As shown in FIG. 2, the method includes the following steps:

Step S201, adding SEI information to the video frames in the source video.

SEI information can be added to the source video of the live broadcast before the source video is transcoded, for example when the anchor's source video is uploaded to the server. The source video with added SEI information may then be transmitted to the edge computing node for transcoding, or the source video may be transmitted to the edge computing node and the SEI information added there before transcoding; this is not limited here.

The added SEI information corresponds to the video frames in the source video. Specifically, SEI information corresponding to each video frame in the source video is added, and the SEI information is set in an incremental manner. The content contained in uuid_iso_iec_11578 can be an SEI serial number sei_idx: in the first frame sei_idx is set to 1, in the second frame sei_idx is set to 2, in the third frame sei_idx is set to 3, and so on. Based on the SEI serial number, each video frame can be distinguished, and the increasing order of SEI serial numbers matches the order of the video frames.

Alternatively, the content contained in uuid_iso_iec_11578 may be the PTS (Presentation Time Stamp), which tells the video player the specific time at which to play the corresponding video frame. PTS is related to the order of video frames in the source video: the PTS of an earlier video frame is earlier than the PTS of a later video frame, so the PTS values also increase with the order of the video frames. The specific value of PTS can be set by the encoder when generating the source video. For example, a reference clock whose time increases linearly is selected, and the encoder timestamps each video frame according to the time on the reference clock; that timestamp is the PTS. The above is an example, and the PTS can also be set in other ways, which is not limited here.

Alternatively, the content contained in uuid_iso_iec_11578 can be an identification code, such as a QR code or another identification code. The identification code can be generated by converting an increasing serial number or sequence number, with a different identification code set for each video frame, so that the identification codes of the SEI information corresponding to the video frames also increase, correspond to the order of the video frames, and effectively distinguish each video frame. The above SEI information is an example, and the specific settings can be chosen according to the implementation, which will not be explained in detail here.
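The incremental numbering can be illustrated with a short Python sketch. The UUID value, the 4-byte big-endian encoding of sei_idx and the function names are assumptions made for this example only; the disclosure does not prescribe a particular encoding.

```python
import struct
import uuid
from typing import Iterator

# Hypothetical UUID used only to mark these alignment tags in this example
ALIGNMENT_UUID = uuid.UUID("00000000-0000-4000-8000-000000000001")


def incremental_sei_payloads(frame_count: int, start: int = 1) -> Iterator[bytes]:
    """Yield one payload per frame: 16-byte UUID followed by an increasing sei_idx."""
    for sei_idx in range(start, start + frame_count):
        yield ALIGNMENT_UUID.bytes + struct.pack(">I", sei_idx)


def read_sei_idx(payload: bytes) -> int:
    """Read sei_idx back from the bytes that follow the 16-byte UUID."""
    return struct.unpack(">I", payload[16:20])[0]
```

Because the serial numbers increase with the frame order, comparing two payloads also tells which of the two frames comes first.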

SEI information is part of the bitstream: it is additional information added to the video bitstream, so it is added at the bitstream level. For example, ffmpeg is used to extract the H.264 bitstream from the source video, and the h264_metadata bitstream filter is used to add SEI information. The above is an example, and the specific technical means used to add SEI information are not limited here.
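As a hedged illustration of the ffmpeg route, the following Python sketch only assembles a command line and does not run it. It assumes an ffmpeg build whose h264_metadata bitstream filter accepts a sei_user_data option of the form UUID+string; the available options should be checked against the documentation of the ffmpeg version in use. The function name and the file names in the usage note are hypothetical. This option attaches one fixed string, so per-frame incremental values would need a different mechanism.

```python
from typing import List


def build_sei_insert_command(src: str, dst: str, tag_uuid_hex: str, text: str) -> List[str]:
    """Build an ffmpeg argument list that copies the stream and adds unregistered SEI user data."""
    # Assumed option syntax: h264_metadata=sei_user_data=<UUID>+<string>
    bsf = f"h264_metadata=sei_user_data={tag_uuid_hex}+{text}"
    # "-c copy" leaves the compressed data untouched; only the bitstream filter is applied
    return ["ffmpeg", "-i", src, "-c", "copy", "-bsf:v", bsf, dst]
```

For example, build_sei_insert_command("source.h264", "tagged.h264", "086f3693-b7b3-4f2c-9653-21492feee5b8", "hello") returns a list that could be passed to subprocess.run on a machine where ffmpeg is installed.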

Furthermore, before SEI information is added to the video frames of the source video, ffmpeg can be used to detect whether SEI information already exists in the source video. If it already exists, there is no need to add it again, which avoids repeated settings. For example, after the source video is decapsulated with ffmpeg, ffprobe can be used to view various information about the audio and video file, such as the encapsulation format, audio/video stream information and data packet information. By inspecting the structure avPacket, which stores the compressed encoded data in the data packet information, it can be checked whether corresponding SEI information has been set for each video frame in the source video.

Step S202: Acquire SEI information in the source video.

Before the source video with added SEI information is transmitted to the edge computing node for transcoding, the source video is decapsulated, for example with the ffmpeg tool, to obtain the structure avPacket that stores the compressed encoded data of the source video. The SEI information is carried in the avPacket, so the SEI information of the source video can be obtained by parsing the structure avPacket.

Here, only the source video is decapsulated to obtain its SEI information, and no decoding or re-encoding is performed on it, which can reduce resource consumption and has a faster processing speed. The source video is only decapsulated without any other processing, and does not involve modification of the specific video content in the source video, ensuring that the source video is still the original video content.

Step S203: storing the SEI information into a cache based on the decoding timestamp of the structure.

When the source video is subsequently transcoded, the transcoding process decodes the source video and re-encodes it. During this process the SEI information is modified, so the SEI information in the transcoded video is inconsistent with that of the source video and the videos cannot be aligned directly. Therefore, the obtained SEI information of the source video needs to be stored first, so that it can later be copied and written directly into the transcoded video. If the SEI information is stored in a cache, it can be read directly from the cache after the transcoding process for copying and writing.

Furthermore, SEI information corresponds to video frames, and multiple video frames correspond to multiple pieces of SEI information. When copying and writing, it is therefore also necessary to determine which video frame each piece of SEI information is written to. When storing, SEI information can be stored according to the DTS (Decoding Time Stamp) in the avPacket. DTS tells the player when to decode the data of a video frame and thus corresponds to that video frame. Storing SEI information according to DTS makes it possible, when the source video is transcoded and its video frames are decoded according to DTS, to obtain the SEI information corresponding to each DTS, which facilitates accurate writing.
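A cache keyed by DTS can be sketched as a small Python class. The class name SeiCache and its methods are hypothetical, and an in-memory dictionary is assumed; the disclosure does not specify the cache structure.

```python
from typing import Dict, Optional


class SeiCache:
    """Stores the SEI content of each source packet under that packet's DTS."""

    def __init__(self) -> None:
        self._by_dts: Dict[int, bytes] = {}

    def store(self, dts: int, sei: bytes) -> None:
        """Remember the SEI content carried by the packet with this DTS."""
        self._by_dts[dts] = sei

    def take(self, dts: int) -> Optional[bytes]:
        """Return and remove the SEI content for this DTS, or None if absent."""
        return self._by_dts.pop(dts, None)

    def __len__(self) -> int:
        return len(self._by_dts)
```

Removing an entry when it is taken keeps the cache from growing without bound during a long live stream, which is a design choice of this sketch rather than a requirement of the method.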

Step S204, transcoding the source video, obtaining SEI information according to the decoding timestamp of the structure, and copying and writing the SEI information to obtain a transcoded video.

The source video is transcoded, for example by decoding the source video and re-encoding each video frame according to the decoding timestamp DTS of the source video. The SEI information corresponding to each video frame is obtained from the cache according to the decoding timestamp DTS of the structure and copied directly into the transcoded video to obtain the final transcoded video. The SEI information in the transcoded video is thus a copy of the SEI information of the source video, which ensures that the two are consistent.
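The copy-and-write step can be sketched in Python as follows. The Packet type and the fake_transcode function are hypothetical stand-ins for a real decoder and encoder, and the sketch assumes that each transcoded packet keeps the DTS of the source packet it came from.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    dts: int
    data: bytes
    sei: Optional[bytes] = None


def fake_transcode(packet: Packet) -> Packet:
    """Stand-in for decode plus re-encode: picture data changes and the SEI is lost."""
    return Packet(dts=packet.dts, data=b"re-encoded:" + packet.data, sei=None)


def transcode_with_sei_copy(source: List[Packet]) -> List[Packet]:
    """Cache SEI by DTS, transcode each packet, then write the cached SEI back."""
    cache: Dict[int, bytes] = {p.dts: p.sei for p in source if p.sei is not None}
    output = []
    for packet in source:
        out = fake_transcode(packet)
        # Look up the SEI stored for this decoding timestamp and copy it in unchanged
        out.sei = cache.get(out.dts)
        output.append(out)
    return output
```

After this step every output packet carries exactly the SEI bytes of the source packet with the same DTS, even though its picture data differs.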

Step S205: align the source video and the transcoded video according to the same SEI information in the source video and the transcoded video.

When performing video alignment, the video frames with the same SEI information are aligned according to the SEI information contained in the source video and the transcoded video, thereby aligning the source video and the transcoded video. Specifically, the source video and the transcoded video are decapsulated, for example with ffmpeg, to obtain the SEI information of the source video and the SEI information of the transcoded video respectively. The two sets of SEI information are compared to determine the first video frame of the source video and the second video frame of the transcoded video that correspond to the same SEI information.

If the SEI information is an SEI sequence number, the SEI sequence number of each video frame in the source video is compared with the SEI sequence number of each video frame in the transcoded video to determine the first video frame of the source video and the second video frame of the transcoded video with the same SEI sequence number. For example, if the SEI sequence number in the first frame of the source video is 1 and the SEI sequence number in the first frame of the transcoded video is also 1, the first video frame of the source video is the first frame of the source video, and the second video frame of the transcoded video is the first frame of the transcoded video.

Alternatively, if the SEI information is a display timestamp, the display timestamp of each video frame in the source video is compared with the display timestamp of each video frame in the transcoded video; when the two are the same, the first video frame of the source video and the second video frame of the transcoded video with the same display timestamp are determined. Alternatively, if the SEI information is an identification code, the identification code of each video frame in the source video is compared with the identification code of each video frame in the transcoded video; when the two are the same, the first video frame of the source video and the second video frame of the transcoded video with the same identification code are determined.

Furthermore, the number of first video frames of the source video and the number of second video frames of the transcoded video are the same. For example, after comparison it is determined that there are 20 first video frames of the source video and also 20 second video frames of the transcoded video.

When there are multiple first video frames and multiple second video frames, and the first video frames of the source video and the second video frames of the transcoded video are to be played or stored at the same time, the first video frames and second video frames with the same SEI information can each be sorted in the order of video playback, for example in the order of the display timestamp PTS. For example, the first video frames of the source video include the first frame, the second frame, the third frame, and so on, and the second video frames of the transcoded video include the first frame, the second frame, the third frame, and so on. A one-to-one correspondence is then established between the first video frames and the second video frames: the first frame of the source video corresponds to the first frame of the transcoded video, the second frame to the second frame, the third frame to the third frame, and so on. During playback, a first video frame and the second video frame corresponding to it can be played at the same time to complete the video alignment, and the pictures of the two video frames can be visually compared for quality assessment and the like. During storage, the corresponding first and second video frames are stored according to the correspondence to complete the video alignment of the source video and the transcoded video.
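The sorting and one-to-one pairing can be sketched in Python. The Frame type and the function name pair_aligned_frames are hypothetical, and the sketch assumes that each SEI value is unique within a video and that the SEI values increase with the playback order, as in the incremental setting described earlier.

```python
from typing import List, NamedTuple, Tuple


class Frame(NamedTuple):
    pts: int
    sei: bytes


def pair_aligned_frames(source: List[Frame],
                        transcoded: List[Frame]) -> List[Tuple[Frame, Frame]]:
    """Pair frames that share SEI content, each side sorted by its own PTS."""
    shared = {f.sei for f in source} & {f.sei for f in transcoded}
    first = sorted((f for f in source if f.sei in shared), key=lambda f: f.pts)
    second = sorted((f for f in transcoded if f.sei in shared), key=lambda f: f.pts)
    # Both lists have the same length, so zip gives a one-to-one correspondence
    return list(zip(first, second))
```

Each returned pair can be handed to a side-by-side player or written out together; the PTS values of the two frames in a pair may differ, since the transcoded video carries the delay introduced by transcoding.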

Based on the above processing, taking a 2 s video as an example, decapsulating alone is about 50 times faster than performing the decoding operation, and CPU usage falls from 1764.14% for decoding to 3.14%, greatly reducing resource consumption. In practice, the figures will differ slightly depending on device performance. In this embodiment, no decoding is required when aligning the video, which greatly improves the processing speed and reduces the decoding load.

According to the video alignment method provided by the present disclosure, the acquired SEI information of the source video is first stored in the cache, and during the transcoding process the SEI information of the source video is copied and written into the transcoded video, so that the SEI information in the transcoded video finally obtained is consistent with the SEI information in the source video. The SEI information in the source video and in the transcoded video can then be compared to determine the first video frame of the source video and the second video frame of the transcoded video corresponding to the same SEI information, and these frames are played and stored at the same time to complete the video alignment of the source video and the transcoded video. In this process, the source video and the transcoded video are only decapsulated to obtain the SEI information for comparison, and there is no need to decode the video pictures of the source video and the transcoded video, which improves the processing speed and reduces resource usage.

FIG. 3 shows a schematic diagram of the structure of a video alignment device provided by an embodiment of the present disclosure. As shown in FIG. 3, the device comprises:

An acquisition module 310 is adapted to acquire supplemental enhancement information SEI information in a source video;

The copy-write module 320 is adapted to perform transcoding processing on the source video and copy-write the SEI information to obtain a transcoded video;

The alignment module 330 is adapted to perform video alignment on the source video and the transcoded video according to the same SEI information in the source video and the transcoded video.

Optionally, the acquisition module 310 is further adapted to:

Decapsulate the source video to obtain a structure storing compressed coded data;

Parse the structure to obtain the SEI information of the source video.

Optionally, the device further comprises:

The cache module 340 is adapted to store the SEI information into a cache based on the decoding timestamp of the structure.

Optionally, the copy-write module 320 is further adapted to:

Transcode the source video;

Get the corresponding SEI information from the cache according to the decoding timestamp of the structure;

The SEI information is copied and written into the transcoded video to obtain a transcoded video.

Optionally, the device further comprises:

The adding module 350 is adapted to add SEI information to the video frames in the source video.

Optionally, the SEI information is set in an incremental manner.

Optionally, the SEI information includes a SEI sequence number, a display timestamp and/or an identification code.

Optionally, the alignment module 330 is further adapted to:

Decapsulate the source video and the transcoded video, and obtain the SEI information of the source video and the SEI information of the transcoded video respectively;

Compare the SEI information of the source video and the SEI information of the transcoded video to determine a first video frame of the source video and a second video frame of the transcoded video corresponding to the same SEI information;

The first video frame of the source video and the second video frame of the transcoded video are played and/or stored simultaneously to complete the video alignment of the source video and the transcoded video.

Optionally, the alignment module 330 is further adapted to:

Compare the SEI sequence number of the source video and the SEI sequence number of the transcoded video to determine the first video frame of the source video and the second video frame of the transcoded video corresponding to the same SEI sequence number;

or,

Compare the display timestamp of the source video and the display timestamp of the transcoded video to determine a first video frame of the source video and a second video frame of the transcoded video corresponding to the same display timestamp;

or,

The identification code of the source video and the identification code of the transcoded video are compared to determine a first video frame of the source video and a second video frame of the transcoded video corresponding to the same identification code.

Optionally, the number of first video frames of the source video is the same as the number of second video frames of the transcoded video.

Optionally, the number of the first video frames is multiple; the number of the second video frames is multiple;

The alignment module 330 is further adapted to:

Sort the first video frame of the source video and the second video frame of the transcoded video with the same SEI information in the order of video playback, and establish a one-to-one correspondence;

The first video frame and the second video frame having a corresponding relationship are played and/or stored simultaneously to complete the video alignment of the source video and the transcoded video.

The description of each module above refers to the corresponding description in the method embodiment and will not be repeated here.

According to the video alignment device provided by the present disclosure, the SEI information of the source video is copied and written into the transcoded video, that is, the SEI information in the obtained transcoded video is the same as that in the source video. According to the same SEI information in the source video and the transcoded video, the source video and the transcoded video can be aligned, and there is no need to decode the video images of the source video and the transcoded video, which saves resources and greatly improves the processing speed of video alignment.

The present disclosure also provides a non-volatile computer-readable storage medium, which stores at least one executable instruction, and the executable instruction can execute the video alignment method in any of the above method embodiments.

FIG. 4 shows a schematic diagram of the structure of a computing device according to an embodiment of the present disclosure. The specific embodiments of the present disclosure do not limit the specific implementation of the computing device.

As shown in FIG. 4, the computing device may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.

Wherein:

The processor 402 , the communication interface 404 , and the memory 406 communicate with each other via a communication bus 408 .

The communication interface 404 is used to communicate with other devices such as clients or other servers.

The processor 402 is used to execute a program 410, and can specifically execute the relevant steps in the above-mentioned video alignment method embodiments.

Specifically, the program 410 may include program codes, which include computer operation instructions.

Processor 402 may be a central processing unit (CPU), or an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the present disclosure. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs; or may be processors of different types, such as one or more CPUs and one or more ASICs.

The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk storage device.

The program 410 can be specifically used to enable the processor 402 to perform the video alignment method in any of the above method embodiments. The specific implementation of each step in the program 410 can refer to the corresponding descriptions in the corresponding steps and units in the above video alignment embodiments, which will not be repeated here. Those skilled in the art can clearly understand that for the convenience and simplicity of description, the specific working process of the above-described devices and modules can refer to the corresponding process description in the above-mentioned method embodiments, which will not be repeated here.
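As one illustration of how a program such as program 410 might carry SEI information across transcoding, the following Python sketch caches SEI by decoding timestamp and writes it back into the transcoded output. It is a sketch under stated assumptions only: `Packet`, `SeiCache`, `fake_transcode` and `transcode_with_sei` are hypothetical names, `fake_transcode` is a stand-in for a real encoder, and the transcoder is assumed to preserve the decoding timestamp of each packet so that the cache key still matches.

```python
from typing import Dict, List, NamedTuple, Optional


class Packet(NamedTuple):
    """Hypothetical structure storing compressed coded data."""
    dts: int                     # decoding timestamp
    payload: bytes               # compressed video data
    sei: Optional[bytes] = None  # SEI information, if the packet carries any


class SeiCache:
    """Cache that stores SEI information keyed by decoding timestamp."""

    def __init__(self) -> None:
        self._by_dts: Dict[int, bytes] = {}

    def store(self, packet: Packet) -> None:
        if packet.sei is not None:
            self._by_dts[packet.dts] = packet.sei

    def pop(self, dts: int) -> Optional[bytes]:
        # Remove the entry once used so the cache does not grow without bound.
        return self._by_dts.pop(dts, None)


def fake_transcode(packet: Packet) -> Packet:
    """Stand-in for a real encoder: changes the payload and drops the SEI."""
    return Packet(dts=packet.dts, payload=packet.payload[::-1], sei=None)


def transcode_with_sei(source: List[Packet]) -> List[Packet]:
    """Transcode each packet, then copy the cached SEI into the output packet."""
    cache = SeiCache()
    output: List[Packet] = []
    for packet in source:
        cache.store(packet)               # cache SEI before it is lost
        encoded = fake_transcode(packet)  # the SEI is dropped here
        output.append(encoded._replace(sei=cache.pop(encoded.dts)))
    return output


source = [Packet(0, b"abc", b"\x01"), Packet(40, b"def"), Packet(80, b"ghi", b"\x03")]
result = transcode_with_sei(source)
```

Keying the cache by decoding timestamp lets the lookup happen at the point where the encoder emits a packet, without inspecting the picture content.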

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the above description. In addition, the present disclosure is not directed to any specific programming language. It should be understood that various programming languages can be used to implement the content of the present disclosure described herein, and that any description of a specific language above is given to disclose the preferred embodiment of the present disclosure.

In the description provided herein, a large number of specific details are described. However, it is understood that the embodiments of the present disclosure can be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the present disclosure and aid in understanding one or more of the various inventive aspects, the features of the present disclosure are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of the exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as reflected in the claims below, the inventive aspects lie in fewer than all the features of any single embodiment disclosed above. Therefore, the claims that follow the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present disclosure.

Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components in the embodiments may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. All features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed can be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive. Unless otherwise expressly stated, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) can be replaced by an alternative feature that serves the same, an equivalent or a similar purpose.

In addition, those skilled in the art will appreciate that, although some embodiments herein include certain features that are included in other embodiments and not other features, combinations of features of different embodiments are meant to be within the scope of this disclosure and to form different embodiments. For example, in the claims below, any one of the claimed embodiments may be used in any combination.

The various component embodiments of the present disclosure can be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It should be understood by those skilled in the art that a microprocessor or digital signal processor (DSP) can be used in practice to implement some or all functions of some or all components of the present disclosure. The present disclosure can also be implemented as a program for a device or apparatus (e.g., a computer program or computer program product) for executing part or all of the methods described herein. Such a program implementing the present disclosure can be stored on a computer-readable medium, or can take the form of one or more signals. Such a signal can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the present disclosure, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference symbol placed between brackets shall not be construed as a limitation on the claims. The word "comprising" does not exclude the presence of elements or steps not listed in the claims. The word "a" or "an" preceding an element does not exclude the presence of multiple such elements. The present disclosure may be implemented by means of hardware including several different elements and by means of appropriately programmed computers. In a unit claim that lists several devices, several of these devices may be embodied by the same hardware item. The use of the words first, second, third, etc. does not indicate any order. These words may be interpreted as names. The steps in the above embodiments, unless otherwise specified, should not be understood as limitations on the order of execution.

Claims

1. A video alignment method, comprising:

obtaining supplemental enhancement information (SEI) information in a source video;

transcoding the source video, and copying and writing the SEI information to obtain a transcoded video; and

aligning the source video and the transcoded video according to the same SEI information in the source video and the transcoded video.

2. The method according to claim 1, wherein the obtaining SEI information in the source video further comprises:

decapsulating the source video to obtain a structure storing compressed coded data; and

parsing the structure to obtain the SEI information of the source video.

3. The method according to claim 1, wherein, after the obtaining SEI information in the source video, the method further comprises:

storing the SEI information in a cache based on a decoding timestamp of the structure.

4. The method according to claim 3, wherein the transcoding the source video, and copying and writing the SEI information to obtain a transcoded video further comprises:

transcoding the source video;

obtaining the corresponding SEI information from the cache according to the decoding timestamp of the structure; and

copying and writing the SEI information into the transcoded video to obtain the transcoded video.

5. The method according to claim 1, wherein the method further comprises:

adding SEI information to video frames in the source video.

6. The method according to claim 5, wherein the SEI information is set in an incremental manner.

7. The method according to claim 5, wherein the SEI information includes an SEI sequence number, a display timestamp and/or an identification code.

8. The method according to any one of claims 1 to 7, wherein the aligning the source video and the transcoded video according to the same SEI information in the source video and the transcoded video further comprises:

decapsulating the source video and the transcoded video to obtain SEI information of the source video and SEI information of the transcoded video respectively;

comparing the SEI information of the source video and the SEI information of the transcoded video to determine a first video frame of the source video and a second video frame of the transcoded video corresponding to the same SEI information; and

simultaneously playing and/or storing the first video frame of the source video and the second video frame of the transcoded video to complete the video alignment of the source video and the transcoded video.

9. The method according to claim 8, wherein the comparing the SEI information of the source video and the SEI information of the transcoded video to determine a first video frame of the source video and a second video frame of the transcoded video corresponding to the same SEI information further comprises:

comparing the SEI sequence number of the source video and the SEI sequence number of the transcoded video to determine a first video frame of the source video and a second video frame of the transcoded video corresponding to the same SEI sequence number;

or,

comparing the display timestamp of the source video and the display timestamp of the transcoded video to determine a first video frame of the source video and a second video frame of the transcoded video corresponding to the same display timestamp;

or,

comparing the identification code of the source video and the identification code of the transcoded video to determine a first video frame of the source video and a second video frame of the transcoded video corresponding to the same identification code.

10. The method according to claim 8 or 9, wherein the number of first video frames of the source video is the same as the number of second video frames of the transcoded video.

11. The method according to any one of claims 8 to 10, wherein there are a plurality of first video frames and a plurality of second video frames; and

the simultaneously playing and/or storing the first video frame of the source video and the second video frame of the transcoded video to complete the video alignment of the source video and the transcoded video further comprises:

sorting the first video frames of the source video and the second video frames of the transcoded video that have the same SEI information according to the order of video playback, and establishing a one-to-one correspondence; and

simultaneously playing and/or storing the first video frames and the second video frames having the correspondence to complete the video alignment of the source video and the transcoded video.

12. A video alignment device, comprising:

an acquisition module, adapted to acquire supplemental enhancement information (SEI) information in a source video;

a copy-and-write module, adapted to transcode the source video, and copy and write the SEI information to obtain a transcoded video; and

an alignment module, adapted to align the source video and the transcoded video according to the same SEI information in the source video and the transcoded video.

13. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other via the communication bus;

the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the video alignment method according to any one of claims 1 to 11.

14. A non-volatile computer-readable storage medium storing at least one executable instruction, wherein the executable instruction causes a processor to perform operations corresponding to the video alignment method according to any one of claims 1 to 11.

15. A computer program product, comprising a computer program stored on a non-volatile computer-readable storage medium, wherein the computer program comprises program instructions, and when the program instructions are executed by a processor, the processor is caused to perform operations corresponding to the video alignment method according to any one of claims 1 to 11.