CN107105255B

CN107105255B - Method and device for adding label in video file

Info

Publication number: CN107105255B
Application number: CN201610099403.5A
Authority: CN
Inventors: 杨江
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2016-02-23
Filing date: 2016-02-23
Publication date: 2020-03-03
Anticipated expiration: 2036-02-23
Also published as: CN107105255A

Abstract

The application provides a method and a device for adding a label to a video file. The method comprises: decoding the video file before the label is added to obtain video data; storing macroblock information in the video data; acquiring label data, and combining the video data and the label data to obtain new video data; and encoding the new video data to obtain a video file with the label added, wherein the encoding comprises predictive encoding, and the predictive encoding comprises: when the proportion of the video frame occupied by the added label is smaller than a preset value, or the currently encoded macroblock is a macroblock not occupied by the label, performing predictive coding using the stored macroblock information. The method can improve the processing speed.

Description

Method and device for adding label in video file

Technical Field

The present application relates to the field of video editing technologies, and in particular, to a method and an apparatus for adding a tag to a video file.

Background

In general, to add a tag to a video file encoded with H264, it is necessary to decode the video file to obtain each frame of data (ARGB or YUV pixel data), compute each frame of the tag on a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), combine each frame of the video file with the corresponding frame of the tag to obtain new frame data, and encode all of the new frame data to obtain a new video file.

In the related art, encoding includes a search for intra/inter macroblock information. This search involves a large amount of calculation and takes a long time, so the related art suffers from long processing times, which affects user experience.

Disclosure of Invention

The present application is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, an object of the present application is to provide a method for adding a tag in a video file, which can improve processing speed and further improve user experience.

Another object of the present application is to provide an apparatus for tagging video files.

To achieve the above object, an embodiment of a first aspect of the present application provides a method for adding a tag to a video file, including: decoding the video file before the label is added to obtain video data; storing macroblock information in the video data; acquiring label data, and combining the video data and the label data to obtain new video data; and encoding the new video data to obtain a video file with the label added, wherein the encoding comprises predictive encoding, and the predictive encoding comprises: when the proportion of the video frame occupied by the added label is smaller than a preset value, or the currently encoded macroblock is a macroblock not occupied by the label, performing predictive coding using the stored macroblock information.

According to the method for adding a label to a video file provided by the embodiment of the first aspect of the application, macroblock information is stored when the video file is decoded, and in certain cases the stored macroblock information is used for predictive coding, which avoids the time-consuming macroblock information search and reduces the processing time.

To achieve the above object, an apparatus for adding a tag to a video file according to an embodiment of a second aspect of the present application includes: a decoding module, configured to decode the video file before the label is added to obtain video data; a storage module, configured to store macroblock information in the video data; a merging module, configured to acquire label data and merge the video data and the label data to obtain new video data; and an encoding module, configured to encode the new video data to obtain a video file with the label added, where the encoding includes predictive encoding, and the predictive encoding includes: when the proportion of the video frame occupied by the added label is smaller than a preset value, or the currently encoded macroblock is a macroblock not occupied by the label, performing predictive coding using the stored macroblock information.

According to the device for adding a label to a video file provided by the embodiment of the second aspect of the application, macroblock information is stored when the video file is decoded, and in certain cases the stored macroblock information is used for predictive coding, which avoids the time-consuming macroblock information search and reduces the processing time.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flowchart of a method for adding a tag to a video file according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for adding a tag to a video file according to another embodiment of the present application;

FIG. 3 is a flow chart of H264 encoding and decoding;

FIG. 4 is a schematic diagram of NAL units resulting from H264 encoding;

FIG. 5 is a schematic diagram of the area occupied by a tag within a video frame;

FIG. 6 is a schematic structural diagram of an apparatus for adding a tag to a video file according to another embodiment of the present application;

FIG. 7 is a schematic structural diagram of an apparatus for adding a tag to a video file according to another embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar modules or modules having the same or similar functionality throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.

Fig. 1 is a flowchart illustrating a method for adding a tag to a video file according to an embodiment of the present application.

Referring to fig. 1, the method includes:

S11: Decode the video file before the label is added to obtain video data.

In the embodiment of the present application, an H264 codec is taken as an example, so that a video file before adding a tag may specifically refer to a video file encoded by using H264, and a tag needs to be added to the video file.

Each frame of data of the video file may be obtained after decoding the video file, and each frame of data obtained after decoding the video file may be referred to as video data in order to be distinguished from each frame of data of the subsequent tag.

S12: macroblock information in video data is stored.

Each frame data may include a plurality of macroblock information, and when stored, the corresponding macroblock information may be stored according to different scenes. For example, all macroblock information may be stored, or only macroblock information not occupied by a tag may be stored. For details, reference may be made to the following examples.

S13: Acquire the label data, and combine the video data and the label data to obtain new video data.

The content of the tag to be added to the video file may be set by a user, and then each frame of data of the tag is obtained through the CPU or GPU operation, where each frame of data of the tag may be referred to as tag data.

After the video data and the tag data are obtained, the video data and the tag data may be combined to combine each frame data of the video file and each frame data of the tag corresponding to each frame to obtain new frame data, which may be referred to as new video data.
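
The text does not specify how a video frame and a tag frame are combined. As a minimal sketch only, the following Python assumes a single-channel frame stored as a list of rows and a per-pixel alpha blend; the function name `merge_tag` and the `alpha` layout are hypothetical and are not taken from the application.

```python
def merge_tag(frame, tag, alpha, x, y):
    """Blend a tag image onto a copy of one frame at position (x, y).

    frame, tag: lists of rows of integer sample values (one channel).
    alpha: same shape as tag, opacity of each tag sample in [0.0, 1.0].
    The blend rule is an assumption made for illustration.
    """
    out = [row[:] for row in frame]  # leave the decoded frame untouched
    for ty, tag_row in enumerate(tag):
        for tx, tag_value in enumerate(tag_row):
            fy, fx = y + ty, x + tx
            # Ignore tag samples that fall outside the frame.
            if 0 <= fy < len(out) and 0 <= fx < len(out[0]):
                a = alpha[ty][tx]
                out[fy][fx] = round(a * tag_value + (1 - a) * out[fy][fx])
    return out


frame = [[100, 100], [100, 100]]
new_frame = merge_tag(frame, tag=[[200]], alpha=[[0.5]], x=1, y=0)
```

Only the samples covered by the tag change, which is why macroblocks outside the tag area can keep the prediction information obtained during decoding.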

S14: Encode the new video data to obtain a video file with the label added, wherein the encoding comprises predictive encoding, and the predictive encoding comprises: when the proportion of the video frame occupied by the added label is smaller than a preset value, or the currently encoded macroblock is a macroblock not occupied by the label, performing predictive coding using the stored macroblock information.

After the new video data is obtained, the new video data can be encoded, so that a new video file is obtained, and the new video file is the video file to which the label is added.

The encoding will typically include: predictive coding, transform coding, quantization and entropy coding, etc.

In the related art, an intra/inter macroblock information search process is generally used in predictive coding, so as to perform predictive coding according to the searched intra prediction information or inter prediction information.

In this embodiment, in certain cases the stored macroblock information is used directly for predictive coding, so no macroblock information search is needed. These cases are: the proportion of the video frame occupied by the added label is smaller than the preset value, or the currently encoded macroblock is not occupied by the label. The predictive coding may include a macroblock loop that encodes the macroblocks in sequence, so that for each one it can be determined whether the currently encoded macroblock is occupied by the label.

In this embodiment, macroblock information is stored when the video file is decoded, and in certain cases the stored macroblock information is used for predictive coding. This avoids the macroblock information search, whose large amount of calculation would otherwise consume a long time, and thereby reduces the processing time.

Fig. 2 is a flowchart illustrating a method for adding a tag to a video file according to another embodiment of the present application.

For better understanding of the present application, the related contents of the H264 codec will be described first.

The encoding and decoding flow of H264 is shown in fig. 3. Encoding begins with predictive coding, which is divided into intra-frame prediction and inter-frame prediction. For intra-frame coding, the optimal prediction mode must be searched for among the various prediction modes, and the difference between the predicted value and the actual pixel is then computed and passed to the subsequent stages. For inter-frame coding, motion estimation between the current frame and a reference frame yields a motion vector relative to the reference frame, motion compensation produces the predicted value, and the difference between the predicted value and the actual pixel is likewise computed and passed on. Transform coding is then applied to remove high-frequency signals from the prediction difference, and quantization further reduces the energy of the difference signal. Finally, reordering and entropy coding produce the final Network Abstraction Layer (NAL) units, which are stored in the video file.

For the decoding process, the inverse of the encoding is performed.
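
To make the cost of the intra-frame search concrete, the following is a minimal Python sketch, not the H264 procedure itself. It assumes a 4x4 block, only three candidate modes (vertical, horizontal, DC) and a sum-of-absolute-differences cost; the function names `predict`, `search_intra_mode` and `reuse_intra_mode` are hypothetical. The search evaluates every candidate mode, whereas reusing a stored mode needs a single prediction.

```python
def predict(mode, top, left):
    """Build an n x n prediction from the row above and the column to the left."""
    n = len(top)
    if mode == "vertical":      # copy the row above downwards
        return [top[:] for _ in range(n)]
    if mode == "horizontal":    # copy the left column across
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":            # rounded mean of all neighbouring samples
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")


def residual(block, pred):
    return [[b - p for b, p in zip(br, pr)] for br, pr in zip(block, pred)]


def sad(block, pred):
    """Sum of absolute differences, used here as the search cost (an assumption)."""
    return sum(abs(v) for row in residual(block, pred) for v in row)


def search_intra_mode(block, top, left, modes=("vertical", "horizontal", "dc")):
    """Full search: one prediction and one cost evaluation per candidate mode."""
    best = min(modes, key=lambda m: sad(block, predict(m, top, left)))
    return best, residual(block, predict(best, top, left))


def reuse_intra_mode(block, top, left, stored_mode):
    """Reuse a mode saved at decode time: a single prediction, no search."""
    return residual(block, predict(stored_mode, top, left))


top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [top[:] for _ in range(4)]   # every row repeats the row above
mode, res = search_intra_mode(block, top, left)
```

In this example the block repeats the row above it, so the search selects the vertical mode with an all-zero residual, and `reuse_intra_mode(block, top, left, "vertical")` reaches the same residual without evaluating the other modes.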

Referring to fig. 4, each NAL unit contains one or more slices, and each slice contains a slice header and slice data. The slice data consists of a series of consecutive encoded macroblocks (MBs). By type, macroblocks are divided into intra-prediction macroblocks, inter-prediction macroblocks, and non-prediction macroblocks (Skip MBs). As can be seen from the figure, intra-prediction and inter-prediction macroblocks carry different types of prediction information: the intra prediction information is the prediction mode (Intra Mode), and the inter prediction information is the reference frame (Reference Frame) and the motion vector (Motion Vector).

In a typical scenario, the tag added to the video file does not cover the whole frame image; more often, it occupies only a small area of the frame, as shown in fig. 5. There, the video frame is divided into 4x4 areas, where the unfilled portions represent areas not occupied by the tag and the filled portions represent areas occupied by the tag. In this example, the tag occupies 3/16 of the entire video frame. Based on this characteristic, the information obtained during decoding can be reused when encoding the new frame data, reducing certain operations in the encoding process.

Referring to fig. 2, the method includes:

S201: Determine the occupation proportion of the label in the video frame according to the coordinate information of the label.

Here, the tag may be set by a user, and therefore, coordinate information of the tag may be acquired from the setting information.

The occupation ratio is the ratio of the number of macroblocks occupied by the label to the total number of macroblocks in the video frame.

A macroblock is usually 16 × 16 pixels. The number of macroblocks occupied by the label can be calculated from the coordinate information of the label and the macroblock size, and the occupation proportion of the label in the video frame then follows from the total number of macroblocks in each video frame.
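
The calculation in S201 can be sketched as follows. This is an illustrative Python example that assumes the tag is an axis-aligned rectangle given as pixel coordinates `(x, y, w, h)`; the function names and the rectangle representation are assumptions, since the text only states that coordinate information is available.

```python
MB_SIZE = 16  # macroblock edge length in pixels, as stated in the text


def occupied_macroblocks(x, y, w, h, frame_w, frame_h, mb=MB_SIZE):
    """Return the (row, col) indices of all macroblocks touched by the tag."""
    # Clip the tag rectangle to the frame.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, frame_w), min(y + h, frame_h)
    if x0 >= x1 or y0 >= y1:
        return set()
    first_col, last_col = x0 // mb, (x1 - 1) // mb
    first_row, last_row = y0 // mb, (y1 - 1) // mb
    return {(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)}


def occupancy_ratio(x, y, w, h, frame_w, frame_h, mb=MB_SIZE):
    """Occupied macroblocks divided by the total macroblocks in the frame."""
    cols = -(-frame_w // mb)  # ceiling division
    rows = -(-frame_h // mb)
    occupied = occupied_macroblocks(x, y, w, h, frame_w, frame_h, mb)
    return len(occupied) / (rows * cols)


# A 64x64 frame holds 4x4 macroblocks; this tag covers three of them.
ratio = occupancy_ratio(x=16, y=48, w=48, h=16, frame_w=64, frame_h=64)
```

With these inputs the ratio is 3/16, the same proportion as in the fig. 5 example. A tag that straddles macroblock boundaries counts every macroblock it touches, so a 10x10 tag at (10, 10) occupies four macroblocks.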

S202: Judge whether the occupation ratio is larger than a preset value; if so, execute S203-S211, otherwise execute S212-S218.

The preset value is for example 25%.

After the occupancy ratio and the preset value are obtained, the two values can be compared to obtain a judgment result.

S203: Decode the video file before the label is added to obtain video data, and store the macroblock information in the video data that is not occupied by the label.

The macro block occupied by the label can be determined according to the coordinate information of the label, and then the information of the macro block not occupied by the label is obtained.

The macroblock information may include: intra prediction information and inter prediction information.

The Intra prediction information is specifically prediction Modes (Intra Modes).

The inter prediction information specifically includes: reference frames (Reference frames) and motion vectors (MotionVectors).
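
One possible in-memory form of the stored macroblock information is sketched below in Python. The record layout, the field names and the function `select_info_to_store` are hypothetical; the text only states which kinds of information are kept. The same function also covers the other branch (S212), in which all macroblock information is stored.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

MbIndex = Tuple[int, int]  # (row, col) of a macroblock within the frame


@dataclass(frozen=True)
class MacroblockInfo:
    """Prediction information recovered while decoding one macroblock."""
    intra_mode: Optional[int] = None                 # intra prediction mode
    ref_frame: Optional[int] = None                  # inter: reference frame
    motion_vector: Optional[Tuple[int, int]] = None  # inter: motion vector


def select_info_to_store(decoded: Dict[MbIndex, MacroblockInfo],
                         occupied: Set[MbIndex],
                         ratio: float,
                         preset: float = 0.25) -> Dict[MbIndex, MacroblockInfo]:
    """Keep only unoccupied macroblocks above the preset, otherwise keep all."""
    if ratio > preset:
        return {idx: info for idx, info in decoded.items()
                if idx not in occupied}
    return dict(decoded)


decoded = {
    (0, 0): MacroblockInfo(intra_mode=2),
    (0, 1): MacroblockInfo(ref_frame=0, motion_vector=(3, -1)),
    (1, 0): MacroblockInfo(intra_mode=0),
    (1, 1): MacroblockInfo(ref_frame=1, motion_vector=(0, 0)),
}
partial = select_info_to_store(decoded, occupied={(0, 0), (1, 0)}, ratio=0.5)
everything = select_info_to_store(decoded, occupied={(0, 0)}, ratio=0.25)
```

The default of 0.25 mirrors the example preset value of 25%; a ratio equal to the preset falls into the store-everything branch, matching the "less than or equal to" wording of the claims.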

S204: Acquire the label data, and combine the video data and the label data to obtain new video data.

After new video data is obtained, subsequent processes such as encoding and the like can be carried out.

S205: after the start of the macroblock loop coding, the currently coded macroblock is determined.

For example, each macroblock is sequentially treated as a currently encoded macroblock in a traversal manner.

S206: Judge whether the currently encoded macroblock is occupied by the label; if so, execute S207, otherwise execute S208.

S207: new macroblock information is calculated by intra and inter search processes and new video data is predictively encoded using the new macroblock information. Then S209 is executed.

When a macroblock is occupied by the label, its content has changed, so the macroblock information is searched again to improve accuracy.

For example, a new prediction mode can be obtained by searching intra-frame macroblock information and used for intra-frame predictive coding; likewise, a new reference frame and motion vector can be obtained by searching inter-frame macroblock information and used for inter-frame predictive coding.

S208: Perform predictive coding on the new video data using the stored macroblock information.

For example, the stored macroblock information includes intra prediction information (prediction mode) and inter prediction information (reference frame and motion vector); intra prediction coding may be performed according to the former and inter prediction coding according to the latter.

S209: the data after predictive coding is subjected to transform coding, quantization, entropy coding, and the like.

S210: Judge whether the macroblock loop coding is finished; if so, execute S211, otherwise repeat S205 and the subsequent steps.

For example, after all macroblocks are completely encoded, it is determined to end the macroblock cycle encoding, otherwise, the macroblock cycle encoding is continued.

S211: the macroblock loop coding ends.
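
The loop in S205-S211 can be summarized with the following Python sketch. It models only the choice between searching and reusing stored information; the `search` callback stands in for the intra/inter search, the later stages (transform coding, quantization, entropy coding) are left as a comment, and all names are hypothetical.

```python
def predictive_coding_pass(mb_indices, occupied, stored_info, search):
    """Choose prediction information for every macroblock of one frame.

    mb_indices:  macroblock indices in coding order
    occupied:    set of indices covered by the tag
    stored_info: mapping from index to information saved during decoding
    search:      callable(index) that performs the intra/inter search
    Returns the chosen information per macroblock and the number of searches.
    """
    decisions = {}
    searches = 0
    for idx in mb_indices:           # S205: next currently encoded macroblock
        if idx in occupied:          # S206: occupied by the tag?
            info = search(idx)       # S207: search for new information
            searches += 1
        else:
            info = stored_info[idx]  # S208: reuse the stored information
        decisions[idx] = info
        # S209: transform coding, quantization and entropy coding go here.
    return decisions, searches       # S210/S211: all macroblocks processed


mbs = [(r, c) for r in range(4) for c in range(4)]
occupied = {(3, 1), (3, 2), (3, 3)}
stored = {idx: ("stored", idx) for idx in mbs if idx not in occupied}
decisions, searches = predictive_coding_pass(
    mbs, occupied, stored, search=lambda idx: ("searched", idx))
```

Here only the three occupied macroblocks trigger a search. Calling the same function with an empty `occupied` set and information stored for every macroblock corresponds to the other branch (S214-S218), where no search is performed at all.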

S212: Decode the video file before the label is added to obtain video data, and store all macroblock information in the video data.

S213: Acquire the label data, and combine the video data and the label data to obtain new video data.

S214: after the start of the macroblock loop coding, the currently coded macroblock is determined.

S215: Perform predictive coding on the new video data using the stored macroblock information.

S216: the data after predictive coding is subjected to transform coding, quantization, entropy coding, and the like.

S217: Judge whether the macroblock loop coding is finished; if so, execute S218, otherwise repeat S214 and the subsequent steps.

S218: the macroblock loop coding ends.

In S208 and S215 described above, omitting the intra/inter macroblock information search can significantly reduce the encoding time. Experiments show that encoding a video takes about twice as long as decoding it, that predictive coding accounts for about 70% of the whole encoding time, and that the predictive coding time is spent mainly on searching for macroblock information.

When the proportion of macroblocks occupied by the label is smaller than the preset value, all macroblock information searches are omitted, which by rough estimate improves the performance of the whole label-adding process by about 50%. Because the influence of the label on the macroblock information is ignored in this case, the residual of the predictive coding may increase, which may somewhat increase the size of the final encoded bitstream.

When the proportion of macroblocks occupied by the label is larger than the preset value, predictive coding depends on whether each macroblock is occupied. For macroblocks not occupied by the label, using the macroblock information stored during decoding effectively reduces the encoding time. For macroblocks occupied by the label, the macroblock information is searched again, which keeps the residual of the final predictive coding optimal, so that it does not increase compared with the conventional encoding process.
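
The rough 50% figure can be checked with a short calculation. The Python sketch below uses only the two ratios quoted in the text and adds two simplifying assumptions that the text does not state: the time to compute and merge the tag frames is neglected, and all of the predictive coding time is treated as search time. The result is therefore an upper bound, and the function name is hypothetical.

```python
def estimated_saving(encode_to_decode=2.0, predictive_share=0.7):
    """Upper bound on the fraction of total time saved by skipping all searches.

    encode_to_decode: encoding time relative to decoding time (about 2)
    predictive_share: share of encoding time spent on predictive coding (about 0.7)
    """
    decode = 1.0                       # take decoding time as the unit
    encode = encode_to_decode * decode
    total = decode + encode            # merge time neglected (assumption)
    saved = predictive_share * encode  # all of it counted as search (assumption)
    return saved / total


saving = estimated_saving()
```

With the quoted figures this gives 1.4 / 3, or about 47% of the total time, which is in line with the rough estimate of about 50%.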

In this embodiment, macroblock information is stored when the video file is decoded, and in certain cases the stored macroblock information is used for predictive coding. Because the macroblock information search involves a large amount of calculation and takes a long time, using the stored macroblock information directly, without searching, can significantly reduce the processing time. Furthermore, choosing between predictive coding with the stored macroblock information and searching for new macroblock information according to the situation balances computation against accuracy and meets the requirements of different scenarios.

Fig. 6 is a schematic structural diagram of an apparatus for adding a tag to a video file according to another embodiment of the present application. Referring to fig. 6, the apparatus 60 includes: a decoding module 61, a storage module 62, a merging module 63 and an encoding module 64.

The decoding module 61 is configured to decode the video file before the tag is added, so as to obtain video data.

The storage module 62 is configured to store macroblock information in the video data.

Each frame data may include a plurality of macroblock information, and when stored, the corresponding macroblock information may be stored according to different scenes. For example, all macroblock information may be stored, or only macroblock information not occupied by a tag may be stored.

The merging module 63 is configured to acquire the tag data, and merge the video data and the tag data to obtain new video data.

The encoding module 64 is configured to encode the new video data to obtain a video file with a tag added, where the encoding includes predictive encoding, and the predictive encoding includes: when the proportion of the video frame occupied by the added label is smaller than a preset value, performing predictive coding using the stored macroblock information; or, where the predictive coding includes macroblock loop coding that encodes the macroblocks in sequence, when the currently encoded macroblock is not occupied by the label, performing predictive coding using the stored macroblock information.

In some embodiments, referring to fig. 7, the apparatus 60 further comprises:

a determining module 65, configured to determine the occupation proportion of the tag in the video frame according to the coordinate information of the tag.

The macroblock size is usually 16 × 16, the number of macroblocks occupied by the tag can be calculated according to the coordinate information of the tag and the size of each macroblock, and then the occupation proportion of the tag in the video frame can be determined according to the total number of macroblocks of each video frame.

Correspondingly, the storage module 62 is specifically configured to:

if the occupation ratio is larger than a preset value, storing macro block information which is not occupied by the label in the video data;

and if the occupation ratio is less than or equal to a preset value, storing all macro block information in the video data.

The preset value is for example 25%.

And after the judgment result is obtained, storing the unoccupied macro block information or storing all the macro block information according to the judgment result.

In some embodiments, the predictive coding performed by the coding module 64 includes:

if the occupation ratio is larger than a preset value, judging whether the current coded macro block is occupied by the label or not after the macro block cyclic coding starts; and if the current coding macro block is not occupied by the label, adopting the stored macro block information to carry out predictive coding on the new video data.

On the other hand, if the currently encoded macroblock is occupied by a tag, new macroblock information is calculated through intra and inter search processes, and new video data is predictively encoded using the new macroblock information.

and if the occupation ratio is smaller than or equal to a preset value, performing predictive coding on new video data by adopting the stored macro block information after the macro block cyclic coding starts.

The specific contents of each module in this embodiment may be referred to in the related description of the above embodiments, and are not described in detail here.

It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. A method for adding a label to a video file, comprising:

decoding the video file before adding the label to obtain video data;

determining the occupation proportion of the label in the video frame according to the coordinate information of the label;

storing macroblock information in video data, wherein the macroblock information comprises intra-frame prediction information and inter-frame prediction information, if the occupation ratio is greater than a preset value, storing macroblock information which is not occupied by a label in the video data, and if the occupation ratio is less than or equal to the preset value, storing all macroblock information in the video data;

acquiring label data, and combining the video data and the label data to obtain new video data;

encoding the new video data to obtain a video file with the label added, wherein the encoding comprises predictive encoding, and the predictive encoding comprises: when the proportion of the video frame occupied by the added label is less than or equal to the preset value, or the currently encoded macroblock is a macroblock not occupied by the label, performing predictive coding using the stored macroblock information.

2. The method of claim 1, wherein if the occupancy is greater than a predetermined value, the predictive coding comprises:

after the macro block circular coding starts, judging whether the current coded macro block is occupied by the label;

and if the current coding macro block is not occupied by the label, adopting the stored macro block information to carry out predictive coding on the new video data.

3. The method of claim 2, further comprising:

if the current coding macro block is occupied by the label, calculating new macro block information through the intra-frame and inter-frame search process, and adopting the new macro block information to carry out predictive coding on new video data.

4. The method of claim 1, wherein if the occupancy is less than or equal to a predetermined value, the predictive coding comprises:

and after the macro block circular coding is started, performing predictive coding on new video data by using the stored macro block information.

5. An apparatus for tagging video files, comprising:

the decoding module is used for decoding the video file before the label is added to obtain video data;

the determining module is used for determining the occupation proportion of the label in the video frame according to the coordinate information of the label;

a storage module, configured to store macroblock information in video data, where the macroblock information includes intra-frame prediction information and inter-frame prediction information, and the storage module is specifically configured to: if the occupation ratio is larger than a preset value, storing macro block information which is not occupied by the label in the video data, and if the occupation ratio is smaller than or equal to the preset value, storing all macro block information in the video data;

the merging module is used for acquiring the label data, merging the video data and the label data and obtaining new video data;

an encoding module, configured to encode the new video data to obtain a video file with the label added, where the encoding includes predictive encoding, and the predictive encoding includes: when the proportion of the video frame occupied by the added label is less than or equal to the preset value, or the currently encoded macroblock is a macroblock not occupied by the label, performing predictive coding using the stored macroblock information.

6. The apparatus of claim 5, wherein the predictive coding performed by the coding module comprises:

if the occupation ratio is larger than a preset value, judging whether the current coded macro block is occupied by the label or not after the macro block cyclic coding starts; if the current coding macro block is not occupied by the label, the stored macro block information is adopted to carry out predictive coding on the new video data;