CN111225277A - Transcoding method, transcoding device and computer readable storage medium

Info

Publication number: CN111225277A
Application number: CN201811427461.1A
Authority: CN
Inventors: 王晓楠; 闻兴; 郑云飞; 陈宇聪; 黄跃; 陈敏; 蔡砚刚; 于冰
Original assignee: Reach Best Technology Co Ltd
Current assignee: Reach Best Technology Co Ltd; Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2020-06-02
Also published as: WO2020108033A1

Abstract

The application relates to a transcoding method, a transcoding device and a computer-readable storage medium. The transcoding method comprises the following steps: decoding an original video through a decoder to obtain a video frame sequence of the original video and coding information of the original video, wherein the original video is the video to which image-text information needs to be added; adding the image-text information to the video frame sequence to obtain the video frame sequence with the image-text information added; and encoding, through an encoder, the video frame sequence with the image-text information added by using the coding information to obtain a new video. In the transcoding method, the encoder encodes the video frame sequence with the image-text information added based on the coding information obtained by the decoder during decoding, which alleviates the technical problems that the traditional transcoding mode of full decoding and full encoding is time-consuming and prone to quality loss.

Description

Transcoding method, transcoding device and computer readable storage medium

Technical Field

The present application belongs to the field of computer software applications, and in particular, relates to a transcoding method, a transcoding device, and a computer-readable storage medium.

Background

In live video and video-on-demand applications, an original video stream often needs to be transcoded to meet various user requirements; transcoding is the process of first decoding and then re-encoding an original compressed video stream. At present, users widely demand that image-text information be added to a certain area of an encoded video (an area that is fixed or variable relative to the display screen), for example: watermark pictures, subtitles, picture-in-picture, and the magic expressions and stickers now common in live broadcasting.

In the related art, when a media file is transcoded, the compressed original video stream is first decoded into video in a raw video format, image-text information and the like are then superimposed on a specific region of the video, and the result is re-encoded. This transcoding method is in effect full decoding followed by full encoding. It mainly has the following disadvantages:

first, full decoding and full encoding involve a large amount of computation, so the processor has a heavier workload and encoding takes longer;

second, across the primary encoding and the secondary encoding of the video, the encoders used may differ, or the encoding parameters adopted may differ, so that parameters such as resolution and code rate (bit rate) of the original video and the transcoded new video are inconsistent. As a result, the image definition of the transcoded new video is reduced compared with the original video, or its playback smoothness is weakened, and the video quality is damaged.

Disclosure of Invention

To address the problems in the related art, the present application discloses a transcoding method, a transcoding apparatus, and a computer-readable storage medium.

In a first aspect, an embodiment of the present invention provides a transcoding method, including:

decoding an original video through a decoder to obtain a video frame sequence of the original video and coding information of the original video, wherein the original video is a video needing to be added with image-text information;

adding the image-text information to the video frame sequence to obtain the video frame sequence added with the image-text information;

and encoding the video frame sequence added with the image-text information by using the encoding information through an encoder to obtain a new video.

Optionally, the coding information of the original video includes frame information, slice information and coding unit information of the original video, wherein the frame information is video characteristic data of each image frame of the original video, the slice information is a coding parameter of each slice of the original video, and the coding unit information is a coding parameter of a first basic coding unit constituting each image frame of the original video.

Optionally, the encoding, by an encoder, of the video frame sequence to which the image-text information is added by using the coding information includes:

initializing the encoder using the frame information;

dividing the video frame sequence added with the image-text information into a second basic coding unit;

and encoding the second basic coding unit according to the slice information and the coding unit information by the initialized encoder.

Optionally, the encoding of the second basic coding unit according to the slice information and the coding unit information includes:

acquiring position information of each second basic coding unit;

sequentially judging whether the second basic coding unit is related to the coverage area of the image-text information or not based on the position information to obtain a judgment result;

determining, according to the judgment result, whether to encode the second basic coding unit using the slice information and the coding unit information.

Optionally, the sequentially judging, based on the position information, whether the current second basic coding unit is related to the coverage area of the image-text information includes:

judging whether the current second basic coding unit satisfies any one of a first condition and a second condition, wherein,

the first condition is that the current second basic coding unit is located in the area covered by the image-text information;

the second condition is that the current second basic coding unit is in inter mode and either of the following is satisfied: it references the image of the area covered by the image-text information; or its motion vector prediction is affected by a target basic coding unit, wherein the target basic coding unit is a second basic coding unit that is adjacent to the current second basic coding unit and has been judged to be related to the coverage area.

Optionally, the determining, according to the judgment result, whether to encode the second basic coding unit using the slice information and the coding unit information includes:

in the case where the judgment result is that the current second basic coding unit is not related to the area covered by the image-text information, encoding the second basic coding unit using the slice information and the coding unit information.
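As a non-limiting illustration, the judgment described above may be sketched in Python as follows. The `Unit` structure, its field names, the rectangle-overlap test used for the first condition, and the label returned for the related case are all assumptions made for this sketch; the text above only specifies that an unrelated unit is encoded using the slice information and the coding unit information.

```python
from dataclasses import dataclass
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class Unit:
    """A second basic coding unit (hypothetical structure for illustration)."""
    rect: Rect
    inter_mode: bool = False                 # coded in inter mode
    refs_overlay_area: bool = False          # references the image of the covered area
    mvp_from_related_neighbor: bool = False  # MV prediction affected by a related neighbor


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def is_related(unit: Unit, overlay: Rect) -> bool:
    """First condition OR second condition, as listed above."""
    first = overlaps(unit.rect, overlay)  # assumption: any overlap counts as "located in"
    second = unit.inter_mode and (
        unit.refs_overlay_area or unit.mvp_from_related_neighbor
    )
    return first or second


def choose_path(unit: Unit, overlay: Rect) -> str:
    """Unrelated units reuse the original coding information.

    The "encoder_decision" label for related units is an assumption of this sketch.
    """
    return "encoder_decision" if is_related(unit, overlay) else "reuse_original_info"
```

For example, with a watermark covering `(32, 32, 100, 50)`, a unit at `(0, 0, 64, 64)` is related, while an intra-coded unit at `(256, 0, 64, 64)` is not and can reuse the original coding information.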

Optionally, the coding standard of the original video is HEVC, the frame information is stored in the video header information of the original video, and the first basic coding unit is a coding tree unit.

Optionally, the header information of the original video further includes a video parameter set, a sequence parameter set, and a picture parameter set.

Optionally, the encoding standard of the original video is VP9, the frame information of the original video is included in the header information of each image frame, and the first basic coding unit is a super block.

Optionally, the image-text information includes at least one of a picture watermark, an audio watermark, a subtitle, a bullet screen, a picture-in-picture, a sticker, and a magic expression.

In a second aspect, an embodiment of the present invention provides a transcoding device, including:

the acquisition module is used for decoding an original video through a decoder to acquire a video frame sequence of the original video and coding information of the original video, wherein the original video is a video needing to be added with image-text information;

the adding module is used for adding the image-text information to the video frame sequence to obtain the video frame sequence added with the image-text information;

and the coding module is used for coding the video frame sequence added with the image-text information by using the coding information through a coder to obtain a new video.

Optionally, the coding information of the original video includes frame information, slice information and coding unit information of the original video, wherein the frame information is video characteristic data of each image frame of the original video, the slice information is a coding parameter of each slice of the original video, and the coding unit information is a coding parameter of a first basic coding unit constituting each image frame of the original video.

Optionally, the encoding module comprises:

the second acquisition unit is used for acquiring the frame information, the slice information and the coding unit information of the original video from the coding information of the original video and acquiring the video frame sequence added with the image-text information;

an initialization unit for initializing the encoder using the frame information;

the dividing unit is used for dividing the video frame sequence added with the image-text information into a second basic coding unit;

and the coding unit is used for coding the second basic coding unit according to the slice information and the coding unit information through the initialized coder.

Optionally, the encoding unit includes:

an obtaining subunit, configured to obtain position information of each of the second basic coding units;

a judging subunit, configured to sequentially judge, based on the position information, whether the current second basic coding unit is related to a coverage area of the image-text information, so as to obtain a judgment result;

a determining subunit configured to determine, according to the determination result, whether to encode the second basic coding unit using the slice information and the coding unit information.

Optionally, the judging subunit is configured to:

judge whether the current second basic coding unit satisfies any one of a first condition and a second condition, wherein,

the first condition is that the current second basic coding unit is located in the area covered by the image-text information;

the second condition is that the current second basic coding unit is in inter mode and either of the following is satisfied: it references the image of the coverage area; or its motion vector prediction is affected by a target basic coding unit, wherein the target basic coding unit is a second basic coding unit that is adjacent to the current second basic coding unit and has been judged to be related to the coverage area.

Optionally, the determining subunit is configured to:

in the case where the judgment result is that the current second basic coding unit is not related to the coverage area of the image-text information, encode the second basic coding unit using the slice information and the coding unit information.

In a third aspect, an embodiment of the present invention provides a transcoding device, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the transcoding method of any of the above.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions, which when executed implement the transcoding method in the first aspect.

In a fifth aspect, embodiments of the present invention provide a computer program product, including a computer program, the computer program including program instructions which, when executed by an electronic device, cause the electronic device to perform the transcoding method of any one of the above.

The technical scheme provided by the embodiment of the application can have the following beneficial effects:

the transcoding method provided by the embodiment of the invention decodes the original video through the decoder to obtain the video frame sequence of the original video and the coding information of the original video, wherein the original video is the video to which the image-text information needs to be added; adding image-text information to the video frame sequence to obtain the video frame sequence added with the image-text information; and coding the video frame sequence added with the image-text information by using the coding information through a coder to obtain a new video.

According to the transcoding method, the decoder acquires the coded information in the decoding process, and the coded information is acquired conveniently and quickly; the encoder encodes the video frame sequence added with the image-text information based on the encoding information, reduces the time consumed by encoding decision calculation, ensures the consistency of the new video and the original video on information such as resolution, code rate, frame rate and the like, and greatly improves the image quality of the new video, thereby relieving the technical problems of long time consumption and easy quality damage of the traditional transcoding method.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

Fig. 1 is a schematic diagram of a conventional transcoding approach;

Fig. 2 is a flowchart illustrating a transcoding method according to an embodiment;

Fig. 3 is a flowchart of a method for encoding, using coding information, a video frame sequence to which image-text information has been added, according to an embodiment;

Fig. 4 is a flowchart illustrating a method for encoding a second basic coding unit using slice information and coding unit information according to an embodiment;

Fig. 5 illustrates a mapping of coding information in an exemplary embodiment;

Fig. 6 is a block diagram illustrating a structure of a transcoding device according to a second embodiment;

Fig. 7 is a block diagram illustrating a structure of a transcoding apparatus according to a third embodiment;

Fig. 8 is a block diagram illustrating a structure of another transcoding apparatus according to a third embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

At present, to meet users' need to add image-text information to an encoded video, the transcoding method in the related art decodes the source stream of the original video into video in a raw video format, for example YUV (luminance and chrominance) format, superimposes the image-text information on a specific area of the video, and then encodes it again. Fig. 1 is a schematic diagram of a conventional transcoding method. Referring to Fig. 1, the conventional transcoding flow is as follows:

firstly, a decoder decodes a source stream needing to be added with image-text information into a video frame sequence in a YUV format;

then, image-text information is superposed on each video frame in the YUV format to which the image-text information needs to be added, and a YUV video frame sequence added with the image-text information is generated;

and finally, the YUV video frame sequence added with the image-text information enters an encoder, is encoded again, and generates a transcoding stream to form a new video.
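The superposition step of this flow can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the luminance plane is held as a list of rows of 0-255 integers and that the image-text information is alpha-blended onto it; the function name `overlay_luma` and the blending rule are hypothetical and are not taken from the application.

```python
def overlay_luma(frame, mark, x0, y0, alpha=0.5):
    """Blend `mark` onto a copy of `frame` with its top-left corner at (x0, y0).

    `frame` and `mark` are lists of rows of 0-255 luminance values.
    Pixels of `mark` that fall outside `frame` are ignored.
    """
    out = [row[:] for row in frame]  # leave the decoded frame untouched
    for dy, mark_row in enumerate(mark):
        for dx, m in enumerate(mark_row):
            y, x = y0 + dy, x0 + dx
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = round((1 - alpha) * out[y][x] + alpha * m)
    return out
```

Only the pixels under the overlay change; every other pixel of the frame is identical before and after this step, which is what makes reuse of the original coding information outside the covered area plausible.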

The transcoding mode needs full decoding and full encoding, which is time-consuming.

In addition, the transcoding method of full codec has the following problems:

(1) A change in the Group of Pictures (GOP) structure of a video may change some characteristics of the video, such as image frame delay. For example, the GOP structures of different video streams differ because different video encoders, or different encoder parameters, were used. At the transcoding end, it is difficult for the program to determine the GOP structure of the source stream: because GOP lengths vary, it is hard to handle different video streams differently, and transcoding all of them with the same encoding parameters destroys the GOP structure of the source stream.

In many practical applications (e.g., live and on-demand scenes), the length of the GOP determines the delay of the image frames, and thus, a change in the GOP structure causes a change in the delay of the image frames.

In addition, to improve the overall quality of the video stream, the encoder usually assigns a smaller quantization parameter (QP) to I frames to maintain higher image quality, a larger QP to P frames, and the largest QP to B frames, which therefore have the relatively lowest image quality. A change in the GOP structure means that the same frame may be coded with different frame types in the new video and the original video. For example, an I frame of the source stream may become a P frame or even a B frame, and an original P or B frame may be coded as an I frame in the transcoded stream, thereby compromising the overall quality of the video stream.

(2) The code rate (bit rate) of the transcoded stream cannot be kept consistent with that of the source stream at every moment. For example, in the HEVC standard the bit rate is not declared in the video header information and is non-standard data. In practical applications, the metadata of the mp4 format may contain bit rate data, but many bitstreams carry no video bit rate data or carry an erroneous value, so the metadata cannot provide a reliable bit rate. Moreover, most current bitstreams are encoded with ABR (average bit rate) rate control, under which the bit rate changes in real time, so it is also very difficult to monitor the bit rate of the source stream and notify the encoder of the changes in real time. Meanwhile, because the GOP structure changes, the same frame may use different frame types in the new video and the original video, which makes it still harder to keep the bit rate of the transcoded stream consistent with that of the source stream at every moment.
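The two problems above can be made concrete with a small Python sketch. The frame-type strings, the fixed-GOP re-encoder model and the frame sizes are invented for illustration only; the helper names are hypothetical and do not describe any particular encoder.

```python
def fixed_gop_types(n_frames, gop_len):
    """Frame types from a re-encoder that forces an I frame every `gop_len` frames."""
    return ["I" if i % gop_len == 0 else "P" for i in range(n_frames)]


def type_mismatches(source_types, new_types):
    """Indices at which the same frame is coded with a different frame type."""
    return [i for i, (s, n) in enumerate(zip(source_types, new_types)) if s != n]


def bitrate_per_second(frame_bytes, fps):
    """Bits carried by each consecutive window of `fps` frames.

    A trailing partial window is reported as-is, without scaling.
    """
    return [
        sum(frame_bytes[i:i + fps]) * 8
        for i in range(0, len(frame_bytes), fps)
    ]


# Source stream with GOP length 4, re-encoded with a fixed GOP length of 6.
source = list("IPPPIPPPIPPP")
print(type_mismatches(source, fixed_gop_types(len(source), 6)))  # [4, 6, 8]

# Under average-bit-rate control the per-second rate is not constant.
print(bitrate_per_second([1000] * 4 + [3000] * 4, fps=4))  # [32000, 96000]
```

In this toy case frames 4 and 8 lose their I-frame status and frame 6 gains it, and the per-second bit rate of the second window is three times that of the first even though the stream has a single average bit rate.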

In conclusion, the transcoding mode of full-coding and full-decoding can cause inconsistency of resolution, code rate and the like of the original video and the new video, so that the image definition of the new video is reduced to some extent, or the video smoothness performance is weakened to some extent, and the video quality is damaged.

Based on this, the transcoding method, the transcoding device and the computer readable storage medium provided by the embodiment of the invention solve the technical problems that the transcoding method for full decoding and full encoding consumes long time and the quality is easily damaged.

For the purpose of facilitating an understanding of the present embodiments, reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings.

An embodiment of the present invention provides a transcoding method, as shown in fig. 2, including:

step S102, decoding an original video through a decoder to obtain a video frame sequence of the original video and coding information of the original video, wherein the original video is the video needing to be added with image-text information;

step S104, adding image-text information to the video frame sequence to obtain the video frame sequence added with the image-text information;

and step S106, coding the video frame sequence added with the image-text information by using the coding information through a coder to obtain a new video.
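Steps S102 to S106 can be outlined in Python as follows. This is a minimal sketch in which the decoder, the overlay step and the encoder are passed in as callables; `transcode` and the three `toy_*` stand-ins are hypothetical names used only to make the data flow runnable, and they perform no real video coding.

```python
def transcode(original, decode, add_overlay, encode):
    """Data flow of steps S102, S104 and S106."""
    frames, coding_info = decode(original)        # S102: frames + coding information
    marked = [add_overlay(f) for f in frames]     # S104: add image-text information
    return encode(marked, coding_info)            # S106: encode using the coding information


# Toy stand-ins (assumptions): a "video" is a dict of raw frames plus its coding info.
def toy_decode(video):
    return video["frames"], video["coding_info"]


def toy_overlay(frame):
    out = [row[:] for row in frame]
    out[0][0] = 255  # mark the top-left pixel
    return out


def toy_encode(frames, coding_info):
    return {"frames": frames, "coding_info": dict(coding_info)}


original = {"frames": [[[0, 0], [0, 0]]], "coding_info": {"width": 2, "height": 2}}
new_video = transcode(original, toy_decode, toy_overlay, toy_encode)
```

The point of the sketch is that the coding information produced in S102 is handed to the encoder in S106 instead of being discarded, as it would be in the conventional flow of Fig. 1.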

It should be noted that the encoder and the decoder are two different functional modules, where the encoder is used for encoding a video frame sequence, and the decoder is used for decoding an original video. The encoder and the decoder may be two separate devices, or may be two functional modules integrated in one device housing, and the encoder and the decoder are not limited in the embodiment of the present invention.

In the embodiment of the invention, the decoder acquires the coding information of the original video in the decoding process, and the acquisition of the coding information is convenient and quick; the encoder encodes the video frame sequence added with the image-text information based on the encoding information obtained by the decoder, so that the time consumed by encoding decision calculation is reduced, the consistency of the new video and the original video on information such as resolution, code rate, frame rate and the like is ensured by utilizing the encoding information of the original video, the image quality of the new video is greatly improved, and the technical problems of long time consumption and easily damaged quality of the traditional transcoding method are solved.

It should be noted that in the embodiment of the present invention, the image-text information that needs to be added to the video frame sequence of the original video includes, but is not limited to, a picture watermark, an audio watermark, a subtitle, a bullet screen, a picture-in-picture, a sticker, and a magic expression.

In step S102, the decoder decodes the original video to obtain a video frame sequence and encoding information of the original video. The video frame sequence and the coding information can be stored in different positions of the original video, and the decoder can obtain the video frame sequence and the coding information of the original video based on one or more parsing processes.

Further, the encoding information of the original video includes frame information, slice information and encoding unit information of the original video, where the frame information is video feature data of an image frame of the original video, such as basic feature data of a video width and the like, the slice information is an encoding parameter of each slice of the original video, and the encoding unit information is an encoding parameter of a first basic encoding unit constituting each image frame of the original video. Each image frame may be divided into a plurality of slices, and each slice may be divided into a plurality of base coding units. The storage structures and storage locations in the original video of the frame information, slice information, and coding unit information may be different in different coding standards, and the terms used to represent the basic coding unit may be different.
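One way to picture the three levels of coding information is as nested records. The following Python sketch is illustrative only: the class names, the generic `params` dictionaries and the example parameter names `qp` and `depth` are assumptions, since the actual fields depend on the coding standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FrameInfo:
    """Video characteristic data of an image frame, e.g. width and height."""
    width: int
    height: int


@dataclass
class SliceInfo:
    """Coding parameters of one slice (contents are standard-specific)."""
    params: Dict[str, int] = field(default_factory=dict)


@dataclass
class CodingUnitInfo:
    """Coding parameters of one first basic coding unit."""
    params: Dict[str, int] = field(default_factory=dict)


@dataclass
class CodingInfo:
    """Coding information handed from the decoder to the encoder."""
    frame: FrameInfo
    slices: List[SliceInfo] = field(default_factory=list)
    units: List[CodingUnitInfo] = field(default_factory=list)


info = CodingInfo(
    frame=FrameInfo(width=1920, height=1080),
    slices=[SliceInfo({"qp": 30})],
    units=[CodingUnitInfo({"depth": 2})],
)
```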

The above is described below for the HEVC standard and the VP9 standard, respectively.

In the HEVC standard, the frame information is the characteristic data of the image frame, and the basic coding unit is a coding tree unit. For the HEVC standard, the frame information is stored in the video header information of the original video. The slice information is information of the slices constituting the image frames of the original video, and the coding parameters of the first slice of the first image frame may be adopted. The coding unit information is the coding parameters of the coding tree units constituting the image frames of the original video, and the coding parameters of the first coding tree unit in the first slice may be used.

It should be noted that:

(1) the video header information, as the video characteristic data of the original video, is the most important video information; it comprises basic characteristic data of the original video, such as width and height, which are usually used when the encoder is initialized;

(2) the slice information is the header information of the slice. A slice is an image division unit in High Efficiency Video Coding (HEVC); one image frame may be divided into multiple slices or coded as a single slice, and in many cases one image frame is coded as one slice to simplify encoding and decoding. The slice header contains some of the coding parameters used by the slice to configure the encoding.

(3) A Coding Tree Unit (CTU) is the basic unit of HEVC video coding; the CTU size may range from 16x16 to 64x64, and one slice may include one or more CTUs. The coding tree unit information is the coding parameters used by the coding tree unit.

The video header information, the slice information and the coding tree unit information form the coding information of an original video in the HEVC standard, and this information can be conveniently obtained while parsing the original video. In addition, the video header information, the slice information and the coding tree unit information describe fairly completely the parameters used by the encoder when the HEVC original video was encoded. Therefore, in step S106, after the encoder uses the coding information to encode the video frame sequence to which the image-text information has been added, the resulting new video remains well consistent with the original video, and the quality of the new video is not damaged.

For an original video in the HEVC standard, the video header information may include: a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), and a Picture Parameter Set (PPS).

That is, in the process of decoding the original video into the video frame sequence, the original video is parsed, and the obtained frame information is contained in the video parameter set, the sequence parameter set, and the picture parameter set. Therefore, when the video frame sequence to which the image-text information has been added is encoded again, the encoder can refer to data such as the video parameter set, the sequence parameter set and the picture parameter set, and can better restore the video characteristics of the original video.

In one possible embodiment, the PPS includes different setting information for each frame of image, and the setting information mainly includes: bootstrap information, initial picture control information (such as initial QP), blocking information. All the PPS are inactive at the beginning of decoding, and at most one PPS is active at any time of decoding. When a certain PPS is referenced by a certain portion of the code stream, the PPS is activated, called the active PPS, until another PPS is activated.
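The activation rule described above (no PPS active at the start, at most one active at any time, activation on reference) can be sketched as a small bookkeeping class. The class name `PpsTable`, its methods and the `init_qp` field are hypothetical and serve only to illustrate the rule; they are not part of any decoder API.

```python
class PpsTable:
    """Tracks stored picture parameter sets and the single active one."""

    def __init__(self):
        self._stored = {}
        self.active_id = None  # all PPSs are inactive when decoding starts

    def store(self, pps_id, params):
        """Record a PPS received in the bitstream without activating it."""
        self._stored[pps_id] = params

    def activate(self, pps_id):
        """Called when part of the bitstream references `pps_id`.

        The referenced PPS becomes the active PPS and implicitly replaces
        whichever PPS was active before.
        """
        if pps_id not in self._stored:
            raise KeyError(f"PPS {pps_id} has not been received")
        self.active_id = pps_id
        return self._stored[pps_id]


table = PpsTable()
table.store(0, {"init_qp": 26})
table.store(1, {"init_qp": 30})
first = table.activate(0)   # PPS 0 is now the active PPS
second = table.activate(1)  # PPS 1 replaces PPS 0 as the active PPS
```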

The SPS provides information needed by all slices in the video sequence, and its content may include: decoding-related information such as profile, level, resolution, and number of sub-layers; enable flags for functions within a given profile and the parameters of those functions; constraint information on structure and transform coefficient coding flexibility; temporal scalability information.

The VPS describes the overall structure of the coded video sequence, including temporal sub-layer dependencies and the like. The main purpose of adding this structure to HEVC is to accommodate extensions of the standard to multiple sub-layers at the system level. For a given video sequence, the sub-layers share one VPS regardless of whether their SPSs differ. The VPS mainly contains the following information: syntax elements shared by multiple sub-layers or operation points; key session information such as profile and level; other operation-point-specific information not belonging to the SPS.

Unlike HEVC, VP9 has no video header information such as VPS/SPS/PPS; it has only header information at the image frame level, and each image frame has an uncompressed header and a compressed header. Accordingly, the corresponding frame information is stored in the uncompressed header. In addition, the uncompressed header also contains some other information, such as part of the information that HEVC carries in the SPS, the PPS, and the slice information. The compressed header is a probability table for entropy coding of each syntax element of the current frame. The coding information that can be derived from the image frame header information in VP9 therefore includes the frame information and the slice information. The frame information is the basic characteristic information of the video image.

In addition, the VP9 standard employs a coding hierarchy of image frame / slice / super block / block. An image frame is divided into super blocks of 64x64 size, and slices are divided along super block boundaries in the manner stated in the uncompressed header. The super block (SB) is the basic coding unit of VP9 video coding, and each SB can be recursively divided into blocks in the form of a quadtree. In the present application, the coding parameters of the super block, such as the division manner of the SB, the coding mode of each block, the motion vector (MV), and the quantizer, are used as the coding unit information.

In another alternative embodiment of the present invention, as shown in Fig. 3, step S106, encoding the video frame sequence to which the image-text information has been added by using the coding information, includes:

step S301, obtaining frame information, slice information, and coding unit information from the decoder, and obtaining the video frame sequence to which the image-text information has been added.

In step S302, the encoder is initialized using the frame information.

The frame information represents basic characteristic information of the original video. The encoder is initialized using the frame information so that the new video and the original video remain consistent in terms of the configuration parameters of the encoder used.

Step S303, the video frame sequence to which the image-text information has been added is divided into second basic coding units.

In the HEVC standard, the basic coding unit is a coding tree unit, and in VP9 the basic coding unit is a super block. The terms first and second basic coding unit are used only to distinguish the basic coding units of the original video from those of the video frame sequence to which the image-text information has been added. In this step, each frame in the video frame sequence to which the image-text information has been added is divided into basic coding units of fixed size in raster scan order (left to right and then top to bottom).

And step S304, encoding the second basic coding unit according to the slice information and the coding unit information through the initialized encoder.

In the HEVC standard, after an image frame is divided into Coding tree units, each Coding tree Unit may be recursively divided into Coding units (Coding units, abbreviated as CUs) in a quadtree form. In the VP9 standard, after dividing an image frame into super blocks, each super block may be recursively divided into multi-level blocks (blocks) in the form of a quad-tree.
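The two levels of division described in steps S303 and S304, raster-scan division into basic coding units followed by recursive quadtree division of each unit, may be sketched in Python as follows. The function names, the clipping of units at the right and bottom picture borders, the minimum block size of 8 and the `should_split` callback (which stands in for the partition information recovered from the original video) are assumptions of this sketch.

```python
def split_into_units(width, height, size=64):
    """Basic coding units as (x, y, w, h), left to right and then top to bottom.

    Units on the right and bottom borders are clipped to the picture
    (an assumption of this sketch).
    """
    return [
        (x, y, min(size, width - x), min(size, height - y))
        for y in range(0, height, size)
        for x in range(0, width, size)
    ]


def quadtree_leaves(x, y, size, should_split, min_size=8):
    """Recursively divide a square unit into four quadrants.

    `should_split(x, y, size)` decides whether the square at (x, y) is divided
    further; leaves are returned as (x, y, size).
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_leaves(x + dx, y + dy, half, should_split, min_size)
        return leaves
    return [(x, y, size)]


units = split_into_units(130, 70)  # 3 columns x 2 rows of units

# Divide the 64x64 root once, then divide only its top-left 32x32 quadrant again.
def split_rule(x, y, size):
    return size == 64 or (size == 32 and x == 0 and y == 0)

leaves = quadtree_leaves(0, 0, 64, split_rule)
```

Replaying the same `should_split` decisions that the original encoder made is what keeps the division of the new video consistent with that of the original video.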

For the HECV standard, the coding unit information includes CU depth, and a division manner in dividing the CTU into CUs. And dividing the second coding tree unit into coding units by using the coding unit information, so that the division of the coding units is consistent with the division of the coding units in the original video coding process. After the CTU is divided into CUs, intra-frame and inter-frame prediction, Discrete Cosine Transform (DCT for short) and quantization are performed on the CU as a unit, then run-length scanning is performed on the transformed and quantized residual coefficients, and finally entropy coding is performed to complete the coding process. The slice information includes frame display sequence, reference frame number, reference data set information, etc., and the coding tree unit information includes CU depth and partition mode, coding mode, quantization parameter QP, Sample Adaptive Offset (SAO) parameter, etc. And the second coding tree unit is coded by using the slice information and the coding tree unit information, so that the consistency of the new video and the original video in the aspect of coding the coding unit is ensured.

For the VP9 standard, the coding unit information includes the block depth and the partition mode used when dividing a super block into blocks. The second super block is divided into a plurality of blocks using the coding unit information, so that the division into blocks is consistent with the division used in the original video coding process. After the super block is divided into blocks, intra-frame and inter-frame prediction, Discrete Cosine Transform (DCT for short) and quantization are performed with the block as the unit, run-length scanning is then performed on the transformed and quantized residual coefficients, and finally entropy coding is performed to complete the coding process. The slice information includes the frame display order, the reference frame number, reference data set information, etc., and the coding unit information includes the block depth and partition mode, the coding mode, the quantization parameter QP, the Sample Adaptive Offset (SAO) parameter, etc. The second super block is coded using the slice information and the coding unit information, which ensures that the new video is consistent with the original video in the coding of the coding units.
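Reusing a recorded quadtree division, for either a coding tree unit or a super block, can be sketched as follows. The nested-list representation of the recorded partition (`None` for an undivided unit, a list of four children for a split) and the function name `replay_partition` are hypothetical and chosen only for illustration; a real decoder would expose this information in its own format.

```python
from typing import List, Optional, Tuple

# A recorded partition is either None (leaf, not split further) or a list of
# four child partitions in the order: top-left, top-right, bottom-left,
# bottom-right. This representation is an assumption for the sketch.
Partition = Optional[list]


def replay_partition(x: int, y: int, size: int, tree: Partition) -> List[Tuple[int, int, int]]:
    """Return the (x, y, size) leaves obtained by replaying a recorded split."""
    if tree is None:
        return [(x, y, size)]
    half = size // 2
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    leaves: List[Tuple[int, int, int]] = []
    for (dx, dy), child in zip(offsets, tree):
        leaves.extend(replay_partition(x + dx, y + dy, half, child))
    return leaves


# Example: a 64x64 unit whose top-right quadrant was split once more.
recorded = [None, [None, None, None, None], None, None]
cus = replay_partition(0, 0, 64, recorded)
print(cus)
```

Because the split is replayed rather than searched, the encoder in this sketch performs no partition decision for the unit; it only walks the structure obtained during decoding.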

In the embodiment of the invention, the new video is consistent with the original video in the configuration parameters of the encoder used, in the division into coding units, and in the coding of the coding units, so that the new video is consistent with the original video in video quality, and the technical problem of video quality damage is solved.

In another alternative embodiment of the present invention, as shown in fig. 4, the step S304 of encoding the second basic coding unit according to the slice information and the coding unit information may include:

step S401, acquiring position information of each second basic coding unit;

step S402, judging whether the current second basic coding unit is related to the coverage area of the image-text information or not based on the position information in sequence to obtain a judgment result;

The determination result of this step is either that the current second basic coding unit is related to the coverage area of the teletext information, or that the current second basic coding unit is not related to the coverage area of the teletext information.

In step S403, it is determined whether the second basic coding unit is encoded using the slice information and the coding unit information according to the determination result.

When the current second basic coding unit is not related to the coverage area of the teletext information, adding the teletext information does not change the current second basic coding unit, so the current second basic coding unit remains unchanged. In this case the coding decision of the current second basic coding unit does not change, which makes it easy to maintain consistency with the original video quality.

In the embodiment of the invention, whether the second basic coding unit is coded using the slice information and the coding unit information is determined by the relationship between the second basic coding unit and the coverage area of the image-text information. The influence of the coverage area on the second basic coding unit is fully considered, so that the coding decision of the second basic coding unit is more reasonable.

In another alternative embodiment of the present invention, the step S402 of determining, in turn and based on the position information, whether the current second basic coding unit is related to the coverage area of the teletext information may include:

determining whether the current second basic coding unit satisfies any one of a first condition and a second condition, wherein,

the first condition is that the current second basic coding unit is positioned in an area covered by the image-text information;

the second condition is that the current second basic coding unit is in inter mode and either of the following is satisfied: it references an image of the coverage area; or its motion vector prediction is affected by a target coding unit, wherein the target coding unit is a second basic coding unit that is adjacent to the current second basic coding unit and has been judged to be related to the coverage area.

When the current second basic coding unit satisfies either the first condition or the second condition, the determination result in step S402 is that the current second basic coding unit is related to the coverage area of the image-text information; when the current second basic coding unit satisfies neither the first condition nor the second condition, the determination result in step S402 is that the current second basic coding unit is not related to the coverage area of the image-text information.
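The two conditions can be sketched as a small predicate. This is an illustrative sketch under stated assumptions: the `Unit` fields, the rectangle-overlap reading of "positioned in an area covered by the image-text information", and the use of a reference rectangle and a list of neighbor indices to model inter prediction are all hypothetical simplifications, not the data actually held by an HEVC or VP9 codec.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


@dataclass
class Unit:
    index: int                          # raster-scan index of the unit
    rect: Rect                          # position of the unit in the frame
    inter_mode: bool = False            # coded with inter prediction?
    ref_rect: Optional[Rect] = None     # hypothetical: area referenced in the reference frame
    mvp_neighbors: Tuple[int, ...] = () # hypothetical: neighbors used for MV prediction


def is_related(unit: Unit, overlay: Rect, related: Set[int]) -> bool:
    # First condition: the unit lies in the area covered by the overlay.
    if overlaps(unit.rect, overlay):
        return True
    # Second condition: inter mode, and the unit either references the
    # coverage area or predicts its motion vector from a related neighbor.
    if unit.inter_mode:
        if unit.ref_rect is not None and overlaps(unit.ref_rect, overlay):
            return True
        if any(n in related for n in unit.mvp_neighbors):
            return True
    return False


def classify(units: List[Unit], overlay: Rect) -> Set[int]:
    """Judge units in turn; earlier results feed the neighbor check."""
    related: Set[int] = set()
    for u in units:
        if is_related(u, overlay, related):
            related.add(u.index)
    return related


overlay = (0, 0, 64, 64)
units = [
    Unit(0, (0, 0, 64, 64)),                                          # covered
    Unit(1, (64, 0, 64, 64), inter_mode=True, mvp_neighbors=(0,)),    # MV predicted from unit 0
    Unit(2, (128, 0, 64, 64), inter_mode=True, ref_rect=(32, 0, 64, 64)),  # references covered area
    Unit(3, (192, 0, 64, 64)),                                        # untouched
]
print(sorted(classify(units, overlay)))
```

Judging the units in scan order matters in this sketch, because the neighbor check depends on results already obtained for earlier units.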

In the embodiment of the invention, both the first condition and the second condition are considered when judging whether the current second basic coding unit is related to the coverage area of the image-text information. The cases of being related to the coverage area are thus considered comprehensively, and the influence of the coverage area on the current second basic coding unit is determined comprehensively and accurately.

In step S403, determining whether to encode the second basic coding unit using the slice information and the coding unit information according to the determination result, may further include:

when the determination result is that the current second basic coding unit is not related to the coverage area of the image-text information, encoding the second basic coding unit using the slice information and the coding unit information.

Conversely, when the determination result is that the current second basic coding unit is related to the coverage area of the image-text information, the coding decision used for coding the second basic coding unit is re-determined, which specifically includes determining the CU or block depth, the partition mode, the coding mode, and the like.

In the embodiment of the present invention, in the case that the current second basic coding unit is not related to the coverage area of the teletext information, the second basic coding unit is coded using the slice information and the coding unit information, i.e. the current second coding unit does not need to be subjected to calculation of a coding decision. Because the second basic coding unit irrelevant to the coverage area occupies a larger proportion in general, the calculation amount in the coding process of the coder is reduced to a great extent, the load of a processor is reduced, the transcoding is accelerated, and the technical problem that the traditional transcoding method consumes long time is solved.
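The reuse-or-recompute choice can be sketched as follows. The dictionary form of a coding decision and the names `plan_decisions` and `fake_search` are hypothetical and exist only to show where the saving comes from; the actual decision search of an encoder is far more involved.

```python
from typing import Callable, Dict, List, Set

Decision = Dict[str, int]  # hypothetical: e.g. {"depth": 1, "qp": 30}


def plan_decisions(
    decoded: List[Decision],
    related: Set[int],
    search: Callable[[int], Decision],
) -> List[Decision]:
    """Reuse decoded decisions for unrelated units; re-decide related ones."""
    planned: List[Decision] = []
    for idx, reused in enumerate(decoded):
        if idx in related:
            planned.append(search(idx))   # expensive path: new coding decision
        else:
            planned.append(reused)        # cheap path: copy from the decoder
    return planned


# Example: 8 units, only unit 0 is related to the coverage area.
decoded = [{"depth": 1, "qp": 30} for _ in range(8)]
search_calls: List[int] = []


def fake_search(idx: int) -> Decision:
    """Stand-in for a full decision search; records that it was invoked."""
    search_calls.append(idx)
    return {"depth": 2, "qp": 28}


planned = plan_decisions(decoded, {0}, fake_search)
print(len(search_calls), "of", len(decoded), "units needed a new decision")
```

In this toy run the decision search is invoked once for eight units, which mirrors the observation above that unrelated units usually make up the larger share.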

In another alternative embodiment of the present invention, the encoder and the decoder are communicatively coupled such that the encoder obtains the encoded information from the decoder, wherein,

the decoder transmits the encoded information to the encoder in a first data structure and a first data arrangement;

the encoder receives the encoded information from the decoder in a second data structure and a second data arrangement, wherein,

the second data structure is the same as the first data structure, and the second data arrangement mode is the same as the first data arrangement mode.

In the embodiment of the invention, the encoder is in communication connection with the decoder, and the encoder and the decoder realize the transmission of the encoding information in the same data structure and the same data arrangement mode, thereby ensuring the quick and accurate transmission of the encoding information between the encoder and the decoder.

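In another alternative embodiment of the present invention, the encoder and the decoder are communicatively coupled such that the encoder obtains the encoded information from the decoder, wherein,

the decoder transmits the encoded information to the encoder in a third data structure and a third data arrangement;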

after receiving the coding information, the coder stores the coding information according to a fourth data structure and a fourth data arrangement mode according to the mapping relation, wherein,

the fourth data structure is different from the third data structure, and/or the fourth data arrangement mode is different from the third data arrangement mode;

the mapping relationship is a corresponding relationship between a first position and a second position, the first position is a position of the coded information in the third data structure and the third data arrangement mode, and the second position is a position of the coded information in the fourth data structure and the fourth data arrangement mode.

Specifically, assuming that the encoded information includes A, B, C, the arrangement order of the encoded information in the third data structure and the third data arrangement is ACB, and the arrangement order of the encoded information in the fourth data structure and the fourth data arrangement is ABC, the mapping relationship is shown by arrows in fig. 5.
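The A, B, C example can be written out as a position mapping. This is a minimal sketch: the dictionary form of the mapping and the function name `remap` are assumptions made for illustration, and list indices stand in for positions within the data structures.

```python
from typing import Dict, List, TypeVar

T = TypeVar("T")


def remap(src: List[T], mapping: Dict[int, int]) -> List[T]:
    """Place src[first_position] at dst[second_position] for each mapping entry."""
    dst: List[T] = [src[0]] * len(src)  # placeholder values, all overwritten below
    for first_pos, second_pos in mapping.items():
        dst[second_pos] = src[first_pos]
    return dst


# Decoder-side order is A, C, B; encoder-side order is A, B, C.
third_layout = ["A", "C", "B"]
mapping = {0: 0, 1: 2, 2: 1}  # first position -> second position
fourth_layout = remap(third_layout, mapping)
print(fourth_layout)
```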

For example, the encoder and the decoder may store the coding information of the quantization parameter differently, possibly because of differences in the calculation method: one uses the number of coding units in the horizontal direction of the image as the row width of the data, while the other uses the number of coding units in the horizontal direction of the image plus 1 as the row width. That is, the encoder and the decoder differ in data structure. When they perform data communication, the encoder maps the quantization parameter array in the third data structure to the quantization parameter array in the fourth data structure row by row according to the mapping relationship, and stores it.
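The row-width mismatch can be sketched as a row-by-row copy between two strides. Which side uses the wider row is not stated here, so the choice below (source stride of width plus 1, destination stride of width) is an assumption, as are the function name `restride` and the flat-list storage.

```python
from typing import List


def restride(src: List[int], rows: int, cols: int, src_stride: int, dst_stride: int) -> List[int]:
    """Copy a rows x cols array between buffers that use different row widths."""
    dst = [0] * (rows * dst_stride)
    for r in range(rows):
        for c in range(cols):
            dst[r * dst_stride + c] = src[r * src_stride + c]
    return dst


# Example: 2 rows of 3 coding units. The source pads each row to 4 entries.
src_qp = [20, 21, 22, 0,
          23, 24, 25, 0]
dst_qp = restride(src_qp, rows=2, cols=3, src_stride=4, dst_stride=3)
print(dst_qp)
```

The same function handles the opposite direction by swapping the two stride arguments, in which case the extra entry in each destination row is left at 0.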

For another example, the encoder and the decoder may store the coding information of the motion vector information differently. One stores the motion vector information over the whole image in raster scan order, with the minimum prediction unit as the storage unit. The other first stores the motion vector information within each coding tree unit in raster scan order, with the minimum prediction unit as the storage unit, and then stores the resulting coding tree units over the whole image in raster scan order. That is, the encoder and the decoder differ in data arrangement. When they perform data communication, the encoder converts the coordinates of a given minimum prediction unit in the third data arrangement into its coordinates in the fourth data arrangement according to the mapping relationship, which completes the communication and acquisition of data between the decoder and the encoder.
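The coordinate conversion between the two arrangements can be sketched with two index functions. The sketch assumes that the picture width and height are exact multiples of the coding tree unit size and that each coding tree unit holds n x n minimum prediction units; the function names and the direction of conversion are hypothetical.

```python
from typing import List


def raster_index(x: int, y: int, pic_w: int) -> int:
    """Index of minimum prediction unit (x, y) when the whole image is raster-scanned."""
    return y * pic_w + x


def ctu_major_index(x: int, y: int, pic_w: int, n: int) -> int:
    """Index when units are raster-scanned inside each CTU, and CTUs are raster-scanned."""
    ctus_per_row = pic_w // n
    ctu = (y // n) * ctus_per_row + (x // n)
    return ctu * n * n + (y % n) * n + (x % n)


def raster_to_ctu_major(src: List[int], pic_w: int, pic_h: int, n: int) -> List[int]:
    """Rearrange a whole-image raster array into the CTU-by-CTU arrangement."""
    dst = [0] * (pic_w * pic_h)
    for y in range(pic_h):
        for x in range(pic_w):
            dst[ctu_major_index(x, y, pic_w, n)] = src[raster_index(x, y, pic_w)]
    return dst


# Example: a picture of 4x4 minimum prediction units, with 2x2 units per CTU.
src_mv = list(range(16))  # stand-in values; each entry would hold a motion vector
dst_mv = raster_to_ctu_major(src_mv, pic_w=4, pic_h=4, n=2)
print(dst_mv)
```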

It should be noted that there may be multiple pieces of encoded information in the embodiment of the present invention, and the multiple pieces of encoded information may cover all three of the following cases at the same time: (1) the fourth data structure is different from the third data structure, and the fourth data arrangement mode is different from the third data arrangement mode; (2) the fourth data structure is different from the third data structure, and the fourth data arrangement mode is the same as the third data arrangement mode; (3) the fourth data structure is the same as the third data structure, and the fourth data arrangement mode is different from the third data arrangement mode. The multiple pieces of encoded information may also cover only one or two of these three cases. Regardless of how many of the three cases are covered, each piece of encoded information has its own corresponding mapping relationship, and the encoder stores that piece of encoded information in the fourth data structure and the fourth data arrangement mode according to the corresponding mapping relationship.

In the embodiment of the invention, when the data structures and/or data arrangement modes differ between the decoder and the encoder, the encoder acquires the encoding information from the decoder in an orderly manner through the mapping relationship. This is particularly suitable for cases where the decoder and the encoder are produced by different developers.

An embodiment of the present invention provides a transcoding device, as shown in fig. 6, including:

an obtaining module 100, configured to decode an original video through a decoder, to obtain a video frame sequence of the original video and encoding information of the original video, where the original video is a video to which image-text information needs to be added;

an adding module 200, configured to add the image-text information to the video frame sequence to obtain a video frame sequence to which the image-text information is added;

and an encoding module 300, configured to encode, by an encoder, the video frame sequence to which the image-text information is added by using the encoding information, so as to obtain a new video.

The embodiment of the invention provides the transcoding device, a decoder acquires coding information in the decoding process, and the acquisition of the coding information is convenient and quick; the video frame sequence added with the image-text information is coded based on the coding information, so that the time consumed by coding decision calculation is reduced, the consistency of the new video and the original video on information such as resolution, code rate, frame rate and the like is ensured, the image quality of the new video is greatly improved, and the technical problems of long time consumption and easy quality damage of the traditional transcoding method are solved.

The obtaining module 100 decodes the original video through a decoder, and obtains a video frame sequence and coding information of the original video. The video frame sequence and the coding information can be stored in different positions of the original video, and the decoder can obtain the video frame sequence and the coding information of the original video based on one or more parsing processes.

In an optional implementation manner of the embodiment of the present invention, the obtaining module includes:

a decoding unit, configured to decode an original video into a sequence of video frames;

and the first acquisition unit is used for analyzing the original video to acquire the coding information of the original video in the process of decoding the original video into the video frame sequence.

In an optional implementation manner of the embodiment of the present invention, the encoding information of the original video includes: video header information, slice information and coding tree unit information of the original video, wherein the video header information is video characteristic data of the original video, the slice information is coding parameters of a first slice, the coding tree unit information is coding parameters of a first coding tree unit, and the first slice and the first coding tree unit belong to the original video.

In another optional implementation manner of the embodiment of the present invention, the video header information includes: a video parameter set, a sequence parameter set, and a picture parameter set.

In another optional implementation manner of the embodiment of the present invention, the encoding module includes:

In another optional implementation manner of the embodiment of the present invention, the encoding unit includes:

an acquisition subunit configured to acquire position information of each second basic coding unit;

the judgment subunit is used for judging whether the current second basic coding unit is related to the coverage area of the image-text information or not based on the position information in sequence to obtain a judgment result;

a determining subunit for determining whether to encode the second basic coding unit using the slice information and the coding unit information according to a result of the determination.

In another optional implementation manner of the embodiment of the present invention, the determining subunit is configured to:

the second condition is that the current second basic coding unit is in inter mode and either of the following is satisfied: it references an image of the coverage area; or its motion vector prediction is affected by a target coding unit, wherein the target coding unit is a second basic coding unit that is adjacent to the current second basic coding unit and has been judged to be related to the coverage area.

In another alternative implementation of an embodiment of the present invention, the encoder and the decoder are communicatively coupled such that the encoder obtains the encoded information from the decoder, wherein,

With regard to the transcoding device in the above-described embodiments, only a relatively brief description is given, since the functions of its modules have been described in detail in the above-described embodiments of the transcoding method.

An embodiment of the present invention provides a transcoding device, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the transcoding method of embodiment one.

In an embodiment of the present invention, a processor is configured to execute the transcoding method of the first embodiment, that is, decode an original video through a decoder, to obtain a video frame sequence of the original video and encoding information of the original video, where the original video is a video to which image-text information needs to be added; adding image-text information to the video frame sequence to obtain the video frame sequence added with the image-text information; and coding the video frame sequence added with the image-text information by using the coding information through a coder to obtain a new video. The decoder acquires the coding information in the decoding process, and the acquisition of the coding information is convenient and quick; the video frame sequence added with the image-text information is coded based on the coding information, so that the time consumed by coding decision calculation is reduced, the consistency of the new video and the original video on information such as resolution, code rate, frame rate and the like is ensured, the image quality of the new video is greatly improved, and the technical problems of long time consumption and easy quality damage of the traditional transcoding method are solved.

Fig. 7 is a block diagram illustrating a structure of a transcoding apparatus 600. Referring to fig. 7, the transcoding device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.

The processing component 602 generally controls the overall operation of the transcoding device 600, such as operations associated with display, data communication, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operations at the transcoding device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power supply component 606 provides power to the various components of device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the transcoding device 600.

The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive an external audio signal when the transcoding device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 614 includes one or more sensors for providing status assessment of various aspects of the apparatus 600. For example, the sensor component 614 may detect an open/closed state of the apparatus 600, the relative positioning of components, such as a display and keypad of the transcoding device 600, the sensor component 614 may also detect a change in the position of the transcoding device 600 or a component of the transcoding device 600, the presence or absence of user contact with the transcoding device 600, the orientation or acceleration/deceleration of the transcoding device 600, and a change in the temperature of the transcoding device 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

Communication component 616 is configured to facilitate communications between transcoding device 600 and other devices in a wired or wireless manner. The transcoding device 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the transcoding device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

Fig. 8 is a block diagram illustrating another transcoding apparatus 700. For example, the transcoding device 700 may be provided as a server. Referring to fig. 8, transcoding device 700 includes a processing component 722 that further includes one or more processors, and memory resources, represented by memory 732, for storing instructions, such as applications, that are executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. Further, the processing component 722 is configured to execute instructions to perform the above-described transcoding method.

The transcoding device 700 may also include a power component 726 configured to perform power management of the transcoding device 700, a wired or wireless network interface 750 configured to connect the transcoding device 700 to a network, and an input/output (I/O) interface 758. The transcoding device 700 may operate based on an operating system, such as Windows Server, Mac OS X™, Unix™, Linux, FreeBSD™, or the like, stored in memory 732.

The fourth embodiment of the present invention provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are executed, the transcoding method of the first embodiment is implemented.

In particular, a computer-readable storage medium, such as the memory 604, includes instructions executable by the processor 620 of the transcoding device 600 to perform the methods described above. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In the embodiment of the present invention, when executed, the computer instruction implements the transcoding method of the first embodiment, that is, the original video is decoded by the decoder, so as to obtain the video frame sequence of the original video and the encoding information of the original video, where the original video is a video to which the image-text information needs to be added; adding image-text information to the video frame sequence to obtain the video frame sequence added with the image-text information; and coding the video frame sequence added with the image-text information by using the coding information through a coder to obtain a new video. The decoder acquires the coding information in the decoding process, and the acquisition of the coding information is convenient and quick; the video frame sequence added with the image-text information is coded based on the coding information, so that the time consumed by coding decision calculation is reduced, the consistency of the new video and the original video on information such as resolution, code rate, frame rate and the like is ensured, the image quality of the new video is greatly improved, and the technical problems of long time consumption and easy quality damage of the traditional transcoding method are solved.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A method of transcoding, comprising:

2. The transcoding method of claim 1, wherein the encoding information of the original video comprises: frame information, slice information and coding unit information of the original video, wherein the frame information is video characteristic data of each image frame of the original video, the slice information is a coding parameter of each slice of the original video, and the coding unit information is a coding parameter of a first basic coding unit of each image frame forming the original video.

3. The transcoding method of claim 2, wherein the encoding, by an encoder, the sequence of video frames with the added teletext information using the encoding information comprises:

initializing the encoder using the frame information;

4. The transcoding method of claim 3, wherein the encoding the second coding unit according to the slice information and the coding unit information comprises:

acquiring position information of each second basic coding unit;

5. The transcoding method of claim 4, wherein the sequentially determining whether the second coding unit is currently associated with a coverage area of the teletext information based on the position information comprises:

6. The transcoding method of claim 5, wherein the determining whether to encode the second coding unit using the slice information and the coding unit information according to the determination result comprises:

under the condition that the judgment result is that the second basic coding unit is not related to the area covered by the image-text information, the second basic coding unit is coded by using the slice information and the coding unit information;

preferably, the encoding standard of the original video is HEVC, the frame information is stored in the header information of the original video, and the first basic coding unit is a coding tree unit;

preferably, the header information of the original video further includes a video parameter set, a sequence parameter set, and a picture parameter set;

preferably, the encoding standard of the original video is VP9, the frame information of the original video is contained in the header information of each image frame, and the first basic coding unit is a super block;

preferably, the graphic information includes at least one of a picture watermark, an audio watermark, a subtitle, a bullet screen, a picture-in-picture, a sticker, and a magic expression.

7. A transcoding device, comprising:

the encoding module is used for encoding the video frame sequence added with the image-text information by using the encoding information through an encoder to obtain a new video;

preferably, the encoding information of the original video includes: frame information, slice information and coding unit information of the original video, wherein the frame information is video characteristic data of a frame image of the original video, the slice information is a coding parameter of a slice of the original video, and the coding unit information is a coding parameter of a first basic coding unit of each frame image forming the original video;

preferably, the encoding module comprises:

Preferably, the encoding unit includes:

Preferably, the judging subunit is configured to:

Preferably, the determining subunit is configured to:

8. The transcoding device of claim 7, wherein the teletext information comprises at least one of a picture watermark, an audio watermark, a subtitle, a bullet screen, a picture-in-picture, a sticker, and a magic expression.

9. A transcoding device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the transcoding method of any of the above claims 1 to 6.

10. A computer-readable storage medium having stored thereon computer instructions which, when executed, implement the transcoding method of any of claims 1 to 6.