CN110049326B - Video coding method and device and storage medium - Google Patents

Video coding method and device and storage medium

Info

Publication number
CN110049326B
Authority
CN
China
Prior art keywords
video image
macroblock
area
macro block
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910452802.9A
Other languages
Chinese (zh)
Other versions
CN110049326A (en)
Inventor
黄书敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201910452802.9A
Publication of CN110049326A
Application granted
Publication of CN110049326B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video coding method and apparatus and a storage medium, and belongs to the field of computer technologies. The method includes: acquiring encoded data of a first video image, where the encoded data of the first video image includes an intra prediction mode of each macroblock in a first area of the first video image; and encoding a second video image based on the encoded data of the first video image. The first video image is a key frame in a first video stream, the second video image is a key frame in a second video stream, the video images in the first video stream and the video images in the second video stream have an overlapping area, and the first area is the area of the first video image that overlaps with the second video image. By multiplexing the encoded data of the overlapping area across the different video streams, the invention reduces the computational cost and the complexity of encoding.

Description

Video coding method and device and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a video encoding method and apparatus, and a storage medium.
Background
With the development of computer technology, video application scenarios have become increasingly rich. During video encoding, the same device may need to encode multiple video streams simultaneously. In the related art, when the same device needs to encode multiple video streams simultaneously, each video stream has to be encoded separately, which results in high computational cost and high encoding complexity.
Disclosure of Invention
Embodiments of the present invention provide a video encoding method and apparatus and a storage medium, which can solve the problems of high computational cost and high encoding complexity of video encoding in the related art. The technical solutions are as follows:
in a first aspect, a video encoding method is provided, the method including:
acquiring coded data of a first video image, wherein the coded data of the first video image comprises an intra-frame prediction mode of each macro block in a first area of the first video image;
encoding a second video image based on the encoded data of the first video image;
the first video image is a key frame in a first video stream, the second video image is a key frame in a second video stream, the video image in the first video stream and the video image in the second video stream have an overlapping region, and the first region is a region overlapping with the second video image in the first video image.
Optionally, the acquiring encoded data of the first video image includes:
acquiring encoded data of the first video image when the first region satisfies a specified condition, wherein the specified condition includes at least one of:
the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image, and the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image.
Optionally, before the acquiring the encoded data of the first video image, the method further comprises:
after the first video image is acquired, detecting whether the first area meets the specified condition;
when the first area meets the specified condition, determining the intra-frame prediction mode of the specified macro block in the first area according to the position of the first area in the first video image.
Optionally, the determining an intra prediction mode of a specified macroblock within the first region according to the position of the first region in the first video image includes:
when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image, taking a reconstructed macroblock located above the specified macroblock as a reference macroblock of the specified macroblock, where the specified macroblock is any macroblock in the leftmost column of macroblocks of the first area except the topmost one.
Optionally, the determining an intra prediction mode of a specified macroblock within the first region according to the position of the first region in the first video image includes:
when the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, taking a reconstructed macroblock located on the left side of the specified macroblock as a reference macroblock of the specified macroblock, where the specified macroblock is any macroblock in the topmost row of macroblocks of the first area except the leftmost one.
Optionally, the encoded data further includes at least one of a sub-macroblock dividing manner of each macroblock, a transform manner of each macroblock, a quantization parameter of each macroblock, or a quantized residual of each macroblock.
Optionally, the encoding a second video image based on the encoded data of the first video image includes:
entropy coding is carried out on the basis of the sub-macro block dividing mode of each macro block in the first area, the intra-frame prediction mode of each macro block, the transformation mode of each macro block, the quantization parameter of each macro block and the quantized residual error of each macro block, so as to obtain a code stream corresponding to a second area of the second video image, wherein the second area is an area which is overlapped with the first video image in the second video image.
Optionally, the encoding a second video image based on the encoded data of the first video image includes:
when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image, and/or the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, encoding, based on the encoded data of the first video image, the macroblocks of a second area of the second video image that lie outside a specified area, where the specified area includes the leftmost column of macroblocks and the topmost row of macroblocks of the second area, and the second area is the area of the second video image that overlaps with the first video image.
Optionally, the encoding data further includes a sub-macroblock dividing manner of each macroblock, and the encoding, based on the encoding data of the first video image, a macroblock located outside a specified region in the second region of the second video image includes:
acquiring a target macro block in the second area, wherein the target macro block is a macro block located outside the specified area, and the difference value between the pixel value of a reference macro block of the target macro block and the pixel value of a corresponding macro block in the first area is smaller than a specified threshold value;
And performing transformation processing, quantization processing and entropy coding on the target macro block based on the sub-macro block division mode of the corresponding macro block in the first area and the intra-frame prediction mode of the corresponding macro block.
Optionally, the first video image and the second video image satisfy one of the following relationships:
the second video image is obtained by cropping the first video image; or
the first video image is obtained by splicing the second video image and a third video image.
In a second aspect, a video encoding apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain encoded data of a first video image, where the encoded data of the first video image includes an intra prediction mode of each macroblock in a first region of the first video image;
the encoding module is used for encoding a second video image based on the encoded data of the first video image;
the first video image is a key frame in a first video stream, the second video image is a key frame in a second video stream, the video image in the first video stream and the video image in the second video stream have an overlapping region, and the first region is a region overlapping with the second video image in the first video image.
Optionally, the obtaining module is configured to:
acquiring encoded data of the first video image when the first region satisfies a specified condition, wherein the specified condition includes at least one of:
the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image, and the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image.
Optionally, the apparatus further comprises:
the detection module is used for detecting whether the first area meets the specified condition or not after the first video image is acquired;
and the determining module is used for determining the intra-frame prediction mode of the specified macro block in the first area according to the position of the first area in the first video image when the first area meets the specified condition.
Optionally, the determining module is configured to:
when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image, take a reconstructed macroblock located above the specified macroblock as a reference macroblock of the specified macroblock, where the specified macroblock is any macroblock in the leftmost column of macroblocks of the first area except the topmost one.
Optionally, the determining module is configured to:
when the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, take a reconstructed macroblock located on the left side of the specified macroblock as a reference macroblock of the specified macroblock, where the specified macroblock is any macroblock in the topmost row of macroblocks of the first area except the leftmost one.
Optionally, the encoded data further includes at least one of a sub-macroblock dividing manner of each macroblock, a transform manner of each macroblock, a quantization parameter of each macroblock, or a quantized residual of each macroblock.
Optionally, the encoding module is configured to:
entropy coding is carried out on the basis of the sub-macro block dividing mode of each macro block in the first area, the intra-frame prediction mode of each macro block, the transformation mode of each macro block, the quantization parameter of each macro block and the quantized residual error of each macro block, so as to obtain a code stream corresponding to a second area of the second video image, wherein the second area is an area which is overlapped with the first video image in the second video image.
Optionally, the encoding module is configured to:
when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image, and/or the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, encode, based on the encoded data of the first video image, the macroblocks of a second area of the second video image that lie outside a specified area, where the specified area includes the leftmost column of macroblocks and the topmost row of macroblocks of the second area, and the second area is the area of the second video image that overlaps with the first video image.
Optionally, the encoded data further includes a sub-macroblock dividing manner of each macroblock, and the encoding module is further configured to:
acquiring a target macro block in the second area, wherein the target macro block is a macro block located outside the specified area, and the difference value between the pixel value of a reference macro block of the target macro block and the pixel value of a corresponding macro block in the first area is smaller than a specified threshold value;
and performing transformation processing, quantization processing and entropy coding on the target macro block based on the sub-macro block division mode of the corresponding macro block in the first area and the intra-frame prediction mode of the corresponding macro block.
Optionally, the first video image and the second video image satisfy one of the following relationships:
the second video image is obtained by cropping the first video image; or
the first video image is obtained by splicing the second video image and a third video image.
In a third aspect, a video encoding apparatus is provided, including a processor and a memory,
the memory being configured to store a computer program; and
the processor being configured to execute the computer program stored in the memory to implement the video encoding method according to any one of the first aspect.
In a fourth aspect, a storage medium is provided, where a program in the storage medium, when executed by a processor, implements the video encoding method according to any one of the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
when there is an overlapping region with the second video image in the first video image, the encoding end may encode the second video image based on the encoded data of the first video image. Because the encoded data comprises the intra-frame prediction mode of each macro block in the first area of the first video image, and the first area is the area which is overlapped with the second video image in the first video image, when the area which is overlapped with the first area in the second video image is encoded, the intra-frame prediction mode does not need to be selected again, and only the encoded data in the first area of the first video image needs to be multiplexed, thereby reducing the encoding complexity of the key frame in the video stream, further reducing the encoding complexity of the video stream and reducing the calculation overhead in the video encoding process.
Drawings
FIG. 1 is a diagram illustrating intra prediction according to an embodiment of the present invention;
FIG. 2 is a schematic framework diagram of intra prediction according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a dual-screen live broadcast provided by an embodiment of the present invention;
FIG. 4 is a schematic interface diagram of mic-linked stream mixing according to an embodiment of the present invention;
fig. 5 is a flowchart of a video encoding method according to an embodiment of the present invention;
fig. 6 is a flowchart of another video encoding method according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a first video image according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another first video image provided by an embodiment of the invention;
fig. 9 is a flowchart of another video encoding method according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of another video encoding apparatus according to an embodiment of the present invention;
fig. 12 is a block diagram of a video encoding apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Video coding refers to converting a file from one video format into another through a specific compression technique. Video coding generally includes two categories: intra-frame coding and inter-frame coding. Intra-frame coding may also be referred to as intra-frame compression. Intra-frame coding generally includes four processes: intra prediction, transform, quantization, and entropy coding, each of which is described below in the embodiments of the present invention.
Intra prediction refers to the process of obtaining a predicted macroblock of the current macroblock from reconstructed macroblocks that have already been encoded in the video image currently being encoded. The intra prediction modes of the current macroblock can be roughly classified into three types: using the reconstructed macroblock above the current macroblock as its reference macroblock, using the reconstructed macroblock to the left of the current macroblock as its reference macroblock, and using the reconstructed macroblocks to the left of and above the current macroblock as its reference macroblocks. Illustratively, FIG. 1 is a schematic diagram of intra prediction according to an embodiment of the present invention. As shown in FIG. 1, z denotes the current macroblock being encoded, and x and y denote reconstructed macroblocks that have already been encoded; a plurality of synthesized macroblocks can be obtained by combining x and y under a plurality of intra prediction modes. Among these synthesized macroblocks, the one whose pixel values are closest to those of z is the predicted macroblock of z, and the intra prediction mode corresponding to that predicted macroblock is the optimal intra prediction mode of z.
Furthermore, after the optimal intra prediction mode of the current macroblock is determined, the reconstructed macroblocks can be combined according to the optimal intra prediction mode to obtain the predicted macroblock, and the predicted macroblock is subtracted from the current macroblock to obtain a residual. Thus, the input of intra prediction is the current macroblock being encoded and the already-encoded reconstructed macroblocks, and the output is the optimal intra prediction mode and the residual. FIG. 2 is a schematic framework diagram of intra prediction according to an embodiment of the present invention.
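For illustration only (this sketch is not part of the disclosed embodiments), the following C++ fragment shows one way such a mode search can be realized: each candidate intra prediction mode yields a synthesized macroblock, and the mode whose synthesis is closest to the current macroblock, measured here by a sum of absolute differences, is selected. The names Macroblock, sad, and bestIntraMode are hypothetical.

    #include <array>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    constexpr int MB = 16;                           // H.264 macroblock: 16x16 pixels
    using Macroblock = std::array<uint8_t, MB * MB>;

    // Sum of absolute differences between the current macroblock and a
    // candidate prediction synthesized from reconstructed neighbors.
    int sad(const Macroblock& cur, const Macroblock& pred) {
        int cost = 0;
        for (int i = 0; i < MB * MB; ++i)
            cost += std::abs(int(cur[i]) - int(pred[i]));
        return cost;
    }

    // candidates[m] is the prediction synthesized under intra mode m from the
    // already-coded macroblocks to the left and/or above (x and y in FIG. 1).
    int bestIntraMode(const Macroblock& cur,
                      const std::vector<Macroblock>& candidates) {
        int best = 0;
        int bestCost = sad(cur, candidates[0]);
        for (int m = 1; m < int(candidates.size()); ++m) {
            int c = sad(cur, candidates[m]);
            if (c < bestCost) { bestCost = c; best = m; }
        }
        return best;                                 // optimal intra mode for z
    }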
Transform refers to the process of converting the residual into a form more amenable to encoding. Specifically, the residual is converted from a spatial-domain signal into a frequency-domain signal, which removes correlation in the image signal and reduces the bit rate. Optionally, the transform may be a Karhunen-Loève (K-L) transform, a Fourier transform, a cosine transform, or a wavelet transform; the embodiment of the present invention does not limit the transform manner.
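As a concrete illustration of the transform step, the following C++ sketch applies a 4×4 integer core transform of the form Y = C·X·Cᵀ, as commonly used by H.264-style encoders; it assumes a 4×4 residual block and omits the normalization that real encoders fold into quantization.

    #include <cstdint>

    // 4x4 integer core transform Y = C * X * C^T with
    // C = [ 1  1  1  1;  2  1 -1 -2;  1 -1 -1  1;  1 -2  2 -1 ],
    // mapping a residual block from the spatial domain to the frequency domain.
    void forwardTransform4x4(const int16_t in[4][4], int32_t out[4][4]) {
        static const int32_t C[4][4] = {
            { 1,  1,  1,  1 },
            { 2,  1, -1, -2 },
            { 1, -1, -1,  1 },
            { 1, -2,  2, -1 }
        };
        int32_t tmp[4][4];
        for (int i = 0; i < 4; ++i)          // tmp = C * in
            for (int j = 0; j < 4; ++j) {
                tmp[i][j] = 0;
                for (int k = 0; k < 4; ++k)
                    tmp[i][j] += C[i][k] * in[k][j];
            }
        for (int i = 0; i < 4; ++i)          // out = tmp * C^T
            for (int j = 0; j < 4; ++j) {
                out[i][j] = 0;
                for (int k = 0; k < 4; ++k)
                    out[i][j] += tmp[i][k] * C[j][k];
            }
    }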
Quantization refers to the process of quantizing the transformed residual. In the video encoding process, the quantization parameter is usually determined by a rate-control module, and different bit rates correspond to different quantization step sizes. The smaller the quantization step size, the higher the quantization precision. The embodiment of the present invention does not limit the bit rate or the quantization step size.
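A minimal sketch of uniform scalar quantization follows, assuming the step size has already been derived from the rate-control module's quantization parameter; the function names are illustrative.

    #include <cmath>
    #include <cstdint>

    // Uniform scalar quantization of one transformed coefficient. A smaller
    // step gives higher precision at a higher bit rate.
    int32_t quantize(int32_t coeff, double qstep) {
        return int32_t(std::lround(coeff / qstep));
    }

    // Decoder-side inverse, reconstructing an approximation of the coefficient.
    int32_t dequantize(int32_t level, double qstep) {
        return int32_t(std::lround(level * qstep));
    }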
Entropy coding refers to coding that, following the principle of entropy, loses no information in the coding process. Information entropy is the average information content of a source (a measure of its uncertainty). Optionally, the entropy coding may be Shannon coding, Huffman coding, or arithmetic coding. The input of entropy coding is the individual syntax elements, and its output is a binary code stream. For intra-frame coding, the input of entropy coding includes the intra prediction mode of each macroblock in the video image, the transform manner of each macroblock, the quantization parameter of each macroblock, and the quantized residual of each macroblock, and the output of entropy coding is the code stream corresponding to the video image.
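As one concrete example of a variable-length code, the following C++ sketch implements the unsigned Exp-Golomb code ue(v) that H.264 uses for many syntax elements; it is illustrative only and returns the code as a string of '0'/'1' characters rather than packed bits.

    #include <string>

    // Unsigned Exp-Golomb code ue(v): write codeNum + 1 in binary, prefixed
    // by (bit-length - 1) zeros. E.g. 0 -> "1", 1 -> "010", 2 -> "011",
    // 3 -> "00100".
    std::string expGolombUe(unsigned codeNum) {
        unsigned value = codeNum + 1;
        int bits = 0;
        for (unsigned v = value; v != 0; v >>= 1) ++bits;
        std::string out(bits - 1, '0');              // zero prefix
        for (int i = bits - 1; i >= 0; --i)
            out.push_back(((value >> i) & 1) ? '1' : '0');
        return out;
    }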
In the H.264 standard, a macroblock is a 16×16 block of pixels. In the video coding process, a macroblock can be divided into sub-macroblocks, and the sub-macroblocks are encoded to improve the precision of video coding. A sub-macroblock may be an 8×8 block of pixels, a 4×4 block of pixels, or a block of another size, and the sub-macroblock division manner may be determined according to parameters such as the actual resolution of the video image.
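For illustration, a hypothetical partition-selection helper is sketched below; the enum values reflect the partition sizes named above, while the detail-score thresholds are invented for the example and are not taken from the patent.

    // Partition choices for a 16x16 macroblock, per the sizes named above.
    enum class MbPartition { Whole16x16, Four8x8, Sixteen4x4 };

    // Invented heuristic: finer splits for more detailed content. Real
    // encoders decide from rate-distortion cost, resolution, and so on.
    MbPartition choosePartition(int detailScore) {
        if (detailScore > 2000) return MbPartition::Sixteen4x4;
        if (detailScore > 500)  return MbPartition::Four8x8;
        return MbPartition::Whole16x16;
    }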
The video encoding method provided by the embodiments of the present invention can be applied to a video processing system that includes an encoding end and at least one decoding end. The encoding end and the decoding end may be located on terminals, where a terminal may be a smartphone, a computer, a multimedia player, an e-reader, a wearable device, or the like. The encoding end and the decoding end may implement their functions through the terminal's operating system or through a client on the terminal.
Illustratively, when an anchor is live streaming video, the encoding end is located on the anchor terminal used for the live broadcast, and the anchor terminal generates, through video encoding, a code stream corresponding to a video of a certain definition. A decoder on the anchor terminal, or on a viewer terminal used to watch the live broadcast (the decoder being located in the operating system or in the client), can implement the function of the decoding end and play the video of that definition on the terminal by decoding the code stream.
As video application scenarios grow richer, situations arise in which the same terminal needs to encode multiple video streams simultaneously. For example, in live video streaming, when dual-screen live broadcast or mic-linked stream mixing between anchor terminals is required, the anchor terminal needs to encode two video streams at the same time. Dual-screen live broadcast means that the terminal plays one video stream in landscape orientation (which may be called the landscape stream) and plays the other video stream in portrait orientation (which may be called the portrait stream); generally, the picture of the portrait stream is a part cropped from the picture of the landscape stream. Mic-linked stream mixing between anchor terminals means that the video streams of two mic-linked anchor terminals are played simultaneously on an anchor terminal.
Illustratively, FIG. 3 is a schematic diagram of a dual-screen live broadcast according to an embodiment of the present invention. As shown in FIG. 3, one and the same video image includes a picture A and pictures B located on both sides of picture A. When the terminal is in landscape orientation, picture A and pictures B are displayed on the terminal's display interface; when the terminal is in portrait orientation, only picture A is displayed. The picture displayed in portrait orientation can therefore be regarded as cropped from the picture displayed in landscape orientation. To realize dual-screen live broadcast, the anchor terminal needs to encode the landscape stream and the portrait stream simultaneously.
Illustratively, FIG. 4 is a schematic interface diagram of mic-linked stream mixing according to an embodiment of the present invention. As shown in FIG. 4, a mixed-stream picture is displayed on the anchor terminal, and the mixed-stream picture includes a picture C and a picture D. Picture C is the live picture of the local anchor terminal, and picture D is the live picture of the peer anchor terminal. That is, the mixed-stream picture displayed on the anchor terminal is obtained by splicing the local anchor's live picture with the peer anchor's live picture. Therefore, the anchor terminal needs to encode one video stream (the mixed video stream) for the picture spliced from picture C and picture D, which is played on the local anchor terminal; meanwhile, the anchor terminal needs to encode another video stream for picture C alone, which is sent to the peer anchor terminal so that the peer anchor terminal can display its own mixed-stream picture.
With the video encoding method in the related art, when dual-screen live broadcast or mic-linked stream mixing between anchor terminals is required, the anchor terminal has to encode the two video streams separately; that is, intra-frame or inter-frame prediction, transform, quantization, and entropy coding have to be performed when encoding each video stream, resulting in high computational cost and high encoding complexity.
The embodiments of the present invention provide a video encoding method in which, when the video images in two video streams have an overlapping region, the encoding end can multiplex the encoded data of the overlapping region, thereby reducing the computational overhead and the complexity of encoding.
Fig. 5 is a flowchart of a video encoding method according to an embodiment of the present invention. The method can be applied to an encoding end in a video processing system, as shown in fig. 5, and the method includes:
step 101, obtaining encoded data of a first video image, where the encoded data of the first video image includes an intra prediction mode of each macroblock in a first region of the first video image.
Step 102, encoding a second video image based on the encoded data of the first video image.
The first video image is a key frame in the first video stream, the second video image is a key frame in the second video stream, the video images in the first video stream and the video images in the second video stream have an overlapping area, and the first area is the area of the first video image that overlaps with the second video image.
In summary, in the video encoding method provided in the embodiments of the present invention, when there is an area overlapping with the second video image in the first video image, the encoding end may encode the second video image based on the encoded data of the first video image. Because the encoded data comprises the intra-frame prediction mode of each macro block in the first area of the first video image, and the first area is the area which is overlapped with the second video image in the first video image, when the area which is overlapped with the first area in the second video image is encoded, the intra-frame prediction mode does not need to be selected again, and only the encoded data in the first area of the first video image needs to be multiplexed, thereby reducing the encoding complexity of the key frame in the video stream, further reducing the encoding complexity of the video stream and reducing the calculation overhead in the video encoding process.
Fig. 6 is a flowchart of another video encoding method according to an embodiment of the present invention. The method can be applied to an encoding end in a video processing system, as shown in fig. 6, and the method includes:
step 201, acquiring a first video image and a second video image.
The first video image is a key frame in the first video stream, the second video image is a key frame in the second video stream, and the video images in the first video stream and the video images in the second video stream have an overlapping area; thus the first video image and the second video image overlap. For convenience of description, in the embodiments of the present invention, the area of the first video image that overlaps with the second video image is referred to as the first area, and the area of the second video image that overlaps with the first video image is referred to as the second area.
Optionally, the first video image and the second video image satisfy one of the following relationships: the second video image is obtained by cropping the first video image; or the first video image is obtained by splicing the second video image and a third video image. That is, the second video image may be a portion of the content of the first video image.
For example, the first video image may be an image in the landscape stream and the second video image an image in the portrait stream; referring to FIG. 3, the first video image may include picture A and pictures B, the second video image may include picture A, and the second video image may be cropped from the first video image. As another example, the first video image may be an image in the mixed video stream, the second video image an image in one anchor's live video stream, and the third video image an image in the other anchor's live video stream; referring to FIG. 4, the second video image may include picture C, the third video image may include picture D, and the first video image may be spliced from picture C and picture D.
It should be noted that a key frame may also be referred to as a base frame or an intra-coded frame (I frame). In the coding standards proposed by the Moving Picture Experts Group (MPEG), video frames are divided into three types: I frames, B frames (bidirectionally predicted frames), and P frames (forward predictive coded frames). An I frame can serve as a reference frame when other frames are generated, does not itself need to be generated with reference to any other frame, and a complete image can be reconstructed from the I frame's data alone during decoding.
Step 202, detecting whether a first area of the first video image meets a specified condition.
Wherein the specified condition includes at least one of the following: the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image, or the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image. That is, the specified condition is satisfied when the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image; or when the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image; or when both hold.
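A minimal C++ sketch of this condition check follows, assuming the first area's position is tracked in macroblock units with (0, 0) at the top-left macroblock of the first video image; the Rect type and the function name are hypothetical.

    // Position and size of the first area, in macroblock units, with (0, 0)
    // at the top-left macroblock of the first video image.
    struct Rect { int x, y, w, h; };

    // Specified condition of step 202: the first area contains the leftmost
    // column of macroblocks (x == 0), the topmost row (y == 0), or both.
    bool satisfiesSpecifiedCondition(const Rect& firstArea) {
        return firstArea.x == 0 || firstArea.y == 0;
    }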
Step 203, when the first area of the first video image meets the specified condition, determining the intra-frame prediction mode of the specified macro block in the first area according to the position of the first area in the first video image.
In an optional embodiment of the present invention, when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image, the reconstructed macroblock located above a specified macroblock is used as the reference macroblock of that specified macroblock, where a specified macroblock is any macroblock in the leftmost column of macroblocks of the first area except the topmost one.
It should be noted that using the reconstructed macroblock above the specified macroblock as its reference macroblock prevents the encoding of macroblocks in the first region from depending on macroblocks to the left of the first region, thereby enabling the first region to be encoded independently.
Illustratively, FIG. 7 is a schematic diagram of a first video image according to an embodiment of the present invention. As shown in FIG. 7, the leftmost column of macroblocks L1 of the first area M is not the leftmost column of macroblocks of the first video image, and the topmost row of macroblocks H1 of the first area M is the topmost row of macroblocks of the first video image; therefore, for each macroblock in the leftmost column L1 of the first area M other than the topmost macroblock q0, the reference macroblock is the macroblock above it. For example, the reference macroblock of macroblock q1 is macroblock q0, and macroblock q0 has no reference macroblock.
In another optional embodiment of the present invention, when the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, the reconstructed macroblock located to the left of a specified macroblock is used as the reference macroblock of that specified macroblock, where a specified macroblock is any macroblock in the topmost row of macroblocks of the first area except the leftmost one.
It should be noted that using the reconstructed macroblock to the left of the specified macroblock as its reference macroblock prevents the encoding of macroblocks in the first region from depending on macroblocks above the first region, thereby enabling the first region to be encoded independently.
Illustratively, FIG. 8 is a schematic diagram of another first video image according to an embodiment of the present invention. As shown in FIG. 8, the leftmost column of macroblocks L1 of the first area M is the leftmost column of macroblocks of the first video image, and the topmost row of macroblocks H1 of the first area M is not the topmost row of macroblocks of the first video image; therefore, for each macroblock in the topmost row H1 of the first area M other than the leftmost macroblock p0, the reference macroblock is the macroblock to its left. For example, the reference macroblock of macroblock p1 is macroblock p0, and macroblock p0 has no reference macroblock.
In yet another optional embodiment of the present invention, when the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image, the intra prediction mode of each macroblock in the first video image may be determined according to the intra prediction manner provided in the related art, which is not described again here.
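For illustration, the three position cases above can be summarized as a small dispatch; the flags indicate whether the first area touches the left and top edges of the first video image, and all names in this sketch are hypothetical.

    // Reference direction for the specified macroblocks of the first area.
    enum class RefDir { Above, Left, NormalRules };

    RefDir referenceDirection(bool touchesLeftEdge, bool touchesTopEdge) {
        if (!touchesLeftEdge && touchesTopEdge) return RefDir::Above;  // FIG. 7
        if (touchesLeftEdge && !touchesTopEdge) return RefDir::Left;   // FIG. 8
        return RefDir::NormalRules;  // area at the top-left corner of the image
    }

Keeping the first area from referencing macroblocks outside itself is what later makes its encoded data safe to reuse for the second video image.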
Step 204, encoding the first video image.
Optionally, the encoding of the first video image comprises: sub-macroblock division, intra prediction, transformation, quantization, and entropy coding processes are performed on each macroblock in the first video image, respectively.
It should be noted that, when the first area of the first video image satisfies the specified condition, the specified macroblock is encoded according to the intra prediction mode of the specified macroblock determined in step 203.
After the encoding end encodes the first video image, the encoding end may store the encoded data of the first video image. The encoded data includes an intra prediction mode for each macroblock in the first region. Optionally, the encoded data may further include at least one of a sub-macroblock division manner of each macroblock in the first region, a transform manner of each macroblock in the first region, a quantization parameter of each macroblock in the first region, or a quantized residual of each macroblock in the first region.
And step 205, when the first area of the first video image meets the specified condition, encoding the second video image based on the encoded data of the first video image.
Optionally, after the encoding of the first video image is completed, the encoding end may acquire the encoded data of the first video image when it determines that the first region of the first video image satisfies the specified condition. The implementation process of step 205 includes: performing entropy coding based on the sub-macroblock division manner of each macroblock in the first area, the intra prediction mode of each macroblock, the transform manner of each macroblock, the quantization parameter of each macroblock, and the quantized residual of each macroblock, to obtain the code stream corresponding to the second area of the second video image. Illustratively, when the second video image is a part of the first video image, the code stream corresponding to the second area is the code stream corresponding to the second video image.
Based on step 203, when the first region of the first video image satisfies the specified condition, the first region is encoded independently, so encoding the second region of the second video image with the encoded data of the first region preserves the encoding accuracy of the second region.
It should be noted that, when the video encoding method is used to encode the second region of the second video image, the encoding end does not need to perform sub-macroblock division, intra-frame prediction, transformation and quantization on each macroblock in the second region, and only needs to perform entropy encoding on the encoded data of the first region, so that the encoding of the macroblock in the second region can be completed, the encoding complexity of the second video image is greatly reduced, and the encoding complexity of the second video stream is further reduced.
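For illustration, a toy C++ sketch of this reuse path is given below: the second region's code stream is produced purely by entropy-coding the first region's stored elements. The MbData layout and the 8-bit putBits stand-in for a real CAVLC/CABAC coder are assumptions made for the example.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Stored per-macroblock elements from the first region (assumed layout).
    struct MbData {
        uint8_t partition, intraMode, transformType, qp;
        std::vector<int16_t> quantizedResidual;
    };

    // Toy stand-in for a real CAVLC/CABAC entropy coder: 8 bits per symbol.
    std::string putBits(unsigned v) {
        std::string s;
        for (int i = 7; i >= 0; --i) s.push_back(((v >> i) & 1) ? '1' : '0');
        return s;
    }

    // The second region's code stream comes from entropy-coding the stored
    // elements only; mode search, transform and quantization are not redone.
    std::string encodeSecondRegion(const std::vector<MbData>& firstRegionData) {
        std::string bitstream;
        for (const MbData& mb : firstRegionData) {
            bitstream += putBits(mb.partition);
            bitstream += putBits(mb.intraMode);
            bitstream += putBits(mb.transformType);
            bitstream += putBits(mb.qp);
            for (int16_t r : mb.quantizedResidual)
                bitstream += putBits(uint16_t(r) & 0xFFu);  // toy truncation
        }
        return bitstream;
    }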
In summary, in the video encoding method provided in the embodiment of the present invention, when the first video image has the area overlapping the second video image, the encoding end may encode the second video image based on the encoded data of the first video image, that is, when the area overlapping the first area in the second video image is encoded, the encoded data in the first area of the first video image may be multiplexed, so as to reduce the encoding complexity of the key frame in the video stream, further reduce the encoding complexity of the video stream, and reduce the computational overhead in the video encoding process.
Fig. 9 is a flowchart of another video encoding method according to an embodiment of the present invention. The method can be applied to an encoding end in a video processing system, as shown in fig. 9, and the method includes:
Step 301, acquiring a first video image and a second video image.
For the explanation of step 301, reference may be made to the above explanation of step 201, and details of the embodiment of the present invention are not described herein.
Step 302, encoding the first video image.
Optionally, the encoding of the first video image comprises: sub-macroblock division, intra prediction, transformation, quantization, and entropy coding processes are performed on each macroblock in the first video image, respectively.
After the encoding end encodes the first video image, the encoding end may store the encoded data of the first video image. The encoded data includes an intra prediction mode for each macroblock in the first region. Optionally, the encoded data may further include a sub-macroblock partition manner of each macroblock in the first region.
Step 303 encodes the second video image based on the encoded data of the first video image.
Optionally, when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image, and/or the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, the macroblocks of a second area of the second video image that lie outside a specified area are encoded based on the encoded data of the first video image, where the specified area includes the leftmost column of macroblocks and the topmost row of macroblocks of the second area, and the second area is the area of the second video image that overlaps with the first video image. That is, when the first region is not the top-left corner region of the first video image, the macroblocks of the second region other than its leftmost column and topmost row of macroblocks may be encoded using the encoded data of the first region. Each macroblock in the leftmost column and topmost row of the second region undergoes the complete encoding process, i.e., sub-macroblock division, intra prediction, transform, quantization, and entropy coding.
When the leftmost column of macroblocks of the first region is the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first region is the topmost row of macroblocks of the first video image, the encoding end may encode all macroblocks in the second region using the encoded data of the first region.
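A one-line predicate for this dispatch, sketched for illustration with coordinates measured inside the second region (names hypothetical):

    // Macroblocks in the second region's leftmost column or topmost row form
    // the specified area and are fully re-encoded; all other macroblocks of
    // the second region may reuse the first region's encoded data.
    bool inSpecifiedArea(int colInRegion, int rowInRegion) {
        return colInRegion == 0 || rowInRegion == 0;
    }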
Optionally, when the encoded data of the first video includes an intra prediction mode of each macroblock in the first region and a sub-macroblock division manner of each macroblock in the first region, a process of encoding a macroblock located outside a specified region in the second region of the second video image based on the encoded data of the first video image may include:
s3031, obtaining a target macro block in the second region, where the target macro block is a macro block located outside the specified region, and a difference between a pixel value of a reference macro block of the target macro block and a pixel value of a corresponding macro block in the first region is smaller than a specified threshold.
Optionally, for each macroblock in the second region that is located outside the specified region, the encoding end may detect whether the difference between the pixel values of the macroblock and the pixel values of the corresponding macroblock in the first region is smaller than a specified threshold. When the difference is smaller than the specified threshold, the macroblock can be determined to be a target macroblock; that is, the pixels of the target macroblock differ little from those of the corresponding macroblock in the first region. When the difference is not smaller than the specified threshold, the complete sub-macroblock division, intra prediction, transform, quantization, and entropy coding processes are performed on the macroblock instead.
S3032, transform processing, quantization processing, and entropy coding are performed on the target macroblock based on the sub-macroblock division manner of the corresponding macroblock in the first region and the intra prediction mode of the corresponding macroblock.
It should be noted that, when the second region of the second video image is encoded by using the above video encoding method, the encoding end does not need to perform sub-macroblock division and intra-frame prediction processes on the target macroblock in the second region, and only needs to perform transform processing, quantization processing, and entropy encoding on the encoded data of the first region, so that the encoding of the target macroblock in the second region can be completed, the encoding complexity of the second video image is reduced, and the encoding complexity of the second video stream is further reduced.
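Under one plausible reading of S3031, in which the "difference value" is taken as a sum of absolute pixel differences, the eligibility test might look like the following C++ sketch; the threshold semantics and all names are assumptions made for the example.

    #include <array>
    #include <cstdint>
    #include <cstdlib>

    constexpr int MB = 16;
    using Macroblock = std::array<uint8_t, MB * MB>;

    // S3031 as a predicate: the macroblock qualifies as a target macroblock
    // when its reference macroblock's pixels differ from those of the
    // corresponding first-region macroblock by less than the threshold
    // (difference measured here as a sum of absolute differences).
    bool isTargetMacroblock(const Macroblock& referenceMb,
                            const Macroblock& correspondingFirstRegionMb,
                            int threshold) {
        int diff = 0;
        for (int i = 0; i < MB * MB; ++i)
            diff += std::abs(int(referenceMb[i]) -
                             int(correspondingFirstRegionMb[i]));
        return diff < threshold;
    }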
In summary, in the video encoding method provided in the embodiment of the present invention, when the first video image has the area overlapping the second video image, the encoding end may encode the second video image based on the encoded data of the first video image, that is, when the area overlapping the first area in the second video image is encoded, the encoded data in the first area of the first video image may be multiplexed, so as to reduce the encoding complexity of the key frame in the video stream, further reduce the encoding complexity of the video stream, and reduce the computational overhead in the video encoding process.
In the process of encoding the first video stream and the second video stream by using the video encoding method provided by the embodiments of the present invention, the encoding end may switch to encoding the second video image after the encoding of the first video image is completed, and copy the encoded data of the first video image into the encoded data of the second video image, thereby multiplexing the encoded data between the first video image and the second video image.
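A minimal sketch of this copy step, assuming each encoder keeps a per-macroblock record buffer and that overlapIndices maps the second frame's overlap macroblocks to their counterparts in the first frame (all names hypothetical):

    #include <cstddef>
    #include <vector>

    struct MbRecord { int intraMode; /* partition, transform, QP, residual... */ };

    // Copy the first frame's overlap-region records into the second frame's
    // buffer; entry i of overlapIndices gives, for the i-th macroblock of the
    // second frame's overlap region, the index of its counterpart in the
    // first frame's record buffer.
    void copyOverlapData(const std::vector<MbRecord>& firstFrame,
                         const std::vector<std::size_t>& overlapIndices,
                         std::vector<MbRecord>& secondFrameOverlap) {
        secondFrameOverlap.resize(overlapIndices.size());
        for (std::size_t i = 0; i < overlapIndices.size(); ++i)
            secondFrameOverlap[i] = firstFrame[overlapIndices[i]];
    }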
It should be noted that, compared to the video encoding method shown in fig. 9, the video encoding method shown in fig. 6 multiplexes more encoded data and has lower encoding complexity; the video encoding method shown in fig. 9 multiplexes less encoded data and has higher encoding flexibility than the video encoding method shown in fig. 6. Alternatively, the video encoding method as shown in fig. 6 and the video encoding method as shown in fig. 9 may be used in combination. For example, in step 202, when it is determined that the first region of the first video image does not satisfy the specified condition, steps 302 and 303 may be performed.
The video coding method provided by the embodiment of the invention can be suitable for a plurality of paths of video streams comprising images with overlapping areas, and the application scene of the video coding method is not limited by the embodiment of the invention.
It should be noted that the order of the steps of the video encoding method provided by the embodiments of the present invention may be adjusted appropriately, and steps may be added or removed as the situation requires. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention, and is therefore not described again.
Fig. 10 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present invention. As shown in fig. 10, the apparatus 40 includes:
an obtaining module 401 is configured to obtain encoded data of the first video image, where the encoded data of the first video image includes an intra prediction mode of each macroblock in the first region of the first video image.
An encoding module 402, configured to encode the second video image based on the encoded data of the first video image.
The first video image is a key frame in the first video stream, the second video image is a key frame in the second video stream, the video images in the first video stream and the video images in the second video stream have an overlapping area, and the first area is the area of the first video image that overlaps with the second video image.
In summary, in the video encoding apparatus provided in the embodiments of the present invention, when there is an area overlapping with the second video image in the first video image, the encoding end may encode the second video image based on the encoded data of the first video image through the encoding module. Because the encoded data comprises the intra-frame prediction mode of each macro block in the first area of the first video image, and the first area is the area which is overlapped with the second video image in the first video image, when the area which is overlapped with the first area in the second video image is encoded, the intra-frame prediction mode does not need to be selected again, and only the encoded data in the first area of the first video image needs to be multiplexed, thereby reducing the encoding complexity of the key frame in the video stream, further reducing the encoding complexity of the video stream and reducing the calculation overhead in the video encoding process.
Optionally, the obtaining module is configured to:
acquiring the encoded data of the first video image when the first region satisfies a specified condition, wherein the specified condition includes at least one of: the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image, and the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image.
Optionally, as shown in fig. 11, the apparatus 40 further includes:
a detecting module 403, configured to detect whether the first area satisfies a specified condition after the first video image is acquired.
A determining module 404, configured to determine an intra prediction mode of a specified macroblock in the first area according to a position of the first area in the first video image when the first area satisfies a specified condition.
Optionally, the determining module is configured to:
when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is the topmost row of macroblocks of the first video image, a reconstructed macroblock located above the specified macroblock is used as the reference macroblock of the specified macroblock, where the specified macroblock is any macroblock in the leftmost column of macroblocks of the first area other than the topmost one.
Optionally, the determining module is configured to:
when the leftmost column of macroblocks of the first area is the leftmost column of macroblocks of the first video image and the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, a reconstructed macroblock located to the left of the specified macroblock is used as the reference macroblock of the specified macroblock, where the specified macroblock is any macroblock in the topmost row of macroblocks of the first area other than the leftmost one.
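A minimal sketch of these two reference-selection rules, using (row, col) macroblock indices within the first area; the helper name and the boolean flags are illustrative assumptions:

    def reference_for(row: int, col: int, left_on_image_edge: bool,
                      top_on_image_edge: bool):
        """Return the (row, col) of the reference macroblock for a specified
        macroblock, or None if the multiplexed prediction data is kept."""
        if not left_on_image_edge and top_on_image_edge and col == 0 and row > 0:
            return (row - 1, col)  # case 1: use the reconstructed MB above
        if left_on_image_edge and not top_on_image_edge and row == 0 and col > 0:
            return (row, col - 1)  # case 2: use the reconstructed MB on the left
        return None

    print(reference_for(3, 0, left_on_image_edge=False, top_on_image_edge=True))  # (2, 0)
    print(reference_for(0, 5, left_on_image_edge=True, top_on_image_edge=False))  # (0, 4)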
Optionally, the encoded data further comprises at least one of a sub-macroblock division manner of each macroblock, a transform manner of each macroblock, a quantization parameter of each macroblock, or a post-quantization residual of each macroblock.
Accordingly, the encoding module is configured to:
perform entropy coding based on the sub-macroblock division manner of each macroblock in the first area, the intra prediction mode of each macroblock, the transform manner of each macroblock, the quantization parameter of each macroblock, and the post-quantization residual of each macroblock, to obtain a code stream corresponding to a second area of the second video image, where the second area is the area of the second video image that overlaps the first video image.
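A minimal sketch of this multiplexing, where EncodedMB and the stand-in entropy_code function are illustrative assumptions; a real encoder would emit CAVLC/CABAC syntax elements rather than a serialized header string:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class EncodedMB:
        sub_mb_partition: str  # sub-macroblock division manner, e.g. "8x8"
        intra_mode: int        # intra prediction mode index
        transform: str         # transform manner, e.g. "4x4"
        qp: int                # quantization parameter
        residual: bytes        # post-quantization residual coefficients

    def entropy_code(mb: EncodedMB) -> bytes:
        # placeholder: serialize the reused syntax elements and residual
        header = f"{mb.sub_mb_partition}|{mb.intra_mode}|{mb.transform}|{mb.qp}|"
        return header.encode() + mb.residual

    def second_region_stream(first_area_mbs: List[EncodedMB]) -> bytes:
        # no mode decision, transform, or quantization is re-run here; the
        # code stream is produced directly from the multiplexed data
        return b"".join(entropy_code(mb) for mb in first_area_mbs)

    stream = second_region_stream([EncodedMB("16x16", 2, "4x4", 28, b"\x01\x02")])
    print(len(stream))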
Optionally, the encoding module is configured to:
when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image and/or the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, encode, based on the encoded data of the first video image, the macroblocks of a second area of the second video image that lie outside a specified area, where the specified area includes the leftmost column of macroblocks and the topmost row of macroblocks of the second area, and the second area is the area of the second video image that overlaps the first video image.
Optionally, the encoded data further includes the sub-macroblock division manner of each macroblock, and the encoding module is further configured to:
acquire a target macroblock in the second area, where the target macroblock is a macroblock located outside the specified area, and the difference between the pixel values of the target macroblock's reference macroblock and the pixel values of the corresponding macroblock in the first area is smaller than a specified threshold;
and perform transform processing, quantization processing, and entropy coding on the target macroblock based on the sub-macroblock division manner and the intra prediction mode of the corresponding macroblock in the first area.
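A minimal sketch of the target-macroblock test, assuming 16x16 luma blocks held as numpy arrays; the patent does not fix the exact difference metric or threshold, so the mean absolute difference and the value 4.0 below are assumptions:

    import numpy as np

    def is_target_mb(ref_mb: np.ndarray, first_mb: np.ndarray,
                     threshold: float = 4.0) -> bool:
        # difference value between the reference macroblock's pixels and the
        # corresponding first-area macroblock's pixels, here taken as the
        # mean absolute difference (one plausible reading)
        diff = np.mean(np.abs(ref_mb.astype(int) - first_mb.astype(int)))
        return float(diff) < threshold

    ref = np.full((16, 16), 120, dtype=np.uint8)
    first = np.full((16, 16), 122, dtype=np.uint8)
    print(is_target_mb(ref, first))  # True: mean difference 2.0 < 4.0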
Optionally, the first video image and the second video image satisfy one of the following relationships (a minimal sketch follows the list):
the second video image is obtained by cropping the first video image;
the first video image is obtained by stitching the second video image with a third video image.
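A minimal numpy sketch of the two relationships; the image sizes and the side-by-side stitching direction are arbitrary demo choices:

    import numpy as np

    first = np.zeros((720, 1280, 3), dtype=np.uint8)

    # relationship 1: the second image is cropped from the first, so the
    # whole second image is an overlap area with the first
    second = first[0:360, 0:640]

    # relationship 2: the first image is stitched from the second image and
    # a third image, so the second image reappears verbatim inside the first
    third = np.zeros((360, 640, 3), dtype=np.uint8)
    stitched = np.concatenate([second, third], axis=1)

    print(second.shape, stitched.shape)  # (360, 640, 3) (360, 1280, 3)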
In summary, in the video encoding apparatus provided by the embodiment of the present invention, when the first video image contains an area that overlaps the second video image, the encoding end may encode the second video image based on the encoded data of the first video image through the encoding module; that is, when the area of the second video image that overlaps the first area is encoded, the encoded data of the first area of the first video image may be multiplexed. This reduces the encoding complexity of key frames in the video stream, thereby reducing the encoding complexity of the video stream and the computational overhead of the video encoding process.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The embodiment of the present invention provides a video encoding apparatus, which is used at an encoding end in a video processing system and includes: a processor and a memory, where
the memory is configured to store a computer program;
the processor is configured to execute the computer program stored in the memory to implement the video encoding method shown in fig. 6 or fig. 9.
Fig. 12 is a block diagram of a video encoding apparatus, which may be a terminal, according to an embodiment of the present invention. The terminal 500 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 500 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 500 includes: a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 501 may be implemented in at least one hardware form of DSP (Digital Signal Processor), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 502 may include one or more computer-readable storage media, which may be non-transitory. The memory 502 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 502 is used to store at least one instruction, which is executed by the processor 501 to implement the video encoding method provided by the method embodiments herein.
In some embodiments, the terminal 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502, and peripheral interface 503 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 503 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, display screen 505, camera 506, audio circuitry 507, positioning components 508, and power supply 509.
The peripheral interface 503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 504 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 504 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, successive generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, the display screen 505 also has the ability to capture touch signals on or above its surface. The touch signal may be input to the processor 501 as a control signal for processing. At this time, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, provided on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, respectively disposed on different surfaces of the terminal 500 or in a folded design; in still other embodiments, the display screen 505 may be a flexible display disposed on a curved surface or a folded surface of the terminal 500. The display screen 505 may even be arranged in an irregular, non-rectangular shape, that is, a shaped screen. The display screen 505 may be an OLED (Organic Light-Emitting Diode) display.
The camera assembly 506 is used to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 507 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input them to the processor 501 for processing, or input them to the radio frequency circuit 504 to realize voice communication. For stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 500. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal into sound waves audible to humans, or into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 507 may also include a headphone jack.
The positioning component 508 is used to determine the current geographic location of the terminal 500 for navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of Europe.
The power supply 509 is used to supply power to the various components in the terminal 500. The power supply 509 may use alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast-charging technology.
In some embodiments, terminal 500 also includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: acceleration sensor 511, gyro sensor 512, pressure sensor 513, fingerprint sensor 514, optical sensor 515, and proximity sensor 516.
The acceleration sensor 511 may detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 501 may control the touch screen 505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 511. The acceleration sensor 511 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 512 may detect a body direction and a rotation angle of the terminal 500, and the gyro sensor 512 may acquire a 3D motion of the user on the terminal 500 in cooperation with the acceleration sensor 511. The processor 501 may implement the following functions according to the data collected by the gyro sensor 512: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 513 may be disposed on a side bezel of the terminal 500 and/or an underlying layer of the touch display screen 505. When the pressure sensor 513 is disposed on the side bezel of the terminal 500, the user's grip signal on the terminal 500 may be detected, and the processor 501 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed at the lower layer of the touch display screen 505, the processor 501 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 505. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 514 is used for collecting a fingerprint of the user, and the processor 501 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 501 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 514 may be provided on the front, back, or side of the terminal 500. When a physical button or a vendor Logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or the vendor Logo.
The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the touch display screen 505 based on the ambient light intensity collected by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 505 is turned down. In another embodiment, processor 501 may also dynamically adjust the shooting parameters of camera head assembly 506 based on the ambient light intensity collected by optical sensor 515.
The proximity sensor 516, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 500. The proximity sensor 516 is used to collect the distance between the user and the front surface of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front surface of the terminal 500 gradually decreases, the processor 501 controls the touch display screen 505 to switch from the bright screen state to the dark screen state; when the proximity sensor 516 detects that the distance between the user and the front surface of the terminal 500 gradually increases, the processor 501 controls the touch display screen 505 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 12 is not intended to be limiting of terminal 500 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
An embodiment of the present invention provides a storage medium. When the program in the storage medium is executed by a processor, the video encoding method shown in fig. 6 or fig. 9 can be implemented.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
In embodiments of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless explicitly defined otherwise.
The term "and/or" in the embodiment of the present invention is only one kind of association relationship describing an associated object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The invention is not to be considered as limited to the particular embodiments shown and described, but is to be understood to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims (22)

1. A method of video encoding, the method comprising:
acquiring encoded data of a first video image, wherein the encoded data of the first video image comprises an intra-frame prediction mode of each macro block in a first area of the first video image;
encoding a second video image based on a result of whether a first region of the first video image satisfies a specified condition and encoded data of the first video image;
wherein the specified conditions include at least one of: the leftmost column of macroblocks of the first region is the leftmost column of macroblocks of the first video image, and the topmost row of macroblocks of the first region is the topmost row of macroblocks of the first video image;
the first video image is a key frame in a first video stream, the second video image is a key frame in a second video stream, the video image in the first video stream and the video image in the second video stream have an overlapped area, and the first area is an area overlapped with the second video image in the first video image.
2. The method of claim 1, wherein obtaining encoded data for the first video image comprises:
and when the first area meets a specified condition, acquiring the coded data of the first video image.
3. The method of claim 2, wherein prior to said obtaining encoded data for the first video image, the method further comprises:
after the first video image is acquired, detecting whether the first area meets the specified condition;
when the first area meets the specified condition, determining the intra-frame prediction mode of the specified macro block in the first area according to the position of the first area in the first video image.
4. The method of claim 3, wherein determining the intra-prediction mode for a specified macroblock within the first region based on the location of the first region in the first video picture comprises:
and when the leftmost macroblock of the first area is not the leftmost macroblock of the first video image and the uppermost macroblock of the first area is the uppermost macroblock of the first video image, taking a reconstructed macroblock located above the specified macroblock as a reference macroblock of the specified macroblock, wherein the specified macroblock is any macroblock except the uppermost macroblock in the leftmost macroblock of the first area.
5. The method of claim 3, wherein determining the intra-prediction mode for a specified macroblock within the first region based on the location of the first region in the first video picture comprises:
and when the leftmost macroblock of the first area is the leftmost macroblock of the first video image and the uppermost macroblock of the first area is not the uppermost macroblock of the first video image, taking a reconstructed macroblock located on the left side of the specified macroblock as a reference macroblock of the specified macroblock, wherein the specified macroblock is any macroblock except the leftmost macroblock in the uppermost macroblock of the first area.
6. The method according to any of claims 2 to 5, wherein said encoded data further comprises at least one of sub-macroblock partitioning of said each macroblock, transform of said each macroblock, quantization parameter of said each macroblock, or post-quantization residual of said each macroblock.
7. The method of claim 6, wherein encoding the second video image based on the result of whether the first region of the first video image satisfies the specified condition and the encoded data of the first video image comprises:
when the first area of the first video image meets the specified condition, performing entropy coding on the basis of a sub-macro block dividing mode of each macro block in the first area, an intra-frame prediction mode of each macro block, a transformation mode of each macro block, a quantization parameter of each macro block and a quantized residual error of each macro block, so as to obtain a code stream corresponding to a second area of the second video image, wherein the second area is an area, overlapped with the first video image, in the second video image.
8. The method of claim 1, wherein encoding the second video image based on the result of whether the first region of the first video image satisfies the specified condition and the encoded data of the first video image comprises:
and when the leftmost column of macroblocks of the first area is not the leftmost column of macroblocks of the first video image and/or the topmost row of macroblocks of the first area is not the topmost row of macroblocks of the first video image, encoding macroblocks of a second area of the second video image, which is an area of the second video image that overlaps with the first video image, outside a specified area that includes the leftmost column of macroblocks and the topmost row of macroblocks of the second area, based on the encoded data of the first video image.
9. The method according to claim 8, wherein the encoded data further includes a sub-macroblock division manner for each macroblock, and the encoding, based on the encoded data of the first video image, a macroblock located outside a specified area in the second area of the second video image includes:
acquiring a target macro block in the second area, wherein the target macro block is a macro block located outside the specified area, and the difference value between the pixel value of the reference macro block of the target macro block and the pixel value of the corresponding macro block in the first area is smaller than a specified threshold value;
and performing transformation processing, quantization processing and entropy coding on the target macro block based on the sub-macro block division mode of the corresponding macro block in the first area and the intra-frame prediction mode of the corresponding macro block.
10. The method of claim 1, wherein the first video image and the second video image satisfy one of the following relationships:
the second video image is obtained by intercepting the first video image;
and the first video image is obtained by splicing the second video image and the third video image.
11. An apparatus for video encoding, the apparatus comprising:
An obtaining module, configured to obtain encoded data of a first video image, where the encoded data of the first video image includes an intra prediction mode of each macroblock in a first region of the first video image;
the encoding module is used for encoding a second video image based on a result of whether a first area of the first video image meets a specified condition and encoded data of the first video image;
wherein the specified conditions include at least one of: the leftmost column of macro blocks of the first area is the leftmost column of macro blocks of the first video image, and the topmost row of macro blocks of the first area is the topmost row of macro blocks of the first video image;
the first video image is a key frame in a first video stream, the second video image is a key frame in a second video stream, the video image in the first video stream and the video image in the second video stream have an overlapping region, and the first region is a region overlapping with the second video image in the first video image.
12. The apparatus of claim 11, wherein the obtaining module is configured to:
and when the first area meets a specified condition, acquiring the coded data of the first video image.
13. The apparatus of claim 12, further comprising:
the detection module is used for detecting whether the first area meets the specified condition or not after the first video image is obtained;
and the determining module is used for determining the intra-frame prediction mode of the specified macro block in the first area according to the position of the first area in the first video image when the first area meets the specified condition.
14. The apparatus of claim 13, wherein the determining module is configured to:
and when the leftmost macroblock of the first area is not the leftmost macroblock of the first video image and the uppermost macroblock of the first area is the uppermost macroblock of the first video image, taking a reconstructed macroblock located above the specified macroblock as a reference macroblock of the specified macroblock, wherein the specified macroblock is any macroblock except the uppermost macroblock in the leftmost macroblock of the first area.
15. The apparatus of claim 13, wherein the determining module is configured to:
and when the leftmost macroblock of the first area is the leftmost macroblock of the first video image and the uppermost macroblock of the first area is not the uppermost macroblock of the first video image, taking a reconstructed macroblock located on the left side of the specified macroblock as a reference macroblock of the specified macroblock, wherein the specified macroblock is any macroblock except the leftmost macroblock in the uppermost macroblock of the first area.
16. The apparatus according to any of claims 12 to 15, wherein said encoded data further comprises at least one of sub-macroblock partitioning of said each macroblock, transform of said each macroblock, quantization parameter of said each macroblock, or post-quantization residual of said each macroblock.
17. The apparatus of claim 16, wherein the encoding module is configured to:
and when the first area of the first video image meets the specified condition, performing entropy coding on the basis of a sub-macro block dividing mode of each macro block in the first area, an intra-frame prediction mode of each macro block, a transformation mode of each macro block, a quantization parameter of each macro block and a quantized residual error of each macro block to obtain a code stream corresponding to a second area of the second video image, wherein the second area is an area which is overlapped with the first video image in the second video image.
18. The apparatus of claim 11, wherein the encoding module is configured to:
when the leftmost macro block of the first area is not the leftmost macro block of the first video image, and/or the uppermost macro block of the first area is not the uppermost macro block of the first video image, encoding a macro block of a second area of the second video image, which is an area overlapping with the first video image, out of a specified area including the leftmost macro block and the uppermost macro block of the second area based on the encoded data of the first video image.
19. The apparatus of claim 18, wherein the encoded data further comprises sub-macroblock partitions for each macroblock, and wherein the encoding module is further configured to:
acquiring a target macro block in the second area, wherein the target macro block is a macro block located outside the specified area, and the difference value between the pixel value of a reference macro block of the target macro block and the pixel value of a corresponding macro block in the first area is smaller than a specified threshold value;
and performing transformation processing, quantization processing and entropy coding on the target macro block based on the sub-macro block division mode of the corresponding macro block in the first area and the intra-frame prediction mode of the corresponding macro block.
20. The apparatus of claim 11, wherein the first video image and the second video image satisfy one of the following relationships:
the second video image is obtained by intercepting the first video image;
and the first video image is obtained by splicing the second video image and the third video image.
21. A video encoding apparatus, comprising: a processor and a memory, wherein the processor is capable of processing a plurality of data,
the memory for storing a computer program;
the processor, configured to execute the computer program stored in the memory, to implement the video encoding method according to any one of claims 1 to 10.
22. A storage medium, comprising: a program stored on a storage medium, when executed by a processor, is capable of implementing a video encoding method as claimed in any one of claims 1 to 10.
CN201910452802.9A 2019-05-28 2019-05-28 Video coding method and device and storage medium Active CN110049326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910452802.9A CN110049326B (en) 2019-05-28 2019-05-28 Video coding method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910452802.9A CN110049326B (en) 2019-05-28 2019-05-28 Video coding method and device and storage medium

Publications (2)

Publication Number Publication Date
CN110049326A CN110049326A (en) 2019-07-23
CN110049326B true CN110049326B (en) 2022-06-28

Family

ID=67283960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910452802.9A Active CN110049326B (en) 2019-05-28 2019-05-28 Video coding method and device and storage medium

Country Status (1)

Country Link
CN (1) CN110049326B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111479162B (en) * 2020-04-07 2022-05-13 成都酷狗创业孵化器管理有限公司 Live data transmission method and device and computer readable storage medium
CN113099222B (en) * 2021-04-16 2024-02-02 浙江天则通信技术有限公司 Video compression method and system
CN114286136B (en) * 2021-12-28 2024-05-31 咪咕文化科技有限公司 Video playing encoding method, device, equipment and computer readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238390A (en) * 2011-08-05 2011-11-09 中国科学院深圳先进技术研究院 Image-library-based video and image coding and decoding method and system
AU2013248237A1 (en) * 2013-10-25 2015-05-14 Canon Kabushiki Kaisha Image scaling process and apparatus
CN107872671A (en) * 2016-09-26 2018-04-03 华为技术有限公司 A kind of picture coding method and terminal
CN109660806A (en) * 2018-12-27 2019-04-19 上海众源网络有限公司 A kind of coding method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An Optimized Hardware Video Encoder for AVS with Level C+ Data Reuse Scheme for Motion Estimation; Kaijin Wei et al.; 2012 IEEE International Conference on Multimedia and Expo; 2012-09-13; full text *
Design of a multi-channel x265 video encoding system based on information multiplexing (基于信息复用的多路x265视频编码系统设计); Cheng Dongbin et al.; Video Engineering (《电视技术》); 2017-01-07; full text *

Also Published As

Publication number Publication date
CN110049326A (en) 2019-07-23

Similar Documents

Publication Publication Date Title
CN113453013B (en) Method and device for decoding and encoding prediction mode
CN108966008B (en) Live video playback method and device
CN108391127B (en) Video encoding method, device, storage medium and equipment
CN110049326B (en) Video coding method and device and storage medium
CN110996117B (en) Video transcoding method and device, electronic equipment and storage medium
CN111010588B (en) Live broadcast processing method and device, storage medium and equipment
CN109168032B (en) Video data processing method, terminal, server and storage medium
CN110177275B (en) Video encoding method and apparatus, and storage medium
CN107888975B (en) Video playing method, device and storage medium
CN110572679B (en) Method, device and equipment for coding intra-frame prediction and readable storage medium
CN110636326A (en) Live video processing method and device and storage medium
CN113709479B (en) Decoding and encoding method based on adaptive intra-frame refreshing mechanism and related equipment
CN116074512A (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN111698262B (en) Bandwidth determination method, device, terminal and storage medium
CN109040753B (en) Prediction mode selection method, device and storage medium
CN111478914A (en) Timestamp processing method, device, terminal and storage medium
CN111107104B (en) Video transmitting method, video receiving method, device, equipment and storage medium
CN112437304B (en) Video decoding method, encoding method, device, equipment and readable storage medium
CN117676170A (en) Method, apparatus, device and storage medium for detecting blocking effect

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant