CN115914648A - Video image processing method and device

Video image processing method and device

Info

Publication number
CN115914648A
Authority
CN
China
Prior art keywords
image
coding
reconstructed
encoding
code stream
Prior art date
Legal status
Pending
Application number
CN202111164100.4A
Other languages
Chinese (zh)
Inventor
方华猛 (Fang Huameng)
邸佩云 (Di Peiyun)
刘欣 (Liu Xin)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202111164100.4A (publication CN115914648A)
Priority to PCT/CN2022/116596 (publication WO2023051156A1)
Publication of CN115914648A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: … using adaptive coding
    • H04N19/102: … characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/50: … using predictive coding
    • H04N19/593: … involving spatial prediction techniques
    • H04N19/85: … using pre-processing or post-processing specially adapted for video compression
    • H04N19/86: … involving reduction of coding artifacts, e.g. of blockiness

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application disclose a video image processing method and apparatus. A second encoding, used to generate a second code stream, is controlled according to coding information of a first encoding and/or a first reconstructed image; the second code stream is coded in a full intra prediction mode and is used as a random access stream. On the basis of meeting the low-delay access requirement for video content, this improves the decoding quality of the accessed video content, reduces blocking effects, and eliminates some artifacts. The application belongs to the technical field of video encoding and decoding.

Description

Video image processing method and device
Technical Field
The present application relates to the field of video encoding and decoding technologies, and in particular, to a method and an apparatus for processing a video image.
Background
Video, as an efficient means of information transfer, is widely distributed over the Internet, television broadcast, and various emerging media applications. With the rapid development of video codec technology, communication technology, and electronic devices, more and more application scenarios place higher requirements on the delay of video playback, such as video conferencing, interactive entertainment, live sports events, and other on-demand or live applications.
In application scenarios such as video conferencing, interactive entertainment, or live sports events, multiple cameras are usually arranged at different positions to shoot the same scene from different angles, producing a set of video signals, so as to provide the user with a multi-angle viewing experience. The user can select an angle and watch the recorded scene video through a suitable interactive mode; multi-camera shooting can thus provide a multi-angle, three-dimensional visual experience. More generally, in other delay-sensitive scenarios, such as Cloud virtual reality (Cloud VR) games and low-delay live applications, the user should be able to switch between different content video pictures without perceiving any pause, so the decoding and switching delay of video content is a key indicator of user experience.
A commonly used video content switching method segments the coded video at fixed time intervals, with each segment starting from an I frame. When the video content needs to be switched, the current segment continues to be decoded or played until the time point nearest to the new content, and decoding and playback then start from the I frame of the new content's segment. With this switching mode, the delay from receiving a switching instruction to completing the content switch is long, and the low-delay requirements of some application scenarios cannot be met.
Disclosure of Invention
The embodiments of the present application provide a video image processing method and apparatus, in which a second encoding used to generate a second code stream is controlled according to coding information of a first encoding; the second code stream is coded in a full intra prediction mode and is used as a random access code stream. On the basis of meeting the low-delay access requirement for video content, the decoding quality of the accessed video content can thereby be improved, blocking effects can be reduced, and some artifacts can be eliminated.
In a first aspect, an embodiment of the present application provides a video image processing method. The method may include: acquiring an image to be encoded; performing first encoding on the image to be encoded to generate a first code stream; and performing, according to coding information of the first encoding, second encoding in a full intra prediction mode on the image to be encoded or on a first reconstructed image, so as to generate a second code stream. The first reconstructed image is a reconstructed image of the first code stream, or a reconstructed image produced during the first encoding process.
The first encoding and the second encoding are two different encoding modes: the first encoding allows an inter prediction mode, while the second encoding uses a full intra prediction mode. When the two encoding modes are used to encode the image, or the reconstructed image, of the same video content, the first encoding generates the first code stream and the second encoding generates the second code stream. In some cases, the coding information of the first encoding and of the second encoding may differ; for example, their partition manners may be different, their quantization parameters may be different, and so on. In other cases, the coding information of the two encodings may be the same; for example, they may use the same partition manner and the same quantization parameter.
At the decoding end, the first code stream and the second code stream serve different functions. When the decoding end switches the displayed video content, it can first decode the second code stream at the moment corresponding to the video content to be displayed, and then use the frame obtained by decoding the second code stream at that moment as a reference frame for decoding the subsequent frames of the first code stream. Subsequent decoding of the first code stream can thus proceed in time, without waiting for the next I frame of the first code stream, which significantly reduces the decoding delay. In the scheme provided by the first aspect, the second encoding is performed according to the coding information of the first encoding, so that the quality of the second code stream is comparable to that of the first code stream; the decoding result of the second code stream, used as a reference frame for the first code stream, can then be smoothly linked, without obvious blocking effects or artifacts affecting the user experience during content switching.
Therefore, in the implementation manner of the first aspect, the second encoding in the full intra prediction mode is performed on the image to be encoded or the first reconstructed image according to the coding information of the first encoding, so that the quality of the reconstructed image of the first code stream is the same as or comparable to that of the reconstructed image of the corresponding second code stream. On the basis of meeting the low-delay access requirement for video content, this helps reduce the blocking effects and artifacts caused by encoder-decoder inconsistency and improves the decoding quality of the accessed video content.
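To make the switching flow above concrete, the following is a minimal sketch in Python. It is illustrative only: the frame layout and the two decode helpers are stand-ins invented for this sketch, not an interface defined by this application.

```python
# Minimal sketch (not a real codec) of decoder-side content switching.
# Frames are modeled as flat lists of pixel values.

def decode_intra_frame(packet):
    # Stand-in: an all-intra frame decodes without any reference frame.
    return packet["pixels"]

def decode_with_reference(packet, reference):
    # Stand-in: an inter frame reconstructs as reference + residual.
    return [r + d for r, d in zip(reference, packet["residual"])]

def switch_content(second_stream, first_stream, switch_time):
    """Access new content at switch_time without waiting for the next
    I frame of the first (long-GOP) code stream."""
    # 1. Decode the random access frame from the second code stream.
    frame = decode_intra_frame(second_stream[switch_time])
    decoded = [frame]
    # 2. Use it as the reference frame for the subsequent inter-coded
    #    frames of the first code stream.
    for packet in first_stream[switch_time + 1:]:
        frame = decode_with_reference(packet, frame)
        decoded.append(frame)
    return decoded
```

Because every frame of the second code stream is intra-coded, any moment is a valid access point; the cost of the extra stream is paid at the encoder, not in switching delay.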
In one possible design, the coding information of the first encoding includes one or more of: the partition manner of the first encoding, the quantization parameter of the first encoding, and coding distortion information of the first encoding.
The partition manner of the first encoding may include the TU partition manner, the PU partition manner, or the CU partition manner of the first encoding. For the CU partition, the second encoding may use the same CU partition as the first encoding, or the CUs of the second encoding may be constrained not to cross the CU boundaries of the first encoding.
Taking the case where the coding information of the first encoding includes its quantization parameter as an example, the quantization parameter of the second encoding may be obtained by superimposing a quantization parameter offset on the quantization parameter of the first encoding.
In one possible design, performing the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to the coding information of the first encoding includes at least one of the following: performing the second encoding using the same partition manner as the first encoding; performing the second encoding using the same quantization parameter as the first encoding; determining the quantization parameter of the second encoding from the quantization parameter of the first encoding and a quantization parameter offset, and performing the second encoding according to the quantization parameter of the second encoding; or determining the quantization parameter of the second encoding from the coding distortion information of the first encoding, and performing the second encoding according to the quantization parameter of the second encoding.
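The four options above can be summarized in a short sketch. The configuration record and helper names below are assumptions made for illustration; they do not correspond to any particular encoder API.

```python
# Sketch: derive the second-encoding configuration from the coding
# information of the first encoding. All names are illustrative.

def qp_from_distortion(distortion, base_qp=32):
    # Illustrative mapping only: tolerate a coarser QP when the first
    # encoding is already heavily distorted. QP is clamped to [0, 51].
    return max(0, min(51, base_qp + round(distortion)))

def derive_second_encoding_config(first_info, qp_offset=None,
                                  use_distortion=False):
    cfg = {"prediction": "all_intra"}           # second encoding: full intra
    cfg["partition"] = first_info["partition"]  # same partition manner
    if use_distortion:
        # QP derived from the coding distortion of the first encoding.
        cfg["qp"] = qp_from_distortion(first_info["distortion"])
    elif qp_offset is not None:
        # First-encoding QP plus a quantization parameter offset.
        cfg["qp"] = first_info["qp"] + qp_offset
    else:
        # Same quantization parameter as the first encoding.
        cfg["qp"] = first_info["qp"]
    return cfg
```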
In one possible design, performing the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to the coding information of the first encoding includes: determining the quantization parameter of the second encoding according to the coding information of the first encoding and feature information of the image to be encoded; and performing the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to the quantization parameter of the second encoding.
In one possible design, the feature information of the image to be encoded includes one or more of content complexity of the image to be encoded, color classification information of the image to be encoded, contrast information of the image to be encoded, and content segmentation information of the image to be encoded.
The feature information of the image to be encoded may be obtained by performing feature analysis on the image to be encoded.
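As an illustration only, one simple feature analysis is a texture measure that nudges the quantization parameter of the second encoding; the complexity measure and threshold below are assumptions, not taken from this application.

```python
# Sketch: adjust the second-encoding QP from a crude content-complexity
# measure of the image to be encoded (a flat-vs-textured heuristic).

def content_complexity(pixels):
    # Mean absolute horizontal gradient over a 2D pixel array.
    total, count = 0, 0
    for row in pixels:
        for a, b in zip(row, row[1:]):
            total += abs(a - b)
            count += 1
    return total / max(count, 1)

def second_encoding_qp(first_qp, pixels, threshold=8.0):
    # Textured content masks quantization noise; flat content does not.
    return first_qp + (1 if content_complexity(pixels) > threshold else -1)
```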
In one possible design, performing the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to the coding information of the first encoding includes: determining, according to the coding information of the first encoding and the first reconstructed image, at least one of a first partition manner or a first encoding parameter to be used for the second encoding of the image to be encoded or the first reconstructed image; and performing the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to the at least one of the first partition manner or the first encoding parameter.
The interval between two adjacent full-intra frames in the first code stream is larger than the interval between two adjacent full-intra frames in the second code stream.
The first code stream may also be referred to as a long-GOP code stream, and the second code stream as a random access code stream. When video content needs to be accessed, the second code stream is decoded first, its decoding result is used as a reference frame, and the first code stream is then decoded, so that the video content can be accessed quickly.
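A toy illustration of the two frame-type patterns (the interval values are examples, not taken from this application): the long-GOP stream has sparse I frames, while the random access stream is all-intra, so every frame is an access point.

```python
def frame_types(n_frames, intra_interval):
    # "I" for a full-intra frame, "P" for an inter-predicted frame.
    return ["I" if t % intra_interval == 0 else "P" for t in range(n_frames)]

print(frame_types(12, 8))  # first stream:  I P P P P P P P I P P P
print(frame_types(12, 1))  # second stream: I I I I I I I I I I I I
```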
Therefore, in this possible design, at least one of the first partition manner or the first encoding parameter used for the second code stream is determined according to the coding information of the first encoding and the first reconstructed image, so that the quality of the reconstructed image of the first code stream is the same as or comparable to that of the reconstructed image of the corresponding second code stream. On the basis of meeting the low-delay access requirement for video content, this helps reduce the blocking effects and artifacts caused by encoder-decoder inconsistency and improves the decoding quality of the accessed video content.
In one possible design, the difference between the first reconstructed image and a second reconstructed image is smaller than a difference threshold, or the similarity between the first reconstructed image and the second reconstructed image is higher than a similarity threshold, where the second reconstructed image is a reconstructed image of the second code stream or a reconstructed image produced during the second encoding process.
By controlling the difference between the first and second reconstructed images to be smaller than the difference threshold, or their similarity to be higher than the similarity threshold, the quality of the reconstructed image of the first code stream and that of the corresponding second code stream are made the same or comparable. On the basis of meeting the low-delay access requirement for video content, this helps reduce the blocking effects and artifacts caused by encoder-decoder inconsistency and improves the decoding quality of the accessed video content.
In one possible design, determining, according to the coding information of the first encoding and the first reconstructed image, at least one of the first partition manner or the first encoding parameter used for the second encoding of the image to be encoded or the first reconstructed image includes: determining a plurality of second partition manners according to the coding information of the first encoding and the first reconstructed image, and selecting one of them as the first partition manner; and/or determining a plurality of second encoding parameters according to the coding information of the first encoding and the first reconstructed image, and selecting one of them as the first encoding parameter.
Here, the similarity between the first reconstructed image and the second reconstructed image is the highest among the similarities between the first reconstructed image and a plurality of third reconstructed images, where the plurality of third reconstructed images include the second reconstructed image. The plurality of third reconstructed images are the reconstructed images obtained by performing the second encoding multiple times on the image to be encoded or the first reconstructed image according to the plurality of second partition manners and/or the plurality of second encoding parameters; alternatively, they are the reconstructed images of a plurality of third code streams, the plurality of third code streams being obtained by performing the second encoding multiple times on the image to be encoded or the first reconstructed image according to the plurality of second partition manners and/or the plurality of second encoding parameters.
That is, the image to be encoded or the first reconstructed image is encoded multiple times with the second encoding, according to the plurality of second partition manners and/or the plurality of second encoding parameters, to obtain a plurality of third code streams. The similarity between each of the plurality of third reconstructed images and the first reconstructed image is compared, the third reconstructed image with the highest similarity is selected as the second reconstructed image, and the corresponding third code stream is taken as the second code stream. In this way, by maximizing the similarity between the first and second reconstructed images, the blocking effects and artifacts caused by encoder-decoder inconsistency can be reduced.
It is understood that, equivalently, the blocking effects and artifacts caused by encoder-decoder inconsistency can be reduced by minimizing the difference between the first and second reconstructed images.
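The candidate search can be sketched as follows. The encoder entry point is passed in as a stand-in for a real second-encoding call, and the difference measure here is a plain SAD, one of several reasonable choices.

```python
# Sketch: encode the image several times with the second encoding and
# keep the candidate whose reconstruction is closest to the first
# reconstructed image (minimum difference == maximum similarity).

def frame_diff(img_a, img_b):
    # Sum of absolute differences between two frames (flat pixel lists).
    return sum(abs(a - b) for a, b in zip(img_a, img_b))

def pick_second_stream(image, first_recon, candidates, encode_all_intra):
    """candidates: iterable of (partition_manner, qp) pairs.
    encode_all_intra: stand-in returning (code_stream, reconstruction)."""
    best = None
    for partition, qp in candidates:
        stream, recon = encode_all_intra(image, partition, qp)
        diff = frame_diff(recon, first_recon)
        if best is None or diff < best[0]:
            best = (diff, stream, recon)
    _, second_stream, second_recon = best
    return second_stream, second_recon
```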
In one possible design, the method may further include: obtaining the prediction mode of the first encoding. When the prediction mode of the first encoding is inter prediction, the steps of acquiring the coding information of the first encoding and performing the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image to generate the second code stream are executed. When the prediction mode of the first encoding is intra prediction, the first code stream is used as the second code stream.
By checking whether the prediction mode of the first encoding is intra prediction, and using the first code stream directly as the second code stream when it is, the efficiency of generating the first and second code streams can be improved.
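A sketch of this bypass, with hypothetical field and helper names:

```python
# When the first encoding already used intra prediction for this image,
# its code stream can double as the random access stream, and the
# second encoding is skipped entirely.

def make_second_stream(image, first_stream, first_info, encode_second):
    if first_info["prediction_mode"] == "intra":
        return first_stream                    # reuse: already all-intra
    return encode_second(image, first_info)    # run the second encoding
```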
In one possible design, the image to be encoded is a source video image.
By encoding the first and second code streams at the frame level, the quality of the two code streams is kept consistent from the frame level, which helps reduce the blocking effects and artifacts caused by encoder-decoder inconsistency and improves the decoding quality of the accessed video content.
In one possible design, the image to be encoded is an image block obtained by dividing a source video image.
By encoding the first and second code streams at the image block level, synchronous output of the two code streams at the image block level can be achieved, so that a random access frame of the second code stream for accessing the video content can be obtained quickly, reducing the access delay.
In one possible design, the first encoding parameter may include a first quantization parameter or a first code rate. The second encoding parameter may include a second quantization parameter or a second code rate.
In a second aspect, an embodiment of the present application provides a video image processing method. The method may include: acquiring at least one first image to be encoded and a second image to be encoded, where the second image to be encoded is a video image preceding the at least one first image to be encoded; performing the first encoding on the at least one first image to be encoded, respectively, to generate a first code stream; determining, according to at least one first reconstructed image, at least one of a first partition manner or a first encoding parameter to be used for the second encoding of the second image to be encoded, where the at least one first reconstructed image is a reconstructed image of the first code stream or a reconstructed image produced during the first encoding process; and performing the second encoding in the full intra prediction mode on the second image to be encoded according to the at least one of the first partition manner or the first encoding parameter, so as to generate a second code stream.
The first code stream may also be referred to as a long-GOP code stream, and the second code stream as a random access code stream. When video content needs to be accessed, the second code stream is decoded first, its decoding result is used as a reference frame, and the first code stream is then decoded, so that the video content can be accessed quickly.
The encoding of the second code stream is adjusted based on the encoding result of the first code stream, so that the quality of the reconstructed image of the first code stream is the same as or comparable to that of the reconstructed image of the corresponding second code stream. On the basis of meeting the low-delay access requirement for video content, this improves the decoding quality of the accessed video content, reduces blocking effects, and eliminates some artifacts.
By simulating the behavior of the decoding end during encoding, this approach helps reduce the blocking effects and artifacts caused by encoder-decoder inconsistency.
Optionally, the second image to be encoded may be the video image one or more frames before the at least one first image to be encoded.
In one possible design, there is one first image to be encoded and one first reconstructed image. The difference between the first reconstructed image and a second reconstructed image is smaller than a difference threshold, or the similarity between them is higher than a similarity threshold. The second reconstructed image is obtained by decoding the first code stream using a third reconstructed image as the reference image, where the third reconstructed image is a reconstructed image of the second code stream or a reconstructed image produced during the second encoding process.
In one possible design, there are a plurality of first images to be encoded and a plurality of first reconstructed images. The difference between the plurality of first reconstructed images and a plurality of second reconstructed images is smaller than a difference threshold, or the similarity between them is higher than a similarity threshold. The plurality of second reconstructed images are obtained by decoding the first code stream using a third reconstructed image as the reference image, where the third reconstructed image is a reconstructed image of the second code stream or a reconstructed image produced during the second encoding process.
The difference between the plurality of first reconstructed images and the plurality of second reconstructed images may be a weighted sum of the differences between each first reconstructed image and its corresponding second reconstructed image; likewise, the similarity between them may be a weighted sum of the similarities between each first reconstructed image and its corresponding second reconstructed image.
The second reconstructed image corresponding to a given first reconstructed image is the reconstructed image of the same video content.
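Written out, in notation not used by the application itself, with $d$ a per-image difference measure (e.g., SAD or MSE), $s$ a per-image similarity measure (e.g., SSIM), and $w_i$ assumed weights:

$$
D = \sum_{i=1}^{N} w_i \, d\big(R_i^{(1)}, R_i^{(2)}\big), \qquad
S = \sum_{i=1}^{N} w_i \, s\big(R_i^{(1)}, R_i^{(2)}\big),
$$

where $R_i^{(1)}$ and $R_i^{(2)}$ are the $i$-th first and second reconstructed images of the same video content, and $N$ is the number of image pairs.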
In one possible design, determining, according to the at least one first reconstructed image, at least one of the first partition manner or the first encoding parameter used for the second encoding of the second image to be encoded includes: selecting one of a plurality of second partition manners as the first partition manner according to the first reconstructed image; and/or selecting one of a plurality of second encoding parameters as the first encoding parameter according to the first reconstructed image.
Here, the similarity between the first reconstructed image and the second reconstructed image is the highest among the similarities between the first reconstructed image and a plurality of fourth reconstructed images, where the plurality of fourth reconstructed images include the second reconstructed image. The plurality of fourth reconstructed images are obtained by decoding the first code stream using each of a plurality of fifth reconstructed images as the reference image. The plurality of fifth reconstructed images are the reconstructed images of a plurality of third code streams, the plurality of third code streams being obtained by performing the second encoding multiple times on the second image to be encoded according to a plurality of second partition manners and/or a plurality of second encoding parameters; alternatively, the plurality of fifth reconstructed images are the reconstructed images obtained directly by performing the second encoding multiple times on the second image to be encoded according to the plurality of second partition manners and/or the plurality of second encoding parameters.
In one possible design, there are a plurality of first images to be encoded, and determining, according to the at least one first reconstructed image, at least one of the first partition manner or the first encoding parameter for the second encoding of the second image to be encoded includes: selecting one of a plurality of second partition manners as the first partition manner according to the plurality of first reconstructed images; and/or selecting one of a plurality of second encoding parameters as the first encoding parameter according to the plurality of first reconstructed images.
For example, the second image to be encoded may be encoded multiple times with the second encoding, according to a plurality of second partition manners and/or a plurality of second encoding parameters, to generate a plurality of third code streams. The plurality of fifth reconstructed images are the reconstructed images of the plurality of third code streams, or the reconstructed images produced during the corresponding second encoding processes. The first code stream is decoded using each of the fifth reconstructed images as the reference image, yielding multiple groups of fourth reconstructed images, where each group may include a fourth reconstructed image corresponding to each of the plurality of first images to be encoded. By comparing the similarities between each group of fourth reconstructed images and the plurality of first reconstructed images, the group with the highest similarity is selected as the second reconstructed images corresponding to the plurality of first images to be encoded, and the third code stream corresponding to that group is taken as the second code stream. The third code stream corresponding to a group of fourth reconstructed images means the third code stream whose reconstructed image (or the reconstructed image produced while encoding it), used as the reference image for decoding the first code stream, yields that group of fourth reconstructed images.
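This selection loop can be sketched as follows; all encoder and decoder entry points are stand-ins passed in by the caller, and frame_diff is the SAD helper from the earlier sketch, repeated here so the example is self-contained.

```python
# Sketch: the encoder simulates the decoding end. Each candidate second
# encoding yields a fifth reconstructed image; the first code stream is
# decoded with that image as reference, and the candidate whose decoded
# frames best match the first reconstructed images is kept.

def frame_diff(img_a, img_b):
    return sum(abs(a - b) for a, b in zip(img_a, img_b))

def pick_by_decode_simulation(second_image, first_stream, first_recons,
                              candidates, encode_all_intra, decode_first):
    """candidates: iterable of (partition_manner, qp) pairs.
    decode_first(stream, reference): stand-in returning decoded frames."""
    best = None
    for partition, qp in candidates:
        third_stream, fifth_recon = encode_all_intra(second_image,
                                                     partition, qp)
        fourth_recons = decode_first(first_stream, reference=fifth_recon)
        diff = sum(frame_diff(a, b)
                   for a, b in zip(fourth_recons, first_recons))
        if best is None or diff < best[0]:
            best = (diff, third_stream)
    return best[1]  # the selected second code stream
```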
In one possible design, before the first encoding is performed on the at least one first image to be encoded, the method further includes: performing the first encoding on the second image to be encoded to generate a fourth code stream, and obtaining the prediction mode of that first encoding. When the prediction mode of the first encoding is inter prediction, the at least one of the first partition manner or the first encoding parameter used for the second encoding of the second image to be encoded is determined according to the at least one first reconstructed image. When the prediction mode of the first encoding is intra prediction, the fourth code stream is used as the second code stream.
In one possible design, the at least one first image to be encoded is at least one first source video image, and the second image to be encoded is a second source video image.
In one possible design, the first coding parameter includes a first quantization parameter or a first code rate. The second encoding parameter includes a second quantization parameter or a second code rate.
On the basis of the first aspect or any possible design thereof, or the second aspect or any possible design thereof, identification information of the code stream characteristics may be carried during encoding. The identification information is used by the decoding end to distinguish the first code stream from the second code stream. The identification information of the first and second code streams may be carried in any of a parameter set, supplemental enhancement information, the encapsulation layer, the file format, file description information, or a custom message.
In one possible design, the first code stream and the second code stream use the same parameter set, and the parameter set may carry first identification information. The first identification information indicates that the current code stream is the first code stream; or the second code stream; or both the first and second code streams of the same video content, with the first code stream placed before the second code stream; or both, with the first code stream placed after the second code stream. For example, the first identification information may be a stream identifier (stream_id) in a video parameter set (VPS), a sequence parameter set (SPS), or a picture parameter set (PPS).
In this way, the first code stream and the second code stream may be packaged together.
In one possible design, the first code stream and the second code stream use different parameter sets, and the different parameter sets may carry second identification information. The second identification information indicates whether the current code stream is the first code stream or the second code stream. For example, the second identification information may be a stream identifier (stream_id) in the VPS, SPS, or PPS of the respective stream.
Thus, the first code stream and the second code stream may be packaged together or may be packaged independently.
In one possible design, the slice header information of the first code stream may carry third identification information, indicating that the current code stream is the first code stream, and the slice header information of the second code stream may carry fourth identification information, indicating that the current code stream is the second code stream. For example, the third or fourth identification information may be a stream identifier (stream_id) in the slice header information.
Thus, the first code stream and the second code stream may be packaged together or may be packaged independently.
In one possible design, the supplemental enhancement information (SEI) of the first code stream may carry fifth identification information, indicating that the current code stream is the first code stream, and the SEI of the second code stream may carry sixth identification information, indicating that the current code stream is the second code stream. For example, the fifth or sixth identification information may be a stream identifier (stream_id) in the SEI.
Thus, the first code stream and the second code stream may be packaged together or may be packaged independently.
In one possible design, the first code stream and the second code stream are encapsulated independently. The encapsulation layer of the first code stream carries seventh identification information, indicating that the current code stream is the first code stream, and the encapsulation layer of the second code stream carries eighth identification information, indicating that the current code stream is the second code stream. For example, the first code stream is encapsulated in a first media track and the second code stream in a second media track, where the seventh or eighth identification information may be a media track class (track_class).
In one possible design, the first code stream and the second code stream are packaged and transmitted independently, and the file format of the first code stream carries ninth identification information, where the ninth identification information is used to indicate that the current code stream is the first code stream. The file format of the second code stream carries tenth identification information. The tenth identification information is used to indicate that the current code stream is the second code stream.
In one possible design, the first code stream and the second code stream are packaged and transmitted independently, and the file description information of the first code stream carries eleventh identification information used for indicating that the current code stream is the first code stream. And the file description information of the second code stream carries twelfth identification information. The twelfth identification information is used to indicate that the current code stream is the second code stream.
In one possible design, the first code stream and the second code stream are transmitted in custom messages. The custom message carrying the first code stream carries thirteenth identification information, indicating that the current code stream is the first code stream, and the custom message carrying the second code stream carries fourteenth identification information, indicating that the current code stream is the second code stream. For example, the custom message may be a TLV (type-length-value) message, and the thirteenth or fourteenth identification information may be the type field of the TLV.
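A sketch of the TLV framing, assuming a one-byte type and a four-byte big-endian length field; the concrete type values (1 for the first code stream, 2 for the second) are assumptions for illustration only.

```python
import struct

FIRST_STREAM, SECOND_STREAM = 1, 2   # hypothetical type values

def tlv_pack(stream_type, payload: bytes) -> bytes:
    # 1-byte type + 4-byte big-endian length + value.
    return struct.pack(">BI", stream_type, len(payload)) + payload

def tlv_unpack(buf: bytes):
    stream_type, length = struct.unpack(">BI", buf[:5])
    return stream_type, buf[5:5 + length]

# The receiver routes a packet by its type field.
packet = tlv_pack(SECOND_STREAM, b"\x00\x00\x01\x65")  # e.g. NAL bytes
kind, payload = tlv_unpack(packet)
assert kind == SECOND_STREAM
```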
In a third aspect, the present application provides a video image processing apparatus. The apparatus may be an electronic device or a server, a chip or a system-on-chip in an electronic device or server, or a functional module in an electronic device or server for implementing the first aspect or any possible implementation manner thereof. For example, the video image processing apparatus includes: an acquisition module, configured to acquire an image to be encoded; a first encoding module, configured to perform the first encoding on the image to be encoded to generate a first code stream; and a second encoding module, configured to perform, according to the coding information of the first encoding, the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image to generate a second code stream, where the first reconstructed image is a reconstructed image of the first code stream or a reconstructed image produced during the first encoding process.
In one possible design, the coding information of the first encoding includes one or more of: the partition manner of the first encoding, the quantization parameter of the first encoding, and coding distortion information of the first encoding.
In one possible design, the second encoding module is configured to perform at least one of the following: performing the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image using the same partition manner as the first encoding; performing the second encoding using the same quantization parameter as the first encoding; determining the quantization parameter of the second encoding from the quantization parameter of the first encoding and a quantization parameter offset, and performing the second encoding according to it; or determining the quantization parameter of the second encoding from the coding distortion information of the first encoding, and performing the second encoding according to it.
In one possible design, the second encoding module is configured to: determine the quantization parameter of the second encoding according to the coding information of the first encoding and the feature information of the image to be encoded; and perform the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to the quantization parameter of the second encoding.
In one possible design, the feature information of the image to be encoded includes one or more of content complexity of the image to be encoded, color classification information of the image to be encoded, contrast information of the image to be encoded, and content segmentation information of the image to be encoded.
In one possible design, the second encoding module is configured to determine, according to the coding information of the first encoding and the first reconstructed image, at least one of the first partition manner or the first encoding parameter used for the second encoding of the image to be encoded or the first reconstructed image, and to perform the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to the at least one of the first partition manner or the first encoding parameter.
The interval between two adjacent full-intra frames in the first code stream is larger than the interval between two adjacent full-intra frames in the second code stream.
In one possible design, the difference between the first reconstructed image and the second reconstructed image is smaller than a difference threshold, or the similarity between them is higher than a similarity threshold, where the second reconstructed image is a reconstructed image of the second code stream or a reconstructed image produced during the second encoding process.
In one possible design, the second encoding module is configured to: determine a plurality of second partition manners according to the coding information of the first encoding and the first reconstructed image, and select one of them as the first partition manner; and/or determine a plurality of second encoding parameters according to the coding information of the first encoding and the first reconstructed image, and select one of them as the first encoding parameter.
Here, the similarity between the first reconstructed image and the second reconstructed image is the highest among the similarities between the first reconstructed image and a plurality of third reconstructed images, where the plurality of third reconstructed images include the second reconstructed image. The plurality of third reconstructed images are the reconstructed images obtained by performing the second encoding multiple times on the image to be encoded or the first reconstructed image according to the plurality of second partition manners and/or the plurality of second encoding parameters, or the reconstructed images of a plurality of third code streams obtained in the same way.
In one possible design, the second encoding module is further configured to: obtain the prediction mode of the first encoding; when the prediction mode of the first encoding is inter prediction, execute the steps of acquiring the coding information of the first encoding and performing the second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image to generate the second code stream; and when the prediction mode of the first encoding is intra prediction, use the first code stream as the second code stream.
In one possible design, the image to be encoded is a source video image, or an image block obtained by dividing a source video image.
In one possible design, the first encoding parameter includes a first quantization parameter or a first code rate. The second encoding parameter includes a second quantization parameter or a second code rate.
In a fourth aspect, the present application provides a video image processing apparatus. The apparatus may be an electronic device or a server, a chip or a system-on-chip in an electronic device or server, or a functional module in an electronic device or server for implementing the second aspect or any possible implementation manner thereof. For example, the video image processing apparatus includes: an acquisition module, configured to acquire at least one first image to be encoded and a second image to be encoded, where the second image to be encoded is a video image preceding the at least one first image to be encoded; a first encoding module, configured to perform the first encoding on the at least one first image to be encoded, respectively, to generate a first code stream; and a second encoding module, configured to determine, according to at least one first reconstructed image, at least one of a first partition manner or a first encoding parameter used for the second encoding of the second image to be encoded, where the at least one first reconstructed image is a reconstructed image of the first code stream or a reconstructed image produced during the first encoding process, and further configured to perform the second encoding on the second image to be encoded according to the at least one of the first partition manner or the first encoding parameter to generate a second code stream.
In one possible design, there is one first image to be encoded and one first reconstructed image. The difference between the first reconstructed image and the second reconstructed image is smaller than a difference threshold, or the similarity between them is higher than a similarity threshold, where the second reconstructed image is obtained by decoding the first code stream using the third reconstructed image as the reference image, and the third reconstructed image is a reconstructed image of the second code stream or a reconstructed image produced during the second encoding process.
In one possible design, there is one first image to be encoded and one first reconstructed image, and the second encoding module is configured to: select one of a plurality of second partition manners as the first partition manner according to the first reconstructed image; and/or select one of a plurality of second encoding parameters as the first encoding parameter according to the first reconstructed image.
Here, the similarity between the first reconstructed image and the second reconstructed image is the highest among the similarities between the first reconstructed image and a plurality of fourth reconstructed images, where the plurality of fourth reconstructed images include the second reconstructed image. The plurality of fourth reconstructed images are obtained by decoding the first code stream using each of a plurality of fifth reconstructed images as the reference image. The plurality of fifth reconstructed images are the reconstructed images of a plurality of third code streams obtained by performing the second encoding multiple times on the second image to be encoded according to a plurality of second partition manners and/or a plurality of second encoding parameters, or the reconstructed images obtained directly from those second encodings.
In one possible design, the first encoding module is further configured to: before the first encoding is performed on the at least one first image to be encoded, perform the first encoding on the second image to be encoded to generate a fourth code stream. The second encoding module is further configured to: obtain the prediction mode of that first encoding; when the prediction mode of the first encoding is inter prediction, determine, according to the at least one first reconstructed image, the at least one of the first partition manner or the first encoding parameter used for the second encoding of the second image to be encoded; and when the prediction mode of the first encoding is intra prediction, use the fourth code stream as the second code stream.
In one possible design, the at least one first image to be encoded is at least one first source video image, and the second image to be encoded is a second source video image.
In one possible design, the first encoding parameter includes a first quantization parameter or a first code rate. The second encoding parameter includes a second quantization parameter or a second code rate.
In a fifth aspect, an embodiment of the present application provides a video image processing apparatus, including: one or more processors, and a memory for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to the first aspect or any possible design thereof, or the method according to the second aspect or any possible design thereof.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing the first code stream and the second code stream obtained by the method according to the first aspect or any possible design thereof, or by the method according to the second aspect or any possible design thereof.
In a seventh aspect, an embodiment of the present application provides a computer program product which, when run on a computer, causes the computer to perform the method according to the first aspect or any possible design thereof, or the second aspect or any possible design thereof.
In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium including computer instructions which, when executed on a computer, cause the computer to perform the method according to the first aspect or any possible design thereof, or the second aspect or any possible design thereof.
It should be understood that the technical solutions of the third to eighth aspects of the present application are consistent with those of the first and second aspects; the beneficial effects of each aspect and its corresponding possible implementations are similar and are not described again.
Drawings
FIG. 1A is a block diagram of an example of a video encoding and decoding system 10 for implementing embodiments of the present application;
FIG. 1B is a block diagram of an example of a video coding system 40 for implementing embodiments of the present application;
FIG. 2 is a block diagram of an example structure of an encoder 20 for implementing embodiments of the present application;
FIG. 3 is a block diagram of an example structure of a decoder 30 for implementing embodiments of the present application;
FIG. 4 is a block diagram of an example of a video coding apparatus 400 for implementing embodiments of the present application;
FIG. 5 is a block diagram of another example of an encoding device or a decoding device for implementing embodiments of the present application;
FIG. 6 is a schematic diagram of an application scenario of multi-camera shooting of a sports event according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the trajectory of decoded frames when switching from the current video content to another video content according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a video image processing system according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a video image processing method according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of a video image processing method according to an embodiment of the present application;
FIG. 11 is a schematic flowchart of a video image processing method according to an embodiment of the present application;
FIG. 12 is a schematic flowchart of a video image processing method according to an embodiment of the present application;
FIG. 13 is a schematic flowchart of a video image processing method according to an embodiment of the present application;
FIG. 14 is a schematic flowchart of a video image processing method according to an embodiment of the present application;
FIG. 15 is a schematic flowchart of a video image processing method according to an embodiment of the present application;
FIG. 16 is a schematic diagram of the first encoding and the second encoding using the same TU partition manner according to an embodiment of the present application;
FIG. 17 is a schematic diagram of the quantization parameter of the first encoding and the quantization parameter of the second encoding according to an embodiment of the present application;
FIG. 18 is a schematic diagram of the quantization parameter passed from the first encoding to the second encoding according to an embodiment of the present application;
FIG. 19 is a schematic diagram of the arrangement of the first code stream and the second code stream when the stream identifier (stream_id) is 2, according to an embodiment of the present application;
FIG. 20 is a schematic diagram of the arrangement of the first code stream and the second code stream when the stream identifier (stream_id) is 3, according to an embodiment of the present application;
FIG. 21 is a schematic diagram of a manner of combining three first code streams and one second code stream into one code stream according to an embodiment of the present application;
FIG. 22 is a schematic structural diagram of a video image processing apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings. In the following description, reference is made to the accompanying drawings, which form a part hereof and in which specific aspects of the embodiments are shown by way of illustration, or in which specific aspects of the embodiments may be employed. It should be understood that the embodiments of the present application may be used in other ways and may include structural or logical changes not depicted in the drawings. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present application is defined by the appended claims.
For example, it should be understood that the disclosure in connection with a described method may apply equally to the corresponding apparatus or system for performing the method, and vice versa. If one or more particular method steps are described, the corresponding apparatus may comprise one or more units, such as functional units, to perform the described one or more method steps (e.g., one unit performing the one or more steps, or multiple units each performing one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the figures. Conversely, if a particular apparatus is described based on one or more units, such as functional units, the corresponding method may comprise one step to perform the functionality of the one or more units (e.g., one step performing the functionality of the one or more units, or multiple steps each performing the functionality of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the figures.
Further, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.
The technical solutions in the embodiments of the present application can be applied to existing video coding standards (such as H.264 and HEVC) as well as future video coding standards (such as H.266). The terminology used in the description of the embodiments is only for the purpose of describing particular embodiments and is not intended to limit the present application. Some concepts that may be involved in the embodiments of the present application are briefly described below.
Video coding generally refers to processing a sequence of pictures that form a video or video sequence. In the field of video coding, the terms "picture", "frame", and "image" may be used as synonyms. Video coding as used herein means video encoding or video decoding. Video encoding is performed on the source side, and typically includes processing (e.g., compressing) the original video picture to reduce the amount of data required to represent it, for more efficient storage and/or transmission. Video decoding is performed on the destination side, and typically involves inverse processing relative to the encoder to reconstruct the video pictures. References in the embodiments to the "coding" of video pictures should be understood as referring to the "encoding" or "decoding" of a video sequence. The combination of the encoding part and the decoding part is also referred to as a CODEC (COding and DECoding).
A video sequence comprises a series of images (pictures), an image is further divided into slices (slices), and a slice is further divided into blocks (blocks). Video coding is performed block by block; in some newer video coding standards, the concept of a block is further extended. For example, the H.264 standard has the macroblock (MB), which may be further divided into multiple prediction blocks (partitions) usable for predictive coding. The High Efficiency Video Coding (HEVC) standard adopts basic concepts such as the coding unit (CU), prediction unit (PU), and transform unit (TU), functionally divides the various block units, and describes them with a brand-new tree-based structure. For example, a CU may be partitioned into smaller CUs in a quadtree manner, and the smaller CUs may be further partitioned, forming a quadtree structure, where the CU is the basic unit for partitioning and encoding a picture to be encoded. Similar tree structures exist for the PU and the TU: a PU may correspond to a prediction block and is the basic unit of predictive coding, with the CU further partitioned into PUs according to a partitioning pattern; a TU may correspond to a transform block and is the basic unit for transforming the prediction residual. In essence, however, CU, PU and TU are all concepts of blocks (or image blocks).
For example, in HEVC, a CTU is split into CUs by using a quadtree structure denoted as a coding tree. A decision whether to encode a picture region using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU may be further split into one, two, or four PUs according to the PU split type. The same prediction process is applied within one PU, and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU split type, the CU may be partitioned into transform units (TUs) according to other quadtree structures similar to the coding tree used for the CU. In recent developments of video compression technology, quadtree plus binary tree (QTBT) partitioning is used to partition coding blocks. In the QTBT block structure, a CU may be square or rectangular.
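For illustration only, the following Python sketch expresses a recursive quadtree split decision of the kind described above. It is a minimal, greedy variant that assumes a hypothetical cost function encode_cost supplied by the caller; it is not the actual HEVC or QTBT partitioning algorithm, which is driven by full rate-distortion optimization.

```python
# Minimal sketch of recursive quadtree CU splitting (illustrative only).
# `encode_cost(x, y, size)` is a hypothetical cost function; a real encoder
# would compare full rate-distortion costs rather than this greedy one-level test.

def split_cu(x, y, size, min_size, encode_cost):
    """Return a list of (x, y, size) leaf CUs covering the region at (x, y)."""
    if size <= min_size:
        return [(x, y, size)]
    whole = encode_cost(x, y, size)
    half = size // 2
    quads = [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]
    split = sum(encode_cost(qx, qy, half) for qx, qy in quads)
    if whole <= split:
        return [(x, y, size)]  # keep the CU unsplit
    leaves = []
    for qx, qy in quads:
        leaves += split_cu(qx, qy, half, min_size, encode_cost)
    return leaves
```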
Herein, for convenience of description and understanding, an image block to be encoded in a currently encoded image may be referred to as a current block, e.g., in encoding, referring to a block currently being encoded; in decoding, refers to the block that is currently being decoded. A decoded image block of a reference image used for predicting the current block is referred to as a reference block, i.e. the reference block is a block that provides a reference signal for the current block, wherein the reference signal represents pixel values within the image block. A block in the reference picture that provides a prediction signal for the current block may be a prediction block, wherein the prediction signal represents pixel values or sample values or a sampled signal within the prediction block. For example, after traversing multiple reference blocks, a best reference block is found that will provide prediction for the current block, which is called a prediction block.
In the case of lossless video coding, the original video picture can be reconstructed, i.e., the reconstructed video picture has the same quality as the original video picture (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, the amount of data needed to represent the video picture is reduced by performing further compression, e.g., by quantization, while the decoder side cannot fully reconstruct the video picture, i.e., the quality of the reconstructed video picture is lower or worse than the quality of the original video picture.
Several video coding standards since H.261 belong to the family of "lossy hybrid video codecs" (i.e., they combine spatial and temporal prediction in the sample domain with 2D transform coding for applying quantization in the transform domain). Each picture of a video sequence is typically partitioned into a set of non-overlapping blocks, and coding is typically performed at the block level. In other words, the encoder side typically processes, i.e., encodes, video at the block (video block) level, e.g., generates a prediction block by spatial (intra-picture) prediction and temporal (inter-picture) prediction, subtracts the prediction block from the current block (the block currently being processed or to be processed) to obtain a residual block, and transforms and quantizes the residual block in the transform domain to reduce the amount of data to be transmitted (compressed), while the decoder side applies the inverse processing relative to the encoder to the encoded or compressed block to reconstruct the current block for representation. In addition, the encoder replicates the decoder processing loop so that the encoder and the decoder generate identical predictions (e.g., intra prediction and inter prediction) and/or reconstructions for processing, i.e., encoding, subsequent blocks.
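The per-block hybrid coding loop just described can be summarized with the following toy Python sketch. The transform, inv_transform, q, and dq arguments are hypothetical stand-ins for the transform and quantization stages discussed later; the sketch only illustrates how the encoder replicates the decoder's reconstruction.

```python
import numpy as np

def encode_block(current, prediction, transform, inv_transform, q, dq):
    """Toy hybrid coding loop: predict -> residual -> transform -> quantize,
    then invert the lossy steps to reconstruct exactly as a decoder would."""
    residual = current.astype(np.int32) - prediction.astype(np.int32)
    coeffs = transform(residual)   # to the transform domain
    levels = q(coeffs)             # quantization: the lossy step
    recon_residual = inv_transform(dq(levels))
    # The encoder mirrors the decoder so that both sides predict subsequent
    # blocks from the same reconstructed samples.
    reconstructed = np.clip(prediction + np.rint(recon_residual), 0, 255)
    return levels, reconstructed.astype(np.uint8)
```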
The system architecture to which the embodiments of the present application apply is described below. Referring to fig. 1A, fig. 1A schematically illustrates a block diagram of a video encoding and decoding system 10 applied in an embodiment of the present application. As shown in fig. 1A, video encoding and decoding system 10 may include a source device 12 and a destination device 14, source device 12 generating encoded video data, and thus source device 12 may be referred to as a video encoding apparatus. Destination device 14 may decode the encoded video data generated by source device 12, and thus destination device 14 may be referred to as a video decoding apparatus. Various implementations of source apparatus 12, destination apparatus 14, or both may include one or more processors and memory coupled to the one or more processors. The memory can include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures that can be accessed by a computer, as described herein. Source apparatus 12 and destination apparatus 14 may comprise a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, wireless communication devices, or the like.
Although fig. 1A depicts source apparatus 12 and destination apparatus 14 as separate apparatuses, an apparatus embodiment may also include the functionality of both source apparatus 12 and destination apparatus 14 or both, i.e., source apparatus 12 or corresponding functionality and destination apparatus 14 or corresponding functionality. In such embodiments, source device 12 or corresponding functionality and destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, or using separate hardware and/or software, or any combination thereof.
A communication connection may be made between source device 12 and destination device 14 over link 13, and destination device 14 may receive encoded video data from source device 12 via link 13. Link 13 may comprise one or more media or devices capable of moving encoded video data from source apparatus 12 to destination apparatus 14. In one example, link 13 may include one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. In this example, source apparatus 12 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination apparatus 14. The one or more communication media may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). The one or more communication media may include routers, switches, base stations, or other apparatuses that facilitate communication from source apparatus 12 to destination apparatus 14.
Source device 12 includes an encoder 20, and optionally, source device 12 may also include a picture source 16, a picture preprocessor 18, and a communication interface 22. In one embodiment, the encoder 20, the picture source 16, the picture preprocessor 18, and the communication interface 22 may be hardware components of the source device 12 or may be software programs of the source device 12. They are described below, respectively:
the picture source 16, which may include or be any type of picture capture device for, e.g., capturing a real-world picture, and/or any type of device for generating a picture or comment (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), e.g., a computer graphics processor for generating a computer-animated picture, or any type of device for obtaining and/or providing a real-world picture or a computer-animated picture (e.g., screen content, a virtual reality (VR) picture), and/or any combination thereof (e.g., an augmented reality (AR) picture). The picture source 16 may be a camera for capturing pictures or a memory for storing pictures, and the picture source 16 may also include any type of (internal or external) interface for storing previously captured or generated pictures and/or for obtaining or receiving pictures. When picture source 16 is a camera, picture source 16 may be, for example, a local camera or a camera integrated in the source device; when picture source 16 is a memory, picture source 16 may be, for example, a local memory or a memory integrated in the source device. When picture source 16 includes an interface, the interface may, for example, be an external interface that receives pictures from an external video source; the external video source may be, for example, an external picture capture device such as a camera, an external memory, or an external picture generation device such as an external computer graphics processor, computer, or server. The interface may be any type of interface according to any proprietary or standardized interface protocol, e.g., a wired or wireless interface or an optical interface.
A picture may be regarded as a two-dimensional array or matrix of pixels (picture elements). A pixel in the array may also be referred to as a sampling point. The number of sampling points of the array or picture in the horizontal and vertical directions (or axes) defines the size and/or resolution of the picture. To represent color, three color components are typically employed, i.e., a picture may be represented as or contain three sample arrays. For example, in the RGB format or color space, a picture includes corresponding arrays of red, green, and blue samples. In video coding, however, each pixel is typically represented in a luminance/chrominance format or color space; e.g., a picture in the YUV format includes a luminance component indicated by Y (sometimes also indicated by L) and two chrominance components indicated by U and V. The luminance (luma) component Y represents brightness or gray-level intensity (e.g., the two are the same in a gray-scale picture), while the two chrominance (chroma) components U and V represent the chrominance or color information components. Accordingly, a picture in the YUV format includes a luma sample array of luma sample values (Y) and two chroma sample arrays of chroma values (U and V). A picture in the RGB format may be converted or transformed into the YUV format and vice versa; this process is also known as color transformation or conversion. If a picture is black and white, the picture may include only an array of luminance samples. In the embodiments of this application, a picture transmitted from the picture source 16 to the picture processor may also be referred to as raw picture data 17.
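As an illustrative example of the color transformation mentioned above, the following Python sketch converts an RGB picture to full-range YUV using BT.601-style weights. The exact coefficients and value ranges depend on the applicable standard (e.g., BT.601 vs. BT.709, full vs. limited range), so this is only one possible instance, not a normative conversion.

```python
import numpy as np

def rgb_to_yuv_bt601(rgb):
    """Convert an HxWx3 uint8 RGB image to full-range YUV (BT.601 weights)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted brightness
    u = 0.564 * (b - y) + 128.0            # blue-difference chroma, offset to [0, 255]
    v = 0.713 * (r - y) + 128.0            # red-difference chroma, offset to [0, 255]
    return np.clip(np.stack([y, u, v], axis=-1), 0, 255).astype(np.uint8)
```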
Picture pre-processor 18 is configured to receive original picture data 17 and perform pre-processing on original picture data 17 to obtain pre-processed picture 19 or pre-processed picture data 19. For example, the pre-processing performed by picture pre-processor 18 may include trimming, color format conversion (e.g., from RGB format to YUV format), toning, or de-noising.
An encoder 20 (or video encoder 20) for receiving the pre-processed picture data 19, processing the pre-processed picture data 19 with a relevant prediction mode (such as the prediction mode in various embodiments herein), thereby providing encoded picture data 21 (structural details of the encoder 20 will be described further below based on fig. 2 or fig. 4 or fig. 5). In some embodiments, the encoder 20 may be configured to perform various embodiments described later to implement the application of the chroma block prediction method described in this application on the encoding side.
A communication interface 22, which may be used to receive encoded picture data 21 and may transmit encoded picture data 21 over link 13 to destination device 14 or any other device (e.g., memory) for storage or direct reconstruction, which may be any device for decoding or storage. Communication interface 22 may, for example, be used to encapsulate encoded picture data 21 into a suitable format, such as a data packet, for transmission over link 13.
Destination device 14 includes a decoder 30, and optionally destination device 14 may also include a communication interface 28, a picture post-processor 32, and a display device 34. Described below, respectively:
communication interface 28 may be used to receive encoded picture data 21 from source device 12 or any other source, such as a storage device, such as an encoded picture data storage device. The communication interface 28 may be used to transmit or receive the encoded picture data 21 by way of a link 13 between the source device 12 and the destination device 14, or by way of any type of network, such as a direct wired or wireless connection, any type of network, such as a wired or wireless network or any combination thereof, or any type of private and public networks, or any combination thereof. Communication interface 28 may, for example, be used to decapsulate data packets transmitted by communication interface 22 to obtain encoded picture data 21.
Both communication interface 28 and communication interface 22 may be configured as a one-way communication interface or a two-way communication interface, and may be used, for example, to send and receive messages to establish a connection, acknowledge and exchange any other information related to a communication link and/or data transfer, such as an encoded picture data transfer.
A decoder 30 (also referred to as video decoder 30), for receiving the encoded picture data 21 and providing decoded picture data 31 or a decoded picture 31 (structural details of the decoder 30 will be further described below based on fig. 3, fig. 4, or fig. 5). In some embodiments, the decoder 30 may be configured to perform various embodiments described hereinafter to implement the application of the chroma block prediction method described herein on the decoding side.
A picture post-processor 32 for performing post-processing on the decoded picture data 31 (also referred to as reconstructed picture data) to obtain post-processed picture data 33. Post-processing performed by picture post-processor 32 may include: color format conversion (e.g., from YUV to RGB format), toning, cropping, or resampling, or any other process, may also be used to transmit post-processed picture data 33 to display device 34.
A display device 34 for receiving the post-processed picture data 33 for displaying pictures to, for example, a user or viewer. Display device 34 may be or may include any type of display for presenting the reconstructed picture, such as an integrated or external display or monitor. For example, the display may include a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), a Digital Light Processor (DLP), or any other display of any kind.
It will be apparent to those skilled in the art from this description that the existence and (exact) division of the functionality of the different units, or of the functionality of source device 12 and/or destination device 14 as shown in fig. 1A, may vary depending on the actual device and application. Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, a mobile phone, a smartphone, a tablet or tablet computer, a camcorder, a desktop computer, a set-top box, a television, a camera, an in-vehicle device, a display device, a digital media player, a video game console, a video streaming device (e.g., a content service server or a content distribution server), a broadcast receiver device, or a broadcast transmitter device, and may use no operating system or any type of operating system.
Both encoder 20 and decoder 30 may be implemented as any of a variety of suitable circuits, such as one or more microprocessors, digital Signal Processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the techniques are implemented in part in software, an apparatus may store instructions of the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered one or more processors.
In some cases, the video encoding and decoding system 10 shown in fig. 1A is merely an example, and the techniques of this application may be applicable to video encoding settings (e.g., video encoding or video decoding) that do not necessarily involve any data communication between the encoding and decoding devices. In other examples, the data may be retrieved from local storage, streamed over a network, and so on. A video encoding device may encode and store data to a memory, and/or a video decoding device may retrieve and decode data from a memory. In some examples, the encoding and decoding are performed by devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode data.
Referring to fig. 1B, fig. 1B is an illustrative diagram of an example of a video coding system 40 including the encoder 20 of fig. 2 and/or the decoder 30 of fig. 3, according to an example embodiment. Video coding system 40 may implement a combination of the various techniques of embodiments of this application. In the illustrated embodiment, video coding system 40 may include an imaging device 41, an encoder 20, a decoder 30 (and/or a video codec implemented by logic 47 of a processing unit 46), an antenna 42, one or more processors 43, one or more memories 44, and/or a display device 45.
As shown in fig. 1B, the imaging device 41, the antenna 42, the processing unit 46, the logic circuit 47, the encoder 20, the decoder 30, the processor 43, the memory 44, and/or the display device 45 can communicate with each other. As discussed, although video coding system 40 is depicted with encoder 20 and decoder 30, in different examples video coding system 40 may include only encoder 20 or only decoder 30.
In some instances, antenna 42 may be used to transmit or receive an encoded bitstream of video data. Additionally, in some instances, display device 45 may be used to present video data. In some examples, logic circuit 47 may be implemented by processing unit 46. Processing unit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like. Video coding system 40 may also include an optional processor 43, which may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like. In some examples, logic circuit 47 may be implemented in hardware, such as video-coding-specific hardware, and processor 43 may be implemented in general-purpose software, an operating system, and so on. In addition, memory 44 may be any type of memory, such as a volatile memory (e.g., static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or a non-volatile memory (e.g., flash memory, etc.). In a non-limiting example, memory 44 may be implemented by cache memory. In some examples, logic circuit 47 may access memory 44 (e.g., to implement an image buffer). In other examples, logic circuit 47 and/or processing unit 46 may include memory (e.g., a cache, etc.) for implementing an image buffer or the like.
In some examples, encoder 20, implemented by logic circuitry, may include an image buffer (e.g., implemented by processing unit 46 or memory 44) and a graphics processing unit (e.g., implemented by processing unit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include an encoder 20 implemented by logic circuitry 47 to implement the various modules discussed with reference to fig. 2 and/or any other encoder system or subsystem described herein. Logic circuitry may be used to perform various operations discussed herein.
In some examples, decoder 30 may be implemented by logic circuitry 47 in a similar manner to implement the various modules discussed with reference to decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. In some examples, decoder 30 implemented by logic circuitry may include an image buffer (e.g., implemented by processing unit 46 or memory 44) and a graphics processing unit (e.g., implemented by processing unit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include a decoder 30 implemented by logic circuitry 47 to implement the various modules discussed with reference to fig. 3 and/or any other decoder system or subsystem described herein.
In some instances, antenna 42 may be used to receive an encoded bitstream of video data. As discussed, the encoded bitstream may include data related to the encoded video frame, indicators, index values, mode selection data, etc., discussed herein, such as data related to the encoding partition (e.g., transform coefficients or quantized transform coefficients, (as discussed) optional indicators, and/or data defining the encoding partition). Video coding system 40 may also include a decoder 30 coupled to antenna 42 and used to decode the encoded bitstream. The display device 45 is used to present video frames.
It should be understood that for the example described with reference to encoder 20 in the embodiments of the present application, decoder 30 may be used to perform the reverse process. With respect to signaling syntax elements, decoder 30 may be configured to receive and parse such syntax elements and decode the associated video data accordingly. In some examples, encoder 20 may entropy encode the syntax elements into an encoded video bitstream. In such instances, decoder 30 may parse such syntax elements and decode the relevant video data accordingly.
It should be noted that the encoder 20 and the decoder 30 in the embodiments of this application may be codecs corresponding to video standard protocols such as H.263, H.264, HEVC, MPEG-2, MPEG-4, VP8, and VP9, or to next-generation video standard protocols (e.g., H.266).
Referring to fig. 2, fig. 2 shows a schematic/conceptual block diagram of an example of an encoder 20 for implementing embodiments of the present application. In the example of fig. 2, encoder 20 includes a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a buffer 216, a loop filter unit 220, a Decoded Picture Buffer (DPB) 230, a prediction processing unit 260, and an entropy encoding unit 270. Prediction processing unit 260 may include inter prediction unit 244, intra prediction unit 254, and mode selection unit 262. Inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The encoder 20 shown in fig. 2 may also be referred to as a hybrid video encoder or a video encoder according to a hybrid video codec.
For example, the residual calculation unit 204, the transform processing unit 206, the quantization unit 208, the prediction processing unit 260, and the entropy encoding unit 270 form a forward signal path of the encoder 20, and, for example, the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the prediction processing unit 260 form a backward signal path of the encoder, wherein the backward signal path of the encoder corresponds to a signal path of a decoder (see the decoder 30 in fig. 3).
The encoder 20 receives, e.g., via an input 202, a picture 201 or an image block 203 of the picture 201, e.g., a picture in a sequence of pictures forming a video or a video sequence. Image block 203 may also be referred to as a current picture block or a picture block to be encoded, and picture 201 may be referred to as a current picture or a picture to be encoded (especially when the current picture is distinguished from other pictures in video encoding, such as previously encoded and/or decoded pictures in the same video sequence, i.e., a video sequence that also includes the current picture).
An embodiment of the encoder 20 may comprise a partitioning unit (not shown in fig. 2) for partitioning the picture 201 into a plurality of blocks, e.g. image blocks 203, typically into a plurality of non-overlapping blocks. The partitioning unit may be used to use the same block size for all pictures in a video sequence and a corresponding grid defining the block size, or to alter the block size between pictures or subsets or groups of pictures and partition each picture into corresponding blocks.
In one example, prediction processing unit 260 of encoder 20 may be used to perform any combination of the above-described segmentation techniques.
Like the picture 201, the image block 203 is also or can be considered as a two-dimensional array or matrix of sample points having sample values, although of a smaller size than the picture 201. In other words, the image block 203 may comprise, for example, one sample array (e.g., a luminance array in the case of a black and white picture 201) or three sample arrays (e.g., a luminance array and two chrominance arrays in the case of a color picture) or any other number and/or class of arrays depending on the color format applied. The number of sampling points in the horizontal and vertical directions (or axes) of the image block 203 defines the size of the image block 203.
The encoder 20 as shown in fig. 2 is used to encode a picture 201 block by block, e.g. performing encoding and prediction for each image block 203.
Residual calculation unit 204 is used to calculate residual block 205 based on picture image block 203 and prediction block 265 (other details of prediction block 265 are provided below), e.g., by subtracting sample values of prediction block 265 from sample values of picture image block 203 on a sample-by-sample (pixel-by-pixel) basis to obtain residual block 205 in the sample domain.
The transform processing unit 206 is configured to apply a transform, such as a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST), on the sample values of the residual block 205 to obtain transform coefficients 207 in a transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients and represent the residual block 205 in the transform domain.
The transform processing unit 206 may be used to apply integer approximations of DCT/DST, such as the transforms specified for HEVC/H.265. Such integer approximations are typically scaled by a certain factor compared to the orthogonal DCT transform. To maintain the norm of residual blocks processed by the forward and inverse transforms, additional scaling factors are applied as part of the transform process. The scaling factors are typically selected based on certain constraints, e.g., being a power of 2 for shift operations, and trading off the bit depth of the transform coefficients against accuracy and implementation cost. For example, a specific scaling factor may be specified for the inverse transform on the decoder 30 side (e.g., by inverse transform processing unit 312) and for the corresponding inverse transform on the encoder 20 side (e.g., by inverse transform processing unit 212), and correspondingly, a corresponding scaling factor may be specified for the forward transform on the encoder 20 side by transform processing unit 206.
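To make the forward and inverse transform concrete, the following Python sketch implements a floating-point orthonormal 2D DCT-II through matrix multiplication. It is not the scaled integer approximation specified for HEVC/H.265; it only illustrates the separable transform C * X * C^T that such integer transforms approximate.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] *= 1.0 / np.sqrt(2.0)    # DC row normalization
    return m * np.sqrt(2.0 / n)

def dct2(block):
    """2D DCT of a square residual block: C * X * C^T."""
    c = dct_matrix(block.shape[0])
    return c @ block.astype(np.float64) @ c.T

def idct2(coeffs):
    """Inverse 2D DCT: C^T * Y * C (valid because C is orthonormal)."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c
```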
Quantization unit 208 is used to quantize transform coefficients 207, e.g., by applying scalar quantization or vector quantization, to obtain quantized transform coefficients 209. Quantized transform coefficients 209 may also be referred to as quantized residual coefficients 209. The quantization process may reduce the bit depth associated with some or all of transform coefficients 207. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. The degree of quantization may be modified by adjusting a quantization parameter (QP). For example, for scalar quantization, different scales may be applied to achieve finer or coarser quantization. A smaller quantization step size corresponds to finer quantization, and a larger quantization step size corresponds to coarser quantization. The appropriate quantization step size may be indicated by a quantization parameter (QP). For example, the quantization parameter may be an index into a predefined set of suitable quantization step sizes. For example, a smaller quantization parameter may correspond to fine quantization (a smaller quantization step size) and a larger quantization parameter may correspond to coarse quantization (a larger quantization step size), or vice versa. The quantization may comprise division by a quantization step size, with corresponding inverse quantization (e.g., performed by inverse quantization unit 210) comprising multiplication by the quantization step size. Embodiments according to some standards, such as HEVC, may use the quantization parameter to determine the quantization step size. In general, the quantization step size may be calculated based on the quantization parameter using a fixed-point approximation of an equation that includes division. Additional scaling factors may be introduced for quantization and dequantization to restore the norm of the residual block, which may be modified because of the scales used in the fixed-point approximations of the equations for the quantization step size and the quantization parameter. In one example embodiment, the scales of the inverse transform and the dequantization may be combined. Alternatively, a custom quantization table may be used and signaled from the encoder to the decoder, e.g., in a bitstream. Quantization is a lossy operation, and the larger the quantization step size, the greater the loss.
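The relationship between QP and quantization step size can be illustrated as follows. In H.264/HEVC, the step size approximately doubles for every increase of 6 in QP (Qstep is roughly 2^((QP - 4) / 6)). The sketch below uses that mapping with plain floating-point rounding, whereas real codecs use fixed-point arithmetic with rounding offsets.

```python
import numpy as np

def q_step(qp):
    """Approximate H.264/HEVC mapping: the step doubles every 6 QP values."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    """Scalar quantization: divide by the step and round to integer levels."""
    return np.round(coeffs / q_step(qp)).astype(np.int32)

def dequantize(levels, qp):
    """Inverse quantization: multiply the levels back by the step size."""
    return levels * q_step(qp)
```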
The inverse quantization unit 210 is configured to apply, on the quantized coefficients, the inverse of the quantization applied by the quantization unit 208, to obtain dequantized coefficients 211, e.g., to apply, based on or using the same quantization step size as the quantization unit 208, the inverse of the quantization scheme applied by the quantization unit 208. The dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 and correspond to transform coefficients 207, although they are typically not identical to the transform coefficients because of the loss introduced by quantization.
The inverse transform processing unit 212 is configured to apply an inverse transform of the transform applied by the transform processing unit 206, for example, an inverse Discrete Cosine Transform (DCT) or an inverse Discrete Sine Transform (DST), to obtain an inverse transform block 213 in the sample domain. The inverse transform block 213 may also be referred to as an inverse transform dequantized block 213 or an inverse transform residual block 213.
The reconstruction unit 214 (e.g., summer 214) is used to add the inverse transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain the reconstructed block 215 in the sample domain, e.g., to add sample values of the reconstructed residual block 213 to sample values of the prediction block 265.
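A minimal sketch of this reconstruction step follows, assuming the reconstructed residual is already back in the sample domain. The clipping to the valid sample range is standard practice for a given bit depth, though not spelled out above.

```python
import numpy as np

def reconstruct(prediction, recon_residual, bit_depth=8):
    """Sample-wise addition of prediction and reconstructed residual,
    clipped to the valid sample range [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    total = prediction.astype(np.int32) + recon_residual.astype(np.int32)
    return np.clip(total, 0, max_val).astype(prediction.dtype)
```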
Optionally, a buffer unit 216 (or simply "buffer" 216), such as a line buffer 216, is used to buffer or store the reconstructed block 215 and corresponding sample values for, e.g., intra prediction. In other embodiments, the encoder may be used to use the unfiltered reconstructed block and/or corresponding sample values stored in buffer unit 216 for any class of estimation and/or prediction, such as intra prediction.
For example, embodiments of encoder 20 may be configured such that buffer unit 216 is used not only to store reconstructed block 215 for intra prediction 254, but also for loop filter unit 220 (not shown in fig. 2), and/or such that buffer unit 216 and decoded picture buffer unit 230 form one buffer, for example. Other embodiments may be used to use filtered block 221 and/or blocks or samples from decoded picture buffer 230 (none shown in fig. 2) as an input or basis for intra prediction 254.
Loop filter unit 220 (or simply "loop filter" 220) is used to filter reconstructed block 215 to obtain filtered block 221, so as to smooth pixel transitions or otherwise improve video quality. Loop filter unit 220 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters, such as a bilateral filter, an adaptive loop filter (ALF), a sharpening or smoothing filter, or a collaborative filter. Although loop filter unit 220 is shown in fig. 2 as an in-loop filter, in other configurations, loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as a filtered reconstructed block 221. The decoded picture buffer 230 may store the reconstructed coded block after the loop filter unit 220 performs a filtering operation on it.
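Purely to illustrate the idea of smoothing block boundaries, the following toy Python filter blends the two sample columns on either side of a vertical block edge. Actual deblocking filters (e.g., HEVC's) are adaptive: they examine local gradients and boundary strength before deciding whether and how strongly to filter, so this is only a conceptual sketch.

```python
import numpy as np

def smooth_vertical_boundary(img, x, strength=0.25):
    """Toy deblocking: blend the two columns adjacent to a block edge at
    column x (x >= 1). Not the HEVC deblocking filter, only the idea."""
    out = img.astype(np.float32)
    left, right = out[:, x - 1].copy(), out[:, x].copy()
    out[:, x - 1] = (1 - strength) * left + strength * right
    out[:, x] = (1 - strength) * right + strength * left
    return np.clip(out, 0, 255).astype(np.uint8)
```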
Embodiments of encoder 20 (correspondingly, loop filter unit 220) may be used to output loop filter parameters (e.g., sample adaptive offset information), e.g., directly or after entropy encoding by entropy encoding unit 270 or any other entropy encoding unit, e.g., such that decoder 30 may receive and apply the same loop filter parameters for decoding.
Decoded Picture Buffer (DPB) 230 may be a reference picture memory that stores reference picture data for use by encoder 20 in encoding video data. DPB 230 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) including Synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The DPB 230 and the buffer 216 may be provided by the same memory device or separate memory devices. In a certain example, a Decoded Picture Buffer (DPB) 230 is used to store filtered blocks 221. Decoded picture buffer 230 may further be used to store other previously filtered blocks, such as previously reconstructed and filtered blocks 221, of the same current picture or of a different picture, such as a previously reconstructed picture, and may provide the complete previously reconstructed, i.e., decoded picture (and corresponding reference blocks and samples) and/or partially reconstructed current picture (and corresponding reference blocks and samples), e.g., for inter prediction. In a certain example, if reconstructed block 215 is reconstructed without in-loop filtering, decoded Picture Buffer (DPB) 230 is used to store reconstructed block 215.
Prediction processing unit 260, also referred to as block prediction processing unit 260, is used to receive or obtain image block 203 (current image block 203 of current picture 201) and reconstructed picture data, e.g., reference samples of the same (current) picture from buffer 216 and/or reference picture data 231 from one or more previously decoded pictures of decoded picture buffer 230, and to process such data for prediction, i.e., to provide prediction block 265, which may be inter-predicted block 245 or intra-predicted block 255.
The mode selection unit 262 may be used to select a prediction mode (e.g., intra or inter prediction mode) and/or a corresponding prediction block 245 or 255 used as the prediction block 265 to calculate the residual block 205 and reconstruct the reconstructed block 215.
Embodiments of mode selection unit 262 may be used to select a prediction mode (e.g., from those supported by prediction processing unit 260) that provides the best match or the minimum residual (minimum residual means better compression for transmission or storage), or that provides the minimum signaling overhead (minimum signaling overhead means better compression for transmission or storage), or that balances both. The mode selection unit 262 may be configured to determine the prediction mode based on rate-distortion optimization (RDO), i.e., to select the prediction mode that provides the minimum rate-distortion cost, or to select a prediction mode whose associated rate-distortion cost at least satisfies the prediction-mode selection criteria.
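Rate-distortion optimization can be written as minimizing the Lagrangian cost J = D + lambda * R over the candidate modes, where D is the distortion, R the rate in bits, and lambda the Lagrange multiplier. A minimal Python sketch, assuming each candidate's distortion and rate have already been measured (the candidate names below are hypothetical):

```python
def select_mode(candidates, lam):
    """Pick the mode minimizing J = D + lam * R.
    candidates: iterable of (mode, distortion, rate_bits) tuples.
    lam: Lagrange multiplier trading distortion against bit rate."""
    best_mode, _ = min(
        ((mode, d + lam * r) for mode, d, r in candidates),
        key=lambda item: item[1],
    )
    return best_mode

# Example with three hypothetical candidates (mode name, SSD, bits):
print(select_mode([("intra_dc", 900, 40), ("merge", 1100, 8), ("amvp", 700, 95)], 6.0))
```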
The prediction processing performed by the instance of the encoder 20 (e.g., by the prediction processing unit 260) and the mode selection performed (e.g., by the mode selection unit 262) will be explained in detail below.
As described above, the encoder 20 is configured to determine or select the best or optimal prediction mode from a (predetermined) set of prediction modes. The set of prediction modes may include, for example, intra-prediction modes and/or inter-prediction modes.
The intra prediction mode set may include 35 different intra prediction modes, for example, non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in h.265, or may include 67 different intra prediction modes, for example, non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in h.266 under development.
In possible implementations, the set of inter prediction modes may include, for example, an advanced motion vector prediction (AMVP) mode and a merge mode, depending on the available reference pictures (i.e., for example, at least partially decoded pictures stored in the DPB 230, as described above) and other inter prediction parameters, e.g., depending on whether the entire reference picture or only a portion of the reference picture (such as a search window region around the region of the current block) is used to search for a best-matching reference block, and/or depending on whether pixel interpolation, such as half-pixel and/or quarter-pixel interpolation, is applied. In a specific implementation, the inter prediction mode set may include an improved control-point-based AMVP mode and an improved control-point-based merge mode according to the embodiments of this application. In one example, inter prediction unit 244 may be used to perform any combination of the inter prediction techniques described below.
In addition to the above prediction mode, embodiments of the present application may also apply a skip mode and/or a direct mode.
The prediction processing unit 260 may further be configured to partition the image block 203 into smaller block partitions or sub-blocks, for example, by iteratively using quad-tree (QT) partitions, binary-tree (BT) partitions, or ternary-tree (TT) partitions, or any combination thereof, and to perform prediction, for example, for each of the block partitions or sub-blocks, wherein the mode selection includes selecting a tree structure of the partitioned image block 203 and selecting a prediction mode to apply to each of the block partitions or sub-blocks.
The inter prediction unit 244 may include a motion estimation (ME) unit (not shown in fig. 2) and a motion compensation (MC) unit (not shown in fig. 2). The motion estimation unit is used to receive or obtain picture image block 203 (the current picture image block 203 of current picture 201) and decoded picture 231, or at least one or more previously reconstructed blocks, e.g., reconstructed blocks of one or more other/different previously decoded pictures 231, for motion estimation. For example, the video sequence may comprise the current picture and previously decoded pictures 231; in other words, the current picture and the previously decoded pictures 231 may be part of, or form, the sequence of pictures forming the video sequence.
For example, the encoder 20 may be configured to select a reference block from a plurality of reference blocks of the same or different one of a plurality of other pictures and provide the reference picture and/or an offset (spatial offset) between a position (X, Y coordinates) of the reference block and a position of the current block to a motion estimation unit (not shown in fig. 2) as an inter prediction parameter. This offset is also called Motion Vector (MV).
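For illustration, a brute-force integer-pel motion search in Python: it scans a window around the collocated position in the reference picture and returns the offset (motion vector) with the smallest sum of absolute differences (SAD). Real encoders use far faster search strategies and typically add a motion-vector rate term to the cost; this sketch shows only the matching principle.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def full_search(cur_block, ref, bx, by, search_range):
    """Integer-pel full search around the collocated position (bx, by)."""
    n = cur_block.shape[0]
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if 0 <= x and 0 <= y and x + n <= ref.shape[1] and y + n <= ref.shape[0]:
                cost = sad(cur_block, ref[y:y + n, x:x + n])
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv  # the motion vector (MV)
```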
The motion compensation unit is configured to obtain an inter prediction parameter and perform inter prediction based on or using the inter prediction parameter to obtain an inter prediction block 245. The motion compensation performed by the motion compensation unit (not shown in fig. 2) may involve taking or generating a prediction block based on a motion/block vector determined by motion estimation (possibly performing interpolation to sub-pixel precision). Interpolation filtering may generate additional pixel samples from known pixel samples, potentially increasing the number of candidate prediction blocks that may be used to encode a picture block. Upon receiving the motion vector for the PU of the current picture block, motion compensation unit 246 may locate the prediction block in one reference picture list to which the motion vector points. Motion compensation unit 246 may also generate syntax elements associated with the blocks and video slices for use by decoder 30 in decoding picture blocks of a video slice.
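The sub-pixel interpolation mentioned above can be illustrated with bilinear averaging at half-pel positions. This is only the idea: HEVC, for example, uses longer (8-tap luma) interpolation filters. Coordinates are given in half-pel units (x2, y2), and the caller is assumed to keep the access within the reference picture bounds.

```python
import numpy as np

def half_pel_bilinear(ref, x2, y2, n):
    """Fetch an n x n prediction at half-pel position (x2/2, y2/2) by
    bilinear averaging of the neighboring integer-pel samples."""
    x0, y0 = x2 // 2, y2 // 2   # integer-pel base position
    fx, fy = x2 % 2, y2 % 2     # half-pel offsets (0 or 1)
    p = ref.astype(np.float32)
    a = p[y0:y0 + n, x0:x0 + n]
    b = p[y0:y0 + n, x0 + fx:x0 + fx + n]
    c = p[y0 + fy:y0 + fy + n, x0:x0 + n]
    d = p[y0 + fy:y0 + fy + n, x0 + fx:x0 + fx + n]
    return ((a + b + c + d) / 4.0).round().astype(np.uint8)
```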
In particular, the inter prediction unit 244 may transmit syntax elements including inter prediction parameters (e.g., indication information for selecting an inter prediction mode for current block prediction after traversing multiple inter prediction modes) to the entropy encoding unit 270. In a possible application scenario, if there is only one inter prediction mode, the inter prediction parameters may not be carried in the syntax elements; in this case, the decoder side 30 can directly use the default prediction mode for decoding. It will be appreciated that the inter prediction unit 244 may be used to perform any combination of inter prediction techniques.
The intra prediction unit 254 is used to obtain, e.g., receive, the picture block 203 (current picture block) of the same picture and one or more previously reconstructed blocks, e.g., reconstructed neighboring blocks, for intra estimation. For example, the encoder 20 may be configured to select an intra-prediction mode from a plurality of (predetermined) intra-prediction modes.
Embodiments of encoder 20 may be used to select an intra prediction mode based on optimization criteria, such as based on a minimum residual (e.g., an intra prediction mode that provides a prediction block 255 that is most similar to current picture block 203) or a minimum code rate distortion.
The intra-prediction unit 254 is further configured to determine the intra-prediction block 255 based on the intra-prediction parameters as the selected intra-prediction mode. In any case, after selecting the intra-prediction mode for the block, intra-prediction unit 254 is also used to provide intra-prediction parameters, i.e., information indicating the selected intra-prediction mode for the block, to entropy encoding unit 270. In one example, intra-prediction unit 254 may be used to perform any combination of intra-prediction techniques.
Specifically, the above-described intra prediction unit 254 may transmit syntax elements including intra prediction parameters (e.g., indication information for selecting an intra prediction mode for current block prediction after traversing multiple intra prediction modes) to the entropy encoding unit 270. In a possible application scenario, if there is only one intra prediction mode, the intra prediction parameters may not be carried in the syntax elements; in this case, the decoder side 30 may directly use the default prediction mode for decoding.
Entropy encoding unit 270 is configured to apply an entropy encoding algorithm or scheme (e.g., a variable length coding (VLC) scheme, a context-adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique) to some or all of the quantized residual coefficients 209, inter prediction parameters, intra prediction parameters, and/or loop filter parameters, individually or jointly (or not at all), to obtain encoded picture data 21 that may be output by output 272, e.g., in the form of an encoded bitstream 21. The encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 270 may also be used to entropy encode other syntax elements of the current video slice being encoded.
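As a small, concrete example of a variable-length code of the kind listed above, the following Python function produces the unsigned Exp-Golomb codeword ue(v) used for many H.264/HEVC syntax elements. Arithmetic coders such as CABAC are far more elaborate; this shows only the VLC principle of spending fewer bits on smaller (more probable) values.

```python
def exp_golomb_ue(v):
    """Unsigned Exp-Golomb codeword ue(v): write (v + 1) in binary and
    prefix it with (length - 1) zero bits. Returns a '0'/'1' string."""
    code = bin(v + 1)[2:]  # binary of v + 1, without the '0b' prefix
    return "0" * (len(code) - 1) + code

# ue(0) = '1', ue(1) = '010', ue(2) = '011', ue(3) = '00100'
assert exp_golomb_ue(3) == "00100"
```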
Other structural variations of video encoder 20 may be used to encode the video stream. For example, the non-transform based encoder 20 may quantize the residual signal directly without the transform processing unit 206 for certain blocks or frames. In another embodiment, the encoder 20 may have the quantization unit 208 and the inverse quantization unit 210 combined into a single unit.
Specifically, in the embodiment of the present application, the encoder 20 may be used to implement a video image processing method described in the following embodiments.
It should be understood that other structural variations of the video encoder 20 may be used to encode the video stream. For example, for some image blocks or image frames, video encoder 20 may quantize the residual signal directly without processing by transform processing unit 206 and, correspondingly, without processing by inverse transform processing unit 212; alternatively, for some image blocks or image frames, the video encoder 20 does not generate residual data and accordingly does not need to be processed by the transform processing unit 206, the quantization unit 208, the inverse quantization unit 210, and the inverse transform processing unit 212; alternatively, video encoder 20 may store the reconstructed image block directly as a reference block without processing by filter 220; alternatively, the quantization unit 208 and the inverse quantization unit 210 in the video encoder 20 may be merged together. The loop filter 220 is optional, and in the case of lossless compression coding, the transform processing unit 206, the quantization unit 208, the inverse quantization unit 210, and the inverse transform processing unit 212 are optional. It should be appreciated that the inter prediction unit 244 and the intra prediction unit 254 may be selectively enabled according to different application scenarios.
Referring to fig. 3, fig. 3 shows a schematic/conceptual block diagram of an example of a decoder 30 for implementing embodiments of the present application. Video decoder 30 is operative to receive encoded picture data (e.g., an encoded bitstream) 21, e.g., encoded by encoder 20, to obtain a decoded picture 231. During the decoding process, video decoder 30 receives video data, such as an encoded video bitstream representing picture blocks of an encoded video slice and associated syntax elements, from video encoder 20.
In the example of fig. 3, decoder 30 includes entropy decoding unit 304, inverse quantization unit 310, inverse transform processing unit 312, reconstruction unit 314 (e.g., summer 314), buffer 316, loop filter 320, decoded picture buffer 330, and prediction processing unit 360. The prediction processing unit 360 may include an inter prediction unit 344, an intra prediction unit 354, and a mode selection unit 362. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with reference to video encoder 20 of fig. 2.
Entropy decoding unit 304 is to perform entropy decoding on encoded picture data 21 to obtain, for example, quantized coefficients 309 and/or decoded encoding parameters (not shown in fig. 3), e.g., any or all of inter-prediction, intra-prediction parameters, loop filter parameters, and/or other syntax elements (decoded). Entropy decoding unit 304 is further to forward inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to prediction processing unit 360. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
Inverse quantization unit 310 may be functionally identical to inverse quantization unit 210, inverse transform processing unit 312 may be functionally identical to inverse transform processing unit 212, reconstruction unit 314 may be functionally identical to reconstruction unit 214, buffer 316 may be functionally identical to buffer 216, loop filter 320 may be functionally identical to loop filter 220, and decoded picture buffer 330 may be functionally identical to decoded picture buffer 230.
Prediction processing unit 360 may include inter prediction unit 344 and intra prediction unit 354, where inter prediction unit 344 may be functionally similar to inter prediction unit 244 and intra prediction unit 354 may be functionally similar to intra prediction unit 254. The prediction processing unit 360 is typically used to perform block prediction and/or to obtain a prediction block 365 from the encoded data 21, as well as to receive or obtain (explicitly or implicitly) prediction related parameters and/or information about the selected prediction mode from, for example, the entropy decoding unit 304.
When a video slice is encoded as an intra-coded (I) slice, intra prediction unit 354 of prediction processing unit 360 is used to generate a prediction block 365 for a picture block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is encoded as an inter-coded (i.e., B or P) slice, inter prediction unit 344 (e.g., a motion compensation unit) of prediction processing unit 360 is used to generate a prediction block 365 for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 304. For inter prediction, the prediction block may be generated from one of the reference pictures within one reference picture list. Video decoder 30 may construct the reference frame lists, list 0 and list 1, using default construction techniques based on the reference pictures stored in DPB 330.
Prediction processing unit 360 is used to determine prediction information for the video blocks of the current video slice by parsing the motion vectors and other syntax elements, and to generate a prediction block for the current video block being decoded using the prediction information. In an example of the present application, prediction processing unit 360 uses some of the syntax elements received to determine a prediction mode (e.g., intra or inter prediction) for encoding video blocks of a video slice, an inter prediction slice type (e.g., B-slice, P-slice, or GPB-slice), construction information for one or more of a reference picture list of the slice, a motion vector for each inter-coded video block of the slice, an inter prediction state for each inter-coded video block of the slice, and other information to decode video blocks of a current video slice. In another example of the present disclosure, the syntax elements received by video decoder 30 from the bitstream include syntax elements received in one or more of an Adaptive Parameter Set (APS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), or a slice header.
Inverse quantization unit 310 may be used to inverse quantize (i.e., inverse quantize) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 304. The inverse quantization process may include using quantization parameters calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization that should be applied and likewise the degree of inverse quantization that should be applied.
Inverse transform processing unit 312 is used to apply an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to produce a residual block in the pixel domain.
The reconstruction unit 314 (e.g., summer 314) is used to add the inverse transform block 313 (i.e., reconstructed residual block 313) to the prediction block 365 to obtain the reconstructed block 315 in the sample domain, e.g., by adding sample values of the reconstructed residual block 313 to sample values of the prediction block 365.
Loop filter unit 320 (within or after the decoding loop) is used to filter reconstructed block 315 to obtain filtered block 321, so as to smooth pixel transitions or otherwise improve video quality. In one example, loop filter unit 320 may be used to perform any combination of the filtering techniques described below. Loop filter unit 320 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters, such as a bilateral filter, an adaptive loop filter (ALF), a sharpening or smoothing filter, or a collaborative filter. Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations, loop filter unit 320 may be implemented as a post-loop filter.
Decoded video block 321 in a given frame or picture is then stored in decoded picture buffer 330, which stores reference pictures for subsequent motion compensation.
Decoder 30 is used to output decoded picture 31, e.g., via output 332, for presentation to or viewing by a user.
Other variations of video decoder 30 may be used to decode the compressed bitstream. For example, decoder 30 may generate an output video stream without loop filter unit 320. For example, the non-transform based decoder 30 may directly inverse quantize the residual signal without the inverse transform processing unit 312 for certain blocks or frames. In another embodiment, video decoder 30 may have inverse quantization unit 310 and inverse transform processing unit 312 combined into a single unit.
Specifically, in the embodiment of the present application, the decoder 30 is used to implement the processing method of the video image described in the following embodiments.
It should be understood that other structural variations of the video decoder 30 may be used to decode the encoded video bitstream. For example, video decoder 30 may generate an output video stream without processing by filter 320; alternatively, for some image blocks or image frames, the quantized coefficients are not decoded by entropy decoding unit 304 of video decoder 30 and, accordingly, do not need to be processed by inverse quantization unit 310 and inverse transform processing unit 312. Loop filter 320 is optional; and the inverse quantization unit 310 and the inverse transform processing unit 312 are optional for the case of lossless compression. It should be understood that the inter prediction unit and the intra prediction unit may be selectively enabled according to different application scenarios.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a video coding apparatus 400 (e.g., a video encoding apparatus 400 or a video decoding apparatus 400) provided by an embodiment of the present application. Video coding apparatus 400 is suitable for implementing the embodiments described herein. In one embodiment, video coding device 400 may be a video decoder (e.g., decoder 30 of fig. 1A) or a video encoder (e.g., encoder 20 of fig. 1A). In another embodiment, video coding device 400 may be one or more components of decoder 30 of fig. 1A or encoder 20 of fig. 1A described above.
Video coding apparatus 400 includes: an ingress port 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 and an egress port 450 for transmitting data; and a memory 460 for storing data. Video coding apparatus 400 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled with ingress port 410, receiver unit 420, transmitter unit 440, and egress port 450 for the egress or ingress of optical or electrical signals.
The processor 430 is implemented by hardware and software. Processor 430 may be implemented as one or more CPU chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. Processor 430 is in communication with ingress port 410, receiver unit 420, transmitter unit 440, egress port 450, and memory 460. Processor 430 includes a coding module 470 (e.g., an encoding module 470 or a decoding module 470). The coding module 470 implements the embodiments disclosed herein, so as to implement the video image processing method provided by the embodiments of the present application. For example, the coding module 470 implements, processes, or provides various coding operations. The coding module 470 therefore provides a substantial improvement to the functions of the video coding apparatus 400 and affects the transitions of the video coding apparatus 400 between different states. Alternatively, the coding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.
The memory 460, which may include one or more disks, tape drives, and solid-state drives, may be used as an overflow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 460 may be volatile and/or nonvolatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
Referring to fig. 5, fig. 5 is a simplified block diagram of an apparatus 500 that may be used as either or both of source device 12 and destination device 14 in fig. 1A according to an example embodiment. Apparatus 500 may implement the techniques of this application. In other words, fig. 5 is a schematic block diagram of an implementation of an encoding apparatus or a decoding apparatus (simply, a coding apparatus 500) of the embodiments of the present application. The coding apparatus 500 may include a processor 510, a memory 530, and a bus system 550. The processor is connected with the memory through the bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory. The memory of the coding apparatus stores program code, and the processor may invoke the program code stored in the memory to perform the various video encoding or decoding methods described herein, particularly the various new methods of random access stream quality control. To avoid repetition, they are not described in detail here.
In the embodiments of the present application, the processor 510 may be a central processing unit (CPU), and the processor 510 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 530 may include a Read Only Memory (ROM) device or a Random Access Memory (RAM) device. Any other suitable type of memory device may also be used for memory 530. Memory 530 may include code and data 531 to be accessed by processor 510 using bus 550. Memory 530 may further include an operating system 533 and application programs 535, the application programs 535 including at least one program that allows processor 510 to perform the video encoding or decoding methods described herein, and in particular the video image processing methods described herein. For example, the application programs 535 may include applications 1 through N, which further include a video encoding or decoding application (simply a video coding application) that performs the video encoding or decoding methods described herein.
The bus system 550 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. For clarity of illustration, however, the various buses are designated in the figure as bus system 550.
Optionally, the coding apparatus 500 may also include one or more output devices, such as a display 570. In one example, the display 570 may be a touch-sensitive display that combines a display with a touch-sensing unit operable to sense touch input. The display 570 may be coupled to the processor 510 via the bus 550.
The technical scheme of the embodiment of the application is explained in detail as follows:
First, some technical terms and concepts used in the embodiments of the present application are introduced.
I frame: in the field of video coding, a frame that can be decoded independently of other frames is generally referred to as an "I frame". The encoding prediction mode of all blocks of the I frame is intra prediction.
P frame: in the field of video coding, a frame that references forward frames and is marked in the code stream as a P-frame type is commonly referred to simply as a "P frame".
B frame: in the field of video coding, a frame that can reference both forward and backward frames is commonly referred to simply as a "B frame". The embodiments of the present application mainly target low-latency scenarios; therefore, to reduce latency, frames are generally encoded as "P frames".
Random access code stream: for convenience of description, in the embodiments of the present application, a video stream in which the frame type of every frame is an I frame, or in which all frames use the intra prediction mode, is simply referred to as a "random access code stream". The random access code stream may also be called a second code stream; its name is not limited thereto. The encoding process used to generate the random access code stream is referred to as the second encoding.
Long group of pictures (GOP) stream: for convenience of description, the embodiments of the present application refer to a video stream containing at least one P frame between adjacent I frames, or a video stream in which the inter prediction mode is allowed, simply as a "long GOP stream". The long GOP stream may also be called a first code stream, an elementary stream, or the like; its name is not limited thereto. The encoding process used to generate the long GOP stream is referred to as the first encoding.
Reconstructed frame: in the encoding process, the encoding result of a previous frame needs to be decoded and stored for reference by subsequent frames; such a decoded frame is generally called a "reconstructed frame". The reconstructed frame is consistent with the frame decoded by the decoder, which is referred to as encoding-decoding consistency. For convenience of description, frames decoded by the decoder are also collectively referred to as "reconstructed frames" in the embodiments of the present application.
Strong interaction scene: an application scene in which user input and feedback can be received in real time, such as a game scene or a live interaction scene.
Machine position: the camera or cameras at one location belong to one machine position; switching from playing the content of one machine position to playing the content of another machine position is a machine-position switch.
Viewing angle: at a given moment, the direction of the lens or the direction in which the user is looking; when the lens rotates or the user turns around, a viewing-angle switch occurs.
Multi-machine-position shooting: two or more cameras are used to shoot the same scene from multiple angles and directions.
To reduce the delay of accessing (including decoding or playing) video content and meet the low-latency requirements of some application scenarios, the encoding end can provide two code streams for the same video content: one is a long GOP (group of pictures) code stream, and the other is a random access code stream. The coding blocks of all frames of the random access code stream use the intra prediction mode. When the video content needs to be accessed, the frame at the access moment in the random access stream of that video content is decoded first, and the decoding result of that frame is then used as a reference frame for the long GOP code stream of the video content, so that the video content is accessed quickly.
Accessing video content includes initially accessing video content or switching from one video content to another. For example, initially accessing video content may mean starting to decode it when a user click to play some live content is detected. Switching from one video content to another may be switching from decoding the video of one machine position to decoding the video of another, as shown in fig. 6 below.
Providing two code streams for the same video content at the encoding end to reduce the delay of accessing the video content is illustrated with the application scenario of multi-machine-position shooting of a sports event shown in fig. 6. As shown in fig. 6, a plurality of cameras, for example the 7 cameras in fig. 6, are arranged at different positions in the sports venue: camera A, camera B, camera C, camera D, camera E, camera F, and camera G. Cameras at different positions capture the same scene from different angles to obtain a set of video signals. The set of video signals may include a machine position A video, a machine position B video, a machine position C video, a machine position D video, a machine position E video, a machine position F video, and a machine position G video, each of which can serve as one video content. The videos of the multiple machine positions (machine positions A through G) can provide the user with a multi-angle, three-dimensional stereoscopic visual experience. The encoding end can provide two code streams for the video of each machine position. The user can switch from viewing the video of one machine position to viewing the video of another in a suitable interactive form, and the decoding end switches from decoding the code stream of one machine position's video to decoding that of another based on the user operation.
Fig. 7 is a schematic diagram of a decoded-frame track for switching from a current video content to another video content according to an embodiment of the present application. For example, the current video content is the machine position A video content, and the other video content is the machine position B video content. As shown in fig. 7, the decoding side decodes the frames numbered 0 (#0 frame), 1 (#1 frame), and 2 (#2 frame) of the long GOP stream of the current video content; when the frame numbered 3 (#3 frame) of the current video content is to be decoded, a video content switch occurs, that is, a switch from decoding the current video content to decoding the other video content. The decoding end can then acquire and decode the frame at the switching moment in the random access code stream of the other video content, for example the frame numbered 3 (#3 frame) of the random access code stream shown in fig. 7. After decoding that frame, its decoding result (i.e., the reconstructed frame) is put into the reference frame list cache. Decoding of the frame numbered 4 (#4 frame) of the long GOP code stream of the other video content then starts; that frame references the decoding result of the frame numbered 3 (#3 frame) of the random access code stream, thereby completing decoding and accessing the other video content.
Thus, the decoding side does not need to start decoding from the previous I frame (i.e., #0 frame) at the switching time in the long GOP stream of the other video content, nor does it need to wait until the next I frame (i.e., #9 frame) to start decoding.
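The switching flow of fig. 7 can be sketched as follows in Python; the decoder object and its decode_intra/decode_inter methods are hypothetical placeholders, not a real decoder API.

```python
class Decoder:
    """Minimal stand-in for a decoder with a reference picture list."""
    def __init__(self):
        self.reference_list = []
    def decode_intra(self, frame_bits):
        """Decode an all-intra frame; returns its reconstructed frame."""
        ...
    def decode_inter(self, frame_bits):
        """Decode a P frame predicting from self.reference_list."""
        ...

def switch_content(decoder, target, switch_no):
    # Decode the all-intra frame at the switch time (e.g. the #3 frame of
    # the target content's random access stream) and use its reconstructed
    # frame as the reference frame.
    ref = decoder.decode_intra(target.random_access_stream[switch_no])
    decoder.reference_list = [ref]
    # Continue with the long GOP stream from the next frame (#4, #5, ...)
    # instead of restarting at #0 or waiting for the next I frame (#9).
    for n in range(switch_no + 1, len(target.long_gop_stream)):
        decoder.decode_inter(target.long_gop_stream[n])
```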
During the process of accessing video content, a reconstructed frame of the random access code stream of the video content is used as a reference frame for the long GOP code stream of the video content. Thus, although fast access to the video content can be achieved, there is a problem of encoding-decoding inconsistency. When the quality of the random access code stream and that of the long GOP code stream differ greatly, subjectively visible blocking and obvious artifact effects easily occur. Analysis shows that the main cause of these problems is that, during encoding of the long GOP code stream, the frame used for coding prediction is a reconstructed frame of the long GOP code stream itself, whereas when the long GOP code stream is accessed, a reconstructed frame of the random access code stream is used instead. The reconstructed frame of the random access code stream and the reconstructed frame of the long GOP code stream lack quality matching in the encoding process.
To address this lack of quality matching between reconstructed frames of different code streams, the embodiments of the present application provide a video image processing method, which may also be called a random access quality control method, aiming to improve the decoding quality of accessed video content, reduce blocking, and eliminate part of the artifact effects while meeting the requirement of low-latency access to video content.
Before describing the technical solution of the embodiments of the present application, a video image processing system according to the embodiments is described with reference to the drawings. Referring to fig. 8, fig. 8 is a schematic view of a video image processing system according to an embodiment of the present disclosure. The video image processing system may include a server 801 and a terminal 802. The server 801 may communicate with the terminal 802; for example, the terminal 802 may communicate with the server 801 via wireless-fidelity (Wi-Fi) communication, Bluetooth communication, or cellular (2G/3G/4G/5G) communication. It should be understood that other communication manners, including future communication manners, may also be adopted between the server 801 and the terminal 802, which is not particularly limited here. It should be noted that fig. 8 shows only one terminal 802 for illustration; the system may include a plurality of terminals 802, which are not all illustrated in the embodiments of the present application.
The terminal 802 may be any of various types of devices configured with a display component. For example, the terminal 802 may be a mobile phone, a tablet computer, a notebook computer, a smart television, or another terminal device (fig. 8 illustrates the terminal as a mobile phone). The terminal may also be a device for virtual scene interaction, including VR glasses, an AR device, an MR interactive device, or the like; a wearable electronic device such as a smart watch or a smart bracelet; or a device carried in a vehicle, an unmanned aerial vehicle, an industrial robot, or another carrier. The embodiments of the present application do not specifically limit the specific form of the terminal.
Further, the terminal can also be referred to as a User Equipment (UE), a subscriber station, a mobile unit, a subscriber unit, a wireless unit, a remote unit, a mobile device, a wireless communications device, a remote device, a mobile subscriber station, a terminal device, an access terminal, a mobile terminal, a wireless terminal, a smart terminal, a remote terminal, a handset, a user agent, a mobile client, a client, or some other suitable terminology.
The server 801 may be one or more physical servers (one physical server is taken as an example in fig. 8), may also be a computer cluster, and may also be a virtual machine or a cloud server in a cloud computing scenario, and so on.
By way of example, the terminal 802 may have a client installed. The client may be an application (APP) involving video encoding and decoding, for example a video playing application, a live-streaming application (e.g., e-commerce live streaming or game live streaming), a video conference application, or a game application. The terminal 802 can run the client based on a user operation (e.g., clicking, touching, sliding, shaking, or voice control), access video content, and display the video content on the display component.
The server 801 may serve as the source device 12 described in the foregoing embodiment, and provide two code streams with the same or equivalent quality for the same video content by using the video image processing method according to the embodiment of the present application, where one of the code streams is a long GOP code stream, and the other code stream is a random access code stream. The terminal 802 may be used as the destination device 14 in the above embodiments to implement fast access to the video content by decoding the random access stream and the long GOP stream.
Specifically, the server 801 may obtain video images; a video image may be captured by a camera or may be a decoded video image. The camera may be the camera of any of the machine positions shown in fig. 6. The server 801 may, by the video image processing method of the embodiments of the present application, provide two code streams of the same or equivalent quality for the same video content, one being a long GOP code stream and the other a random access code stream, and may provide both streams to the client. In one implementation, for an on-demand application scenario, the client may support a function of the user requesting video content, and the server 801 may store the two code streams and send them to the client when a video content request from the client is received. The client can decode the random access code stream first and decode the long GOP code stream based on the decoding result of the random access code stream, so as to quickly access the video content. Alternatively, when a video content request from the client is received requesting playback from time t0: if the frame of the video content corresponding to time t0 is an instantaneous decoding refresh (IDR) frame of the long GOP code stream, the server can send only the long GOP code stream to the client, and the client can decode it directly to access the video content; if the frame corresponding to time t0 is not an IDR frame of the long GOP code stream, the server can send the random access code stream and the long GOP code stream of the video content containing time t0 and thereafter to the client. The client can decode the random access code stream first and decode the long GOP code stream based on its decoding result, so as to quickly access the video content. The client may also render and display the decoded video content, thereby presenting it to the user. In another implementation, for a live application scenario, the client may support a live video content function, and the server 801 may deliver the two code streams to the client when a request to access the live video content is received. The client can decode the random access code stream first and decode the long GOP code stream based on its decoding result, so as to quickly access the video content. The client may also render and display the decoded video content, thereby presenting it to the user.
For another example, an intermediate node 803 may further be disposed between the server 801 and the terminal 802. The server 801 may serve as the source device 12 in the foregoing embodiments and, by the video image processing method of the embodiments of the present application, provide two code streams of the same or equivalent quality for the same video content, one being a long GOP code stream and the other a random access code stream. The intermediate node 803 may serve as the destination device 14 described in the above embodiments and decode the random access code stream and the long GOP code stream. The intermediate node 803 provides the decoded video content to the terminal 802 to enable fast access to the video content. For example, the intermediate node 803 may be a node in a content delivery network (CDN).
In an implementation manner, for a live broadcast application scenario, a client may support a live broadcast video content function, and when receiving a request for accessing a live broadcast video content, a server 801 may issue the two code streams to an intermediate node 803. The intermediate node 803 may decode the random access code stream first, and decode the long GOP code stream based on the decoding result of the random access code stream. The intermediate node 803 provides the decoded video content to the terminal 802 to enable fast access to the video content. The client may also render and display the decoded video content, thereby presenting the video content to the user.
Illustratively, the client sends the server a request for accessing live video content, the request being used to request access to the live content from time t0. When the server receives the request, it may send the random access code stream and the long GOP code stream of the video content containing time t0 and thereafter to the intermediate node or the client. The intermediate node or the client can decode the random access code stream first and decode the long GOP code stream based on the decoding result of the random access code stream, so as to quickly access the video content.
In the embodiments of the present application, any of the above-mentioned applications of the terminal 802 may be an application built into the terminal itself or an application provided by a third-party service provider and installed by the user, which is not particularly limited.
It should be noted that the two code streams of the same or equivalent quality in the embodiments of the present application mean that a reconstructed frame of the long GOP code stream and the corresponding reconstructed frame of the random access code stream have the same or equivalent quality, so that when a reconstructed frame of the random access code stream is used as a reference frame to decode the long GOP code stream, blocking is reduced and part of the artifact effects are eliminated.
The reconstructed frames of the long GOP code stream and the reconstructed frames of the corresponding random access code stream with the same or equivalent quality meet one or more of the following conditions:
the difference between the reconstructed frame of the long GOP code stream and the reconstructed frame of the corresponding random access code stream is smaller than a difference threshold value; or,
the similarity between the reconstructed frame of the long GOP code stream and the reconstructed frame of the corresponding random access code stream is higher than a similarity threshold value; or,
the difference between the reconstructed frame of the long GOP code stream and the reconstructed frame of the corresponding random access code stream is smaller than the difference between the reconstructed frame of the long GOP code stream and the reconstructed frame of a random access code stream obtained by encoding the same video content with other partition modes or encoding parameters; or,
the similarity between the reconstructed frame of the long GOP code stream and the reconstructed frame of the corresponding random access code stream is higher than the similarity between the reconstructed frame of the long GOP code stream and the reconstructed frame of a random access code stream obtained by encoding the same video content with other partition modes or encoding parameters; or,
the difference between pixel values at the same positions of the reconstructed frame of the long GOP code stream and the reconstructed frame of the corresponding random access code stream is smaller than a pixel value threshold; for example, the pixel value threshold may be 128, and other values may also be used; the embodiments of the present application do not enumerate them.
Here, the reconstructed frame of the long GOP code stream and the corresponding reconstructed frame of the random access code stream are the reconstructed frames at the same position in the two streams. Taking the long GOP stream and random access stream of the other video content shown in fig. 7 as an example, the reconstructed frames at the same position may include the reconstructed frame of the frame numbered 3 (#3 frame) of the long GOP stream and the reconstructed frame of the frame numbered 3 (#3 frame) of the random access stream.
The video image processing method of the embodiments of the present application can adjust the encoding of the second code stream based on the encoding result of the first code stream (for example, a reconstructed image of the first code stream or a reconstructed image in the first encoding process, and/or the encoding information of the first encoding), so that a reconstructed frame of the first code stream and the corresponding reconstructed frame of the second code stream have the same or equivalent quality. The video image processing method can encode an input image to be encoded to generate the first code stream and the second code stream. That is, for the same video content, a first code stream and a second code stream of the same or equivalent quality can be generated by encoding, with the quality of a reconstructed frame of the first code stream being the same as or equivalent to that of the corresponding reconstructed frame of the second code stream. The first code stream here may be a long GOP stream as described above, and the second code stream may be a random access code stream as described above. Adjusting the encoding of the second code stream based on the encoding result of the first code stream can have different specific implementations. For example, the same or equivalent quality of the reconstructed frames of the first code stream and the corresponding reconstructed frames of the second code stream may be achieved through the following embodiments.
The interval between two adjacent full-intra-prediction frames in the first code stream is greater than the interval between two adjacent full-intra-prediction frames in the second code stream.
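As a minimal sketch of this condition, the following Python snippet generates the frame-type patterns of the two streams; the interval values are illustrative, taken from the fig. 7 example.

```python
def gop_pattern(num_frames, intra_interval):
    """Frame types when an all-intra frame is inserted every
    intra_interval frames and the remaining frames are P frames."""
    return ["I" if n % intra_interval == 0 else "P" for n in range(num_frames)]

# First code stream: large interval between all-intra frames (fig. 7 uses 9).
first_stream = gop_pattern(10, 9)   # ['I','P','P','P','P','P','P','P','P','I']
# Second code stream: every frame is all-intra, i.e. an interval of 1.
second_stream = gop_pattern(10, 1)  # ['I','I','I','I','I','I','I','I','I','I']
```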
Referring to fig. 9, fig. 9 is a schematic diagram of a video image processing method according to an embodiment of the present disclosure. This embodiment may be executed by an encoding device. The encoding device may be applied to the source device 12 in the above-described embodiments, for example as the server 801 in the embodiment shown in fig. 8. As shown in fig. 9, the encoding device may obtain an image to be encoded and perform first encoding on it to generate a first code stream. Then, according to the encoding information of the first encoding and/or the first reconstructed image, second encoding in the full intra prediction mode is performed on the image to be encoded or the first reconstructed image to generate a second code stream. The first reconstructed image is a reconstructed image of the first code stream or a reconstructed image in the first encoding process.
It should be noted that, as shown in fig. 9, the first code is used to generate a first code stream, the second code is used to generate a second code stream, and the first code stream and the second code stream are generated through different encoding processes. For the transmission of the first code stream and the second code stream, not shown in fig. 9, the first code stream and the second code stream may be transmitted independently, or may be transmitted after being interleaved.
Several realizations of the embodiment shown in fig. 9 will be described in detail below with reference to the embodiments shown in fig. 10 to 15. For the second encoding process, in the embodiments shown in fig. 10 to 15, according to the first reconstructed image, the image to be encoded or the first reconstructed image is subjected to second encoding in the full intra prediction mode to generate a second code stream, or according to the first reconstructed image and the encoding information of the first encoding, the image to be encoded or the first reconstructed image is subjected to second encoding in the full intra prediction mode to generate the second code stream.
Referring to fig. 10, fig. 10 is a schematic flowchart illustrating a video image processing method according to an embodiment of the present disclosure. Portions of the methods of the embodiments of the present application may be performed by an encoding device. The encoding apparatus may be applied to the source device 12 in the above-described embodiment, for example, as the server 801 in the embodiment shown in fig. 8. It should be understood that a series of steps or operations related to the embodiments of the present application may be performed in various orders and/or simultaneously, and the execution order is not limited by the size of the step numbers shown in fig. 10. The method as shown in fig. 10 may comprise the following implementation steps:
Step 1001, an image to be encoded is acquired.
The image to be encoded according to the embodiment of the present application may be a video image captured by a camera or other capturing device, or a decoded video image. The decoded video image may be an image obtained by decoding a compressed video image.
In one implementation, the image to be encoded may be a source video image. Therefore, through the following steps, the source video image is subjected to the first coding and the second coding to generate a first code stream and a second code stream, and frame-level synchronous output is realized. In another implementation, the image to be encoded may be an image block obtained by dividing the source video image. Therefore, through the following steps, the divided image blocks are subjected to first coding and second coding to generate a first code stream and a second code stream, and block-level synchronous output is realized.
Step 1002, performing first coding on an image to be coded to generate a first code stream.
The first encoding may include one or more of prediction, transform, quantization, entropy coding, and the like. For example, an image to be encoded may be predicted, transformed, and quantized to generate first encoded data, and then entropy-encoded to generate a first code stream including the first encoded data.
Alternatively, the prediction mode of the first encoding may be inter prediction: the image to be encoded is first encoded to generate a first code stream including P frames or P blocks. Or, the prediction mode of the first encoding may be intra prediction: the image to be encoded is first encoded to generate a first code stream including full-intra-prediction frames or I blocks. For example, in the long GOP stream shown in fig. 7, the first code stream may include both P frames and full-intra-prediction frames.
Step 1003, at least one of a first partition mode and a first encoding parameter used for second encoding of the image to be encoded or the first reconstructed image is determined according to the first reconstructed image, where the first reconstructed image is a reconstructed image of the first code stream or a reconstructed image in the first encoding process.
In the embodiments of the present application, the first code stream may be decoded to obtain the first reconstructed image. Alternatively, a reconstructed image in the first encoding process may be acquired; for example, in the first encoding process, the first encoded data is subjected to inverse quantization, inverse transformation, and the like to obtain the first reconstructed image. The first partition mode and/or the first encoding parameter used to perform second encoding on the image to be encoded or the first reconstructed image are then determined according to the first reconstructed image.
The first encoding parameter may include, but is not limited to, a first Quantization Parameter (QP), a first code rate, or the like.
In an implementation manner, the image to be encoded may be a source video image, and accordingly, the first reconstructed image is a first reconstructed frame obtained by decoding a first code stream corresponding to the source video image, or obtained by performing inverse quantization, inverse transformation, and other processing on first encoded data corresponding to the source video image in a first encoding process. In another implementation manner, the image to be encoded may be an image block obtained by dividing the source video image, and accordingly, the first reconstructed image is a first reconstructed image block obtained by decoding a first code stream corresponding to the divided image block, or obtained by performing inverse quantization, inverse transformation and other processing on encoded data corresponding to the divided image block in a first encoding process.
Optionally, in another implementation manner of the foregoing step 1003, at least one of a first partition manner and a first encoding parameter used for performing a second encoding on the image to be encoded or the first reconstructed image is determined according to the first reconstructed image and the encoding information of the first encoding. The coding information of the first coding may include one or more of a division manner of the first coding, a quantization parameter of the first coding, and coding distortion information of the first coding.
Step 1004, second encoding in the full intra prediction mode is performed on the image to be encoded or the first reconstructed image according to at least one of the first partition mode and the first encoding parameter, to generate a second code stream.
That is, according to the first partition mode and/or the first encoding parameter, second encoding in the full intra prediction mode is performed on the image to be encoded or the first reconstructed image to generate the second code stream.
In other words, the prediction mode of the second encoding is intra prediction: the image to be encoded or the first reconstructed image is second encoded to generate a second code stream including full-intra-prediction frames or I blocks. For example, in the random access code stream shown in fig. 7, the second code stream may include full-intra-prediction frames.
The second encoding may include one or more of prediction, transform, quantization, entropy coding, and the like. For example, the image to be encoded or the first reconstructed image may be predicted, transformed, and quantized to generate second encoded data, and then the second encoded data may be entropy-encoded to generate a second code stream including the second encoded data.
The second encoding differs from the first encoding in that the prediction mode is different, i.e., the second encoding uses the full intra prediction mode. It can be understood that other information, such as encoding parameters, may also differ between the second encoding and the first encoding.
In some embodiments, a difference between the first reconstructed image and the second reconstructed image is smaller than a difference threshold or a similarity between the first reconstructed image and the second reconstructed image is higher than a similarity threshold, and the second reconstructed image is obtained by decoding the second code stream or by performing inverse quantization, inverse transformation, and the like on the second encoded data in a second encoding process. Wherein, the difference threshold or the similarity threshold can be reasonably set according to requirements.
Wherein the difference is used to represent a difference between the features of the first reconstructed image and the features of the second reconstructed image. The difference can be measured by an index such as Mean Absolute Differences (MAD), sum of Absolute Differences (SAD), sum of Squared Differences (SSD), mean Square Differences (MSD), or Sum of Absolute Transformed Differences (SATD). The larger the index such as MAD, SAD, SSD, MSD, or SATD is, the larger the difference is, and the more different the quality of the first reconstructed image and the second reconstructed image is. The smaller the index such as MAD, SAD, SSD, MSD, or SATD, the smaller the difference, and the more equal or comparable the quality of the first reconstructed image and the second reconstructed image. The similarity is used to represent the similarity of the features of the first reconstructed image to the features of the second reconstructed image. The similarity can be measured by using the indexes such as MAD, SAD, SSD, MSD or SATD. The larger the MAD, SAD, SSD, MSD, or SATD, the lower the similarity, and the more different the quality of the first reconstructed image and the second reconstructed image. The smaller the MAD, SAD, SSD, MSD, or SATD, the higher the similarity, and the more equal or comparable the quality of the first reconstructed image and the second reconstructed image.
The difference threshold may be set to 0 or set according to requirements. For example, when MAD is selected as the difference measure, the luminance signal threshold may be set to 4 and the chrominance signal threshold to 2; correspondingly, when SAD, SSD, or MSD is selected as the difference measure, the luminance signal thresholds may be set to 4×N, 16×N, and 16, respectively, and the chrominance signal thresholds to 2×N, 4×N, and 4, respectively, where N is the total number of pixels in the object being measured (which may be a coding block or a picture). When SATD is selected as the difference measure, the threshold may be set to 0. The similarity threshold may be set similarly.
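A minimal sketch of these difference measures and of the thresholds as read above follows; the exact threshold pairing for SAD, SSD, and MSD is our reading of the text.

```python
import numpy as np

def difference(a, b, metric="MAD"):
    """Difference between two reconstructed images a, b (uint8 ndarrays)."""
    d = a.astype(np.int64) - b.astype(np.int64)
    n = d.size
    if metric == "MAD":
        return np.abs(d).sum() / n   # mean absolute difference
    if metric == "SAD":
        return np.abs(d).sum()       # sum of absolute differences
    if metric == "SSD":
        return (d * d).sum()         # sum of squared differences
    if metric == "MSD":
        return (d * d).sum() / n     # mean squared difference
    raise ValueError(metric)

def luma_quality_matches(a, b, metric="MAD"):
    # Luma thresholds as read from the text: MAD 4, SAD 4N, SSD 16N, MSD 16.
    n = a.size
    thresholds = {"MAD": 4, "SAD": 4 * n, "SSD": 16 * n, "MSD": 16}
    return difference(a, b, metric) < thresholds[metric]
```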
Optionally, a specific implementation manner of the step 1003 may be that a plurality of second dividing manners are determined according to the first reconstructed image, and one of the plurality of second dividing manners is selected as the first dividing manner; and/or determining a plurality of second coding parameters according to the first reconstructed image, and selecting one second coding parameter from the plurality of second coding parameters as the first coding parameter.
In this way, the similarity between the first reconstructed image and the second reconstructed image is the highest similarity among the similarities between the first reconstructed image and the third reconstructed images, and the third reconstructed images are reconstructed images obtained by performing second encoding on the image to be encoded or the first reconstructed image for a plurality of times according to the second division manners and/or the second encoding parameters. The third reconstructed images may be reconstructed images in the multiple second encoding processes, or reconstructed images of multiple code streams obtained by multiple second encoding.
For example, the encoding device may perform second encoding on the image to be encoded or the first reconstructed image a plurality of times according to the plurality of second division manners and/or the plurality of second encoding parameters, respectively, to generate a plurality of third encoded data. Then, in the second encoding process, the third encoded data are subjected to inverse quantization, inverse transformation, and the like to obtain third reconstructed images. And selecting one image with the highest similarity as the second reconstructed image by comparing the similarities between the plurality of third reconstructed images and the first reconstructed image respectively. In other words, the similarity between the first reconstructed image and the second reconstructed image is the highest of the similarities between the first reconstructed image and the plurality of third reconstructed images. And taking the third coded data corresponding to the third reconstructed image with the highest similarity as second coded data to generate a code stream comprising the second coded data. Or, the code stream corresponding to the third reconstructed image with the highest similarity is used as the second code stream.
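A sketch of this candidate search follows; intra_encode and similarity are assumed callables standing in for the encoder and for any of the measures above, not a real encoder interface.

```python
def search_second_encoding(block, recon_a, partitions, qps,
                           intra_encode, similarity):
    """intra_encode(block, partition, qp) is assumed to return
    (encoded_data, third_reconstructed_image); similarity(x, y) returns a
    higher value for more similar images."""
    best_score, best_data = None, None
    for partition in partitions:
        for qp in qps:
            data, recon_c = intra_encode(block, partition, qp)
            score = similarity(recon_a, recon_c)
            if best_score is None or score > best_score:
                best_score, best_data = score, data
    # The kept candidate's reconstruction becomes the second reconstructed
    # image: the one most similar to the first reconstructed image.
    return best_data
```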
Optionally, before step 1003, the video image processing method of the embodiments of the present application may further include: determining whether the prediction mode of the first encoding is intra prediction. When the prediction mode of the first encoding is inter prediction, step 1003 is performed. When the prediction mode of the first encoding is intra prediction, the first code stream is used as the second code stream, or the first encoded data is used as the second encoded data to generate the second code stream. In this way, when the first encoding of the image to be encoded yields a first code stream including P frames or P blocks, the image to be encoded or the first reconstructed image can be second encoded through steps 1003 and 1004 to generate a second code stream including full-intra-prediction frames or I blocks. When the first encoding of the image to be encoded yields a first code stream including full-intra-prediction frames or I blocks, steps 1003 and 1004 need not be performed, and the full-intra-prediction frames or I blocks are directly used as the second encoded data, which can improve encoding efficiency.
For example, taking the long GOP stream and the random access stream of another video content shown in fig. 7 as an example, the encoding device performs the first encoding on the image to be encoded to generate a long GOP stream including the frame number 3 (# 3 frame). The encoding apparatus determines whether the prediction mode of the first encoding is intra prediction. As shown in fig. 7, the prediction mode of the first encoding at this time is inter prediction. After that, the encoding device may perform second encoding on the image to be encoded or the reconstructed frame of the frame number 3 (# 3 frame) by performing step 1003 and step 1004 to generate a random access code stream of the frame number 3 (# 3 frame). The prediction mode of the frame number 3 (# 3 frame) in the random access codestream is intra prediction.
It should be noted that, in some embodiments, when an image to be encoded is first encoded to obtain a first code stream including a frame or an I block of a full intra prediction mode, the frame or the I block of the full intra prediction mode may not be used as second encoded data, that is, the second code stream may not include the frame or the I block of the full intra prediction mode, and may be set reasonably according to a video transmission requirement.
It should be noted that the second encoded data may also be referred to as a random access frame.
Optionally, taking the example that the encoding apparatus is applied to a server, the server may store the first code stream and the second code stream. And when a video content request sent by the client is received, sending the first code stream and the second code stream to the client.
As an example, in an on-demand application scenario, the client requests playback of a video content from time t0, and the server starts delivering the first code stream containing time t0. If the frame of the video content corresponding to time t0 is an access frame of the first code stream (for example, a full-intra-prediction frame as described above), the server provides only the first code stream to the client, and the client decodes and plays the first code stream. Illustratively, if the frame corresponding to time t0 is the frame numbered 0 (#0 frame) of the long GOP stream shown in fig. 7, the server provides only the long GOP stream to the client. If the frame corresponding to time t0 is not an access frame of the first code stream, the server also needs to deliver to the client the random access frame of the second code stream at time t0 or closest to time t0. The client may decode the random access frame first, then decode the first code stream using the reconstructed frame of the random access frame as a reference frame, and decode and play the first code stream. Illustratively, if the frame corresponding to time t0 is the frame numbered 3 (#3 frame) of the long GOP stream shown in fig. 7, the server also needs to send the frame numbered 3 (#3 frame) of the random access code stream to the client. The client may decode the frame numbered 3 (#3 frame) of the random access code stream, use its reconstructed frame as a reference frame to decode the frame numbered 4 (#4 frame) of the long GOP code stream, and then decode and play the subsequent frames of the long GOP code stream.
As another example, in a live application scenario, the client requests access to the live broadcast from time t0, and the server delivers the first code stream and the second code stream to the client. The client first decodes the random access frame of the second code stream at time t0 or closest to time t0, and then decodes the subsequent frames of the first code stream using the reconstructed frame of that random access frame as a reference frame, where the subsequent frames are the frames of the first code stream after the moment of the random access frame.
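The server-side selection logic in the two examples above might be sketched as follows; all helper methods on the content object are hypothetical.

```python
def handle_request(content, t0, live=False):
    """Returns the streams/frames to deliver for a request starting at t0."""
    first = content.long_gop_stream_from(t0)
    if live:
        # Live: deliver both streams; the client starts from the random
        # access frame at (or closest to) t0.
        return [content.random_access_stream_from(t0), first]
    # On demand: if t0 falls on an access frame (e.g. a full-intra-prediction
    # frame) of the first code stream, that stream alone is enough.
    if content.is_access_frame(t0):
        return [first]
    # Otherwise also send the random access frame at or closest to t0.
    return [content.random_access_frame_near(t0), first]
```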
In this embodiment, an image to be encoded is first encoded to generate a first code stream; at least one of a first partition mode and a first encoding parameter used for second encoding of the image to be encoded or the first reconstructed image is determined according to the first reconstructed image; and the image to be encoded or the first reconstructed image is second encoded according to the at least one of the first partition mode and the first encoding parameter to generate a second code stream, where the first reconstructed image is a reconstructed image of the first code stream or a reconstructed image in the first encoding process. The encoding of the second code stream is thus adjusted based on the first code stream or on a reconstructed image in the first encoding process, so that the reconstructed images of the first code stream and the corresponding reconstructed images of the second code stream have the same or equivalent quality. This improves the decoding quality of accessed video content, reduces blocking, and eliminates part of the artifact effects while meeting the requirement of low-latency access to video content.
The following explains the video image processing method of the embodiments of the present application, taking as the image to be encoded an image block obtained by dividing a source video image.
Referring to fig. 11, fig. 11 is a schematic flowchart illustrating a video image processing method according to an embodiment of the present disclosure. Portions of the methods of the embodiments of the present application may be performed by an encoding device. The encoding apparatus may be applied to the source device 12 in the above-described embodiment, for example, as the server 801 in the embodiment shown in fig. 8. The present embodiment takes the image to be encoded as the nth image block of the kth frame image as an example. The encoding device may encode an nth image block of a kth frame image to generate a first code stream and a second code stream. The method as shown in fig. 11 may comprise the following implementation steps:
Step 1101, the Nth image block of the Kth frame image is acquired.
The encoding device may receive an input Kth frame image and perform block division on it to obtain a plurality of image blocks of the Kth frame image. The embodiments of the present application take the encoding of the Nth image block of the Kth frame image as an example; other image blocks may be processed in the same or a similar manner and are not separately described here.
Step 1102, the Nth image block is first encoded to generate a first code stream.
The encoding apparatus may perform first encoding on the Nth image block using information such as a prediction mode P, a partition mode D, and a quantization parameter QP, to generate a first code stream. The first code stream may include the first encoded data of the Nth image block. In the first encoding process, a reconstructed image block A of the Nth image block may also be generated.
Step 1103, the Nth image block, the prediction mode P, and reconstructed image block A are acquired.
Encoding to generate the second code stream starts, taking the Nth image block and reconstructed image block A as input information.
Optionally, information such as the partition mode D and the quantization parameter QP used in the first encoding may also be used as input information.
Step 1104, it is determined whether the prediction mode P is intra prediction; if so, the encoding result of the Nth image block in the first code stream is used as the encoding result of the current block in the second code stream; if not, step 1105 is executed.
When the prediction mode P of the Nth image block is intra prediction, the encoding result of the current block in the second code stream directly reuses the encoding result of the Nth image block in the first code stream. When the prediction mode P of the Nth image block is inter prediction, second encoding in the intra prediction mode is performed on the original to-be-encoded image block at the position in the Kth frame image corresponding to reconstructed image block A, namely the Nth image block.
Step 1105, the Nth image block is divided using a partition mode, and all the divided sub-blocks are second encoded in the full intra prediction mode to obtain reconstructed image block B of the second code stream.
Illustratively, the Nth image block is divided using a partition mode, and a series of second encoding processes such as intra prediction, transform, quantization, and inverse transform are performed on all the divided sub-blocks to obtain reconstructed image block B of the second code stream.
Dividing the Nth image block using a partition mode includes, but is not limited to, 2N×2N division, N×N division, or no division of the Nth image block. For example, dividing the Nth image block by 2N×2N means dividing it into sub-blocks of size 2N×2N, where N in the partition sizes is any positive integer greater than 1.
When this step is performed for the Nth image block for the first time, the coding information and/or coding parameters used in the series of second encoding processes such as partitioning, intra prediction, transform, and quantization (for example, the partition mode, QP, and code rate) may be generated randomly at initialization, or may be the coding information and/or coding parameters used by an I frame before the Kth frame image. For the QP, for example, the average QP of the nearest one or more I frames before the Kth frame image may be used.
The reconstructed image block B is an image block corresponding to the reconstructed image block A in a reconstructed image of a Kth frame image of the second code stream. That is, in the reconstructed image of the Kth frame image of the second code stream, the image block with the same position as the reconstructed image block A of the Kth frame image of the first code stream. The reconstructed image block B may be composed of reconstructed blocks of one or more encoded sub-blocks.
Step 1106, the similarity cost function value of reconstructed image block A and reconstructed image block B is calculated.
Specifically, the similarity cost function value f(partition mode, QP) of reconstructed image block A and reconstructed image block B may be calculated according to the following formula (1).
$f(\mathrm{partition}, QP) = \frac{1}{I}\sum_{i=1}^{I}\bigl|B(\mathrm{partition}, QP, i) - A(i)\bigr|^{T}$  (1)
where i is the index of a pixel in the reconstructed image block, I is the total number of pixels in the reconstructed image block, B(partition, QP, i) is the reconstructed pixel value at the i-th pixel position when reconstructed image block B is obtained with the given partition mode and quantization parameter QP, and A(i) is the reconstructed pixel value at the i-th pixel position of reconstructed image block A. T may be 1 or 2.
Alternatively, the similarity between the two reconstructed image blocks may be evaluated in ways other than formula (1), including but not limited to MAD, SAD, SSD, MSD, and SATD. For example, the similarity of the two reconstructed image blocks may be evaluated by any one of the following formulas (2) to (5), or by computing the sum of absolute values of the difference image of the two reconstructed image blocks after the Hadamard transform.
Evaluating the similarity of the two reconstructed image blocks using the MAD similarity cost function:

$$ \mathrm{MAD} = \frac{1}{I}\sum_{i=1}^{I} \left| B(i) - A(i) \right| \tag{2} $$

Evaluating the similarity of the two reconstructed image blocks using the SAD similarity cost function:

$$ \mathrm{SAD} = \sum_{i=1}^{I} \left| B(i) - A(i) \right| \tag{3} $$

Evaluating the similarity of the two reconstructed image blocks using the SSD similarity cost function:

$$ \mathrm{SSD} = \sum_{i=1}^{I} \left( B(i) - A(i) \right)^{2} \tag{4} $$

Evaluating the similarity of the two reconstructed image blocks using the MSD similarity cost function:

$$ \mathrm{MSD} = \frac{1}{I}\sum_{i=1}^{I} \left( B(i) - A(i) \right)^{2} \tag{5} $$
Evaluating the similarity of the two reconstructed image blocks using the SATD similarity cost function: the difference image of the two reconstructed image blocks is Hadamard transformed, and the sum of the absolute values of the transformed coefficients is calculated to evaluate the similarity.
Using any one of the above similarity cost functions, the similarity cost function value of the reconstructed image block A and the reconstructed image block B is calculated; the smaller the similarity cost function value, the higher the similarity between the reconstructed image block A and the reconstructed image block B.
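To make the metric definitions concrete, the following is a minimal Python sketch of the similarity cost functions in formulas (2) to (5) and the SATD evaluation, assuming the two reconstructed image blocks A and B are given as equally shaped numpy arrays of pixel values; the function names and the n×n SATD sub-block size are illustrative choices, not taken from the patent.

```python
import numpy as np

def mad(a, b):
    # Mean absolute difference, formula (2)
    return np.mean(np.abs(a.astype(np.int64) - b.astype(np.int64)))

def sad(a, b):
    # Sum of absolute differences, formula (3)
    return np.sum(np.abs(a.astype(np.int64) - b.astype(np.int64)))

def ssd(a, b):
    # Sum of squared differences, formula (4)
    d = a.astype(np.int64) - b.astype(np.int64)
    return np.sum(d * d)

def msd(a, b):
    # Mean squared difference, formula (5)
    d = a.astype(np.int64) - b.astype(np.int64)
    return np.mean(d * d)

def satd(a, b, n=8):
    # SATD: sum of absolute values of the Hadamard transform of the difference
    # image, computed over n x n sub-blocks (n a power of two; the block
    # dimensions are assumed to be multiples of n).
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)  # Sylvester construction
    d = a.astype(np.int64) - b.astype(np.int64)
    total = 0
    for y in range(0, d.shape[0], n):
        for x in range(0, d.shape[1], n):
            sub = d[y:y + n, x:x + n]
            total += np.sum(np.abs(h @ sub @ h.T))
    return total
```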
Step 1107, determine whether the similarity cost function value of the reconstructed image block A and the reconstructed image block B is smaller than the similarity cost function value threshold, or whether the finite-iteration local optimum has been reached; if yes, execute step 1109; if not, execute step 1108.
Optionally, if the similarity cost function value of the reconstructed image block A and the reconstructed image block B is smaller than the similarity cost function value threshold, step 1109 is performed directly. If the finite-iteration local optimum has been reached, step 1109 is also performed directly. The finite-iteration local optimum specifically means: the Nth image block is second encoded using all the partition manners and/or coding parameters; the similarity cost function value of the reconstructed image block B and the reconstructed image block A is calculated for each partition manner and/or coding parameter; the candidate with the highest similarity (for example, the smallest similarity cost function value) is selected as the finite-iteration local optimum; and step 1109 is performed on the encoded data, partition manner, and/or coding parameters corresponding to the reconstructed image block B with the highest similarity.
The similarity cost function value threshold can be flexibly set as required. For example, it may be set to 0. Alternatively, when MAD, SAD, SSD, and MSD are used to evaluate the similarity of the luminance signal, the thresholds may be set to 4, 4×I, 16×I, and 16, respectively; in other words, when the mean absolute (or squared) luminance difference between the reconstructed image block B and the reconstructed image block A is smaller than 4 (or 16), step 1109 is performed. For another example, when MAD, SAD, SSD, and MSD are used to evaluate the similarity of the chrominance signal, the thresholds may be set to 2, 2×I, 4×I, and 4, respectively; that is, when the mean absolute (or squared) chrominance difference is smaller than 2 (or 4), step 1109 is performed. As another example, the threshold for similarity evaluation using SATD may be set to 0.
Optionally, in the process of comparing the similarity of the two reconstructed image blocks, if there is a large difference between the individual pixels, for example, the difference between the gray values exceeds 128, the division manner and the corresponding quantization result may be discarded.
After step 1109 is performed, step 1101 may be repeated to start the first encoding of the (N+1)th image block of the Kth frame image, until the first encoding and the second encoding of the entire frame are completed.
Step 1108, transforming the dividing mode and/or the encoding parameters, and repeatedly executing step 1105.
The embodiment of the present application may provide multiple partition manners and/or coding parameters. One partition manner and/or coding parameter is selected from them, and step 1105 is repeated to traverse the multiple partition manners and/or coding parameters, second encoding the Nth image block and calculating the similarity cost function value of the reconstructed image block B and the reconstructed image block A for each partition manner and/or coding parameter.
Taking the QP as an example of a coding parameter: within a certain interval (e.g., 0 to 51), QPs are selected with a certain step size (e.g., 1 or 2), and step 1105 is repeated until the finite set of QPs is enumerated.
Taking the partition manner as an example: step 1105 is executed for each partition manner until the finite set of partition manners is enumerated.
Step 1109, entropy encode the second encoded data together with the partition manner and/or coding parameters to generate the second code stream, completing the second encoding of the Nth image block.
The second code stream may include the second encoded data of the Nth image block.
If the similarity cost function value of the reconstructed image block A and the reconstructed image block B is smaller than the similarity cost function value threshold, the second encoded data is the encoded data corresponding to the reconstructed image block B, that is, the encoded data obtained by second encoding the Nth image block with one partition manner and/or coding parameter; the reconstructed image block B is the reconstruction of that encoded data.
If the finite-iteration local optimum is reached, the second encoded data is the encoded data corresponding to the reconstructed image block B that is locally optimal under finite iteration. The finite-iteration local optimum specifically means: the Nth image block is encoded using all the partition manners and/or coding parameters; the similarity between the reconstructed image block B and the reconstructed image block A is calculated for each partition manner and/or coding parameter; the candidate with the highest similarity is selected as the finite-iteration local optimum; and the encoded data corresponding to the reconstructed image block B with the highest similarity is used as the second encoded data.
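As an illustration only, the following Python sketch strings steps 1105 to 1109 together as a finite-iteration search, reusing the `mad` metric and numpy import from the sketch above; `second_encode` and `entropy_encode` are hypothetical placeholders for the actual second-encoding and entropy-coding routines, and the threshold and outlier values follow the examples in the text.

```python
import numpy as np

def encode_block_second_stream(orig_block, recon_a, partitions, qps,
                               cost=mad, threshold=4.0):
    # second_encode() and entropy_encode() are hypothetical placeholders.
    best = None  # (cost value, encoded data, partition manner, QP)
    for partition in partitions:          # step 1108: vary the partition manner
        for qp in qps:                    # step 1108: vary the coding parameter
            data, recon_b = second_encode(orig_block, partition, qp)  # step 1105
            # Optional rejection: discard a candidate whose individual pixels
            # differ too much (e.g., gray-value difference above 128).
            if np.max(np.abs(recon_a.astype(np.int64)
                             - recon_b.astype(np.int64))) > 128:
                continue
            c = cost(recon_a, recon_b)    # step 1106
            if best is None or c < best[0]:
                best = (c, data, partition, qp)
            if c < threshold:             # step 1107: threshold reached
                return entropy_encode(data, partition, qp)   # step 1109
    if best is None:
        raise ValueError("every candidate was rejected by the outlier check")
    # Finite iteration exhausted: emit the locally optimal candidate.
    _, data, partition, qp = best
    return entropy_encode(data, partition, qp)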
Optionally, whether to perform step 1105 may be determined in combination with other information, for example, according to the frame type used in the first encoding, the reference relationship, or whether temporal motion vector prediction is turned off. Illustratively, when the first encoding uses only a single reference frame and temporal motion vector prediction is turned off, step 1105 is performed to encode a random access frame of the second code stream.
Optionally, whether to perform step 1105 may also be determined in combination with the intra prediction strong filtering and SAO filtering employed in the first encoding. Illustratively, when the first encoding has intra prediction strong filtering or SAO filtering turned on, the random access frame is encoded without performing step 1105.
In this embodiment, the encoding of the second code stream is adjusted based on the encoding result of the first code stream including the nth image block, so as to achieve the same or equivalent quality between the reconstructed image block of the first code stream and the reconstructed image block of the corresponding second code stream, thereby improving the decoding quality of the accessed video content, reducing the blocking effect, and eliminating part of the artifact effect on the basis of meeting the requirement of accessing the video content with low delay. By encoding the first code stream and the second code stream at the image block level, the synchronous output of the two code streams at the image block level can be realized, so that a random access frame of the second code stream for accessing the video content can be obtained quickly, and the access time delay is reduced.
The following explains a video image processing method according to an embodiment of the present application, with the above-described image to be encoded as a source video image.
Referring to fig. 12, fig. 12 is a schematic flowchart illustrating a video image processing method according to an embodiment of the present disclosure. Portions of the methods of the embodiments of the present application may be performed by an encoding device. The encoding apparatus may be applied to the source device 12 in the above-described embodiment, for example, as the server 801 in the embodiment shown in fig. 8. The present embodiment takes the image to be encoded as the K-th frame image as an example. The encoding means may encode the K-th frame image to generate a first code stream and a second code stream. The method as shown in fig. 12 may comprise the following implementation steps:
Step 1201, performing first coding on the Kth frame image to generate a first code stream.
The encoding apparatus may receive an input K-th frame image. In the embodiment of the present application, the encoding of the image of the K-th frame is taken as an example for illustration, and other frames may adopt the same or similar processing manner.
The encoding device may perform first encoding on the K frame image using information such as a prediction mode P, a partition method D, and a quantization parameter QP, to generate a first code stream. The first code stream may include first encoded data of a kth frame image. In the first encoding process, a reconstructed frame a of the K-th frame image may also be generated.
Step 1202, partition the Kth frame image using one partition manner, and perform second encoding in the full intra prediction mode on all the partitioned image blocks to obtain the reconstructed frame B of the second code stream.
Encoding to generate the second code stream is started with the Kth frame image and the reconstructed frame A as input information.
Alternatively, information such as the partition method D and the quantization parameter QP related to the first encoding may be input information.
Illustratively, the image of the K-th frame is divided in a dividing manner, and a series of second encoding processes such as intra-frame prediction, transformation, quantization, inverse transformation and the like are performed on all the divided sub-blocks to obtain a reconstructed frame B of the second code stream.
When this step 1202 is performed on the Kth frame image for the first time, the coding information and/or coding parameters used in the series of second encoding processes such as partitioning, intra prediction, transformation, and quantization (for example, the partition manner, QP, and code rate) may be randomly generated at initialization, or may be the coding information and/or coding parameters used in an I frame preceding the Kth frame image. For example, for the QP, the average QP of the nearest one or more I frames preceding the Kth frame image may be used.
Reconstructed frame B may be composed of reconstructed blocks of one or more encoded image blocks.
Optionally, when the prediction mode P of the kth frame image is full intra prediction, the second encoded data of the current frame of the second code stream directly uses the first encoded data of the kth frame image of the first code stream. When the prediction mode P of the K-th frame image is not the full intra prediction (for example, the K-th frame image is a P frame, or there is a P block), the K-th frame image is second encoded in the full intra prediction mode through step 1202.
Step 1203, calculate the similarity cost function value of the reconstructed frame A and the reconstructed frame B.
Specifically, the similarity cost function values f (partition manner, QP) of the reconstructed frame a and the reconstructed frame B may be calculated according to the following formula (6).
$$ f(\text{partition manner}, QP) = \frac{1}{I}\sum_{i=1}^{I} \left| B(\text{partition manner}, QP, i) - A(i) \right|^{T} \tag{6} $$
where i represents the index of a pixel in the reconstructed frame, I represents the total number of pixels in the reconstructed frame, B(partition manner, QP, i) represents the reconstructed pixel value at the ith pixel position of the reconstructed frame B obtained with one partition manner and quantization parameter QP, and A(i) represents the reconstructed pixel value at the ith pixel position of the reconstructed frame A. T may be 1 or 2.
Alternatively, the similarity between the two reconstructed frames may be evaluated in other ways besides the formula (6), including but not limited to MAD, SAD, SSD, MSD, SATD, etc.
Step 1204, transform the partition manner and/or the coding parameters, and repeat step 1202.
The embodiment of the present application may provide multiple partition manners and/or coding parameters. One partition manner and/or coding parameter is selected from them, and step 1202 is repeated to traverse the multiple partition manners and/or coding parameters, encoding the Kth frame image and calculating the similarity cost function value of the reconstructed frame B and the reconstructed frame A for each partition manner and/or coding parameter.
Taking the QP as an example of a coding parameter: within a certain interval (e.g., 0 to 51), QPs are selected with a certain step size (e.g., 1 or 2), and step 1202 is repeated until the finite set of QPs is enumerated.
Taking the partition manner as an example: step 1202 is executed for each partition manner until the finite set of partition manners is enumerated.
Step 1205, entropy encode the second encoded data and the partition manner and/or coding parameters to generate the second code stream, completing the encoding of the Kth frame image.
The second code stream may include the second encoded data of the Kth frame image.
The second encoded data is the encoded data corresponding to the reconstructed frame B that is locally optimal under finite iteration. The finite-iteration local optimum specifically means: the Kth frame image is encoded using all the partition manners and/or coding parameters; the similarity between the reconstructed frame B and the reconstructed frame A is calculated for each partition manner and/or coding parameter; the candidate with the highest similarity is selected as the finite-iteration local optimum; and the encoded data corresponding to the reconstructed frame B with the highest similarity is used as the second encoded data.
Optionally, if the similarity cost function value of the reconstructed frame A and the reconstructed frame B is smaller than the similarity cost function value threshold, the second encoded data is the encoded data corresponding to the reconstructed frame B, that is, the encoded data obtained with one partition manner and/or coding parameter; the reconstructed frame B is the reconstruction of that encoded data.
Optionally, in the process of comparing the similarity of the two reconstructed frames, if there is a large difference between the individual pixels, for example, the difference between the gray values exceeds 128, the division manner and the corresponding quantization result may be discarded.
After performing step 1205, step 1201 may be repeatedly performed to start encoding the (K+1)th frame image.
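A possible per-frame driver for this flow is sketched below; `first_encode` and `encode_frame_second_stream` are hypothetical helpers (the latter would mirror the block-level search sketched earlier but operate on whole frames), and the `all_intra` flag is an assumed attribute of the first encoding's result.

```python
def process_frame(frame_k, stream1, stream2):
    # Step 1201: first encoding of the Kth frame image.
    data1, recon_a, all_intra = first_encode(frame_k)   # hypothetical helper
    stream1.append(data1)
    if all_intra:
        # Already full intra prediction: reuse the first encoded data directly.
        stream2.append(data1)
    else:
        # Steps 1202-1205: all-intra second encoding matched against recon A.
        stream2.append(encode_frame_second_stream(frame_k, recon_a))
```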
In this embodiment, the encoding of the second code stream is adjusted based on the encoding result of the first code stream including the Kth frame image, so as to achieve the same or equivalent quality between the reconstructed frame of the first code stream and the reconstructed frame of the corresponding second code stream, thereby improving the decoding quality of the accessed video content, reducing the blocking effect, and eliminating part of the artifact effect on the basis of meeting low-delay access to the video content. Through frame-level encoding of the first code stream and the second code stream, the quality of the two code streams is kept consistent at the frame level.
The video image processing method in the above embodiment controls encoding of the current image of the second code stream according to the reconstructed image of the current image (for example, the current frame or the current block) of the first code stream, so as to achieve the same or equivalent quality between the reconstructed image of the first code stream and the reconstructed image of the corresponding second code stream. The embodiment of the present application further provides a video image processing method in the following embodiment, where the method controls encoding of a second image to be encoded of a second code stream according to at least one first image to be encoded of a first code stream, where the second image to be encoded is a video image before the at least one first image to be encoded, so as to implement that a reconstructed image of the first code stream and a reconstructed image of a corresponding second code stream have the same or equivalent quality.
Referring to fig. 13, fig. 13 is a schematic flowchart illustrating a video image processing method according to an embodiment of the present disclosure. Portions of the methods of the embodiments of the present application may be performed by an encoding device. The encoding apparatus may be applied to the source device 12 in the above-described embodiment, for example, as the server 801 in the embodiment shown in fig. 8. It should be understood that a series of steps or operations related to the embodiments of the present application may be executed in various orders and/or simultaneously, and the execution order is not limited by the size of the step numbers shown in fig. 13. The method as shown in fig. 13 may comprise the following implementation steps:
Step 1301, obtain at least one first image to be encoded and a second image to be encoded.
The second image to be encoded is a video image preceding the at least one first image to be encoded.
The second image to be encoded and the one or more first images to be encoded of the embodiments of the present application may be video images captured by a camera or other capturing device, or decoded video images. The decoded video image may be an image obtained by decoding a compressed video image.
Step 1302, respectively performing a first encoding on at least one first image to be encoded to generate a first code stream.
The first encoding may include one or more of prediction, transform, quantization, entropy encoding, and so on. For example, at least one first image to be encoded may be predicted, transformed, and quantized, respectively, to generate one or more first encoded data, and then entropy-encoded to generate a first code stream including the one or more first encoded data.
Alternatively, the prediction mode of the first encoding may be inter prediction. Alternatively, the prediction mode of the first encoding may be intra prediction.
Step 1303, determine, according to at least one first reconstructed image, at least one of a first partition manner and a first coding parameter used for second encoding the second image to be encoded, where the at least one first reconstructed image is a reconstructed image of the first code stream or a reconstructed image in the first encoding process.
The first code stream can be decoded to obtain one or more first reconstructed images. Alternatively, in the first encoding process, one or more first encoded data may be subjected to inverse quantization, inverse transformation, and the like to obtain one or more first reconstructed images. Then, the embodiment of the present application may determine, according to one or more first reconstructed images, a first partition manner and/or a first encoding parameter that is used for performing a second encoding on a second image to be encoded.
In an implementation manner, the at least one first image to be encoded may be at least one first source video image, and accordingly, the at least one first reconstructed image is at least one first reconstructed frame, where the at least one first reconstructed frame is obtained by decoding a first code stream corresponding to each of the at least one first source video image, or is a reconstructed frame of first encoded data corresponding to each of the at least one first source video image in the first encoding process.
Step 1304, perform second encoding in the full-frame intra prediction mode on the second image to be encoded according to at least one of the first partition manner and the first coding parameter, to generate a second code stream.
And according to the first division mode and/or the first coding parameter, carrying out second coding in a full-frame intra-prediction mode on a second image to be coded to generate a second code stream. The second code stream may include second encoded data.
The prediction mode of the second encoding may be intra prediction. And carrying out second coding on the second image to be coded so as to generate a second code stream of the frame comprising the full-frame intra-prediction mode.
In some embodiments, the difference between the at least one first reconstructed image and at least one second reconstructed image is smaller than a difference threshold, or the similarity between the at least one first reconstructed image and the at least one second reconstructed image is higher than a similarity threshold, where the at least one second reconstructed image is obtained by decoding the first code stream with a third reconstructed image as the reference image. The difference threshold or the similarity threshold can be reasonably set according to requirements. The third reconstructed image is a reconstructed image of the second code stream or a reconstructed image in the second encoding process.
The number of the at least one first reconstructed image is the same as the number of the at least one second reconstructed image. For example, first encoding is performed on one first image to be encoded to generate the first code stream, and second encoding is performed on the second image to be encoded according to one first reconstructed image to generate the second code stream. The difference between the first reconstructed image and the second reconstructed image is smaller than the difference threshold, or the similarity between them is higher than the similarity threshold. The second reconstructed image is obtained by decoding the first code stream with the third reconstructed image as the reference image.
For a detailed explanation of the differences and similarities, reference may be made to the relevant explanation of step 1004 in the embodiment shown in fig. 10, and details are not repeated here.
For example, a plurality of images to be encoded are subjected to first encoding to generate a first code stream, and a second image to be encoded is subjected to second encoding according to a plurality of first reconstructed images to generate a second code stream. The difference between the plurality of first reconstructed images and the plurality of second reconstructed images is smaller than a difference threshold or the similarity between the plurality of first reconstructed images and the plurality of second reconstructed images is higher than a similarity threshold. The plurality of second reconstructed images are obtained by decoding the first code stream by taking the third reconstructed image as a reference image. Wherein the difference between the plurality of first reconstructed images and the plurality of second reconstructed images may be a weighted sum of differences between each of the plurality of first reconstructed images and the corresponding second reconstructed image. The similarity between the plurality of first reconstructed images and the plurality of second reconstructed images may be a weighted sum of the similarities between each of the plurality of first reconstructed images and the corresponding second reconstructed image.
Optionally, the at least one first image to be encoded is a first image to be encoded, and correspondingly, the at least one first reconstructed image is a first reconstructed image, and a specific implementation manner of the step 1303 may be that, according to the first reconstructed image, one second division manner is selected from a plurality of second division manners as the first division manner; and/or selecting one second coding parameter from the plurality of second coding parameters as the first coding parameter.
For example, the encoding apparatus may perform multiple second encoding on the second image to be encoded according to multiple second division manners and/or multiple second encoding parameters, respectively, to generate multiple third code streams. Each of the plurality of third code streams may include a third encoded data. Then, the encoding apparatus may decode the first code stream using the plurality of fifth reconstructed images as reference images, respectively, to obtain a plurality of fourth reconstructed images, and select one of the plurality of fourth reconstructed images with the highest similarity as the second reconstructed image by comparing similarities between the plurality of fourth reconstructed images and the first reconstructed image, respectively. In other words, the similarity between the first reconstructed image and the second reconstructed image is the highest of the similarities between the first reconstructed image and the plurality of fourth reconstructed images. And taking a third code stream corresponding to the fourth reconstructed image with the highest similarity as a second code stream, or taking third encoded data corresponding to the fourth reconstructed image with the highest similarity as second encoded data to generate a second code stream comprising the second encoded data.
And the fifth reconstructed images are reconstructed images of the third code streams or reconstructed images in the multiple second coding processes.
Optionally, the at least one first image to be encoded is a plurality of first images to be encoded, and correspondingly, the at least one first reconstructed image is a plurality of first reconstructed images, and a specific implementation manner of the step 1303 may be that, according to the plurality of first reconstructed images, one second division manner is selected from the plurality of second division manners as the first division manner; and/or selecting one second coding parameter from the plurality of second coding parameters as the first coding parameter.
For example, taking the number of first images to be encoded as m, the encoding apparatus may perform x second encodings on the second image to be encoded according to the plurality of second partition manners and/or second coding parameters, respectively, to generate x third code streams, each of which may include one piece of third encoded data. Then, the encoding apparatus may decode the first code stream using each of the x fifth reconstructed images as a reference image, to obtain x×m fourth reconstructed images. This can be understood as x groups of fourth reconstructed images, each group containing m fourth reconstructed images: decoding the first code stream with one of the x fifth reconstructed images as the reference image yields m fourth reconstructed images, i.e., one group of fourth reconstructed images. The similarity between each of the x groups of fourth reconstructed images and the m first reconstructed images is compared, and the group with the highest similarity is selected as the second reconstructed images corresponding to the m first images to be encoded. The third code stream corresponding to the m fourth reconstructed images with the highest similarity is taken as the second code stream, or the third encoded data corresponding to them is taken as the second encoded data, so as to generate a code stream including the second encoded data. Here, the third code stream corresponding to the m fourth reconstructed images with the highest similarity refers to the third code stream whose fifth reconstructed image was used as the reference image when decoding the first code stream into those m fourth reconstructed images.
The similarity between the m second reconstructed images corresponding to the m first images to be encoded and the m first reconstructed images is the weighted sum of the similarities between each second reconstructed image and its corresponding first reconstructed image.
For example, the plurality of first images to be encoded includes m first images to be encoded, and the plurality of first reconstructed images includes m first reconstructed images A_1, A_2, …, A_m, where m is any positive integer greater than 1. The plurality of fourth reconstructed images corresponding to the ith first image to be encoded among the m first images to be encoded are C_{1i}, …, C_{xi}, where x is any positive integer greater than 1 and i ranges from 1 to m; x may represent the xth second encoding.
The fifth reconstructed images are B_1, …, B_x.
C_{11} is obtained by decoding the first encoded data corresponding to the 1st first image to be encoded with B_1 as the reference image; C_{12} is obtained by decoding the first encoded data corresponding to the 2nd first image to be encoded with C_{11} as the reference image; …; C_{1m} is obtained by decoding the first encoded data corresponding to the mth first image to be encoded with C_{1(m-1)} as the reference image.
C_{x1} is obtained by decoding the first encoded data corresponding to the 1st first image to be encoded with B_x as the reference image; C_{x2} is obtained by decoding the first encoded data corresponding to the 2nd first image to be encoded with C_{x1} as the reference image; …; C_{xm} is obtained by decoding the first encoded data corresponding to the mth first image to be encoded with C_{x(m-1)} as the reference image.
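The chained decoding above can be sketched as follows, assuming a hypothetical `decode(data, ref)` helper that decodes one frame's first encoded data against a given reference image; it produces one group of fourth reconstructed images C_{x1}, …, C_{xm} from one candidate fifth reconstructed image B_x.

```python
def decode_group(first_encoded_list, b_x):
    # first_encoded_list holds the first encoded data of the m first images.
    group, ref = [], b_x
    for data in first_encoded_list:
        c = decode(data, ref)   # C_{x1} uses B_x; C_{xi} uses C_{x(i-1)}
        group.append(c)
        ref = c                 # the chain advances: the next frame references c
    return group
```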
Optionally, before step 1303, the video image processing method of the embodiment of the present application may further include: before the at least one first image to be encoded is first encoded, first encode the second image to be encoded to generate a fourth code stream, and determine whether the prediction mode of this first encoding is intra prediction. When the prediction mode of the first encoding is inter prediction, step 1303 is performed. When the prediction mode of the first encoding is intra prediction, the fourth code stream is used as the second code stream. In this way, when first encoding the second image to be encoded yields a first code stream including a P frame or a P block, the second image to be encoded may be second encoded through steps 1303 and 1304 to generate a second code stream including a frame in the full intra prediction mode. When first encoding the second image to be encoded yields a first code stream including a frame in the full intra prediction mode, steps 1303 and 1304 need not be executed, and that frame may be directly used as the second encoded data, which can improve encoding efficiency.
For example, taking the long GOP stream of another video content and the random access stream shown in fig. 7 as an example, the encoding device performs the first encoding on the second image to be encoded to generate a long GOP stream including the frame number 3 (# 3 frame). The encoding device determines whether the prediction mode of the first encoding is intra prediction. As shown in fig. 7, the prediction mode of the first encoding at this time is inter prediction. After that, the encoding apparatus may perform first encoding on the first image to be encoded by performing steps 1301 through 1304 to generate a long GOP codestream including the frame number 4 (# 4 frame). And carrying out second coding on a second image to be coded according to the reconstructed frame of the frame (# 4 frame) with the number of 4 of the long GOP code stream so as to generate a random access code stream of the frame (# 3 frame) with the number of 3. The prediction mode of the frame number 3 (# 3 frame) in the random access codestream is intra prediction.
It should be noted that, in some embodiments, when at least one first image to be encoded is subjected to first encoding to obtain a first code stream of a frame including a full intra prediction mode, the frame of the full intra prediction mode may not be used as second encoded data, that is, the second code stream may not include the frame of the full intra prediction mode, and may be set reasonably according to a video transmission requirement.
It should be noted that the second encoded data may also be referred to as a random access frame.
Optionally, taking the example that the encoding apparatus is applied to a server, the server may store the first code stream and the second code stream. And when a video content request sent by the client is received, sending the first code stream and the second code stream to the client.
In this embodiment, at least one first image to be encoded is respectively first encoded to generate a first code stream, at least one of a first partition manner and a first encoding parameter used for second encoding of a second image to be encoded is determined according to at least one first reconstructed image, the second image to be encoded is second encoded according to at least one of the first partition manner and the first encoding parameter to generate a second code stream, and the at least one first reconstructed image is the first code stream or a reconstructed image in a first encoding process. Therefore, the encoding of the second code stream is adjusted based on the encoding result of the first code stream, so that the quality of the reconstructed frame of the first code stream is the same as or equal to that of the reconstructed frame of the corresponding second code stream, the decoding quality of the accessed video content is improved, the blocking effect is reduced, and partial artifact effect is eliminated on the basis of meeting the low-delay access of the video content.
The following explains a video image processing method according to an embodiment of the present application, with at least one first to-be-encoded image as a source video image.
Referring to fig. 14, fig. 14 is a flowchart illustrating a video image processing method according to an embodiment of the present disclosure. Portions of the methods of the embodiments of the present application may be performed by an encoding device. The encoding apparatus may be applied to the source device 12 in the above-described embodiment, for example, as the server 801 in the embodiment shown in fig. 8. In this embodiment, at least one first image to be encoded is an image of a K +1 th frame, and a second image to be encoded is an image of a K th frame. The encoding device can adjust the second encoding of the Kth frame image according to the encoding result of the Kth +1 frame image so as to generate a first code stream and a second code stream with consistent quality. The method as shown in fig. 14 may comprise the following implementation steps:
Step 1401, perform first encoding on the (K+1)th frame image to generate a first code stream.
The encoding apparatus may perform first encoding on the K +1 frame image using information such as a prediction mode P, a division manner D, and a quantization parameter QP to generate a first code stream. The first code stream may include first encoded data of a K +1 th frame image. In the first encoding process, a reconstructed frame a of the K +1 th frame image may also be generated.
Step 1402, partition the Kth frame image using one partition manner, and perform second encoding in the full intra prediction mode on all the partitioned image blocks to obtain the reconstructed frame B of the second code stream.
Encoding to generate the second code stream is started with the Kth frame image and the reconstructed frame A as input information.
Alternatively, information such as the partition method D and the quantization parameter QP used in the first encoding may also be used as input information.
Illustratively, the image of the K-th frame is divided in a dividing manner, and a series of second encoding processes such as intra-frame prediction, transformation, quantization, inverse transformation and the like are performed on all the divided image blocks to obtain a reconstructed frame B of the second code stream.
When this step 1402 is performed on the Kth frame image for the first time, the coding information and/or coding parameters used in the series of second encoding processes such as partitioning, intra prediction, transformation, and quantization (for example, the partition manner, QP, and code rate) may be randomly generated at initialization, or may be the coding information and/or coding parameters used in an I frame preceding the Kth frame image. For example, for the QP, the average QP of the nearest one or more I frames preceding the Kth frame image may be used.
Reconstructed frame B may be composed of reconstructed blocks of one or more encoded image blocks.
Optionally, when the prediction mode P of the Kth frame image of the first code stream is full intra prediction, the second encoded data of the Kth frame image of the second code stream directly uses the first encoded data of the Kth frame image of the first code stream. When the prediction mode P of the Kth frame image of the first code stream is not full intra prediction (for example, the Kth frame image of the first code stream is a P frame, or contains a P block), the Kth frame image is second encoded in the full intra prediction mode through step 1402.
Step 1403, take the reconstructed frame B as the reference frame for decoding the (K+1)th frame of the first code stream, and decode the first code stream to obtain another reconstructed frame C of the (K+1)th frame.
Step 1404, calculate the similarity cost function value of the reconstructed frame A and the reconstructed frame C.
Specifically, the similarity cost function values f (partition mode, QP) of the reconstructed frame a and the reconstructed frame C may be calculated according to the following formula (7).
$$ f(\text{partition manner}, QP) = \frac{1}{I}\sum_{i=1}^{I} \left| C(\text{partition manner}, QP, i) - A(i) \right|^{T} \tag{7} $$
where i represents the index of a pixel in the reconstructed frame, I represents the total number of pixels in the reconstructed frame, C(partition manner, QP, i) represents the reconstructed pixel value at the ith pixel position of the reconstructed frame C obtained with one partition manner and quantization parameter QP, and A(i) represents the reconstructed pixel value at the ith pixel position of the reconstructed frame A. T may be 1 or 2.
Alternatively, the similarity between the two reconstructed frames may be evaluated in other ways besides the formula (7), including but not limited to MAD, SAD, SSD, MSD, SATD, etc.
Step 1405, transforming the division mode and/or the coding parameters, and repeatedly executing step 1402.
The embodiment of the present application may provide multiple partition manners and/or coding parameters. One partition manner and/or coding parameter is selected from them, and step 1402 is repeated to traverse the multiple partition manners and/or coding parameters, encoding the Kth frame image and calculating the similarity cost function value of the reconstructed frame C and the reconstructed frame A for each partition manner and/or coding parameter.
Taking the QP as an example of a coding parameter: within a certain interval (e.g., 0 to 51), QPs are selected with a certain step size (e.g., 1 or 2), and step 1402 is repeated until the finite set of QPs is enumerated.
Taking the partition manner as an example: step 1402 is executed for each partition manner until the finite set of partition manners is enumerated.
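As an illustrative sketch of steps 1402 to 1405, the loop below judges each all-intra candidate for the Kth frame by how well the decoder-side reconstruction C of the (K+1)th frame (decoded with the candidate's reconstruction B as reference) matches the encoder-side reconstruction A; `second_encode` and `decode` are hypothetical placeholders, and `mad` is the metric sketched earlier.

```python
def search_frame_k(frame_k, data_k_plus_1, recon_a, partitions, qps, cost=mad):
    best = None
    for partition in partitions:
        for qp in qps:
            data, recon_b = second_encode(frame_k, partition, qp)  # step 1402
            recon_c = decode(data_k_plus_1, recon_b)               # step 1403
            c = cost(recon_a, recon_c)                             # step 1404
            if best is None or c < best[0]:
                best = (c, data, partition, qp)
    # best holds the finite-iteration local optimum, entropy coded in step 1406.
    return best
```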
Step 1406, entropy encode the second encoded data and the partition manner and/or coding parameters to generate the second code stream, completing the encoding of the Kth frame image.
The second code stream may include the second encoded data of the Kth frame image.
The second encoded data is the encoded data corresponding to the reconstructed frame C that is locally optimal under finite iteration. The finite-iteration local optimum specifically means: the Kth frame image is encoded using all the partition manners and/or coding parameters; the similarity between the reconstructed frame C and the reconstructed frame A is calculated for each partition manner and/or coding parameter; the candidate with the highest similarity is selected as the finite-iteration local optimum; and the encoded data corresponding to the reconstructed frame C with the highest similarity is used as the second encoded data.
Optionally, if the similarity cost function value of the reconstructed frame A and the reconstructed frame C is smaller than the similarity cost function value threshold, the second encoded data is the encoded data corresponding to the reconstructed frame C, that is, the encoded data obtained by second encoding the Kth frame image with one partition manner and/or coding parameter. Decoding this encoded data yields the reconstructed frame B, and decoding the first encoded data with the reconstructed frame B as the reference frame yields the reconstructed frame C.
Optionally, in the process of comparing the similarity between two reconstructed frames, if there is a large difference between individual pixels, for example, the difference between gray values exceeds 128, the division manner and the corresponding quantization result may be discarded.
After step 1406 is performed, step 1401 may be repeatedly performed to start encoding the K +2 th frame image.
In this embodiment, based on the encoding result of the first code stream including the (K+1)th frame image, the second encoding of the Kth frame image is adjusted to achieve the same or equivalent quality between the reconstructed frame of the first code stream and the reconstructed frame of the corresponding second code stream, so that on the basis of meeting low-delay access to video content, the decoding quality of the accessed video content is improved, the blocking effect is reduced, and part of the artifact effect is eliminated. In addition, by simulating the decoding behavior of the decoding end, the encoding of the second code stream is adjusted, which reduces encoder-decoder mismatch and helps eliminate the blocking effect.
The following explains the video image processing method according to the embodiment of the present application, with the at least one first to-be-encoded image as a plurality of source video images.
Referring to fig. 15, fig. 15 is a flowchart illustrating a video image processing method according to an embodiment of the present disclosure. Portions of the methods of the embodiments of the present application may be performed by an encoding device. The encoding apparatus may be applied to the source device 12 in the above-described embodiment, for example, as the server 801 in the embodiment shown in fig. 8. The present embodiment takes the at least one first image to be encoded as including the (K+1)th frame image, the (K+2)th frame image, …, and the (K+m)th frame image, and the second image to be encoded as the Kth frame image as an example. The encoding device can adjust the second encoding of the Kth frame image according to the encoding results of the (K+1)th, (K+2)th, …, and (K+m)th frame images to generate a first code stream and a second code stream with consistent quality. The method as shown in fig. 15 may comprise the following implementation steps:
Step 1501, respectively perform first encoding on the (K+1)th frame image, the (K+2)th frame image, …, and the (K+m)th frame image to generate a first code stream.
The encoding apparatus may perform first encoding on the (K+1)th frame image, the (K+2)th frame image, …, and the (K+m)th frame image, respectively, using information such as the prediction mode P, the partition manner D, and the quantization parameter QP, to generate a first code stream including the first encoded data of these images. In the first encoding process, the reconstructed frame A_1 of the (K+1)th frame image, the reconstructed frame A_2 of the (K+2)th frame image, …, and the reconstructed frame A_m of the (K+m)th frame image may also be generated. m is a positive integer greater than or equal to 2.
Step 1502, partition the Kth frame image using one partition manner, and perform second encoding in the full intra prediction mode on all the partitioned image blocks to obtain the reconstructed frame B of the second code stream.
Taking the Kth frame image and the reconstructed frame A_1 of the (K+1)th frame image, the reconstructed frame A_2 of the (K+2)th frame image, …, and the reconstructed frame A_m of the (K+m)th frame image as input information, encoding is started to generate the second code stream.
Alternatively, information such as the partition method D and the quantization parameter QP related to the first encoding may be input information.
Illustratively, the kth frame image is divided in a dividing manner, and a series of second encoding processes such as intra-frame prediction, transformation, quantization, inverse transformation and the like are performed on all the divided sub-blocks to obtain a reconstructed frame B of the second code stream.
When this step 1502 is performed on the Kth frame image for the first time, the coding information and/or coding parameters used in the series of second encoding processes such as partitioning, intra prediction, transformation, and quantization (for example, the partition manner, QP, and code rate) may be randomly generated at initialization, or may be the coding information and/or coding parameters used in an I frame preceding the Kth frame image. For example, for the QP, the average QP of the nearest one or more I frames preceding the Kth frame image may be used.
Reconstructed frame B may consist of reconstructed blocks of one or more encoded image blocks.
Optionally, when the prediction mode P of the kth frame image of the first code stream is full intra prediction, the second encoded data of the kth frame image of the second code stream directly uses the first encoded data of the kth frame image of the first code stream. When the prediction mode P of the K-th frame image of the first code stream is not the full intra prediction (for example, the K-th frame image of the first code stream is a P frame, or there is a P block), the K-th frame image is subjected to the second encoding of the full intra prediction mode, via step 1502.
Step 1503, take the reconstructed frame B as the reference frame for decoding the (K+1)th frame of the first code stream, and decode the (K+1)th frame of the first code stream to obtain another reconstructed frame C_1 of the (K+1)th frame; take the reconstructed frame C_1 as the reference frame for decoding the (K+2)th frame of the first code stream, and decode the (K+2)th frame to obtain another reconstructed frame C_2 of the (K+2)th frame; and so on, take the reconstructed frame C_{m-1} as the reference frame for decoding the (K+m)th frame of the first code stream, and decode the (K+m)th frame to obtain another reconstructed frame C_m of the (K+m)th frame.
Step 1504, calculate the similarity cost function values of the reconstructed frame A_1 and the reconstructed frame C_1, the reconstructed frame A_2 and the reconstructed frame C_2, …, and the reconstructed frame A_m and the reconstructed frame C_m, and weight and accumulate them.
Specifically, it can be calculated according to the following formula (8).
$$ f(\text{partition manner}, QP) = \sum_{m} w_m \cdot \frac{1}{N}\sum_{i=1}^{N} \left| C_m(\text{partition manner}, QP, i) - A_m(i) \right|^{T} \tag{8} $$
where i represents the index of a pixel in the reconstructed frame, N represents the total number of pixels in the reconstructed frame, C_m(partition manner, QP, i) represents the pixel value at the ith pixel position of the (K+m)th reconstructed frame obtained by decoding the first code stream using, as the reference frame, the reconstructed frame B generated with one partition manner and quantization parameter QP, and A_m(i) represents the reconstructed pixel value at the ith pixel position of the corresponding reconstructed frame generated by the first encoding. T may be 1 or 2, and w_m represents the weighting coefficient of the similarity of the mth reconstructed frame.
Optionally, different weighting coefficients may be selected according to the distance from the mth frame image, subject to
$$ \sum_{m} w_m = 1. $$
For example, for m = 2, w_1 = 0.6 and w_2 = 0.4 may be selected.
Optionally, the weighted sum of the similarity cost function values of the reconstructed frame A_1 and the reconstructed frame C_1, the reconstructed frame A_2 and the reconstructed frame C_2, …, and the reconstructed frame A_m and the reconstructed frame C_m may be evaluated in ways other than formula (8), including but not limited to MAD, SAD, SSD, MSD, SATD, etc.
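A minimal sketch of formula (8) follows, assuming the encoder-side reconstructed frames A_1, …, A_m, the decoder-side chained reconstructions C_1, …, C_m from step 1503, and weights w_m that sum to 1; the function name is illustrative.

```python
import numpy as np

def weighted_cost(a_frames, c_frames, weights, t=2):
    # Formula (8): weighted sum over frames of the per-pixel difference cost.
    assert abs(sum(weights) - 1.0) < 1e-9
    total = 0.0
    for a, c, w in zip(a_frames, c_frames, weights):
        d = np.abs(a.astype(np.int64) - c.astype(np.int64)) ** t
        total += w * np.mean(d)   # (1/N) * sum over the N pixels
    return total
```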
Step 1505, transform the partition and/or encoding parameters, and repeat step 1502.
The embodiment of the present application may provide multiple partition manners and/or coding parameters. One partition manner and/or coding parameter is selected from them, and step 1502 is repeated to traverse the multiple partition manners and/or coding parameters, second encoding the Kth frame image and calculating the similarity between the reconstructed frames C_1, C_2, …, C_m and the reconstructed frames A_1, A_2, …, A_m for each partition manner and/or coding parameter.
Taking the QP as an example of a coding parameter: within a certain interval (e.g., 0 to 51), QPs are selected with a certain step size (e.g., 1 or 2), and step 1502 is repeated until the finite set of QPs is enumerated.
Taking the partition manner as an example: step 1502 is executed for each partition manner until the finite set of partition manners is enumerated.
Step 1506, entropy encode the second encoded data and the partition manner and/or coding parameters to generate the second code stream, completing the encoding of the Kth frame image.
The second code stream may include the second encoded data of the Kth frame image.
The second encoded data is the encoded data corresponding to the reconstructed frames C_1, C_2, …, C_m that are locally optimal under finite iteration. The finite-iteration local optimum specifically means: the Kth frame image is encoded using all the partition manners and/or coding parameters; for each partition manner and/or coding parameter, the similarity between the reconstructed frames C_1, C_2, …, C_m and the reconstructed frames A_1, A_2, …, A_m is calculated; the candidate with the highest similarity is selected as the finite-iteration local optimum, and the encoded data corresponding to the reconstructed frames C_1, C_2, …, C_m with the highest similarity is used as the second encoded data. The encoded data corresponding to the reconstructed frames C_1, C_2, …, C_m is the encoded data obtained by second encoding the Kth frame image with one partition manner and/or coding parameter. Decoding this encoded data yields the reconstructed frame B. Decoding the (K+1)th frame of the first code stream with the reconstructed frame B as the reference frame yields the reconstructed frame C_1; decoding the (K+2)th frame of the first code stream with C_1 as the reference frame yields C_2; and so on, the reconstructed frame C_m can be obtained.
Optionally, in the process of comparing the similarity of the two reconstructed frames, if there is a large difference between the individual pixels, for example, the difference between the gray values exceeds 128, the division manner and the corresponding quantization result may be discarded.
In this embodiment, based on the encoding result of the first code stream including the (K+1)th, (K+2)th, …, and (K+m)th frame images, the second encoding of the Kth frame image is adjusted to achieve the same or equivalent quality between the reconstructed frame of the Kth frame image of the first code stream and the reconstructed frame of the Kth frame image of the second code stream, so that on the basis of meeting low-delay access to video content, the decoding quality of the accessed video content is improved, the blocking effect is reduced, and part of the artifact effect is eliminated. In addition, by simulating the decoding behavior of the decoding end, the encoding of the second code stream is adjusted, which reduces encoder-decoder mismatch and helps eliminate the blocking effect.
The embodiments of the present application further provide the following embodiments, which specifically explain some other realizations of the embodiments shown in fig. 9. For the second encoding process, in the following embodiments, according to the encoding information of the first encoding, second encoding in the full frame intra prediction mode is performed on the image to be encoded or the first reconstructed image, so as to generate a second code stream.
The coding information of the first code may include one or more of a partition manner of the first code, a quantization parameter of the first code, and coding distortion information of the first code.
In this way, the second encoding in the full-frame intra prediction mode for the image to be encoded or the first reconstructed image according to the encoding information of the first encoding may include one or more of the following: performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image by adopting a division mode same as the first coding; or, carrying out second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image by adopting the quantization parameter same as the first coding; or, according to the coding distortion information of the first coding, determining a quantization parameter of a second coding, and according to the quantization parameter of the second coding, performing a second coding of the full-frame intra prediction mode on the image to be coded or the first reconstructed image.
(1) Dividing method
The partition manner of the first coding may include a TU partition manner, a PU partition manner, or a CU partition manner of the first coding.
For example, as distortion of coding mainly comes from quantization, the second coding selects a TU partition mode consistent with the first coding, for example, fig. 16 is a schematic diagram that the first coding and the second coding provided in this embodiment of the present application use the same TU partition mode, and the first coding and the second coding may both use the TU partition mode as shown in fig. 16. Therefore, the consistency of the distorted boundaries can be effectively ensured, namely, the distortion exists on the same boundary, and the quality of the first code stream and the quality of the second code stream are equal or the same.
Alternatively, in addition to the TU partition of the first coding, the PU or CU partition of the first coding may be passed on, and the second coding controlled based on it. The PU partition of the first coding can be used to quickly decide the prediction direction of the PU at the corresponding position in the second coding. For example, if the PU of a coding block in the first coding is intra, the PU of the corresponding coding block in the second coding may be kept consistent with it; alternatively, the second-coded PU does not cross the PU boundary of the corresponding position in the first coding. The CU partition of the first coding may be referenced when the second coding pre-partitions CUs. For example, the second-coded CU makes partition decisions only within a first-coded CU, ensuring that it does not cross first-coded CU boundaries.
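A sketch of this containment constraint, with CUs represented as (x, y, width, height) rectangles; the representation and names are illustrative, not from the patent text:

def within_first_pass_cu(second_cu, first_pass_cus):
    """Return True if the candidate second-encoding CU lies entirely
    inside one CU of the first encoding, i.e., it crosses no
    first-encoding CU boundary."""
    x, y, w, h = second_cu
    for fx, fy, fw, fh in first_pass_cus:
        if fx <= x and fy <= y and x + w <= fx + fw and y + h <= fy + fh:
            return True
    return False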
(2) Quantization parameter
The image contents input to the first coding and the second coding are similar or identical; for example, the input of the first coding is the image to be coded and the input of the second coding is the image to be coded or the first reconstructed image, the first reconstructed image being the reconstruction of the image to be coded after the first coding. The video spatio-temporal complexity of the two inputs is therefore similar or identical. To reduce computational complexity, the second encoding may reuse the quantization parameter distribution information of the first encoding, which assigns a quantization parameter to each coding block according to the differing sensitivity of the human eye to different spatio-temporal complexities. To improve the quality of the second coding, a quantization parameter offset, denoted qp_offset, may be superimposed on the quantization parameter of each coding block while preserving the distribution differences. Fig. 17 is a schematic diagram of the quantization parameters of the first encoding and the second encoding provided in an embodiment of the present application. Taking fig. 17 as an example, the quantization parameter of the second encoding may be obtained from the quantization parameter of the first encoding plus qp_offset (shown as -3 in the figure); specifically, the QP of each coding block of the second encoding is the QP of the corresponding coding block of the first encoding minus 3. For example, the QP of the coding block in the first row and first column of the first coding is 32, as shown in fig. 17, so the QP of the coding block in the first row and first column of the second coding is 29.
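A minimal sketch of this QP transfer, assuming a per-block QP map stored as an integer array; the names and the clamping to the usual [0, 51] QP range are illustrative, while the -3 offset follows the figure:

import numpy as np

def second_pass_qp(first_pass_qp: np.ndarray, qp_offset: int = -3) -> np.ndarray:
    """Derive the per-block QP map of the second encoding from the first
    encoding's QP map plus qp_offset (e.g., QP 32 becomes 29), keeping
    the first encoding's QP distribution differences intact."""
    return np.clip(first_pass_qp + qp_offset, 0, 51)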
Alternatively, the quantization parameter transfer units may be of equal size; for example, each checkerboard cell (i.e., quantization parameter transfer unit) shown in fig. 17 is 16x16 or 64x64 pixels, i.e., the same QP value is used at each 16x16 or 64x64 pixel position.
Alternatively, the quantization parameter transfer units may differ in size. Fig. 18 is a schematic diagram of quantization parameters passed from the first encoding to the second encoding according to an embodiment of the present disclosure; as shown in fig. 18, the sizes of the different small squares (i.e., quantization parameter transfer units) may differ.
Alternatively, since the quantization parameter used in each quantization step is derived from the parameter-set quantization parameter (e.g., the syntax element init_qp_minus26 in the PPS), the slice-level quantization parameter offset (e.g., slice_qp_delta in the slice header), and the quantization parameter offset of the current coding block (e.g., mb_delta_quant), the quantization parameters passed from the first encoding to the second encoding include, but are not limited to, one or any combination of the parameter-set quantization parameter, the slice-level quantization parameter offset, and the coding-block quantization parameter offset, or the final quantization parameter actually used in the quantization process of the first encoding.
(3) Encoding distortion information
In the second encoding process, the coding distortion threshold of the second encoding may be determined from the coding distortion information of the first encoding. For example, for a certain coding block of the first code stream, the coding distortion is measured by the MAD index, whose value is 4. When the second coding makes a coding decision, if the MAD between the predicted frame and the frame to be coded, or between the reconstructed frame and the frame to be coded, is less than 4, the decision process exits early and takes the current coding strategy as the optimal one.
Alternatively, the coding distortion information may use one or more common indicators such as MAD, SAD, SSD, MSD, or SATD.
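A sketch of the MAD-based early exit, assuming 8-bit blocks; the names are illustrative, and the first-pass MAD value (e.g., 4) is the one carried over from the first encoding:

import numpy as np

def mad(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Mean absolute difference between two co-located blocks."""
    return float(np.mean(np.abs(block_a.astype(np.int16) -
                                block_b.astype(np.int16))))

def can_exit_early(pred_block: np.ndarray, src_block: np.ndarray,
                   first_pass_mad: float) -> bool:
    """Stop evaluating further candidate coding strategies once the
    current one already distorts less than the first encoding did."""
    return mad(pred_block, src_block) < first_pass_mad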
In another implementation manner, in the embodiment of the present application, performing second encoding in the full frame intra prediction mode on the image to be encoded or the first reconstructed image according to the encoding information of the first encoding may include: and determining the quantization parameter of the second code according to the coding information of the first code and the characteristic information of the image to be coded. And according to the quantization parameter of the second coding, carrying out second coding in the full-frame intra-prediction mode on the image to be coded or the first reconstructed image.
The feature information of the image to be encoded may include one or more of content complexity of the image to be encoded, color classification information of the image to be encoded, contrast information of the image to be encoded, and content segmentation information of the image to be encoded.
(1) Content complexity
The image contents input to the first coding and the second coding are similar or identical; for example, the input of the first coding is the image to be coded and the input of the second coding is the image to be coded or the first reconstructed image, the first reconstructed image being the reconstruction of the image to be coded after the first coding. The spatio-temporal complexity of the two inputs is therefore similar or identical, so the content complexity obtained by analysis during the first coding can be passed to the second coding. The second coding then need not recompute it and can directly derive coding parameters such as quantization coefficients from the content complexity to guide the generation of the second code stream.
(2) Region information
Since different contents have different complexities and the human eye has different sensitivity to their distortion, the second encoding can set different quantization parameter offsets qp_offset for different regions according to the region information of the first encoding. For example, qp_offset is -5 for complex regions and -3 for simple regions. It is understood that other values may also be used; the embodiments of the present application are not limited to the illustrated ones.
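A minimal sketch of mapping per-block region labels to qp_offset values; the label names and the dictionary layout are illustrative, while the -5/-3 values follow the example above:

import numpy as np

# Offsets follow the example above; labels other than "complex" and
# "simple" are placeholders and default to an offset of 0.
REGION_QP_OFFSET = {"complex": -5, "simple": -3}

def qp_offset_map(region_labels) -> np.ndarray:
    """Turn a per-block region-label grid (a list of lists of strings)
    into a per-block qp_offset map for the second encoding."""
    return np.array([[REGION_QP_OFFSET.get(lbl, 0) for lbl in row]
                     for row in region_labels])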
Alternatively, the second encoding may set different encoding parameters for different regions according to the coding distortion information, color classification, contrast information, content segmentation, and other information of the first encoding.
The video image processing method of the present application can complete the encoding of the first code stream and the second code stream through any of the above embodiments. To facilitate transmission and video decoding, the embodiments of the present application may also carry identification information of the first code stream and the second code stream, as in the following embodiments, so that the decoding device can distinguish the two code streams according to the identification information and decode the corresponding one to quickly access the video content.
If no scene requiring random access occurs, the encoding device may encapsulate and transmit the first code stream, and the decoding device may receive, decode, or display it. If random access is required at some time instant, the encoding device may select the second code stream corresponding to the random access frame at that instant or the next one for encapsulation and transmission, and encapsulate and transmit the first code stream at subsequent instants. The decoding device may first receive, decode, or display the second code stream, then receive the first code stream and decode or display it based on a reconstructed image of the second code stream.
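A sketch of this selection logic; first_stream and second_stream are hypothetical lookups from a time index (or POC) to coded frame data, not structures defined by the patent:

def select_payload(t: int, first_stream: dict, second_stream: dict,
                   random_access_requested: bool) -> bytes:
    """Pick the code stream data to encapsulate and transmit at time t.
    On a random-access request, the second (all-intra) stream is sent
    for that instant; otherwise, and at subsequent instants, the first
    (long GOP) stream is sent."""
    if random_access_requested and t in second_stream:
        return second_stream[t]
    return first_stream[t]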
(1) Adding bitstream identification information to a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), or a Picture Parameter Set (PPS)
The code stream receiving device (e.g., a decoding apparatus) may determine from the identification information whether the received code stream supports the single-frame random access function, and distinguish whether it is the long GOP stream or basic stream (i.e., the first code stream), the random access stream (i.e., the second code stream), or contains both code streams.
Taking PPS as an example, as shown in table 1:
table 1 adding code stream identification information to PPS parameter set
pic_parameter_set_rbsp( ) {                              Descriptor
    ...
    single_insert_enabled_flag                           u(1)
    if( single_insert_enabled_flag )
        stream_id                                        ue(v)
    ...
}
In table 1, u(1) denotes a 1-bit unsigned integer in the coding standard, and ue(v) denotes Exp-Golomb coding. The information added to the PPS parameter set identifies whether the current code stream supports single-frame random access and the type of the current code stream. The syntax elements have the following meanings:
Single-frame random access enable flag (single_insert_enabled_flag): a value of 1 indicates that the code stream supports single-frame random access, and a value of 0 indicates that it does not.
Stream identification (stream_id): this value is present when single_insert_enabled_flag is 1.
When the value is 0, it indicates that the current code stream is a long GOP code stream/basic stream.
When the value is 1, the current code stream is indicated as a random access stream.
When the value is 2, the current code stream contains both long GOP code stream data and random access code stream data, and for the same video frame content (the same PTS) the long GOP data precedes the random access data. Fig. 19 is a schematic diagram of the arrangement of the first code stream and the second code stream when the stream identifier (stream_id) provided in the embodiment of the present application is 2. As shown in fig. 19, the first code stream may include long GOP frame 1 and long GOP frame 2, and the second code stream may include random access frame 1 and random access frame 2; long GOP frame 1 and random access frame 1 carry the same video frame content, as do long GOP frame 2 and random access frame 2. With stream_id equal to 2 in the PPS, long GOP frame 1 precedes random access frame 1, and long GOP frame 2 precedes random access frame 2, as shown in fig. 19.
When the value is 3, the current code stream likewise contains both long GOP code stream data and random access code stream data, but for the same video frame content (the same PTS) the long GOP data follows the random access data. Fig. 20 is a schematic diagram of the arrangement of the first code stream and the second code stream when the stream identifier (stream_id) provided in this embodiment is 3. As shown in fig. 20, the first code stream may include long GOP frame 1 and long GOP frame 2, and the second code stream may include random access frame 1 and random access frame 2; long GOP frame 1 and random access frame 1 carry the same video frame content, as do long GOP frame 2 and random access frame 2. With stream_id equal to 3 in the PPS, long GOP frame 1 follows random access frame 1, and long GOP frame 2 follows random access frame 2, as shown in fig. 20.
Alternatively, information such as the single-frame random access enable flag (single_insert_enabled_flag) and the stream identification (stream_id) may also be carried in the VPS or SPS.
Alternatively, single_insert_enabled_flag and stream_id may be combined into a single syntax element: a value of 0 indicates a general stream that does not support single-frame random access; 1 indicates a long GOP stream; 2 indicates a random access stream.
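A self-contained sketch of parsing the two syntax elements added in table 1; the BitReader class is an illustrative utility, not part of the patent, and the surrounding PPS fields are omitted:

class BitReader:
    """Minimal big-endian bit reader over a byte buffer."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def u(self, n: int) -> int:
        # Fixed-length unsigned read of n bits, as in u(n).
        v = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            v = (v << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return v

    def ue(self) -> int:
        # Unsigned Exp-Golomb read, as in ue(v).
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + (self.u(zeros) if zeros else 0)

def parse_single_insert_info(r: BitReader) -> dict:
    """Read single_insert_enabled_flag and, when set, stream_id."""
    info = {"single_insert_enabled_flag": r.u(1)}
    if info["single_insert_enabled_flag"] == 1:
        # 0: long GOP/basic stream; 1: random access stream;
        # 2: both, long GOP data first; 3: both, random access data first.
        info["stream_id"] = r.ue()
    return info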
(2) Adding code stream identification information in slice _ segment _ header
The code stream receiving device (e.g., a decoding apparatus) may determine from the header information of each slice whether the received code stream supports the single-frame random access function, and distinguish whether it belongs to the long GOP stream or the basic stream. A carrying manner in the slice_segment_header is shown in table 2:
table 2 adding code stream identification information to slice_segment_header
slice_segment_header( ) {                                Descriptor
    ...
    slice_support_single_insert_enable                   u(1)
    if( slice_support_single_insert_enable )
        stream_id                                        ue(v)
    ...
}
In table 2, the syntax elements have the following meanings:
slice_support_single_insert_enable: a value of 1 indicates that the code stream supports single-frame random access, and a value of 0 indicates that it does not.
stream_id: this value is present when slice_support_single_insert_enable is 1.
When the value is 0, the current code stream is a long GOP code stream;
when the value is 1, the current code stream is represented as a random access stream;
alternatively, slice_support_single_insert_enable and stream_id may be combined into a single syntax element: a value of 0 indicates a general code stream that does not support single-frame random access; 1 indicates a long GOP stream; 2 indicates a random access stream.
Optionally, during storage or transmission, the long GOP stream and the random access stream may be combined into one code stream by binary concatenation, with the identification information above used to distinguish them. Fig. 21 shows three ways, provided in the embodiment of the present application, of combining the first code stream and the second code stream into one code stream. Fig. 21(a) shows the arrangement of the two streams' data in one code stream when the parameter sets of the long GOP stream and the random access stream are identical. In fig. 21(b), VPS1, SPS1, and PPS1 indicate parameter sets belonging to the long GOP stream, while VPS2, SPS2, and PPS2 indicate parameter sets belonging to the random access stream; the parameter set placed in front of each type of stream data can be decoded directly. In fig. 21(c), the parameter sets of the two streams are arranged together, which makes it convenient to send them in advance in some scenarios (such as DASH streaming). In this case, different values need to be set for slice_pic_parameter_set_id in the slice_segment_header data of the long GOP stream and the random access stream; according to the standard protocol, the corresponding PPS parameter set can be found from slice_pic_parameter_set_id, and the PPS points to the corresponding SPS through pps_seq_parameter_set_id, so that the corresponding parameter sets can be located.
Alternatively, one code stream combined from the two streams may contain only one or more of the VPS, SPS, or PPS parameter sets of the long GOP stream or the random access stream.
According to specific conditions (such as a channel change or a user request), the receiving end or the transmitting end alternatively encapsulates, transmits, receives, decodes, or displays the code stream data corresponding to long GOP frames and random access frames with the same POC value in the code stream. If no scene requiring random access occurs, the code stream data of the long GOP frame is selected for encapsulation, transmission, reception, decoding, or display; if a scene requiring random access occurs, the code stream data corresponding to the random access frame is selected for encapsulation, transmission, reception, decoding, or display.
Optionally, if the two code streams are combined into one code stream, the long GOP stream data (including parameter sets) and the random access stream data (including parameter sets) may omit the identifiers distinguishing the two data types in the VPS, SPS, PPS, or slice_segment_header. If no scene requiring random access occurs, the code stream data of the long GOP frame may be selected for encapsulation, transmission, reception, decoding, or display. If a scene requiring random access occurs, it is judged whether the long GOP frame uses intra prediction for the whole frame: if so, the code stream data of the long GOP frame is selected for encapsulation, transmission, reception, decoding, or display; otherwise, the code stream data corresponding to the random access frame is selected.
(3) Carrying code stream identification information in supplemental enhancement information (SEI)
TABLE 3 general SEI information syntax
sei_payload( payloadType, payloadSize ) {                Descriptor
    ...
    if( payloadType = = 182 )
        single_picture_info_insert( payloadSize )
    ...
}
Sub-stream concatenation SEI message syntax
single_picture_info_insert( payloadSize ) {              Descriptor
    single_insert_enabled_flag                           u(1)
    if( single_insert_enabled_flag )
        stream_id                                        ue(v)
}
In table 3, a new type 182 is added to the SEI types to indicate the single-frame access information of the current bitstream, and the information single_picture_info_insert(payloadSize) is added. The syntax elements included have the following meanings:
single_insert_enabled_flag: a value of 1 indicates that the code stream supports single-frame random access, and a value of 0 indicates that it does not.
stream_id: this value is present when single_insert_enabled_flag is 1.
When the value is 0, the current code stream is represented as a long GOP code stream/basic stream;
when the value is 1, the current code stream is represented as a random access stream;
(4) Carrying code stream identification information in code stream packaging
Each sub-code stream is encapsulated, and each sub-code stream may be independently encapsulated in a track, such as a sub-picture track. Syntax description information indicating whether the sub-streams can be spliced may be added to the sub-picture track, for example as follows:
the following syntax is added to the spco box:
    track_class
the semantics are as follows:
track_class: when the value is 0, it indicates a general code stream that does not support single-frame random access; when it is 1, a long GOP stream; when it is 2, a random access stream.
(5) Adding code stream identification description information in the file format
This embodiment adds stream type description information to a file format specified by the ISO base media file format (ISOBMFF). In the file format, for the long GOP stream and the random access stream, a sample entry type 'srad' is added to the video track. When the sample entry name is 'normal', the current video track is a general code stream that does not support single-frame random access; when it is 'base', a long GOP stream is indicated; when it is 'insert', a random access stream is indicated.
(6) Carrying code stream identification information in file description information
The code stream identification information may be carried in file description information, for example in a Media Presentation Description (MPD) file of the DASH protocol. This embodiment provides an example of describing the stream type information in the MPD:
<EssentialProperty schemeIdUri="urn:mpeg:dash:srand:2014" value="..."/>
A new EssentialProperty property, srand@value, is specified in this example. The srand@value attribute is described in table 4.
TABLE 4 srand@value attribute description in "urn:mpeg:dash:srand:2014"
srand@value    Description
file_class     When 0, a general code stream that does not support single-frame random access; when 1, a long GOP stream; when 2, a random access stream
Syntax element semantics are as follows:
file_class: when the value is 0, it indicates a general code stream that does not support single-frame random access; when it is 1, a long GOP stream; when it is 2, a random access stream.
(7) Carrying code stream identification information in a user-defined message
The code stream may be sent in self-defined TLV (type, length, value) messages, in which case the type field may carry the code stream identification information.
For example, a TLV message may include a type field, a length field, and a payload field: type (8 bits) is the data type, length (32 bits) is the payload length, and payload (variable length) is the code stream data.
TABLE 5 Different types (type) and corresponding payloads (payload)

Type      Semantics                                                        Payload
0x00      General code stream not supporting single-frame random access    General code stream data
0x01      Long GOP stream                                                  Long GOP code stream data
0x02      Random access stream                                             Random access code stream data
Others    Reserved                                                         Code stream or other data
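A sketch of parsing one such TLV record; big-endian byte order for the length field is an assumption, as the text only fixes the field widths:

import struct

STREAM_TYPES = {0x00: "general", 0x01: "long GOP", 0x02: "random access"}

def parse_tlv(buf: bytes, offset: int = 0):
    """Parse one TLV record: an 8-bit type, a 32-bit length, and then
    `length` bytes of code stream data. Returns the type, its meaning,
    the payload, and the offset of the next record."""
    t = buf[offset]
    (length,) = struct.unpack_from(">I", buf, offset + 1)
    payload = buf[offset + 5: offset + 5 + length]
    return t, STREAM_TYPES.get(t, "reserved"), payload, offset + 5 + length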
Therefore, the encoding end in the embodiment of the present application may carry the identification information of the first code stream and the second code stream in the code streams themselves, in the encapsulation layer, or in the transport protocol layer, among others, so that the decoding end can distinguish the first code stream from the second code stream based on the identification information and correctly decode them to obtain the video content.
The method for processing a video image according to the embodiment of the present application is described in detail with reference to the drawings, and the apparatus for processing a video image according to the embodiment of the present application is described with reference to fig. 22. It should be understood that the processing apparatus for video images is capable of executing the processing method for video images of the embodiments of the present application. In order to avoid unnecessary repetition, the description of repetition is appropriately omitted below when describing the video image processing apparatus of the embodiment of the present application.
Referring to fig. 22, fig. 22 is a schematic structural diagram of a video image processing apparatus according to an embodiment of the present disclosure. As shown in fig. 22, the video image processing apparatus 2200 may include: an acquisition module 2201, a first encoding module 2202, and a second encoding module 2203.
An obtaining module 2201, configured to obtain an image to be encoded. The first encoding module 2202 is configured to perform first encoding on an image to be encoded to generate a first code stream. The second encoding module 2203 is configured to perform, according to the encoding information of the first encoding, second encoding in the full frame intra prediction mode on the image to be encoded or the first reconstructed image to generate a second code stream, where the first reconstructed image is the first code stream or a reconstructed image in the first encoding process.
In some embodiments, the coding information of the first coding includes one or more of a partitioning manner of the first coding, a quantization parameter of the first coding, and coding distortion information of the first coding.
In some embodiments, the second encoding module 2203 is configured to perform at least one of: performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image by adopting a division mode same as the first coding; or, carrying out second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image by adopting the quantization parameter same as the first coding; or, according to the coding distortion information of the first coding, determining a quantization parameter of a second coding, and according to the quantization parameter of the second coding, performing second coding in a full-frame intra prediction mode on the image to be coded or the first reconstructed image.
In some embodiments, the second encoding module 2203 is configured to: determining a quantization parameter of a second code according to the coding information of the first code and the characteristic information of the image to be coded; and according to the quantization parameter of the second coding, carrying out second coding in the full-frame intra-prediction mode on the image to be coded or the first reconstructed image.
In some embodiments, the feature information of the image to be encoded includes one or more of content complexity of the image to be encoded, color classification information of the image to be encoded, contrast information of the image to be encoded, and content segmentation information of the image to be encoded.
In some embodiments, the second encoding module 2203 is configured to determine at least one of a first partition manner or a first encoding parameter used for performing the second encoding on the image to be encoded or the first reconstructed image according to the encoding information of the first encoding and the first reconstructed image. The second encoding module 2203 is further configured to perform a second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to at least one of the first partition manner or the first encoding parameter.
The interval between two adjacent frames of the full intra-frame prediction mode in the first code stream is larger than the interval between two adjacent frames of the full intra-frame prediction mode in the second code stream.
In some embodiments, a difference between the first reconstructed image and the second reconstructed image is smaller than a difference threshold or a similarity between the first reconstructed image and the second reconstructed image is higher than a similarity threshold, and the second reconstructed image is a second code stream or a reconstructed image in a second encoding process.
In some embodiments, the second encoding module 2203 is configured to: determining a plurality of second division modes according to the first coded coding information and the first reconstructed image, and selecting one second division mode from the plurality of second division modes as a first division mode; and/or determining a plurality of second coding parameters according to the first coded coding information and the first reconstructed image, and selecting one second coding parameter from the plurality of second coding parameters as the first coding parameter.
The similarity between the first reconstructed image and the second reconstructed image is the highest similarity between the first reconstructed image and the third reconstructed images, the third reconstructed images include the second reconstructed image, the third reconstructed images are reconstructed images obtained by performing multiple second encoding on the image to be encoded or the first reconstructed image according to the second division modes and/or the second encoding parameters, or the third reconstructed images are reconstructed images of third code streams, and the third code streams are obtained by performing multiple second encoding on the image to be encoded or the first reconstructed image according to the second division modes and/or the second encoding parameters.
In some embodiments, the second encoding module 2203 is further configured to: obtain a prediction mode of the first encoding. When the prediction mode of the first encoding is inter-frame prediction, the steps of acquiring the encoding information of the first encoding and performing second encoding in the full-frame intra prediction mode on the image to be encoded or the first reconstructed image according to the encoding information of the first encoding are executed, so as to generate a second code stream. When the prediction mode of the first encoding is intra-frame prediction, the first code stream is taken as the second code stream.
In some embodiments, the image to be encoded is a source video image; or the image to be coded is an image block obtained by dividing the source video image.
It should be noted that the video image processing apparatus 2200 may execute any one of fig. 9 to 12, or any one of the methods of the encoding apparatus shown in fig. 16 to 21. For specific implementation principles and technical effects, reference may be made to the specific explanations of the above method embodiments, which are not described herein again.
The embodiment of the present application also provides another video image processing apparatus, which has the same structure as the processing apparatus shown in fig. 22. The acquisition module is used for acquiring at least one first image to be coded and a second image to be coded, wherein the second image to be coded is a video image before the at least one first image to be coded. The first coding module is used for respectively carrying out first coding on at least one first image to be coded so as to generate a first code stream. And the second coding module is used for determining at least one of a first division mode or a first coding parameter adopted for carrying out second coding on a second image to be coded according to at least one first reconstructed image, wherein the at least one first reconstructed image is a first code stream or a reconstructed image in a first coding process. And the second coding module is further used for carrying out second coding on the second image to be coded according to at least one of the first division mode or the first coding parameter so as to generate a second code stream.
And the interval between two adjacent frames of the full intra-frame prediction mode in the first code stream is greater than the interval between two adjacent frames of the full intra-frame prediction mode in the second code stream.
In some embodiments, the number of the at least one first to-be-encoded image is one, the number of the at least one first reconstructed image is one, a difference between the first reconstructed image and the second reconstructed image is smaller than a difference threshold, or a similarity between the first reconstructed image and the second reconstructed image is higher than a similarity threshold, the second reconstructed image is obtained by decoding the first code stream using the third reconstructed image as a reference image, and the third reconstructed image is the second code stream or a reconstructed image in the second encoding process.
In some embodiments, the number of the at least one first to-be-encoded image is one, the number of the at least one first reconstructed image is one, and the second encoding module is configured to: selecting one second division mode from a plurality of second division modes as a first division mode according to the first reconstructed image; and/or selecting one second coding parameter from the plurality of second coding parameters as the first coding parameter according to the first reconstructed image.
The similarity between the first reconstructed image and the second reconstructed image is the highest similarity between the first reconstructed image and a plurality of fourth reconstructed images, the plurality of fourth reconstructed images comprise the second reconstructed image, the plurality of fourth reconstructed images are obtained by decoding a first code stream by taking a plurality of fifth reconstructed images as reference images respectively, the plurality of fifth reconstructed images are reconstructed images of a plurality of third code streams, the plurality of third code streams are obtained by performing secondary encoding on the second image to be encoded for a plurality of times respectively according to a plurality of second division modes and/or a plurality of second encoding parameters, or the plurality of fifth reconstructed images are reconstructed images obtained by performing secondary encoding on the second image to be encoded for a plurality of times respectively according to a plurality of second division modes and/or a plurality of second encoding parameters.
In some embodiments, the first encoding module is further to: before the first coding is respectively carried out on at least one first image to be coded, the first coding is carried out on a second image to be coded so as to generate a fourth code stream. The second encoding module is further to: a prediction mode for the first encoding is obtained. And when the prediction mode of the first encoding is inter-frame prediction, determining at least one of a first partition mode or a first encoding parameter adopted by second encoding of the second image to be encoded according to at least one first reconstructed image. And when the prediction mode of the first code is intra-frame prediction, taking the fourth code stream as the second code stream.
In some embodiments, the at least one first image to be encoded is at least one first source video image and the second image to be encoded is a second source video image.
It should be noted that the processing apparatus for video images may execute the method of the encoding apparatus in any one of fig. 13 to fig. 15. For specific implementation principles and technical effects, reference may be made to the specific explanations of the above method embodiments, which are not described herein again.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in the disclosure herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or any communication medium including a medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is not transitory, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described herein. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described herein to emphasize functional aspects of means for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above description is only an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (38)

1. A method for processing video images, comprising:
acquiring an image to be coded;
performing first coding on the image to be coded to generate a first code stream;
and performing second coding in a full-frame intra-prediction mode on the image to be coded or a first reconstructed image according to the coding information of the first coding so as to generate a second code stream, wherein the first reconstructed image is the first code stream or a reconstructed image in the first coding process.
2. The method of claim 1, wherein the first encoded coding information comprises one or more of a partition type of the first encoding, a quantization parameter of the first encoding, and coding distortion information of the first encoding.
3. The method according to claim 2, wherein said second encoding of the image to be encoded or the first reconstructed image in the full intra prediction mode according to the encoding information of the first encoding comprises at least one of:
performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image in a same division mode as the first coding; or,
performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image by using the same quantization parameter as the first coding; or,
determining the quantization parameter of the second coding according to the quantization parameter and the quantization parameter offset of the first coding, and performing second coding in a full intra prediction mode on the image to be coded or the first reconstructed image according to the quantization parameter of the second coding; or,
and determining the quantization parameter of the second coding according to the coding distortion information of the first coding, and performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image according to the quantization parameter of the second coding.
4. The method according to claim 1 or 2, wherein said second encoding in full-frame intra prediction mode for the image to be encoded or the first reconstructed image according to the encoding information of the first encoding comprises:
Determining the quantization parameter of the second code according to the coding information of the first code and the characteristic information of the image to be coded;
and carrying out second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image according to the quantization parameter of the second coding.
5. The method according to claim 4, wherein the feature information of the image to be encoded comprises one or more of content complexity of the image to be encoded, color classification information of the image to be encoded, contrast information of the image to be encoded and content segmentation information of the image to be encoded.
6. The method according to claim 1 or 2, wherein said second encoding in full-frame intra prediction mode for the image to be encoded or the first reconstructed image according to the encoding information of the first encoding comprises:
determining at least one of a first partition mode or a first coding parameter for carrying out second coding on the image to be coded or the first reconstructed image according to the coding information of the first coding and the first reconstructed image;
and performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image according to at least one of the first partition mode or the first coding parameter.
7. The method according to claim 6, wherein a difference between the first reconstructed image and a second reconstructed image is smaller than a difference threshold or a similarity between the first reconstructed image and the second reconstructed image is higher than a similarity threshold, and the second reconstructed image is the second code stream or a reconstructed image in the second encoding process.
8. The method according to claim 6, wherein determining at least one of a first partition manner or a first coding parameter for second coding of the image to be coded or the first reconstructed image according to the first coded coding information and the first reconstructed image comprises:
determining a plurality of second division modes according to the first coded coding information and the first reconstructed image, and selecting one second division mode from the plurality of second division modes as the first division mode; and/or,
determining a plurality of second encoding parameters according to the first encoded encoding information and the first reconstructed image, and selecting one second encoding parameter from the plurality of second encoding parameters as the first encoding parameter;
the similarity between the first reconstructed image and the second reconstructed image is the highest similarity between the first reconstructed image and a plurality of third reconstructed images, the plurality of third reconstructed images include the second reconstructed image, the plurality of third reconstructed images are reconstructed images obtained by performing multiple second encoding on the image to be encoded or the first reconstructed image according to the plurality of second division modes and/or the plurality of second encoding parameters, respectively, or the plurality of third reconstructed images are reconstructed images of a plurality of third code streams, and the plurality of third code streams are obtained by performing multiple second encoding on the image to be encoded or the first reconstructed image according to the plurality of second division modes and/or the plurality of second encoding parameters, respectively.
9. The method of any of claims 6-8, wherein the first encoding parameter comprises a quantization parameter or a code rate.
10. The method according to any one of claims 1-9, further comprising:
obtaining a prediction mode of the first code;
when the prediction mode of the first coding is inter-frame prediction, executing the step of obtaining the coding information of the first coding, and performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image according to the coding information of the first coding so as to generate a second code stream;
and when the prediction mode of the first code is intra-frame prediction, taking the first code stream as the second code stream.
11. The method according to any one of claims 1 to 10, wherein the image to be encoded is a source video image, or wherein the image to be encoded is an image block obtained by dividing the source video image.
12. A method for processing video images, comprising:
acquiring at least one first image to be coded and a second image to be coded, wherein the second image to be coded is a video image before the at least one first image to be coded;
Respectively carrying out first coding on the at least one first image to be coded to generate a first code stream;
determining at least one of a first partition mode and a first coding parameter adopted for carrying out second coding on the second image to be coded according to at least one first reconstructed image, wherein the at least one first reconstructed image is the first code stream or a reconstructed image in the first coding process;
and performing the second coding of the full-frame intra-prediction mode on the second image to be coded according to at least one of the first division mode or the first coding parameter to generate a second code stream.
13. The method according to claim 12, wherein the number of the at least one first image to be encoded is one, the number of the at least one first reconstructed image is one, and a difference between the first reconstructed image and a second reconstructed image is smaller than a difference threshold, or a similarity between the first reconstructed image and the second reconstructed image is higher than a similarity threshold, the second reconstructed image is obtained by decoding the first code stream using a third reconstructed image as a reference image, and the third reconstructed image is the second code stream or a reconstructed image in the second encoding process.
14. The method according to claim 12, wherein the number of the at least one first to-be-encoded image is one, the number of the at least one first reconstructed image is one, and the determining at least one of the first partition method or the first encoding parameter for performing the second encoding on the second to-be-encoded image according to the at least one first reconstructed image comprises:
selecting one second division mode from a plurality of second division modes as the first division mode according to the first reconstructed image; and/or,
selecting one second coding parameter from a plurality of second coding parameters as the first coding parameter according to the first reconstructed image;
the similarity between the first reconstructed image and the second reconstructed image is the highest similarity between the first reconstructed image and a plurality of fourth reconstructed images, the plurality of fourth reconstructed images include the second reconstructed image, the plurality of fourth reconstructed images are obtained by decoding the first code stream with a plurality of fifth reconstructed images as reference images respectively, the plurality of fifth reconstructed images are reconstructed images of a plurality of third code streams, and the plurality of third code streams are reconstructed images obtained by performing a plurality of times of second encoding on the second image to be encoded respectively according to the plurality of second division modes and/or the plurality of second encoding parameters, or the plurality of fifth reconstructed images are reconstructed images obtained by performing a plurality of times of second encoding on the second image to be encoded respectively according to the plurality of second division modes and/or the plurality of second encoding parameters.
15. The method according to any of claims 12-14, wherein prior to the first encoding of the at least one first image to be encoded, respectively, the method further comprises:
performing the first coding on the second image to be coded to generate a fourth code stream;
obtaining a prediction mode of the first code;
when the prediction mode of the first encoding is inter-frame prediction, the step of determining at least one of a first partition mode or a first encoding parameter adopted by second encoding of the second image to be encoded according to at least one first reconstructed image is executed;
and when the prediction mode of the first code is intra-frame prediction, taking the fourth code stream as the second code stream.
16. The method according to any of claims 12-15, wherein said at least one first image to be encoded is at least one first source video image and said second image to be encoded is a second source video image.
17. The method of any of claims 12-16, wherein the first encoding parameter comprises a quantization parameter or a code rate.
18. A video image processing apparatus, comprising:
The acquisition module is used for acquiring an image to be coded;
the first coding module is used for carrying out first coding on the image to be coded so as to generate a first code stream;
and the second coding module is used for carrying out second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image according to the coding information of the first coding so as to generate a second code stream, wherein the first reconstructed image is the first code stream or a reconstructed image in the first coding process.
19. The apparatus of claim 18, wherein the first encoded coding information comprises one or more of a partition type of the first encoding, a quantization parameter of the first encoding, and coding distortion information of the first encoding.
20. The apparatus of claim 19, wherein the second encoding module is configured to perform at least one of:
performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image by adopting a division mode same as the first coding; or,
performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image by using the quantization parameter same as the first coding; or,
Determining the quantization parameter of the second coding according to the quantization parameter and the quantization parameter offset of the first coding, and performing second coding in a full intra prediction mode on the image to be coded or the first reconstructed image according to the quantization parameter of the second coding; or,
and determining the quantization parameter of the second coding according to the coding distortion information of the first coding, and performing second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image according to the quantization parameter of the second coding.
21. The apparatus of claim 18 or 19, wherein the second encoding module is configured to:
determining the quantization parameter of the second code according to the coding information of the first code and the characteristic information of the image to be coded;
and carrying out second coding in a full-frame intra-prediction mode on the image to be coded or the first reconstructed image according to the quantization parameter of the second coding.
22. The apparatus according to claim 21, wherein the feature information of the image to be encoded includes one or more of content complexity of the image to be encoded, color classification information of the image to be encoded, contrast information of the image to be encoded, and content segmentation information of the image to be encoded.
23. The apparatus according to claim 18 or 19, wherein the second encoding module is configured to determine at least one of a first partition manner and a first encoding parameter for second encoding of the image to be encoded or the first reconstructed image according to the first encoded encoding information and the first reconstructed image;
the second encoding module is further configured to perform second encoding in a full intra prediction mode on the image to be encoded or the first reconstructed image according to at least one of the first partition manner or the first encoding parameter.
24. The apparatus of claim 23, wherein a difference between the first reconstructed image and the second reconstructed image is smaller than a difference threshold or a similarity between the first reconstructed image and the second reconstructed image is higher than a similarity threshold, and the second reconstructed image is the second code stream or the reconstructed image in the second encoding process.
25. The apparatus of claim 23, wherein the second encoding module is configured to:
determining a plurality of second division modes according to the first coded coding information and the first reconstructed image, and selecting one second division mode from the plurality of second division modes as the first division mode; and/or,
Determining a plurality of second coding parameters according to the coding information of the first coding and the first reconstructed image, and selecting one second coding parameter from the plurality of second coding parameters as the first coding parameter;
the similarity between the first reconstructed image and the second reconstructed image is the highest similarity between the first reconstructed image and a plurality of third reconstructed images, the plurality of third reconstructed images include the second reconstructed image, the plurality of third reconstructed images are reconstructed images obtained by performing multiple second encoding on the image to be encoded or the first reconstructed image according to the plurality of second division modes and/or the plurality of second encoding parameters, respectively, or the plurality of third reconstructed images are reconstructed images of a plurality of third code streams, and the plurality of third code streams are obtained by performing multiple second encoding on the image to be encoded or the first reconstructed image according to the plurality of second division modes and/or the plurality of second encoding parameters, respectively.
26. The apparatus of any of claims 23-25, wherein the first encoding parameter comprises a quantization parameter or a code rate.
27. The apparatus of any one of claims 18-26, wherein the second encoding module is further configured to:
obtain a prediction mode of the first encoding;
when the prediction mode of the first encoding is inter prediction, perform the step of obtaining the encoding information of the first encoding and performing second encoding in the full intra prediction mode on the image to be encoded or the first reconstructed image according to the encoding information of the first encoding to generate the second code stream;
and when the prediction mode of the first encoding is intra prediction, use the first code stream as the second code stream.
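A minimal sketch of the dispatch in claim 27, assuming a first_info object with a prediction_mode field and a hypothetical encode_full_intra() helper.

    def derive_second_stream(image, first_stream, first_info):
        # Only an inter-predicted first encoding triggers a second encoding
        # in full intra prediction mode; an intra-coded first code stream is
        # already intra-only and is reused as the second code stream.
        if first_info.prediction_mode == "inter":
            return encode_full_intra(image, first_info)  # hypothetical
        return first_stream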
28. The apparatus according to any of claims 18-27, wherein the image to be encoded is a source video image, or wherein the image to be encoded is an image block obtained by dividing the source video image.
29. A video image processing apparatus, comprising:
an acquisition module, configured to acquire at least one first image to be encoded and a second image to be encoded, wherein the second image to be encoded is a video image preceding the at least one first image to be encoded;
a first encoding module, configured to perform first encoding on the at least one first image to be encoded respectively to generate a first code stream;
a second encoding module, configured to determine, according to at least one first reconstructed image, at least one of a first partition manner or a first encoding parameter used to perform second encoding on the second image to be encoded, wherein the at least one first reconstructed image is a reconstructed image of the first code stream or a reconstructed image in the first encoding process;
the second encoding module is further configured to perform the second encoding in the full intra prediction mode on the second image to be encoded according to at least one of the first partition manner or the first encoding parameter, so as to generate a second code stream.
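A sketch of the cross-frame reuse in claim 29, assuming hypothetical derive_partition(), derive_params(), and encode_full_intra_with() helpers; how the partition manner and parameters are actually derived from the reconstructions is left open by the claim.

    def encode_earlier_frame(second_image, first_recons):
        # The second image precedes the first images, yet its full-intra
        # second encoding borrows the partition manner and encoding
        # parameters derived from the reconstructions of the first encodings.
        manner = derive_partition(first_recons)  # hypothetical
        params = derive_params(first_recons)     # hypothetical
        return encode_full_intra_with(second_image, manner, params)  # hypothetical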
30. The apparatus according to claim 29, wherein there is one first image to be encoded and one first reconstructed image; a difference between the first reconstructed image and a second reconstructed image is smaller than a difference threshold, or a similarity between the first reconstructed image and the second reconstructed image is higher than a similarity threshold; the second reconstructed image is obtained by decoding the first code stream using a third reconstructed image as a reference image, and the third reconstructed image is a reconstructed image of the second code stream or a reconstructed image in the second encoding process.
31. The apparatus according to claim 29, wherein there is one first image to be encoded and one first reconstructed image, and the second encoding module is configured to:
select one second partition manner from a plurality of second partition manners as the first partition manner according to the first reconstructed image; and/or
select one second encoding parameter from a plurality of second encoding parameters as the first encoding parameter according to the first reconstructed image;
the similarity between the first reconstructed image and the second reconstructed image is the highest among the similarities between the first reconstructed image and a plurality of fourth reconstructed images, and the plurality of fourth reconstructed images include the second reconstructed image; the plurality of fourth reconstructed images are obtained by decoding the first code stream using a plurality of fifth reconstructed images as reference images respectively; the plurality of fifth reconstructed images are reconstructed images of a plurality of third code streams, and the plurality of third code streams are obtained by performing second encoding multiple times on the second image to be encoded according to the plurality of second partition manners and/or the plurality of second encoding parameters respectively, or the plurality of fifth reconstructed images are reconstructed images obtained in the process of performing second encoding multiple times on the second image to be encoded according to the plurality of second partition manners and/or the plurality of second encoding parameters respectively.
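A sketch of the decode-and-compare selection in claim 31; encode_intra() and psnr() are reused from the earlier sketches, and decode_with_reference() is a hypothetical decoder call that takes an explicit reference image.

    def pick_by_decoding(first_recon, first_stream, second_image,
                         partition_manners, coding_params):
        # Each candidate second encoding of the earlier frame yields a fifth
        # reconstructed image; decoding the first code stream against that
        # reference yields a fourth reconstructed image, and the candidate
        # whose fourth reconstruction best matches the first one wins.
        best = None
        for manner in partition_manners:
            for params in coding_params:
                _, fifth_recon = encode_intra(second_image, manner, params)
                fourth_recon = decode_with_reference(first_stream, fifth_recon)
                score = psnr(first_recon, fourth_recon)
                if best is None or score > best[0]:
                    best = (score, manner, params)
        _, first_manner, first_params = best
        return first_manner, first_params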
32. The apparatus of any one of claims 29-31, wherein the first encoding module is further configured to: before performing the first encoding on the at least one first image to be encoded, perform the first encoding on the second image to be encoded to generate a fourth code stream;
the second encoding module is further configured to: obtain a prediction mode of the first encoding;
when the prediction mode of the first encoding is inter prediction, perform the step of determining, according to the at least one first reconstructed image, at least one of the first partition manner or the first encoding parameter used to perform the second encoding on the second image to be encoded;
and when the prediction mode of the first encoding is intra prediction, use the fourth code stream as the second code stream.
33. The apparatus according to any of claims 29-32, wherein the at least one first image to be encoded is at least one first source video image, and the second image to be encoded is a second source video image.
34. The apparatus of any of claims 29-33, wherein the first encoding parameter comprises a quantization parameter or a code rate.
35. A video image processing apparatus, comprising:
One or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-17.
36. A computer-readable storage medium comprising a first code stream and a second code stream obtained according to the method of any one of claims 1-17.
37. A computer program product which, when run on a computer, causes the computer to perform the video image processing method of any one of claims 1-17.
38. A computer-readable storage medium comprising computer instructions which, when run on a computer, cause the computer to perform the video image processing method of any one of claims 1-17.
CN202111164100.4A 2021-09-30 2021-09-30 Video image processing method and device Pending CN115914648A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111164100.4A CN115914648A (en) 2021-09-30 2021-09-30 Video image processing method and device
PCT/CN2022/116596 WO2023051156A1 (en) 2021-09-30 2022-09-01 Video image processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111164100.4A CN115914648A (en) 2021-09-30 2021-09-30 Video image processing method and device

Publications (1)

Publication Number Publication Date
CN115914648A true CN115914648A (en) 2023-04-04

Family

ID=85729579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111164100.4A Pending CN115914648A (en) 2021-09-30 2021-09-30 Video image processing method and device

Country Status (2)

Country Link
CN (1) CN115914648A (en)
WO (1) WO2023051156A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090097015A (en) * 2008-03-10 2009-09-15 삼성전자주식회사 Apparatus of encoding image and apparatus of decoding image
CN101860749A (en) * 2010-04-20 2010-10-13 中兴通讯股份有限公司 Method and device for coding and decoding video images
CN105812798A (en) * 2014-12-31 2016-07-27 深圳中兴力维技术有限公司 Image encoding and decoding method and device thereof
CN112312133B (en) * 2020-10-30 2022-10-04 北京奇艺世纪科技有限公司 Video coding method and device, electronic equipment and readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275196A (en) * 2023-11-23 2023-12-22 深圳小米房产网络科技有限公司 House safety monitoring and early warning method and system based on Internet of things
CN117275196B (en) * 2023-11-23 2024-09-20 深圳小米房产网络科技有限公司 House safety monitoring and early warning method and system based on Internet of things

Also Published As

Publication number Publication date
WO2023051156A1 (en) 2023-04-06

Similar Documents

Publication Publication Date Title
US11765343B2 (en) Inter prediction method and apparatus
US11638003B2 (en) Video coding and decoding methods and devices using a library picture bitstream
JP7250917B2 (en) Method and Apparatus for Intra Prediction Using Interpolating Filters
CN111801943B (en) Chroma block prediction method, apparatus for encoding and decoding video data, and encoding and decoding apparatus
KR20210077759A (en) Image prediction method and device
CN111277828B (en) Video encoding and decoding method, video encoder and video decoder
CN112995663B (en) Video coding method, video decoding method and corresponding devices
CN111416981B (en) Video image decoding and encoding method and device
CN112055200A (en) MPM list construction method, and chroma block intra-frame prediction mode acquisition method and device
KR20220024737A (en) Video Encoders, Video Decoders and Corresponding Methods
US20230336725A1 (en) Video encoder, video decoder, and corresponding method
CN111385572A (en) Prediction mode determining method and device, coding equipment and decoding equipment
WO2020253681A1 (en) Method and device for constructing merge candidate motion information list, and codec
WO2023051156A1 (en) Video image processing method and apparatus
WO2023092256A1 (en) Video encoding method and related apparatus therefor
WO2020114393A1 (en) Transform method, inverse transform method, video encoder, and video decoder
CN112055211A (en) Video encoder and QP setting method
RU2798316C2 (en) Method and equipment for external prediction
CN113615191B (en) Method and device for determining image display sequence and video encoding and decoding equipment
CN111405279B (en) Quantization and inverse quantization method and device
WO2024113708A1 (en) Video processing method and apparatus
WO2020073928A1 (en) Inter prediction method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination