CN115988202B - Apparatus and method for intra prediction


Info

Publication number
CN115988202B
CN115988202B (application CN202211595363.5A)
Authority
CN
China
Prior art keywords
prediction
intra
filter
prediction mode
directional intra
Prior art date
Legal status
Active
Application number
CN202211595363.5A
Other languages
Chinese (zh)
Other versions
CN115988202A (en
Inventor
Alexey Konstantinovich Filippov
Vasily Alexeevich Rufitskiy
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202211595363.5A priority Critical patent/CN115988202B/en
Publication of CN115988202A publication Critical patent/CN115988202A/en
Application granted granted Critical
Publication of CN115988202B publication Critical patent/CN115988202B/en


Classifications

    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television)
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/117: Filters, e.g. for pre-processing or post-processing
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region that is a block, e.g. a macroblock
    • H04N19/182: Adaptive coding characterised by the coding unit, the unit being a pixel
    • H04N19/82: Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop
    • H04N19/593: Predictive coding involving spatial prediction techniques

Abstract

The present application relates to the field of image processing, for example, still image and/or video image (picture) coding. In particular, the present application relates to an apparatus, and a corresponding method, for intra prediction of a prediction block of a video image. The apparatus is configured to select a directional intra-prediction mode from a set of directional intra-prediction modes, wherein each directional intra-prediction mode corresponds to a different intra-prediction angle. The apparatus is further configured to select a filter from a set of filters based on the selected directional intra-prediction mode. The apparatus is also configured to determine, from a set of reference pixels, a reference pixel for a given predicted pixel of the prediction block according to the selected directional intra-prediction mode, and to apply the selected filter to the determined reference pixel.

Description

Apparatus and method for intra prediction
Cross reference to related applications
The present application is a divisional application. The original application has application number 201880094557.2 and a filing date of June 29, 2018; the entire content of the original application is incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to the field of image processing, for example, still image and/or video image (picture) coding. In particular, the present invention relates to an apparatus for intra prediction, i.e., for intra-predicting a prediction block of a video image. The apparatus may be part of a video image encoder or a video image decoder. The apparatus is particularly adapted to perform directional intra prediction of the prediction block. The invention also relates to a corresponding intra-prediction method.
Background
Video coding (video encoding and decoding) is widely used in digital video applications such as broadcast digital television, video transmission over the internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content acquisition and editing systems, and camcorders for security applications.
Since the development of the block-based hybrid video coding approach in the H.261 standard in 1990, new video coding techniques and tools have continually been developed, laying the foundation for new video coding standards. One of the goals of most video coding standards is to reduce the bit rate compared to the previous-generation standard without sacrificing image quality. Such video coding standards include MPEG-1 Video, MPEG-2 Video, ITU-T H.262/MPEG-2, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265 High Efficiency Video Coding (HEVC), and extensions of these standards, such as scalability and/or three-dimensional (3D) extensions.
Video compression can reduce the bit rate as desired, but this comes at the price of complexity. In particular, video compression is constrained by two contradictory parameters: compression efficiency and computational complexity. Video coding standards such as ITU-T H.264/AVC or ITU-T H.265/HEVC make a good trade-off between these parameters. For this reason, support for the video coding standards is a mandatory requirement of almost all video compression applications.
State-of-the-art video coding standards are based on dividing the source image into blocks. The processing of these blocks depends on their size, spatial position, and the coding mode specified by the encoder.
Depending on the prediction type, the coding modes can be divided into two groups: intra-prediction modes and inter-prediction modes. Intra-prediction modes use pixels of the same image to generate reference pixels from which the predicted values of the pixels of the block being reconstructed are calculated. Intra prediction is also referred to as spatial prediction. Inter-prediction modes are designed for temporal prediction and use reference pixels of a previous or subsequent image to predict pixels of a block of the current image.
After the prediction stage, the prediction error (the difference between the original signal and its prediction) is transform-coded. The transform coefficients and side information are then encoded using an entropy coder (e.g., CABAC in the AVC/H.264 and HEVC/H.265 standards). The recently adopted ITU-T H.265/HEVC standard (ISO/IEC 23008-2:2013, Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 2: High efficiency video coding, November 2013) declares a set of state-of-the-art video coding tools that provide a reasonable trade-off between coding efficiency and computational complexity.
Similar to the ITU-T H.264/AVC video coding standard, the HEVC/H.265 video coding standard specifies the division of a source image into blocks, e.g., coding units (CUs). Each CU may be further divided into smaller CUs or prediction units (PUs). A PU may be intra- or inter-predicted depending on the type of processing applied to its pixels. For inter prediction, a PU represents an area of pixels processed by motion compensation using a motion vector specified for the PU. For intra prediction, the current block is predicted using neighboring pixels of neighboring blocks as reference pixels.
A PU specifies a prediction mode, selected from the set of intra-prediction modes, for all transform units (TUs) contained in the PU. That is, the intra-prediction mode is the same for every TU of the PU. TUs may have different sizes (e.g., 4×4, 8×8, 16×16, and 32×32 pixels) and may be processed in different ways. TUs are transform-coded, i.e., the prediction error is transformed by a discrete cosine transform or a discrete sine transform (applied to intra-coded blocks in the HEVC/H.265 standard) and quantized. Hence, the reconstructed pixels contain quantization noise, which in-loop filters such as the deblocking filter (DBF), sample adaptive offset (SAO), and adaptive loop filter (ALF) try to suppress (the noise may become noticeable as, e.g., blocking structures between units or ringing artifacts along sharp edges). Advanced prediction coding (such as motion compensation and intra prediction) and partitioning techniques (e.g., quadtree (QT) partitioning for CUs and PUs and residual quadtree (RQT) partitioning for TUs in the HEVC/H.265 standard, as well as quadtree plus binary tree (QTBT) partitioning in the Joint Exploration Model (JEM) software since version JEM-3.0) allowed the standardization committee to greatly reduce PU redundancy. The fundamental difference between the QT and QTBT partitioning mechanisms is that the latter, being based on both quadtrees and binary trees, supports not only square but also rectangular blocks. The present invention relates to directional intra prediction and introduces a new modification of the directional intra-prediction modes.
According to the HEVC/H.265 standard, 35 intra-prediction modes are available. As shown in fig. 9, this set contains the following modes:
planar mode (intra prediction mode index 0);
DC mode (intra prediction mode index 1);
the directional mode indicated by the solid arrow in fig. 9 (intra prediction mode index values are 2 to 34). The directional intra-prediction mode set is extended to 65 modes (i.e., nearly twice) by reducing the angular step between the directional intra-prediction modes by a factor of 2. These additional modes are shown by the dashed arrows in fig. 9.
A new partitioning mechanism called QTBT was introduced in the JEM-3.0 software. As shown in fig. 10, QTBT partitioning can provide not only square blocks but also rectangular blocks. Of course, compared with the conventional QT-based partitioning used in, for example, the HEVC/H.265 standard, QTBT partitioning comes at the cost of some signaling overhead and increased computational complexity on the encoder side. Nevertheless, QTBT-based partitioning has better segmentation properties, and hence the coding efficiency of QTBT is much higher than that of conventional QT.
However, when QTBT was introduced, the set of available directional intra-prediction modes was not changed accordingly. In particular, as shown in fig. 11, the asymmetry of rectangular blocks is not taken into account, so the same number of reference pixels is used on both the shorter and the longer side of a rectangular block. In the current implementation of the QTBT framework, the number of directional intra-prediction modes depends neither on the aspect ratio of the block nor on the actual availability of reference pixels. As a result, reference pixels on the shorter side of a rectangular block can hardly be exploited, while some reference pixels on the longer side may be left unused.
Notably, as shown in fig. 12, the terms "vertically oriented block" ("vertical orientation of a block") and "horizontally oriented block" ("horizontal orientation of a block") are used herein for rectangular blocks generated by the QTBT framework; fig. 12 shows (a) a horizontally oriented block and (b) a vertically oriented block.
In document JVET-D0113, it is further proposed to apply a mechanism in which the number of directional intra-prediction modes is adjustable. Specifically, it is proposed to further increase the number of directional intra-prediction modes to 131 for larger block sizes, while reducing the number of directional intra-prediction modes for smaller block sizes. Switching the number of directional intra-prediction modes based on the block size is controlled by two thresholds, represented in the SPS by their log2 values minus 4 and minus 6, respectively. The first threshold indicates the maximum block size using 35 intra-prediction mode directions, the second threshold indicates the maximum block size using 67 intra-prediction mode directions, and 131 intra-prediction mode directions are used for all other blocks. In the default setting, thresholds of 4 and 6 are signaled; for high-resolution images, the thresholds are set to 5 and 8.
In this implementation, the directional intra-prediction mode index is always represented in the 131-mode range, regardless of the number of directional intra-prediction modes actually used. When 67 intra-prediction modes are actually in use, only every second angular (directional) mode is allowed; with 35 modes, only every fourth angular (directional) mode is allowed. Accordingly, when signaling the intra-prediction mode, if the current block uses fewer than 131 intra-prediction mode directions, the intra-prediction modes of neighboring blocks may need to be rounded to the nearest allowed second or fourth angular intra-prediction mode, as shown in fig. 13. This conversion is accomplished by applying left and right shifts by 1 or 2 to the intra-prediction mode. If the mode is not an MPM, mode signaling follows the same procedure as in JEM-3.0, but with a different number of intra-prediction modes. The planar and DC modes remain unchanged and require no mode conversion. To accommodate the increased number of intra-prediction modes, the 4-tap intra interpolation filter is extended from 1/32 fractional-pixel accuracy to 1/64 fractional-pixel accuracy.
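For illustration, the switching and rounding just described can be sketched as follows (a non-normative sketch: the function names, default thresholds, and exact rounding are assumptions made for illustration, not text from JVET-D0113):

```cpp
// Illustrative sketch of the block-size-based mode switching described above.
int numDirectionalModes(int log2BlockSize, int thr35 = 4, int thr67 = 6) {
    if (log2BlockSize <= thr35) return 35;   // small blocks: 35 modes
    if (log2BlockSize <= thr67) return 67;   // medium blocks: 67 modes
    return 131;                              // all other blocks: 131 modes
}

// Mode indices are always expressed in the 131-mode range (planar = 0, DC = 1,
// angular modes 2..130). With 67 modes only every 2nd angular mode is allowed,
// with 35 modes only every 4th; rounding a neighbor's mode to the nearest
// allowed mode can then be done with shift operations:
int roundAngularMode(int mode131, int numModes) {
    if (mode131 <= 1) return mode131;  // planar and DC need no conversion
    int shift = (numModes == 67) ? 1 : (numModes == 35) ? 2 : 0;
    int base = mode131 - 2;            // offset of the first angular mode
    int half = (1 << shift) >> 1;      // rounding offset (0 when shift == 0)
    return (((base + half) >> shift) << shift) + 2;
}
```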
In addition, a technique has recently been proposed to address the question of how many directional intra-prediction modes should be included in the intra-prediction mode set for rectangular blocks. As shown in fig. 14, according to the proposed technique, the set of directional intra-prediction modes may be extended depending on the aspect ratio of the prediction block, and the additional directional intra-prediction modes may be signaled by mapping them to the legacy subset.
Against this background, fig. 15 shows the case of intra prediction along a diagonal direction, i.e., where the intra-prediction angle associated with the directional intra-prediction mode equals 45°. In this case, the corresponding HEVC intra mode indices are 2 (from the bottom left) and 34 (from the top right).
However, if a similar intra-prediction mechanism is applied at angles smaller than 45°, i.e., for the extended directional intra-prediction modes, the situation is as shown in fig. 16. That is, when the intra-prediction direction is an acute direction (i.e., less than 45°), noticeable discontinuities can be observed in the prediction. These discontinuities are caused, in particular, by the fact that the difference between the reference pixel positions of two adjacent rows of predicted pixels may become larger than one reference pixel. The problem concerns both how the reference pixels are processed and how the intra-prediction interpolation is performed.
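The size of these discontinuities can be quantified with a short worked example. If the prediction direction forms an angle α with the row of reference pixels, stepping down by one row of predicted pixels shifts the reference position by cot(α) reference pixels, which exceeds one pixel whenever α is smaller than 45°. A minimal sketch of this geometry (an illustration of the text above, not language from any standard):

```cpp
#include <cmath>
#include <cstdio>

// Horizontal displacement (in reference pixels) between the reference
// positions used by two vertically adjacent rows of predicted pixels,
// for a prediction direction forming angle 'deg' with the reference row.
double referenceGap(double deg) {
    const double kPi = 3.14159265358979323846;
    return 1.0 / std::tan(deg * kPi / 180.0);  // cot(angle)
}

int main() {
    std::printf("45 deg: gap = %.2f\n", referenceGap(45.0));  // 1.00: still adjacent
    std::printf("30 deg: gap = %.2f\n", referenceGap(30.0));  // ~1.73: wider than one pixel
}
```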
Disclosure of Invention
In view of the above, the present invention aims to further improve hybrid video coding. In particular, it is an object of the present invention to provide an apparatus and method that improve the intra prediction of a prediction block of a video image. In particular, the invention aims to obtain additional coding gain without increasing hardware and computational complexity. Specifically, the invention aims to overcome the above-described problems arising for acute angles smaller than 45°, i.e., to eliminate the discontinuities caused by these acute angles. The invention should also be easy to implement in codecs that use conventional directional intra-prediction mechanisms.
The object of the invention is achieved by the embodiments of the invention defined by the features of the independent claims. Further advantageous implementations of these embodiments are defined by the features of the dependent claims.
In particular, the present invention proposes to reduce the discontinuities by extending the filter length for acute intra-prediction angles (i.e., angles of less than 45°). The scheme is primarily applicable to rectangular blocks produced by partitioning frameworks such as QTBT and MTT.
A first aspect of the present invention provides an apparatus for intra-predicting a predicted block of a video image, the apparatus for selecting a directional intra-prediction mode from a set of directional intra-prediction modes, wherein each directional intra-prediction mode corresponds to a different intra-prediction angle; selecting a filter from a set of filters according to the selected directional intra-prediction mode; determining a reference pixel for a given prediction pixel of the prediction block from a set of reference pixels according to the selected directional intra-prediction mode; and applying the selected filter to the determined reference pixel.
The device according to the first aspect has the following advantages:
additional coding gain may be obtained.
There are many potential applications in hybrid video coding paradigms that are compatible with the HM software and the VPX video codec family (state-of-the-art video coding frameworks), as well as with the JEM and VTM software and the VPX/AV1 video codec family (next-generation video coding frameworks).
Hardware and computational complexity are low.
The device is easy to implement in a codec using conventional directional intra prediction mechanisms.
In particular, by selecting the filter length according to the angle, the above-described problem for intra-prediction angles of less than 45° can be overcome. If the distance between the two reference pixels used for intra-predicting two neighboring predicted pixels becomes so large that the two reference pixels are no longer adjacent, a larger filter length is selected to avoid discontinuities. For angles of 45° and above, where the reference pixels are adjacent to each other, a smaller filter length may be selected to preserve detail.
Notably, the prediction block may be a TU or a PU. The apparatus is configured to process each predicted pixel in the prediction block in the manner described for the given predicted pixel. Thus, the apparatus is configured to perform intra prediction of an entire prediction block in the video image. A pixel point (sample) is the intersection of a channel and a pixel in a video image; for example, each pixel of a video image may include three pixel points, namely red, green, and blue.
In an implementation manner of the first aspect, the apparatus is configured to: determining a filter length according to the selected directional intra-prediction mode; a filter having at least the determined filter length is selected as the filter.
The device thus ensures that the filter length is in each case long enough to avoid discontinuities.
In another implementation of the first aspect, the filter set comprises filters having different filter lengths, in particular filters having filter lengths spanning 1, 3 or 5 adjacent reference pixels.
In another implementation of the first aspect, each filter in the set of filters performs a different smoothing operation on the determined reference pixel and one or more neighboring reference pixels when applied to the determined reference pixel.
For smaller acute angles, stronger smoothing may be obtained by selecting the filter accordingly, e.g., a filter that smooths over more neighboring reference pixels; for angles that are not as small (or not acute), weaker smoothing may be selected, e.g., a filter that smooths over fewer reference pixels.
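For illustration, such a filter set could contain kernels of the three lengths mentioned further above (1, 3, and 5 reference pixels). The coefficients below are assumed examples ([1 2 1]/4 is a classic reference-pixel smoothing filter; the 5-tap binomial kernel is a natural stronger variant), not coefficients prescribed by this application:

```cpp
#include <vector>

// Illustrative filter set with one entry per supported length (1, 3, 5 taps).
// A 1-tap "filter" leaves the reference pixel unchanged; longer kernels smooth
// over more neighboring reference pixels, i.e., apply stronger smoothing.
struct RefFilter {
    std::vector<int> taps;  // integer coefficients
    int shift;              // normalization: sum of taps equals 1 << shift
};

const RefFilter kFilters[] = {
    { {1},             0 },  // length 1: no smoothing
    { {1, 2, 1},       2 },  // length 3: mild smoothing
    { {1, 4, 6, 4, 1}, 4 },  // length 5: strong smoothing
};

// Apply a filter centered on position 'pos' of the reference row 'ref',
// clamping at the row ends.
int filterRef(const std::vector<int>& ref, int pos, const RefFilter& f) {
    const int n = static_cast<int>(f.taps.size());
    const int last = static_cast<int>(ref.size()) - 1;
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        int p = pos + i - n / 2;
        if (p < 0) p = 0;
        if (p > last) p = last;
        acc += f.taps[i] * ref[p];
    }
    return (acc + ((1 << f.shift) >> 1)) >> f.shift;  // rounded normalization
}
```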
In another implementation manner of the first aspect, the apparatus is configured to: determining the intra-prediction angle corresponding to the selected directional intra-prediction mode; the filter is selected according to the determined intra-prediction angle.
Accordingly, an optimal filter may be selected for each angle of intra prediction.
In another implementation manner of the first aspect, the apparatus is configured to: determining the intra-prediction angle corresponding to the selected directional intra-prediction mode; designating another reference pixel for another prediction pixel of the prediction block from the reference pixel set according to the selected directional intra-prediction mode; determining a distance between the determined reference pixel and the other reference pixel; and selecting the filter according to the determined distance.
The apparatus may be configured to execute a filter selection algorithm with the selected directional intra-prediction mode as an input to obtain an intra-prediction angle as an output. The device may determine the intra-prediction angle from an index of the selected directional intra-prediction mode. Further, the apparatus may be configured to determine the angle based on an aspect ratio of the predicted block.
The further reference pixel may be assigned to the further prediction pixel in the same way as the determined reference pixel is determined for a given prediction pixel, in particular according to the intra prediction direction, i.e. the intra prediction angle, of the selected mode. The distance between the determined reference pixel and the other reference pixel may be derived from the distance between the given predicted pixel and the other predicted pixel in the prediction block and an intra prediction angle associated with the selected mode. The distance may be determined based on an integer or fraction of the reference pixels.
If the determined distance is larger, a filter with a longer filter length may be selected; if the determined distance is smaller, a filter with a shorter filter length may be selected. In particular, a filter having a filter length of at least the determined distance may be selected. If no selectable filter has a filter length of at least the determined distance, the filter with the largest filter length in the filter set may be selected.
Thus, the filter may be selected such that the distance between the reference pixels does not lead to discontinuities after intra prediction.
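A compact sketch of this selection rule, reusing the illustrative filter lengths 1, 3, and 5 from the sketch above (the function name and the integer rounding of the distance are assumptions made for illustration):

```cpp
#include <cmath>

// Choose the index of the shortest filter (lengths 1, 3, 5 as above) whose
// length is at least the distance between the reference pixels determined
// for two neighboring predicted pixels; if none is long enough, fall back
// to the filter with the maximum length in the set.
int selectFilterIdx(double refDistance) {
    static const int kLengths[] = {1, 3, 5};
    const int needed = static_cast<int>(std::ceil(refDistance));
    for (int i = 0; i < 3; ++i)
        if (kLengths[i] >= needed) return i;
    return 2;  // longest filter available
}
```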
In another implementation manner of the first aspect, the apparatus is configured to: selecting the same filter for each directional intra-prediction mode selected from the first subset of directional intra-prediction modes; a different filter is selected for each directional intra-prediction mode selected from the second subset of directional intra-prediction modes.
For example, the first subset may include directional intra-prediction modes associated with intra-prediction angles of 45 ° and greater than 45 °, and the second subset may include directional intra-prediction modes associated with intra-prediction angles of less than 45 °.
In another implementation manner of the first aspect, the apparatus is configured to: intra-predicting the given predicted pixel directly from the determined reference pixel, wherein the apparatus is configured to apply the selected filter to the determined reference pixel before or during the intra-predicting the given predicted pixel.
In another implementation manner of the first aspect, the apparatus is configured to: generating transposed reference pixels by interpolating the determined reference pixels according to the selected intra-prediction mode; intra-predicting the given predicted pixel from the transposed reference pixel, wherein the apparatus is configured to apply the selected filter to the determined reference pixel prior to or during generation of the transposed reference pixel.
In another implementation manner of the first aspect, the apparatus is configured to: each reference pixel in the set of reference pixels is transposed, wherein a row of reference pixels becomes a column of transposed reference pixels and a column of reference pixels becomes a row of transposed reference pixels.
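A minimal sketch of this transposition; representing the reference pixel set as one top row and one left column is an assumption made for illustration:

```cpp
#include <utility>
#include <vector>

// Reference pixels kept as the row above the block and the column to its left.
struct RefPixels {
    std::vector<int> top;   // row: above and above-right of the block
    std::vector<int> left;  // column: left and above-left of the block
};

// Transposing swaps the roles of row and column, so that a single
// intra-prediction code path can serve both block orientations.
RefPixels transpose(RefPixels r) {
    std::swap(r.top, r.left);
    return r;
}
```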
In another implementation of the first aspect, the reference pixels in the set of reference pixels are arranged in rows in the video image adjacent to upper and right-upper positions of the prediction block and/or in columns in the video image adjacent to left-side and upper-left positions of the prediction block.
In another implementation of the first aspect, the apparatus is used for encoding and/or decoding the video image or the apparatus is a video encoder and/or a video decoder.
For example, the apparatus of the first aspect may be included in or may be an intra prediction unit of an encoder or decoder.
A second aspect of the present invention provides a method for intra-predicting a prediction block of a video image, the method comprising: selecting a directional intra-prediction mode from a set of directional intra-prediction modes, wherein each directional intra-prediction mode corresponds to a different intra-prediction angle; selecting a filter from a set of filters according to the selected directional intra-prediction mode; determining a reference pixel for a given predicted pixel of the prediction block from a set of reference pixels according to the selected directional intra-prediction mode; and applying the selected filter to the determined reference pixel.
In one implementation manner of the second aspect, the method includes: determining a filter length according to the selected directional intra-prediction mode; a filter having at least the determined filter length is selected as the filter.
In another implementation of the second aspect, the filter set comprises filters having different filter lengths, in particular filters having filter lengths spanning 1, 3 or 5 adjacent reference pixels.
In another implementation of the second aspect, each filter in the set of filters performs a different smoothing operation on the determined reference pixel and one or more neighboring reference pixels when applied to the determined reference pixel.
In another implementation manner of the second aspect, the method includes: determining the intra-prediction angle corresponding to the selected directional intra-prediction mode; the filter is selected according to the determined intra-prediction angle.
In another implementation manner of the second aspect, the method includes: determining the intra-prediction angle corresponding to the selected directional intra-prediction mode; designating another reference pixel for another predicted pixel of the prediction block from the reference pixel set according to the selected directional intra-prediction mode; determining a distance between the determined reference pixel and the other reference pixel; and selecting the filter according to the determined distance.
in another implementation manner of the second aspect, the method includes: selecting the same filter for each directional intra-prediction mode selected from the first subset of directional intra-prediction modes; a different filter is selected for each directional intra-prediction mode selected from the second subset of directional intra-prediction modes.
In another implementation manner of the second aspect, the method includes: intra-predicting the given predicted pixel directly from the determined reference pixel, wherein the method comprises: the selected filter is applied to the determined reference pixel before or during intra prediction of the given predicted pixel.
In another implementation manner of the second aspect, the method includes: generating transposed reference pixels by interpolating the determined reference pixels according to the selected intra-prediction mode; intra-predicting the given prediction pixel from the transposed reference pixel, wherein the method comprises: the selected filter is applied to the determined reference pixels before or during generation of the transposed reference pixels.
In another implementation manner of the second aspect, the method includes: each reference pixel in the set of reference pixels is transposed, wherein a row of reference pixels becomes a column of transposed reference pixels and a column of reference pixels becomes a row of transposed reference pixels.
In another implementation manner of the second aspect, the reference pixels in the reference pixel set are arranged in rows in the video image adjacent to the upper and upper-right positions of the prediction block and/or in columns in the video image adjacent to the left and upper-left positions of the prediction block.
In another implementation of the second aspect, the method is performed to encode and/or decode the video image or is performed in a video encoder and/or video decoder.
The above-mentioned advantages and effects of the device of the first aspect and its corresponding implementation are achieved by the method of the second aspect and its implementation.
It should be noted that all devices, elements, units and means described in the present application may be implemented in software or hardware elements or any kind of combination thereof. All steps performed by the various entities described in this application and the functions described to be performed by the various entities are intended to indicate that the various entities are adapted to or for performing the respective steps and functions. Although in the following description of specific embodiments, specific functions or steps to be performed by external entities are not reflected in the description of specific elements of the entity performing the specific steps or functions, it should be clear to a skilled person that the methods and functions may be implemented in respective hardware or software elements or any combination thereof.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In the drawings:
fig. 1 is a block diagram illustrating an exemplary structure of a video encoder for implementing an embodiment of the present invention;
fig. 2 is a block diagram showing an exemplary structure of a video decoder for implementing an embodiment of the present invention;
fig. 3 is a block diagram illustrating an example of a video coding system for implementing an embodiment of the present invention;
fig. 4 shows (a) the cause of discontinuities when the intra-prediction angle is less than 45°, and (b) the resulting discontinuities between rows of predicted pixels when the intra-prediction angle is less than 45°;
Fig. 5 is a block diagram illustrating an apparatus of an embodiment of the present invention.
Fig. 6 is a flow chart illustrating a reference pixel filter selection mechanism performed by a device according to an embodiment of the invention, wherein the reference pixel filter selection mechanism depends on intra prediction angles.
Fig. 7 illustrates a reference pixel pre-interpolation mechanism performed by a device in accordance with an embodiment of the present invention.
Fig. 8 is a flow chart of a method according to an embodiment of the invention.
Fig. 9 shows the intra-prediction modes in the HM and JEM software (the angular/directional modes marked with dashed lines were introduced in JEM only, not in HM).
Fig. 10 schematically shows QTBT partitioning.
Fig. 11 shows a current implementation of directional intra-prediction mechanisms in the QT and QTBT frameworks.
fig. 12 shows the orientation of rectangular blocks, and in particular, rectangular blocks having (a) a horizontal orientation and (b) a vertical orientation.
Fig. 13 shows the intra mode selection proposed in jfet-D0113.
Fig. 14 shows an extension of the proposed directional intra prediction mode.
Fig. 15 schematically shows the distance between the reference pixels used for intra prediction of two adjacent rows of predicted pixels when the intra-prediction angle equals 45°.
Fig. 16 schematically shows the distance between the reference pixels used for intra prediction of two adjacent rows of predicted pixels when the intra-prediction angle is smaller than 45°.
Detailed Description
In the following description, reference is made to the accompanying drawings which form a part hereof and which show by way of illustration specific aspects in which embodiments of the invention may be practiced. It is to be understood that embodiments of the invention may be used in other respects and may include structural or logical changes not depicted in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
For example, it should be understood that the disclosure in connection with the described methods may apply equally to the corresponding devices or systems for performing the methods, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may include one or more units (e.g., functional units) to perform the one or more described method steps (e.g., one unit performing one or more steps, or each of the plurality of units performing one or more of the plurality of steps), even if the one or more units are not explicitly described or illustrated in the figures. On the other hand, if a specific apparatus is described in terms of one or more units, such as functional units, for example, the corresponding method may include one step to implement the functionality of the one or more units (e.g., one step to implement the functionality of the one or more units, or each of the multiple steps to implement the functionality of the one or more units), even if such one or more steps are not explicitly described or illustrated in the figures. Furthermore, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless otherwise indicated.
Video coding generally refers to the processing of a sequence of images that form a video or video sequence. In the field of video coding, the terms "frame" and "picture" may be used as synonyms. Video coding comprises two parts: video encoding and video decoding. Video encoding is performed on the source side and typically involves processing (e.g., compressing) the original video images to reduce the amount of data required to represent them (and thus to store and/or transmit them more efficiently). Video decoding is performed on the destination side and typically involves the inverse processing with respect to the encoder to reconstruct the video images. "Coding" of video images (or of images in general, as will be explained below) in the embodiments is to be understood as either "encoding" or "decoding" of the video images. The encoding part and the decoding part are also collectively referred to as CODEC (COding and DECoding).
In the case of lossless video coding, the original video image may be reconstructed, i.e. the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, the amount of data required to represent the video image is reduced by further compression, such as quantization, and the video image cannot be fully reconstructed at the decoder, i.e. the quality of the reconstructed video image is lower or worse than the quality of the original video image.
Several video coding standards since H.261 belong to the group of "lossy hybrid video codecs" (i.e., they combine spatial and temporal prediction in the pixel domain with 2D transform coding for applying quantization in the transform domain). Each image of a video sequence is typically divided into a set of non-overlapping blocks, and coding is typically performed at the block level. In other words, the encoder typically processes (i.e., encodes) the video at the block (video block) level, e.g., it generates a prediction block by spatial (intra) prediction and temporal (inter) prediction, subtracts the prediction block from the current block (currently processed/to-be-processed block) to obtain a residual block, and transforms and quantizes the residual block in the transform domain to reduce the amount of data to be transmitted (compression), while the decoder applies the inverse processing with respect to the encoder to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the encoder duplicates the processing loop of the decoder, so that the encoder and the decoder generate identical predictions (e.g., intra and inter predictions) and/or reconstructions for processing (i.e., coding) subsequent blocks.
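The block-level loop described above can be summarized by the following toy sketch (one-dimensional "blocks", an identity transform, and no entropy coding; only the structure, including the encoder duplicating the decoder's reconstruction, follows the text):

```cpp
#include <cstddef>
#include <vector>

using Block = std::vector<int>;

Block subtract(const Block& a, const Block& b) {  // residual = current - prediction
    Block r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] - b[i];
    return r;
}
Block add(const Block& a, const Block& b) {       // reconstruction = prediction + residual
    Block r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
    return r;
}
Block quantize(const Block& x, int step) {        // lossy: information is discarded here
    Block r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] / step;
    return r;
}
Block dequantize(const Block& x, int step) {
    Block r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] * step;
    return r;
}

// Encode one block and return the reconstruction that both the encoder and
// the decoder will use as the basis for predicting subsequent blocks.
Block encodeAndReconstruct(const Block& cur, const Block& pred, int qstep) {
    Block level = quantize(subtract(cur, pred), qstep);  // would be entropy-coded
    return add(pred, dequantize(level, qstep));          // decoder-identical result
}
```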
Since video image processing (also referred to as moving image processing) and still image processing (the term "processing" includes encoding) share many concepts and technologies or tools, the term "image" is used hereinafter to refer to video images (as described above) and/or still images of a video sequence to avoid unnecessary repetition and distinction of video images from still images when not needed. If the above description refers only to still images (still pictures or still images), the term "still image" shall be used.
Before describing embodiments of the present invention in more detail with respect to fig. 4 through 11, an encoder 100, a decoder 200, and an encoding system 300 for implementing embodiments of the present invention are described with respect to fig. 1 through 3.
Fig. 3 is a conceptual or schematic block diagram of one embodiment of an encoding system 300 (e.g., image encoding system 300). The encoding system 300 includes a source device 310, the source device 310 being configured to provide encoded data 330 (e.g., an encoded image 330) to a destination device 320 or the like to decode the encoded data 330.
The source device 310 includes the encoder 100 or the encoding unit 100, and may additionally (i.e., optionally) include an image source 312, a preprocessing unit 314 (e.g., image preprocessing unit 314), and a communication interface or communication unit 318.
The image source 312 may include or be any type of image acquisition device, such as a device for acquiring real-world images, and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images, or any type of device for acquiring and/or providing real-world images, computer-animated images (e.g., screen content, virtual reality (VR) images), and/or any combination thereof (e.g., augmented reality (AR) images). In the following, unless otherwise specifically stated, all these types of images and any other type of image will be referred to as "images", while the previous explanation of the term "image" (including "video images" and "still images") will still apply unless explicitly stated otherwise.
A (digital) image is or can be regarded as a two-dimensional array or matrix of pixel points with intensity values. A pixel point in the array may also be referred to as a pixel (short form of picture element). The number of pixel points of the array or image in the horizontal and vertical directions (or axes) defines the size and/or resolution of the image. Three color components are typically used to represent color, i.e., the image may be represented as, or include, three arrays of pixel points. In RGB format or color space, an image includes corresponding arrays of red, green, and blue pixel points. However, in video coding, each pixel is typically represented in a luminance/chrominance format or color space, e.g., YCbCr, which includes a luminance component indicated by Y (sometimes also indicated by L) and two chrominance components indicated by Cb and Cr. The luminance (luma) component Y represents the brightness or gray-level intensity (e.g., as in a gray-scale image), while the two chrominance (chroma) components Cb and Cr represent the chrominance or color information components. Accordingly, an image in YCbCr format includes a luminance pixel point array of luminance values (Y) and two chrominance pixel point arrays of chrominance values (Cb and Cr). An image in RGB format may be converted or transformed into YCbCr format and vice versa; this process is also known as color transformation or color conversion. If an image is monochrome, the image may include only a luminance pixel point array.
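As an illustration of the relationship between the two color spaces just described, the widely used BT.601 full-range conversion from RGB to YCbCr is sketched below (a standard formula; codecs typically use integer fixed-point versions of it):

```cpp
// BT.601 full-range RGB -> YCbCr for one pixel, in floating point for clarity.
struct YCbCr { double y, cb, cr; };

YCbCr rgbToYCbCr(double r, double g, double b) {  // r, g, b in [0, 255]
    YCbCr out;
    out.y  =  0.299 * r    + 0.587 * g    + 0.114 * b;          // luma
    out.cb = -0.168736 * r - 0.331264 * g + 0.5 * b      + 128; // blue-difference chroma
    out.cr =  0.5 * r      - 0.418688 * g - 0.081312 * b + 128; // red-difference chroma
    return out;
}
```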
For example, the image source 312 may be a camera for capturing images, a memory (e.g., image memory) that includes or stores previously captured or generated images, and/or any type of (internal or external) interface for capturing or receiving images, etc. For example, the camera may be a local or integrated camera integrated in the source device and the memory may be a local or integrated memory (e.g., integrated in the source device). For example, the interface may be an external interface that receives images from an external video source, wherein the external video source is an external image acquisition device such as a camera, an external memory, or an external image generation device such as an external computer graphics processor, computer, or server. The interface may be any type of interface according to any proprietary or standardized interface protocol, such as a wired or wireless interface, an optical interface. The interface used to acquire image data 312 may be the same interface as communication interface 318 or may be part of communication interface 318.
In distinction to the preprocessing performed by the preprocessing unit 314, the image or image data 313 may also be referred to as the raw image or raw image data 313.
The preprocessing unit 314 is configured to receive (raw) image data 313 and to preprocess the image data 313 to obtain a preprocessed image 315 or preprocessed image data 315. The preprocessing performed by the preprocessing unit 314 may include clipping, color format conversion (e.g., conversion from RGB to YCbCr), color correction, or denoising, etc.
The encoder 100 is configured to receive the preprocessed image data 315 and to provide encoded image data 171 (described in detail below with respect to fig. 1, etc.).
Communication interface 318 of source device 310 may be used to receive encoded image data 171 and transmit it directly to other devices (e.g., destination device 320 or any other device for storage or direct reconstruction); or process encoded image data 171 before storing encoded data 330 and/or transmitting encoded data 330 to other devices (e.g., destination device 320, or any other device for decoding or storage), respectively.
The destination device 320 includes a decoder 200 or decoding unit 200 and may additionally (i.e., optionally) include a communication interface or communication unit 322, a post-processing unit 326, and a display device 328.
The communication interface 322 of the destination device 320 is operable to receive encoded image data 171 or encoded data 330, for example, directly from the source device 310 or any other source (e.g., memory such as encoded image data memory).
Communication interface 318 and communication interface 322 may be used to transmit or receive encoded image data 171 or encoded data 330, respectively, via a direct communication link (direct wired or wireless connection, etc.) between source device 310 and destination device 320, or via any type of network (e.g., wired or wireless network, or any combination thereof, or any type of private and public networks), or any combination thereof.
For example, communication interface 318 may be used to encapsulate encoded image data 171 into a suitable format (e.g., data packets) for transmission via a communication link or communication network, and may also be used to perform data loss protection and data loss recovery.
For example, communication interface 322, which corresponds to communication interface 318, may be used to decapsulate encoded data 330 to obtain encoded image data 171, and may also be used to perform data loss protection and data loss recovery, including error concealment, for example.
Communication interface 318 and communication interface 322 may each be configured as a one-way communication interface (as indicated by the arrow pointing from source device 310 to encoded image data 330 of destination device 320 in fig. 3), or a two-way communication interface, and may be used to send and receive messages, etc., for example, to establish a connection, to acknowledge and/or resend lost or delayed data (including image data), to interact with any other information related to a communication link and/or data transfer (e.g., the transfer of encoded image data).
The decoder 200 is for receiving the encoded image data 171 and providing decoded image data 231 or decoded image 231 (which will be described in further detail below with respect to fig. 2, etc.).
The post-processor 326 of the destination device 320 is configured to post-process the decoded image data 231 (e.g., the decoded image 231) to obtain post-processed image data 327 (e.g., the post-processed image 327). For example, the post-processing performed by post-processing unit 326 may include color format conversion (e.g., conversion from YCbCr to RGB), color correction, cropping or resampling, or any other processing, e.g., for preparing decoded image data 231 for display by display device 328 or the like.
The display device 328 of the destination device 320 is operable to receive post-processing image data 327 to display the image to a user or viewer, etc. The display device 328 may be or include any type of display for displaying reconstructed images, such as an integrated or external display or monitor. For example, the display may include a Cathode Ray Tube (CRT), a liquid crystal display (liquid crystal display, LCD), a plasma display, an organic light emitting diode (organic light emitting diode, OLED) display, or any other type of display, beamer, or hologram (3D).
Although fig. 3 depicts source device 310 and destination device 320 as separate devices, device embodiments may also include two devices or two functions, namely source device 310 or corresponding function and destination device 320 or corresponding function. In such embodiments, the source device 310 or corresponding function and the destination device 320 or corresponding function may be implemented using the same hardware and/or software or by hardware and/or software alone or any combination thereof.
From the description, it will be apparent to the skilled person that the presence and (exact) division of the different units or functions within the source device 310 and/or the destination device 320 shown in fig. 3 may vary depending on the actual device and application.
Accordingly, the source device 310 and the destination device 320 shown in fig. 3 are merely exemplary embodiments of the present invention, and the embodiments of the present invention are not limited to the embodiments shown in fig. 3.
Source device 310 and destination device 320 may comprise any of a wide range of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, a mobile phone, a smartphone, a tablet computer, a video camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device, a broadcast receiver device, or the like, and may use no operating system or any type of operating system.
Encoder and encoding method
Fig. 1 is a schematic/conceptual block diagram of an embodiment of an encoder 100 (e.g., an image encoder 100), wherein the encoder 100 includes an input 102, a residual calculation unit 104, a transform unit 106, a quantization unit 108, an inverse quantization unit 110, an inverse transform unit 112, a reconstruction unit 114, a buffer 118, a loop filter 120, a decoded image buffer (decoded picture buffer, DPB) 130, a prediction unit 160 (including an inter estimation unit 142, an inter prediction unit 144, an intra estimation unit 152, and an intra prediction unit 154), a mode selection unit 162, an entropy encoding unit 170, and an output 172. The video encoder 100 as shown in fig. 1 may also be referred to as a hybrid video encoder or a hybrid video codec based video encoder.
For example, the residual calculation unit 104, the transformation unit 106, the quantization unit 108, and the entropy encoding unit 170 form a forward signal path of the encoder 100, while for example, the inverse quantization unit 110, the inverse transformation unit 112, the reconstruction unit 114, the buffer 118, the loop filter 120, the decoded image buffer (DPB) 130, the inter prediction unit 144, and the intra prediction unit 154 form an inverse signal path of the encoder. The inverse signal path of the encoder corresponds to the signal path of the decoder (see decoder 200 in fig. 2).
The encoder 100 is configured to receive an image 101 or an image block 103 of an image 101 (e.g., an image of an image sequence forming a video or video sequence) via an input 102 or the like. The image block 103 may also be referred to as a current image block or an image block to be encoded, and the image 101 may also be referred to as a current image or an image to be encoded (particularly in video coding, to distinguish the current image from other images, e.g., previously encoded and/or decoded images, of the same video sequence, i.e., the video sequence that also includes the current image).
Residual calculation
The residual calculation unit 104 is configured to calculate a residual block 105 from the image block 103 and a prediction block 165 (the prediction block 165 is described in detail below), for example by subtracting the pixel values of the prediction block 165 from the pixel values of the image block 103, pixel point by pixel point, to obtain the residual block 105 in the pixel domain.
Transformation
The transform unit 106 is configured to perform a transform such as a spatial frequency transform or a linear spatial (frequency) transform (e.g., a discrete cosine transform (discrete cosine transform, DCT) or a discrete sine transform (discrete sine transform, DST)) on the pixel values of the residual block 105 to obtain transform coefficients 107 in the transform domain.
Transform unit 106 may be used to apply integer approximations of the DCT/DST, such as the core transforms specified for HEVC/H.265. Compared to an orthogonal DCT transform, such integer approximations are typically scaled by a certain factor. In order to preserve the norm of the residual block processed by the forward and inverse transforms, additional scaling factors are applied as part of the transform process. The scaling factors are typically chosen based on certain constraints, e.g., scaling factors being powers of two for shift operations, the bit depth of the transform coefficients, and the tradeoff between accuracy and implementation cost. For example, a specific scaling factor is specified for the inverse transform on the decoder 200 side, e.g., by the inverse transform unit 212 (and for the corresponding inverse transform on the encoder 100 side, e.g., by the inverse transform unit 112), and accordingly a corresponding scaling factor may be specified for the forward transform on the encoder 100 side, e.g., by the transform unit 106.
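As a rough sketch of this step, the following fragment uses SciPy's floating-point DCT as a stand-in for the integer core transform; this is an assumption for illustration, since real codecs use integer approximations with the power-of-two scaling folded into the arithmetic as described above:

```python
import numpy as np
from scipy.fft import dctn, idctn  # floating-point stand-in for an integer DCT

def forward_transform(residual_block: np.ndarray) -> np.ndarray:
    # Orthonormal 2-D DCT-II: "ortho" normalization preserves the norm of
    # the residual block, mimicking the scaling constraints described above.
    return dctn(residual_block.astype(np.float64), norm="ortho")

def inverse_transform(coefficients: np.ndarray) -> np.ndarray:
    return idctn(coefficients, norm="ortho")
```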
Quantization
The quantization unit 108 is configured to quantize the transform coefficients 107 by applying scalar quantization, vector quantization, or the like to obtain quantized coefficients 109. The quantized coefficients 109 may also be referred to as quantized residual coefficients 109. For example, for scalar quantization, different degrees of scaling may be applied to achieve finer or coarser quantization. Smaller quantization step sizes correspond to finer quantization, while larger quantization step sizes correspond to coarser quantization. The appropriate quantization step size may be indicated by a quantization parameter (quantization parameter, QP). For example, the quantization parameter may be an index into a predefined set of suitable quantization step sizes. For example, a small quantization parameter may correspond to fine quantization (small quantization step size) and a large quantization parameter may correspond to coarse quantization (large quantization step size), or vice versa. Quantization may include division by a quantization step size, while the corresponding dequantization performed by the dequantization unit 110 or the like may include multiplication by the quantization step size. Embodiments according to HEVC may use the quantization parameter to determine the quantization step size. In general, the quantization step size may be calculated from the quantization parameter using a fixed-point approximation of an equation that includes a division. Additional scaling factors may be introduced for quantization and dequantization to restore the norm of the residual block, which might otherwise be modified by the scaling used in the fixed-point approximation of the equation for the quantization step size and quantization parameter. In one exemplary implementation, the scaling of the inverse transform and of the dequantization may be combined. Alternatively, customized quantization tables may be used and signaled from the encoder to the decoder, e.g., in the code stream. Quantization is a lossy operation, and the loss increases with the quantization step size.
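To make the QP/step-size relationship concrete, the following sketch uses the commonly cited HEVC relationship Qstep = 2^((QP − 4)/6), under which the step size doubles every 6 QP values; the helper names are illustrative, not taken from the specification:

```python
import numpy as np

def qp_to_step(qp: int) -> float:
    # Commonly cited HEVC relationship: the step size doubles every 6 QP values.
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
    # Division by the step size plus rounding; the rounding is what makes
    # quantization lossy, and the loss grows with the step size.
    return np.round(coeffs / qp_to_step(qp)).astype(np.int32)

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    # Corresponding dequantization: multiplication by the step size.
    return levels * qp_to_step(qp)
```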
An embodiment of the encoder 100 (or in particular an embodiment of the quantization unit 108) may be adapted to output the quantization scheme and the quantization step size by means of corresponding quantization parameters or the like, so that the decoder 200 may receive and perform a corresponding dequantization. Embodiments of encoder 100 (or quantization unit 108) may be used to output quantization schemes and quantization step sizes either directly or after entropy encoding via entropy encoding unit 170 or any other entropy encoding unit.
The dequantization unit 110 is configured to apply the inverse quantization of the quantization unit 108 to the quantized coefficients to obtain dequantized coefficients 111, e.g., by applying the inverse of the quantization scheme performed by the quantization unit 108, according to or using the same quantization step size as the quantization unit 108. The dequantized coefficients 111 may also be referred to as dequantized residual coefficients 111 and correspond to the transform coefficients 107, although the dequantized coefficients 111 typically differ from the transform coefficients 107 due to the loss caused by quantization.
The inverse transform unit 112 is for performing inverse transform of the transform performed by the transform unit 106, such as inverse Discrete Cosine Transform (DCT) or inverse Discrete Sine Transform (DST), to obtain an inverse transform block 113 in the pixel domain. The inverse transform block 113 may also be referred to as an inverse transformed dequantized block 113 or an inverse transformed residual block 113.
The reconstruction unit 114 is configured to combine the inverse transform block 113 and the prediction block 165, e.g., by adding the pixel values of the inverse transform block 113 and the pixel values of the prediction block 165 pixel by pixel, to obtain a reconstructed block 115 in the pixel domain.
A buffer unit 116 (or simply "buffer" 116), e.g., a line buffer 116, is used to buffer or store the reconstructed blocks and the corresponding pixel values for intra estimation and/or intra prediction, etc. In other embodiments, the encoder may be configured to use the unfiltered reconstructed blocks and/or the corresponding pixel values stored in the buffer unit 116 for any type of estimation and/or prediction.
The loop filtering unit 120 (or simply "loop filter" 120) is configured to filter the reconstructed block 115, e.g., by applying a deblocking filter, a sample-adaptive offset (SAO) filter, or another filter (e.g., a sharpening or smoothing filter, or a collaborative filter), to obtain a filtered block 121. The filtered block 121 may also be referred to as a filtered reconstructed block 121.
An embodiment of the loop filter unit 120 may comprise (not shown in fig. 1) a filter analysis unit and an actual filter unit, wherein the filter analysis unit is arranged to determine loop filter parameters for the actual filter. The filter analysis unit may be adapted to apply fixed predetermined filter parameters to the actual loop filter, to adaptively select filter parameters from a set of predetermined filter parameters, or to adaptively calculate filter parameters for the actual loop filter.
Embodiments of the loop filter unit 120 may comprise (not shown in fig. 1) one or more filters (loop filter components/sub-filters), e.g. one or more different kinds or types of filters, e.g. connected in series or in parallel or in any combination, wherein each filter may comprise a filter analysis unit, alone or in combination with other filters of the plurality of filters, for determining corresponding loop filter parameters, e.g. as described in the previous paragraph.
Embodiments of encoder 100, and in particular loop filter unit 120, may be used to output loop filter parameters directly or after entropy encoding via entropy encoding unit 170 or any other entropy encoding unit, so that decoder 200 may receive and apply the same loop filter parameters for decoding, etc.
The decoded picture buffer (decoded picture buffer, DPB) 130 is used to receive and store the filter block 121. The decoded picture buffer (decoded picture buffer, DPB) 130 may also be used to store other previously filtered blocks (e.g., previously reconstructed filtered block 121) of the same current picture or a different picture (e.g., a previously reconstructed picture), and may provide a complete previously reconstructed (i.e., decoded) picture (and corresponding reference blocks and pixels) and/or a partially reconstructed current picture (and corresponding reference blocks and pixels) for inter estimation and/or inter prediction, etc.
Other embodiments of the present invention may also be used to use the previously filtered block and corresponding filtered pixel values of decoded image buffer 130 for any type of estimation or prediction, such as intra-frame estimation and prediction and inter-frame estimation and prediction.
Motion estimation and prediction
A prediction unit 160, also referred to as block prediction unit 160, for receiving or acquiring image blocks 103 (current image block 103 of current image 101) and decoded or at least reconstructed image data, e.g. reference pixels of the same (current) image from buffer 116 and/or decoded image data 231 of one or more previously decoded images from decoded image buffer 130, and for processing these data for prediction, i.e. providing a prediction block 165, wherein prediction block 165 may be an inter prediction block 145 or an intra prediction block 155.
The mode selection unit 162 may be used to select a prediction mode (e.g., intra or inter prediction mode) and/or a corresponding prediction block 145 or 155, to be used as the prediction block 165 to calculate the residual block 105 and to reconstruct the reconstructed block 115.
Embodiments of the mode selection unit 162 may be used to select (e.g., from among the prediction modes supported by the prediction unit 160) the prediction mode that provides the best match or the smallest residual (smallest residual means better compression for transmission or storage), or the smallest signaling overhead (smallest signaling overhead means better compression for transmission or storage), or both. The mode selection unit 162 may be configured to determine the prediction mode by rate-distortion optimization (rate distortion optimization, RDO), i.e., to select the prediction mode providing the minimum rate-distortion cost, or to select a prediction mode whose associated rate-distortion at least meets the prediction mode selection criteria.
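A minimal sketch of such a Lagrangian RDO decision follows; the cost is the standard J = D + λ·R, and the tuple layout of the candidates is an assumption for illustration:

```python
def select_mode(candidates, lagrange_multiplier):
    # Pick the candidate minimizing the rate-distortion cost J = D + lambda * R.
    # `candidates` is assumed to be an iterable of (mode, distortion, rate)
    # tuples produced by trial encoding.
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate in candidates:
        cost = distortion + lagrange_multiplier * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```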
The prediction process (e.g., performed by the prediction unit 160) and the mode selection (e.g., performed by the mode selection unit 162) performed by the exemplary encoder 100 will be described in more detail below.
As described above, the encoder 100 is configured to determine or select the best or optimal prediction mode from a (predetermined) set of prediction modes. The set of prediction modes may include, for example, intra prediction modes and/or inter prediction modes, etc.
The set of intra prediction modes may include 32 different intra prediction modes, e.g., non-directional modes such as the DC (or mean) mode and the planar mode, and directional modes as defined in H.264, or may include 65 different intra prediction modes, e.g., non-directional modes such as the DC (or mean) mode and the planar mode, and directional modes as defined in H.265.
The set of (possible) inter prediction modes depends on the available reference images (i.e., previously at least partially decoded images stored in the DPB 130, for example) and on other inter prediction parameters, e.g., on whether the entire reference image or only a part of it (e.g., a search window area around the area of the current block) is used to search for the best matching reference block, and/or on whether pixel interpolation (e.g., half-pel and/or quarter-pel interpolation) is applied.
In addition to the above prediction modes, a skip mode and/or a direct mode may be used.
The prediction unit 160 may further be configured to partition the block 103 into smaller blocks or sub-blocks, e.g., by iteratively using quad-tree (QT) partitioning, binary-tree (BT) partitioning, or ternary-tree (TT) partitioning, or any combination thereof, and to perform prediction or the like for each of the blocks or sub-blocks, wherein the mode selection includes selecting the tree structure by which the block 103 is partitioned and selecting the prediction mode applied to each of the blocks or sub-blocks; a minimal sketch of the simplest of these splits follows.
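The fragment below illustrates only the recursive QT split into four square quadrants (BT/TT splits would add rectangular children in the same recursive pattern); `should_split` is a stand-in for the encoder's mode-selection decision, not part of the embodiment:

```python
def quadtree_partition(x, y, size, min_size, should_split):
    # Recursively split a square block at (x, y) into four quadrants while
    # should_split approves and the minimum size is not reached; returns
    # the leaf blocks as (x, y, size) tuples.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_partition(x + dx, y + dy, half, min_size, should_split)
    return leaves
```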
An inter-estimation unit (inter-estimation unit/inter picture estimation unit) 142 for receiving or acquiring an image block 103 (current image block 103 of current image 101) and a decoded image 231, or at least one or more previously reconstructed blocks (e.g., reconstructed blocks of one or more other/different previously decoded images 231) for inter-estimation (inter-estimation/inter picture estimation). For example, the video sequence may include a current image and a previously decoded image 231, or in other words, the current image and the previously decoded image 231 may be part of or form a series of images that make up the video sequence.
For example, the encoder 100 may be configured to select a reference block from a plurality of reference blocks of the same or different images among a plurality of other images, and provide the reference image (or reference image index, etc.) and/or an offset (spatial offset) between a position (x-coordinate and y-coordinate) of the reference block and a position of the current block as the inter estimation parameter 143 to the inter prediction unit 144. This offset is also called Motion Vector (MV). Inter-frame estimation is also referred to as motion estimation (motion estimation, ME), and inter-frame prediction is also referred to as motion prediction (motion prediction, MP).
The inter prediction unit 144 is configured to obtain (e.g., receive) the inter prediction parameter 143, and perform inter prediction according to or using the inter prediction parameter 143 to obtain the inter prediction block 145.
Although fig. 1 shows two distinct units (or steps) for inter coding, namely the inter estimation unit 142 and the inter prediction unit 144, both functions may be performed as one (inter estimation typically includes calculating an inter prediction block, i.e., the above-described or a "kind of" inter prediction 144), e.g., by iteratively testing all possible inter prediction modes or a predetermined subset of the possible inter prediction modes while storing the currently best inter prediction mode and the corresponding inter prediction block, and using these as the (final) inter prediction parameters 143 and inter prediction block 145 without performing the inter prediction 144 another time.
The intra estimation unit 152 is configured to obtain (e.g., receive) the image block 103 (current image block) and one or more previously reconstructed blocks (e.g., reconstructed neighboring blocks) of the same image for intra estimation. For example, the encoder 100 may be configured to select an intra prediction mode from a plurality of intra prediction modes and provide the intra prediction mode as the intra estimation parameter 153 to the intra prediction unit 154.
Embodiments of encoder 100 may be configured to select an intra-prediction mode based on optimization criteria such as minimum residual (e.g., intra-prediction mode providing a prediction block 155 most similar to current image block 103) or minimum rate distortion.
The intra prediction unit 154 is configured to determine an intra prediction block 155 according to an intra prediction parameter 153, e.g. a selected intra prediction mode 153.
Although fig. 1 shows two distinct units (or steps) for intra coding, namely the intra estimation unit 152 and the intra prediction unit 154, both functions may be performed as one (intra estimation typically includes calculating an intra prediction block, i.e., the above-described or a "kind of" intra prediction 154), e.g., by iteratively testing all possible intra prediction modes or a predetermined subset of the possible intra prediction modes while storing the currently best intra prediction mode and the corresponding intra prediction block, and using these as the (final) intra prediction parameters 153 and intra prediction block 155 without performing the intra prediction 154 another time.
The present invention may be applied to encoder 100 as further described below in conjunction with apparatus 500 (fig. 5) and method 800 (fig. 8) of embodiments of the present invention. That is, the apparatus 500 may be part of the encoder 100, specifically, the intra prediction unit 154.
The entropy encoding unit 170 is configured to apply an entropy encoding algorithm or scheme (e.g., a variable length coding (variable length coding, VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, or context adaptive binary arithmetic coding (context adaptive binary arithmetic coding, CABAC)) to the quantized residual coefficients 109, the inter prediction parameters 143, the intra prediction parameters 153, and/or the loop filter parameters, individually or jointly (or not at all), to obtain encoded image data 171. The output 172 may output the encoded image data 171, e.g., in the form of an encoded code stream 171.
Fig. 2 shows an exemplary video decoder 200. The video decoder 200 is configured to receive encoded image data (e.g., an encoded code stream) 171 encoded by the encoder 100, for example, to obtain a decoded image 231.
Decoder 200 includes an input 202, an entropy decoding unit 204, an inverse quantization unit 210, an inverse transform unit 212, a reconstruction unit 214, a buffer 216, a loop filter 220, a decoded image buffer 230, a prediction unit 260 (including an inter prediction unit 244 and an intra prediction unit 254), a mode selection unit 262, and an output 232.
The entropy decoding unit 204 is configured to perform entropy decoding on the encoded image data 171 to obtain quantized coefficients 209 and/or decoded encoding parameters (not shown in fig. 2), etc., such as any or all of (decoded) inter-prediction parameters 143, intra-prediction parameters 153, and/or loop filter parameters.
In embodiments of the decoder 200, the inverse quantization unit 210, the inverse transform unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded image buffer 230, the prediction unit 260, and the mode selection unit 262 are configured to perform the inverse processing of the encoder 100 (and of the corresponding functional units) to decode the encoded image data 171.
Specifically, the inverse quantization unit 210 may be functionally identical to the inverse quantization unit 110, the inverse transform unit 212 may be functionally identical to the inverse transform unit 112, the reconstruction unit 214 may be functionally identical to the reconstruction unit 114, the buffer 216 may be functionally identical to the buffer 116, the loop filter 220 may be functionally identical to the loop filter 120 (with respect to the actual loop filter, since the loop filter 220 typically does not include a filter analysis unit but receives or obtains the filter parameters used for encoding, explicitly or implicitly, from the entropy decoding unit 204 or the like; on the encoder side, the filter parameters are determined from the original image 101 or block 103), and the decoded image buffer 230 may be functionally identical to the decoded image buffer 130.
Prediction unit 260 may include an inter prediction unit 244 and an intra prediction unit 254, where the inter prediction unit 244 may be functionally identical to the inter prediction unit 144, and the intra prediction unit 254 may be functionally identical to the intra prediction unit 154. The prediction unit 260 and the mode selection unit 262 are typically used to perform block prediction and/or to obtain the prediction block 265 from the encoded data 171 only (without any other information about the original image 101), and to receive or obtain the prediction parameters 143 or 153 and/or the information about the selected prediction mode from the entropy decoding unit 204 or the like (explicitly or implicitly).
The present invention may be applied to the decoder 200 as further described below in conjunction with the apparatus 500 (see fig. 5) and the method 800 (see fig. 8) of embodiments of the present invention. That is, the apparatus 500 may be part of the decoder 200, specifically the intra prediction unit 254.
The decoder 200 is operative to output the decoded image 231 via the output 232 or the like for presentation to, or viewing by, a user.
Fig. 4 (a) shows in more detail, in conjunction with fig. 15 and 16, the cause of the discontinuities that can be eliminated according to embodiments of the invention. In particular, these discontinuities arise because two vertically adjacent prediction pixels 401 in a prediction block 400 (e.g., a PU or TU) may be predicted from reference pixels 403 that are not adjacent to each other, since the intra prediction angle is acute. This is a drawback of the interpolation. Although applying a reference pixel smoothing filter or an interpolation filter of length N_f may partially compensate for this drawback, the fixed length may not be large enough when the intra prediction angle is much smaller than 45°. The discontinuity effect can be reduced by convolving the reference pixels 403 shown in fig. 4 during filtering. However, if the reference pixels 403 selected for the vertically adjacent prediction pixels 401 are too far apart, discontinuities may still occur. Examples of such discontinuities are shown in fig. 4 (b), and can be intuitively observed, e.g., for the case of a synthetic reference (upper row).
Fig. 5 schematically illustrates an apparatus 500 according to an embodiment of the invention. The apparatus for intra-predicting a prediction block 400 of a video image in an improved manner can eliminate the cause of the above-described discontinuity shown in fig. 4. The apparatus 500 may be the encoder 100 shown in fig. 1 or the decoder 200 shown in fig. 2, or a part of the encoder 100 shown in fig. 1 or the decoder 200 shown in fig. 2, in particular, the intra prediction unit 154 or 254.
The apparatus 500 is used to perform several functions, for example, by a processor or other type of processing circuitry. In particular, the apparatus 500 is configured to select a directional intra-prediction mode 501a from a set of directional intra-prediction modes 501, wherein each directional intra-prediction mode 501 corresponds to a different intra-prediction angle. These directional intra-prediction modes 501 may include the directional intra-prediction modes/intra-angle prediction modes shown in fig. 9 (and modes defined in the standard), and may include extended directional intra-prediction modes corresponding to other intra-prediction angles, for example, as shown in fig. 14. Specifically, for the rectangular prediction block 400, the directional intra-prediction mode 501 may include a mode related to an acute angle (an angle smaller than 45 °) of intra-prediction. The intra-prediction angle is based on the direction in which the prediction pixel 401 is intra-predicted from the reference pixel 403. For example, an angle is defined between the intra prediction direction and the upper edge (horizontal edge) of the prediction block 400.
Further, the apparatus 500 is configured to select a filter 402a from a set of filters 402 according to the selected directional intra prediction mode 501a. Specifically, the apparatus 500 may be configured to: determine a filter length according to the selected directional intra prediction mode 501a; and select from the set, as the filter 402a, a filter 402 having at least the determined filter length.
The apparatus 500 is also for: determining a reference pixel 403a from the set of reference pixels 403 for a given prediction pixel 401 of the prediction block 400 according to the selected directional intra-prediction mode 501 a; the selected filter 402a is applied to the determined reference pixel 403a. Specifically, the apparatus 500 is configured to proceed with the above steps for each of the prediction pixels 401 of the prediction block 400. That is, for each predicted pixel 401, the device 500 may determine a reference pixel 403a from the reference pixels 403 and may apply the selected filter 402a to each reference pixel 403. In this way, the apparatus 500 is able to intra-predict the entire prediction block 400.
The following table shows an exemplary set of filters from which the apparatus 500 may select the filter 402a. Specifically, the filter set includes different filters 402. For example, the filter set 402 may include filters 402 having different filter lengths N_f, specifically filters 402 whose filter length N_f spans 1, 3, or 5 adjacent reference pixels 403. Further, when applied to a determined reference pixel 403a, each filter 402 in the set of filters 402 may perform a different smoothing operation on the determined reference pixel 403a and one or more neighboring reference pixels 403. As shown in the table, this smoothing operation may be represented by different coefficients, wherein the coefficients indicate the relative weight of the determined reference pixel 403a with respect to the neighboring reference pixels (the middle coefficient weights the determined reference pixel 403a relative to 0, 2, or 4 neighboring reference pixels 403, respectively).
Index              0      1          2              3
Coefficients       [1]    [1,2,1]    [2,3,6,3,2]    [1,4,6,4,1]
Filter length N_f  1      3          5              5
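For illustration, such a kernel could be applied to a row of reference pixels as follows. The normalization by the coefficient sum is our assumption rather than something stated in the table, though it is suggestive that every coefficient sum here (1, 4, 16, 16) is a power of two, which suits shift-based integer implementations:

```python
import numpy as np

# Filter set from the table above, as (coefficients, filter length N_f) pairs.
FILTER_SET = [
    (np.array([1]),             1),
    (np.array([1, 2, 1]),       3),
    (np.array([2, 3, 6, 3, 2]), 5),
    (np.array([1, 4, 6, 4, 1]), 5),
]

def smooth_reference_pixels(ref_row: np.ndarray, filter_index: int) -> np.ndarray:
    # Convolve the row of reference pixels with the selected kernel and
    # normalize by the coefficient sum (a power of two for every entry,
    # so an integer implementation could use a right shift instead).
    coeffs, _ = FILTER_SET[filter_index]
    return np.convolve(ref_row, coeffs, mode="same") / coeffs.sum()
```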
Fig. 6 shows a schematic flow chart of a reference pixel filter selection mechanism 600. The apparatus 500 may be configured to perform this mechanism. Specifically, the apparatus 500 is able to select the reference pixel filter 402a according to the intra prediction angle. For mechanism 600, it is assumed that the set of filters (denoted here as F) is sorted in increasing order of filter length N_f.
Step 601: the apparatus 500 is configured to derive the intra prediction angle α as input to the selection mechanism 600. The apparatus 500 may be configured to determine the intra prediction angle corresponding to the selected directional intra prediction mode 501a.
Step 602: the apparatus 500 is configured to derive the distance Δp between the determined reference pixel 403a and another reference pixel 403b (see, e.g., fig. 4), where the other reference pixel 403b may be determined from the reference pixel set 403 for another prediction pixel 401 of the prediction block 400 according to the selected directional intra prediction mode 501a.
Step 603: the filter index is initialized to i=0. Step 604: a filter 402 having a current index i is selected from the set of filters. For example, the above table shows that filter 402 can be indexed according to i=0-3.
Step 605: the apparatus 500 is configured to determine whether the length N_f of the filter 402 taken from the set is smaller than the distance Δp. If not, the selection mechanism 600 ends and the currently taken filter 402 is selected as the filter 402a to be applied to the determined reference pixel 403a.
Otherwise, in step 606, the apparatus 500 is configured to check whether the current filter index i is smaller than k, where k may be the largest possible filter index and/or indicate the number of filters 402 in the filter set. If not, the selection mechanism 600 ends and the currently taken filter 402 is selected as the filter 402a to be applied to the determined reference pixel 403a; in this case, assuming the set is ordered by filter length, this filter 402 is the one with the largest filter length N_f. Otherwise, in step 607, the filter index is incremented by 1 and the selection mechanism continues with step 604 (i.e., the next filter 402 is taken from the set). A sketch of this loop is given below.
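Expressed as a loop over the ordered filter set F, steps 603 to 607 might look as follows; this sketch reuses the illustrative FILTER_SET above, and the derivation of Δp from the angle α (steps 601 and 602) is outside the fragment:

```python
def select_filter(filter_set, delta_p):
    # filter_set: list of (coefficients, N_f) pairs sorted by increasing
    # filter length N_f, e.g. the illustrative FILTER_SET above.
    # delta_p: distance between the reference pixels determined for two
    # adjacent prediction pixels (derived from the angle in steps 601-602).
    i = 0                                        # step 603: initialize index
    k = len(filter_set) - 1                      # largest possible filter index
    while filter_set[i][1] < delta_p and i < k:  # steps 605-606
        i += 1                                   # step 607: take next filter
    return filter_set[i]                         # selected filter 402a
```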
As shown in fig. 7, the apparatus 500 may also be configured to pre-process the reference pixels 403. In particular, the apparatus 500 may be configured to generate transposed reference pixels 700a from the determined reference pixels 403a, i.e., by interpolating the determined reference pixels 403a according to the selected intra prediction mode 501a. The apparatus 500 may then be configured to predict a given prediction pixel 401 from the transposed reference pixels 700a, rather than directly from the determined reference pixels 403a.
Fig. 7 (a) illustrates the first step of the preprocessing, which may include computing a transposed reference pixel set 700 from a given top row of reference pixels 403 (denoted R). The input to this step may be the set of reference pixels 403 located at the top and to the upper right of the block 400 to be predicted. These reference pixels 403 may be filtered according to the intra prediction angle as described above. That is, the apparatus 500 may be configured to select the filter 402a as described above and then apply the selected filter 402a to the determined reference pixels 403a before or during generation of the transposed reference pixels 700a.
Specifically, the first step is performed by interpolating two parts of R. The part denoted R_L is located to the left of the upper-right pixel P_TR of the block. The reference pixel 403 at position P_TR does not change during this first step. The part denoted R_R is located to the right of P_TR. For both parts, interpolation is performed using the same mechanism that is used to predict the pixels (denoted B) within the block 400 to be predicted. The prediction angle α is the same for both parts, but the prediction directions are opposite.
The second step of the preprocessing, as shown in fig. 7 (b), is to intra-predict the prediction pixels 401 of the block 400 to be predicted, i.e., to perform intra prediction interpolation from the transposed reference pixel set 700 computed in the first step shown in (a). If intra prediction is not performed from the top row, i.e., if the angle α of the intra prediction direction is greater than 180 degrees, the block and the corresponding reference pixels are transposed (the row index becomes the column index and vice versa) and intra prediction as described above is performed. In this case, the calculated prediction block is transposed to obtain the final result, as in the sketch below.
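A rough sketch of this transposition logic, with `predict_from_top` as a hypothetical helper standing in for the interpolation described above (not a function defined by the embodiment):

```python
def intra_predict_block(ref_top, ref_left, angle_deg, predict_from_top):
    # predict_from_top(main_ref, side_ref, angle) is a hypothetical helper
    # that predicts a block from the row of reference pixels above it.
    if angle_deg > 180:
        # Swap row and column roles: the left references act as the top row,
        # prediction proceeds as usual, and the result is transposed back.
        return predict_from_top(ref_left, ref_top, angle_deg).T
    return predict_from_top(ref_top, ref_left, angle_deg)
```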
Fig. 8 illustrates a method 800 of an embodiment of the invention. Method 800 is used for intra-prediction of a prediction block 400 of a video image, and method 800 may be performed by apparatus 500 shown in fig. 5. Specifically, the method 800 includes step 801: the directional intra-prediction mode 501a is selected from a set of directional intra-prediction modes 501, wherein each directional intra-prediction mode 501 corresponds to a different intra-prediction angle. Furthermore, the method 800 comprises step 802: a filter 402a is selected from the set of filters 402 according to the selected directional intra-prediction mode 501 a. The method 800 then comprises step 803: determining a reference pixel 403a for a given predicted pixel 401 of the prediction block 400 from the set of reference pixels 403 according to the selected directional intra-prediction mode 501 a; step 804: the selected filter 402a is applied to the determined reference pixel 403a.
Note that the present specification provides explanations with respect to images (frames); in the case of an interlaced image signal, however, fields are used in place of images.
Although embodiments of the present invention have been described primarily in terms of video encoding, it should be noted that embodiments of encoder 100 and decoder 200 (and accordingly system 300) may also be used for still image processing or encoding, i.e., the processing or encoding of a single image independent of any preceding or succeeding image as in video encoding. In general, if image processing is limited to a single image 101, only inter estimation 142 and inter prediction 144, 244 are not available. Most, if not all, of the other functions (also referred to as tools or techniques) of the video encoder 100 and the video decoder 200, including partitioning, transform (scaling) 106, quantization 108, inverse quantization 110, inverse transform 112, intra estimation 152, intra prediction 154, 254, and/or loop filtering 120, 220, as well as entropy encoding 170 and entropy decoding 204, may equally be used for still images.
Those skilled in the art will appreciate that the "blocks" ("units") of the various figures (methods and apparatus) represent or describe the functions of the embodiments of the invention (rather than necessarily separate "units" in hardware or software), and thus equally describe the functions or features of the apparatus embodiments as well as the method embodiments (unit equivalent steps).
The term "unit" is merely used to illustrate the function of an embodiment of the encoder/decoder and is not intended to limit the application.
In several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the described apparatus embodiments are merely exemplary. For example, the unit division is merely a logical function division, and may be another division when actually implemented. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be implemented through some interfaces. An indirect coupling or communication connection between devices or units may be made through electrical, mechanical, or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional units in embodiments of the invention may be integrated into one processing unit, or each unit may physically reside in a single unit, or two or more units may be integrated into one unit.
Embodiments of the invention may also include an apparatus, e.g., an encoder and/or decoder, comprising processing circuitry to perform any of the methods and/or processes described herein.
Embodiments of encoder 100 and/or decoder 200 may be implemented as hardware, firmware, software, or any combination thereof. For example, the functions of the encoder 100/encoding method and of the decoder 200/decoding method may be performed by processing circuitry, with or without firmware or software, e.g., a processor, a microcontroller, a digital signal processor (digital signal processor, DSP), a field programmable gate array (field programmable gate array, FPGA), an application-specific integrated circuit (application-specific integrated circuit, ASIC), or the like.
The functions of the encoder 100 (and corresponding encoding method 100) and/or the decoder 200 (and corresponding decoding method 200) may be implemented by program instructions stored on a computer readable medium. The program instructions, when executed, cause a processing circuit, computer, processor, etc. to perform the steps of the encoding and/or decoding method. The computer readable medium may be any medium that stores the program, including non-transitory storage media, such as blu-ray disc, DVD, CD, USB (flash) drive, hard disk, server storage available via a network, etc.
Embodiments of the present invention include, or are, a computer program comprising program code which, when executed on a computer, performs any of the methods described herein.
Embodiments of the present invention include or are a computer-readable medium containing program code. The program code, when executed by a processor, causes a computer system to perform any of the methods described herein.
List of reference numerals
FIG. 1
100. Encoder
103. Image block
102. Input (e.g. input port, input interface)
104. Residual calculation [ Unit or step ]
105. Residual block
106. Transformation (e.g. also including scaling) [ units or steps ]
107. Transform coefficients
108. Quantification [ Unit or step ]
109. Quantization coefficient
110. Inverse quantization [ Unit or step ]
111. Dequantizing coefficients
112. Inverse transforms (e.g. also including scaling) [ units or steps ]
113. Inverse transform block
114. Reconstruction [ Unit or step ]
115. Reconstruction block
116 (line) buffer [Unit or step]
117. Reference pixel
120. Loop filter [ element or step ]
121. Filtering block
130. Decoding image buffer (decoded picture buffer, DPB) [ Unit or step ]
142. Inter-frame estimation (inter-estimation/inter picture estimation) [ Unit or step ]
143. Inter estimation parameters (e.g., reference picture/reference picture index, motion vector/offset)
144. Inter prediction (inter prediction/inter picture prediction) [ Unit or step ]
145. Inter prediction block
152. Intra estimation (intra-estimation/intra picture estimation) [ Unit or step ]
153. Intra prediction parameters (e.g., intra prediction modes)
154. Intra prediction (intra prediction/intra frame/picture prediction) [ Unit or step ]
155. Intra prediction block
162. Mode selection [ Unit or step ]
165. Prediction block (inter prediction block 145 or intra prediction block 155)
170. Entropy coding [ Unit or step ]
171. Encoding image data (e.g., a code stream)
172. Output terminal (output port, output interface)
231. Decoding an image
FIG. 2
200. Decoder
171. Encoding image data (e.g., a code stream)
202. Input end (Port/interface)
204. Entropy decoding
209. Quantization coefficient
210. Inverse quantization
211. Dequantizing coefficients
212. Inverse transform (zoom)
213. Inverse transform block
214. Reconstruction (Unit)
215. Reconstruction block
216 (line) buffer
217. Reference pixel
220. Loop filter (in-loop filter)
221. Filtering block
230. Decoding image buffer (decoded picture buffer, DPB)
231. Decoding an image
232. Output end (Port/interface)
244. Inter prediction (inter prediction/inter frame/picture prediction)
245. Inter prediction block
254. Intra prediction (intra prediction/intra frame/picture prediction)
255. Intra prediction block
260. Mode selection
265. Prediction block (inter prediction block 245 or intra prediction block 255)
FIG. 3
300. Coding system
310. Source device
312. Image source
313 (original) image data
314. Preprocessor/preprocessing unit
315. Preprocessing image data
318. Communication unit/interface
320. Destination device
322. Communication unit/interface
326. Post processor/post processing unit
327. Post-processing image data
328. Display device/unit
330. Transmitting/receiving/communicating (encoding) image data
FIG. 4
400. Prediction block
401. Prediction pixel
402. Filter
403. Reference pixel
FIG. 5
402. Filter
402a selected filter
403. Reference pixel
403a Determined reference pixel
500. Apparatus
501. Directional intra prediction modes
501a selected directional intra prediction mode
FIG. 6
600. Filter selection mechanism
601-607 Functional steps of the mechanism
FIG. 7
400. Prediction block
401. Prediction pixel
403. Reference pixel
403a Determined reference pixel
700. Transposed reference pixels
700a transposed reference pixel
FIG. 8
800. Method for intra-frame prediction of a prediction block
801. Intra prediction mode selection step
802. Filter selection step
803. Step of determining a reference pixel for a given predicted pixel
804. Step of applying the selected filter to the reference pixel

Claims (16)

1. A method for intra-predicting a prediction block of a video image, the method comprising:
selecting a directional intra-prediction mode from a set of directional intra-prediction modes, wherein each directional intra-prediction mode corresponds to a different intra-prediction angle, the set of directional intra-prediction modes comprising an extended directional intra-prediction mode, the extended directional intra-prediction mode being applicable only to rectangular blocks;
determining a reference pixel for a given prediction pixel of the prediction block from a set of reference pixels according to the selected directional intra-prediction mode;
selecting a filter from a set of filters according to the selected directional intra-prediction mode;
the selected filter is applied to the determined reference pixel.
2. The method according to claim 1, characterized in that the method comprises:
Determining the intra-prediction angle corresponding to the selected directional intra-prediction mode;
and determining whether to apply a filter to the determined reference pixel according to the determined intra-frame prediction angle.
3. The method of claim 1, wherein the extended directional intra-prediction mode comprises a mode related to an acute angle of intra-prediction; or the extended directional intra-prediction mode includes an intra-prediction mode greater than 66.
4. The method of claim 1, wherein the set of filters comprises filters having a filter length that spans 1, 3, or 5 adjacent reference pixels.
5. The method of claim 4, wherein a filter length is determined based on the selected directional intra-prediction mode;
determining that no filter is applied to the determined reference pixel when the filter length is 1; or determining to apply a filter to the determined reference pixel when the filter length is 3.
6. The method of claim 1, wherein, when applied to the determined reference pixel, each filter in the set of filters performs a different smoothing operation on the determined reference pixel and one or more neighboring reference pixels.
7. A method according to any one of claims 1 to 3, characterized in that:
selecting the same filter for each directional intra-prediction mode selected from the first subset of directional intra-prediction modes;
a different filter is selected for each directional intra-prediction mode selected from the second subset of directional intra-prediction modes.
8. A method according to any one of claims 1 to 3, wherein the method further comprises:
intra-predicting the given predicted pixel directly from the determined reference pixel, wherein
The selected filter is applied to the determined reference pixel before or during intra prediction of the given predicted pixel.
9. A method according to any one of claims 1 to 3, wherein the method further comprises:
generating transposed reference pixels by interpolating the determined reference pixels according to the selected intra-prediction mode;
intra-predicting the given predicted pixel from the transposed reference pixel, wherein,
the selected filter is applied to the determined reference pixels before or during generation of the transposed reference pixels.
10. An apparatus for intra-predicting a predicted block of a video image, the apparatus being configured to:
selecting a directional intra-prediction mode from a set of directional intra-prediction modes, wherein each directional intra-prediction mode corresponds to a different intra-prediction angle, the set of directional intra-prediction modes comprising an extended directional intra-prediction mode, the extended directional intra-prediction mode being applicable only to rectangular blocks;
determining a reference pixel for a given prediction pixel of the prediction block from a set of reference pixels according to the selected directional intra-prediction mode;
selecting a filter from a set of filters according to the selected directional intra-prediction mode;
the selected filter is applied to the determined reference pixel.
11. The apparatus of claim 10, wherein the apparatus is configured to:
determining the intra-prediction angle corresponding to the selected directional intra-prediction mode;
and determining whether to apply a filter to the determined reference pixel according to the determined intra-frame prediction angle.
12. The apparatus of claim 10, wherein the extended directional intra-prediction mode comprises a mode related to an acute angle of intra-prediction; or the extended directional intra-prediction mode includes an intra-prediction mode greater than 66.
13. The apparatus of claim 10, wherein the filter set comprises filters having a filter length that spans 1, 3, or 5 adjacent reference pixels.
14. The apparatus of claim 13, wherein the apparatus is configured to:
determining a filter length according to the selected directional intra-prediction mode;
determining that no filter is applied to the determined reference pixel when the filter length is 1; or determining to apply a filter to the determined reference pixel when the filter length is 3.
15. The apparatus according to any one of claims 10 to 12, characterized in that the apparatus is adapted to:
selecting the same filter for each directional intra-prediction mode selected from the first subset of directional intra-prediction modes;
a different filter is selected for each directional intra-prediction mode selected from the second subset of directional intra-prediction modes.
16. A computer storage medium having stored therein computer program code which, when run on a processor, causes the processor to perform the method of any of claims 1-9.
CN202211595363.5A 2018-06-29 2018-06-29 Apparatus and method for intra prediction Active CN115988202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211595363.5A CN115988202B (en) 2018-06-29 2018-06-29 Apparatus and method for intra prediction

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN202211595363.5A CN115988202B (en) 2018-06-29 2018-06-29 Apparatus and method for intra prediction
CN201880094557.2A CN112262574A (en) 2018-06-29 2018-06-29 Apparatus and method for intra prediction
PCT/RU2018/000432 WO2020005093A1 (en) 2018-06-29 2018-06-29 Device and method for intra-prediction

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201880094557.2A Division CN112262574A (en) 2018-06-29 2018-06-29 Apparatus and method for intra prediction

Publications (2)

Publication Number Publication Date
CN115988202A CN115988202A (en) 2023-04-18
CN115988202B true CN115988202B (en) 2023-11-03

Family

ID=63080454

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211595363.5A Active CN115988202B (en) 2018-06-29 2018-06-29 Apparatus and method for intra prediction
CN201880094557.2A Pending CN112262574A (en) 2018-06-29 2018-06-29 Apparatus and method for intra prediction

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201880094557.2A Pending CN112262574A (en) 2018-06-29 2018-06-29 Apparatus and method for intra prediction

Country Status (12)

Country Link
US (2) US11563939B2 (en)
EP (1) EP3804313A1 (en)
JP (2) JP7293263B2 (en)
KR (2) KR20210015963A (en)
CN (2) CN115988202B (en)
AU (2) AU2018429284B2 (en)
BR (1) BR112020026879A2 (en)
CA (1) CA3104611C (en)
MX (1) MX2021000168A (en)
NZ (1) NZ771864A (en)
SG (1) SG11202012838SA (en)
WO (1) WO2020005093A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107197257B (en) * 2010-12-08 2020-09-08 Lg 电子株式会社 Intra prediction method performed by encoding apparatus and decoding apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102550026A (en) * 2009-10-05 2012-07-04 汤姆森特许公司 Methods and apparatus for adaptive filtering of prediction pixels for chroma components in video encoding and decoding
WO2013125171A1 (en) * 2012-02-20 2013-08-29 日本電気株式会社 Intra-prediction mode determination device, method, and program recording medium
WO2018067714A1 (en) * 2016-10-04 2018-04-12 Qualcomm Incorporated Variable number of intra modes for video coding
CN108028923A (en) * 2015-09-10 2018-05-11 Lg电子株式会社 Intra-frame prediction method and equipment in video coding system
CN108028931A (en) * 2015-09-06 2018-05-11 联发科技股份有限公司 Method and device for the adaptive inter prediction of coding and decoding video
EP3334067A1 (en) * 2016-12-12 2018-06-13 Alcatel Lucent Synchronization device and system for communication networks

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10142630B2 (en) * 2010-12-10 2018-11-27 Texas Instruments Incorporated Mode adaptive intra prediction smoothing in video coding
CN102857764B (en) 2011-07-01 2016-03-09 华为技术有限公司 The method and apparatus of intra prediction mode process
JP2013110502A (en) * 2011-11-18 2013-06-06 Sony Corp Image processing apparatus and image processing method
GB2501535A (en) * 2012-04-26 2013-10-30 Sony Corp Chrominance Processing in High Efficiency Video Codecs
KR20150090057A (en) * 2012-10-08 2015-08-05 엘지전자 주식회사 Multiview video signal encoding method and decoding method, and device therefor
JP6323459B2 (en) * 2013-09-30 2018-05-16 日本電気株式会社 Intra prediction circuit
US11323218B2 (en) 2016-02-03 2022-05-03 Samsung Electronics Co., Ltd Method and apparatus for configuring reference signal and for generating channel information in mobile communication system
KR20180109927A (en) * 2016-02-12 2018-10-08 톰슨 라이센싱 Method and device for intra-prediction encoding / decoding a coding unit comprising picture data, which depend on a prediction tree and a transform tree
EP4030754A1 (en) 2016-05-02 2022-07-20 Industry-University Cooperation Foundation Hanyang University Image encoding/decoding method and computer-readable medium
CN109479129B (en) * 2016-07-18 2023-07-11 韩国电子通信研究院 Image encoding/decoding method and apparatus, and recording medium storing bit stream
WO2018037896A1 (en) * 2016-08-26 2018-03-01 シャープ株式会社 Image decoding apparatus, image encoding apparatus, image decoding method, and image encoding method
CN115733973A (en) 2016-12-23 2023-03-03 华为技术有限公司 Encoding method and device for rectangular video coding block and computer readable storage medium
US10992939B2 (en) * 2017-10-23 2021-04-27 Google Llc Directional intra-prediction coding
HUE060091T2 (en) 2017-05-31 2023-01-28 Lg Electronics Inc Method and device for performing image decoding on basis of intra prediction in image coding system
US20180376148A1 (en) * 2017-06-23 2018-12-27 Qualcomm Incorporated Combination of inter-prediction and intra-prediction in video coding
WO2019067879A1 (en) * 2017-09-28 2019-04-04 Vid Scale, Inc. Complexity reduction of overlapped block motion compensation
KR20190045885A (en) 2017-10-24 2019-05-03 주식회사 윌러스표준기술연구소 A method and an apparatus for processing a video signal
SG11202012036QA (en) * 2018-01-15 2021-01-28 Ki Baek Kim Intra prediction encoding/decoding method and device for chrominance components
US11438577B2 (en) 2018-03-25 2022-09-06 B1 Institute Of Image Technology, Inc. Image encoding/decoding method and device
JP7152503B2 (en) 2018-04-01 2022-10-12 ビー1、インスティテュート、オブ、イメージ、テクノロジー、インコーポレイテッド Video encoding/decoding method and apparatus
US11153563B2 (en) * 2019-03-12 2021-10-19 Qualcomm Incorporated Combined in-loop filters for video coding
US11153598B2 (en) * 2019-06-04 2021-10-19 Tencent America LLC Method and apparatus for video coding using a subblock-based affine motion model


Also Published As

Publication number Publication date
AU2018429284B2 (en) 2023-01-19
JP2021528919A (en) 2021-10-21
US20210120241A1 (en) 2021-04-22
NZ771864A (en) 2022-12-23
CN115988202A (en) 2023-04-18
KR20230127354A (en) 2023-08-31
CA3104611A1 (en) 2020-01-02
AU2023201715A1 (en) 2023-04-20
US11563939B2 (en) 2023-01-24
SG11202012838SA (en) 2021-01-28
CA3104611C (en) 2023-12-19
AU2018429284A1 (en) 2021-01-21
BR112020026879A2 (en) 2021-04-06
CN112262574A (en) 2021-01-22
JP2023073286A (en) 2023-05-25
JP7293263B2 (en) 2023-06-19
KR20210015963A (en) 2021-02-10
EP3804313A1 (en) 2021-04-14
MX2021000168A (en) 2021-03-25
WO2020005093A1 (en) 2020-01-02
US20230124833A1 (en) 2023-04-20

Similar Documents

Publication Publication Date Title
CN111819852B (en) Method and apparatus for residual symbol prediction in the transform domain
JP2022179505A (en) Video decoding method and video decoder
US11206398B2 (en) Device and method for intra-prediction of a prediction block of a video image
EP3741127A1 (en) Loop filter apparatus and method for video coding
AU2023278073A1 (en) An image processing device and method for performing efficient deblocking
CN114125444B (en) Encoding and decoding method and device for image filtering
US20230124833A1 (en) Device and method for intra-prediction
US11259054B2 (en) In-loop deblocking filter apparatus and method for video coding
JP7293460B2 (en) Image processing device and method for performing efficient deblocking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant