CN116648911A - Intra prediction using an enhanced interpolation filter

Intra prediction using an enhanced interpolation filter

Info

Publication number
CN116648911A
Authority
CN
China
Prior art keywords
block
filter
intra
smoothing
video
Prior art date
Legal status
Pending
Application number
CN202180084615.5A
Other languages
Chinese (zh)
Inventor
B·雷
V·塞雷金
M·卡尔切维茨
Current Assignee
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date
Filing date
Publication date
Priority claimed from US 17/645,024 (published as US20220201329A1)
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Priority claimed from PCT/US2021/073040 (published as WO2022140765A1)
Publication of CN116648911A

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Techniques for processing video data using an enhanced interpolation filter for intra prediction are described herein. For example, an apparatus may determine an intra prediction mode for predicting a block of video data. The apparatus may determine a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold. The apparatus may also intra-predict the block of video data using the determined type of the smoothing filter and the intra-prediction mode.

Description

Intra prediction using an enhanced interpolation filter
Technical Field
The present application relates to video coding (e.g., including encoding and/or decoding of video data). For example, aspects of the present application relate to systems and techniques for intra prediction using an enhanced interpolation filter.
Background
Many devices and systems allow processing and outputting video data for consumption. Digital video data includes a large amount of data to meet consumer and video provider needs. For example, consumers of video data desire the highest quality video with high fidelity, resolution, frame rate, etc. Thus, the large amount of video data required to meet these demands places a burden on the communication networks and devices that process and store the video data.
Various video coding techniques may be used to compress video data. Video encoding and decoding are performed according to one or more video coding standards. For example, video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC), Advanced Video Coding (AVC), MPEG-2 Part 2 coding (MPEG is an abbreviation for Moving Picture Experts Group), and the like, as well as proprietary video codecs/formats such as AOMedia Video 1 (AV1) developed by the Alliance for Open Media. Video coding typically uses prediction methods (e.g., inter-prediction, intra-prediction, etc.) that exploit redundancy present in video pictures or sequences. The goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality. As video services continue to evolve, coding techniques with higher coding efficiency are needed.
Disclosure of Invention
In some examples, systems and techniques are described for intra prediction using an enhanced interpolation filter that can apply variable types and smoothness based on information such as block size, intra prediction mode, and the like. According to at least one illustrative example, a method for processing video data is provided. The method comprises the following steps: determining an intra-prediction mode for predicting a block of video data; determining a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold; and intra-predicting the block of video data using the determined type of smoothing filter and the intra-prediction mode.
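By way of illustration only, the block-size test described above can be sketched as follows. The threshold value of 16 samples, the function name, and the filter labels are assumptions for this sketch and are not mandated by the description.

```python
def select_smoothing_filter(width: int, height: int, size_threshold: int = 16) -> str:
    # Compare at least one of the block width and block height to a first threshold.
    # Larger blocks receive a stronger smoothing interpolation filter (6-tap Gaussian),
    # smaller blocks a weaker one (4-tap Gaussian), as in the aspects described below.
    if width > size_threshold or height > size_threshold:
        return "6-tap Gaussian"
    return "4-tap Gaussian"
```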
In another example, an apparatus for processing video data is provided that includes at least one memory (e.g., configured to store data, such as virtual content data, one or more images, etc.) and at least one processor (e.g., implemented in circuitry) coupled to the at least one memory. The at least one processor is configured to: determine an intra-prediction mode for predicting a block of video data; determine a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold; and intra-predict the block of video data using the determined type of smoothing filter and the intra-prediction mode.
In another example, a non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to: determining an intra-prediction mode for predicting a block of video data; determining a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold; and intra-predicting the block of video data using the determined type of smoothing filter and the intra-prediction mode.
In another example, an apparatus for processing video data is provided. The device comprises: means for determining an intra prediction mode for predicting a block of video data; means for determining a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold; and means for intra predicting the block of video data using the determined type of the smoothing filter and the intra prediction mode.
In some aspects, the process, apparatus, and computer readable medium may further comprise: using a first smoothing interpolation filter as the determined type of smoothing filter based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold; and determining reference pixels for intra prediction of the block of video data using the first smoothing interpolation filter.
In some aspects, the first smoothing interpolation filter comprises a 6-tap Gaussian filter.
In some aspects, the process, apparatus, and computer readable medium may further comprise: using a second smoothing interpolation filter as the determined type of smoothing filter based at least in part on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold; and determining reference pixels for intra prediction of the block of video data using the second smoothing interpolation filter.
In some aspects, the second smoothing interpolation filter comprises a 4-tap Gaussian filter.
In some aspects, the process, apparatus, and computer readable medium may further comprise: determining a minimum offset between an angular direction of the intra prediction mode and one of a vertical intra prediction mode and a horizontal intra prediction mode; and determining the type of smoothing filter for the block of video data based on comparing the determined minimum offset to a second threshold.
In some aspects, the process, apparatus, and computer readable medium may further comprise: determining that a low-pass filter is the type of smoothing filter based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra prediction mode is an integer angle mode associated with integer-valued reference pixel locations.
In some aspects, the low-pass filter performs reference pixel smoothing without interpolation, the low-pass filter comprising a [1 2 1] filter.
In some aspects, the process, apparatus, and computer readable medium may further comprise: determining that a Gaussian filter is the type of smoothing filter based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra-prediction mode is a fractional angle mode associated with fractional-valued reference pixel locations.
In some aspects, the Gaussian filter performs smoothing interpolation without reference pixel smoothing.
In some aspects, the Gaussian filter comprises a 6-tap Gaussian filter based on a determination that at least one of the width of the block and the height of the block is greater than the first threshold.
In some aspects, the Gaussian filter comprises a 4-tap Gaussian filter based on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold.
In some aspects, the process, apparatus, and computer readable medium may further comprise, based at least in part on a determination that the determined minimum offset is not greater than the second threshold: using an interpolation filter as the determined type of smoothing filter, wherein the interpolation filter comprises a 4-tap cubic filter; and intra-predicting the block of video data using the interpolation filter without applying reference pixel smoothing.
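The combined decision over the minimum angular offset, the mode type, and the block size described in the aspects above can be sketched as follows. This is a non-normative example; the function name, string labels, and default thresholds are assumptions.

```python
def select_filter(min_offset: int, offset_threshold: int, is_integer_angle: bool,
                  width: int, height: int, size_threshold: int = 16) -> str:
    if min_offset > offset_threshold:
        if is_integer_angle:
            # Integer-slope mode: smooth the reference pixels directly, no interpolation.
            return "[1 2 1] low-pass"
        # Fractional mode: smoothing interpolation filter, strength switched on block size.
        if width > size_threshold or height > size_threshold:
            return "6-tap Gaussian"
        return "4-tap Gaussian"
    # Mode close to vertical/horizontal: interpolate without reference pixel smoothing.
    return "4-tap cubic"
```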
In some aspects, the process, apparatus, and computer readable medium may further comprise: determining that a low-pass filter is the type of smoothing filter based at least in part on the determination that the intra prediction mode is an integer angle mode and the determination that the minimum offset is greater than the second threshold.
In some aspects, the process, apparatus, and computer readable medium may further comprise: performing reference pixel smoothing using a large-tap low-pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, wherein the large-tap low-pass filter applies a greater degree of reference pixel smoothing than a small-tap low-pass filter.
In some aspects, the process, apparatus, and computer readable medium may further comprise: performing reference pixel smoothing using a small-tap low-pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold, wherein the small-tap low-pass filter applies a lesser degree of reference pixel smoothing than a large-tap low-pass filter.
In some aspects, the process, apparatus, and computer readable medium may further comprise: the intra-prediction mode is determined to be an integer angle mode based at least in part on comparing a slope of the intra-prediction mode to one or more pixel locations determined from the width of the block and the height of the block.
In some aspects, the process, apparatus, and computer readable medium may further comprise: determining that an offset between the angular direction of the intra-prediction mode and the vertical intra-prediction mode or the horizontal intra-prediction mode is less than a second threshold; and intra-predicting the block of video data using a cubic interpolation filter based on determining that the offset between the angular direction of the intra-prediction mode and the vertical intra-prediction mode or the horizontal intra-prediction mode is less than the second threshold.
In some aspects, the process, apparatus, and computer readable medium may further comprise: performing reference line extension using a weak interpolation filter, wherein: the reference line extension using the weak interpolation filter is performed before performing the intra prediction using the cubic interpolation filter; and the cubic interpolation filter has a higher cutoff frequency than the weak interpolation filter and applies a greater degree of smoothing than the weak interpolation filter.
In some aspects, the weak interpolation filter comprises a 4-tap sinc-based interpolation filter and a 6-bit, 4-tap interpolation filter.
In some aspects, the type of smoothing filter is signaled in the video bitstream.
In some aspects, the type of smoothing filter is signaled for each prediction block, coding tree unit (CTU), slice, or sequence.
In some aspects, the process, apparatus, and computer readable medium may further comprise: the type of smoothing filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in the video bitstream.
In some aspects, the process, apparatus, and computer readable medium may further comprise: determining a residual data block for the block of video data; and decoding the block of video data using the residual block of data and a predictive block determined based on the intra prediction of the block of video data.
In some aspects, the process, apparatus, and computer readable medium may further comprise: an encoded video bitstream is generated that includes information associated with the block of video data.
In some aspects, the process, apparatus, and computer readable medium may further comprise: the encoded video bitstream is stored (e.g., in the at least one memory of the apparatus).
In some aspects, the process, apparatus, and computer readable medium may further comprise: the encoded video bitstream is transmitted (e.g., using a transmitter of the apparatus).
In some aspects, each of the apparatuses described above can be, or can be part of, the following: a mobile device (e.g., a mobile phone or so-called "smart phone", tablet computer, or other type of mobile device), a network-connected wearable device, an extended reality device (e.g., a virtual reality (VR) device, an augmented reality (AR) device, or a mixed reality (MR) device), a personal computer, a laptop computer, a server computer (e.g., a video server or other server device), a television, a vehicle (or a computing device or system of a vehicle), a camera (e.g., a digital camera, an Internet Protocol (IP) camera, etc.), a multi-camera system, a robotics device or system, an aviation device or system, or another device. In some aspects, each apparatus may include at least one camera for capturing one or more images or video frames. For example, each apparatus may include a camera (e.g., an RGB camera) or multiple cameras for capturing one or more images and/or one or more videos including video frames. In some aspects, each apparatus may include a display for displaying one or more images, videos, notifications, or other displayable data. In some aspects, each apparatus may include a transmitter configured to transmit one or more video frames and/or syntax data over a transmission medium to at least one device. In some aspects, each apparatus may include one or more sensors.
This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all of the accompanying drawings, and each claim.
The foregoing, along with other features and embodiments, will become more apparent when reference is made to the following description, claims and accompanying drawings.
Drawings
Illustrative embodiments of the application are described in detail below with reference to the following drawings:
fig. 1 is a block diagram illustrating an example of an encoding device and a decoding device according to some examples;
FIG. 2A is a diagram illustrating an example of an angular prediction mode according to some examples;
fig. 2B is a diagram illustrating an example of directional intra-prediction modes in Versatile Video Coding (VVC) according to some examples;
FIG. 3 is a diagram illustrating an example of a Mode Dependent Intra Smoothing (MDIS) process according to some examples;
FIG. 4 is a diagram illustrating an example of a reference line extension according to some examples;
FIG. 5 is a diagram illustrating an example of switchable Gaussian filtering based on one or more of block size and intra-prediction modes, according to some examples;
FIG. 6 is a flow chart illustrating an example of a process for intra prediction using an enhanced interpolation filter according to some examples;
FIG. 7 is a block diagram illustrating an example video encoding device according to some examples; and
fig. 8 is a block diagram illustrating an example video decoding device according to some examples.
Detailed Description
Certain aspects and embodiments of the disclosure are provided below. Some of these aspects and embodiments may be applied independently and some of them may be applied in combination, as will be apparent to those skilled in the art. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the application. It may be evident, however, that the various embodiments may be practiced without these specific details. The drawings and description are not intended to be limiting.
The following description merely provides exemplary embodiments and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It being understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the application as set forth in the appended claims.
Digital video data may include large amounts of data, particularly as the demand for high quality video data continues to grow. For example, consumers of video data often desire video of increasingly higher quality, with high fidelity, resolution, frame rate, and the like. However, the large amount of video data required to meet these demands places a significant burden on the communication network and the devices that process and store the video data.
Video coding devices implement video compression techniques to efficiently encode and decode video data. Video compression techniques may include applying different prediction modes, including spatial prediction (e.g., intra-frame prediction), temporal prediction (e.g., inter-frame prediction), inter-layer prediction (across different layers of video data), and/or other prediction techniques to reduce or eliminate redundancy inherent in video sequences. A video encoder may divide each picture of the original video sequence into rectangular regions called video blocks or coding units (described in more detail below). These video blocks may be encoded using a particular prediction mode.
The video blocks may be divided into groups of one or more smaller blocks in one or more ways. The blocks may include coding tree blocks, prediction blocks, transform blocks, and/or other suitable blocks. Unless otherwise specified, a general reference to a "block" may refer to such video block (e.g., a coding tree block, a codec block, a prediction block, a transform block, or other suitable block or sub-block, as will be appreciated by one of ordinary skill in the art).
For inter prediction modes, a video encoder may search for a block similar to the block being encoded in a frame (or picture) located at another temporal location, referred to as a reference frame or reference picture. The video encoder may limit the search to a certain spatial displacement from the block to be encoded. A two-dimensional (2D) motion vector comprising a horizontal displacement component and a vertical displacement component may be used to locate the best match. For intra prediction modes, a video encoder may use spatial prediction techniques based on data from previously encoded neighboring blocks within the same picture to form the predicted block.
The video encoder may determine a prediction error. For example, the prediction error may be determined as the difference between the pixel values in the block being encoded and the pixel values in the predicted block. The prediction error may also be referred to as a residual. The video encoder may also apply a transform (e.g., a Discrete Cosine Transform (DCT) or other suitable transform) to the prediction error to generate transform coefficients. After transformation, the video encoder may quantize the transform coefficients. The quantized transform coefficients and motion vectors may be represented using syntax elements and, together with control information, form a coded representation of the video sequence. In some examples, the video encoder may entropy encode the quantized transform coefficients and/or syntax elements, thereby further reducing the number of bits required for their representation.
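As a minimal sketch of the residual path just described, the following example computes a residual, transforms it, and quantizes the result. The transform and quantizer callables are placeholders for illustration, not a specific codec's implementation.

```python
def encode_residual(original, prediction, transform, quantize):
    # Prediction error (residual) = original block minus predicted block,
    # with blocks represented as lists of rows of sample values.
    residual = [[o - p for o, p in zip(row_o, row_p)]
                for row_o, row_p in zip(original, prediction)]
    # Transform the residual (e.g., a DCT) and quantize the transform coefficients.
    return quantize(transform(residual))
```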
After entropy decoding and dequantizing the received bitstream, the video decoder may use the syntax elements and control information described above to construct predictive data (e.g., predictive blocks) for decoding the current frame. For example, the video decoder may add the predicted block and the compressed prediction error. The video decoder may determine the compressed prediction error by weighting the transform basis functions using the quantized coefficients. The difference between the reconstructed frame and the original frame is referred to as the reconstruction error.
Video encoding and decoding may be performed according to a specific video coding standard. Examples of video coding standards include, but are not limited to, ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, Advanced Video Coding (AVC) or ITU-T H.264 (including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions), High Efficiency Video Coding (HEVC) or ITU-T H.265 (including its range and screen content coding, 3D video coding (3D-HEVC), multiview (MV-HEVC), and scalable (SHVC) extensions), Versatile Video Coding (VVC) or ITU-T H.266 and its extensions, VP9, Alliance for Open Media (AOMedia) Video 1 (AV1), Essential Video Coding (EVC), and the like.
As described above, a video encoder may divide each picture of an original video sequence into one or more smaller blocks or rectangular regions, which may then be encoded using, for example, intra prediction (or intra-frame prediction) to remove spatial redundancy inherent to the original video sequence. If a block is encoded in an intra prediction mode, a prediction block is formed based on previously encoded and reconstructed blocks that can be used to form a prediction reference in both the video encoder and the video decoder. For example, the pixel values of neighboring previously encoded blocks may be used to determine a spatial prediction of the pixel values inside the current block (e.g., the block currently being encoded or decoded). These pixel values are used as reference pixels. The reference pixels may be organized into one or more reference pixel lines and/or reference pixel groups. In some examples, intra prediction may be applied to both luma and chroma components of a block.
Different spatial prediction techniques may be provided using a variety of different intra-prediction modes to form a predicted reference or predicted block based on data from previously encoded neighboring blocks (e.g., from reference pixels) within the same picture. Intra-prediction modes may include planar and DC modes and/or directional intra-prediction modes (also referred to as "normal intra-prediction modes"). In some examples, a single planar intra-prediction mode and a single DC intra-prediction mode may be used along with multiple directional intra-prediction modes. Intra-prediction modes describe different variations or methods for calculating pixel values in the region being encoded based on reference pixel values. In an illustrative example, the HEVC standard provides 33 directional intra-prediction modes. In another illustrative example, VVC and/or VVC Test Model 5 (VTM5) extends the HEVC directional intra-prediction modes to provide a total of 93 directional intra-prediction modes.
At the video decoder, the selection of the intra-prediction mode for each encoded block (e.g., the selection of the intra-prediction mode by the video encoder when generating the encoded block) may be determined (e.g., derived) by the decoder or may be signaled to the video decoder. For example, in some cases, the intra-prediction modes between neighboring blocks may be correlated (e.g., if intra-prediction mode 2 is used to predict two neighboring previously encoded blocks, then the best intra-prediction mode for the current block is also likely to be intra-prediction mode 2). In some examples, for each current block, the video encoder and video decoder may calculate the most probable intra prediction mode. The video encoder may also signal the intra-prediction mode to the video decoder (e.g., using a flag, a mode parameter, a mode selector, etc.).
In the current VVC standard, 93 directional intra-prediction modes are provided, as described previously. Each intra-prediction mode is associated with a different angular direction such that the intra-prediction modes are unique and non-overlapping. The directional intra-prediction modes may be classified as integer angle modes or fractional (non-integer angle) modes. For a given block of video data, an integer-angle intra-prediction mode has reference pixels at integer locations, e.g., the integer-angle intra-prediction mode has a slope that passes through the locations of reference pixels located at the perimeter of the block currently being coded. In contrast, a fractional intra-prediction mode does not have a reference pixel at an integer position, but rather has a slope that passes through a point somewhere between two adjacent reference pixels (e.g., a slope pointing to fractional position i+f, where i is the integer part and f is the fractional part, passes between reference pixel i and reference pixel i+1).
According to the VVC standard, one or more smoothing filters and/or operations may be applied to the reference pixels based on the intra prediction mode. By smoothing or filtering the reference pixels, a more accurate intra prediction result can be obtained, since the intra prediction result is calculated from the smoothed reference pixels. In some examples, reference pixel smoothing may be performed for both a fractional intra prediction mode and an integer (e.g., integer slope) intra prediction mode. In addition to the smoothing filter for reference pixel smoothing, the VVC standard also specifies the use of one or more interpolation filters. In some examples, smoothing may be performed by directly smoothing the reference pixels. In some examples, the smoothing operation may be performed in combination with or in conjunction with an interpolation operation (e.g., by applying a smoothing interpolation filter).
For example, interpolation filters may be used to interpolate for fractional intra prediction modes. The fractional intra prediction mode has a non-integer value slope and is therefore associated with fractional reference pixel locations (e.g., at locations between neighboring reference pixels). Thus, intra prediction for the fractional intra prediction mode may interpolate between the values of adjacent reference pixels to calculate interpolated values for fractional reference pixel positions. In some scenarios, a majority of the directional intra-prediction modes may be fractional (e.g., non-integer) modes. For example, in the VVC standard, intra-prediction modes-14, -12, -10, -6, 2, 18, 34, 50, 66, 72, 76, 78, and 80 may be integer intra-prediction modes (also referred to as "integer slope modes"), with the remaining modes of the 93 directional intra-prediction modes being fractional intra-prediction modes.
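For illustration, the integer-slope test implied by the list above can be written directly as a set-membership check, using the VVC mode numbering described in this paragraph. The constant and function names are assumptions of the example.

```python
# Directional modes with integer-valued slopes, per the list above.
INTEGER_SLOPE_MODES = {-14, -12, -10, -6, 2, 18, 34, 50, 66, 72, 76, 78, 80}

def is_integer_slope_mode(mode: int) -> bool:
    # Fractional modes (all remaining directional modes) require interpolation
    # between adjacent reference pixels; integer-slope modes do not.
    return mode in INTEGER_SLOPE_MODES
```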
The VVC standard specifies the use of a fixed smoothness for all block sizes. For example, according to the VVC standard, a coding device (e.g., a video encoding device and/or a video decoding device) may use a 4-tap Gaussian interpolation filter and/or a low-pass filter for all block sizes. In some cases, using a fixed smoothness for all block sizes (e.g., using a 4-tap Gaussian interpolation filter and/or a [1 2 1] low-pass filter for all block sizes) may result in reduced intra-prediction performance. For example, larger block sizes (e.g., blocks having a width and/or height of 16 or more samples) may benefit from higher smoothness than smaller block sizes (e.g., blocks having a width and/or height of less than 16 samples). When intra prediction is performed according to the VVC standard, both large and small block sizes may be encountered, because the block partitioning scheme in VVC allows for different block sizes based on different inputs, parameters, and other analysis factors. In some cases, a larger block size may be associated with portions of the original video sequence image that already include relatively smooth edges and/or a relatively small number of features. A small block size may be associated with a portion of the original video sequence image that contains a relatively high number of features, directions, etc.
Because the creation of larger block sizes is typically associated with the presence of relatively smooth video data within a block, in some examples, larger block size intra-prediction may benefit from applying higher smoothness, while smaller block size intra-prediction may benefit from applying lower smoothness.
As described in greater detail herein, systems, devices, methods, and computer-readable media (collectively, "systems and techniques") are described for providing improved intra prediction. For example, as described in more detail herein, the systems and techniques may perform intra prediction using multiple smoothing and/or interpolation filters, each having a different degree of smoothing and/or filtering. According to some aspects, the systems and techniques may include selecting one or more smoothing and interpolation filters (and an associated type of smoothing and/or associated smoothness) based on the size of the block currently being coded. For example, one or more of the width of the block and the height of the block may be compared to a predetermined threshold, where smaller blocks (e.g., blocks having a width and/or height less than the threshold) receive a different degree of smoothness than larger blocks (e.g., blocks having a width and/or height greater than the threshold).
In some examples, the smoothing and/or interpolation filter may additionally or alternatively be selected based on an intra prediction mode being used for a picture or portion of a picture (e.g., block, slice, etc.). The relationship between a particular intra prediction mode and a smoothing or interpolation filter may be determined in advance and/or in real-time (e.g., when a picture, block, slice, etc. is being encoded or decoded). In an illustrative example, the intra-prediction mode of the current decoded block may be compared to a vertical intra-prediction mode and a horizontal intra-prediction mode in order to determine a minimum distance (e.g., an angular distance or offset) between the intra-prediction mode of the current block and one of the vertical and horizontal intra-prediction modes. The minimum distance may be compared to a predetermined threshold (defined in the VVC standard in some examples) to determine whether smoothing and/or filtering should be applied to the current decoded block. In some examples, variable smoothing of reference pixels with block level switching may provide enhanced intra prediction, as described herein below in greater depth.
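A sketch of the minimum-offset computation described in this example follows. The horizontal and vertical mode indices (18 and 50) correspond to the VVC directional mode numbering and, together with the function names, are assumptions of this sketch; the threshold is left as a parameter.

```python
HORIZONTAL_MODE = 18  # pure horizontal directional mode (VVC numbering, assumed here)
VERTICAL_MODE = 50    # pure vertical directional mode (VVC numbering, assumed here)

def min_offset_from_h_or_v(pred_mode: int) -> int:
    # Minimum angular offset (in mode indices) from either the horizontal
    # or the vertical intra prediction mode.
    return min(abs(pred_mode - HORIZONTAL_MODE), abs(pred_mode - VERTICAL_MODE))

def smoothing_enabled(pred_mode: int, threshold: int) -> bool:
    # Smoothing and/or filtering is applied only when the mode is far enough
    # from the horizontal and vertical modes.
    return min_offset_from_h_or_v(pred_mode) > threshold
```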
Further details regarding the systems and techniques will be described with reference to the accompanying drawings.
Fig. 1 is a block diagram illustrating an example of a system 100 including an encoding device 104 and a decoding device 112. The encoding device 104 may be part of a source device and the decoding device 112 may be part of a receiving device. The source device and/or the receiving device may include an electronic device such as a mobile or stationary telephone handset (e.g., smart phone, cellular phone, etc.), desktop computer, laptop or notebook computer, tablet computer, set-top box, television, camera, display device, digital media player, video game console, video streaming device, internet Protocol (IP) camera, or any other suitable electronic device. In some examples, the source device and the receiving device may include one or more wireless transceivers for wireless communications. The codec techniques described herein may be applied to video codecs in a variety of multimedia applications, including streaming video transmission (e.g., over the internet), television broadcasting or transmission, encoding digital video stored on a data storage medium, decoding digital video stored on a data storage medium, or other applications. As used herein, the term coding may refer to encoding and/or decoding. In some examples, system 100 may support unidirectional or bidirectional video transmission to support, for example, video conferencing, video streaming, video playback, video broadcasting, gaming, and/or video telephony.
The encoding device 104 (or encoder) may be used to encode video data using a video coding standard, format, codec, or protocol to generate an encoded video bitstream. Examples of video coding standards and formats/codecs include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions, High Efficiency Video Coding (HEVC) or ITU-T H.265, and Versatile Video Coding (VVC) or ITU-T H.266. Various extensions to HEVC exist to handle multi-layer video coding, including the range and screen content coding extensions, 3D video coding (3D-HEVC), the multiview extension (MV-HEVC), and the scalable extension (SHVC). HEVC and its extensions have been developed by the Joint Collaborative Team on Video Coding (JCT-VC) and the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). VP9, AOMedia Video 1 (AV1) developed by the Alliance for Open Media (AOMedia), and Essential Video Coding (EVC) are other video coding standards to which the techniques described herein may be applied.
VVC is the latest video coding standard developed by the Joint Video Experts Team (JVET) of ITU-T and ISO/IEC to achieve substantial compression capability beyond HEVC for a broad range of applications. The VVC specification was finalized in July 2020 and is published by ITU-T and ISO/IEC. The VVC specification specifies the normative bitstream and picture formats, high level syntax (HLS) and coding unit level syntax, the parsing process, the decoding process, and the like. VVC also specifies profile/tier/level (PTL) restrictions, the byte stream format, the hypothetical reference decoder, and supplemental enhancement information (SEI) in annexes.
The systems and techniques described herein may be applied to any existing video codec (e.g., VVC, HEVC, AVC or other suitable existing video codec) and/or may be an efficient codec tool for any video codec standard being developed and/or future video codec standards. For example, examples described herein may be performed using a video codec such as VVC, HEVC, AVC and/or extensions thereof. However, the techniques and systems described herein may also be applicable to other codec standards, codecs, or formats, such as MPEG, JPEG (or other codec standards for still images), VP9, AV1, extensions thereof, or other suitable codec standards that have been or have not been available or developed. For example, in some examples, the encoding device 104 and/or the decoding device 112 may operate in accordance with a proprietary video codec/format (e.g., AV1, an extension of AV1, and/or a subsequent version of AV1 (e.g., AV 2)) or other proprietary format or industry standard. Thus, while the techniques and systems described herein may be described with reference to a particular video codec standard, one of ordinary skill in the art will understand that the description should not be construed as applicable to only that particular standard.
Referring to fig. 1, a video source 102 may provide video data to an encoding device 104. The video source 102 may be part of a source device or may be part of a device other than a source device. Video source 102 may include a video capture device (e.g., video camera, camera phone, video phone, etc.), a video archive containing stored video, a video server or content provider providing video data, a video feed interface receiving video from a video server or content provider, a computer graphics system for generating computer graphics video data, a combination of such sources, or any other suitable video source.
Video data from the video source 102 may include one or more input pictures or frames. A picture or frame is a still image that, in some cases, is part of a video. In some examples, the data from the video source 102 may be a still image that is not part of a video. In HEVC, VVC, and other video coding specifications, a video sequence may include a series of pictures. A picture may include three sample arrays, denoted SL, SCb, and SCr, respectively. SL is a two-dimensional array of luma samples, SCb is a two-dimensional array of Cb chroma samples, and SCr is a two-dimensional array of Cr chroma samples. Chroma samples may also be referred to herein as "chroma" samples. A pixel may refer to all three components (luma and chroma samples) for a given location in an array of a picture. In other cases, a picture may be monochrome and may include only an array of luma samples, in which case the terms pixel and sample may be used interchangeably. With respect to example techniques described herein that refer to individual samples for illustrative purposes, the same techniques may be applied to pixels (e.g., all three sample components for a given location in an array of a picture). With respect to example techniques described herein that refer to pixels (e.g., all three sample components for a given location in an array of a picture) for illustrative purposes, the same techniques may be applied to individual samples.
The encoder engine 106 (or encoder) of the encoding device 104 encodes the video data to produce an encoded video bitstream. In some examples, the encoded video bitstream (or "video bitstream" or "bitstream") is a series of one or more coded video sequences. A coded video sequence (CVS) includes a series of access units (AUs), starting with an AU that has a random access point picture in the base layer and with certain properties, up to and not including a next AU that has a random access point picture in the base layer and with certain properties. For example, certain properties of a random access point picture that starts a CVS may include a RASL flag (e.g., NoRaslOutputFlag) equal to 1. Otherwise, a random access point picture (with RASL flag equal to 0) does not start a CVS. An access unit (AU) includes one or more coded pictures and control information corresponding to the coded pictures that share the same output time. The coded slices of pictures are encapsulated at the bitstream level into data units called network abstraction layer (NAL) units. For example, an HEVC video bitstream may include one or more CVSs that include NAL units. Each NAL unit has a NAL unit header. In one example, the header is one byte for H.264/AVC (except for multi-layer extensions) and two bytes for HEVC. The syntax elements in the NAL unit header take the designated bits and are therefore visible to various systems and transport layers, such as transport streams, the Real-time Transport Protocol (RTP), file formats, and the like.
There are two types of NAL units in the HEVC standard, including video coding layer (VCL) NAL units and non-VCL NAL units. A VCL NAL unit includes one slice or slice segment (described below) of coded picture data, and a non-VCL NAL unit includes control information that relates to one or more coded pictures. In some cases, a NAL unit may be referred to as a packet. An HEVC AU includes VCL NAL units containing coded picture data and non-VCL NAL units (if any) corresponding to the coded picture data.
NAL units may contain a sequence of bits forming a coded representation of the video data (e.g., an encoded video bitstream, a CVS of a bitstream, or the like), such as coded representations of pictures in a video. The encoder engine 106 generates coded representations of pictures by partitioning each picture into multiple slices. A slice is independent of other slices so that the information in the slice is coded without dependency on data from other slices within the same picture. A slice includes one or more slice segments, including an independent slice segment and, if present, one or more dependent slice segments that depend on previous slice segments. The slices are then partitioned into coding tree blocks (CTBs) of luma samples and chroma samples. A CTB of luma samples and one or more CTBs of chroma samples, together with syntax for the samples, are referred to as a coding tree unit (CTU). A CTU may also be referred to as a "tree block" or a "largest coding unit" (LCU). A CTU is the basic processing unit for HEVC encoding. A CTU may be split into multiple coding units (CUs) of varying sizes. A CU contains luma and chroma sample arrays that are referred to as coding blocks (CBs).
The luma and chroma CBs may be further split into prediction blocks (PBs). A PB is a block of samples of the luma or chroma component that uses the same motion parameters for inter prediction or intra block copy prediction (when available or enabled for use). The luma PB and one or more chroma PBs, together with associated syntax, form a prediction unit (PU). For inter prediction, a set of motion parameters (e.g., one or more motion vectors, reference indices, or the like) is signaled in the bitstream for each PU and is used for inter prediction of the luma PB and the one or more chroma PBs. The motion parameters may also be referred to as motion information. A CB may also be partitioned into one or more transform blocks (TBs). A TB represents a square block of samples of a color component on which a residual transform (e.g., in some cases the same two-dimensional transform) is applied to code the prediction residual signal. A transform unit (TU) represents the TBs of luma and chroma samples and the corresponding syntax elements.
The size of a CU corresponds to the size of the coding mode and may be square in shape. For example, the size of a CU may be 8×8 samples, 16×16 samples, 32×32 samples, 64×64 samples, or any other appropriate size up to the size of the corresponding CTU. The phrase "N×N" is used herein to refer to the pixel dimensions of a video block in both the vertical and horizontal dimensions (e.g., 8 pixels × 8 pixels). The pixels in a block may be arranged in rows and columns. In some examples, the number of pixels of a block in the horizontal direction may differ from the number of pixels in the vertical direction. Syntax data associated with a CU may describe, for example, the partitioning of the CU into one or more PUs. The partitioning modes may differ between whether the CU is intra-prediction mode coded or inter-prediction mode coded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe the partitioning of the CU into one or more TUs, e.g., according to a CTU. A TU may be square or non-square in shape.
According to the HEVC standard, transformations may be performed using transform units (TUs). The TUs may be different for different CUs. The TUs may be sized based on the sizes of the PUs within a given CU. The TUs may be the same size as or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a residual quadtree (RQT). Leaf nodes of the RQT may correspond to TUs. Pixel difference values associated with the TUs may be transformed to produce transform coefficients. The transform coefficients may then be quantized by the encoder engine 106.
Once the pictures of the video data are partitioned into CUs, the encoder engine 106 predicts each PU using a prediction mode. The prediction unit or prediction block is then subtracted from the original video data to obtain residuals (described below). For each CU, a prediction mode may be signaled inside the bitstream using syntax data. A prediction mode may include intra prediction (or intra-picture prediction) or inter prediction (or inter-picture prediction). Intra prediction exploits the correlation between spatially neighboring samples within a picture. For example, using intra prediction, each PU is predicted from neighboring image data in the same picture using, for example, DC prediction to find an average value for the PU, planar prediction to fit a planar surface to the PU, directional prediction to extrapolate from neighboring data, or any other suitable type of prediction. Inter prediction uses the temporal correlation between pictures in order to derive a motion-compensated prediction for a block of image samples. For example, using inter prediction, each PU is predicted using motion compensated prediction from image data in one or more reference pictures (before or after the current picture in output order). The decision whether to code a picture area using inter-picture prediction or intra-picture prediction may be made, for example, at the CU level.
The encoder engine 106 and the decoder engine 116 (described in more detail below) may be configured to operate according to VVC. According to VVC, a video coder (such as the encoder engine 106 and/or the decoder engine 116) partitions a picture into a plurality of coding tree units (CTUs), where a CTB of luma samples and one or more CTBs of chroma samples, along with syntax for the samples, are referred to as a CTU. The video coder can partition a CTU according to a tree structure, such as a quadtree-binary tree (QTBT) structure or a multi-type tree (MTT) structure. The QTBT structure removes the concepts of multiple partition types, such as the separation between CUs, PUs, and TUs of HEVC. A QTBT structure includes two levels, including a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to coding units (CUs).
In an MTT partitioning structure, blocks may be partitioned using a quadtree partition, a binary tree partition, and one or more types of ternary tree partitions. A ternary tree partition is a partition where a block is split into three sub-blocks. In some examples, a ternary tree partition divides a block into three sub-blocks without dividing the original block through the center. The partitioning types in MTT (e.g., quadtree, binary tree, and ternary tree) may be symmetrical or asymmetrical.
When operating according to the AV1 codec, the encoding device 104 and the decoding device 112 may be configured to code video data in blocks. In AV1, the largest coding block that can be processed is called a superblock. In AV1, a superblock can be either 128×128 luma samples or 64×64 luma samples. However, in successor video coding formats (e.g., AV2), a superblock may be defined by different (e.g., larger) luma sample sizes. In some examples, a superblock is the top level of a block quadtree. The encoding device 104 may further partition a superblock into smaller coding blocks. The encoding device 104 may partition a superblock and other coding blocks into smaller blocks using square or non-square partitioning. Non-square blocks may include N/2×N, N×N/2, N/4×N, and N×N/4 blocks. The encoding device 104 and the decoding device 112 may perform separate prediction and transform processes on each of the coding blocks.
AV1 also defines tiles of video data. A tile is a rectangular array of superblocks that can be coded independently of other tiles. That is, the encoding device 104 and the decoding device 112 may encode and decode, respectively, the coding blocks within a tile without using video data from other tiles. However, the encoding device 104 and the decoding device 112 may perform filtering across tile boundaries. Tiles may be uniform or non-uniform in size. Tile-based coding may enable parallel processing and/or multithreading for encoder and decoder implementations.
In some examples, the encoding device 104 and decoding device 112 may use a single QTBT or MTT structure to represent each of the luma and chroma components, while in other examples, the video codec may use two or more QTBT or MTT structures, such as one QTBT or MTT structure for the luma component and another QTBT or MTT structure for the two chroma components (or two QTBT and/or MTT structures for the respective chroma components).
Encoding device 104 and decoding device 112 may be configured to use quadtree partitioning per HEVC, QTBT partitioning, MTT partitioning, or other partitioning structures.
In some examples, a slice type is assigned to one or more slices of a picture. Slice types include I slices, P slices, and B slices. An I slice (intra, independently decodable) is a slice of a picture that is only coded by intra prediction, and is therefore independently decodable, because the I slice only requires intra data to predict any prediction unit or prediction block of the slice. P slices (unidirectionally predicted frames) are slices of a picture that can be coded with intra prediction and unidirectional inter prediction. Each prediction unit or prediction block within a P slice is coded by intra prediction or inter prediction. When inter prediction is applied, the prediction unit or prediction block is predicted by only one reference picture, and thus the reference samples come from only one reference region of one frame. B slices (bi-predictive frames) are slices of a picture that can be coded using intra prediction as well as inter prediction (e.g., bi-prediction or uni-prediction). A prediction unit or prediction block of a B slice may be bi-directionally predicted from two reference pictures, where each picture contributes one reference region, and the sample sets of the two reference regions are weighted (e.g., with equal weights or with different weights) to produce the prediction signal of the bi-directionally predicted block. As described above, slices of one picture are coded independently. In some cases, a picture may be coded as only one slice.
As described above, intra-picture prediction exploits the correlation between spatially neighboring samples within a picture. There are multiple intra prediction modes (also referred to as "intra modes"). In some examples, the intra prediction of a luma block includes 35 modes, including a planar mode, a DC mode, and 33 angular modes (e.g., diagonal intra prediction modes and angular modes adjacent to the diagonal intra prediction modes). The encoding device 104 and/or the decoding device 112 may select, for each block, a prediction mode that minimizes the residual between the predicted block and the block to be encoded (e.g., based on a sum of absolute errors (SAE), a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), or another similarity metric). For example, the SAE may be calculated by taking the absolute difference between each pixel (or sample) in the block to be encoded and the corresponding pixel (or sample) in the predicted block used for comparison. The differences over the pixels (or samples) are summed to create a metric of block similarity, such as the L1 norm of the difference image, the Manhattan distance between two image blocks, or another calculation. Using SAE as an example, the SAE of the prediction from each intra prediction mode indicates the magnitude of the prediction error; the intra prediction mode that best matches the actual current block is the one that yields the smallest SAE.
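A compact sketch of the SAE-based mode selection described above follows. Blocks are represented as lists of rows, and the structure of the candidate set (a mapping from mode index to predicted block) is an assumption of the example.

```python
def sae(block, prediction):
    # Sum of absolute differences between co-located samples of the two blocks.
    return sum(abs(a - b)
               for row_a, row_b in zip(block, prediction)
               for a, b in zip(row_a, row_b))

def best_intra_mode(block, candidates):
    # candidates: mapping from intra prediction mode index to its predicted block.
    # The best-matching mode is the one giving the smallest SAE.
    return min(candidates, key=lambda mode: sae(block, candidates[mode]))
```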
The 35 modes of intra prediction are indexed as shown in table 1 below. In other examples, more intra modes may be defined, including predicted angles that may not have been represented by 33 angle modes. In other examples, the prediction angles associated with the angle mode may be different than those used in HEVC.
Intra prediction mode    Associated name
0                        INTRA_PLANAR
1                        INTRA_DC
2..34                    INTRA_ANGULAR2..INTRA_ANGULAR34
Table 1 - Specification of intra prediction modes and associated names
For planar prediction of an N×N block, for each sample p_(x,y) located at position (x, y), the predicted sample value may be calculated by applying a bilinear filter to four specific neighboring reconstructed samples (used as reference samples for intra prediction). The four reference samples include the upper-right reconstructed sample TR, the lower-left reconstructed sample BL, and the two reconstructed samples located in the same column (r_(x,-1)) and the same row (r_(-1,y)) as the current sample. The planar mode can be formulated as follows:
p_(x,y) = ((N − x1)·L + (N − y1)·T + x1·R + y1·B) / (2·N),
where x1 = x + 1, y1 = y + 1, R = TR, B = BL, L = r_(-1,y) (the left neighbor in the same row), and T = r_(x,-1) (the top neighbor in the same column).
For DC mode, the prediction block is filled with the average of neighboring reconstructed samples. In general, both planar and DC modes are used to model smoothly varying and constant image areas.
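The planar and DC computations above can be sketched as follows. Integer division stands in for the exact rounding of the standard, and the layout of the reference arrays is an assumption of the example.

```python
def planar_predict(ref_top, ref_left, N):
    # ref_top[x]  : reconstructed sample r_(x,-1) above the block, ref_top[N]  = TR.
    # ref_left[y] : reconstructed sample r_(-1,y) left of the block, ref_left[N] = BL.
    TR, BL = ref_top[N], ref_left[N]
    pred = [[0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            x1, y1 = x + 1, y + 1
            L, T = ref_left[y], ref_top[x]
            pred[y][x] = ((N - x1) * L + (N - y1) * T + x1 * TR + y1 * BL) // (2 * N)
    return pred

def dc_predict(ref_top, ref_left, N):
    # DC mode: fill the block with the average of the neighboring reconstructed samples.
    dc = (sum(ref_top[:N]) + sum(ref_left[:N]) + N) // (2 * N)
    return [[dc] * N for _ in range(N)]
```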
For the angular intra prediction modes, which include 33 different prediction directions in HEVC, the intra prediction process may be described as follows. For each given angular intra prediction mode, the intra prediction direction may be identified accordingly; for example, intra mode 18 corresponds to a pure horizontal prediction direction, and intra mode 26 corresponds to a pure vertical prediction direction. The angular prediction modes are shown in the example diagram 200a of fig. 2A. In some codecs, a different number of intra prediction modes may be used. For example, in addition to the planar and DC modes, 93 angular modes may be defined, with mode 2 indicating a prediction direction of −135°, mode 34 indicating a prediction direction of −45°, and mode 66 indicating a prediction direction of 45°. In some codecs (e.g., VVC), angles beyond −135° (less than −135°) and beyond 45° (greater than 45°) may also be defined; these may be referred to as wide-angle intra modes. Although the description herein refers to the intra mode design in HEVC (i.e., with 35 modes), the disclosed techniques may also be applied to more intra modes (e.g., the intra modes defined by VVC or other codecs).
The coordinates (x, y) of each sample of the prediction block are projected along a particular intra prediction direction (e.g., the direction of one of the angular intra prediction modes). For example, given a particular intra prediction direction, the coordinates (x, y) of a sample of the prediction block are first projected along that direction onto the row/column of neighboring reconstructed samples. When (x, y) is projected to a fractional position α between two adjacent reconstructed samples L and R, the predicted value of (x, y) can be calculated using a two-tap bilinear interpolation filter, as follows:
p_xy = (1 - α)·L + α·R
To avoid floating point operations, in HEVC, the above computation may be approximated using integer arithmetic:
p_xy = ((32 - a')·L + a'·R + 16) >> 5,
where a' is an integer equal to 32·α.
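The fractional-position interpolation above can be sketched as follows, where a_prime (a') is the 5-bit fractional offset between the two neighboring reference samples L and R; this is an illustrative restatement of the formula above, not standard text.

```python
def angular_interpolate(L: int, R: int, a_prime: int) -> int:
    # a_prime is an integer in [0, 32), equal to 32 * alpha (the fractional projection offset).
    # Integer approximation of p = (1 - alpha) * L + alpha * R used in angular intra prediction.
    return ((32 - a_prime) * L + a_prime * R + 16) >> 5
```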
In some examples, prior to intra prediction, adjacent reference samples are filtered using a 2-tap bilinear or 3-tap (1, 2, 1)/4 filter, which may be referred to as intra reference smoothing or Mode Dependent Intra Smoothing (MDIS). When intra prediction is performed, given an intra prediction mode index (predModeIntra) and a block size (nTbS), it is determined whether to perform reference smoothing processing and which smoothing filter to use. The intra prediction mode index is an index indicating an intra prediction mode.
Inter-picture prediction uses the temporal correlation between pictures in order to derive a motion compensated prediction for a block of image samples. Using a translational motion model, the position of a block in a previously decoded picture (reference picture) is indicated by a motion vector (Δx, Δy), where Δx specifies the horizontal displacement of the reference block relative to the position of the current block and Δy specifies the vertical displacement of the reference block relative to the position of the current block. In some cases, the motion vector (Δx, Δy) may have integer sample precision (also referred to as integer precision), in which case the motion vector points to the integer pixel grid (or integer pixel sampling grid) of the reference frame. In some cases, the motion vector (Δx, Δy) may have fractional sample precision (also referred to as fractional pixel precision or non-integer precision) to more accurately capture the motion of the underlying object, rather than being limited to the integer pixel grid of the reference frame. The precision of a motion vector may be represented by the quantization level of the motion vector. For example, the quantization level may be integer precision (e.g., 1 pixel) or fractional pixel precision (e.g., 1/4 pixel, 1/2 pixel, or another sub-pixel value). When the corresponding motion vector has fractional sample precision, the reference picture is interpolated to derive the prediction signal. For example, samples available at integer locations may be filtered (e.g., using one or more interpolation filters) to estimate values at fractional locations. The previously decoded reference picture is indicated by a reference index (refIdx) into a reference picture list. The motion vector and the reference index may be referred to as motion parameters. Two kinds of inter-picture prediction may be performed: unidirectional prediction and bi-directional prediction.
For inter prediction using bi-prediction (also referred to as bi-directional inter prediction), two sets of motion parameters (Δx₀, Δy₀, refIdx₀ and Δx₁, Δy₁, refIdx₁) are used to generate two motion compensated predictions (from the same reference picture or possibly from different reference pictures). For example, for bi-prediction, two motion-compensated prediction signals are used per prediction block, and a B prediction unit is generated. The two motion compensated predictions are combined to obtain the final motion compensated prediction. For example, the two motion compensated predictions may be combined by averaging. In another example, weighted prediction may be used, in which case different weights may be applied to each motion compensated prediction. The reference pictures that can be used for bi-prediction are stored in two different lists, denoted list 0 and list 1, respectively. The motion parameters may be derived at the encoding device 104 using a motion estimation process.
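As a hedged sketch of the combination step described above (assuming integer sample values and illustrative weight parameters), the two motion compensated predictions may be merged by simple averaging or by a weighted combination:

```python
import numpy as np

def combine_bi_prediction(pred0, pred1, w0=1, w1=1):
    # Average (w0 = w1 = 1) or weighted combination of two motion compensated predictions.
    # The rounding offset keeps the integer result unbiased.
    total = w0 + w1
    return (w0 * pred0.astype(np.int32) + w1 * pred1.astype(np.int32) + total // 2) // total
```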
For inter prediction using unidirectional prediction (also referred to as unidirectional inter prediction), one set of motion parameters (Δx₀, Δy₀, refIdx₀) is used to generate a motion compensated prediction from a reference picture. For example, for unidirectional prediction, at most one motion-compensated prediction signal is used per prediction block, and a P prediction unit is generated.
The PU may include data (e.g., motion parameters or other suitable data) related to the prediction process. For example, when encoding a PU using intra prediction, the PU may include data describing an intra prediction mode of the PU. As another example, when encoding a PU using inter prediction, the PU may include data defining a motion vector for the PU. The data defining the motion vector of the PU may describe, for example, a horizontal component (Δx) of the motion vector, a vertical component (Δy) of the motion vector, a resolution (e.g., integer precision, quarter-pixel precision, or eighth-pixel precision) of the motion vector, a reference picture to which the motion vector points, a reference index, a reference picture list (e.g., list 0, list 1, or list C) of the motion vector, or any combination thereof.
AV1 includes two general techniques for encoding and decoding a codec block of video data. The two general techniques are intra prediction (e.g., intra-frame prediction or spatial prediction) and inter prediction (e.g., inter-frame prediction or temporal prediction). In the context of AV1, when a block of a current frame of video data is predicted using an intra prediction mode, the encoding device 104 and the decoding device 112 do not use video data from other frames of video data. For most intra-prediction modes, the video encoding device 104 encodes a block of the current frame based on the difference between the sample values in the current block and predicted values generated from reference samples in the same frame. The video encoding device 104 determines the predicted values generated from the reference samples based on the intra-prediction mode.
After prediction using intra and/or inter prediction, the encoding device 104 may perform transformation and quantization. For example, after prediction, the encoder engine 106 may calculate residual values corresponding to the PU. The residual values may include pixel differences between the current block (PU) of pixels being encoded and the prediction block (e.g., a predicted version of the current block) used to predict the current block. For example, after generating a prediction block (e.g., using inter prediction or intra prediction), the encoder engine 106 may generate a residual block by subtracting the prediction block generated by the prediction unit from the current block. The residual block includes a set of pixel difference values that quantify the differences between the pixel values of the current block and the pixel values of the prediction block. In some examples, the residual block may be represented in a two-dimensional block format (e.g., a two-dimensional matrix or array of pixel values). In this example, the residual block is a two-dimensional representation of the pixel values.
Any residual data that may remain after prediction is transformed using a block transform, which may be based on a discrete cosine transform, a discrete sine transform, an integer transform, a wavelet transform, another suitable transform function, or any combination thereof. In some cases, one or more block transforms (e.g., of size 32×32, 16×16, 8×8, 4×4, or other suitable sizes) may be applied to the residual data in each CU. In some examples, TUs may be used for the transform and quantization processes implemented by the encoder engine 106. A given CU with one or more PUs may also include one or more TUs. As described in further detail below, the residual values may be transformed into transform coefficients using a block transform, and may be quantized and scanned using TUs to produce serialized transform coefficients for entropy coding.
In some examples, encoder engine 106 may calculate residual data for TUs of a CU after intra-predictive or inter-predictive coding using PUs of the CU. The PU may include pixel data in a spatial domain (or pixel domain). The TUs may include coefficients in the transform domain after applying the block transform. As previously described, the residual data may correspond to pixel differences between pixels of the non-coded picture and the prediction value corresponding to the PU. The encoder engine 106 may form TUs that include residual data for the CU, and may transform the TUs to generate transform coefficients for the CU.
The encoder engine 106 may perform quantization of the transform coefficients. Quantization reduces the amount of data representing the coefficients by quantizing the transform coefficients to provide further compression. For example, quantization may reduce the bit depth associated with some or all of these coefficients. In one example, coefficients having n-bit values may be rounded down to m-bit values during quantization, where n is greater than m.
Once quantization is performed, the encoded video bitstream includes quantized transform coefficients, prediction information (e.g., prediction modes, motion vectors, block vectors, etc.), partition information, and any other suitable data (e.g., other syntax data). The different elements of the encoded video bitstream may be entropy encoded by the encoder engine 106. In some examples, the encoder engine 106 may scan the quantized transform coefficients using a predefined scan order to produce a serialized vector that can be entropy encoded. In some examples, the encoder engine 106 may perform an adaptive scan. After scanning the quantized transform coefficients to form a vector (e.g., a one-dimensional vector), the encoder engine 106 may entropy encode the vector. For example, the encoder engine 106 may use context adaptive variable length coding, context adaptive binary arithmetic coding, syntax-based context adaptive binary arithmetic coding, probability interval partitioning entropy coding, or other suitable entropy coding techniques.
The output 110 of the encoding device 104 may send the NAL units that make up the encoded video bitstream data over a communication link 120 to the decoding device 112 of a receiving device. An input 114 of the decoding device 112 may receive the NAL units. The communication link 120 may include a channel provided by a wireless network, a wired network, or a combination of wired and wireless networks. The wireless network may include any wireless interface or combination of wireless interfaces, and may include any suitable wireless network (e.g., the Internet or another wide area network, a packet-based network, WiFi™, Radio Frequency (RF), UWB, WiFi-Direct, cellular, Long Term Evolution (LTE), WiMax™, etc.). The wired network may include any wired interface (e.g., fiber optic, Ethernet, power line Ethernet, Ethernet over coaxial cable, Digital Subscriber Line (DSL), etc.). The wired and/or wireless networks may be implemented using various equipment, such as base stations, routers, access points, bridges, gateways, switches, and the like. The encoded video bitstream data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the receiving device.
In some examples, the encoding device 104 may store the encoded video bitstream data in the storage 108. The output 110 may retrieve encoded video bitstream data from the encoder engine 106 or from the storage 108. Storage 108 may comprise any of a variety of distributed or locally accessed data storage media. For example, storage 108 may include a hard disk drive, a storage disk, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. The storage 108 may also include a Decoded Picture Buffer (DPB) for storing reference pictures for inter prediction. In further examples, the storage 108 may correspond to a file server or another intermediate storage device that may store encoded video generated by the source device. In this case, the receiving device including the decoding device 112 may access the stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to a receiving device. Example file servers include web servers (e.g., for websites), FTP servers, network Attached Storage (NAS) devices, or local disk drives. The receiving device may access the encoded video data through any standard data connection, including an internet connection, and may include a wireless channel (e.g., wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing the encoded video data stored on the file server. The transmission of the encoded video data from the storage 108 may be a streaming transmission, a download transmission, or a combination thereof.
An input 114 of the decoding apparatus 112 receives the encoded video bitstream data and may provide the video bitstream data to a decoder engine 116, or to a storage 118 for later use by the decoder engine 116. For example, the storage 118 may include a DPB for storing reference pictures for inter prediction. A receiving device comprising a decoding device 112 may receive encoded video data to be decoded via the storage 108. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to a receiving device. The communication medium for transmitting the encoded video data may comprise any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include a router, switch, base station, or any other device that may be used to facilitate communications from a source device to a receiving device.
The decoder engine 116 may decode the encoded video bitstream data by entropy decoding and extracting (e.g., using an entropy decoder) elements of one or more decoded video sequences that make up the encoded video data. The decoder engine 116 may rescale and inverse transform the encoded video bitstream data. The residual data is passed to the prediction stage of the decoder engine 116. The decoder engine 116 predicts a block of pixels (e.g., a PU). In some examples, the prediction is added to the output of the inverse transform (residual data).
The decoding device 112 may output the decoded video to a video destination device 122, which may include a display or other output device for displaying the decoded video data to a consumer of the content. In some aspects, video destination device 122 may be part of a receiving device that includes decoding device 112. In some aspects, video destination device 122 may be part of a separate device other than the receiving device.
In some examples, video encoding device 104 and/or video decoding device 112 may be integrated with an audio encoding device and an audio decoding device, respectively. The video encoding device 104 and/or the video decoding device 112 may also include other hardware or software necessary to implement the above-described codec techniques, such as one or more microprocessors, digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. The video encoding device 104 and the video decoding device 112 may be integrated as part of a combined encoder/decoder (codec) in the respective devices. An example of specific details of the encoding device 104 is described below with reference to fig. 8. An example of specific details of the decoding apparatus 112 is described below with reference to fig. 9.
The example system shown in fig. 1 is one illustrative example that may be used herein. The techniques for processing video data using the techniques described herein may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device or a video decoding device, the techniques may also be performed by a combined video encoder-decoder (commonly referred to as a "CODEC"). Furthermore, the techniques of this disclosure may also be performed by a video preprocessor. The source device and the receiving device are merely examples of such codec devices, in which the source device generates coded video data for transmission to the receiving device. In some examples, the source device and the receiving device may operate in a substantially symmetrical manner such that each device includes video encoding and decoding components. Thus, example systems may support unidirectional or bidirectional video transmission between video devices, e.g., for video streaming, video playback, video broadcasting, or video telephony.
Extensions to the HEVC standard include the multi-view video codec extension known as MV-HEVC and the scalable video codec extension known as SHVC. The MV-HEVC and SHVC extensions share the concept of layered coding, where different layers are included in the encoded video bitstream. Each layer in a coded video sequence is addressed by a unique layer Identifier (ID). A layer ID may be present in the header of a NAL unit to identify the layer associated with the NAL unit. In MV-HEVC, different layers may represent different views of the same scene in the video bitstream. In SHVC, different scalable layers are provided that represent the video bitstream at different spatial resolutions (or picture resolutions) or different reconstruction fidelities. The scalable layers may include a base layer (layer ID = 0) and one or more enhancement layers (layer ID = 1, 2, … n). The base layer may conform to a profile of the first version of HEVC and represents the lowest available layer in the bitstream. The enhancement layers have increased spatial resolution, temporal resolution or frame rate, and/or reconstruction fidelity (or quality) as compared to the base layer. The enhancement layers are organized hierarchically and may (or may not) depend on lower layers. In some examples, different layers may be coded using a single-standard codec (e.g., all layers are encoded using HEVC, SHVC, or other codec standards). In some examples, different layers may be coded using a multi-standard codec. For example, AVC may be used to code the base layer, while the SHVC and/or MV-HEVC extensions to the HEVC standard may be used to code one or more enhancement layers.
Typically, a layer includes a set of VCL NAL units and a corresponding set of non-VCL NAL units. NAL units are assigned specific layer ID values. Layers may be hierarchical in the sense that layers may depend on lower layers. Layer set refers to a self-contained layer set represented within a bitstream, meaning that layers within a layer set may depend on other layers in the layer set in the decoding process, but not on any other layers used for decoding. Thus, the layers in the layer set may form independent bitstreams that may represent video content. The set of layers in the layer set may be obtained from another bitstream through an operation of the sub-bitstream extraction process. The layer set may correspond to a set of layers that the decoder wants to decode when operating according to certain parameters.
As previously described, an HEVC bitstream includes a group of NAL units that includes VCL NAL units and non-VCL NAL units. The VCL NAL units include coded picture data that forms the coded video bitstream. For example, the sequence of bits forming the coded video bitstream is present in VCL NAL units. The non-VCL NAL units may contain, among other information, parameter sets with high-level information related to the encoded video bitstream. For example, the parameter sets may include a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), and a Picture Parameter Set (PPS). Examples of goals of the parameter sets include bit rate efficiency, error resiliency, and providing a systems layer interface. Each slice references a single active PPS, SPS, and VPS to access information that the decoding device 112 may use to decode the slice. An Identifier (ID) may be coded for each parameter set, including a VPS ID, an SPS ID, and a PPS ID. The SPS includes an SPS ID and a VPS ID. The PPS includes a PPS ID and an SPS ID. Each slice header includes a PPS ID. Using these IDs, the active parameter sets can be identified for a given slice.
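The ID-based referencing chain described above can be illustrated with a small sketch; the dictionaries and field names below are hypothetical stand-ins for decoded parameter set structures, not syntax defined by any standard.

```python
def find_active_parameter_sets(slice_header, pps_table, sps_table, vps_table):
    # Each slice header carries a PPS ID; the referenced PPS carries an SPS ID;
    # the referenced SPS carries a VPS ID. Following the chain yields the active sets.
    pps = pps_table[slice_header["pps_id"]]
    sps = sps_table[pps["sps_id"]]
    vps = vps_table[sps["vps_id"]]
    return vps, sps, pps   # the active parameter sets used to decode the slice
```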
The PPS includes information that applies to all slices in a given picture. In some examples, all slices in a picture refer to the same PPS. Slices in different pictures may also refer to the same PPS. The SPS includes information that applies to all pictures in the same coded video sequence (CVS) or bitstream. As previously described, a coded video sequence is a series of Access Units (AUs) starting with a random access point picture in the base layer (e.g., an Instantaneous Decoding Refresh (IDR) picture, a Broken Link Access (BLA) picture, or another suitable random access point picture) and with certain properties (as described above), up to but not including the next AU that has a random access point picture in the base layer and with certain properties (or up to the end of the bitstream). The information in the SPS may not change from picture to picture within a coded video sequence. Pictures in a coded video sequence may use the same SPS. The VPS includes information that applies to all layers within a coded video sequence or bitstream. The VPS includes a syntax structure with syntax elements that apply to the entire coded video sequence. In some embodiments, the VPS, SPS, or PPS may be sent in-band with the encoded bitstream. In some embodiments, the VPS, SPS, or PPS may be sent out-of-band in a separate transmission from the NAL units containing the coded video data.
The present disclosure may generally relate to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values of syntax elements and/or other data used to decode encoded video data. For example, the video encoding device 104 may signal the values of syntax elements in the bitstream. In general, signaling refers to generating a value in the bitstream. As described above, the source device 102 may convey the bitstream to the destination device in substantially real-time, or in non-real-time, such as may occur when storing syntax elements to storage for later retrieval by the destination device.
The video bitstream may also include Supplemental Enhancement Information (SEI) messages. For example, an SEI NAL unit may be part of the video bitstream. In some cases, an SEI message may contain information that is not needed by the decoding process. For example, the information in the SEI message may not be necessary for the decoder to decode the video pictures of the bitstream, but the decoder may use this information to improve the display or processing of the pictures (e.g., the decoded output). The information in the SEI message may be embedded metadata. In one illustrative example, a decoder-side entity may use the information in the SEI message to improve the visibility of the content. In some cases, certain application standards may require the presence of such SEI messages in the bitstream so that a quality improvement can be brought to all devices conforming to the application standard (e.g., the carriage of the frame packing SEI message for the frame-compatible plano-stereoscopic 3DTV video format, where the SEI message is carried for every frame of the video, the handling of the recovery point SEI message, the use of the pan-scan rectangle SEI message in DVB, and many other examples).
As described above, the encoding device 104 may encode one or more blocks or rectangular regions of the pictures of the original video sequence by using intra prediction to remove spatial redundancy. The decoding device 112 may decode an encoded block by using the same intra-prediction mode as used by the encoding device 104. Intra-prediction modes describe different variations or methods for calculating the pixel values of the region being encoded based on reference pixel values. In the VVC standard, one or more smoothing filters and interpolation filters may be selected based on the intra prediction mode and then applied to the reference pixels and/or the intra prediction of the current block. In this method, the same choice between smoothing and interpolation filters for intra prediction is applied for all block sizes, e.g., a fixed degree of smoothing is applied for all possible block sizes. Different directional intra-prediction modes are provided in the VVC standard.
Fig. 2B shows an example diagram 200B of the directional intra-prediction modes (also referred to as "angular intra-prediction modes") in VVC. In some examples, the planar and DC modes remain the same in VVC as in HEVC. As shown, the intra-prediction modes with even indices between 2 and 66 may be equivalent to the 33 HEVC intra-prediction modes, with the remaining intra-prediction modes of fig. 2B representing intra-prediction modes newly added in VVC. As an illustrative example, to better capture arbitrary edge directions present in natural video, the number of directional intra-prediction modes in VTM5 (VVC test model 5) increases from the 33 HEVC directions to 93 directions in total. The intra prediction modes are described in more detail in B. Bross, J. Chen, S. Liu, "Versatile Video Coding (Draft 10)," 19th JVET Meeting, Teleconference, Jul. 2020, JVET-S2001, which is incorporated by reference herein in its entirety and for all purposes. In some examples, the denser directional intra-prediction modes introduced in the VVC standard may be applied to all block sizes as well as to both luma and chroma intra prediction. In some cases, these directional intra-prediction modes may be used in combination with Multiple Reference Lines (MRL) and/or with the intra sub-partition (ISP) mode. Further details are described in J. Chen, Y. Ye, S. Kim, "Algorithm description for Versatile Video Coding and Test Model 10 (VTM 10)," 19th JVET Meeting, Teleconference, Jul. 2020, JVET-S2002, which is incorporated herein by reference in its entirety and for all purposes.
In some examples, mode-dependent intra smoothing (MDIS) may be utilized to smooth the intra-prediction signal by applying a smoothing filter and/or a smoothing type based on the intra-prediction mode of the current decoded block. Fig. 3 is a flow chart illustrating an example of an MDIS process 300 that may be used for intra prediction. In an illustrative example, the example MDIS process of fig. 3 may be the same as the VVC standard MDIS process. The example MDIS process 300 may be used to select a particular interpolation filter and/or a particular smoothing filter for intra prediction of a current decoded block. As will be explained in greater depth below, in some examples, the selection of interpolation and/or smoothing filters may be based at least in part on the intra-prediction mode of the current decoded block.
The example MDIS process 300 may begin at operation 302 by determining whether the intra-prediction mode of the current decoded block is the horizontal intra-prediction mode or the vertical intra-prediction mode. Referring to the directional intra-prediction modes shown in fig. 2B, the horizontal intra-prediction mode is indicated as mode 18, and the vertical intra-prediction mode is indicated as mode 50. In response to determining that the intra-prediction mode is the horizontal mode or the vertical mode at operation 302 (e.g., a "yes" output of 302), the example MDIS process may proceed to operation 304. As shown, operation 304 ends the MDIS process without performing reference pixel smoothing or applying an interpolation filter. In some examples, smoothing or interpolation may not be performed for the horizontal and vertical intra prediction modes, because the reference pixel values for both modes may be directly copied when determining the predicted pixel values of the current block.
If the intra-prediction mode is not the horizontal or vertical mode (e.g., a "no" output of operation 302), the example MDIS process may proceed to determine whether the current block requires smoothing. As shown, whether smoothing should be performed for the current block may be determined at operation 306 based at least in part on the intra prediction mode of the current block. For example, the minimum distance minDistVerHor may be calculated using the intra prediction mode, e.g., where minDistVerHor is the minimum of {|intra prediction mode number - vertical intra prediction mode number|, |intra prediction mode number - horizontal intra prediction mode number|}. The minimum distance minDistVerHor may also be referred to as the minimum angular offset and/or minimum angular distance. In an illustrative example, the vertical intra prediction mode number may be 50 and the horizontal intra prediction mode number may be 18. Thus, if the intra prediction mode number of the current block is 30, the minimum angular offset may be calculated as min{|30-50|, |30-18|} = min{20, 12} = 12.
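A small sketch of the minimum angular offset computation described above, using the VVC-style mode numbers 18 and 50 for the horizontal and vertical directions:

```python
def min_dist_ver_hor(pred_mode_intra: int, hor_mode: int = 18, ver_mode: int = 50) -> int:
    # Minimum angular distance of the intra prediction mode from the pure horizontal
    # and vertical directions, e.g. mode 30 -> min(|30 - 50|, |30 - 18|) = 12
    return min(abs(pred_mode_intra - ver_mode), abs(pred_mode_intra - hor_mode))
```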
In operation 306, the minimum angular offset minDistVerHor may then be compared to a threshold value intraHorVerDistThres[nTbS], which in some examples may be a predetermined threshold value given by the VVC standard, e.g., determined by providing the current transform block size nTbS as an index to a lookup function or lookup table intraHorVerDistThres. As shown in fig. 3, if the minimum angular offset minDistVerHor is not greater than the threshold intraHorVerDistThres[nTbS], then operation 306 may determine that the current block does not require smoothing, e.g., a "no" output of 306.
If smoothing is not required, the example MDIS process may then proceed from operation 306 to operation 307, which is shown as applying an interpolation filter without any reference pixel smoothing. In some examples, the interpolation filter applied by operation 307 may be a cubic interpolation filter, such as the 4-tap (6-bit) cubic interpolation filter shown in fig. 3. Because operation 306 determined that direct reference pixel smoothing is not required, operation 307 may apply only the 4-tap cubic interpolation filter. For example, reference pixel smoothing is not performed because operation 306 indicated that the minimum angular offset of the intra-prediction mode is within the threshold distance of the horizontal mode or the vertical mode.
If operation 306 determines that the minimum angular offset minDistVerHor is greater than the threshold intraHorVerDistThres[nTbS], operation 306 may determine that the current block needs smoothing, e.g., a "yes" output. In response to the determination that smoothing is required, the intra-prediction mode of the current block may be further analyzed in the subsequent operation 308.
In some examples, operation 308 may analyze the intra prediction mode for the current block to determine whether it is an integer-sloped intra prediction mode or a fractional-sloped intra prediction mode (also referred to as "integer-angle mode" and "fractional-angle mode", respectively). As mentioned previously, the integer angle mode is associated with a particular integer value reference pixel position of the current block, while the fractional angle mode is not associated therewith. Instead, the fractional angle mode is associated with some intermediate (e.g., fractional) position between adjacent integer value reference pixel positions.
If operation 308 determines that the intra prediction mode of the current block is an integer angle mode (e.g., a "yes" output of 308), the process may proceed to operation 309. As shown, operation 309 may perform reference pixel smoothing without interpolation, for example, because in some cases it is determined that interpolation is not required for integer angle modes. For example, because an integer-angle intra prediction mode can directly utilize the reference pixel values, only reference pixel smoothing is performed. In some examples, the reference pixel smoothing of operation 309 may be performed by applying a low-pass filter (such as a [1 2 1] filter) that replaces each reference pixel with the average of twice its own value plus the values of the immediately left and right (or top and bottom) reference pixel locations, i.e., (left + 2·center + right) / 4.
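A sketch of the [1 2 1]/4 reference pixel smoothing applied at operation 309; leaving the first and last samples unfiltered is a simplifying assumption of this sketch, not a statement of the standard.

```python
def smooth_reference_pixels(ref):
    # Apply the low-pass [1 2 1] / 4 filter to the interior reference pixels:
    # each pixel becomes (left + 2 * center + right + 2) >> 2
    out = list(ref)
    for i in range(1, len(ref) - 1):
        out[i] = (ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2
    return out
```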
If operation 308 determines that the intra-prediction mode of the current block is a fractional angle mode (e.g., a non-integer angle mode; a "no" output of 308), then in some cases a subsequent operation 310 may calculate an interpolated value for the fractional reference pixel location associated with the intra-prediction mode. For example, operation 310 may calculate an interpolated fractional reference pixel position value, which may be calculated based on one or more reference pixel values obtained from one or more adjacent integer-value reference pixel positions. Recall that operation 306 previously determined that smoothing should be performed for the intra-prediction of the current block (e.g., because operation 306 determined that the minimum angular offset minDistVerHor > the threshold intraHorVerDistThres[nTbS]); the "no" output of operation 308 therefore corresponds to a situation in which both smoothing and interpolation are applied to the current block.
In some examples and as illustrated in fig. 3, the smoothing and interpolation operations may be performed in a single combined step (e.g., by applying a smoothing interpolation filter). In an illustrative example, the smoothing interpolation filter may be set to a Gaussian interpolation filter that simultaneously smooths the generated intra prediction signal and interpolates the fractional reference pixel position values. A smoothing interpolation filter, such as the Gaussian smoothing interpolation filter described previously, may apply smoothing without direct reference pixel smoothing. In some examples, the smoothing interpolation filter may include a 4-tap (6-bit) Gaussian interpolation filter, as shown in operation 310.
Note that in the context of the example MDIS process 300 of fig. 3, the MDIS process (and VVC standard) does not use variable smoothness based on block size or other characteristics. In some examples, the systems and techniques described herein may provide variable smoothness and/or interpolation based at least in part on factors including, but not limited to, intra-prediction mode of the current block, size of the current block, width of the current block, height of the current block, and so forth.
In some cases, video coding techniques may include using a directional intra-prediction mode with one or more of a multi-reference line (MRL) extension and/or an intra sub-partition (ISP) mode for intra prediction. In an illustrative example, intra prediction may include extending a line of main reference pixels using one or more side reference pixels for the intra prediction.
Fig. 4 illustrates an example diagram 400 of a reference line extension using one or more side reference pixels. Depicted for the current decoded block 405 is an upper line of reference pixels 410, which includes a series of calculated reference line extension pixels 420. A left reference pixel set 430 is also shown. For intra-prediction of vertical modes (e.g., intra-prediction mode ≥ 34, not to be confused with the particular vertical intra-prediction mode 50), the upper line of reference pixels 410 may be extended with one or more pixels from the left side reference pixels 430 of the current decoded block 405, e.g., by generating or otherwise calculating the values of the reference line extension pixels 420. The length of the upper line of reference pixels 410 may be extended using the calculated reference line extension pixels 420 to extend beyond the leftmost edge of the current block 405, as illustrated in fig. 4.
In the current VVC standard, the upper line of reference pixels 410 may be extended by identifying the nearest neighbor in the left reference pixels 430, wherein the value of at least one of the reference line extension pixels 420 is set equal to the value of the identified nearest neighbor. In an illustrative example, fig. 4 depicts a point P (e.g., indicated at 423) among the reference line extension pixels 420 located on the upper line of reference pixels 410. The upper reference pixel line 410 is extended based on the left reference pixels 430. In the current VVC standard, the reference line extension process is performed by determining which of the left reference pixels 430 is the nearest neighbor of the extended reference line pixel P/423, and then setting the value of the extended reference line pixel P equal to the value of the identified nearest neighbor among the left reference pixels 430. In the illustration of fig. 4, the nearest neighbor in the column of left reference pixels 430 is denoted X1, and thus the pixel value at the X1 position is used to create the extended reference line pixel P (e.g., 423). The upper reference pixel line 410 may be extended to a desired length using this method, and intra prediction is then performed using an extended reference line formed by the original upper line of reference pixels 410 and the reference line extension pixels 420. In some examples, a similar process may also be applied to intra-prediction of horizontal modes (e.g., intra-prediction mode < 34, not to be confused with the particular horizontal intra-prediction mode 18), where the values of the identified nearest neighbor pixels in the upper reference line are projected to extend the left reference pixel line.
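The nearest-neighbor projection described above can be illustrated with a short sketch; the geometry helper below is a simplification in which the projection of each extended top-line position onto the left reference column is derived from a hypothetical scaled inverse-angle parameter (inv_angle_scaled), which stands in for the codec's inverse-angle lookup and is not a quoted VVC value.

```python
def extend_top_reference_line(top_ref, left_ref, num_ext, inv_angle_scaled):
    # Extend the top reference line leftward, past the block's left edge, by projecting
    # each extension position onto the left reference column and copying the value of
    # the nearest neighbor (no interpolation), as in the nearest-neighbor scheme above.
    extended = list(top_ref)
    for i in range(1, num_ext + 1):
        # nearest left-column neighbor for extension offset i (rounded projection)
        row = min(len(left_ref) - 1, (i * inv_angle_scaled + 128) >> 8)
        extended.insert(0, left_ref[row])   # grow the line beyond the leftmost edge
    return extended
```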
Various improvements to the VVC intra prediction process have been proposed in JVET-D0119, which is described in X. Zhao, V. Seregin, M. Karczewicz, "Six tap intra interpolation filter," 4th JVET Meeting, Chengdu, CN, Oct. 2016, JVET-D0119, which is incorporated herein by reference in its entirety and for all purposes. For example, JVET-D0119 proposes to improve the intra prediction process by introducing two methods: (1) performing the example MDIS process of fig. 3 using 6-tap (8-bit) cubic interpolation instead of the 4-tap (6-bit) cubic interpolation described above; and (2) using the same 4-tap (6-bit) cubic interpolation (again, as described above with respect to the example MDIS process of fig. 3) to perform the example reference line extension described with respect to fig. 4 instead of projecting the nearest neighbor pixel values.
As previously described, in some examples, larger block sizes may benefit from applying higher smoothness during intra prediction. However, VVC uses a fixed smoothness (e.g., 4-tap Gaussian interpolation or [1 2 1] filtering) for all block sizes, which may result in inefficient or less efficient intra prediction in view of the above observations. With respect to JVET-D0119 discussed above, using 4-tap cubic interpolation to extend one or more lines of reference pixels (e.g., the upper and/or left reference pixel lines) can be problematic because, when intra-prediction is performed using the extended portion(s) of the extended reference line, it can result in over-smoothing, introducing inaccuracy and/or inefficiency into the overall intra-prediction process.
For example, over-smoothing may occur in this case because the extended pixels of the extended reference line are subjected to at least two different interpolation operations, each of which introduces a degree of smoothing and edge degradation. The first interpolation operation is the 4-tap cubic interpolation used to determine the extended upper/left reference pixel line values based on the nearest neighboring left/upper reference pixel values, respectively. The interpolated reference pixel values of the extended reference pixel line may then be included in a second interpolation operation during intra prediction for the current block, such as the interpolation operation described with respect to the example MDIS process of fig. 3. For example, the interpolated reference pixel values of the extended reference pixel line may be used in one or more of 4-tap cubic interpolation, 4-tap Gaussian smoothing interpolation, and/or low-pass [1 2 1] reference pixel smoothing, each of which may result in over-smoothing throughout the intra prediction process.
As previously described, systems and techniques for intra prediction using one or more enhanced interpolation filters are described herein. The systems and techniques may be performed by the encoding device 104, the decoding device 112, by both the encoding device 104 and the decoding device 112, and/or by other devices. The aspects described herein may be applied independently and/or in combination. In some examples, the systems and techniques described herein may be used to perform one or more intra-prediction modes (e.g., for filtering during or with application of intra-prediction modes).
In some examples, the systems and techniques described herein may provide a variable degree of reference pixel smoothing with block level switching. For example, a plurality of smoothing filters and/or gaussian interpolation filters (also referred to as "gaussian smoothing interpolation filters"), each having different smoothness, may be used to smooth the reference pixels during interpolation. In some cases, the selection of the determined smoothing filter and/or the determined interpolation filter may be explicitly signaled at different codec levels (e.g., per prediction block, per codec block, per CTU, per slice, and/or at a sequence (e.g., in SPS) level). In some examples, the determined selection of smoothing and/or interpolation filters may be implicitly determined using decoded information, which may include, but is not limited to, block size, prediction mode, QP, and/or CU level mode flags (MRL, ISP, etc.), in which case explicit signaling of filter selection is not required. For example, in some examples, encoding device 104 and/or decoding device 112 may implicitly determine or select a smoothing filter and/or interpolation filter for use in intra prediction based on a determination that the current decoded block has a particular size, has a width and/or height greater than a threshold, has a width and/or height less than a threshold, and so forth.
In one illustrative example, the processing of the fractional angle (e.g., non-integer angle) intra prediction mode may be extended from the method described in the VVC standard to include selecting between applying a first gaussian-smooth interpolation filter of higher smoothness and applying at least a second gaussian-smooth interpolation filter of lower smoothness. As previously discussed with respect to fig. 3, the method used by the VVC standard uses the same 4-tap gaussian smoothing interpolation filter for all fractional-angle intra prediction modes, regardless of the size of the current decoded block.
Fig. 5 is an exemplary diagram illustrating an example of a process 500 for switchable smoothing and/or interpolation to apply a variable degree of intra-prediction smoothing based at least on an intra-prediction mode of a current block and a size of the current block. In the context of the example discussed immediately above, the presently disclosed systems and techniques for intra prediction using an enhanced interpolation filter may include: for the fractional angle intra prediction mode, a selection is made between a first filter comprising a 6-tap gaussian smoothing interpolation filter and a second filter comprising a 4-tap gaussian smoothing interpolation filter. The 6-tap gaussian smoothing interpolation filter may apply a higher smoothness than the 4-tap gaussian smoothing interpolation filter. In some examples, the 4-tap gaussian smoothing interpolation filter of fig. 5 may be the same as or similar to the 4-tap gaussian smoothing interpolation filter described with respect to the example VVC MDIS process 300 of fig. 3. In some examples, the filtering, interpolation, and/or smoothness selection process may be implicit depending on the block size of the current decoded block, as seen in fig. 5.
In some examples, the variable smoothness filtering and interpolation process for reference pixels with block-level switching illustrated in fig. 5 may be the same as or similar to the example VVC MDIS process of fig. 3, except for operations 510 (e.g., comparing one or more of the width of the current decoded block and the height of the current decoded block to at least a first threshold T) and subsequent operations 512 (e.g., selecting and applying a 6-tap gaussian smoothing interpolation filter with relatively higher smoothness in response to the first threshold T being exceeded) and 514 (e.g., selecting and applying a 4-tap gaussian smoothing interpolation filter with relatively lower smoothness in response to the first threshold T not being exceeded).
At operation 502, the process may determine whether the intra-prediction mode of the current decoded block is a horizontal intra-prediction mode (e.g., mode 18) or a vertical intra-prediction mode (e.g., mode 50). If the intra prediction mode is either a horizontal mode or a vertical mode, the process determines at block 504 that reference pixel smoothing (referred to as "reference pixel (ref pel) smoothing" in FIG. 5) and interpolation filtering is not performed, as previously described with respect to the example MDIS process of FIG. 3. The process may then continue to process the current decoded block and intra-prediction without applying reference pixel smoothing or interpolation filtering.
At operation 506, the process may determine whether the minimum angular offset minDistVerHor is greater than the threshold intraHorVerDistThres[nTbS]. In some cases, one or more of minDistVerHor and/or intraHorVerDistThres[nTbS] may be the same as or similar to the corresponding variable values discussed above with respect to the example MDIS process of fig. 3. In an illustrative example, the angular offset variable minDistVerHor may be set equal to Min(Abs(predModeIntra - 50), Abs(predModeIntra - 18)), where predModeIntra indicates the intra prediction mode number, 50 is the vertical intra prediction mode number, and 18 is the horizontal intra prediction mode number. In some cases, predModeIntra may be set equal to IntraPredModeY[xCb][yCb] or IntraPredModeC[xCb][yCb]. In some examples, for different values of the current decoded transform block size nTbS, the threshold variable intraHorVerDistThres[nTbS] may be given as specified in Table 2 below:
TABLE 2: Specification of the threshold variable intraHorVerDistThres[nTbS] for various transform block sizes nTbS
In some examples, if operation 506 determines that the angular offset minDistVerHor is not greater than the value of the threshold variable intraHorVerDistThres[nTbS] (e.g., minDistVerHor ≤ intraHorVerDistThres[nTbS]), then the process may determine not to perform reference pixel smoothing at operation 507 and may further determine to apply a 4-tap cubic interpolation filter for the intra prediction of the current decoded block. For example, the process may apply the 4-tap cubic filter to predict or interpolate one or more reference pixels without any reference pixel smoothing.
In the event that operation 506 determines that the angular offset minDistVerHor is greater than the threshold intraHorVerDistThres[nTbS] (e.g., minDistVerHor > intraHorVerDistThres[nTbS]), the process may then determine, at operation 508, whether the intra-prediction mode of the current decoded block is an integer angle mode, as previously described with respect to the example MDIS process of fig. 3.
In one example, when operation 508 determines that the intra prediction mode of the current decoded block is an integer angle mode, the process may determine, at operation 509, to use a low-pass filter for reference pixel smoothing without interpolation filtering. The process may then terminate at operation 509 after smoothing the reference pixels using the [1 2 1] filter. No interpolation is performed, and the smoothed reference pixels are directly copied for the intra prediction of the current decoded block.
In one example, when operation 508 determines that the intra-prediction mode of the current decoded block is a fractional (e.g., non-integer) angle mode, the process may proceed to operation 510, which may determine whether the width of the block is greater than or equal to a threshold T and/or whether the height of the block is greater than or equal to the threshold T. In some examples, operation 510 may include determining which of the width of the block and the height of the block is greater than or equal to the threshold T. In some examples, the value of the threshold T may be a predetermined value, such as 16, 32, 64, or one or more other predetermined values.
In the event that the width of the block and the height of the block are determined at operation 510 to be greater than or equal to the threshold T (e.g., height ≥ T and width ≥ T), the process may then determine, at operation 512, that reference pixel smoothing is not performed and terminate by applying a 6-tap Gaussian smoothing interpolation filter for the intra prediction of the current decoded block. For example, the process may apply the 6-tap Gaussian smoothing interpolation filter to predict one or more pixels of the current block without any reference pixel smoothing.
In the event that the width of the block or the height of the block is not greater than or equal to the threshold T (e.g., height < T and/or width < T), the process may determine, at operation 514, that reference pixel smoothing is not performed and terminate by applying a 4-tap Gaussian smoothing interpolation filter. For example, the process may apply a 4-tap (6-bit) Gaussian smoothing interpolation filter to predict one or more pixels of the current decoded block without any reference pixel smoothing. As previously described, the 4-tap Gaussian smoothing interpolation filter of operation 514 may apply less smoothness than the 6-tap Gaussian smoothing interpolation filter of operation 512, e.g., because operation 514 is triggered in response to operation 510 determining that the current decoded block has a relatively small block size. Similarly, because the 6-tap Gaussian smoothing interpolation filter applies greater smoothness, and a larger block size may benefit from greater smoothness than a smaller block size, the 6-tap Gaussian smoothing interpolation filter of operation 512 may be triggered in part in response to operation 510 determining that the current decoded block has a relatively large block size.
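A sketch of the block-size-based switching at operations 510-514; the threshold value used here is illustrative (the description above lists 16, 32, and 64 as example values), and the function simply reports which smoothing interpolation filter would be applied for a fractional-angle mode.

```python
def select_fractional_angle_filter(block_width: int, block_height: int, threshold_t: int = 32) -> str:
    # Larger blocks benefit from stronger smoothing: use the 6-tap Gaussian smoothing
    # interpolation filter when both dimensions reach the threshold T, and otherwise
    # fall back to the 4-tap Gaussian smoothing interpolation filter.
    if block_width >= threshold_t and block_height >= threshold_t:
        return "6-tap Gaussian smoothing interpolation filter"
    return "4-tap Gaussian smoothing interpolation filter"
```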
In some cases, the example 6-tap Gaussian smoothing interpolation filter applied in operation 512 may be derived from the convolution of a [1 4 6 4 1] low-pass filter with one or more different phases of a bilinear filter.
In one illustrative example, such as for the case where operation 508 determines that the intra-prediction mode of the current decoded block is an integer angle mode, operation 509 depicted in fig. 5 may be extended to include selecting between a larger-tap smoothing filter (e.g., a [1 4 6 4 1] low-pass filter, not shown) and the smaller [1 2 1] low-pass filter currently depicted as being applied in association with operation 509. In some examples, the selection between the larger-tap [1 4 6 4 1] filter and the smaller-tap [1 2 1] filter may be performed in the same or a similar manner as the selection criteria implemented in operation 510. For example, one or more of the width of the current decoded block and the height of the current decoded block may be compared to at least one threshold, where a larger block (e.g., determined to be greater than or equal to the threshold) has the larger-tap [1 4 6 4 1] filter applied for intra prediction and a smaller block (e.g., determined to be less than the threshold) has the smaller-tap [1 2 1] filter applied for intra prediction. In some cases, in examples where the integer-angle reference pixel smoothing of operation 509 is extended to select between different filter taps and/or smoothness based on the current decoded block size, one or more of the same or similar explicit and/or implicit selection processes based on factors such as block size described with respect to operation 510 may be used.
In some examples, the systems and techniques described herein may perform weak-filtering interpolation for reference line extensions, e.g., to avoid or minimize the over-smoothing problem discussed above that may occur when the reference line extension is based on 4-tap cubic interpolation and then undergoes another interpolation during intra prediction. For example, instead of using 4-tap cubic filtering to interpolate the values of the reference line extension pixels (e.g., interpolation based on the nearest neighbor pixel values of the vertical pixel reference), a weaker filter-based interpolation may be used to reduce or mitigate possible over-smoothing problems that may otherwise occur in the context of the extended reference line. By utilizing the weaker interpolation to determine the values of the reference line extension pixels, the remaining intra prediction process, and its associated interpolation and smoothing operations described herein, can remain the same without causing the above-described over-smoothing problem.
In one illustrative example, a 4 tap sinc-based interpolation (e.g., with appropriate windowing) may be used to provide a weak interpolation for computing interpolated values for reference line extension pixels. In some examples, the 4-tap sinc-based interpolation may be weaker than a cubic interpolation, such as a 4-tap cubic interpolation (e.g., with a higher cut-off frequency). In an illustrative example, the weak interpolation of the reference line extension pixels may be provided as a 6-bit 4-tap weak filter, examples of which are provided below (note that the coefficients at positions (32-i)/32 are mirrored versions of i/32):
{0,64,0,0},//0/32 position
{ -1,64,1,0},//1/32 position
{ -3,65,3, -1},//2/32 position
{ -3,63,5, -1},//3/32 position
{ -4,63,6, -1},//4/32 position
{ -5,62,9, -2},//5/32 position
{ -5,60,11, -2},//6/32 position
{ -5,58,13, -2},//7/32 position
{ -6,57,16, -3},//8/32 position
{ -6,55,18, -3},//9/32 position
{ -7,54,21, -4},//10/32 position
{ -7,52,23, -4},//11/32 position
{ -6,48,26, -4},//12/32 position
{ -7,47,29, -5},//13/32 position
{ -6,43,32, -5},//14/32 position
{ -6,41,34, -5},//15/32 position
{ -5,37,37, -5},//16/32 position
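As an illustrative sketch, the 6-bit 4-tap weak filter listed above may be applied to compute an extended reference line value at fractional position frac/32 from four consecutive reference samples. Only a few of the phases quoted above are reproduced in the table below, and the surrounding indexing convention (taps applied to ref[idx-1]..ref[idx+2]) is an assumption of this sketch.

```python
# A few phases of the 6-bit 4-tap weak interpolation filter listed above
# (phase i/32 -> four coefficients summing to 64); the remaining phases follow the
# table, and the coefficients at (32 - i)/32 are mirrored versions of those at i/32.
WEAK_FILTER = {
    0: (0, 64, 0, 0),
    8: (-6, 57, 16, -3),
    16: (-5, 37, 37, -5),
}

def weak_interpolate(ref, idx: int, frac: int) -> int:
    # Interpolate the value at fractional position idx + frac/32 using the 4-tap weak
    # filter; the 6-bit coefficients are normalized with a rounding offset of 32.
    c0, c1, c2, c3 = WEAK_FILTER[frac]
    val = c0 * ref[idx - 1] + c1 * ref[idx] + c2 * ref[idx + 1] + c3 * ref[idx + 2]
    return (val + 32) >> 6
```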
The systems and techniques allow prediction (e.g., intra prediction) to be performed using an enhanced interpolation filter. In some examples, the systems and techniques described herein may provide advantages over other techniques that utilize multiple interpolation filters. For example, in some cases, multiple interpolation filters, e.g., with different interpolation filter taps, may be applied within one block, slice, tile, and/or picture. In one example, the interpolation filter type and interpolation filter taps (length) may depend on the height and/or width of the block, the block shape (width-to-height ratio), the block region size, the intra prediction mode, and/or neighboring decoded information, including, but not limited to, reconstructed sample values and intra prediction modes, and the like. In such a case, when the intra prediction is a vertical-like angular intra prediction mode, and if the width is less than or equal to 8 or another size, a 6-tap sextic interpolation filter is used; otherwise, a 4-tap Gaussian interpolation filter is used. When the intra prediction is a horizontal-like angular intra prediction mode, and if the width is less than or equal to 8 or another size, a 6-tap sextic interpolation filter is used; otherwise, a 4-tap Gaussian interpolation filter is used. In one example using the systems and techniques described herein, if the width and height of the codec block are greater than or equal to the threshold T, a 6-tap Gaussian filter is used (and no pixel smoothing is applied); otherwise, a 4-tap Gaussian filter is used (and no pixel smoothing is applied).
Fig. 6 is a flowchart illustrating an example of a process 600 for processing image and/or video data. At block 602, process 600 may include determining an intra-prediction mode for predicting a block of video data.
At block 604, process 600 may include determining a type of smoothing filter for the block of video data. For example, process 600 may determine the type of smoothing filter based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold. In some aspects, the type of smoothing filter is signaled in the video bitstream. In some cases, the type of smoothing filter is signaled for each of a set of prediction blocks, Coding Tree Units (CTUs), slices, or sequences. At block 606, process 600 may include intra-predicting the block of video data using the determined type of smoothing filter and the intra-prediction mode.
In some examples, process 600 may include using a first smoothing interpolation filter as the determined type of smoothing filter based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold. In one illustrative example, the first smoothing interpolation filter comprises a 6-tap Gaussian filter. In such examples, process 600 may further include determining reference pixels for intra prediction of the block of video data using the first smoothing interpolation filter.
In some examples, process 600 may include using a second smoothing interpolation filter as the determined type of smoothing filter based at least in part on a determination that at least one of the width of the block and the height of the block is not greater than (e.g., is less than) the first threshold. In one illustrative example, the second smoothing interpolation filter comprises a 4-tap Gaussian filter. In such examples, process 600 may further include determining reference pixels for intra prediction of the block of video data using the second smoothing interpolation filter.
In some cases, process 600 may include determining a minimum offset between an angular direction of the intra-prediction mode and one of a vertical intra-prediction mode and a horizontal intra-prediction mode. Process 600 may also include determining the type of smoothing filter for the block of video data based on comparing the determined minimum offset to a second threshold. In one example, process 600 may include determining a low pass filter as the type of smoothing filter based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra prediction mode is an integer angle mode associated with an integer value reference pixel location. In one illustrative example, the low pass filter comprises a [1 2 1] filter and reference pixel smoothing is performed without interpolation.
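The following non-normative sketch illustrates [1 2 1] reference pixel smoothing of the kind referred to above, applied without interpolation. The edge handling and the sample container are assumptions for illustration only.

#include <vector>

std::vector<int> smoothReferenceSamples(const std::vector<int>& ref) {
    const int n = static_cast<int>(ref.size());
    std::vector<int> out(ref.size());
    for (int i = 0; i < n; ++i) {
        int left  = ref[i > 0 ? i - 1 : i];              // repeat the edge sample at the boundary
        int right = ref[i + 1 < n ? i + 1 : i];
        out[i] = (left + 2 * ref[i] + right + 2) >> 2;   // [1 2 1] / 4 with rounding
    }
    return out;
}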
In another example, process 600 may include determining a Gaussian filter as the type of smoothing filter based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra-prediction mode is a fractional angle mode associated with a fractional value reference pixel location. In some cases, the Gaussian filter performs smoothing interpolation without reference pixel smoothing. In one illustrative example, the Gaussian filter comprises a 6-tap Gaussian filter based on a determination that at least one of the width of the block and the height of the block is greater than the first threshold. In another illustrative example, the Gaussian filter comprises a 4-tap Gaussian filter based on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold.
In some aspects, process 600 may include using an interpolation filter as the determined type of smoothing filter based at least in part on a determination that the determined minimum offset is not greater than (e.g., is less than) the second threshold. In one illustrative example, the interpolation filter includes a 4-tap cubic filter. Process 600 may also include intra-predicting the block of video data using the interpolation filter without applying reference pixel smoothing.
In some examples, process 600 may include determining a low pass filter as the type of smoothing filter based at least in part on a determination that the intra-prediction mode is an integer angle mode and a determination that the determined minimum offset is greater than the second threshold. In some cases, process 600 may include using a large-tap low pass filter for reference pixel smoothing based at least in part on a determination that the width of the block, the height of the block, or both the width and the height of the block is greater than the first threshold. The large-tap low pass filter applies a greater degree of reference pixel smoothing than a small-tap low pass filter. In some cases, process 600 may include using a small-tap low pass filter for reference pixel smoothing based at least in part on a determination that the width of the block, the height of the block, or both the width and the height of the block is not greater than (e.g., is less than) the first threshold. The small-tap low pass filter applies a smaller degree of reference pixel smoothing than the large-tap low pass filter.
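A non-normative sketch combining the decisions described in the preceding paragraphs is given below. All names, thresholds, and the exact filter variants are assumptions; the branching (a cubic filter when the mode is close to vertical or horizontal, a low pass filter for integer angle modes, and a Gaussian smoothing interpolation filter for fractional angle modes, with the tap count chosen from the block size) follows the text.

enum class FilterChoice {
    kCubic4Tap,        // interpolation only, no reference pixel smoothing
    kLowPassLargeTap,  // e.g., a [1 4 6 4 1]-style smoothing filter, no interpolation
    kLowPassSmallTap,  // e.g., a [1 2 1] smoothing filter, no interpolation
    kGaussian6Tap,     // smoothing interpolation for larger blocks
    kGaussian4Tap      // smoothing interpolation for smaller blocks
};

FilterChoice chooseSmoothingFilter(int width, int height, int minOffsetToVerHor,
                                   bool isIntegerAngleMode,
                                   int sizeThreshold, int angleThreshold) {
    if (minOffsetToVerHor <= angleThreshold) {
        return FilterChoice::kCubic4Tap;                     // mode is close to vertical/horizontal
    }
    bool largeBlock = (width > sizeThreshold) || (height > sizeThreshold);
    if (isIntegerAngleMode) {
        return largeBlock ? FilterChoice::kLowPassLargeTap : FilterChoice::kLowPassSmallTap;
    }
    return largeBlock ? FilterChoice::kGaussian6Tap : FilterChoice::kGaussian4Tap;
}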
In some cases, process 600 may include determining that the intra-prediction mode is an integer angle mode based at least in part on comparing a slope of the intra-prediction mode to one or more pixel locations determined from the width of the block and the height of the block.
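One possible (hypothetical) realization of such a check is sketched below: the mode's slope, expressed here in 1/32-sample units to match the phase granularity used above, projects each target position onto the reference line, and the mode is treated as an integer angle mode only if every projected location lands on a whole reference sample. The slope representation and the positions that are tested are assumptions for illustration.

bool isIntegerAngleMode(int predAngleIn32ndSamples, int width, int height) {
    int extent = (width > height) ? width : height;          // pixel locations determined from the block size
    for (int pos = 1; pos <= extent; ++pos) {
        if (((pos * predAngleIn32ndSamples) & 31) != 0) {
            return false;                                     // fractional reference location: not an integer angle mode
        }
    }
    return true;
}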
In some aspects, the process 600 may include determining that an offset between an angular direction of the intra-prediction mode and a vertical intra-prediction mode or a horizontal intra-prediction mode is less than a second threshold. Process 600 may also include intra-predicting the block of video data using a cubic interpolation filter based on determining that the offset between the angular direction of the intra-prediction mode and the vertical intra-prediction mode or the horizontal intra-prediction mode is less than the second threshold.
In some examples, process 600 may include performing reference line extension using a weak interpolation filter. In some cases, the reference line extension is performed using the weak interpolation filter before the intra prediction is performed using the cubic interpolation filter. In some cases, the cubic interpolation filter has a higher cut-off frequency than the weak interpolation filter and applies a greater degree of smoothing than the weak interpolation filter. In some aspects, the weak interpolation filter comprises a 4-tap sinc-based interpolation filter and a 6-bit 4-tap interpolation filter.
In some aspects, the process 600 may include determining the type of smoothing filter based on the width of the block, the height of the block, or the width and the height of the block without using information explicitly signaled in the video bitstream.
In some cases, process 600 may be performed by a decoding device (e.g., decoding device 112 of fig. 1 and 9). For example, process 600 may also include determining a residual data block for the block of video data. Process 600 may also include decoding the block of video data using the residual data block and a predictive block determined based on the intra prediction of the block of video data.
In some cases, process 600 may be performed by an encoding device (e.g., encoding device 104 of fig. 1 and 8). For example, process 600 may include generating an encoded video bitstream that includes information associated with the block of video data. In some examples, process 600 may include storing the encoded video bitstream (e.g., in the at least one memory of the apparatus). In some examples, process 600 may include transmitting the encoded video bitstream (e.g., using a transmitter of the apparatus).
In some implementations, the processes (or methods) described herein may be performed by a computing device or apparatus, such as the system 100 shown in fig. 1. For example, these processes may be performed by the encoding device 104 shown in fig. 1 and 8, by another video source side device or video transmission device, by the decoding device 112 shown in fig. 1 and 9, and/or by another client side device (such as a player device, a display, or any other client side device). In some cases, a computing device or apparatus may include a processor, microprocessor, microcomputer, or other component of a device configured to perform the steps of the processes described herein. In some examples, a computing device or apparatus may include a camera configured to capture video data (e.g., a video sequence) including video frames. In some examples, a camera or other capture device that captures the video data is separate from the computing device, in which case the computing device receives or obtains the captured video data. The computing device may also include a network interface configured to communicate the video data. The network interface may be configured to communicate Internet Protocol (IP) based data or other types of data. In some examples, a computing device or apparatus may include a display to display samples of output video content, such as pictures of a video bitstream.
These processes may be described with respect to logic flow diagrams, the operations of which represent sequences of operations that may be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the described operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, etc. that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations may be combined in any order and/or in parallel to implement the process.
Additionally, the process may be performed under control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) that are executed together on one or more processors by hardware or a combination thereof. As described above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions that may be executed by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.
The codec techniques discussed herein may be implemented in an example video encoding and decoding system (e.g., system 100). In some examples, the system includes a source device that provides encoded video data to be decoded by a destination device at a later time. In particular, the source device provides video data to the destination device via a computer readable medium. The source device and the destination device may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" tablets, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and the like. In some cases, the source device and the destination device may be equipped for wireless communication.
The destination device may receive the encoded video data to be decoded via a computer readable medium. The computer readable medium may include any type of medium or device capable of moving encoded video data from a source device to a destination device. In one example, the computer-readable medium may include a communication medium to cause the source device to transmit encoded video data directly to the destination device in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to a destination device. The communication medium may include any wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include a router, switch, base station, or any other device that may be used to facilitate communication from a source device to a destination device.
In some examples, the encoded data may be output from the output interface to a storage device. Similarly, the encoded data may be accessed from the storage device through the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as hard drives, blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In another example, the storage device may correspond to a file server or another intermediate storage device that may store encoded video generated by the source device. The destination device may access the stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting such encoded video data to a destination device. Example file servers include web servers (e.g., for websites), FTP servers, network Attached Storage (NAS) devices, or local disk drives. The destination device may access the encoded video data through any standard data connection, including an internet connection. This may include a wireless channel (e.g., wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server. Transmitting the encoded video data from the storage device may be streaming, downloading, or a combination thereof.
The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video codecs that support any of a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, internet streaming video transmission such as dynamic adaptive streaming over HTTP (DASH), encoding of digital video onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, the system may be configured to support unidirectional or bidirectional video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In one example, a source device includes a video source, a video encoder, and an output interface. The destination device may include an input interface, a video decoder, and a display device. The video encoder of the source device may be configured to apply the techniques disclosed herein. In other examples, the source device and the destination device may include other components or arrangements. For example, the source device may receive video data from an external video source (such as an external camera). Likewise, the destination device may interface with an external display device instead of including an integrated display device.
The above example system is merely one example. The techniques for parallel processing of video data may be performed by any digital video encoding and/or decoding device. Although the techniques of the present disclosure are typically performed by video encoding devices, these techniques may also be performed by video encoders/decoders, which are commonly referred to as "CODECs". Furthermore, the techniques of this disclosure may also be performed by a video preprocessor. The source device and the destination device are merely examples of such codec devices, wherein the source device generates coded video data for transmission to the destination device. In some examples, the source device and the destination device may operate in a substantially symmetrical manner such that each of the devices includes video encoding and decoding components. Thus, example systems may support unidirectional or bidirectional video transmission between video devices, for example, for video streaming, video playback, video broadcasting, or video telephony.
The video source may include a video capture device such as a video camera, a video archive containing previously captured video, and/or a video feed interface that receives video from a video content provider. As another alternative, the video source may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if the video source is a video camera, the source device and the destination device may form a so-called camera phone or video phone. However, as mentioned above, the techniques described in this disclosure may be applicable to video codecs in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by a video encoder. The encoded video information may then be output onto a computer readable medium via an output interface.
As described above, the computer-readable medium may include a transitory medium such as a wireless broadcast or a wired network transmission, or a storage medium (i.e., a non-transitory storage medium) such as a hard disk, a flash drive, a compact disk, a digital video disk, a blu-ray disk, or other computer-readable medium. In some examples, a network server (not shown) may receive encoded video data from a source device and provide the encoded video data to a destination device, e.g., via network transmission. Similarly, a computing device of a media production facility, such as a disk stamping facility, may receive encoded video data from a source device and generate an optical disk containing the encoded video data. Thus, in various examples, a computer-readable medium may be understood to include one or more computer-readable media in various forms.
An input interface of the destination device receives information from the computer-readable medium. The information of the computer readable medium may include syntax information defined by the video encoder, which is also used by the video decoder, including syntax elements describing characteristics and/or processing of blocks and other decoded units (e.g., group of pictures (GOP)). The display device displays the decoded video data to a user and may include any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device. Various embodiments of the present application have been described.
Specific details of encoding device 104 and decoding device 112 are shown in fig. 8 and 9, respectively. Fig. 8 is a block diagram illustrating an example encoding device 104 that may implement one or more of the techniques described in this disclosure. The encoding device 104 may, for example, generate a syntax structure described herein (e.g., a syntax structure of VPS, SPS, PPS or other syntax element). The encoding device 104 may perform intra-prediction and inter-prediction coding of video blocks within a video slice. As previously described, intra-coding relies at least in part on spatial prediction to reduce or remove spatial redundancy within a given video frame or picture. Inter-frame coding relies at least in part on temporal prediction to reduce or remove temporal redundancy within adjacent or surrounding frames of a video sequence. Intra mode (I mode) may refer to any of several spatial-based compression modes. Inter modes such as unidirectional prediction (P-mode) or bi-directional prediction (B-mode) may refer to any of several time-domain based compression modes.
The encoding apparatus 104 includes a dividing unit 35, a prediction processing unit 41, a filter unit 63, a picture memory 64, a summer 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The prediction processing unit 41 includes a motion estimation unit 42, a motion compensation unit 44, and an intra prediction processing unit 46. For video block reconstruction, the encoding device 104 further includes an inverse quantization unit 58, an inverse transform processing unit 60, and a summer 62. The filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an Adaptive Loop Filter (ALF), and a Sample Adaptive Offset (SAO) filter. Although the filter unit 63 is shown as an in-loop filter in fig. 8, in other configurations, the filter unit 63 may be implemented as a post-loop filter. The post-processing device 57 may perform additional processing on the encoded video data generated by the encoding device 104. In some examples, the techniques of this disclosure may be implemented by encoding device 104. However, in other examples, one or more of the techniques of this disclosure may be implemented by post-processing device 57.
As shown in fig. 8, the encoding device 104 receives video data, and the dividing unit 35 divides the data into video blocks. Partitioning may also include partitioning into slices, slice segments, tiles, or other larger units according to the quadtree structure of the LCUs and CUs, as well as video block partitioning. The encoding device 104 generally illustrates the components that encode video blocks within a video slice to be encoded. A slice may be partitioned into multiple video blocks (and possibly into sets of video blocks called tiles). The prediction processing unit 41 may select one of a plurality of possible codec modes, such as one of a plurality of intra-prediction codec modes or one of a plurality of inter-prediction codec modes, for the current video block based on an error result (e.g., codec rate and distortion level, etc.). The prediction processing unit 41 may provide the resulting intra or inter coded block to the summer 50 to generate residual block data and to the summer 62 to reconstruct the encoded block used as a reference picture.
Intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-prediction encoding of the current video block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.
The motion estimation unit 42 may be configured to determine the inter prediction mode for a video slice according to a predetermined pattern for the video sequence. The predetermined pattern may designate video slices in the sequence as P slices, B slices, or GPB slices. The motion estimation unit 42 and the motion compensation unit 44 may be highly integrated but are illustrated separately for conceptual purposes. Motion estimation performed by the motion estimation unit 42 is a process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate a displacement of a Prediction Unit (PU) of a video block within a current video frame or picture relative to a predictive block within a reference picture.
A predictive block is a block found to closely match the PU of the video block to be encoded in terms of pixel differences, which may be determined by a Sum of Absolute Differences (SAD), a Sum of Squared Differences (SSD), or another difference metric. In some examples, encoding device 104 may calculate values for sub-integer pixel locations of reference pictures stored in picture memory 64. For example, the encoding device 104 may interpolate values for one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Accordingly, the motion estimation unit 42 may perform a motion search with respect to full pixel positions and fractional pixel positions, and output a motion vector having fractional pixel accuracy.
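As a non-normative illustration, the Sum of Absolute Differences (SAD) metric used in such a motion search can be computed as sketched below; the buffer layout, strides, and 8-bit sample type are assumptions for illustration.

#include <cstdlib>

int computeSAD(const unsigned char* cur, int curStride,
               const unsigned char* ref, int refStride,
               int width, int height) {
    int sad = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            sad += std::abs(static_cast<int>(cur[y * curStride + x]) -
                            static_cast<int>(ref[y * refStride + x]));
        }
    }
    return sad;  // a smaller SAD indicates a closer match between the PU and the candidate predictive block
}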
Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the location of the PU with the location of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each reference picture list identifying one or more reference pictures stored in picture memory 64. The motion estimation unit 42 issues the calculated motion vector to the entropy encoding unit 56 and the motion compensation unit 44.
Motion compensation, performed by motion compensation unit 44, may involve fetching or generating a predictive block based on the motion vector determined by motion estimation, possibly performing interpolation to sub-pixel precision. Upon receiving the motion vector of the PU of the current video block, motion compensation unit 44 may locate the predictive block in the reference picture list to which the motion vector points. The encoding device 104 forms a residual video block by subtracting pixel values of the predictive block from pixel values of the current video block being coded, thereby forming pixel differences. The pixel differences form residual data for the block and may include both luma and chroma difference components. Summer 50 represents one or more components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slices for use by decoding device 112 during decoding of the video blocks of the video slices.
The intra prediction processing unit 46 may intra predict the current block as an alternative to inter prediction by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode for encoding the current block. In some examples, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes (e.g., during separate encoding passes), and intra-prediction processing unit 46 may select an appropriate intra-prediction mode from the tested modes for use. For example, the intra prediction processing unit 46 may calculate a rate distortion value using rate distortion analysis for various tested intra prediction modes, and may select an intra prediction mode having the best rate distortion characteristics among the tested modes. Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and an original uncoded block encoded to produce the encoded block, as well as the bit rate (i.e., number of bits) used to produce the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortion and rate for the various encoded blocks to determine which intra-prediction mode exhibits the best rate distortion value for the block.
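One common realization of such a comparison, shown below as a hedged sketch rather than the mandated procedure, evaluates a Lagrangian rate-distortion cost J = D + lambda * R for each tested intra prediction mode and keeps the mode with the lowest cost. The structure, the lambda value, and the cost form are assumptions for illustration.

#include <limits>
#include <vector>

struct ModeResult { int mode; double distortion; double bits; };

int selectBestIntraMode(const std::vector<ModeResult>& tested, double lambda) {
    int bestMode = -1;
    double bestCost = std::numeric_limits<double>::max();
    for (const ModeResult& r : tested) {
        double cost = r.distortion + lambda * r.bits;  // rate-distortion cost of this tested mode
        if (cost < bestCost) {
            bestCost = cost;
            bestMode = r.mode;
        }
    }
    return bestMode;  // mode exhibiting the best rate-distortion value among the tested modes
}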
In any case, after selecting the intra-prediction mode for a block, intra-prediction processing unit 46 may provide entropy encoding unit 56 with information indicating the intra-prediction mode selected for the block. Entropy encoding unit 56 may encode information indicating the selected intra-prediction mode. The encoding device 104 may include in the transmitted bitstream configuration data definitions of the encoding contexts for the various blocks and indications of the most probable intra-prediction mode, intra-prediction mode index table, and modified intra-prediction mode index table for each context. The bitstream configuration data may include a plurality of intra prediction mode index tables and a plurality of modified intra prediction mode index tables (also referred to as codeword mapping tables).
After prediction processing unit 41 generates a predictive block for the current video block via inter prediction or intra prediction, encoding device 104 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform. The transform processing unit 52 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
The transform processing unit 52 may issue the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting quantization parameters. In some examples, quantization unit 54 may then perform a scan of a matrix including quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
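A simplified sketch of scalar quantization controlled by a quantization parameter (QP) is given below for illustration. The step-size derivation (step size roughly doubling every 6 QP) is a common approximation assumed here, not the codec's exact rule, and all names are hypothetical.

#include <cmath>
#include <vector>

std::vector<int> quantizeCoefficients(const std::vector<double>& coeffs, int qp) {
    double stepSize = std::pow(2.0, (qp - 4) / 6.0);   // larger QP -> larger step -> coarser quantization
    std::vector<int> levels(coeffs.size());
    for (std::size_t i = 0; i < coeffs.size(); ++i) {
        double scaled = coeffs[i] / stepSize;
        levels[i] = static_cast<int>(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);  // round to nearest level
    }
    return levels;  // quantized levels are subsequently scanned and entropy encoded
}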
After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding technique. After entropy encoding by entropy encoding unit 56, the encoded bitstream may be sent to decoding device 112, or archived for later transmission or retrieval by decoding device 112. Entropy encoding unit 56 may also entropy encode the motion vector and other syntax elements of the current video slice being encoded.
The inverse quantization unit 58 and the inverse transform processing unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct residual blocks in the pixel domain for later use as reference blocks of reference pictures. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within the reference picture list. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to generate a reference block for storage in picture memory 64. The reference block may be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference block for inter prediction of a block in a subsequent video frame or picture.
In this way, the encoding device 104 of fig. 8 represents an example of a video encoder configured to perform the techniques described herein. For example, the encoding device 104 may perform any of the techniques described herein, including the processes described herein. In some cases, some techniques of the present disclosure may also be implemented by the post-processing device 57.
Fig. 9 is a block diagram illustrating an example decoding device 112. The decoding apparatus 112 includes an entropy decoding unit 80, a prediction processing unit 81, an inverse quantization unit 86, an inverse transform processing unit 88, a summer 90, a filter unit 91, and a picture memory 92. The prediction processing unit 81 includes a motion compensation unit 82 and an intra prediction processing unit 84. In some examples, the decoding device 112 may perform a decoding pass that is substantially opposite to the encoding pass described with respect to the encoding device 104 of fig. 8.
During the decoding process, the decoding device 112 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements issued by the encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bitstream from the encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bitstream from a network entity 79, such as a server, a Media Aware Network Element (MANE), a video editor/splicer, or another such device configured to implement one or more of the techniques described above. Network entity 79 may or may not include encoding device 104. Some of the techniques described in this disclosure may be implemented by network entity 79 before network entity 79 sends the encoded video bitstream to decoding device 112. In some video decoding systems, the network entity 79 and the decoding device 112 may be parts of separate devices, while in other cases the functionality described with respect to the network entity 79 may be performed by the same device that includes the decoding device 112.
Entropy decoding unit 80 of decoding device 112 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. The decoding device 112 may receive the syntax elements at the video slice level and/or the video block level. Entropy decoding unit 80 may process and parse fixed length syntax elements and variable length syntax elements in one or more parameter sets (such as VPS, SPS, and PPS).
When a video slice is encoded as an intra-coded (I) slice, intra prediction processing unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is encoded as an inter-coded (i.e., B, P, or GPB) slice, the motion compensation unit 82 of the prediction processing unit 81 generates a predictive block for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 80. The predictive block may be generated from one of the reference pictures within a reference picture list. The decoding device 112 may construct the reference frame lists, list 0 and list 1, using a default construction technique based on the reference pictures stored in the picture memory 92.
Motion compensation unit 82 determines prediction information for the video block of the current video slice by parsing the motion vector and other syntax elements and uses the prediction information to generate a predictive block of the current video block being decoded. For example, motion compensation unit 82 may determine a prediction mode (e.g., intra or inter prediction) for encoding a video block of a video slice, an inter prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more reference picture lists for the slice, a motion vector for each inter-encoded video block of the slice, an inter prediction state for each inter-encoded video block of the slice, and other information to decode the video block in the current video slice using one or more syntax elements in the parameter set.
The motion compensation unit 82 may also interpolate based on interpolation filters. The motion compensation unit 82 may calculate interpolated values for sub-integer pixels of the reference block using interpolation filters used by the encoding device 104 during encoding of the video block. In this case, the motion compensation unit 82 may determine an interpolation filter used by the encoding device 104 from the received syntax element, and may use the interpolation filter to generate the predictive block.
The inverse quantization unit 86 inversely quantizes or dequantizes the quantized transform coefficients provided in the bitstream and decoded by the entropy decoding unit 80. The inverse quantization process may include determining a degree of quantization using quantization parameters calculated by the encoding device 104 for each video block in the video slice, and likewise, determining a degree of inverse quantization that should be applied. The inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT or other suitable inverse transform), an inverse integer transform, or a conceptually similar inverse transform process to the transform coefficients in order to generate a residual block in the pixel domain.
After motion compensation unit 82 generates a predictive block for the current video block based on the motion vector and other syntax elements, decoding apparatus 112 forms a decoded video block by summing the residual block from inverse transform processing unit 88 with the corresponding predictive block generated by motion compensation unit 82. Summer 90 represents one or more components that perform this summation operation. Loop filters (in the codec loop or after the codec loop) may also be used to smooth pixel transitions or otherwise improve video quality, if desired. The filter unit 91 is used to represent one or more loop filters, such as a deblocking filter, an Adaptive Loop Filter (ALF), and a Sample Adaptive Offset (SAO) filter. Although the filter unit 91 is shown in fig. 9 as an in-loop filter, in other configurations, the filter unit 91 may be implemented as a post-loop filter. The decoded video blocks in a given frame or picture are then stored in a picture memory 92, the picture memory 92 storing reference pictures for subsequent motion compensation. The picture memory 92 also stores decoded video for later presentation on a display device, such as the video destination device 122 shown in fig. 1.
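The summation performed by summer 90 can be illustrated with the following non-normative sketch, in which the decoded residual is added to the predictive block and clipped to the valid sample range; the buffer layout and the 8-bit clipping range are assumptions for illustration.

void reconstructBlock(const int* residual, const int* prediction,
                      unsigned char* recon, int width, int height, int stride) {
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int value = residual[y * width + x] + prediction[y * width + x];
            if (value < 0) value = 0;
            if (value > 255) value = 255;                     // clip to the 8-bit sample range
            recon[y * stride + x] = static_cast<unsigned char>(value);
        }
    }
}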
In this way, the decoding device 112 of fig. 9 represents an example of a video decoder configured to perform the techniques described herein. For example, the decoding device 112 may perform any of the techniques described herein, including the processes described herein.
As used herein, the term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium that may store data therein and that does not include a carrier wave and/or transitory electronic signals that propagate wirelessly or over a wired connection. Examples of non-transitory media may include, but are not limited to, magnetic disks or tapes, optical storage media such as Compact Discs (CDs) or Digital Versatile Discs (DVDs), flash memory, or memory devices. The computer readable medium may have stored thereon code and/or machine executable instructions, which may represent processes, functions, subprograms, procedures, routines, subroutines, modules, software packages, classes, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be transferred, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
In some embodiments, the computer readable storage devices, media, and memory may comprise a cable or wireless signal comprising a bit stream or the like. However, when mentioned, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals themselves.
In the above description, specific details are provided to provide a thorough understanding of the embodiments and examples provided herein. However, it will be understood by those of ordinary skill in the art that the embodiments may be practiced without these specific details. For clarity of explanation, in some cases, the present technology may be presented as including individual functional blocks, including functional blocks that include devices, device components, steps, or routines embodied in software or a combination of hardware and software. Additional components may be used in addition to those shown in the figures and/or described herein. For example, circuits, systems, networks, processes, and other components may be shown as block diagram form components in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
Various embodiments may be described above as a process or method which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. Furthermore, the order of the operations may be rearranged. The process terminates when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, etc. When a process corresponds to a function, its termination may correspond to the function returning to the calling function or the main function.
The processes and methods according to the examples above may be implemented using computer-executable instructions stored in or obtainable from a computer-readable medium. Such instructions may include, for example, instructions and data which cause or otherwise configure a general purpose computer, special purpose computer, or processing device to perform a certain function or group of functions. The portion of the computer resources used may be accessible through a network. The computer-executable instructions may be, for example, binary, intermediate format instructions, such as assembly language, firmware, source code, and the like. Examples of computer readable media that may be used to store instructions, information used, and/or information created during a method according to the described examples include magnetic or optical disks, flash memory, a USB device providing non-volatile memory, a network storage device, and so forth.
Devices implementing processes and methods according to these disclosures may include hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof, and may employ any of a variety of form factors. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer program product) may be stored in a computer-readable or machine-readable medium. The processor may perform the necessary tasks. Typical examples of form factors include laptop computers, smart phones, mobile phones, tablet devices, or other small form factor personal computers, personal digital assistants, rack-mounted devices, stand alone devices, and the like. The functionality described herein may also be embodied in a peripheral device or plug-in card. As a further example, such functionality may also be implemented in different chips or different processes executing in a single device on a circuit board.
The instructions, the medium for transmitting the instructions, the computing resources for executing the instructions, and other structures for supporting the computing resources are example means for providing the functionality described in this disclosure.
In the foregoing description, aspects of the present application have been described with reference to specific embodiments thereof, but those skilled in the art will recognize that the present application is not limited thereto. While illustrative embodiments of the application have been described in detail herein, it should be understood that the inventive concepts may be otherwise variously embodied and employed and that the appended claims are intended to be construed to include such variations except as limited by the prior art. The various features and aspects of the above-described applications may be used alone or in combination. Moreover, embodiments may be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. For purposes of illustration, the methods are described in a particular order. It should be understood that in alternative embodiments, the methods may be performed in an order different than that described.
Those of ordinary skill in the art will understand that the less than ("<") and greater than (">") symbols or terminology used herein may be replaced with less than or equal to ("≤") and greater than or equal to ("≥") symbols, respectively, without departing from the scope of the present description.
Where a component is described as "configured to" perform a certain operation, such configuration may be implemented, for example, by designing electronic circuitry or other hardware to perform the operation, by programming programmable electronic circuitry (e.g., a microprocessor or other suitable electronic circuitry) to perform the operation, or any combination thereof.
The phrase "coupled to" refers to any component that is physically connected, directly or indirectly, to another component, and/or that communicates, directly or indirectly, with another component (e.g., connected to the other component through a wired or wireless connection and/or other suitable communication interface).
Claim language or other language reciting "at least one of" a set and/or "one or more of" a set indicates that one member of the set or multiple members of the set (in any combination) satisfy the claim. For example, claim language reciting "at least one of A and B" refers to A, B, or A and B. In another example, claim language reciting "at least one of A, B, and C" refers to A, B, C, or A and B, or A and C, or B and C, or A and B and C. The language "at least one of" a set and/or "one or more of" a set does not limit the set to the items listed in the set. For example, claim language reciting "at least one of A and B" may represent A, B, or A and B, and may additionally include items not listed in the set of A and B.
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The techniques described herein may also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques may be implemented in any of a variety of devices such as a general purpose computer, a wireless communication device handset, or an integrated circuit device having multiple uses including applications in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code that, when executed, includes instructions to perform one or more of the methods described above. The computer readable data storage medium may form part of a computer program product, which may include packaging material. The computer-readable medium may include memory or data storage media such as Random Access Memory (RAM), such as Synchronous Dynamic Random Access Memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. Additionally or alternatively, the techniques may be implemented, at least in part, by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that the program code can be accessed, read, and/or executed by a computer, such as a propagated signal or wave.
The program code may be executed by a processor, which may include one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such processors may be configured to perform any of the techniques described in this disclosure. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Thus, the term "processor" as used herein may refer to any of the foregoing structure, any combination of the foregoing structure, or any other structure or device suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated into a combined video encoder-decoder (CODEC).
Illustrative examples of the present disclosure include:
aspect 1: a method of processing video data, the method comprising: obtaining a video data block; processing the block using intra prediction mode; and determining a type of interpolation filter for the block based on at least one of a width and a height of the block.
Aspect 2: the method of aspect 1, further comprising: determining a first type of interpolation filter for the block based on a determination that at least one of the width of the block and the height of the block is greater than a threshold; and determining a reference pixel for the block using the first type of interpolation filter.
Aspect 3: the method of aspect 1, wherein the first type of interpolation filter comprises a 6 tap gaussian filter.
Aspect 4: the method of aspect 1, further comprising: determining a second type of interpolation filter for the block based on a determination that at least one of the width of the block and the height of the block is not greater than a threshold; and determining a reference pixel for the block using the second type of interpolation filter.
Aspect 5: the method of aspect 4, wherein the second type of interpolation filter comprises a 4 tap gaussian filter.
Aspect 6: the method of any one of aspects 1 to 5, wherein the type of interpolation filter is explicitly signaled in the video bitstream.
Aspect 7: the method of aspect 6, wherein the type of interpolation filter is explicitly signaled for each prediction block, codec Tree Unit (CTU), slice, or sequence.
Aspect 8: the method of any one of aspects 1 to 5, further comprising: the type of interpolation filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in the video bitstream.
Aspect 9: an apparatus, comprising: a memory configured to store video data; and a processor configured to: obtaining a video data block; processing the block using intra prediction mode; and determining a type of interpolation filter for the block based on at least one of a width and a height of the block.
Aspect 10: the apparatus of aspect 9, wherein the processor is configured to: determining a first type of interpolation filter for the block based on a determination that at least one of the width of the block and the height of the block is greater than a threshold; and determining a reference pixel for the block using the first type of interpolation filter.
Aspect 11: the apparatus of aspect 9, wherein the first type of interpolation filter comprises a 6 tap gaussian filter.
Aspect 12: the apparatus of aspect 9, wherein the processor is configured to: determining a second type of interpolation filter for the block based on a determination that at least one of the width of the block and the height of the block is not greater than a threshold; and determining a reference pixel for the block using the second type of interpolation filter.
Aspect 13: the apparatus of aspect 12, wherein the second type of interpolation filter comprises a 4 tap gaussian filter.
Aspect 14: the apparatus of any of aspects 9 to 13, wherein the type of interpolation filter is explicitly signaled in the video bitstream.
Aspect 15: the apparatus of aspect 14, wherein the type of interpolation filter is explicitly signaled for each prediction block, codec Tree Unit (CTU), slice, or sequence.
Aspect 16: the apparatus of any one of aspects 9 to 13, wherein the processor is configured to: the type of interpolation filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in the video bitstream.
Aspect 17: the apparatus of any one of aspects 9 to 16, wherein the apparatus comprises an encoder.
Aspect 18: the apparatus of any one of aspects 9 to 17, wherein the apparatus comprises a decoder.
Aspect 19: the apparatus of any one of aspects 9 to 18, wherein the apparatus is a mobile device.
Aspect 20: the apparatus of any one of aspects 9 to 19, wherein the apparatus is an augmented reality device.
Aspect 21: the apparatus of any one of aspects 9 to 20, further comprising: and a display configured to display the video data.
Aspect 22: the apparatus of any one of aspects 9 to 21, further comprising: a camera configured to capture one or more pictures.
Aspect 23: a computer readable medium having instructions stored thereon which, when executed by a processor, perform the method of any one of aspects 1 to 22.
Aspect 24: an apparatus comprising means for performing the operations of any one of aspects 1 to 22.
Aspect 25: a method of processing video data, the method comprising: obtaining a video data block; processing the block using intra prediction mode; and determining a type of smoothing filter for the block based on at least one of a width and a height of the block.
Aspect 26: the method of aspect 25, further comprising: determining whether an angle of the intra prediction mode is an integer angle; wherein determining the type of smoothing filter is further based on determining the angle of the intra prediction mode to be an integer angle.
Aspect 27: the method of any one of aspects 25 or 26, further comprising: determining a first type of smoothing filter for the block based on a determination that at least one of the width of the block and the height of the block is greater than a threshold; and processing at least one predicted pixel for the block using the first type of smoothing filter.
Aspect 28: the method of aspect 27, wherein the first type of smoothing filter comprises a [1 4 6 4 1] filter.
Aspect 29: the method of any one of aspects 25 or 26, further comprising: determining a second type of smoothing filter for the block based on a determination that at least one of the width of the block and the height of the block is not greater than a threshold; and processing at least one predicted pixel for the block using the second type of smoothing filter.
Aspect 30: the method of aspect 29, wherein the second type of smoothing filter comprises a [1 2 1] filter.
Aspect 31: the method of any of aspects 25 through 30, wherein the type of smoothing filter is explicitly signaled in the video bitstream.
Aspect 32: the method of aspect 31, wherein the type of interpolation filter is explicitly signaled for each prediction block, codec Tree Unit (CTU), slice, or sequence.
Aspect 33: the method of any one of aspects 25 to 30, further comprising: the type of smoothing filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in the video bitstream.
Aspect 34: an apparatus, comprising: a memory configured to store video data; and a processor configured to: obtaining a video data block; processing the block using intra prediction mode; and determining a type of smoothing filter for the block based on at least one of a width and a height of the block.
Aspect 35: the apparatus of aspect 34, wherein the processor is configured to: determining whether an angle of the intra prediction mode is an integer angle; wherein determining the type of smoothing filter is further based on determining the angle of the intra prediction mode to be an integer angle.
Aspect 36: the apparatus of any one of aspects 34 or 35, wherein the processor is configured to: determining a first type of smoothing filter for the block based on a determination that at least one of the width of the block and the height of the block is greater than a threshold; and processing at least one predicted pixel for the block using the first type of smoothing filter.
Aspect 37: the apparatus of aspect 36, wherein the first type of smoothing filter comprises a [1 4 6 4 1] filter.
Aspect 38: the apparatus of any one of aspects 34 or 35, wherein the processor is configured to: determining a second type of smoothing filter for the block based on a determination that at least one of the width of the block and the height of the block is not greater than a threshold; and processing at least one predicted pixel for the block using the second type of smoothing filter.
Aspect 39: the apparatus of aspect 38, wherein the second type of smoothing filter comprises a [1 2 1] filter.
Aspect 40: the apparatus of any of aspects 34 through 39, wherein the type of smoothing filter is explicitly signaled in the video bitstream.
Aspect 41: the apparatus of aspect 40, wherein the type of smoothing filter is explicitly signaled for each prediction block, codec Tree Unit (CTU), slice, or sequence.
Aspect 42: the apparatus of any one of aspects 34 to 39, wherein the processor is configured to: the type of smoothing filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in the video bitstream.
Aspect 43: the apparatus of any one of aspects 34 to 42, wherein the apparatus comprises an encoder.
Aspect 44: the apparatus of any one of aspects 34 to 43, wherein the apparatus comprises a decoder.
Aspect 45: the apparatus of any one of aspects 34 to 44, wherein the apparatus is a mobile device.
Aspect 46: the apparatus of any one of aspects 34 to 45, wherein the apparatus is an augmented reality device.
Aspect 47: the apparatus of any one of aspects 34 to 46, further comprising: a display configured to display the video data.
Aspect 48: the apparatus of any one of aspects 34 to 47, further comprising: a camera configured to capture one or more pictures.
Aspect 49: a computer readable medium having instructions stored thereon which, when executed by a processor, perform the method of any of aspects 25 to 48.
Aspect 50: an apparatus comprising means for performing the operations of any one of aspects 25 to 48.
Aspect 51: a computer readable medium having instructions stored thereon which, when executed by a processor, perform the method of any one of aspects 1 to 22 and aspects 25 to 48.
Aspect 52: an apparatus comprising means for performing the operations of any one of aspects 1 to 22 and aspects 25 to 48.
Aspect 53: an apparatus for processing video data, comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: determining an intra-prediction mode for predicting a block of video data; determining a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold; and intra-predicting the block of video data using the determined type of smoothing filter and the intra-prediction mode.
Aspect 54: the apparatus of aspect 53, wherein the at least one processor is configured to: based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, using a first smoothing interpolation filter as the determined type of smoothing filter; and determining reference pixels for intra prediction of the block of video data using the first smooth interpolation filter.
Aspect 55: the apparatus of any one of aspects 53-54, wherein the first smooth interpolation filter comprises a 6 tap gaussian filter.
Aspect 56: the apparatus of aspect 55, wherein the at least one processor is configured to: based at least in part on determining that at least one of the width of the block and the height of the block is not greater than the first threshold, using a second smoothing interpolation filter as the determined type of smoothing filter; and determining reference pixels for intra prediction of the block of video data using the second smooth interpolation filter.
Aspect 57: the apparatus of aspect 56, wherein the second smooth interpolation filter comprises a 4 tap gaussian filter.
Aspect 58: the apparatus of any one of aspects 53-57, wherein the at least one processor is configured to: determining a minimum offset between an angular direction of the intra prediction mode and one of a vertical intra prediction mode and a horizontal intra prediction mode; and determining the type of smoothing filter for the block of video data based on comparing the determined minimum offset to a second threshold.
Aspect 59: the apparatus of aspect 58, wherein the at least one processor is configured to: determining a low pass filter as the type of smoothing filter based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra prediction mode is an integer angle mode associated with an integer value reference pixel location.
Aspect 60: the apparatus of aspect 59, wherein the low pass filter performs reference pixel smoothing without interpolation, the low pass filter comprising a [1 2 1] filter.
Aspect 61: the apparatus of aspect 58, wherein the at least one processor is configured to: determining a gaussian filter as the type of smoothing filter based at least in part on the determination that the determined minimum offset is greater than the second threshold and the determination that the intra-prediction mode is a fractional angle mode associated with a fractional value reference pixel location.
Aspect 62: the apparatus of aspect 61, wherein the gaussian filter performs smooth interpolation without reference pixel smoothing.
Aspect 63: the apparatus of aspect 61, wherein the gaussian filter comprises a 6 tap gaussian filter based on a determination that at least one of the width of the block and the height of the block is greater than the first threshold.
Aspect 64: the apparatus of aspect 61, wherein the gaussian filter comprises a 4-tap gaussian filter based on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold.
Aspect 65: the apparatus of aspect 58, wherein the at least one processor is configured to: based at least in part on a determination that the determined minimum offset is not greater than the second threshold: using an interpolation filter as the determined type of smoothing filter, wherein the interpolation filter comprises a 4 tap cubic filter; and intra-predicting the block of video data using the interpolation filter without applying reference pixel smoothing.
Aspect 66: the apparatus of aspect 58, wherein the at least one processor is configured to: determining a low pass filter as the type of smoothing filter based at least in part on the determination that the intra prediction mode is an integer angle mode and the determination that the minimum offset is greater than the second threshold.
Aspect 67: the apparatus of aspect 66, wherein the at least one processor is configured to: Reference pixel smoothing is performed using a large tap low pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, wherein the large tap low pass filter applies a greater degree of reference pixel smoothing than a small tap low pass filter.
Aspect 68: the apparatus of aspect 67, wherein the at least one processor is configured to: a small tap low pass filter is used for reference pixel smoothing based at least in part on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold, wherein the small tap low pass filter applies a lesser degree of reference pixel smoothing than a large tap low pass filter.
Aspect 69: the apparatus of any one of aspects 53-68, wherein the at least one processor is configured to: the intra-prediction mode is determined to be an integer angle mode based at least in part on comparing a slope of the intra-prediction mode to one or more pixel locations determined from the width of the block and the height of the block.
Aspect 70: the apparatus of any one of aspects 53-69, wherein the at least one processor is configured to: determining that an offset between the angular direction of the intra-prediction mode and the vertical intra-prediction mode or the horizontal intra-prediction mode is less than a second threshold; and intra-predicting the block of video data using a cubic interpolation filter based on determining that the offset between the angular direction of the intra-prediction mode and the vertical intra-prediction mode or the horizontal intra-prediction mode is less than the second threshold.
Aspect 71: the apparatus of aspect 70, wherein the at least one processor is configured to: reference line extension is performed using a weak interpolation filter, wherein: performing the reference line extension using the weak interpolation filter before performing the intra prediction using the cubic interpolation filter; and the cubic interpolation filter has a higher cut-off frequency than the weak interpolation filter and applies a greater degree of smoothing than the weak interpolation filter.
Aspect 72: the apparatus of aspect 71, wherein the weak interpolation filter comprises a 4 tap sinc-based interpolation filter and a 6 bit 4 tap interpolation filter.
Aspect 73: the apparatus of any one of aspects 53 through 72, wherein the type of smoothing filter is signaled in the video bitstream.
Aspect 74: the apparatus of any of aspects 53-73, wherein the type of smoothing filter is signaled for each of a set of prediction blocks, codec Tree Units (CTUs), slices, or sequences.
Aspect 75: the apparatus of any one of aspects 53-74, wherein the at least one processor is configured to: the type of smoothing filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in the video bitstream.
Aspect 76: the apparatus of any one of aspects 53-75, wherein the at least one processor is configured to: determining a residual data block for the block of video data; and decoding the block of video data using the residual block of data and a predictive block determined based on the intra prediction of the block of video data.
Aspect 77: the apparatus of any one of aspects 53-75, wherein the at least one processor is configured to: an encoded video bitstream is generated that includes information associated with the block of video data.
Aspect 78: the apparatus of aspect 77, wherein the at least one processor is further configured to cause the encoded video bitstream to be stored in the at least one memory.
Aspect 79: the apparatus of any one of aspects 77 or 78, further comprising: a transmitter configured to transmit the encoded video bitstream.
Aspect 80: a method of processing video data, the method comprising: determining an intra-prediction mode for predicting a block of video data; determining a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold; and intra-predicting the block of video data using the determined type of smoothing filter and the intra-prediction mode.
Aspect 81: the method of aspect 80, further comprising: based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, using a first smoothing interpolation filter as the determined type of smoothing filter; and determining reference pixels for intra prediction of the block of video data using the first smooth interpolation filter.
Aspect 82: the method of aspect 81, wherein the first smooth interpolation filter comprises a 6 tap gaussian filter.
Aspect 83: the method of any one of aspects 80 to 82, further comprising: based at least in part on determining that at least one of the width of the block and the height of the block is not greater than the first threshold, using a second smoothing interpolation filter as the determined type of smoothing filter; and determining reference pixels for intra prediction of the block of video data using the second smooth interpolation filter.
Aspect 84: the method of aspect 83, wherein the second smooth interpolation filter comprises a 4 tap gaussian filter.
Aspect 85: the method of any one of aspects 80 to 84, further comprising: determining a minimum offset between an angular direction of the intra prediction mode and one of a vertical intra prediction mode and a horizontal intra prediction mode; and determining the type of smoothing filter for the block of video data based on comparing the determined minimum offset to a second threshold.
Aspect 86: the method of aspect 85, further comprising: determining a low pass filter as the type of smoothing filter based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra prediction mode is an integer angle mode associated with an integer value reference pixel location.
Aspect 87: the method of aspect 86, wherein the low pass filter performs reference pixel smoothing without interpolation, wherein the low pass filter comprises a [1 2 1] filter.
Aspect 88: the method of aspect 85, further comprising: determining a gaussian filter as the type of smoothing filter based at least in part on the determination that the determined minimum offset is greater than the second threshold and the determination that the intra-prediction mode is a fractional angle mode associated with a fractional value reference pixel location.
Aspect 89: the method of aspect 88, wherein the gaussian filter performs smooth interpolation without reference pixel smoothing.
Aspect 90: the method of aspect 88, wherein the gaussian filter comprises a 6 tap gaussian filter based on a determination that at least one of the width of the block and the height of the block is greater than the first threshold.
Aspect 91: the method of aspect 88, wherein the gaussian filter comprises a 4-tap gaussian filter based on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold.
Aspect 92: the method of aspect 85, further comprising: based at least in part on a determination that the determined minimum offset is not greater than the second threshold: using an interpolation filter as the determined type of smoothing filter, wherein the interpolation filter comprises a 4 tap cubic filter; and intra-predicting the block of video data using the interpolation filter without applying reference pixel smoothing.
Aspect 93: the method of aspect 85, further comprising: determining a low pass filter as the type of smoothing filter based at least in part on the determination that the intra prediction mode is an integer angle mode and the determination that the minimum offset is greater than the second threshold.
Aspect 94: the method of aspect 93, further comprising: applying reference pixel smoothing using a large tap low pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, wherein the large tap low pass filter applies a greater degree of reference pixel smoothing than a small tap low pass filter.
Aspect 95: the method of aspect 93, further comprising: applying reference pixel smoothing using a small tap low pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold, wherein the small tap low pass filter applies a lesser degree of reference pixel smoothing than a large tap low pass filter.
Aspect 96: the method of any one of aspects 80 to 95, further comprising: the intra-prediction mode is determined to be an integer angle mode based at least in part on comparing a slope of the intra-prediction mode to one or more pixel locations determined from the width of the block and the height of the block.
Aspect 97: the method of any one of aspects 80 to 96, further comprising: determining that an offset between the angular direction of the intra-prediction mode and the vertical intra-prediction mode or the horizontal intra-prediction mode is less than a second threshold; and intra-predicting the block of video data using a cubic interpolation filter based on determining that the offset between the angular direction of the intra-prediction mode and the vertical intra-prediction mode or the horizontal intra-prediction mode is less than the second threshold.
Aspect 98: the method of aspect 97, further comprising: reference line extension is performed using a weak interpolation filter, wherein: performing the reference line extension using the weak interpolation filter before performing the intra prediction using the cubic interpolation filter; and the cubic interpolation filter has a higher cut-off frequency than the weak interpolation filter and applies a greater degree of smoothing than the weak interpolation filter.
Aspect 99: the method of aspect 98, wherein the weak interpolation filter comprises a 4 tap sinc-based interpolation filter and a 6 bit 4 tap interpolation filter.
Aspect 100: the method of any of aspects 80 to 99, wherein the type of smoothing filter is signaled in the video bitstream.
Aspect 101: the method of any of aspects 80 to 100, wherein the type of smoothing filter is signaled for each of a set of prediction blocks, codec Tree Units (CTUs), slices, or sequences.
Aspect 102: the method of any one of aspects 80 to 101, further comprising: the type of smoothing filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in the video bitstream.
Aspect 103: the method of any one of aspects 80 to 102, further comprising: determining a residual data block for the block of video data; and decoding the block of video data using the residual block of data and a predictive block determined based on the intra prediction of the block of video data.
Aspect 104: the method of any one of aspects 80 to 102, further comprising: an encoded video bitstream is generated that includes information associated with the block of video data.
Aspect 105: the method of aspect 104, further comprising: the encoded video bitstream is stored.
Aspect 106: the method of any one of aspects 104 or 105, further comprising: the encoded video bitstream is transmitted.
Aspect 107: a computer readable medium having instructions stored thereon which, when executed by a processor, perform the method of any of aspects 53 to 106.
Aspect 108: an apparatus comprising means for performing the operations of any one of aspects 53 to 106.
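
For a concrete reading of the filter-selection logic laid out in aspects 53 to 68, the following C++ sketch walks through one possible decision flow. It is illustrative only: the numeric thresholds, the VVC-style mode indices (18 for horizontal, 50 for vertical), and the function and constant names are assumptions chosen for demonstration, not values taken from the aspects.

// Illustrative sketch of the smoothing-filter selection described in aspects 53-68.
// The numeric thresholds and mode indices are assumptions for demonstration only.
#include <algorithm>
#include <cstdio>
#include <cstdlib>

enum class SmoothingFilter { Cubic4Tap, LowPass121, Gauss4Tap, Gauss6Tap };

constexpr int kFirstThreshold  = 32;  // hypothetical block-size threshold
constexpr int kSecondThreshold = 8;   // hypothetical angular-offset threshold
constexpr int kHorIdx = 18;           // VVC-style horizontal mode index (assumed)
constexpr int kVerIdx = 50;           // VVC-style vertical mode index (assumed)

SmoothingFilter selectFilter(int width, int height, int modeIdx, bool integerAngle) {
    // Minimum offset between the angular mode and the vertical/horizontal modes.
    const int minOffset = std::min(std::abs(modeIdx - kVerIdx), std::abs(modeIdx - kHorIdx));

    if (minOffset <= kSecondThreshold) {
        // Close to vertical/horizontal: sharp 4-tap cubic interpolation,
        // no reference-pixel smoothing (aspect 65).
        return SmoothingFilter::Cubic4Tap;
    }
    if (integerAngle) {
        // Integer-slope mode: [1 2 1] low-pass reference smoothing,
        // no interpolation needed (aspects 59 and 60).
        return SmoothingFilter::LowPass121;
    }
    // Fractional-slope mode: Gaussian smoothing interpolation whose tap length
    // follows the block-size comparison against the first threshold (aspects 54-57).
    const bool largeBlock = (width > kFirstThreshold) || (height > kFirstThreshold);
    return largeBlock ? SmoothingFilter::Gauss6Tap : SmoothingFilter::Gauss4Tap;
}

int main() {
    const SmoothingFilter f = selectFilter(64, 16, /*modeIdx=*/34, /*integerAngle=*/false);
    std::printf("selected filter = %d\n", static_cast<int>(f));
    return 0;
}

In an actual encoder or decoder, the integer-angle flag and the two thresholds would come from the applicable codec specification or from signaling such as that described in aspects 73 to 75.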

Claims (50)

1. An apparatus for processing video data, comprising:
at least one memory; and
at least one processor coupled to the at least one memory and configured to:
determining an intra-prediction mode for predicting a block of video data;
determining a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold; and
and performing intra prediction on the video data block by using the determined type of the smoothing filter and the intra prediction mode.
2. The apparatus of claim 1, wherein the at least one processor is configured to:
based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, using a first smoothing interpolation filter as a determined type of smoothing filter; and
A reference pixel for intra prediction of the block of video data is determined using the first smooth interpolation filter.
3. The apparatus of claim 2, wherein the first smooth interpolation filter comprises a 6 tap gaussian filter.
4. The apparatus of claim 1, wherein the at least one processor is configured to:
based at least in part on determining that at least one of the width of the block and the height of the block is not greater than the first threshold, using a second smoothing interpolation filter as the determined type of smoothing filter; and
a reference pixel for intra prediction of the block of video data is determined using the second smooth interpolation filter.
5. The apparatus of claim 4, wherein the second smooth interpolation filter comprises a 4 tap gaussian filter.
6. The apparatus of claim 1, wherein the at least one processor is configured to:
determining a minimum offset between an angular direction of the intra-prediction mode and one of a vertical intra-prediction mode and a horizontal intra-prediction mode; and
a type of the smoothing filter for the block of video data is determined based on comparing the determined minimum offset to a second threshold.
7. The apparatus of claim 6, wherein the at least one processor is configured to:
a low pass filter is determined to be of the smoothing filter type based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra prediction mode is an integer angle mode associated with an integer value reference pixel location.
8. The device of claim 7, wherein the low pass filter performs reference pixel smoothing without interpolation, the low pass filter comprising a [1 2 1] filter.
9. The apparatus of claim 6, wherein the at least one processor is configured to:
a gaussian filter is determined to be of the smoothing filter type based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra-prediction mode is a fractional angle mode associated with a fractional value reference pixel location.
10. The device of claim 9, wherein the gaussian filter performs smooth interpolation without reference pixel smoothing.
11. The device of claim 9, wherein the gaussian filter comprises a 6 tap gaussian filter based on a determination that at least one of the width of the block and the height of the block is greater than the first threshold.
12. The device of claim 9, wherein the gaussian filter comprises a 4-tap gaussian filter based on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold.
13. The apparatus of claim 6, wherein the at least one processor is configured to: based at least in part on a determination that the determined minimum offset is not greater than the second threshold:
using an interpolation filter as the determined type of smoothing filter, wherein the interpolation filter comprises a 4 tap cubic filter; and
the interpolation filter is used to intra-predict the block of video data without applying reference pixel smoothing.
14. The apparatus of claim 6, wherein the at least one processor is configured to:
a low pass filter is determined to be of the smoothing filter type based at least in part on the determination that the intra prediction mode is an integer angle mode and the determination that the minimum offset determined is greater than the second threshold.
15. The apparatus of claim 14, wherein the at least one processor is configured to:
Reference pixel smoothing is performed using a large tap low pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, wherein the large tap low pass filter applies a greater degree of reference pixel smoothing than a small tap low pass filter.
16. The apparatus of claim 14, wherein the at least one processor is configured to:
reference pixel smoothing is performed using a small tap low pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold, wherein the small tap low pass filter applies a lesser degree of reference pixel smoothing than a large tap low pass filter.
17. The apparatus of claim 1, wherein the at least one processor is configured to:
the intra-prediction mode is determined to be an integer angle mode based at least in part on comparing a slope of the intra-prediction mode to one or more pixel locations determined from the width of the block and the height of the block.
18. The apparatus of claim 1, wherein the at least one processor is configured to:
Determining that an offset between an angular direction of the intra-prediction mode and a vertical intra-prediction mode or a horizontal intra-prediction mode is less than a second threshold; and
intra-predicting the block of video data using a cubic interpolation filter based on determining that the offset between the angular direction of the intra-prediction mode and the vertical intra-prediction mode or the horizontal intra-prediction mode is less than the second threshold.
19. The apparatus of claim 18, wherein the at least one processor is configured to: reference line extension is performed using a weak interpolation filter, wherein:
performing the reference line extension using the weak interpolation filter before performing the intra prediction using the cubic interpolation filter; and
the cubic interpolation filter has a higher cut-off frequency than the weak interpolation filter and applies a greater degree of smoothing than the weak interpolation filter.
20. The apparatus of claim 19, wherein the weak interpolation filter comprises a 4 tap sinc-based interpolation filter and a 6 bit 4 tap interpolation filter.
21. The device of claim 1, wherein the type of smoothing filter is signaled in a video bitstream.
22. The device of claim 1, wherein a type of the smoothing filter is signaled for each of a set of prediction blocks, codec Tree Units (CTUs), slices, or sequences.
23. The apparatus of claim 1, wherein the at least one processor is configured to:
the type of smoothing filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in a video bitstream.
24. The apparatus of claim 1, wherein the at least one processor is configured to:
determining a residual block of data for the block of video data; and
the block of video data is decoded using the residual data block and a predictive block determined based on the intra prediction of the block of video data.
25. The apparatus of claim 1, wherein the at least one processor is configured to:
an encoded video bitstream is generated that includes information associated with the block of video data.
26. The apparatus of claim 25, wherein the at least one processor is further configured to cause the encoded video bitstream to be stored in the at least one memory.
27. The apparatus of claim 25, further comprising:
a transmitter configured to transmit the encoded video bitstream.
28. A method of processing video data, the method comprising:
determining an intra-prediction mode for predicting a block of video data;
determining a type of smoothing filter for the block of video data, wherein the type of smoothing filter is determined based at least in part on comparing at least one of a width of the block of video data and a height of the block of video data to a first threshold; and
and performing intra prediction on the video data block by using the determined type of the smoothing filter and the intra prediction mode.
29. The method of claim 28, further comprising:
based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, using a first smoothing interpolation filter as a determined type of smoothing filter; and
a reference pixel for intra prediction of the block of video data is determined using the first smooth interpolation filter.
30. The method of claim 29, wherein the first smooth interpolation filter comprises a 6 tap gaussian filter.
31. The method of claim 28, further comprising:
based at least in part on determining that at least one of the width of the block and the height of the block is not greater than the first threshold, using a second smoothing interpolation filter as the determined type of smoothing filter; and
a reference pixel for intra prediction of the block of video data is determined using the second smooth interpolation filter.
32. The method of claim 31, wherein the second smooth interpolation filter comprises a 4-tap gaussian filter.
33. The method of claim 28, further comprising:
determining a minimum offset between an angular direction of the intra-prediction mode and one of a vertical intra-prediction mode and a horizontal intra-prediction mode; and
a type of the smoothing filter for the block of video data is determined based on comparing the determined minimum offset to a second threshold.
34. The method of claim 33, further comprising:
a low pass filter is determined to be of the smoothing filter type based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra prediction mode is an integer angle mode associated with an integer value reference pixel location.
35. The method of claim 34, wherein the low pass filter performs reference pixel smoothing without interpolation, wherein the low pass filter comprises a [1 2 1] filter.
36. The method of claim 33, further comprising:
a gaussian filter is determined to be of the smoothing filter type based at least in part on a determination that the determined minimum offset is greater than the second threshold and a determination that the intra-prediction mode is a fractional angle mode associated with a fractional value reference pixel location.
37. The method of claim 36, wherein the gaussian filter performs smooth interpolation without reference pixel smoothing.
38. The method of claim 36, wherein the gaussian filter comprises a 6 tap gaussian filter based on a determination that at least one of the width of the block and the height of the block is greater than the first threshold.
39. The method of claim 36, wherein the gaussian filter comprises a 4-tap gaussian filter based on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold.
40. The method of claim 33, further comprising: based at least in part on a determination that the determined minimum offset is not greater than the second threshold:
using an interpolation filter as the determined type of smoothing filter, wherein the interpolation filter comprises a 4 tap cubic filter; and
the interpolation filter is used to intra-predict the block of video data without applying reference pixel smoothing.
41. The method of claim 33, further comprising: a low pass filter is determined to be of the smoothing filter type based at least in part on a determination that the intra prediction mode is an integer angle mode and a determination that the minimum offset between the intra prediction mode and a horizontal or vertical mode is greater than the second threshold.
42. The method of claim 41, further comprising:
applying reference pixel smoothing using a large tap low pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is greater than the first threshold, wherein the large tap low pass filter applies a greater degree of reference pixel smoothing than a small tap low pass filter.
43. The method of claim 41, further comprising:
applying reference pixel smoothing using a small tap low pass filter based at least in part on a determination that at least one of the width of the block and the height of the block is not greater than the first threshold, wherein the small tap low pass filter applies a lesser degree of reference pixel smoothing than a large tap low pass filter.
44. The method of claim 28, further comprising: the intra-prediction mode is determined to be an integer angle mode based at least in part on comparing a slope of the intra-prediction mode to one or more pixel locations determined from the width of the block and the height of the block.
45. The method of claim 28, further comprising:
determining that an offset between an angular direction of the intra-prediction mode and a vertical intra-prediction mode or a horizontal intra-prediction mode is less than a second threshold; and
based on a determination that the determined offset is less than the second threshold, intra-prediction is performed on the block of video data using a cubic interpolation filter.
46. The method of claim 45, further comprising: reference line extension is performed using a weak interpolation filter, wherein:
Performing the reference line extension using the weak interpolation filter before performing the intra prediction using the cubic interpolation filter; and
the cubic interpolation filter has a higher cut-off frequency than the weak interpolation filter and applies a greater degree of smoothing than the weak interpolation filter.
47. The method of claim 28, wherein the type of smoothing filter is signaled in a video bitstream.
48. The method of claim 28, further comprising: the type of smoothing filter is determined based on at least one of the width of the block and the height of the block without using information explicitly signaled in a video bitstream.
49. The method of claim 28, further comprising:
determining a residual block of data for the block of video data; and
the block of video data is decoded using the residual data block and a predictive block determined based on the intra prediction of the block of video data.
50. The method of claim 28, further comprising:
an encoded video bitstream is generated that includes information associated with the block of video data.
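
The [1 2 1] and [1 4 6 4 1] smoothing filters named in the claims and in aspects 28 and 30 are short low-pass kernels applied to reference pixels. The sketch below applies both kernels to a row of reference samples for illustration; the rounding offsets, border handling, and sample values are assumptions for demonstration rather than details specified by the claims.

// Illustrative application of the [1 2 1] and [1 4 6 4 1] low-pass smoothing
// filters to a row of reference samples. Rounding and border handling are assumed.
#include <cstddef>
#include <cstdio>
#include <vector>

// 3-tap [1 2 1]/4 smoothing; border samples are copied unfiltered.
std::vector<int> smooth121(const std::vector<int>& ref) {
    std::vector<int> out(ref);
    for (std::size_t i = 1; i + 1 < ref.size(); ++i) {
        out[i] = (ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2;
    }
    return out;
}

// 5-tap [1 4 6 4 1]/16 smoothing (the stronger filter associated with larger
// blocks in aspects 27 and 28); the two samples at each border are copied unfiltered.
std::vector<int> smooth14641(const std::vector<int>& ref) {
    std::vector<int> out(ref);
    for (std::size_t i = 2; i + 2 < ref.size(); ++i) {
        out[i] = (ref[i - 2] + 4 * ref[i - 1] + 6 * ref[i] +
                  4 * ref[i + 1] + ref[i + 2] + 8) >> 4;
    }
    return out;
}

int main() {
    const std::vector<int> ref = {100, 104, 120, 140, 160, 164, 168, 170};
    const std::vector<int> weak = smooth121(ref);
    const std::vector<int> strong = smooth14641(ref);
    for (std::size_t i = 0; i < ref.size(); ++i) {
        std::printf("%3d -> [1 2 1]: %3d, [1 4 6 4 1]: %3d\n", ref[i], weak[i], strong[i]);
    }
    return 0;
}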

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/129,437 2020-12-22
US17/645,024 US20220201329A1 (en) 2020-12-22 2021-12-17 Intra prediction using enhanced interpolation filters
US17/645,024 2021-12-17
PCT/US2021/073040 WO2022140765A1 (en) 2020-12-22 2021-12-20 Intra prediction using enhanced interpolation filters

Publications (1)

Publication Number Publication Date
CN116648911A true CN116648911A (en) 2023-08-25

Family

ID=87640435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180084615.5A Pending CN116648911A (en) 2020-12-22 2021-12-20 Intra prediction using an enhanced interpolation filter

Country Status (1)

Country Link
CN (1) CN116648911A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40092084

Country of ref document: HK