CN117616752A - High level syntax for picture resampling - Google Patents


Info

Publication number
CN117616752A
Authority
CN
China
Prior art keywords
filter
filtering
pictures
metadata
parameters
Prior art date
Legal status
Pending
Application number
CN202280048686.4A
Other languages
Chinese (zh)
Inventor
T·波里尔
F·莱莱昂内克
K·纳赛尔
G·马丁-科谢
Current Assignee
InterDigital CE Patent Holdings SAS
Original Assignee
InterDigital CE Patent Holdings SAS
Priority date
Filing date
Publication date
Application filed by InterDigital CE Patent Holdings SAS filed Critical InterDigital CE Patent Holdings SAS
Publication of CN117616752A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/117: Adaptive coding characterised by the element, parameter or selection affected or controlled; filters, e.g. for pre-processing or post-processing
    • H04N 19/172: Adaptive coding characterised by the coding unit, the unit being a picture, frame or field
    • H04N 19/179: Adaptive coding characterised by the coding unit, the unit being a scene or a shot
    • H04N 19/186: Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N 19/59: Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N 19/70: Characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N 19/85: Using pre-processing or post-processing specially adapted for video compression

Abstract

A method, the method comprising: decoding (701) a picture of a plurality of pictures representing a video sequence from video data; obtaining (702) parameters of a filter determined from metadata associated with the video data, the metadata including at least one first information specifying a subset of the plurality of pictures to which the filter is applied; and applying (703) a filter to the decoded picture in response to the metadata.

Description

High level syntax for picture resampling
1. Technical field
At least one of the present embodiments relates generally to methods, apparatuses and signals for controlling a post-filtering process intended to resample pictures of a video content.
2. Background art
To achieve high compression efficiency, video coding schemes typically employ prediction and transform to exploit spatial and temporal redundancy in the video content. During encoding, pictures of the video content are divided into blocks of samples (i.e., pixels), and these blocks are then partitioned into one or more sub-blocks, hereinafter called original sub-blocks. Intra or inter prediction is then applied to each sub-block to exploit intra-picture or inter-picture correlation. Whatever the prediction method used (intra or inter), a predictor sub-block is determined for each original sub-block. Then, a sub-block representing the difference between the original sub-block and the predictor sub-block (often denoted prediction error sub-block, prediction residual sub-block or simply residual sub-block) is transformed, quantized and entropy encoded to generate an encoded video stream. To reconstruct the video, the compressed data are decoded by inverse processes corresponding to the transform, quantization and entropy encoding.
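As a toy illustration of this residual coding loop, the sketch below computes a residual, transforms and quantizes it, and reconstructs the block from the quantized levels. It is a minimal sketch only: the floating-point DCT, the scalar quantizer and the function names are assumptions for illustration, not any standard's actual integer transforms or quantization design.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(original: np.ndarray, predictor: np.ndarray, q_step: float) -> np.ndarray:
    """Return quantized transform coefficients ("levels") for one sub-block."""
    residual = original.astype(np.float64) - predictor   # prediction error sub-block
    coeffs = dctn(residual, norm="ortho")                # stand-in for the codec's integer transform
    return np.round(coeffs / q_step)                     # scalar quantization (entropy coding not shown)

def reconstruct_block(levels: np.ndarray, predictor: np.ndarray, q_step: float) -> np.ndarray:
    """Inverse quantization and inverse transform, mirroring the decoder side."""
    residual = idctn(levels * q_step, norm="ortho")
    return np.clip(np.round(predictor + residual), 0, 255).astype(np.uint8)
```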
The use of post-filtering is supported, through the definition of adapted metadata, by previous generation video compression standards such as MPEG-4/AVC (ISO/IEC 14496-10) and HEVC (ISO/IEC 23008-2 - MPEG-H Part 2, High Efficiency Video Coding / ITU-T H.265), as well as by the international standard known as Versatile Video Coding (VVC), developed by the joint collaboration team of ITU-T and ISO/IEC experts known as the Joint Video Experts Team (JVET). For example, Supplemental Enhancement Information (SEI) messages are defined to convey some post-filtering parameters.
In VVC, a new tool called Reference Picture Resampling (RPR) allows encoding sequences of pictures in which the picture resolution is heterogeneous.
Fig. 1 shows the application of the RPR tool. In fig. 1, picture 4 is temporally predicted from picture 3, picture 3 is temporally predicted from picture 2, and picture 2 is temporally predicted from picture 1. Since picture 4 and picture 3 have different resolutions, picture 3 is upsampled to the resolution of picture 4 during the decoding process. Picture 3 and picture 2 have the same resolution, so no upsampling or downsampling is applied to picture 2 for temporal prediction. Picture 1 is larger than picture 2, so downsampling is applied to picture 1 during decoding to temporally predict picture 2. However, the resampling process applied for temporal prediction is applied at the block level, so that no resampled picture is available at the output of the decoder. Only pictures at their reconstructed resolution are available: a video sequence encoded with pictures at different resolutions is output as pictures at their encoded resolutions. A post-decoding resampling filtering process is therefore required to homogenize the picture resolutions.
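The following sketch illustrates the homogenization such a post-filter performs: every decoded picture whose resolution differs from the target is resampled, here with a toy bilinear filter on a single component. The filter choice, the single-component simplification and the function names are illustrative assumptions, not the resampling process of any particular standard.

```python
import numpy as np

def bilinear_resample(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Toy bilinear resampler for a single picture component."""
    in_h, in_w = img.shape
    ys = (np.arange(out_h) + 0.5) * in_h / out_h - 0.5
    xs = (np.arange(out_w) + 0.5) * in_w / out_w - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, in_h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, in_w - 2)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x0 + 1] * wx
    bot = img[y0 + 1][:, x0] * (1 - wx) + img[y0 + 1][:, x0 + 1] * wx
    return np.round((1 - wy) * top + wy * bot).astype(img.dtype)

def homogenize(decoded_pictures):
    """Resample every decoded picture to the largest resolution of the sequence."""
    out_h = max(p.shape[0] for p in decoded_pictures)
    out_w = max(p.shape[1] for p in decoded_pictures)
    return [p if p.shape == (out_h, out_w) else bilinear_resample(p, out_h, out_w)
            for p in decoded_pictures]
```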
The post-filtering SEI messages defined so far are mainly designed to specify filters intended to improve the subjective quality of the output pictures. These SEI messages are not designed to specify resampling filters, let alone for video sequences comprising pictures with heterogeneous resolutions. Indeed, these SEI messages are designed to apply the same post-filtering to all pictures of a video sequence, whereas a resampling post-filtering process aiming to homogenize the picture resolutions cannot apply the same processing to pictures of different resolutions.
It is desirable to propose a solution that makes it possible to overcome the above problems. In particular, it is desirable to propose a solution that makes it possible to specify a post-decoding resampling filter and that is adapted to the specific case of video sequences with heterogeneous coding resolutions in which these resolutions need to be homogenized.
3. Summary of the invention
In a first aspect, one or more of the embodiments of the present invention provide a method comprising:
decoding a current picture of a plurality of pictures representing a video sequence from a portion of a bitstream; obtaining parameters of a filter determined from metadata embedded in a bitstream, the metadata including at least one first information specifying a subset of a plurality of pictures on which the filter is to be applied; and applying a filter to the decoded current picture in response to the metadata.
In an embodiment, the filter is a resampling filter.
In an embodiment, the filter is a separable filter, and the metadata specifies parameters of a horizontal filter and parameters of a vertical filter.
In an embodiment, the filter is intended to be applied to luminance and chrominance components of each picture in the subset of pictures, and the metadata specifies parameters of the filter adapted to filter the luminance component and parameters of the filter adapted to filter the chrominance component that are different from parameters of the filter adapted to filter the luminance component.
In an embodiment, the at least one first information specifies that the filter is applied only to pictures having a resolution different from the maximum resolution.
In an embodiment, the metadata includes second information specifying a filtering method of the plurality of filtering methods.
In an embodiment, the plurality of filtering methods includes luminance filtering, chrominance filtering, bilinear filtering, directional cubic convolution interpolation, iterative curvature-based interpolation, edge-guided image interpolation, and deep-learning-based filtering methods.
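To make the "first information" of these embodiments concrete, the sketch below selects the subset of pictures to post-filter under one assumed policy, namely the embodiment in which the filter applies only to pictures whose resolution differs from the maximum resolution. The function name and the list-of-resolutions interface are illustrative assumptions, not the proposed syntax.

```python
from typing import List, Tuple

def pictures_to_filter(resolutions: List[Tuple[int, int]]) -> List[int]:
    """Indices of the pictures the post-filter applies to: every picture whose
    resolution differs from the maximum resolution of the sequence."""
    max_res = max(resolutions, key=lambda wh: wh[0] * wh[1])
    return [i for i, wh in enumerate(resolutions) if wh != max_res]

# E.g. a sequence coded at 1080p in which some pictures were coded at 540p (RPR):
print(pictures_to_filter([(1920, 1080), (960, 540), (1920, 1080), (960, 540)]))
# -> [1, 3]: only the lower-resolution pictures are resampled by the post-filter
```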
In a second aspect, one or more of the embodiments of the invention provide a method comprising:
encoding a plurality of pictures representing a video sequence in a portion of a bitstream; and encoding metadata representing a filter in the bitstream, the metadata including at least one first information specifying a subset of the plurality of pictures on which the filter is to be applied.
In an embodiment, the filter is a resampling filter.
In an embodiment, the filter is a separable filter, and the metadata specifies parameters of a horizontal filter and parameters of a vertical filter.
In an embodiment, the filter is intended to be applied to luminance and chrominance components of each picture in the subset of pictures, and the metadata specifies parameters of the filter adapted to filter the luminance component and parameters of the filter adapted to filter the chrominance component that are different from parameters of the filter adapted to filter the luminance component.
In an embodiment, the at least one first information specifies that the filter is applied only to pictures having a resolution different from the maximum resolution.
In an embodiment, the metadata includes second information specifying a filtering method of the plurality of filtering methods.
In an embodiment, the plurality of filtering methods includes luminance filtering, chrominance filtering, bilinear filtering, directional cubic convolution interpolation, iterative curvature-based interpolation, edge-guided image interpolation, and deep-learning-based filtering methods.
In a third aspect, one or more of the embodiments of the invention provide an apparatus comprising electronic circuitry adapted to:
decoding a current picture of a plurality of pictures representing a video sequence from a portion of a bitstream; obtaining parameters of a filter determined from metadata embedded in a bitstream, the metadata including at least one first information specifying a subset of a plurality of pictures on which the filter is to be applied; and applying a filter to the decoded current picture in response to the metadata.
In an embodiment, the filter is a resampling filter.
In an embodiment, the filter is a separable filter, and the metadata specifies parameters of a horizontal filter and parameters of a vertical filter.
In an embodiment, the filter is intended to be applied to luminance and chrominance components of each picture in the subset of pictures, and the metadata specifies parameters of the filter adapted to filter the luminance component and parameters of the filter adapted to filter the chrominance component that are different from parameters of the filter adapted to filter the luminance component.
In an embodiment, the at least one first information specifies that the filter is applied only to pictures having a resolution different from the maximum resolution.
In an embodiment, the metadata includes second information specifying a filtering method of the plurality of filtering methods.
In an embodiment, the plurality of filtering methods includes luminance filtering, chrominance filtering, bilinear filtering, directional cubic convolution interpolation, iterative curvature-based interpolation, edge-guided image interpolation, and deep-learning-based filtering methods.
In a fourth aspect, one or more of the embodiments of the invention provide an apparatus comprising electronic circuitry adapted to:
encoding a plurality of pictures representing a video sequence in a portion of a bitstream; and encoding metadata representing a filter in the bitstream, the metadata including at least one first information specifying a subset of the plurality of pictures on which the filter is to be applied.
In an embodiment, the filter is a resampling filter.
In an embodiment, the filter is a separable filter, and the metadata specifies parameters of a horizontal filter and parameters of a vertical filter.
In an embodiment, the filter is intended to be applied to luminance and chrominance components of each picture in the subset of pictures, and the metadata specifies parameters of the filter adapted to filter the luminance component and parameters of the filter adapted to filter the chrominance component that are different from parameters of the filter adapted to filter the luminance component.
In an embodiment, the at least one first information specifies that the filter is applied only to pictures having a resolution different from the maximum resolution.
In an embodiment, the metadata includes second information specifying a filtering method of the plurality of filtering methods.
In an embodiment, the plurality of filtering methods includes luminance filtering, chrominance filtering, bilinear filtering, directional cubic convolution interpolation, iterative curvature-based interpolation, edge-guided image interpolation, and deep-learning-based filtering methods.
In a fifth aspect, one or more of the present embodiments provide a signal comprising metadata representing a filter and associated with a plurality of pictures representing a video sequence, the metadata comprising at least one information specifying a subset of the plurality of pictures on which the filter is to be applied.
In a sixth aspect, one or more embodiments of the invention provide a computer program comprising program code instructions for implementing the method according to the first or second aspect.
In a seventh aspect, one or more embodiments of the present invention provide a non-transitory information storage medium storing program code instructions for implementing the method according to the first or second aspect.
4. Description of the drawings
FIG. 1 illustrates an application of a reference picture resampling tool;
fig. 2 schematically shows an example of the partitions undergone by a pixel picture of an original video;
fig. 3 schematically depicts a method for encoding a video stream;
fig. 4 schematically depicts a method for decoding an encoded video stream;
FIG. 5A schematically illustrates an example of a video streaming system in which embodiments are implemented;
FIG. 5B schematically illustrates an example of a hardware architecture of a processing module capable of implementing an encoding module or a decoding module, in which various aspects and embodiments are implemented;
FIG. 5C illustrates a block diagram of an example of a first system in which various aspects and embodiments are implemented;
FIG. 5D illustrates a block diagram of an example of a second system in which aspects and embodiments may be implemented;
fig. 6 schematically shows an example of a method for encoding pictures of a video sequence and metadata allowing control of the resampling of these pictures; and,
fig. 7 schematically shows an example of a method for reconstructing pictures, the method comprising resampling the pictures in response to metadata.
5. Detailed description of the preferred embodiments
The following example implementations are described in the context of a video format similar to VVC. However, these embodiments are not limited to the video encoding/decoding method corresponding to VVC. They apply in particular to any video format allowing the generation of a video stream comprising pictures with different resolutions, and/or in which the reconstructed resolution of a picture may differ from its display resolution. Such formats include, for example, HEVC, SHVC (Scalable High Efficiency Video Coding), AVC, SVC (Scalable Video Coding), EVC (Essential Video Coding / MPEG-5), AV1 and VP9.
Fig. 2, 3 and 4 introduce examples of video formats.
Fig. 2 shows an example of the partitioning experienced by a pixel picture 21 of an original video sequence 20. A pixel is considered herein to be composed of three components: one luminance component and two chrominance components. However, other types of pixels may include fewer or more components (e.g., only a luminance component or an additional depth component or a transparency component).
A picture is divided into a plurality of coding entities. First, as denoted by reference numeral 23 in fig. 2, the picture is divided into a grid of blocks called Coding Tree Units (CTU). A CTU consists of an N×N block of luma samples together with two corresponding blocks of chroma samples, where N is typically a power of two with a maximum value of, for example, "128". Second, the picture is divided into one or more groups of CTUs. For example, it can be divided into one or more tile rows and tile columns, a tile being a sequence of CTUs covering a rectangular region of the picture. In some cases, a tile can be divided into one or more bricks, each consisting of at least one row of CTUs within the tile. Above the concept of tiles and bricks, there is another coding entity, called a slice, which can contain at least one tile of a picture or at least one brick of a tile.
In the example of fig. 2, as denoted by reference numeral 22, the picture 21 is divided into three slices S1, S2 and S3 using the raster-scan slice mode, each comprising a plurality of tiles (not represented), each tile comprising only one brick.
As denoted by reference numeral 24 in fig. 2, a CTU can be partitioned into a hierarchical tree of one or more sub-blocks called Coding Units (CU). The CTU is the root (i.e., the parent node) of the hierarchical tree and can be partitioned into a plurality of CUs (i.e., child nodes). Each CU becomes a leaf of the hierarchical tree if it is not further partitioned into smaller CUs, or becomes the parent node of smaller CUs (i.e., child nodes) if it is further partitioned.
In the example of fig. 2, the CTU 24 is first partitioned into "4" square CUs using quadtree-type partitioning. The upper-left CU is a leaf of the hierarchical tree since it is not further partitioned, i.e., it is not the parent of any other CU. The upper-right CU is itself partitioned into "4" smaller square CUs, again using quadtree-type partitioning. The lower-right CU is partitioned vertically into "2" rectangular CUs using binary-tree-type partitioning. The lower-left CU is partitioned vertically into "3" rectangular CUs using ternary-tree-type partitioning.
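The partition tree described above can be represented compactly; the sketch below builds the tree of the example of fig. 2 with quadtree, binary and ternary vertical splits. It is a minimal illustration (the class and function names are assumptions), not the data structures of any reference software.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CodingUnit:
    """Node of the CTU partition tree; 'split' is None for a leaf CU."""
    x: int
    y: int
    w: int
    h: int
    split: Optional[str] = None
    children: List["CodingUnit"] = field(default_factory=list)

def quad_split(cu: CodingUnit) -> None:
    """Quadtree split into 4 square children (TL, TR, BL, BR)."""
    cu.split = "quad"
    hw, hh = cu.w // 2, cu.h // 2
    cu.children = [CodingUnit(cu.x + dx, cu.y + dy, hw, hh)
                   for dy in (0, hh) for dx in (0, hw)]

def vertical_split(cu: CodingUnit, n: int) -> None:
    """Vertical binary (n=2) or ternary (n=3) split; the ternary split uses
    the usual 1/4, 1/2, 1/4 width pattern."""
    cu.split = f"vert{n}"
    widths = [cu.w // 2] * 2 if n == 2 else [cu.w // 4, cu.w // 2, cu.w // 4]
    x = cu.x
    for w in widths:
        cu.children.append(CodingUnit(x, cu.y, w, cu.h))
        x += w

# The partitioning of the example: a quad split of the CTU, then a quad split
# of the top-right CU, a binary split of the bottom-right CU and a ternary
# split of the bottom-left CU.
ctu = CodingUnit(0, 0, 128, 128)
quad_split(ctu)
quad_split(ctu.children[1])          # top-right CU
vertical_split(ctu.children[3], 2)   # bottom-right CU, binary
vertical_split(ctu.children[2], 3)   # bottom-left CU, ternary
```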
During picture coding, the partitioning is adaptive: each CTU is partitioned so as to optimize the compression efficiency according to a criterion.
The concepts of Prediction Unit (PU) and Transform Unit (TU) appeared in HEVC. Indeed, in HEVC, the coding entity used for prediction (i.e., the PU) and the coding entity used for transform (i.e., the TU) can be sub-partitions of a CU. For example, as represented in fig. 2, a CU of size 2N×2N can be partitioned into PUs 2411 of size N×2N or of size 2N×N. In addition, the CU can be partitioned into "4" TUs 2412 of size N×N or into "16" TUs of size (N/2)×(N/2).
It can be noted that in VVC, except in some special cases, the boundaries of TUs and PUs are aligned with the boundaries of CUs. Consequently, a CU generally comprises one TU and one PU.
In this application, the term "block" or "picture block" can be used to refer to any one of a CTU, a CU, a PU and a TU. In addition, the term "block" or "picture block" can be used to refer to a macroblock, a partition and a sub-block as specified in H.264/AVC or in other video coding standards, and more generally to an array of samples of various sizes.
In this application, the terms "reconstruct" and "decode" are used interchangeably, the terms "pixel" and "sample" are used interchangeably, and the terms "image", "picture", "sub-picture", "slice" and "frame" are used interchangeably. Typically, but not necessarily, the term "reconstruction" is used on the encoder side, while "decoding" is used on the decoder side.
Fig. 3 schematically depicts a method performed by an encoding module for encoding a video stream. Variations of this method for encoding are envisaged, but for clarity the method for encoding of fig. 3 is described below, without describing all the contemplated variations.
The current original picture of an original video sequence may go through a pre-processing prior to encoding. For example, in a step 301, a color transform is applied to the current original picture (e.g., a conversion from RGB 4:4:4 to YCbCr 4:2:0), or a remapping is applied to the components of the current original picture in order to obtain a signal distribution more resilient to compression (for instance using histogram equalization of one of the color components). In addition, the pre-processing 301 can comprise a resampling (downsampling or upsampling). The resampling can be applied to some pictures so that the resulting bitstream may comprise pictures at an original resolution and pictures at other resolutions. The resampling generally consists of a downsampling and is used to reduce the bitrate of the generated bitstream; however, an upsampling is also possible. A picture obtained by pre-processing is called hereafter a pre-processed picture.
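As an illustration of the color-transform part of this pre-processing, the sketch below converts RGB 4:4:4 to YCbCr 4:2:0. The BT.709 matrix, the full-range 8-bit handling and the naive 2x2 chroma averaging are assumptions for illustration; actual pipelines use content-specific matrices, ranges and downsampling filters.

```python
import numpy as np

# BT.709 RGB -> YCbCr matrix (assumed; the actual transform and value ranges
# depend on the content and on the encoder configuration).
RGB_TO_YCBCR = np.array([[ 0.2126,  0.7152,  0.0722],
                         [-0.1146, -0.3854,  0.5000],
                         [ 0.5000, -0.4542, -0.0458]])

def rgb_to_ycbcr420(rgb: np.ndarray):
    """RGB 4:4:4 -> YCbCr 4:2:0 with naive 2x2 chroma averaging; picture
    height and width must be even."""
    ycbcr = rgb.astype(np.float64) @ RGB_TO_YCBCR.T
    ycbcr[..., 1:] += 128.0                      # center chroma for 8-bit content
    y = np.clip(np.round(ycbcr[..., 0]), 0, 255).astype(np.uint8)

    def down2(c):                                # 2x2 box average
        return (c[0::2, 0::2] + c[1::2, 0::2] + c[0::2, 1::2] + c[1::2, 1::2]) / 4.0

    cb = np.clip(np.round(down2(ycbcr[..., 1])), 0, 255).astype(np.uint8)
    cr = np.clip(np.round(down2(ycbcr[..., 2])), 0, 255).astype(np.uint8)
    return y, cb, cr
```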
The encoding of a pre-processed picture begins with a partitioning of the pre-processed picture during a step 302, as described in relation to fig. 2. The pre-processed picture is thus partitioned into CTUs, CUs, PUs, TUs, etc. For each block, the encoding module determines a coding mode between intra prediction and inter prediction.
Intra prediction comprises predicting pixels of the current block from a prediction block derived from pixels of a reconstructed block located in causal vicinity of the current block to be encoded according to an intra prediction method during step 303. The result of intra prediction is a prediction direction indicating which pixels of a nearby block are used, and a residual block obtained by calculating the difference between the current block and the predicted block.
Inter prediction consists of predicting the pixels of a current block from a block of pixels, called the reference block, of a picture preceding or following the current picture (this picture being called the reference picture). During the encoding of a current block according to the inter prediction method, the block of the reference picture closest to the current block according to a similarity criterion is determined by a motion estimation step 304. During step 304, a motion vector indicating the position of the reference block in the reference picture is determined. This motion vector is used during a motion compensation step 305, during which a residual block is calculated in the form of a difference between the current block and the reference block. In the first video compression standards, the unidirectional inter prediction mode described above was the only available inter mode. As video compression standards have evolved, the family of inter modes has grown significantly and now comprises many different inter modes.
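The sketch below illustrates the motion estimation of step 304 with an exhaustive integer-pel SAD search around the collocated position. It is a didactic sketch only (real encoders use fast search strategies, sub-pel refinement and rate-distortion-based decisions), and the function name and interface are assumptions.

```python
import numpy as np

def full_search(cur_block: np.ndarray, ref_pic: np.ndarray, bx: int, by: int,
                search_range: int = 8):
    """Exhaustive SAD block matching: returns the best motion vector (dx, dy)
    and its SAD; the residual is then cur_block minus the matched block."""
    h, w = cur_block.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and y + h <= ref_pic.shape[0] and 0 <= x and x + w <= ref_pic.shape[1]:
                sad = int(np.abs(cur_block.astype(np.int32)
                                 - ref_pic[y:y + h, x:x + w].astype(np.int32)).sum())
                if sad < best_sad:
                    best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad
```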
During the selection step 306, a prediction mode that optimizes compression performance is selected by the encoding module according to a rate/distortion optimization criterion (i.e., RDO criterion) among the prediction modes tested (intra prediction mode, inter prediction mode).
When the prediction mode is selected, the residual block is transformed during step 307 and quantized during step 309. Note that the coding module may skip the transform and directly apply quantization to the untransformed residual signal. When the current block is encoded according to the intra prediction mode, the prediction direction and the transformed and quantized residual block are encoded by the entropy encoder during step 310. When a current block is encoded according to inter prediction, a motion vector of the block is predicted according to a prediction vector selected from a set of motion vectors corresponding to reconstructed blocks located near the block to be encoded, as appropriate. Next, during step 310, motion information is encoded by an entropy encoder in the form of a motion residual and an index for identifying a prediction vector. During step 310, the transformed and quantized residual block is encoded by an entropy encoder. Note that the coding module may bypass the transform and quantization, i.e. apply entropy coding to the residual, without applying the transform or quantization process. The result of the entropy encoding is inserted into the encoded video stream 311.
Metadata such as SEI (supplemental enhancement information) messages may be appended to the encoded video stream 311. SEI messages, for example defined in standards such as AVC, HEVC or VVC, are data containers associated with a video stream and including metadata that provides information related to the video stream.
Some SEI messages are defined to transmit post-filtering information. An example of such an SEI message is depicted in table TAB1.
Table TAB1
This SEI message allows defining filters for post-filtering pictures. The SEI message provides the coefficients of a post-filter, or related information allowing the design of a post-filter. This SEI message is typically designed for post-filters intended to improve the subjective quality of the pictures output by the decoder.
In this SEI message:
● filter_hint_size_y is a syntax element that specifies the vertical size of the filter coefficient array or related array. The value of filter_hint_size_y shall be in the range of 1 to 15 (inclusive).
● filter_hint_size_x is a syntax element that specifies the horizontal size of the filter coefficient array or related array. The value of filter_hint_size_x shall be in the range of 1 to 15 (inclusive).
● filter_hint_type is a syntax element identifying the type of the transmitted filter hint, as specified in table TAB2. The value of filter_hint_type shall be in the range of 0 to 2 (inclusive). The value of filter_hint_type equal to 3 is reserved for future use. Decoders shall ignore post-filter hint SEI messages with filter_hint_type equal to 3.
Value   Description
0       Coefficients of a 2D-FIR filter
1       Coefficients of two 1D-FIR filters
2       Cross-correlation matrix

Table TAB2
filter_hint_value[cIdx][cy][cx] is a syntax element that specifies, with 16-bit precision, an element of the filter coefficients or of the cross-correlation matrix between the original signal and the decoded signal. The value of filter_hint_value[cIdx][cy][cx] shall be in the range of -2^31+1 to 2^31-1 (inclusive). cIdx specifies the related color component, cy represents a counter in the vertical direction, and cx represents a counter in the horizontal direction. Depending on the value of filter_hint_type, the following applies:
● If filter_hint_type is equal to 0, the coefficients of a 2-dimensional finite impulse response (FIR) filter of size filter_hint_size_y × filter_hint_size_x are transmitted.
● Otherwise, if filter_hint_type is equal to 1, the filter coefficients of two 1-dimensional FIR filters are transmitted. In this case, filter_hint_size_y shall be equal to 2. The index cy equal to 0 designates the filter coefficients of the horizontal filter, and the index cy equal to 1 designates the filter coefficients of the vertical filter. In the filtering process, the horizontal filter is applied first, and the result is then filtered by the vertical filter.
● Otherwise (filter_hint_type is equal to 2), the transmitted hint specifies a cross-correlation matrix between the original signal and the decoded signal.
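As an illustration of the filter_hint_type equal to 1 case, the sketch below applies the two transmitted 1-D FIR filters to one component, horizontal filter first and then vertical, as the semantics above require. The normalization by the sum of the coefficients and the border handling are simplifying assumptions; the SEI semantics define the exact fixed-point processing.

```python
import numpy as np

def apply_two_1d_filters(pic: np.ndarray, filter_hint_value, c_idx: int = 0) -> np.ndarray:
    """filter_hint_type == 1: cy == 0 holds the horizontal taps, cy == 1 the
    vertical taps; the horizontal filter is applied first, then the vertical."""
    taps_h = np.asarray(filter_hint_value[c_idx][0], dtype=np.float64)
    taps_v = np.asarray(filter_hint_value[c_idx][1], dtype=np.float64)
    # Assumed normalization by the coefficient sum (the spec defines the exact
    # precision handling of the 16-bit coefficients).
    taps_h /= taps_h.sum() if taps_h.sum() != 0 else 1.0
    taps_v /= taps_v.sum() if taps_v.sum() != 0 else 1.0
    tmp = np.apply_along_axis(lambda r: np.convolve(r, taps_h, mode="same"), 1,
                              pic.astype(np.float64))
    out = np.apply_along_axis(lambda c: np.convolve(c, taps_v, mode="same"), 0, tmp)
    return np.clip(np.round(out), 0, 255).astype(pic.dtype)
```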
One limitation of the SEI message of table TAB1 is that it does not allow specifying a duration or a time interval of applicability of the SEI message. The SEI message applies at the sequence level, and its applicability is independent of the picture resolution. Another limitation is that the number of filter types that can be specified by this SEI message is limited. For example, it cannot specify a neural-network-based filter, such as the latest generation of filters. In addition, only filters intended to improve the visual (subjective) quality of pictures can be specified: a resampling filter cannot be specified.
Another SEI message is defined to carry resampling information specific to chroma. This SEI message is depicted in table TAB3.
Table TAB3
The SEI message of table TAB3 signals a downsampling process and an upsampling process for the chroma components of decoded pictures. When the resampling processes signaled in this SEI message are used, the degradation of the chroma components is expected to be minimized for any number of upsampling and downsampling iterations performed on the decoded pictures.
ver_chroma_filter_idc is a syntax element that identifies the vertical component of the set of downsampling and upsampling filters, as specified in table TAB4. Based on the value of ver_chroma_filter_idc, the value of verFilterCoeff[] is derived from table TAB5. The value of ver_chroma_filter_idc shall be in the range of 0 to 2 (inclusive). Values of ver_chroma_filter_idc greater than 2 are reserved for future use.
When ver_chroma_filter_idc is equal to 0, no chroma resampling filter is specified in the vertical direction.
When chroma_format_idc is equal to 1, ver_chroma_filter_idc shall be equal to 1 or 2.
Table TAB4
Table TAB5
hor_chroma_filter_idc is a syntax element that identifies the horizontal component of the set of downsampling and upsampling filters, as specified in table TAB6. Based on the value of hor_chroma_filter_idc, the value of horFilterCoeff[] is derived from table TAB7. The value of hor_chroma_filter_idc shall be in the range of 0 to 2 (inclusive). Values of hor_chroma_filter_idc greater than 2 are reserved for future use.
When hor_chroma_filter_idc is equal to 0, no chroma resampling filter is specified in the horizontal direction.
When chroma_format_idc is equal to 3, hor_chroma_filter_idc shall be equal to 1 or 2.
When chroma_format_idc is equal to 2 and ver_chroma_filter_idc is equal to 2, hor_chroma_filter_idc shall be equal to 0.
It is a requirement of bitstream conformance that ver_chroma_filter_idc and hor_chroma_filter_idc shall not both be equal to 0.
Table TAB6
Table TAB7
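The sketch below mirrors this idc-driven selection of the vertical and horizontal chroma filters. Since tables TAB5 and TAB7 are not reproduced in this text, the coefficient arrays are placeholders, and the value checks follow the constraints stated above.

```python
# Placeholder coefficient tables: the actual taps are given by tables TAB5 and
# TAB7, which are not reproduced here.
VER_FILTER_COEFF = {1: [1, 6, 1], 2: [0, 1, 0]}
HOR_FILTER_COEFF = {1: [1, 6, 1], 2: [0, 1, 0]}

def chroma_resampling_filters(ver_chroma_filter_idc: int, hor_chroma_filter_idc: int):
    """Select the vertical/horizontal chroma filter taps (None: not specified)."""
    for idc in (ver_chroma_filter_idc, hor_chroma_filter_idc):
        if not 0 <= idc <= 2:
            raise ValueError("values greater than 2 are reserved for future use")
    if ver_chroma_filter_idc == 0 and hor_chroma_filter_idc == 0:
        raise ValueError("both idc values equal to 0 is not a conforming combination")
    return (VER_FILTER_COEFF.get(ver_chroma_filter_idc),
            HOR_FILTER_COEFF.get(hor_chroma_filter_idc))
```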
The SEI message of table TAB3 suffers from the same limitations as the SEI message of table TAB1. In addition, it applies only to chroma.
After the quantization step 309, the current block is reconstructed so that the pixels corresponding to the block are available for future prediction. This reconstruction stage is also called a prediction loop. Thus, inverse quantization is applied to the transformed and quantized residual block during step 312, and inverse transformation is applied during step 313. The prediction block of the block is reconstructed according to the prediction mode for the block obtained during step 314. If the current block is encoded according to the inter prediction mode, the encoding module applies motion compensation using the motion vector of the current block during step 316 as appropriate in order to identify a reference block for the current block. If the current block is encoded according to the intra prediction mode, a prediction block of the current block is reconstructed using a prediction direction corresponding to the current block during step 315. The prediction block and the reconstructed residual block are added to obtain a reconstructed current block.
After reconstruction, in-loop filtering, which aims to reduce coding artifacts, is applied to the reconstructed block during step 317. This filtering is called in-loop filtering because it occurs in the prediction loop to obtain the same reference picture at the decoder as the encoder, avoiding drift between the encoding and decoding processes. In-loop filtering tools include deblocking filtering, sample Adaptive Offset (SAO), and Adaptive Loop Filtering (ALF).
When reconstructing a block, the block is inserted during step 318 into a reconstructed picture stored in a memory 319 of reconstructed pictures, commonly referred to as a Decoded Picture Buffer (DPB). The reconstructed picture thus stored may then be used as a reference picture for other pictures to be encoded.
When RPR is activated, samples of pictures stored in the DPB (i.e., at least a portion of these pictures) are resampled in a step 320 when they are used for motion estimation and motion compensation. The resampling step (320) and the motion compensation step (316) can be combined in one single sample interpolation step. Note that the motion estimation step (304), which in practice uses motion compensation, also uses this single sample interpolation step in that case.
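The sketch below shows the geometric core of this combined interpolation step: a sample position and its motion vector in the current picture are mapped into a reference picture of different resolution by scaling with the resolution ratio. It is a simplified, floating-point view; VVC performs this with fixed-point scaling factors and dedicated sub-pel interpolation filters.

```python
def rpr_reference_position(x_cur: int, y_cur: int, mv_x: float, mv_y: float,
                           cur_size: tuple, ref_size: tuple) -> tuple:
    """Map a sample position plus its motion vector from the current picture
    into a reference picture of a different resolution."""
    cur_w, cur_h = cur_size
    ref_w, ref_h = ref_size
    # Both the position and the motion are scaled by the resolution ratio;
    # the fractional result is then fed to sub-pel interpolation filters.
    x_ref = (x_cur + mv_x) * ref_w / cur_w
    y_ref = (y_cur + mv_y) * ref_h / cur_h
    return x_ref, y_ref

# E.g. a 1920x1080 current picture predicting from a 960x540 reference:
print(rpr_reference_position(64, 32, 2.5, -1.0, (1920, 1080), (960, 540)))
# -> (33.25, 15.5): the corresponding area in the reference, at half scale
```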
Fig. 4 schematically depicts a method performed by a decoding module for decoding an encoded video stream 311 encoded according to the method described in relation to fig. 3. Variations of this method for decoding are envisaged, but for clarity the method for decoding of fig. 4 is described below, without describing all the contemplated variations.
Decoding is performed block by block. For the current block, the decoding starts with entropy decoding of the current block during step 410. Entropy decoding allows obtaining a prediction mode of a block.
Entropy decoding allows the prediction vector index, motion residual and residual block to be obtained when appropriate if the block has been encoded according to the inter prediction mode. During step 408, the motion vector of the current block is reconstructed using the prediction vector index and the motion residual.
Entropy decoding allows obtaining a prediction direction and a residual block if the block has been encoded according to an intra prediction mode. Steps 412, 413, 414, 415, 416 and 417 implemented by the decoding module are identical in all respects to steps 312, 313, 314, 315, 316 and 317, respectively, implemented by the encoding module. In a step 418, the decoded block is saved in a decoded picture, and the decoded pictures are stored in a DPB 419. When the decoding module decodes a given picture, the pictures stored in the DPB 419 are identical to the pictures stored in the DPB 319 by the encoding module during the encoding of this given picture. The decoded pictures can also be output by the decoding module, for instance to be displayed. When RPR is activated, samples of a picture used as a reference picture (i.e., at least a portion of this picture) are resampled to the resolution of the predicted picture in a step 420. The resampling step (420) and the motion compensation step (416) can be combined in one single sample interpolation step.
Since displaying a video sequence with heterogeneous picture resolutions would be unacceptable to a user, when RPR is used, a resampling is applied to the reconstructed pictures in a post-processing step 421 in order to homogenize their resolutions.
As already mentioned above, one problem with the post-filtering SEI messages defined so far (and described with respect to tables TAB1 and TAB 3) is that these SEI messages are not suitable for video sequences comprising pictures encoded at different resolutions. Hereinafter, a new SEI message is proposed to address this problem.
The post-processing step 421 can also comprise an inverse color transform (e.g., a conversion from YCbCr 4:2:0 to RGB 4:4:4), an inverse mapping performing the inverse of the remapping process applied in the pre-processing of step 301, and a post-filtering improving the reconstructed pictures based on, for example, filter parameters provided in an SEI message.
Fig. 5A depicts an example of a context in which the following embodiments may be implemented.
In fig. 5A, an apparatus 51, which could be a camera, a storage device, a computer, a server or any device capable of delivering a video stream, transmits a video stream to a system 53 using a communication channel 52. The video stream is either encoded and transmitted by the apparatus 51, or received and/or stored by the apparatus 51 and then retransmitted. The communication channel 52 is a wired (for example Internet or Ethernet) or wireless (for example WiFi, 3G, 4G or 5G) network link.
The system 53, which may be a set-top box, for example, receives and decodes the video stream to generate a decoded picture sequence.
The obtained decoded picture sequence is then transmitted to a display system 55 using a communication channel 54, which may be a wired or wireless network. The display system 55 then displays the picture.
In an embodiment, the system 53 is comprised in the display system 55. In that case, the system 53 and the display system 55 are comprised in a TV, a computer, a tablet, a smartphone, a head-mounted display, etc.
Fig. 5B schematically illustrates an example of a hardware architecture of a processing module 500 able to implement an encoding module or a decoding module capable of implementing respectively the method for encoding of fig. 3 and the method for decoding of fig. 4, modified according to various aspects and embodiments. The encoding module is for example comprised in the apparatus 51 when this apparatus is in charge of encoding the video stream. The decoding module is for example comprised in the system 53. As non-limiting examples, the processing module 500 comprises, connected by a communication bus 5005: a processor or CPU (central processing unit) 5000 encompassing one or more microprocessors, a general-purpose computer, a special-purpose computer and a processor based on a multi-core architecture; a Random Access Memory (RAM) 5001; a Read-Only Memory (ROM) 5002; a storage unit 5003, which can comprise non-volatile memory and/or volatile memory, including, but not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash memory, a magnetic disk drive and/or an optical disk drive, or a storage medium reader such as an SD (Secure Digital) card reader and/or a Hard Disk Drive (HDD) and/or a network-accessible storage device; and at least one communication interface 5004 for exchanging data with other modules, devices or equipment. The communication interface 5004 can include, but is not limited to, a transceiver configured to transmit and receive data over a communication channel. The communication interface 5004 can include, but is not limited to, a modem or a network card.
If the processing module 500 implements a decoding module, the communication interface 5004 enables, for example, the processing module 500 to receive the encoded video stream and provide a decoded sequence of pictures. If the processing module 500 implements an encoding module, the communication interface 5004 enables, for example, the processing module 500 to receive an original picture data sequence to be encoded and provide an encoded video stream.
The processor 5000 is capable of executing instructions loaded into the RAM 5001 from the ROM 5002, from an external memory (not shown), from a storage medium or from a communication network. When the processing module 500 is powered up, the processor 5000 is capable of reading instructions from the RAM 5001 and executing them. These instructions form a computer program causing, for example, the implementation by the processor 5000 of the decoding method described in relation to fig. 4, of the encoding method described in relation to fig. 3, and of the methods described in relation to figs. 6 and 7, these methods comprising the various aspects and embodiments described below in this document.
All or some of the algorithms and steps of the methods of fig. 3, 4, 6 and 7 may be implemented in software by execution of a set of instructions by a programmable machine such as a DSP (digital signal processor) or microcontroller, or may be implemented in hardware by a machine or special purpose component such as an FPGA (field programmable gate array) or ASIC (application specific integrated circuit).
It can be seen that microprocessors, general purpose computers, special purpose computers, processors with or without a multi-core architecture, DSPs, microcontrollers, FPGAs and ASICs are electronic circuits adapted to at least partially implement the methods of fig. 3, 4, 6 and 7.
Fig. 5D shows a block diagram of an example of a system 53 in which various aspects and embodiments are implemented. The system 53 may be embodied as a device including the various components described below and configured to perform one or more of the aspects and embodiments described in this document. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smart phones, tablet computers, digital multimedia set-top boxes, digital television receivers, personal video recording systems, connected household appliances, and head mounted displays. The elements of system 53 may be embodied in a single Integrated Circuit (IC), multiple ICs, and/or discrete components, alone or in combination. For example, in at least one embodiment, the system 53 includes one processing module 500 that implements a decoding module. In various embodiments, system 53 is communicatively coupled to one or more other systems or other electronic devices via, for example, a communication bus or through dedicated input and/or output ports. In various embodiments, the system 53 is configured to implement one or more of the aspects described in this document.
The input of the processing module 500 may be provided by various input modules as indicated in block 531. Such input modules include, but are not limited to: (i) A Radio Frequency (RF) module that receives an RF signal transmitted over the air, for example, by a broadcaster; (ii) A Component (COMP) input module (or a set of COMP input modules); (iii) a Universal Serial Bus (USB) input module; and/or (iv) a High Definition Multimedia Interface (HDMI) input module. Other examples not shown in fig. 5D include composite video.
In various embodiments, the input modules of block 531 have associated respective input processing elements as known in the art. For example, the RF module can be associated with elements suitable for: (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down-converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select, for example, a signal frequency band that can be referred to as a channel in certain embodiments, (iv) demodulating the down-converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets. The RF module of various embodiments includes one or more elements to perform these functions, for example frequency selectors, signal selectors, band limiters, channel selectors, filters, down-converters, demodulators, error correctors and demultiplexers. The RF module can include a tuner that performs various of these functions, including, for example, down-converting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband. In one set-top box embodiment, the RF module and its associated input processing elements receive an RF signal transmitted over a wired (for example, cable) medium, and perform frequency selection by filtering, down-converting and filtering again to a desired frequency band. Various embodiments rearrange the order of the above-described (and other) elements, remove some of these elements, and/or add other elements performing similar or different functions. Adding elements can include inserting elements in between existing elements, such as, for example, inserting amplifiers and an analog-to-digital converter. In various embodiments, the RF module includes an antenna.
In addition, the USB and/or HDMI modules can include respective interface processors for connecting the system 53 to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of the input processing, for example Reed-Solomon error correction, can be implemented, for example, within a separate input processing IC or within the processing module 500, as necessary. Similarly, aspects of the USB or HDMI interface processing can be implemented within separate interface ICs or within the processing module 500, as necessary. The demodulated, error-corrected and demultiplexed stream is provided to the processing module 500.
The various elements of system 53 may be disposed within an integrated housing. Within the integrated housing, the various elements may be interconnected and data transferred between these elements using suitable connection arrangements (e.g., internal buses known in the art, including inter-IC (I2C) buses, wiring, and printed circuit boards). For example, in system 53, processing module 500 is interconnected with other elements of system 53 via bus 5005.
The communication interface 5004 of the processing module 500 allows the system 53 to communicate over the communication channel 52. As already mentioned above, the communication channel 52 may be implemented, for example, within a wired medium and/or a wireless medium.
In various embodiments, data is streamed, or otherwise provided, to the system 53 using a wireless network such as a Wi-Fi network, for example IEEE 802.11 (IEEE refers to the Institute of Electrical and Electronics Engineers). The Wi-Fi signal of these embodiments is received over the communication channel 52 and the communication interface 5004 which are adapted for Wi-Fi communications. The communication channel 52 of these embodiments is typically connected to an access point or router that provides access to external networks, including the Internet, to allow streaming applications and other over-the-top communications. Other embodiments provide streamed data to the system 53 using the RF connection of the input block 531. As indicated above, various embodiments provide data in a non-streaming manner. In addition, various embodiments use wireless networks other than Wi-Fi, for example a cellular network or a Bluetooth network.
The system 53 can provide an output signal to various output devices, including the display system 55, speakers 56 and other peripheral devices 57. The display system 55 of various embodiments includes, for example, one or more of a touch screen display, an organic light-emitting diode (OLED) display, a curved display, and/or a foldable display. The display system 55 can be for a television, a tablet, a laptop, a cell phone (mobile phone), a head-mounted display or other devices. The display system 55 can also be integrated with other components (for example, as in a smartphone), or separate (for example, an external monitor for a laptop). In various examples of embodiments, the other peripheral devices 57 include one or more of a stand-alone digital video disc (or digital versatile disc) (DVR, for both terms), a disk player, a stereo system, and/or a lighting system. Various embodiments use one or more peripheral devices 57 that provide a function based on the output of the system 53. For example, a disk player performs the function of playing the output of the system 53.
In various embodiments, control signals are communicated between the system 53 and the display system 55, speakers 56, or other peripheral devices 57 using signaling such as AV.Link, Consumer Electronics Control (CEC), or other communication protocols that enable device-to-device control with or without user intervention. The output devices can be communicatively coupled to the system 53 via dedicated connections through respective interfaces 532, 533 and 534. Alternatively, the output devices can be connected to the system 53 via the communication interface 5004 using the communication channel 52, or via the communication interface 5004 using a dedicated communication channel corresponding to the communication channel 54 of fig. 5A. The display system 55 and the speakers 56 can be integrated in a single unit with the other components of the system 53 in an electronic device such as, for example, a television. In various embodiments, the display interface 532 includes a display driver, such as, for example, a timing controller (T-Con) chip.
The display system 55 and speaker 56 may alternatively be separate from one or more of the other components. In various embodiments where display system 55 and speakers 56 are external components, the output signals may be provided via dedicated output connections (including, for example, HDMI ports, USB ports, or COMP outputs).
Fig. 5C shows a block diagram of an example of a system 51 in which various aspects and embodiments are implemented. System 51 is very similar to system 53. The system 51 may be embodied as a device including the various components described below and configured to perform one or more of the aspects and embodiments described in this document. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptops, smartphones, tablets, cameras, and servers. The elements of system 51 may be embodied in a single Integrated Circuit (IC), multiple ICs, and/or discrete components, alone or in combination. For example, in at least one embodiment, the system 51 includes one processing module 500 that implements an encoding module. In various embodiments, system 51 is communicatively coupled to one or more other systems or other electronic devices via, for example, a communication bus or through dedicated input and/or output ports. In various embodiments, the system 51 is configured to implement one or more of the aspects described in this document.
Inputs to the processing module 500 may be provided through various input modules as shown in block 531 already described with respect to fig. 5D.
The various elements of system 51 may be disposed within an integrated housing. Within the integrated housing, the various elements may be interconnected and data transferred between these elements using suitable connection arrangements (e.g., internal buses known in the art, including inter-IC (I2C) buses, wiring, and printed circuit boards). For example, in system 51, processing module 500 is interconnected with other elements of system 51 via bus 5005.
The communication interface 5004 of the processing module 500 allows the system 51 to communicate over the communication channel 52.
In various embodiments, a wireless network, such as a Wi-Fi network (e.g., IEEE 802.11 (IEEE refers to institute of electrical and electronics engineers)), is used to stream or otherwise provide data to system 51. Wi-Fi signals of these embodiments are received through communication channel 52 and communication interface 5004 suitable for Wi-Fi communication. The communication channel 52 of these embodiments is typically connected to an access point or router that provides access to external networks, including the internet, to allow streaming applications and other top-level communications. Other embodiments provide streaming data to the system 51 using the RF connection of the input box 531.
As described above, various embodiments provide data in a non-streaming manner. In addition, various embodiments use wireless networks other than Wi-Fi, such as cellular networks or bluetooth networks.
The data provided to system 51 may be provided in different formats. In various embodiments, such data is encoded and conforms to known video compression formats, such as AV1, VP9, VVC, HEVC, AVC, SVC, SHVC, and the like. In various embodiments, these data are raw data provided by a picture and/or audio acquisition module connected to the system 51 or included in the system 51. In this case, the processing module is responsible for the encoding of these data.
The system 51 may provide the output signal to various output devices (e.g., the system 53) capable of storing and/or decoding the output signal.
Various implementations involve decoding. As used in this application, "decoding" may encompass all or part of a process performed on a received encoded video stream, for example, in order to produce a final output suitable for display. In various implementations, such processes include one or more of the processes typically performed by a decoder, such as entropy decoding, inverse quantization, inverse transformation, and prediction. In various implementations, such processes also or alternatively include processes performed by the decoders of the various embodiments described herein, for example, decoding pictures of different resolutions from an encoded video stream, decoding SEI messages containing post-filtering information, and resampling pictures in response to the post-filtering information.
Whether the phrase "decoding process" refers specifically to a subset of operations or broadly to the broader decoding process will be clear based on the context of the specific description and is believed to be well understood by those skilled in the art.
Various implementations involve encoding. In a manner similar to the discussion above regarding "decoding," "encoding" as used in this application may encompass, for example, all or part of a process performed on an input video sequence to produce an encoded video stream. In various implementations, such processes include one or more of the processes typically performed by an encoder, such as partitioning, prediction, transformation, quantization, and entropy encoding. In various embodiments, such processes also or alternatively include processes performed by the encoders of the various embodiments described herein, for example, generating an encoded video stream comprising pictures of different resolutions and associating with it SEI messages comprising post-filtering information.
Whether the phrase "encoding process" refers specifically to a subset of operations or broadly refers to a broader encoding process will be apparent based on the context of the specific description and is believed to be well understood by those skilled in the art.
Note that syntax element names used herein are descriptive terms. Thus, they do not exclude the use of other syntax element names.
When the figures are presented as flow charts, it should be understood that they also provide block diagrams of corresponding devices. Similarly, when the figures are presented as block diagrams, it should be understood that they also provide a flow chart of the corresponding method/process.
Various embodiments are directed to rate distortion optimization. In particular, during the encoding process, a balance or tradeoff between rate and distortion is typically considered. Rate distortion optimization is typically expressed as minimizing a rate distortion function, which is a weighted sum of rate and distortion. There are different approaches to solving the rate distortion optimization problem. For example, these methods may be based on extensive testing of all coding options (including all considered modes or coding parameter values) and evaluating their coding costs and the associated distortion of the reconstructed signal after encoding and decoding completely. Faster methods may also be used to reduce coding complexity, in particular the calculation of approximate distortion based on prediction or prediction residual signals instead of reconstructed residual signals. A mix of the two methods may also be used, such as by using approximate distortion for only some of the possible coding options, and full distortion for other coding options. Other methods evaluate only a subset of the possible coding options. More generally, many methods employ any of a variety of techniques to perform the optimization, but the optimization is not necessarily a complete assessment of both coding cost and associated distortion.
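As an illustration only (not part of any embodiment), the following Python sketch shows Lagrangian mode selection minimizing the weighted sum described above; the candidate modes and their distortion and rate figures are made up for the example.

```python
# Minimal sketch of Lagrangian rate-distortion mode selection (illustrative only).

def rd_cost(distortion: float, rate_bits: float, lmbda: float) -> float:
    # The rate-distortion function: a weighted sum J = D + lambda * R.
    return distortion + lmbda * rate_bits

def select_mode(candidates, lmbda):
    # candidates: (mode_name, distortion, rate) tuples, where distortion may be
    # exact (after full encode/decode) or approximate (from prediction residuals).
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lmbda))

modes = [("intra", 120.0, 300), ("inter", 95.0, 450), ("skip", 160.0, 20)]
print(select_mode(modes, lmbda=0.1)[0])  # "inter": lowest J = 95 + 0.1*450 = 140
```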
The specific implementations and aspects described herein may be implemented in, for example, a method or process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (e.g., discussed only as a method), the implementation of the features discussed may also be implemented in other forms (e.g., an apparatus or program). The apparatus may be implemented in, for example, suitable hardware, software and firmware. The method may be implemented, for example, in a processor, which refers generally to a processing device including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end users.
Reference to "one embodiment" or "an embodiment" or "one embodiment" or "an embodiment" and other variations thereof means that a particular feature, structure, characteristic, etc., described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one embodiment" or "in an embodiment" and any other variations that occur in various places throughout this application are not necessarily all referring to the same embodiment.
In addition, the present application may be directed to "determining" various information. Determining information may include one or more of, for example, estimating information, calculating information, predicting information, retrieving information from memory, or obtaining information from another device, module, or from a user, for example.
Furthermore, the present application may relate to "accessing" various information. Accessing the information may include, for example, one or more of receiving the information, retrieving the information (e.g., from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.
In addition, the present application may refer to "receiving" various information. As with "accessing," receiving is intended to be a broad term. Receiving the information may include, for example, one or more of accessing the information or retrieving the information (e.g., from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
It should be understood that the use of any of the following "/", "and/or", "at least one of", and "one or more of" (for example, in the cases of "A/B", "A and/or B", "at least one of A and B", and "one or more of A and B") is intended to encompass the selection of only the first listed option (A), or only the second listed option (B), or both options (A and B). As a further example, in the cases of "A, B and/or C", "at least one of A, B and C", and "one or more of A, B and C", such phrasing is intended to encompass the selection of only the first listed option (A), or only the second listed option (B), or only the third listed option (C), or only the first and second listed options (A and B), or only the first and third listed options (A and C), or only the second and third listed options (B and C), or all three options (A and B and C). As will be clear to one of ordinary skill in this and related arts, this extends to as many items as are listed.
Also, as used herein, the word "signaling" refers to (among other things) indicating something to the corresponding decoder. For example, in certain embodiments, the encoder signals the use of some encoding tools. Thus, in one embodiment, the same parameters may be used on both the encoder side and the decoder side. Thus, for example, an encoder may transmit (explicit signaling) certain parameters to a decoder so that the decoder may use the same certain parameters. Conversely, if the decoder already has specific parameters, among others, signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the specific parameters. By avoiding transmission of any actual functions, bit savings are achieved in various embodiments. It should be appreciated that the signaling may be implemented in various ways. For example, in various implementations, information is signaled to a corresponding decoder using one or more syntax elements, flags, and the like. Although the foregoing relates to the verb form of the word "signal," the word "signal" may also be used herein as a noun.
It will be apparent to one of ordinary skill in the art that implementations may produce various signals formatted to carry, for example, storable or transmittable information. The information may include, for example, instructions for performing a method or data resulting from one of the implementations. For example, the signal may be formatted to carry the encoded video stream and SEI messages of the implementation. Such signals may be formatted, for example, as electromagnetic waves (e.g., using the radio frequency portion of the spectrum) or baseband signals. Formatting may include, for example, encoding the encoded video stream and modulating a carrier wave using the encoded video stream. The information carried by the signal may be, for example, analog or digital information. It is well known that signals may be transmitted over a variety of different wired or wireless links. The signal may be stored on a processor readable medium.
In the following, various implementations propose two new SEI messages that are more suitable for video sequences that include pictures with heterogeneous picture resolutions. The proposed SEI message differs from those of tables TAB1 and TAB3 in that:
● They provide filters intended for resampling improvement rather than subjective quality improvement; the two kinds of SEI messages may be used jointly or independently to improve both aspects;
● They allow signaling various types of filters, such as, for example, resampling filters based on Neural Networks (NN);
● They may be applied to all pictures or only to a subset of the pictures that need to be resampled;
● They can be applied for a specific duration and can be replaced by successive SEI messages with different parameters, instead of being applied at the sequence level;
● They can be applied to both luminance and chrominance.
Table TAB8 describes a first embodiment of a new SEI message, referred to as the resampling SEI message, which is better suited to video sequences comprising pictures with heterogeneous picture resolutions.
Table TAB8
The syntax element resampling_id is an identifier used to identify the resampling information. The value of resampling_id shall be in the range of 0 to 2^32 - 2 (inclusive).
The syntax element resampling_cancel_flag equal to 1 indicates that the resampling SEI message cancels the persistence of any previous resampling SEI message applied to the current layer in output order (as defined, for example, in VVC). resampling_cancel_flag equal to 0 indicates that resampling information follows.
The syntax element resampling_persistence_flag specifies the persistence of the resampling SEI message. resampling_persistence_flag equal to 0 specifies that the resampling information applies only to the current decoded picture. Let picA be the current picture. resampling_persistence_flag equal to 1 specifies that, for the current layer, the resampling information persists in output order until either of the following conditions is true (a sketch of this rule is given after the list):
● A new CLVS (coded layer video sequence, e.g., as defined in VVC) of the current layer begins.
● A picture picB in the current layer, contained in an access unit that includes a resampling SEI message with the same value of resampling_id, has a picture order count greater than that of picA.
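For illustration, a sketch of the persistence rule under the stated conditions follows; the Pic class, its fields, and the way the next message is located are hypothetical stand-ins for decoder state, not proposed syntax.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pic:
    poc: int       # picture order count
    clvs_id: int   # identifier of the CLVS the picture belongs to

def sei_persists_for(pic: Pic, pic_a: Pic, next_sei_poc: Optional[int],
                     persistence_flag: int) -> bool:
    # pic_a: picture the resampling SEI message was received with.
    # next_sei_poc: POC of the picture carrying the next resampling SEI
    # message with the same resampling_id, if any.
    if persistence_flag == 0:
        # The resampling information applies only to the current picture.
        return pic.poc == pic_a.poc and pic.clvs_id == pic_a.clvs_id
    if pic.clvs_id != pic_a.clvs_id:
        return False   # a new CLVS of the current layer has begun
    if next_sei_poc is not None and pic.poc >= next_sei_poc:
        return False   # superseded by a later message with the same resampling_id
    return pic.poc >= pic_a.poc

# The message received with picA (POC 8) persists until the picture at POC 16
# carries a new message with the same resampling_id.
print(sei_persists_for(Pic(12, 0), Pic(8, 0), 16, 1))   # True
print(sei_persists_for(Pic(20, 0), Pic(8, 0), 16, 1))   # False
```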
The syntax element resampling_tap_luma_hor_minus1 plus 1 specifies the size of the array of filter coefficients applied horizontally to the luma component of pictures to be resampled. The value of resampling_tap_luma_hor_minus1 shall be in the range of 1 to 15 (inclusive).
The syntax element resampling_luma_hor_coeff[i] specifies the i-th filter coefficient, with 16-bit precision, applied horizontally to the luma component of pictures to be resampled. The value of resampling_luma_hor_coeff[i] shall be in the range of -2^31 + 1 to 2^31 - 1 (inclusive).
The syntax element use_alternative_filter_for_vertical_luma specifies whether the resampling information for vertical filtering of the luma component differs from that for horizontal filtering of the luma component.
The syntax element num_resampling_filters_luma_hor specifies the number of filters signaled for luma resampling in the horizontal direction.
The syntax element resampling_tap_luma_ver_minus1 plus 1 specifies the size of the array of filter coefficients applied vertically to the luma component of pictures to be resampled. The value of resampling_tap_luma_ver_minus1 shall be in the range of 1 to 15 (inclusive).
The syntax element resampling_luma_ver_coeff[i] specifies the i-th filter coefficient, with 16-bit precision, applied vertically to the luma component of pictures to be resampled. The value of resampling_luma_ver_coeff[i] shall be in the range of -2^31 + 1 to 2^31 - 1 (inclusive).
The syntax element use_alternative_filter_for_chroma specifies whether resampling information is coded for the chroma components.
It is a requirement of bitstream conformance that sps_chroma_format_idc (which specifies the chroma sampling relative to the luma sampling) shall not be equal to 0 (which indicates a monochrome sequence) when use_alternative_filter_for_chroma is equal to 1.
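A small sketch of this conformance constraint, with an exception used purely as an illustrative way of flagging a non-conforming bitstream:

```python
def check_chroma_filter_conformance(sps_chroma_format_idc: int,
                                    use_alternative_filter_for_chroma: int) -> None:
    # sps_chroma_format_idc == 0 indicates a monochrome sequence, for which
    # no chroma filter information may be signaled.
    if use_alternative_filter_for_chroma == 1 and sps_chroma_format_idc == 0:
        raise ValueError("non-conforming: chroma filter signaled for a monochrome sequence")
```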
The syntax element num_resampling_filters_chroma_hor specifies the number of filters signaled for chroma resampling in the horizontal direction.
The syntax element resampling_tap_chroma_hor_minus1 plus 1 specifies the size of the array of filter coefficients applied horizontally to the chroma components of pictures to be resampled. The value of resampling_tap_chroma_hor_minus1 shall be in the range of 1 to 15 (inclusive).
The syntax element resampling_chroma_hor_coeff[i] specifies the i-th filter coefficient, with 16-bit precision, applied horizontally to the chroma components of pictures to be resampled. The value of resampling_chroma_hor_coeff[i] shall be in the range of -2^31 + 1 to 2^31 - 1 (inclusive).
The syntax element use_alternative_filter_for_vertical_chroma specifies whether the resampling information for vertical filtering of the chroma components differs from that for horizontal filtering of the chroma components.
The syntax element resampling_tap_chroma_ver_minus1 plus 1 specifies the size of the array of filter coefficients applied vertically to the chroma components of pictures to be resampled. The value of resampling_tap_chroma_ver_minus1 shall be in the range of 1 to 15 (inclusive).
The syntax element resampling_chroma_ver_coeff[i] specifies the i-th filter coefficient, with 16-bit precision, applied vertically to the chroma components of pictures to be resampled. The value of resampling_chroma_ver_coeff[i] shall be in the range of -2^31 + 1 to 2^31 - 1 (inclusive).
The resampling SEI message enables the description of both luma and chroma resampling coefficients. It is particularly applicable to video sequences encoded using the RPR tool. Indeed, such sequences consist mainly of pictures at the original resolution and pictures at a reduced resolution. In that case, resampling consists in upsampling the reduced-resolution pictures back to their original resolution. However, the application of the resampling SEI message is not limited to upsampling: it is equally suitable for specifying a downsampling filter, or indeed any filter, such as a filter intended to improve the subjective quality of a picture.
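As a minimal sketch, assuming the horizontal and vertical coefficient arrays have already been parsed from a resampling SEI message and sum to 1 << 16 (consistent with the 16-bit precision above), separable filtering of a plane can be illustrated as follows; the phase handling needed for an actual resolution change is omitted for brevity.

```python
import numpy as np

def filter_plane(plane: np.ndarray, hor: np.ndarray, ver: np.ndarray) -> np.ndarray:
    work = plane.astype(np.int64)
    # Horizontal pass over rows, then vertical pass over columns (separable filter).
    tmp = np.apply_along_axis(lambda row: np.convolve(row, hor, mode="same"), 1, work)
    out = np.apply_along_axis(lambda col: np.convolve(col, ver, mode="same"), 0, tmp)
    # Each pass scales by 2^16, so renormalize by 2^32 with rounding.
    return np.clip((out + (1 << 31)) >> 32, 0, 255).astype(np.uint8)

luma = np.random.randint(0, 256, (16, 16), dtype=np.uint8)
taps = np.array([9362, 46812, 9362], dtype=np.int64)  # hypothetical 3-tap filter summing to 65536
print(filter_plane(luma, taps, taps).shape)  # (16, 16)
```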
In some implementations, the resampling SEI message is used in combination with any post-filtering SEI message (such as those of tables TAB1 and TAB 3), with resampling applied prior to post-filtering.
In a first modification of the first embodiment, the semantics of the resampling SEI message are changed so that the message applies only to pictures whose resolution differs from the maximum resolution, i.e., when pps_pic_width_in_luma_samples (which specifies, in units of luma samples, the width of each decoded picture referring to the Picture Parameter Set (PPS)) is not equal to sps_pic_width_max_in_luma_samples (which specifies, in units of luma samples, the maximum width of each decoded picture referring to the Sequence Parameter Set (SPS)) and pps_pic_height_in_luma_samples (which specifies, in units of luma samples, the height of each decoded picture referring to the PPS) is not equal to sps_pic_height_max_in_luma_samples (which specifies, in units of luma samples, the maximum height of each decoded picture referring to the SPS).
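The condition of this first variant can be sketched directly from the syntax elements quoted above; the SPS and PPS container classes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SPS:
    sps_pic_width_max_in_luma_samples: int
    sps_pic_height_max_in_luma_samples: int

@dataclass
class PPS:
    pps_pic_width_in_luma_samples: int
    pps_pic_height_in_luma_samples: int

def resampling_sei_applies(pps: PPS, sps: SPS) -> bool:
    # The SEI message applies only to pictures whose size differs from the
    # maximum size signaled in the SPS.
    return (pps.pps_pic_width_in_luma_samples != sps.sps_pic_width_max_in_luma_samples
            and pps.pps_pic_height_in_luma_samples != sps.sps_pic_height_max_in_luma_samples)

print(resampling_sei_applies(PPS(960, 540), SPS(1920, 1080)))  # True
```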
In a second variant of the first embodiment, the syntax of the resampling SEI message is modified to check whether the current picture has a lower resolution than the maximum resolution in the sequence. This second variant is shown in table TAB9; the differences between tables TAB9 and TAB8 are indicated in bold.
Table TAB9
In a second embodiment, a second SEI message, described in table TAB10 and called the resampling method SEI message, is proposed. In the resampling method SEI message, an index is encoded to signal an existing resampling filter whose characteristics are known to both the encoding module and the decoding module.
Table TAB10
The syntax element resampling_method_hor_luma identifies the resampling method used for horizontal filtering of the luma component, as specified in table TAB11. The value of resampling_method_hor_luma shall be in the range of 0 to 6 (inclusive). The values 7 to 15 are reserved for future use.
The syntax element use_alternative_filter_for_vertical_luma specifies whether the resampling information for vertical filtering of the luma component differs from the resampling information for horizontal filtering of the luma component.
Since methods 3 to 6 in table TAB11 are non-separable, use_alternative_filter_for_vertical_luma (respectively, use_alternative_filter_for_vertical_chroma) does not need to be encoded when resampling_method_hor_luma (respectively, resampling_method_hor_chroma) is greater than or equal to 3.
The syntax element resampling_method_ver_luma identifies the resampling method used for vertical filtering of the luma component, as specified in table TAB11. The value of resampling_method_ver_luma shall be in the range of 0 to 6 (inclusive). The values 7 to 15 are reserved for future use.
The syntax element resampling_method_hor_chroma identifies the resampling method used for horizontal filtering of the chroma components, as specified in table TAB11. The value of resampling_method_hor_chroma shall be in the range of 0 to 6 (inclusive). The values 7 to 15 are reserved for future use.
The syntax element use_alternative_filter_for_vertical_chroma specifies whether the resampling information for vertical filtering of the chroma components differs from the resampling information for horizontal filtering of the chroma components.
Resampling method value    Associated resampling method
0                          Luma filter
1                          Chroma filter
2                          Bilinear filter
3                          DCC (directional cubic convolution interpolation)
4                          ICBI (iterative curvature-based interpolation)
5                          EGII (edge-guided image interpolation)
6                          Deep-learning-based filter
7..15                      Reserved
Table TAB11
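For illustration, table TAB11 can be captured as an enumeration; the class and member names below are illustrative, not proposed syntax.

```python
from enum import IntEnum

class ResamplingMethod(IntEnum):
    LUMA_FILTER = 0     # separable
    CHROMA_FILTER = 1   # separable
    BILINEAR = 2        # separable
    DCC = 3             # directional cubic convolution interpolation (non-separable)
    ICBI = 4            # iterative curvature-based interpolation (non-separable)
    EGII = 5            # edge-guided image interpolation (non-separable)
    DEEP_LEARNING = 6   # deep-learning-based filter (non-separable)
    # Values 7..15 are reserved for future use.

def vertical_flag_is_coded(method_hor: int) -> bool:
    # Methods 3..6 are non-separable, so use_alternative_filter_for_vertical_*
    # is not coded when the horizontal method is greater than or equal to 3.
    return method_hor < ResamplingMethod.DCC

print(vertical_flag_is_coded(ResamplingMethod.BILINEAR))  # True
print(vertical_flag_is_coded(ResamplingMethod.ICBI))      # False
```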
In a variant, this resampling method SEI message applies only if sps_ref_pic_resampling_enabled_flag or sps_res_change_in_clvs_allowed_flag is equal to 1. sps_ref_pic_resampling_enabled_flag equal to 1 specifies that RPR is enabled; sps_ref_pic_resampling_enabled_flag equal to 0 specifies that RPR is disabled. sps_res_change_in_clvs_allowed_flag equal to 1 specifies that the picture spatial resolution may change within a CLVS referring to the SPS; sps_res_change_in_clvs_allowed_flag equal to 0 specifies that the picture spatial resolution does not change within any CLVS referring to the SPS.
In an embodiment, the resampling method SEI message is used in combination with a variant of the resampling SEI message. The variant of the resampling SEI message corresponding to this embodiment is described in table TAB12:
Table TAB12
The differences between the resampling SEI messages of tables TAB8 and TAB12 are indicated in bold.
The syntax element use_resampling_method_sei equal to 1 indicates that, if at least one resampling method SEI message is present in the bitstream, the resampling method specified in the last received resampling method SEI message shall be used instead of the resampling filter specified in the resampling SEI message. use_resampling_method_sei equal to 0 indicates that the resampling filter specified in the resampling SEI message is used.
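The selection rule can be sketched as follows; the objects and fields stand in for hypothetical parse results rather than a real decoder API.

```python
def select_resampling_filter(resampling_sei, last_method_sei):
    # Prefer the method index of the last received resampling method SEI
    # message when use_resampling_method_sei is set and such a message exists.
    if resampling_sei.use_resampling_method_sei == 1 and last_method_sei is not None:
        return ("method_index", last_method_sei.method)      # index into table TAB11
    return ("coefficients", resampling_sei.coefficients)     # explicit filter taps
```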
In an embodiment, the resampling method SEI message is independent of the resampling SEI message. In this embodiment, the resampling method SEI message may be considered as a substitute for the resampling SEI message. This embodiment uses a variant of the resampling method SEI message described in table TAB13:
Table TAB13
The differences between the resampling method SEI messages of tables TAB10 and TAB13 are indicated in bold. It can be seen that the syntax elements resampling_id, resampling_cancel_flag and resampling_persistence_flag are introduced into the resampling method SEI message with the same semantics as described for the resampling SEI message of table TAB8.
Fig. 6 schematically shows an example of a method for encoding pictures of a video sequence and metadata allowing to control the resampling of these pictures.
The method of fig. 6 is, for example, implemented by the system 51 and, more precisely, by the processing module 500 of the system 51.
In an embodiment, the system 51 receives a RAW video sequence from the input module 531.
In step 601, the processing module 500 of the system 51 encodes a plurality of pictures of the RAW video sequence in a portion of a bitstream using, for example, the method of fig. 3. In an embodiment, a subset of the plurality of pictures is downsampled (respectively, upsampled) prior to the encoding of step 301.
In step 602, the processing module 500 of the system 51 encodes at least one resampling SEI message and/or at least one resampling method SEI message (i.e., metadata representing a filter) in the bitstream. As described above, the resampling SEI message (or the resampling method SEI message of table TAB13) comprises at least one syntax element (i.e., resampling_cancel_flag, resampling_persistence_flag) specifying the subset of the plurality of pictures to which the filter specified by the SEI message is to be applied. For example, the processing module 500 of the system 51 encodes in the bitstream one resampling SEI message (or resampling method SEI message of table TAB13) for each picture that needs to be resampled on the decoder side, each such SEI message comprising a resampling_persistence_flag equal to 0.
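A toy sketch of this encoder-side flow (steps 601 and 602) is given below; actual picture encoding is stubbed out and all names are hypothetical. One resampling SEI message with resampling_persistence_flag equal to 0 is written in the access unit of each picture to be resampled.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ResamplingSEI:
    resampling_persistence_flag: int = 0  # 0: applies to the current picture only

@dataclass
class Bitstream:
    units: List[object] = field(default_factory=list)

def encode_sequence(pictures, downsampled_flags) -> Bitstream:
    bs = Bitstream()
    for pic, was_downsampled in zip(pictures, downsampled_flags):
        if was_downsampled:                                    # step 602
            bs.units.append(ResamplingSEI(resampling_persistence_flag=0))
        bs.units.append(("coded_picture", pic))                # step 601 (stub)
    return bs

bs = encode_sequence(["p0", "p1", "p2"], [False, True, False])
print(len(bs.units))  # 4: three coded pictures plus one resampling SEI for p1
```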
Fig. 7 schematically illustrates an example of a method for reconstructing a picture, the method comprising resampling the picture in response to a resampling SEI message and/or a resampling method SEI message.
The method of fig. 7 is implemented, for example, by the system 53 and more precisely by the processing module 500 of the system 53.
In step 701, the processing module 500 of the system 53 decodes a current picture of a plurality of pictures representing a video sequence from a portion of a bitstream. For example, the current picture was downsampled (respectively, upsampled) before being encoded.
In step 702, the processing module 500 of the system 53 obtains the parameters of a filter determined from at least one resampling SEI message and/or at least one resampling method SEI message embedded in the bitstream. As already mentioned, the resampling SEI message (or the resampling method SEI message of table TAB13) comprises at least one syntax element (i.e., resampling_cancel_flag, resampling_persistence_flag) specifying the subset of the plurality of pictures to which the filter specified by the SEI message is to be applied. For example, the bitstream includes a resampling SEI message (or a resampling method SEI message of table TAB13) associated with the current picture (i.e., comprising a resampling_persistence_flag equal to 0). The filter is, for example, an upsampling (respectively, downsampling) filter allowing the current picture to be resampled to its original resolution.
In step 703, the processing module 500 of the system 53 applies the filter to the decoded current picture in response to the resampling SEI message (and/or the resampling method SEI message).
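A companion toy sketch of this decoder-side flow (steps 701 to 703), reusing the ResamplingSEI class and the bitstream bs from the sketch after fig. 6; the decoding and resampling steps are string stubs.

```python
def decode_sequence(bs):
    out, pending_sei = [], None
    for unit in bs.units:
        if isinstance(unit, ResamplingSEI):
            pending_sei = unit                       # step 702: obtain filter info
        else:
            _, pic = unit                            # step 701: decode picture (stub)
            if pending_sei is not None:
                pic = f"resampled({pic})"            # step 703: apply the filter (stub)
                if pending_sei.resampling_persistence_flag == 0:
                    pending_sei = None               # applied to this picture only
            out.append(pic)
    return out

print(decode_sequence(bs))  # ['p0', 'resampled(p1)', 'p2']
```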
Various embodiments are described above. The features of these embodiments may be provided separately or in any combination. Further, embodiments may include one or more of the following features, devices, or aspects, alone or in any combination, across the various claim categories and types:
● A bitstream or signal comprising one or more of the described syntax elements or variants thereof.
● A bitstream or signal comprising one or more of the described syntax elements or variants thereof is created and/or transmitted and/or received and/or decoded.
● A television, set-top box, mobile phone, tablet computer or other electronic device that performs at least one of the described embodiments.
● A television, set-top box, mobile phone, tablet or other electronic device that performs at least one of the described embodiments and displays the resulting picture (e.g., using a monitor, screen or other type of display).
● A television, set-top box, mobile phone, tablet computer, or other electronic device that tunes (e.g., using a tuner) a channel to receive a signal comprising an encoded video stream and performs at least one of the described embodiments.
● A television, set-top box, mobile phone, tablet computer, or other electronic device that receives signals over the air (e.g., using an antenna) including encoded video streams and performs at least one of the described embodiments.
● A server, camera, mobile phone, tablet, or other electronic device that transmits signals including encoded video streams over the air (e.g., using an antenna) and performs at least one of the described embodiments.
● A server, camera, mobile phone, tablet, or other electronic device that tunes (e.g., using a tuner) a channel to transmit a signal comprising an encoded video stream and performs at least one of the described embodiments.

Claims (31)

1. A method, the method comprising:
decoding (701) a picture of a plurality of pictures representing a video sequence from video data;
obtaining (702) parameters of a filter determined from metadata associated with the video data, the metadata comprising at least one first information specifying a subset of the plurality of pictures on which the filter is to be applied; and
applying (703) the filter to the decoded picture in response to the metadata.
2. The method of claim 1, wherein the filter is a resampling filter.
3. The method of claim 1 or 2, wherein the filter is a separable filter and the metadata specifies parameters of a horizontal filter and parameters of a vertical filter.
4. A method according to claim 1, 2 or 3, wherein the filter is intended to be applied to a luminance component and a chrominance component of each picture in the subset of pictures, and the metadata specifies parameters of the filter adapted to filter the luminance component and parameters of the filter adapted to filter the chrominance component that are different from parameters of the filter adapted to filter the luminance component.
5. The method of any preceding claim, wherein the at least one first information specifies that the filter is only applied to pictures having a resolution different from a maximum resolution.
6. A method according to any preceding claim, wherein the metadata comprises second information specifying a filtering method of a plurality of filtering methods.
7. The method of claim 6, wherein the plurality of filtering methods includes luma filtering, chroma filtering, bilinear filtering, directional cubic convolution interpolation, iterative curvature-based interpolation, edge-guided image interpolation, and deep-learning-based filtering methods.
8. A method, the method comprising:
encoding a plurality of pictures representing a video sequence in video data; and
encoding, in the video data, metadata representing a filter, the metadata comprising at least one first information specifying a subset of the plurality of pictures on which the filter is to be applied.
9. The method of claim 8, wherein the filter is a resampling filter.
10. The method of claim 8 or 9, wherein the filter is a separable filter and the metadata specifies parameters of a horizontal filter and parameters of a vertical filter.
11. The method of claim 8, 9 or 10, wherein the filter is intended to be applied to a luminance component and a chrominance component of each picture in the subset of pictures, and the metadata specifies parameters of the filter adapted to filter the luminance component and parameters of the filter adapted to filter the chrominance component that are different from parameters of the filter adapted to filter the luminance component.
12. The method of any one of claims 8 to 11, wherein the at least one first information specifies that the filter is only applied to pictures having a resolution different from a maximum resolution.
13. The method of any one of claims 8 to 12, wherein the metadata comprises second information specifying a filtering method of a plurality of filtering methods.
14. The method of claim 13, wherein the plurality of filtering methods includes luma filtering, chroma filtering, bilinear filtering, directional cubic convolution interpolation, iterative curvature-based interpolation, edge-guided image interpolation, and deep-learning-based filtering methods.
15. An apparatus comprising electronic circuitry, the apparatus being adapted to:
decoding a picture of a plurality of pictures representing a video sequence from video data;
obtaining parameters of a filter determined from metadata associated with the video data, the metadata comprising at least one first information specifying a subset of the plurality of pictures on which the filter is to be applied; and
applying the filter to the decoded picture in response to the metadata.
16. The apparatus of claim 15, wherein the filter is a resampling filter.
17. The apparatus of claim 15 or 16, wherein the filter is a separable filter and the metadata specifies parameters of a horizontal filter and parameters of a vertical filter.
18. The apparatus of claim 15, 16 or 17, wherein the filter is intended to be applied to a luminance component and a chrominance component of each picture in the subset of pictures, and the metadata specifies parameters of the filter adapted to filter the luminance component and parameters of the filter adapted to filter the chrominance component that are different from parameters of the filter adapted to filter the luminance component.
19. The apparatus of any one of claims 15 to 18, wherein the at least one first information specifies that the filter is only applied to pictures having a resolution different from a maximum resolution.
20. The apparatus of any one of claims 15 to 19, wherein the metadata comprises second information specifying a filtering method of a plurality of filtering methods.
21. The apparatus of claim 20, wherein the plurality of filtering methods includes luma filtering, chroma filtering, bilinear filtering, directional cubic convolution interpolation, iterative curvature-based interpolation, edge-guided image interpolation, and deep-learning-based filtering methods.
22. An apparatus comprising electronic circuitry, the apparatus being adapted to:
encoding a plurality of pictures representing a video sequence in video data; and
encoding, in the video data, metadata representing a filter, the metadata comprising at least one first information specifying a subset of the plurality of pictures on which the filter is to be applied.
23. The apparatus of claim 22, wherein the filter is a resampling filter.
24. The apparatus of claim 22 or 23, wherein the filter is a separable filter and the metadata specifies parameters of a horizontal filter and parameters of a vertical filter.
25. The apparatus of claim 22, 23 or 24, wherein the filter is intended to be applied to a luminance component and a chrominance component of each picture in the subset of pictures, and the metadata specifies parameters of the filter adapted to filter the luminance component and parameters of the filter adapted to filter the chrominance component that are different from parameters of the filter adapted to filter the luminance component.
26. The apparatus of any one of claims 22 to 25, wherein the at least one first information specifies that the filter is only applied to pictures having a resolution different from a maximum resolution.
27. The apparatus of any one of claims 22 to 26, wherein the metadata comprises second information specifying a filtering method of a plurality of filtering methods.
28. The apparatus of claim 27, wherein the plurality of filtering methods includes luma filtering, chroma filtering, bilinear filtering, directional cubic convolution interpolation, iterative curvature-based interpolation, edge-guided image interpolation, and deep-learning-based filtering methods.
29. A signal comprising metadata representing a filter and associated with a plurality of pictures representing a video sequence, the metadata comprising at least one information specifying a subset of the plurality of pictures on which the filter is to be applied.
30. A computer program comprising program code instructions for implementing the method of any one of claims 1 to 14.
31. A non-transitory information storage medium storing program code instructions for implementing the method of any one of claims 1 to 14.
CN202280048686.4A 2021-06-11 2022-05-23 High level syntax for picture resampling Pending CN117616752A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP21305804 2021-06-11
EP21305804.3 2021-06-11
PCT/EP2022/063898 WO2022258356A1 (en) 2021-06-11 2022-05-23 High-level syntax for picture resampling

Publications (1)

Publication Number Publication Date
CN117616752A true CN117616752A (en) 2024-02-27

Family

ID=76708166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280048686.4A Pending CN117616752A (en) 2021-06-11 2022-05-23 High level syntax for picture resampling

Country Status (5)

Country Link
EP (1) EP4352959A1 (en)
KR (1) KR20240018650A (en)
CN (1) CN117616752A (en)
BR (1) BR112023025800A2 (en)
WO (1) WO2022258356A1 (en)

Also Published As

Publication number Publication date
KR20240018650A (en) 2024-02-13
BR112023025800A2 (en) 2024-02-27
WO2022258356A1 (en) 2022-12-15
EP4352959A1 (en) 2024-04-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination