CN105338361B - Method, apparatus and storage medium for guided filtering in video coding and decoding - Google Patents
Method, apparatus and storage medium for guided filtering in video coding and decoding
- Publication number
- CN105338361B CN105338361B CN201510809673.6A CN201510809673A CN105338361B CN 105338361 B CN105338361 B CN 105338361B CN 201510809673 A CN201510809673 A CN 201510809673A CN 105338361 B CN105338361 B CN 105338361B
- Authority
- CN
- China
- Prior art keywords
- image
- color component
- filter
- cti
- filter coefficients
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims description 69
- 238000001914 filtration Methods 0.000 title claims description 24
- 238000012545 processing Methods 0.000 claims description 8
- 230000006872 improvement Effects 0.000 claims description 7
- 230000001052 transient effect Effects 0.000 claims description 5
- 230000000007 visual effect Effects 0.000 claims description 4
- 238000005070 sampling Methods 0.000 abstract description 7
- 230000008569 process Effects 0.000 description 27
- 239000010410 layer Substances 0.000 description 13
- 239000011159 matrix material Substances 0.000 description 10
- 238000013459 approach Methods 0.000 description 8
- 238000005457 optimization Methods 0.000 description 7
- 230000002123 temporal effect Effects 0.000 description 5
- 230000009466 transformation Effects 0.000 description 4
- 238000013500 data storage Methods 0.000 description 3
- 238000013507 mapping Methods 0.000 description 3
- 238000012805 post-processing Methods 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 2
- 239000003086 colorant Substances 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 239000002356 single layer Substances 0.000 description 2
- 238000003491 array Methods 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011946 reduction process Methods 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000000844 transformation Methods 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/59—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/86—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/89—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/33—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Processing (AREA)
Abstract
The invention discloses guided image up-sampling in video coding. An encoder receives a first image at a first spatial resolution and a second image at a second spatial resolution, where both images represent the same scene and the second spatial resolution is higher than the first. A filter is selected to up-sample the first image into a third image with the same spatial resolution as the second. The filter coefficients of the up-sampling filter are computed by minimizing an error metric (e.g., MSE) between the pixel values of the second image and those of the third image. The set of computed filter coefficients is signaled to a receiver (e.g., as metadata). A decoder receives the first image (or an approximation of it) and the metadata, and may up-sample the first image using the same filter and the optimized coefficients derived by the encoder.
Description
The present application is a divisional application based on the patent application with application number 201210281757.3, application date 2012-8-9, entitled "guided image upsampling in video coding".
Cross Reference to Related Applications
The present invention claims priority from U.S. provisional patent application No. 61/521,685, filed 9 August 2011, and U.S. provisional patent application No. 61/653,234, filed 30 May 2012, both of which are incorporated herein by reference for all purposes.
Technical Field
The present invention generally relates to images. More particularly, embodiments of the present invention relate to guided up-sampling of digital images in video coding.
Background
As used herein, the term "dynamic range" (DR) may relate to the capability of the human psychovisual system (HVS) to perceive a range of intensities (e.g., luminance, brightness) in an image, e.g., from the darkest darks to the brightest brights. In this sense, DR relates to "scene-referred" intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to "display-referred" intensity. Unless a particular sense is explicitly specified at any point in the description herein, it should be inferred that the term may be used in either sense, e.g., interchangeably.
As used herein, the term "High Dynamic Range" (HDR) relates to a DR breadth that spans some 14-15 orders of magnitude of the Human Visual System (HVS). For example, well-adapted humans with essentially normal vision (e.g., in one or more of a statistical, biometric, or ophthalmological sense) have an intensity range that spans about 15 orders of magnitude. Adapted humans may perceive dim light sources of as few as a mere handful of photons. Yet these same humans may perceive the near painfully brilliant intensity of the noonday sun in the desert, sea, or snow (or even glance at the sun, albeit briefly to prevent injury). This span, however, is available to "adapted" humans, e.g., those whose HVS has had a period of time in which to reset and adjust.
In contrast, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated in relation to HDR. As used herein, the terms "visual dynamic range" or "variable dynamic range" (VDR) may individually or interchangeably relate to the DR that is simultaneously perceivable by the HVS. As used herein, VDR may relate to a DR that spans 5-6 orders of magnitude. Thus, while perhaps somewhat narrower in relation to true scene-referred HDR, VDR nonetheless represents a wide DR breadth. As used herein, the term "simultaneous dynamic range" may relate to VDR.
Until fairly recently, displays have had a significantly narrower DR than HDR or VDR. Television (TV) and computer monitor apparatus that use typical cathode ray tube (CRT), liquid crystal display (LCD) with constant fluorescent white backlighting, or plasma screen technology may be constrained in their DR rendering capability to approximately three orders of magnitude. Such conventional displays thus typify a Low Dynamic Range (LDR), also referred to as a Standard Dynamic Range (SDR), in relation to VDR and HDR.
As with scalable video coding and HDTV technologies, extending image DR typically involves a bifurcated approach. For example, scene-referred HDR content captured with a modern HDR-capable camera may be used to generate a VDR version or an SDR version of the content, which may be displayed on VDR displays or legacy SDR displays. In one approach, generating the SDR version from the captured VDR version may involve applying a global Tone Mapping Operator (TMO) to intensity-related (e.g., luminance, luma) pixel values in the HDR content. In a second approach, generating an SDR image may involve applying an invertible operator (or predictor) to the VDR data, as described in PCT application PCT/US2011/048861, "Extending Image Dynamic Range," filed 23 August 2011 by W. Gish et al. To conserve bandwidth or for other considerations, transmitting both the actual captured VDR content and the SDR content simultaneously may not be the best approach.
Thus, an Inverse Tone Mapping Operator (iTMO), inverted in relation to the original TMO, or an inverse operator in relation to the original predictor, may be applied to the generated SDR content version, which allows a version of the original VDR content to be predicted. The predicted VDR content version may be compared to the originally captured VDR content. For example, subtracting the predicted VDR version from the original VDR version may generate a residual image. An encoder may send the generated SDR content as a Base Layer (BL), any residual image as an Enhancement Layer (EL), and package the iTMO or other predictors as metadata.
Sending the EL and metadata, with their SDR content, residual, and predictors, in a bitstream typically consumes less bandwidth than would be consumed by sending both the HDR and SDR content directly into the bitstream. A compatible decoder that receives the bitstream sent by the encoder may decode and render the SDR on a legacy display. A compatible decoder, however, may also use the residual image, the iTMO predictors, or the metadata to compute therefrom a predicted version of the HDR content for use on a more capable display.
In such layered VDR coding, images may be represented at different spatial resolutions, bit depths, color spaces, and chroma sub-sampling formats, all of which may force a variety of computationally intensive transformations from a first color format to a second color format.
As used herein, the term "color format" relates to a color representation that comprises two variables: a) a color space variable (e.g., RGB, YUV, YCbCr, and the like) and b) a chroma subsampling variable (e.g., 4:4:4, 4:2:0, and the like). For example, a VDR signal may have an RGB 4:4:4 color format, while an SDR signal may have a YCbCr 4:2:0 color format.
As used herein, the term "upsampling" or "upscaling" relates to the process of transforming one or more color components of an image from one spatial resolution to a second, higher spatial resolution. For example, a video signal may be up-sampled from a 4:2:0 format to a 4:4:4 format.
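For illustration, a conventional "blind" 4:2:0-to-4:4:4 chroma up-sampling (the baseline that the guided approach described later improves upon) can be sketched as follows. This is a minimal sketch, not part of the patented method; the function name and the nearest-neighbor kernel are assumptions for illustration.

```python
import numpy as np

def blind_chroma_upsample(cb: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x up-sampling of a 4:2:0 chroma plane.

    Each input chroma sample is simply replicated into a 2x2 block;
    only information within the plane itself is used ("blind").
    """
    return np.repeat(np.repeat(cb, 2, axis=0), 2, axis=1)

cb_420 = np.array([[100, 110],
                   [120, 130]], dtype=np.int32)
cb_444 = blind_chroma_upsample(cb_420)
print(cb_444.shape)  # prints (4, 4)
```

A real codec would typically use a longer interpolation filter than replication, but the key point is that the taps are fixed and image-independent.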
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Accordingly, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, problems identified with respect to one or more methods should not be assumed to have been recognized in any prior art based on this section.
Drawings
Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
FIG. 1 depicts an exemplary data stream for a layered coding system, according to an embodiment of the invention;
FIG. 2 depicts an exemplary layered decoding system, according to an embodiment of the invention;
fig. 3 depicts an example of guided image upsampling in encoding a residual signal in a layered codec according to an embodiment of the present invention;
FIG. 4 depicts an exemplary single-layer video coding system according to an embodiment of the invention;
FIG. 5 depicts an example input and output pixel array for upsampling by a factor of 2 using a 2D filter in accordance with an embodiment of the present invention;
FIG. 6 depicts an example input and output pixel array for upsampling by a factor of 2 using a 3D filter in accordance with an embodiment of the present invention;
FIG. 7 depicts an example process of guided image upsampling according to an embodiment of the invention;
FIG. 8 depicts an example process for guided color temporal improvement filtering, according to an embodiment of the invention.
Detailed Description
Guided image upsampling and color transient improvement filtering in video coding are described herein. In the following description, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in detail, to avoid unnecessarily obscuring or obfuscating the present invention.
SUMMARY
Example embodiments described herein relate to guided image upsampling and CTI filtering in video coding. An encoder receives a first image at a first spatial resolution and a guide image at a second spatial resolution, wherein the first image and the guide image both represent the same scene, the second spatial resolution being higher than the first spatial resolution. A filter is selected to upsample the first image into a third image having the same spatial resolution as the second spatial resolution. The filter coefficients for the upsampling filter are computed by minimizing an error metric, such as Mean Square Error (MSE), between the pixel values of the guide image and the third image. The set of computed filter coefficients is signaled to a receiver (e.g., as metadata). A decoder receives the metadata and the first image, or an approximation of the first image, and may upsample the received image using the same optimized filters and filter coefficients as derived by the encoder.
In another embodiment, an encoder receives a target image to be encoded, the target image comprising a first target color component image and a second target color component image. The image is encoded and decoded to generate an encoded image and a decoded image, the decoded image comprising a decoded first color component image and a decoded second color component image. A Color Transient Improvement (CTI) filter is selected to filter pixels of the decoded second color component image to generate an output color component image. CTI filter coefficients for the CTI filter are computed based at least in part on minimizing an error metric between pixel values of the output color component image and corresponding pixel values of the second color component image in the target image. The CTI filter coefficients are signaled to a receiver (e.g., as metadata). A decoder receives the metadata and the encoded image. After decoding the encoded image, it may filter the decoded image using the same CTI filter coefficients as those derived by the encoder.
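The encoder-side coefficient estimation described above can be sketched as a least-squares fit. This is a simplified, hypothetical illustration: it fits a single k x k filter on one decoded color component against the corresponding target component, ignoring any cross-component or luma-guided terms an actual CTI implementation might add; the function name and patch-based formulation are assumptions.

```python
import numpy as np

def fit_cti_coeffs(decoded_c: np.ndarray, target_c: np.ndarray, k: int = 3):
    """Least-squares fit of a k x k filter so that filtering the decoded
    color plane best matches the target color plane (MSE sense).

    Returns the k x k coefficient array that the encoder would signal
    to the decoder as metadata.
    """
    r = k // 2
    rows, cols = decoded_c.shape
    patches, targets = [], []
    # Collect one observation per interior pixel: the k x k decoded
    # neighborhood and the corresponding target pixel value.
    for y in range(r, rows - r):
        for x in range(r, cols - r):
            patches.append(decoded_c[y - r:y + r + 1, x - r:x + r + 1].ravel())
            targets.append(target_c[y, x])
    C = np.asarray(patches, dtype=np.float64)   # p x k^2 observation matrix
    t = np.asarray(targets, dtype=np.float64)   # p target values
    m, *_ = np.linalg.lstsq(C, t, rcond=None)   # minimizes ||C m - t||^2
    return m.reshape(k, k)
```

The decoder would then apply the signaled coefficients as an ordinary FIR filter over the decoded plane, so encoder and decoder stay in sync without the decoder ever seeing the target image.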
Example system for guided image upsampling
Image downsampling and upsampling transformations play a key role in video coding, since they affect not only coding efficiency but also overall image quality. Improper spatial scaling transformations may lead to erroneous colors, especially along image edges. In conventional "blind" image upsampling techniques, given an input image with sub-sampled color components (e.g., chroma in the YCbCr 4:2:0 format), the color components are upsampled using only the information available within the image itself. In contrast, embodiments of the present invention may also utilize information from other images in the video processing pipeline.
FIG. 1 depicts an example image processing system 100 implementing guided image upsampling according to an embodiment of the present invention. System 100 represents an embodiment of a layered encoder in which two layers (a base layer 135 and an enhancement or residual layer 175) are used to encode the input signal V105.
In an embodiment, the input signal V 105 may be a VDR signal represented by 16 or more bits per color component in a 4:4:4 color format (e.g., RGB 4:4:4). The dynamic range reduction process 110 may process the VDR signal to generate signal S' 112. Signal S' may have the same spatial resolution as signal V, or a lower one. Signal S' may be represented at a lower bit-depth resolution than V, e.g., 12 bits per color component. Signal S' may be in the same color format as V, or, in other embodiments, in a different color format.
In an embodiment, color transformation 120 may precede encoding 130, whereby S' may be transformed into another color format (e.g., YCbCr). Transform 120 may also sub-sample one or more color components (e.g., from 4:4:4 to 4:2:0). The encoded signal 135 may be transmitted as a base layer. In an embodiment, encoding 130 may be implemented by any existing video encoder, such as an MPEG-2 or MPEG-4 video encoder as specified by the Moving Picture Experts Group (MPEG) specifications.
Enhancement layer 175 may be generated by decoding signal 135, generating a predicted value 165 of the original VDR signal V, and subtracting the predicted value (165) from the original V (105) to generate residual signal 175. In an embodiment, predictor 160 may be implemented using a multivariate multiple-regression model as described in PCT application PCT/US2012/033605, filed 13 April 2012 by G-M. Su et al. Because the encoded signal 135 and the input signal 105 have different color formats and resolutions, the color transform process 150 transforms the output of decoder 140 into a color format and resolution that match those of input V 105. For example, unit 150 may convert input data 143 from YCbCr 4:2:0 to RGB 4:4:4.
In conventional upsampling methods, the upsampled output 153 may be derived based only on the sub-sampled input 143 and an interpolation filter with fixed filter coefficients. In contrast, in an embodiment, the upsampling process in 150 may perform upsampling using both data from the sub-sampled input 143 and data with known full spatial resolution (e.g., input S' 112 or V 105). In an embodiment, the guided image upsampling process 150 may signal the upsampling-related parameters (e.g., interpolation filter coefficients) to the rest of the system pipeline (e.g., as metadata signals 155 or 167).
As used herein, the term "metadata" may relate to any ancillary information that is transmitted as part of the encoded bitstream and assists the decoder in rendering the decoded images. The metadata may include, but is not limited to, information such as: color space or gamut information, dynamic range information, tone mapping information, or other predictor, upscaling, and quantizer operators, such as those described herein.
In an embodiment, the filter coefficients used to convert the YCbCr 4:2:0 data to YCbCr 4:4:4 data are estimated by minimizing an error metric (e.g., the Mean Square Error (MSE)) between the predicted upsampled values (e.g., output 153 of color transform 150) and the full-spatial-resolution input guide image (e.g., S' 112 or V 105). The same filter may then also be applied during the upsampling process in a decoder.
Fig. 2 depicts an example implementation of a layered video decoder according to an embodiment. The decoding system 200 receives an encoded bitstream 202 that comprises a base layer 227, an enhancement layer (or residual) 222, and metadata 225, which are extracted after demultiplexing 220. For example, in a VDR-SDR system, the base layer 227 may represent an SDR representation of the encoded signal, and the metadata 225 may include information about the prediction (160) and color transform operations (150) used in the encoder. The encoded base layer 227 may be decoded using the base layer decoder 210 to output a decoded SDR signal S 212. The encoded residual 222 may be decoded (240), dequantized (250), and added to the output 295 of the predictor 290 to generate the output VDR signal V 270.
The color conversion unit 230 may incorporate upsampling (e.g., from 4:2:0 to 4:4:4). Instead of using a "blind" upsampling technique that relies only on the input data 212, the upsampling processor in 230 may extract and apply the upsampling-related parameters (e.g., interpolation filter coefficients) signaled by the encoder via metadata 225 (or 155). Such guided upsampling during decoding may yield a video signal with improved visual quality at no additional computational cost.
The guided image upsampling techniques may also be applied to other processing steps in a video coding pipeline. Fig. 3 depicts a system for encoding and decoding the residual layer in layered coding, implemented in accordance with an embodiment of the invention. The residual signal R 305 (175) may be in an RGB 4:4:4 color format. The color transform unit 310 precedes residual coding 330 (e.g., using an MPEG-4 video encoder), which typically operates in the YCbCr 4:2:0 format; therein, the input 305 may be color-transformed and down-sampled to YCbCr 4:2:0. In parallel, the upsampling processor 320 may be guided by the input R 305 to compute optimized upsampling filter coefficients according to embodiments of the present invention. These filter coefficients may be signaled to the decoder, for example, using metadata 322. At the receiver, the color transform unit 350 may extract the optimized upsampling filter coefficients from metadata 322 and apply them when upsampling the decoded YCbCr 4:2:0 data 342 into RGB 4:4:4 data 355.
FIG. 4 depicts an example single-layer video coding system according to an embodiment of this disclosure. As depicted in FIG. 4, processor 410 may down-sample the input signal VI 405. The down-sampled output 412 is encoded by video encoder 430 (e.g., an MPEG-4 encoder) and sent to a decoder (400-D) comprising video decoder 440 (e.g., an MPEG-4 decoder) and upsampling processor 450. At the encoder (400-E), upsampling unit 420 may perform guided upsampling according to methods described in embodiments of the present invention, and may use data from the full-resolution input VI 405 and the sub-sampled signal 412 to derive optimized upsampling filter coefficients. The optimized filter coefficients may be signaled to the decoder (400-D), for example, using metadata 422. Thus, the output 442 of the video decoder (440) may be upsampled in processing unit 450 using the same optimized set of coefficients as generated by upsampling unit 420.
Filter design for guided image upsampling
General 2D non-separable filter
For simplicity and without loss of generality, given an input image comprising multiple color components (e.g., YCbCr or RGB), we consider a guided image upsampling process of a single color component (e.g., Cb or R). The method described herein may be repeated as desired for any image color component that needs to be upsampled.
In an example embodiment of upsampling by a factor of two using a 2D interpolation or upsampling filter, FIG. 5 depicts a 3 x 3 array of known input pixels, denoted by circles 520, that are upsampled by a factor of two to generate a 6 x 6 array of pixels, denoted by squares 510. The input pixels are denoted c_j and the predicted or upsampled pixels are denoted \hat{d}_j^{(i)}. For each input pixel (e.g., 520-11), four new pixels (e.g., 510-22, 510-23, 510-32, and 510-33) are estimated. In an embodiment, the upsampling process may be expressed as a non-separable Finite Impulse Response (FIR) filter:

\hat{d}_j^{(i)} = \sum_{x=0}^{2} \sum_{y=0}^{2} m^{(i)}_{xy} c_{j,xy},  for i = 0, 1, 2, 3,   (1)

where c_{j,xy} denotes the 3 x 3 neighborhood of input pixels centered at position j, and m^{(i)}_{xy} (for x and y from 0 to 2) denote the filter coefficients of the i-th filter. For example, in FIG. 5, \hat{d}_j^{(0)} is shown as square 510-22.
Now consider a guide image D with the same target resolution as the output of the upsampling process, with pixel data denoted d_j^{(i)}. For example, as described in FIG. 4, the guide image may be the input image VI 405. Deriving the filter coefficients m^{(i)}_{xy} can be expressed as an error-minimization problem. In an embodiment, the filter coefficients are generated so that the Mean Square Error (MSE) between the guide pixel data and the predicted upsampled pixel data is minimized. This can be formulated as

min \sum_{j=0}^{p-1} \sum_{i=0}^{3} ( d_j^{(i)} - \hat{d}_j^{(i)} )^2,   (2)

where p denotes the number of input pixel positions under consideration; for example, if the output upsampled image has a resolution of m rows and n columns, then for each color component a total of 4p = mn output pixels are estimated.
Equation (2) can be solved using various known numerical techniques (e.g., the techniques described in "Applied multivariable statistical analysis," r.a. johnson, and d.w. wicher n, 5th Edition, precision Hall, 2001). In an example embodiment, the filter coefficients are represented as
For a 3 × 3 filter, a vector of input pixel data (520) is given
A p x 9 matrix C may be formed from the input sub-sampled data as
Similarly, the p × 4 guide data matrix R may be formed as
Wherein
The estimated (upsampled) pixel may then be represented as
Wherein
From equations (1-9), the estimated pixel can be expressed as
In an embodiment, the optimization goal is to minimize the estimation error between the guide input data and the estimated upsampled output data, which from equation (2) can be expressed as

$$\min_{M} \lVert R - CM \rVert^2. \qquad (11)$$
A solution that is optimal in the least-squares sense can be obtained via

$$M = (C^T C)^{-1} C^T R. \qquad (12)$$
From equation (3), for each color component to be upsampled (e.g., Cb and Cr), this embodiment computes 4 × 9 = 36 coefficients, which may be communicated to the decoder using metadata or other means.
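The least-squares derivation of equations (2)-(12) can be sketched numerically. The snippet below is an illustrative reconstruction, not the patent's implementation: it builds a synthetic p × 9 observation matrix C and a p × 4 guide matrix R, then solves for the 9 × 4 coefficient matrix M (the 36 coefficients per color component mentioned above). All variable and function names are assumptions.

```python
# Least-squares estimation of guided-upsampling filter coefficients,
# per M = (C^T C)^{-1} C^T R (eq. 12). Synthetic data stands in for the
# real 3x3 input neighborhoods and guide-image samples.
import numpy as np

def solve_filter_coefficients(C, R):
    """Solve the least-squares problem min ||R - C M||^2 (eq. 11)."""
    # lstsq is numerically preferable to explicitly inverting C^T C.
    M, *_ = np.linalg.lstsq(C, R, rcond=None)
    return M  # 9 x 4: one 9-tap filter per output phase i = 0..3

rng = np.random.default_rng(0)
C = rng.standard_normal((1000, 9))    # p x 9 observed neighborhoods (eq. 5)
M_true = rng.standard_normal((9, 4))  # ground-truth coefficients (eq. 3)
R = C @ M_true                        # p x 4 guide-image samples (eq. 6)

M = solve_filter_coefficients(C, R)
print(M.shape, np.allclose(M, M_true))  # noise-free data allows exact recovery
```

With real images R would come from the full-resolution guide picture rather than from a known M, and the recovered coefficients would only minimize, not eliminate, the prediction error.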
With respect to the embodiment of FIG. 1, the upsampling in color conversion unit 150 may use, as the guide image R, pixels from the input image S' 112 or the input image V 105.
As depicted in FIG. 5 and represented by equation (1), an embodiment of 1:2 pixel upsampling filtering utilizes a 3 × 3 2D filter. The methods described herein can readily be extended to support other filter sizes (e.g., 2 × 2, 5 × 5, 7 × 7, etc.) and other upsampling ratios (e.g., 1:3, 1:4, etc.). The methods can also be applied to simpler 1D filters. In an embodiment, a 3-tap 1D filter may be defined in equations (1) and (3) by setting the coefficients $m_{xy}^{(i)}$ to zero for all but a single value of x and solving only for the remaining coefficients (for y = 0, 1, and 2).
Symmetric 2D non-separable filter
In equation (1), each upsampled image pixel is predicted using its own set of filter coefficients. To reduce the number of coefficients that need to be signaled to the decoder, the total number of filter coefficients can be reduced by sharing them among the upsampled pixels. In an example embodiment, the same filter coefficients may be used to estimate the odd and even lines of the upsampled image data.
Returning to FIG. 5, in an embodiment the upsampling process may be formulated as follows. For each input sample $c_j$, two samples $\hat{c}_j^{(0)}$ and $\hat{c}_j^{(1)}$ are estimated in the odd rows (510-22 and 510-23), and two samples $\hat{c}_j^{(2)}$ and $\hat{c}_j^{(3)}$ are estimated in the even rows (510-32 and 510-33). The upsampling process can be represented using two 2D FIR filters that share their filter coefficients across the upsampled pixels:
$$\hat{c}_j^{(i)} = \sum_{y=0}^{2}\sum_{x=0}^{2} m_{xy}^{(i)}\, c_{xy}^{(j)}, \quad \text{for } i = 0, 1, \qquad (13)$$

and

$$\hat{c}_j^{(i)} = \sum_{y=0}^{2}\sum_{x=0}^{2} m_{xy}^{(i-2)}\, \tilde{c}_{xy}^{(j)}, \quad \text{for } i = 2, 3, \qquad (14)$$

where $\tilde{c}_{xy}^{(j)}$ denotes the input neighborhood used for the even-row estimates.
Similar to the previous embodiment, let

$$M = \begin{bmatrix} M^{(0)} & M^{(1)} \end{bmatrix} \qquad (15)$$

be a 9 × 2 matrix representing the filter coefficients used in equations (13) and (14).
Let C denote the matrix of observed input pixel data, formed by stacking the odd-row neighborhood vectors $c_j^T$ and the even-row neighborhood vectors $\tilde{c}_j^T$ of equations (13) and (14), and let the input guide data be arranged in a corresponding matrix R whose rows collect the odd-row guide samples $[d_j^{(0)}\ d_j^{(1)}]$ and the even-row guide samples $[d_j^{(2)}\ d_j^{(3)}]$ (equations (16)-(25)). The estimated upsampled pixels may then be represented as

$$\hat{C} = CM, \qquad (26)$$

and, from equation (11) and equations (16)-(26), the solution M that is optimal in the least-squares sense can be obtained via
$$M = (C^T C)^{-1} C^T R. \qquad (27)$$
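The coefficient-sharing idea of equations (13)-(27) can be sketched as follows. This is an illustrative stacking under assumed conventions (the patent's exact ordering of C and R may differ): odd- and even-row output phases reuse one 9 × 2 coefficient matrix, halving the 36 coefficients of the general 2D filter to 18.

```python
# Symmetric 2D filter: one shared 9x2 coefficient matrix serves both
# row parities. Stack the odd-row and even-row systems and solve once.
import numpy as np

rng = np.random.default_rng(1)
M_shared = rng.standard_normal((9, 2))   # eq. (15): shared coefficients

C_odd = rng.standard_normal((1000, 9))   # neighborhoods for odd output rows
C_even = rng.standard_normal((1000, 9))  # neighborhoods for even output rows
R = np.vstack([C_odd @ M_shared,         # guide samples, odd rows
               C_even @ M_shared])       # guide samples, even rows
C = np.vstack([C_odd, C_even])           # stacked observation matrix

M, *_ = np.linalg.lstsq(C, R, rcond=None)  # eq. (27)
print(M.shape, np.allclose(M, M_shared))
```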
Normal 3D non-separable filter
In a particular embodiment, at least one color component (e.g., the Y (luminance) component in YCbCr or the green (G) component in RGB) may not undergo any downsampling, thus preserving significant edge-related information. The upsampling process of the remaining color components (e.g., Cb and Cr in YCbCr) can be further improved if the previously described 2D upsampling filter is extended to become a 3D filter as described herein.
FIG. 6 depicts example input and output pixel arrays for upsampling by a factor of 2 by applying a 3D filter, according to an embodiment of the present invention. As before, the input pixels of the color component to be upsampled (e.g., Cb) are denoted $c_j$, and the predicted or upsampled pixels are denoted $\hat{c}_j^{(i)}$. Pixels of the same guide image but from another color component (e.g., Y), available at full resolution, are denoted $y_j$ (630). In an embodiment, assuming without loss of generality that a 3 × 3 2D filter is applied to the input data ($c_j$) and a 4 × 2 filter is applied to the input guide data (e.g., the $y_j$ pixels) of the separate component to compute each upsampled pixel, four new estimated pixels can be generated for each input pixel using the 3D filter described below:
$$\hat{c}_j^{(i)} = \sum_{y=0}^{2}\sum_{x=0}^{2} m_{xy}^{(i)}\, c_{xy}^{(j)} + \sum_{y=0}^{3}\sum_{x=0}^{1} n_{xy}^{(i)}\, y_{xy}^{(j)}, \quad \text{for } i = 0, 1, 2, 3, \qquad (28)$$
where $m_{xy}^{(i)}$ and $n_{xy}^{(i)}$ represent the filter coefficients. These filter coefficients can be derived by solving an optimization problem as described previously:

$$\min_{M} \lVert R - CM \rVert^2, \qquad (29)$$
where R is the matrix of guide image data defined in equation (6), and

$$M = \begin{bmatrix} M_m \\ M_n \end{bmatrix}, \qquad (30)$$

where $M_m$ is a 9 × 4 matrix of the $m_{xy}^{(i)}$ coefficients (similar to the matrix represented in equation (3)),

$$M_m = \begin{bmatrix} M_m^{(0)} & M_m^{(1)} & M_m^{(2)} & M_m^{(3)} \end{bmatrix}, \qquad (31)$$

and $M_n$ is an 8 × 4 matrix of the $n_{xy}^{(i)}$ coefficients,

$$M_n = \begin{bmatrix} M_n^{(0)} & M_n^{(1)} & M_n^{(2)} & M_n^{(3)} \end{bmatrix}. \qquad (32)$$

The matrix C represents the observed pixel data. Given the neighborhood vectors

$$c_j = [c_{00}^{(j)}\ c_{01}^{(j)}\ \cdots\ c_{22}^{(j)}]^T \qquad (33)$$

and

$$y_j = [y_{00}^{(j)}\ y_{01}^{(j)}\ \cdots\ y_{31}^{(j)}]^T, \qquad (34)$$

then, defining

$$u_j = \begin{bmatrix} c_j \\ y_j \end{bmatrix}, \qquad (35)$$

the matrix C collects one row per estimated neighborhood:

$$C = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \end{bmatrix}. \qquad (36)$$
As before, an optimal solution in the mean-square sense can be defined as

$$M = (C^T C)^{-1} C^T R. \qquad (37)$$
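The joint system of equations (28)-(37) can be sketched as a single least-squares solve over a 17-column observation matrix: 9 chroma taps plus 8 full-resolution luma taps per upsampled pixel. The code below is an illustrative reconstruction on synthetic data; array names are assumptions.

```python
# Normal 3D filter: each upsampled chroma pixel combines a 3x3 chroma
# neighborhood (m coefficients) with a 4x2 full-resolution luma
# neighborhood (n coefficients), solved jointly per eq. (37).
import numpy as np

rng = np.random.default_rng(2)
p = 2000
Cc = rng.standard_normal((p, 9))   # chroma observations (3x3 kernel, eq. 33)
Cy = rng.standard_normal((p, 8))   # guide-luma observations (4x2 kernel, eq. 34)
C = np.hstack([Cc, Cy])            # p x 17 joint matrix (eq. 35-36)

Mm = rng.standard_normal((9, 4))   # m coefficients (eq. 31)
Mn = rng.standard_normal((8, 4))   # n coefficients (eq. 32)
R = Cc @ Mm + Cy @ Mn              # eq. (28) is linear in both neighborhoods

M, *_ = np.linalg.lstsq(C, R, rcond=None)  # eq. (37)
print(np.allclose(M[:9], Mm) and np.allclose(M[9:], Mn))
```

The 17 × 4 solution splits back into the 9 × 4 chroma part and the 8 × 4 luma part, matching the layered M of equation (30).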
Symmetric 3D non-separable filter
The previously described method can be easily extended to embodiments where 2D and 3D filters with different numbers of pixel coefficients and different upsampling ratios can be applied.
As described in the case of the 2D filter, in equation (28), if the even and odd lines of the upsampled image data are estimated using the same filter coefficients, the number of coefficients that need to be signaled to the decoder can be reduced. In an embodiment, the prediction formula may be expressed as
$$\hat{c}_j^{(i)} = \sum_{y=0}^{2}\sum_{x=0}^{2} m_{xy}^{(i)}\, c_{xy}^{(j)} + \sum_{y=0}^{3}\sum_{x=0}^{1} n_{xy}^{(i)}\, y_{xy}^{(j)}, \quad \text{for } i = 0 \text{ and } 1, \qquad (38)$$

$$\hat{c}_j^{(i)} = \sum_{y=0}^{2}\sum_{x=0}^{2} m_{xy}^{(i-2)}\, \tilde{c}_{xy}^{(j)} + \sum_{y=0}^{3}\sum_{x=0}^{1} n_{xy}^{(i-2)}\, \tilde{y}_{xy}^{(j)}, \quad \text{for } i = 2 \text{ and } 3, \qquad (39)$$

where the tilde denotes the neighborhoods used for the even-row estimates, as in equations (13) and (14).
Using equations (38) and (39) and the same methods applied previously, the optimal filter coefficients $m_{xy}^{(i)}$ and $n_{xy}^{(i)}$ can be derived.
example processing for guided upsampling
FIG. 7 illustrates an example process of guided image upsampling according to an example implementation of the present invention. Processing begins at step 710, where an upsampling processor (e.g., processor 150, 320, or 420) receives an input image (e.g., image 143, 312, or 412) to be upsampled and an input guide image (e.g., input 112, 305, or 405), where the guide image has a spatial resolution higher than the spatial resolution of the input image. Given these two inputs, in step 720, an upsampling filter (e.g., a normal 2D non-separable filter or a symmetric 3D non-separable filter) is determined. The upsampling filter may be fixed and known by both the encoder and the decoder, or as previously described, the upsampling process may select a filter among various upsampling filters including (but not necessarily limited to) a 1D upsampling filter, a normal 2D upsampling filter, a symmetric 2D upsampling filter, a normal 3D upsampling filter, or a symmetric 3D upsampling filter. The selection of the upsampling filter may be performed using various methods that may take into account a number of criteria, including available computational and memory resources, MSE prediction error using a particular filter, and target coding efficiency.
Given an upsampling filter model, the set of filter coefficients M may be derived according to a predefined optimization criterion. For example, under the MSE criterion, an optimized solution for M may be derived using the MSE optimization techniques described herein that minimize the MSE between the guided image samples and the predicted samples of the upsampled image.
After solving for the filter coefficients M, in some embodiments the coefficients and (optionally) the identity of the upsampling filter may be sent to the receiver (e.g., as metadata).
The upsampling process 700 may be repeated at various intervals deemed necessary to maintain coding efficiency while using available computing resources. For example, when encoding a video signal, process 700 may be repeated for each frame, group of frames, portion of a frame, or whenever the prediction residual between the guide picture and the upsampled picture exceeds a certain threshold.
Guided Color Transient Improvement (CTI) filtering
As used herein, the term "color transient artifact" denotes a color-related artifact in image or video processing. For example, in video and image coding, these artifacts can be identified by the presence of false colors in the decoded image (e.g., across the edges of an object). These artifacts may also be referred to as "color bleeding". Color transient artifacts may arise when each color plane of an image is processed separately, using different compression levels. For example, in an RGB image, the red and blue planes may be quantized differently than the green plane. Similarly, in a YCbCr image, the Cb and Cr planes may be processed differently than the Y plane.
One approach for reducing color transient artifacts is to apply post-processing filtering to the chrominance or secondary color components of the decoded image. Similar to the guided upsampling process described earlier, guided CTI filtering derives optimized filter coefficients in the encoder and sends them as metadata to the decoder. Furthermore, the filtering of a pixel in a color plane (e.g., Cb in YCbCr or B in RGB) may take into account both neighboring pixels in the same color plane and neighboring pixels of a corresponding pixel in another color plane (e.g., the luminance Y plane in YCbCr, or G in RGB).
In an embodiment, the optimized filter coefficients may be derived in the encoder based on the original uncompressed data. The filter coefficients may be estimated on a block, frame or scene basis according to available resources and bandwidth. In the decoder, the filter may be applied as a post-processing (out-of-loop) filter to improve the overall picture quality.
Example derivation of filter coefficients in CTI filters
The input pixels of the image color component to be filtered using a CTI filter (e.g., Cb, Cr, R, or B) are denoted $c_{ij}$ and the output filtered pixels are denoted $\hat{c}_{ij}$. Pixels of the same image but from a second color component (e.g., luma Y or G) are denoted $y_{ij}$. In an embodiment, assuming without loss of generality that each color pixel is filtered using a general 3D filter that applies a (2N+1) × (2N+1) kernel to the first color component and a (2M+1) × (2M+1) kernel to the second color component (e.g., 3 × 3 and 3 × 3 when N = M = 1), the filtered output may be expressed as:

$$\hat{c}_{ij} = \sum_{y=-N}^{N}\sum_{x=-N}^{N} m_{xy}\, c_{i+x,\,j+y} + \sum_{y=-M}^{M}\sum_{x=-M}^{M} n_{xy}\, y_{i+x,\,j+y}, \qquad (40)$$
where $m_{xy}$ and $n_{xy}$ represent the filter coefficients.
It can be appreciated that equation (40) is very similar to the general 3D upsampling filter described by equation (28); thus, the filter coefficients in equation (40) may be derived by solving an optimization problem as described previously:

$$\min_{m_{xy},\, n_{xy}} \sum_{i,j}\left(d_{ij} - \hat{c}_{ij}\right)^2, \qquad (41)$$
where $d_{ij}$ represents the pixels of a reference or guide image (e.g., input V 105).
As before, equation (40) may be represented in matrix form as

$$\hat{C} = CM, \qquad (42)$$
And the solution to equation (41) can be expressed as an optimization problem
where R represents the vector of guide image data ($d_{ij}$), $M_m$ is the $(2N+1)^2 \times 1$ vector of $m_{xy}$ coefficients, and $M_n$ is the $(2M+1)^2 \times 1$ vector of $n_{xy}$ coefficients, so that

$$M = \begin{bmatrix} M_m \\ M_n \end{bmatrix}, \qquad (44)$$

and the matrix C represents the observed pixel data ($c_{ij}$ and $y_{ij}$), with each row collecting the $(2N+1)^2$ same-plane neighbors and the $(2M+1)^2$ guide-plane neighbors of one filtered pixel:

$$C = \begin{bmatrix} \mathbf{c}_1^T & \mathbf{y}_1^T \\ \mathbf{c}_2^T & \mathbf{y}_2^T \\ \vdots & \vdots \end{bmatrix}. \qquad (45)$$
As described above, the solution of equation (43) that is optimal in the mean-square sense can be expressed as

$$M = (C^T C)^{-1} C^T R. \qquad (46)$$
This process may be repeated for each of the color components (e.g., Cb and Cr, or R and B) that require CTI filtering.
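Applying a derived CTI filter per equation (40) amounts to a pair of 2D correlations, one over the chroma plane and one over the co-located guide plane. The sketch below illustrates this for N = M = 1 with arbitrary placeholder coefficients; it is not the patent's implementation, and the edge-replication padding is an assumption.

```python
# Applying a CTI filter per eq. (40) with N = M = 1 (3x3 kernels on both
# planes). Pure-numpy sliding windows; coefficients here are placeholders.
import numpy as np

def cti_filter(chroma, luma, m, n):
    """Filter one chroma plane, guided by the co-located luma plane."""
    H, W = chroma.shape
    out = np.zeros((H, W), dtype=float)
    cp = np.pad(chroma, 1, mode='edge')  # replicate borders (assumed policy)
    lp = np.pad(luma, 1, mode='edge')
    for i in range(H):
        for j in range(W):
            out[i, j] = (np.sum(m * cp[i:i+3, j:j+3]) +   # same-plane term
                         np.sum(n * lp[i:i+3, j:j+3]))    # guide-plane term
    return out

m = np.zeros((3, 3)); m[1, 1] = 1.0   # identity same-plane kernel
n = np.zeros((3, 3))                  # no luma contribution
cb = np.arange(16.0).reshape(4, 4)
y = np.ones((4, 4))
print(np.allclose(cti_filter(cb, y, m, n), cb))  # identity kernel returns input
```

In practice the coefficients m and n would be the metadata-signaled solution of equation (46), not the identity kernel used here for checking.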
FIG. 8 depicts an example process for guided color transient improvement filtering, according to an embodiment of the invention. In step 810, an encoder (e.g., system 100 depicted in fig. 1) may first reconstruct an estimate of a decoded image (e.g., V270) that would be received by a receiver (e.g., system 200 depicted in fig. 2). For example, a reconstructed picture at the decoder (e.g., 270) may be estimated by adding the output of the predictor 160 (e.g., signal 165) to the encoded and then decoded version of the residual 175.
In step 820, using equation (46), where R is based on input V 105, the encoder can derive the optimized CTI filter coefficients M. In step 830, these coefficients and other filtering parameters may be sent to the decoder as part of a metadata bitstream. At the decoder, after reconstruction of signal V 270, a separate post-processing step may apply the CTI filter to signal V 270 to improve overall quality by reducing color-related artifacts.
Process 800 may be repeated at various intervals deemed necessary to maintain coding efficiency while using available computing resources. For example, when encoding a video signal, process 800 may be repeated for each frame, group of frames, portion of a frame, or whenever the prediction residual between the guide image and the CTI-filtered image exceeds a certain threshold. Process 800 may also be repeated for each of the color components that may require CTI filtering.
Example computer System implementation
Embodiments of the invention may be implemented with a computer system, systems configured in electronic circuitry and components, an Integrated Circuit (IC) device such as a microcontroller, a Field Programmable Gate Array (FPGA) or another configurable or Programmable Logic Device (PLD), a discrete-time or Digital Signal Processor (DSP), an Application-Specific IC (ASIC), and/or an apparatus that includes one or more of such systems, devices, or components. These computers and/or ICs may perform, control, or execute instructions relating to guided upsampling or CTI filtering, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values relating to the guided image upsampling described herein. The image and video embodiments may be implemented in hardware, software, firmware, and various combinations thereof.
Particular implementations of the invention include a computer processor executing software instructions that cause the processor to perform the methods of the invention. For example, one or more processors in a display, encoder, set-top box, decoder, etc. may implement the above-described guided image upsampling or CTI filtering methods by executing software instructions in a program memory accessible to the processor. The present invention is also provided in the form of a program product. The program product may comprise any medium carrying a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to carry out the method of the invention. The program product according to the invention may be in any of various forms. The program product may include, for example, physical media (e.g., magnetic data storage media including floppy disks, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAMs, etc.), etc. The computer readable signal on the program product may optionally be compressed or encrypted.
Where a component (e.g., a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a "means") should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated exemplary embodiments of the invention.
Equivalents, extensions, substitutions and miscellaneous
Example embodiments related to guided image upsampling and CTI filtering are thus described. In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims (17)
1. A method of guided filtering for an encoder, the method comprising:
receiving a guide image and a target image to be encoded by an encoder, wherein the target image and the guide image both represent a similar scene and each comprise a first color component and a second color component;
encoding the target image with an encoder to generate an encoded image;
decoding the encoded image with a decoder to generate a decoded image, the decoded image comprising a decoded first color component and a decoded second color component;
selecting a color transient improvement CTI filter to filter pixels of the decoded image to generate an output color component image;
computing CTI filter coefficients for the CTI filter, wherein filter coefficient computation is based on minimizing an error metric between pixel values of the output color component image and corresponding pixel values of a second color component in the guide image, wherein CTI filter coefficients comprise a first set of filter coefficients and a second set of filter coefficients, wherein generating the output color component image comprises combining results of filtering a first color component of the decoded image with the first set of filter coefficients and results of filtering a second color component of the decoded image with the second set of filter coefficients; and
sending the CTI filter coefficients to the decoder.
2. The method of claim 1, further comprising: combining the CTI filter coefficients with the encoded pictures in an encoded bitstream and sending the encoded bitstream to a decoder.
3. The method of claim 1, wherein the CTI filter is selected from a plurality of available filters.
4. The method of claim 3, wherein the plurality of available filters includes a normal 3D filter and a symmetric 3D filter.
5. The method of claim 3, further comprising: signaling characteristics of the selected CTI filter to the decoder.
6. The method of claim 1, wherein the error metric comprises a Mean Square Error (MSE) calculation.
7. The method of claim 1, wherein a first color component of the target image, guide image, or decoded image is a luma (Y) component and a second color component of the target image, guide image, or decoded image is a chroma color component.
8. The method of claim 7, wherein the chroma color component is a Cb or Cr color component.
9. The method of claim 1, wherein a first color component of the target image, guide image, or decoded image is a green (G) color component and a second color component of the target image, guide image, or decoded image is a red (R) or blue (B) color component.
10. The method of claim 1, wherein the guide image has a higher dynamic range than the target image.
11. The method of claim 10, wherein the guide image has a Visual Dynamic Range (VDR) and the target image has a Standard Dynamic Range (SDR).
12. A method for guided filtering in a decoder, the method comprising:
receiving, by a decoder, an encoded image and CTI filter coefficients for a color transient improvement CTI filter, wherein the CTI filter coefficients include a first set of filter coefficients and a second set of filter coefficients;
decoding the encoded image to generate a decoded image comprising a first color component and a second color component; and
filtering the decoded image with a CTI filter to generate a filtered color component image, wherein generating the filtered color component image comprises combining a result of filtering a first color component of the decoded image with the first set of filter coefficients and a result of filtering a second color component of the decoded image with the second set of filter coefficients,
wherein the CTI filter coefficients are generated by an encoder and are generated by:
decoding the encoded image in the encoder to generate a decoded image in the encoder; and
minimizing an error metric between pixel values of a decoded image filtered with the CTI filter in the encoder and corresponding pixel values of a guide image received by the encoder, wherein the guide image represents the same scene as the decoded image generated in the encoder.
13. The method as defined in claim 12, wherein the error metric comprises a Mean Square Error (MSE) calculation.
14. The method of claim 12, wherein the first color component of the decoded picture is a luma (Y) component and the second color component of the decoded picture is a chroma (Cb or Cr) color component.
15. An image processing apparatus comprising a processor and configured to perform the method of claim 12.
16. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for performing a method with one or more processors in accordance with the method of claim 1.
17. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for performing a method with one or more processors in accordance with the method of claim 12.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161521685P | 2011-08-09 | 2011-08-09 | |
US61/521,685 | 2011-08-09 | ||
US201261653234P | 2012-05-30 | 2012-05-30 | |
US61/653,234 | 2012-05-30 | ||
CN201210281757.3A CN102957912B (en) | 2011-08-09 | 2012-08-09 | In Video coding by guide image up-sampling |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210281757.3A Division CN102957912B (en) | 2011-08-09 | 2012-08-09 | In Video coding by guide image up-sampling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105338361A CN105338361A (en) | 2016-02-17 |
CN105338361B true CN105338361B (en) | 2019-01-04 |
Family
ID=46754875
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210281757.3A Active CN102957912B (en) | 2011-08-09 | 2012-08-09 | In Video coding by guide image up-sampling |
CN201510809673.6A Active CN105338361B (en) | 2011-08-09 | 2012-08-09 | For the method, apparatus and storage medium for being guided filtering in Video coding and decoding |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210281757.3A Active CN102957912B (en) | 2011-08-09 | 2012-08-09 | In Video coding by guide image up-sampling |
Country Status (4)
Country | Link |
---|---|
US (2) | US9100660B2 (en) |
EP (1) | EP2557789B1 (en) |
CN (2) | CN102957912B (en) |
HK (1) | HK1215771A1 (en) |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9549207B2 (en) | 2013-01-02 | 2017-01-17 | Dolby Laboratories Licensing Corporation | Backward-compatible coding for ultra high definition video signals with enhanced dynamic range |
EP2959672B1 (en) * | 2013-02-21 | 2019-12-04 | Koninklijke Philips N.V. | Improved hdr image encoding and decoding methods and devices |
US20140301488A1 (en) * | 2013-04-08 | 2014-10-09 | General Instrument Corporation | Derivation of resampling filters for scalable video coding |
GB2516424A (en) * | 2013-07-15 | 2015-01-28 | Nokia Corp | A method, an apparatus and a computer program product for video coding and decoding |
JP6042583B2 (en) | 2013-11-13 | 2016-12-14 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Workflow for EDR video content generation and support display management |
JP2015144423A (en) * | 2013-12-25 | 2015-08-06 | 三星電子株式会社Samsung Electronics Co.,Ltd. | Image encoder, image decoder, method of image encoder and image decoder, program and image processing system |
EP2890129A1 (en) * | 2013-12-27 | 2015-07-01 | Thomson Licensing | Method and device for encoding a high-dynamic range image and/or decoding a bitstream |
EP3090538A1 (en) * | 2014-01-03 | 2016-11-09 | Thomson Licensing | Method, apparatus, and computer program product for optimising the upscaling to ultrahigh definition resolution when rendering video content |
CN105874806A (en) * | 2014-01-03 | 2016-08-17 | 汤姆逊许可公司 | Method and apparatus for the generation of metadata for video optimization |
US9917662B2 (en) * | 2014-01-22 | 2018-03-13 | Siemens Aktiengesellschaft | Digital measurement input for an electric automation device, electric automation device comprising a digital measurement input, and method for processing digital input measurement values |
US9286653B2 (en) | 2014-08-06 | 2016-03-15 | Google Inc. | System and method for increasing the bit depth of images |
US9153017B1 (en) * | 2014-08-15 | 2015-10-06 | Google Inc. | System and method for optimized chroma subsampling |
US9652870B2 (en) * | 2015-01-09 | 2017-05-16 | Vixs Systems, Inc. | Tone mapper with filtering for dynamic range conversion and methods for use therewith |
US9589313B2 (en) * | 2015-01-09 | 2017-03-07 | Vixs Systems, Inc. | Dynamic range converter with pipelined architecture and methods for use therewith |
US9560330B2 (en) * | 2015-01-09 | 2017-01-31 | Vixs Systems, Inc. | Dynamic range converter with reconfigurable architecture and methods for use therewith |
US9558538B2 (en) * | 2015-01-09 | 2017-01-31 | Vixs Systems, Inc. | Dynamic range converter with frame by frame adaptation and methods for use therewith |
US9544560B2 (en) * | 2015-01-09 | 2017-01-10 | Vixs Systems, Inc. | Dynamic range converter with generic architecture and methods for use therewith |
US9860504B2 (en) | 2015-01-09 | 2018-01-02 | Vixs Systems, Inc. | Color gamut mapper for dynamic range conversion and methods for use therewith |
GB201500719D0 (en) * | 2015-01-15 | 2015-03-04 | Barco Nv | Method for chromo reconstruction |
WO2016132147A1 (en) | 2015-02-19 | 2016-08-25 | Magic Pony Technology Limited | Enhancement of visual data |
US10349064B2 (en) | 2015-03-10 | 2019-07-09 | Apple Inc. | Adaptive chroma downsampling and color space conversion techniques |
GB201603144D0 (en) | 2016-02-23 | 2016-04-06 | Magic Pony Technology Ltd | Training end-to-end video processes |
EP3278559B1 (en) | 2015-03-31 | 2021-05-05 | Magic Pony Technology Limited | Training end-to-end video processes |
GB201604672D0 (en) | 2016-03-18 | 2016-05-04 | Magic Pony Technology Ltd | Generative methods of super resolution |
US10715816B2 (en) | 2015-11-11 | 2020-07-14 | Apple Inc. | Adaptive chroma downsampling and color space conversion techniques |
EP3298579B1 (en) | 2016-04-12 | 2021-07-21 | Magic Pony Technology Limited | Visual data processing using energy networks |
GB201607994D0 (en) | 2016-05-06 | 2016-06-22 | Magic Pony Technology Ltd | Encoder pre-analyser |
EP3306528B1 (en) * | 2016-10-04 | 2019-12-25 | Axis AB | Using image analysis algorithms for providing traning data to neural networks |
US10701394B1 (en) | 2016-11-10 | 2020-06-30 | Twitter, Inc. | Real-time video super-resolution with spatio-temporal networks and motion compensation |
EP3794828A1 (en) | 2018-05-16 | 2021-03-24 | Isize Limited | Encoding and decoding image data |
EP3777143A4 (en) * | 2019-03-11 | 2022-02-16 | Alibaba Group Holding Limited | Inter coding for adaptive resolution video coding |
US11445222B1 (en) | 2019-09-30 | 2022-09-13 | Isize Limited | Preprocessing image data |
US11252417B2 (en) | 2020-01-05 | 2022-02-15 | Size Limited | Image data processing |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1543228A (en) * | 2003-04-28 | 2004-11-03 | Samsung Electronics Co., Ltd. | Method and apparatus for adjusting color edge center in color transient improvement
EP1746839A1 (en) * | 2005-07-22 | 2007-01-24 | Thomson Licensing | Method and apparatus for encoding video data |
CN1939048A (en) * | 2004-03-25 | 2007-03-28 | Koninklijke Philips Electronics N.V. | Luminance transient improvement using video encoding metric for digital video processing
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FI88238C (en) | 1991-05-27 | 1993-04-13 | Salora Oy | BEHANDLING AV EN FLERDIMENSIONELL VIDEOSIGNAL |
US6510242B1 (en) | 1999-10-26 | 2003-01-21 | Sharp Laboratories Of America, Inc. | Color image enhancement during YCbCr to RGB color conversion |
US20030112863A1 (en) | 2001-07-12 | 2003-06-19 | Demos Gary A. | Method and system for improving compressed image chroma information |
US7136417B2 (en) * | 2002-07-15 | 2006-11-14 | Scientific-Atlanta, Inc. | Chroma conversion optimization |
US8330831B2 (en) | 2003-08-05 | 2012-12-11 | DigitalOptics Corporation Europe Limited | Method of gathering visual meta data using a reference image |
JP4617644B2 (en) | 2003-07-18 | 2011-01-26 | ソニー株式会社 | Encoding apparatus and method |
US7170529B2 (en) | 2003-10-24 | 2007-01-30 | Sigmatel, Inc. | Image processing |
US8218625B2 (en) | 2004-04-23 | 2012-07-10 | Dolby Laboratories Licensing Corporation | Encoding, decoding and representing high dynamic range images |
US20050259729A1 (en) * | 2004-05-21 | 2005-11-24 | Shijun Sun | Video coding with quality scalability |
US7724949B2 (en) | 2004-06-10 | 2010-05-25 | Qualcomm Incorporated | Advanced chroma enhancement |
US7956930B2 (en) | 2006-01-06 | 2011-06-07 | Microsoft Corporation | Resampling and picture resizing operations for multi-resolution video coding and decoding |
CA2570090C (en) | 2006-12-06 | 2014-08-19 | Brightside Technologies Inc. | Representing and reconstructing high dynamic range images |
US8149336B2 (en) | 2008-05-07 | 2012-04-03 | Honeywell International Inc. | Method for digital noise reduction in low light video |
CN104486605B (en) | 2009-03-10 | 2017-04-12 | 杜比实验室特许公司 | Extended dynamic range and extended dimensionality image signal conversion |
EP2406959B1 (en) | 2009-03-13 | 2015-01-14 | Dolby Laboratories Licensing Corporation | Layered compression of high dynamic range, visual dynamic range, and wide color gamut video |
WO2011090790A1 (en) | 2010-01-22 | 2011-07-28 | Thomson Licensing | Methods and apparatus for sampling -based super resolution vido encoding and decoding |
TWI479898B (en) | 2010-08-25 | 2015-04-01 | Dolby Lab Licensing Corp | Extending image dynamic range |
2012
- 2012-08-01 EP EP12178834.3A patent/EP2557789B1/en active Active
- 2012-08-07 US US13/569,125 patent/US9100660B2/en active Active
- 2012-08-09 CN CN201210281757.3A patent/CN102957912B/en active Active
- 2012-08-09 CN CN201510809673.6A patent/CN105338361B/en active Active

2015
- 2015-06-23 US US14/748,004 patent/US9338470B2/en active Active

2016
- 2016-03-30 HK HK16103641.9A patent/HK1215771A1/en unknown
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1543228A (en) * | 2003-04-28 | 2004-11-03 | Samsung Electronics Co., Ltd. | Method and apparatus for adjusting color edge center in color transient improvement
CN1939048A (en) * | 2004-03-25 | 2007-03-28 | Koninklijke Philips Electronics N.V. | Luminance transient improvement using video encoding metric for digital video processing
EP1746839A1 (en) * | 2005-07-22 | 2007-01-24 | Thomson Licensing | Method and apparatus for encoding video data |
Non-Patent Citations (2)
Title |
---|
Content-adaptive resolution enhancement of compressed video with encoder-generated side information; Fei Zuo et al.; 《IEEE Second International Conference on Electrical Engineering, ICEE 2008》; 2008-03-25; entire document *
Steffen Wittmann; SEI message on post-filter hints; 《Joint Collaborative Team on Video Coding of ISO/IEC JTC1/SC29/WG11 and ITU-T SG.16, no. JCTVC-U035》; 2006 *
Also Published As
Publication number | Publication date |
---|---|
US20150319450A1 (en) | 2015-11-05 |
US9100660B2 (en) | 2015-08-04 |
US9338470B2 (en) | 2016-05-10 |
EP2557789B1 (en) | 2017-09-27 |
CN102957912B (en) | 2016-01-20 |
CN102957912A (en) | 2013-03-06 |
CN105338361A (en) | 2016-02-17 |
HK1215771A1 (en) | 2016-09-09 |
EP2557789A2 (en) | 2013-02-13 |
US20130039430A1 (en) | 2013-02-14 |
EP2557789A3 (en) | 2013-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105338361B (en) | For the method, apparatus and storage medium for being guided filtering in Video coding and decoding | |
JP6490178B2 (en) | Multi-color channel multiple regression predictor | |
CN103493489B (en) | Image prediction based on primary colors hierarchy model | |
US8897581B2 (en) | Guided post-prediction filtering in layered VDR coding | |
EP2807823B1 (en) | Piecewise cross color channel predictor | |
EP2719176B1 (en) | Visual display resolution prediction based on fused regions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 1215771 Country of ref document: HK |
GR01 | Patent grant | ||
GR01 | Patent grant |