EP4635182A1 - Method, apparatus and computer program product for video encoding and decoding - Google Patents
Method, apparatus and computer program product for video encoding and decoding
- Publication number
- EP4635182A1 (application EP23902870.7A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- row
- mixture matrix
- matrix
- determining
- determined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
Definitions
- the present solution generally relates to video encoding and video decoding.
- the present solution relates to determining a set of filter parameters used in encoding/decoding.
- Background This section is intended to provide a background or context to the invention that is recited in the claims.
- the description herein may include concepts that could be pursued but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
- a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
- the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
- an apparatus comprising means for processing an input video/image; means for determining a set of filter coefficients; where a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector is determined; the determined mixture matrix is modified to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; a modified destination row in the mixture matrix is determined by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; the set of filter parameters is determined from the triangular form of the mixture matrix; the apparatus comprising means for using the set of filter coefficients in a filter.
- a method comprising: processing an input video/image; determining a set of filter coefficients, the determining comprising determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining the set of filter parameters from the triangular form of the mixture matrix; using the set of filter coefficients in a filter.
- an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: process an input video/image; determine a set of filter coefficients, comprising determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining the set of filter parameters from the triangular form of the mixture matrix; use the set of filter coefficients in a filter.
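The row-wise triangularization described in the claims can be sketched in floating point as follows (illustrative names; an actual codec implementation would operate on integers):

```python
def triangularize(mix):
    """Reduce an augmented 'mixture matrix' [A | y] to upper-triangular
    form using row-wise operations: for each destination row below the
    source row, determine a scale parameter, multiply the source row by
    it, and deduct the scaled source row from the destination row."""
    n = len(mix)
    for src in range(n):                      # source (pivot) row
        for dst in range(src + 1, n):         # destination rows below it
            # scale parameter between the source row and destination row
            scale = mix[dst][src] / mix[src][src]
            # modified destination row: deduct the scaled source row
            mix[dst] = [d - scale * s for d, s in zip(mix[dst], mix[src])]
    return mix

mix = [[4.0, 2.0, 10.0],    # [A | y] with A = [[4, 2], [2, 3]], y = [10, 8]
       [2.0, 3.0,  8.0]]
tri = triangularize(mix)
print(tri[1])  # [0.0, 2.0, 3.0] - sub-diagonal element eliminated
```

With the sub-diagonal elements eliminated, the filter parameters can then be recovered with a single pass of back-substitution.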
- the mixture matrix is modified by respective means by scaling at least one value on a determined row using a scaling parameter, a rounding parameter and a bit-shifting parameter, where the scaling parameter, the rounding parameter and the bit-shifting parameter are determined from a diagonal element of the mixture matrix on the determined row.
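One possible fixed-point realization of such a scaling/rounding/bit-shifting triple, assuming the reciprocal of the diagonal element is approximated as scale / 2^shift (the names and the exact derivation here are illustrative, not taken from the specification):

```python
def scale_params(diag, shift=16):
    """Derive an integer scale and rounding offset so that
    (v * scale + rnd) >> shift approximates v / diag.
    All three values depend only on the diagonal element."""
    scale = (1 << shift) // diag      # integer reciprocal approximation
    rnd = 1 << (shift - 1)            # rounding offset for the shift
    return scale, rnd, shift

def scale_row_value(v, diag):
    """Scale one value on a determined row without a runtime division."""
    scale, rnd, shift = scale_params(diag)
    return (v * scale + rnd) >> shift

# dividing 400 by a diagonal element of 8 in fixed point
print(scale_row_value(400, 8))  # 50
```

The point of the construction is that the division by the diagonal element is replaced by a multiply, add, and shift, which is cheaper and bit-exact across platforms.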
- the set of filter coefficients are used in convolutional cross-component prediction, where luma samples are used as input and predicted chroma samples are used as output.
- the mixture matrix is determined to include two cross-correlation vectors, wherein a first cross-correlation vector relates to cross-correlation between the luma samples and first chroma samples, and wherein a second cross-correlation vector relates to cross- correlation between the luma samples and second chroma samples.
- the set of filter parameters is determined from the triangular form of the mixture matrix using a back- substitution process.
- the back-substitution comprises setting at least one of the filter parameters equal to an element in the mixture matrix.
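A minimal back-substitution sketch over an upper-triangular augmented matrix [U | y] (illustrative; the bottom row directly yields one parameter from elements of the matrix):

```python
def back_substitute(tri):
    """Solve an upper-triangular augmented matrix [U | y] for the filter
    parameters, bottom row first; each solved parameter feeds the rows
    above it."""
    n = len(tri)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = tri[i][n] - sum(tri[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / tri[i][i]
    return x

tri = [[4.0, 2.0, 10.0],
       [0.0, 2.0,  3.0]]
print(back_substitute(tri))  # [1.75, 1.5]
```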
- when modifying the determined mixture matrix to the triangular form, it is checked for at least one diagonal sample whether the value of said diagonal sample, or the absolute value of said diagonal sample, is below a predetermined threshold value, and when so, the value of said diagonal sample is set equal to the predetermined threshold value.
- alternatively, when such a diagonal sample is found to be below the predetermined threshold value, the modifying process is terminated and the filter coefficients are set to a predetermined set of values.
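The two variants of the diagonal safeguard above might be sketched as follows (the threshold value and all names are hypothetical):

```python
THRESHOLD = 1e-6  # hypothetical predetermined threshold value

def check_diagonal(mix, i, clamp=True):
    """Guard against (near-)zero diagonal elements before they are used
    as divisors.  First variant: clamp the diagonal to the threshold and
    continue.  Second variant (clamp=False): report failure so the caller
    can terminate and fall back to predetermined coefficients."""
    if abs(mix[i][i]) < THRESHOLD:
        if clamp:
            mix[i][i] = THRESHOLD   # clamp and keep going
            return True
        return False                # caller aborts and uses fallback
    return True

m = [[0.0, 1.0]]
check_diagonal(m, 0)
print(m[0][0])  # 1e-06
```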
- the processing is encoding or decoding.
- the computer program product is embodied on a non-transitory computer readable medium.
Description of the Drawings
In the following, various embodiments will be described in more detail with reference to the appended drawings, in which
Fig. 1 shows an encoding process according to an embodiment.
Fig. 2 shows a decoding process according to an embodiment.
Fig. 3 is a flowchart illustrating a method according to an embodiment.
Fig. 4 shows an apparatus according to an embodiment.
Description of Example Embodiments
The following description and drawings are illustrative and are not to be construed as unnecessarily limiting. The specific details are provided for a thorough understanding of the disclosure.
- references to one or an embodiment in the present disclosure can be, but not necessarily are, reference to the same embodiment and such references mean at least one of the embodiments.
- Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure.
- several embodiments will be described in the context of one video coding arrangement. It is to be noted, however, that the present embodiments are not necessarily limited to this particular arrangement.
- the embodiments relate to cross-component filter parameter calculation with row- based matrix operations.
- the Advanced Video Coding standard (which may be abbreviated AVC or H.264/AVC) was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardization Sector of International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).
- the H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC).
- SVC Scalable Video Coding
- MVC Multiview Video Coding
- HEVC High Efficiency Video Coding standard
- JCT-VC Joint Collaborative Team on Video Coding
- ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2 also known as MPEG-H Part 2 High Efficiency Video Coding (HEVC).
- Extensions to H.265/HEVC include scalable, multiview, three-dimensional, and fidelity range extensions, which may be referred to as SHVC, MV-HEVC, 3D-HEVC, and REXT, respectively.
- SHVC scalable extension of HEVC
- MV-HEVC multiview extension of HEVC
- REXT fidelity range extensions of HEVC
- the references in this description to H.265/HEVC, SHVC, MV-HEVC, 3D-HEVC and REXT that have been made for the purpose of understanding definitions, structures or concepts of these standard specifications are to be understood to be references to the latest versions of these standards that were available before the date of this application, unless otherwise indicated.
- Versatile Video Coding (which may be abbreviated VVC, H.266, or H.266/VVC) is a video compression standard developed as the successor to HEVC.
- VVC is specified in ITU-T Recommendation H.266 and equivalently in ISO/IEC 23090-3, which is also referred to as MPEG-I Part 3.
- a specification of the AV1 bitstream format and decoding process was developed by the Alliance for Open Media (AOM). The AV1 specification was published in 2018. AOM is reportedly working on the AV2 specification.
- a video codec may comprise an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
- the compressed representation may be referred to as a bitstream or a video bitstream.
- a video encoder and/or a video decoder may also be separate from each other, i.e., need not form a codec.
- the encoder may discard some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
- the notation “(de)coder” means an encoder and/or a decoder.
- Hybrid video codecs, for example ITU-T H.263, H.264/AVC and HEVC, may encode the video information in two phases. At first, pixel values in a certain picture area (or “block”) are predicted, for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner).
- Secondly, the prediction error, i.e., the difference between the predicted block of pixels and the original block of pixels, is coded.
- This may be done by transforming the difference in pixel values using a specified transform (e.g., Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients.
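As a toy illustration of this transform-and-quantize step (a naive floating-point DCT-II with scalar quantization; real codecs use fast integer transforms and more elaborate quantizers):

```python
import math

def dct2(block):
    """Naive 1-D DCT-II of a residual block (illustration only)."""
    n = len(block)
    return [sum(block[x] * math.cos(math.pi * (x + 0.5) * k / n)
                for x in range(n)) for k in range(n)]

def quantize(coeffs, qstep):
    """Scalar quantization: larger qstep -> fewer bits, more distortion."""
    return [round(c / qstep) for c in coeffs]

residual = [5, 4, -3, -4, 2, 1, 0, -1]   # prediction error samples
levels = quantize(dct2(residual), 4)     # quantized transform coefficients
# the quantized levels would then be entropy coded
```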
- the encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
- An encoding process according to an embodiment is illustrated in Figure 1.
- Figure 1 illustrates an image to be encoded (In); a predicted representation of an image block (P’n); a prediction error signal (Dn); a reconstructed prediction error signal (D’n); a preliminary reconstructed image (I’n); a final reconstructed image (R’n); a transform (T) and inverse transform (T^-1); a quantization (Q) and inverse quantization (Q^-1); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F).
- In video codecs such as H.265/HEVC, video pictures are divided into coding units (CUs) covering the area of the picture.
- a CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU.
- CU may consist of a square block of samples with a size selectable from a predefined set of possible CU sizes.
- a CU with the maximum allowed size may be named as LCU (largest coding unit) or CTU (coding tree unit), and the video picture is divided into non-overlapping CTUs.
- a CTU can be further split into a combination of smaller CUs, e.g., by recursively splitting the CTU and resultant CUs.
- Each resulting CU may have at least one PU and at least one TU associated with it.
- Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively.
- Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g., motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs).
- each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including e.g., DCT coefficient information). It may be signaled at CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered there are no TUs for the said CU.
- the division of the image into CUs, and division of CUs into PUs and TUs may be signaled in the bitstream allowing the decoder to reproduce the intended structure of these units.
- An elementary unit for the input to an encoder and the output of a decoder, respectively, in most cases is a picture.
- a picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture or a reconstructed picture.
- the source and decoded pictures are each comprised of one or more sample arrays, such as one of the following sets of sample arrays:
- Luma (Y) only (monochrome).
- Luma and two chroma (YCbCr or YCgCo).
- Green, Blue and Red (GBR, also known as RGB).
- Arrays representing other unspecified monochrome or tri-stimulus color samplings (for example, YZX, also known as XYZ).
- these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr, regardless of the actual color representation method in use.
- the actual color representation method in use can be indicated e.g., in a coded bitstream e.g., using the Video Usability Information (VUI) syntax of HEVC or alike.
- a component may be defined as an array or single sample from one of the three sample arrays (luma and two chroma) or the array or a single sample of the array that compose a picture in monochrome format.
- a picture may be defined to be either a frame or a field.
- a frame comprises a matrix of luma samples and possibly the corresponding chroma samples.
- a field is a set of alternate sample rows of a frame and may be used as encoder input, when the source signal is interlaced. Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or chroma sample arrays may be subsampled when compared to luma sample arrays.
- the decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (inverse operation of the prediction error coding recovering the quantized prediction error signal in spatial pixel domain). After applying prediction and prediction error decoding means the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame.
- the decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence.
- A decoding process according to an embodiment is illustrated in Figure 2.
- Figure 2 illustrates a predicted representation of an image block (P’n); a reconstructed prediction error signal (D’n); a preliminary reconstructed image (I’n); a final reconstructed image (R’n); an inverse transform (T^-1); an inverse quantization (Q^-1); an entropy decoding (E^-1); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
- a color palette-based coding can be used.
- Palette-based coding refers to a family of approaches for which a palette, i.e., a set of colors and associated indexes, is defined and the value for each sample within a coding unit is expressed by indicating its index in the palette.
- Palette-based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas which are representing computer screen content, like text or simple graphics).
- palette index prediction approaches can be utilized, or the palette indexes can be run length coded to be able to represent larger homogenous image areas efficiently.
- escape coding can be utilized.
- Escape coded samples are transmitted without referring to any of the palette indexes. Instead, their values are indicated individually for each escape coded sample.
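A small sketch of palette coding with escape samples (the sentinel index used for escapes here is arbitrary, not from any specification):

```python
ESCAPE = -1  # arbitrary index value reserved for escape-coded samples

def palette_encode(samples, palette):
    """Map each sample to its palette index; samples not in the palette
    are escape coded and their values transmitted individually."""
    indexes, escapes = [], []
    lut = {c: i for i, c in enumerate(palette)}
    for s in samples:
        if s in lut:
            indexes.append(lut[s])
        else:
            indexes.append(ESCAPE)
            escapes.append(s)        # value signaled explicitly
    return indexes, escapes

idx, esc = palette_encode([255, 255, 0, 17, 0], palette=[255, 0])
print(idx, esc)  # [0, 0, 1, -1, 1] [17]
```

In a real codec the index stream itself would additionally be run-length coded or index predicted, as noted above.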
- the motion information may be indicated with motion vectors associated with each motion compensated image block in video codecs. Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures. In order to represent motion vectors efficiently those may be coded differentially with respect to block specific predicted motion vectors.
- the predicted motion vectors may be created in a predefined way, for example calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
- Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor.
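The median-based prediction mentioned above can be sketched as follows (per-component median over the adjacent blocks' motion vectors; only the difference to the predictor is coded; names are illustrative):

```python
import statistics

def median_mv_predictor(neighbor_mvs):
    """Median predictor from adjacent blocks' motion vectors,
    computed independently per component."""
    xs = [mv[0] for mv in neighbor_mvs]
    ys = [mv[1] for mv in neighbor_mvs]
    return (statistics.median(xs), statistics.median(ys))

def code_mv_differentially(mv, neighbor_mvs):
    """Only the difference between the actual MV and the predictor
    needs to be signaled in the bitstream."""
    px, py = median_mv_predictor(neighbor_mvs)
    return (mv[0] - px, mv[1] - py)

d = code_mv_differentially((5, -2), [(4, -2), (6, -1), (4, -3)])
print(d)  # (1, 0)
```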
- the reference index of previously coded/decoded picture can be predicted.
- the reference index may be predicted from adjacent blocks and/or co-located blocks in a temporal reference picture.
- high efficiency video codecs may employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction.
- a bitstream may be defined as a sequence of bits or a sequence of syntax structures.
- a bitstream format may constrain the order of syntax structures in the bitstream.
- a syntax element may be defined as an element of data represented in the bitstream.
- a syntax structure may be defined as zero or more syntax elements present together in the bitstream in a specified order.
- a bitstream may be in the form of a network abstraction layer (NAL) unit stream or a byte stream, which forms the representation of coded pictures and associated data forming one or more coded video sequences.
- a NAL unit may be defined as a syntax structure containing an indication of the type of data to follow and bytes containing that data in the form of an RBSP interspersed as necessary with start code emulation prevention bytes.
- a raw byte sequence payload (RBSP) may be defined as a syntax structure containing an integer number of bytes that is encapsulated in a NAL unit.
- An RBSP is either empty or has the form of a string of data bits containing syntax elements followed by an RBSP stop bit and followed by zero or more subsequent bits equal to 0.
- a NAL unit comprises a header and a payload.
- the NAL unit header indicates the type of the NAL unit among other things.
- a bitstream may comprise a sequence of open bitstream units (OBUs).
- OBU comprises a header and a payload, wherein the header identifies a type of the OBU.
- the header may comprise a size of the payload in bytes.
- the phrase along the bitstream (e.g., indicating along the bitstream) or along a coded unit of a bitstream (e.g., indicating along a coded tile) may be used in claims and described embodiments to refer to transmission, signaling, or storage in a manner that the "out-of-band" data is associated with but not included within the bitstream or the coded unit, respectively.
- the phrase decoding along the bitstream or along a coded unit of a bitstream or alike may refer to decoding the referred out-of-band data (which may be obtained from out-of-band transmission, signaling, or storage) that is associated with the bitstream or the coded unit, respectively.
- the phrase along the bitstream may be used when the bitstream is contained in a container file, such as a file conforming to the ISO Base Media File Format, and certain file metadata is stored in the file in a manner that associates the metadata to the bitstream, such as boxes in the sample entry for a track containing the bitstream, a sample group for the track containing the bitstream, or a timed metadata track associated with the track containing the bitstream.
- Video codecs may support motion compensated prediction from one source image (uni-prediction) and two sources (bi-prediction). In the case of uni-prediction a single motion vector is applied whereas in the case of bi- prediction two motion vectors are signaled and the motion compensated predictions from two sources are averaged to create the final sample prediction.
- the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
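A sketch of weighted bi-prediction with an optional signaled offset, using the integer rounding typical of codecs (the weights and shift values here are illustrative):

```python
def bi_predict(p0, p1, w0=1, w1=1, offset=0, shift=1):
    """Combine two motion-compensated predictions sample by sample;
    with the defaults this is a plain rounded average, and unequal
    weights or a nonzero offset adjust the combined prediction."""
    rnd = 1 << (shift - 1)
    return [((w0 * a + w1 * b + rnd) >> shift) + offset
            for a, b in zip(p0, p1)]

# plain average of two prediction blocks
print(bi_predict([100, 102], [104, 102]))  # [102, 102]
```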
- a similar approach can be applied to intra picture prediction.
- the displacement vector indicates where from the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded.
- This kind of intra block copying methods can improve the coding efficiency substantially in presence of repeating structures within the frame – such as text or other graphics.
- the prediction residual after motion compensation or intra prediction may be first transformed with a transform kernel (like DCT) and then coded. The reason for this is that often there still exists some correlation among the residual samples, and the transform can in many cases help reduce this correlation and provide more efficient coding.
- Video encoders may utilize Lagrangian cost functions to find optimal coding modes, e.g., the desired Macroblock mode and associated motion vectors.
- This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area: C = D + λR, where
- C is the Lagrangian cost to be minimized,
- D is the image distortion (e.g., Mean Squared Error) with the mode and motion vectors considered, and
- R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
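Putting the cost function together, a minimal mode-selection sketch (the candidate modes and their distortion/rate numbers are made up for illustration):

```python
def lagrangian_cost(distortion, rate_bits, lam):
    """C = D + lambda * R"""
    return distortion + lam * rate_bits

def best_mode(candidates, lam):
    """Pick the (mode, D, R) candidate with the lowest Lagrangian cost."""
    return min(candidates, key=lambda c: lagrangian_cost(c[1], c[2], lam))

# hypothetical candidates: (mode name, distortion D, rate R in bits)
modes = [("intra", 120.0, 40), ("inter", 90.0, 70)]
print(best_mode(modes, lam=0.5)[0])  # inter (90 + 35 = 125 < 120 + 20 = 140)
```

Note how a larger λ penalizes rate more heavily, steering the encoder toward cheaper (but more distorted) modes.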
- Scalable video coding refers to coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions, or frame rates.
- the receiver can extract the desired representation depending on its characteristics (e.g., resolution that matches best the display device).
- a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g., the network characteristics or processing capabilities of the receiver.
- a scalable bitstream may consist of a “base layer” providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer may depend on the lower layers.
- the motion and mode information of the enhancement layer can be predicted from lower layers.
- the pixel data of the lower layers can be used to create prediction for the enhancement layer.
- a scalable video codec for quality scalability (also known as Signal-to-Noise ratio or SNR scalability) and/or spatial scalability may be implemented as follows.
- For a base layer, a conventional non-scalable video encoder and decoder is used.
- the reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer.
- the base layer decoded pictures may be inserted into a reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer. Consequently, the encoder may choose a base-layer reference picture as inter prediction reference and indicate its use e.g., with a reference picture index in the coded bitstream.
- the decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as inter prediction reference for the enhancement layer.
- a decoded base-layer picture is used as prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.
- Spatial scalability: Base layer pictures are coded at a lower resolution than enhancement layer pictures.
- Bit-depth scalability: Base layer pictures are coded at lower bit-depth (e.g., 8 bits) than enhancement layer pictures (e.g., 10 or 12 bits).
- Chroma format scalability: Enhancement layer pictures provide higher fidelity in chroma (e.g., coded in 4:4:4 chroma format) than base layer pictures (e.g., 4:2:0 format).
- base layer information could be used to code the enhancement layer to minimize the additional bitrate overhead.
- Scalability can be enabled in two basic ways.
- a reference-frame-based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.
- images can be split into independently codable and decodable image segments (slices or tiles).
- “Slices” in this description may refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while “tiles” may refer to image segments that have been defined as rectangular image regions that are processed at least to some extent as individual frames.
- Video may be encoded in YUV or YCbCr color space, since it is found to reflect some characteristics of human visual system and allows lower quality representation for Cb and Cr channels as human perception is less sensitive to the chrominance fidelity those channels represent.
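For illustration, one common RGB-to-YCbCr conversion (full-range BT.601 coefficients; the matrix actually in use by a given bitstream may differ and can be signaled, e.g., via VUI):

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion: luma is a weighted sum of R, G, B,
    and the chroma channels are scaled color differences around 128."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr

# a neutral gray sample carries no chroma information
print(rgb_to_ycbcr(128, 128, 128))  # ≈ (128.0, 128.0, 128.0)
```

Because human vision is less sensitive to chrominance, the Cb/Cr planes can then be subsampled (e.g., 4:2:0) with little perceived loss.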
- Some video coding tools perform filtering operations which convolve a set of reference samples with a set of filter parameters to output, for example, a predicted value for a certain sample in a picture. In some cases, the filter parameters may be predetermined or signaled in the bitstream.
- CCLM cross-component linear model
- CCCM cross-component convolutional model
- the CCLM of VVC/H.266 calculates its parameters by identifying two luma sample values and corresponding chroma sample values, followed by determination of slope and offset parameters using those two sample pairs.
- A different variant of CCLM is used in Enhanced Compression Model 6 (ECM 6). In that case, a least-mean-square method is used to find the parameters.
- An autocorrelation matrix is generated and decomposed into triangular matrices using a method resembling an LDL decomposition, which is followed by solving the filter coefficients by a sequence including two back-substitution stages and a scaling stage.
- the present embodiments are targeted to determine a set of filter parameters by generating a mixture matrix that combines an autocorrelation matrix with one or more cross-correlation vectors and reducing the generated mixture matrix to a diagonal form using row-wise matrix operations.
- the process is concluded by a single set of back-substitution operations for each set of filter parameters that are being solved.
- a pre-emptive mechanism is disclosed that is applied after a round of iterations of row-wise matrix operations to avoid divisions by zero during future iterations and during the back-substitution stage.
- said set of filter parameters can be used for example by cross-component prediction filters of video and image codecs.
- the resulting value p can represent for example a predicted sample value in one of the chrominance channels and can be calculated as follows using the filter parameters x and the input samples z: p = xᵀz, i.e. the sum of products x[i]·z[i] over the N filter taps.
- Input vector z can be configured to include for example luminance sample values, functions of luminance sample values or constants, or a combination of those.
- Including a constant in input vector z corresponds to adding a constant to the output of filter p.
- This kind of a constant can be referred to as a bias term or a bias parameter and can be used to represent offsets between input and output values.
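To make the filtering operation concrete, a minimal sketch of the dot-product prediction described above (the function name `predict_sample` and the example values are illustrative, not taken from the specification):

```python
# Hypothetical sketch of the prediction p = x^T z described above.
def predict_sample(x, z):
    """Predicted value p as the dot product of filter parameters x
    and input vector z (which may end with a constant bias input)."""
    return sum(xi * zi for xi, zi in zip(x, z))

# Input vector with two luma-derived values and a constant 1 as the bias
# input; the last filter parameter then acts as an additive offset.
z = [100, 102, 1]
x = [0.5, 0.25, 7]        # last entry is the bias parameter
p = predict_sample(x, z)  # 0.5*100 + 0.25*102 + 7 = 82.5
```

Because the constant input is fixed at 1, its filter parameter contributes the same offset to every predicted sample, which is exactly the bias-term behavior described above.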
- Items in the autocorrelation matrix A (size NxN) and the cross-correlation vector y with N values can be calculated for example as A = RᵀR and y = Rᵀs, i.e. A[i][j] is the sum of R[m][i]·R[m][j] and y[i] is the sum of R[m][i]·s[m] over m, where M is the number of training vectors included in the process, the R matrix contains the input training vectors as its rows, and s represents a vector with the output training samples.
- the autocorrelation matrix A and the cross-correlation vector y can be calculated in different ways, for example, as described in JVET input contribution JVET-AB0174.
- the autocorrelation matrix A can also include regularization terms that are added to the diagonal elements of the matrix.
- a regularization term can be determined to be a constant value, such as 1 or 10, but also other values can be used depending on use case specific requirements for such regularization.
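As an illustration of how A and y could be accumulated from training data with a constant regularization term on the diagonal (the function name and toy values are assumptions for this sketch, not the normative derivation):

```python
# Illustrative sketch: A = R^T R plus a constant regularization term on
# the diagonal, and y = R^T s, with R holding the M training input
# vectors as rows and s the output training samples.
def correlations(R, s, reg=1):
    M = len(R)       # number of training vectors
    N = len(R[0])    # number of filter taps
    A = [[sum(R[m][i] * R[m][j] for m in range(M)) + (reg if i == j else 0)
          for j in range(N)] for i in range(N)]
    y = [sum(R[m][i] * s[m] for m in range(M)) for i in range(N)]
    return A, y

R = [[1, 2], [3, 4]]   # two training input vectors, two taps
s = [5, 6]             # corresponding output training samples
A, y = correlations(R, s, reg=1)
# A = [[11, 14], [14, 21]] (diagonal includes the +1 regularization)
# y = [23, 34]
```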
- a mixture matrix B can be created that contains elements of both the autocorrelation matrix A and the cross-correlation vector y, for example by appending y as an additional column of A. This can be triangularized (also referred to as “diagonalized”) using a Gaussian elimination-like method by multiplying elements on a specific row and deducting the resulting values from another row until the mixture matrix B has been reduced to its upper triangular form B’, where all elements below the main diagonal are zeros. Parameters of two or more filters can be calculated simultaneously by creating a mixture matrix containing multiple cross-correlation vectors.
- the multiplication can be implemented using a fixed-point implementation and the division operation can be implemented using the approximated fixed-point approach proposed in JVET document JVET-AB0174.
- row-wise pointers src and dst can be used to illustrate the row-based nature of the process.
- src points to the i:th row of the mixture matrix
- dst points to each of the rows below it one after another.
- a scale value may be calculated based on the ratio of the i:th element on destination and source rows.
- using a threshold zeroThr for the diagonal element a[i+1][i+1], the triangularization can be summarized with the following pseudo code:
  Iterate over rows of a mixture matrix (source row) {
    Iterate over rows below the source row (destination row) {
      Calculate a scale between destination row and source row
      Multiply source row with the scale
      Deduct the multiplied source row from the destination row
    }
    Set the diagonal element of the next source row to a value not smaller than zeroThr
  }
  This guarantees that the diagonal element a[i+1][i+1] on the next source row i+1 has at least a value of zeroThr.
- zeroThr can be determined to be for example 1, or it can be determined to be some other positive number larger than zero.
- the absolute value of the diagonal element on the next source row can be used in comparison with the threshold value zeroThr.
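The row-wise elimination with the zeroThr safeguard could be sketched as follows in floating point (names are illustrative; an actual codec implementation would use fixed-point arithmetic and the approximated division of JVET-AB0174):

```python
# Floating-point sketch of the row-wise elimination with a zeroThr
# safeguard on the next diagonal element.
def triangularize(B, zero_thr=1e-4):
    n = len(B)  # number of rows; each row also carries extension columns
    for i in range(n):                 # source row
        for j in range(i + 1, n):      # destination rows below it
            scale = B[j][i] / B[i][i]
            B[j] = [dj - scale * si for si, dj in zip(B[i], B[j])]
        # pre-emptively clamp the next diagonal element so later
        # iterations and back-substitution never divide by zero
        if i + 1 < n and abs(B[i + 1][i + 1]) < zero_thr:
            B[i + 1][i + 1] = zero_thr
    return B

# Mixture matrix: 2x2 autocorrelation part + one cross-correlation column
B = [[2.0, 1.0, 5.0],
     [4.0, 3.0, 11.0]]
triangularize(B)
# second row becomes [0.0, 1.0, 1.0]: upper triangular form reached
```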
- if the autocorrelation matrix A is degenerate (or non-invertible), no solution to the system of equations exists. If at any point during triangularization a row in the autocorrelation matrix A becomes all zero (or, equivalently, if each element of said row is less than zeroThr) while the respective elements in the cross-correlation vectors are non-zero (or larger than zeroThr), the autocorrelation matrix A can be considered degenerate and the elimination can be terminated early. In such a case, a fixed set of filter coefficients (for example all zeros) can be output.
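A possible shape for this degeneracy check (the helper name and threshold are assumptions of this sketch):

```python
# Sketch: a row whose autocorrelation part has vanished while its
# cross-correlation part has not indicates a non-invertible system, so
# elimination can stop early and fixed (e.g. all-zero) coefficients can
# be output instead.
def is_degenerate_row(a_part, y_part, zero_thr=1e-4):
    vanished = all(abs(v) < zero_thr for v in a_part)       # A-row ~ zero
    informative = any(abs(v) >= zero_thr for v in y_part)   # y-part not
    return vanished and informative

# A zeroed autocorrelation row with a non-zero cross-correlation element
# flags degeneracy; a healthy row does not.
flag = is_degenerate_row([0.0, 0.0], [3.0])      # True
ok = is_degenerate_row([2.0, 1.0], [3.0])        # False
```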
- all the division operations use the diagonal elements a[i][i] of the triangularized mixture matrix as denominators, and no additional checks for zero-valued denominators need to be performed if those were done during the triangularization phase as suggested above.
- the column parameter col determines which column of the triangularized mixture matrix is used as the target vector in the back-substitution process. Following the examples above and including cross-correlation vector ycb as the N:th column of the mixture matrix and cross-correlation vector ycr as the N+1:th column of the mixture matrix, col can be set to N when solving the xcb coefficients for the filter used for predicting samples of a first color component Cb, and col can be set to N+1 when solving the xcr coefficients for the filter used for predicting samples of a second color component Cr.
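The back-substitution with a selectable target column could be sketched as follows (illustrative names and toy values; a real implementation would use fixed-point arithmetic):

```python
# Sketch of back-substitution over an upper triangular mixture matrix
# whose extension columns hold the Cb and Cr cross-correlation vectors.
def back_substitute(B, col):
    n = len(B)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = B[i][col] - sum(B[i][k] * x[k] for k in range(i + 1, n))
        x[i] = acc / B[i][i]   # diagonal kept non-zero during elimination
    return x

# Upper triangular 2x2 system with two target columns
# (col=2 plays the role of column N for Cb, col=3 of column N+1 for Cr)
B = [[2.0, 1.0, 5.0, 4.0],
     [0.0, 1.0, 1.0, 2.0]]
x_cb = back_substitute(B, 2)   # solves 2a+b=5, b=1 -> [2.0, 1.0]
x_cr = back_substitute(B, 3)   # solves 2a+b=4, b=2 -> [1.0, 2.0]
```

Both coefficient sets are obtained from a single triangularization, which is the point of packing several cross-correlation vectors into one mixture matrix.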
- dynamic range of the intermediate values of the mixture matrix can be controlled in different ways.
- values on a specific row of the mixture matrix can be shifted up or down to avoid overflows that can happen when the scale parameter gets large due to differences between the values on the source row i and destination row j.
- An example of such control mechanism is given in the pseudo code below.
- the log2 operation refers to the base-2 logarithm of the input and can also include rounding the output value down to an integer value.
- the difference between the source row scale srcScale and destination row scale dstScale is used to determine the effective scale value which is used in the row- wise multiplication.
- the k:th value on the destination row j (a[j][k]) is updated by shifting it down by difScale bits to compensate for the modified scale value.
- the destination row value a[j][i] could be shifted down instead of shifting the source row value a[i][i] down to achieve a scale close to 1.
- the source row values could be shifted up to compensate for the modification of the scale parameter.
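The key property behind these shift-based adjustments is that scaling an entire row of the mixture matrix does not change the solution of the underlying system of equations. A small floating-point demonstration (the 2x2 solver and the values are illustrative):

```python
# Scaling a whole row (e.g. shifting it down by difScale bits) leaves
# the solution of the system unchanged; only the dynamic range of the
# intermediate values changes.
def solve2(B):
    # eliminate, then back-substitute a 2-unknown mixture matrix [A | y]
    scale = B[1][0] / B[0][0]
    row1 = [b - scale * a for a, b in zip(B[0], B[1])]
    x1 = row1[2] / row1[1]
    x0 = (B[0][2] - B[0][1] * x1) / B[0][0]
    return [x0, x1]

B = [[2.0, 1.0, 5.0], [4.0, 3.0, 11.0]]
B_scaled = [B[0], [v / 4 for v in B[1]]]   # "shift the destination row down"
assert solve2(B) == solve2(B_scaled) == [2.0, 1.0]
```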
- the mixture matrix represents a system of linear equations
- the rows of the matrix can be further multiplied, or rows scaled with multiplicative operation can be added to other rows to enable additional functionality or to control the dynamic range of the matrix elements.
- the rows of the mixture matrix can be normalized so that the diagonal elements will have a value of 1 (or equivalently 1 << DECIM_BITS, i.e. 2^DECIM_BITS, where DECIM_BITS is the number of bits describing the decimal part of a fixed-point number).
- This may be useful, as later steps of the process require divisions by the diagonal elements.
- once the diagonal elements have been forced to a value of one, the division operations become trivial and can be substituted by taking the value of the numerator as the output of the division operation.
- the result of the triangularization operation can be given as: This can be calculated, for example, using the pseudo code below.
- a diagonal value is determined on the i:th row of the mixture matrix. That value can be clipped to a certain range to avoid divisions by zero. For example, it can be determined to have a minimum value of 1 or 10 in case fixed-point arithmetic is used, and it can be determined to have a different minimum value, such as 0.001 or 0.0001, if floating-point arithmetic is used. This is followed by scaling a set of remaining values of that row with the determined diagVal.
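A fixed-point sketch of this normalization step, with an assumed DECIM_BITS value and illustrative names: after scaling, the diagonal element equals 1 << DECIM_BITS, so later divisions by it reduce to simple shifts or assignments.

```python
# Sketch of row normalization in fixed point: scale the row so that its
# diagonal element becomes ONE (= 1 << DECIM_BITS). DECIM_BITS = 8 is an
# assumed value for illustration.
DECIM_BITS = 8
ONE = 1 << DECIM_BITS

def normalize_row(row, i, zero_thr=1):
    diag = max(row[i], zero_thr)         # clip to avoid division by zero
    return [(v * ONE) // diag for v in row]

row = [4, 2, 12]              # diagonal element at index 0 is 4
nrow = normalize_row(row, 0)  # -> [256, 128, 768]; diagonal is now ONE
```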
- JVET contribution JVET-AB0174 teaches how the division operation can be approximated by determining a scaling parameter (scale), rounding parameter (round) and a bit-shifting parameter (shift).
- the divisions by the diagonal elements a[i][i] can advantageously be substituted with straightforward assignment operations.
- the filter parameter x[N-1] can directly be set equal to the corresponding a[N-1][col] element in the mixture matrix.
- the parameter vector x could also be calculated directly from the mixture matrix if the upper triangular elements of the matrix are also pushed to zero, i.e. the mixture matrix is reduced to a diagonal form.
- the filter parameters can be read from the extension columns N and N+1. This represents a straightforward back-substitution, where the filter parameters are directly set equal to the corresponding elements in the mixture matrix.
- the mixture matrix format is given here as an example to clarify the operations used to calculate the filter coefficient in the suggested way.
- the mixture matrix can be implemented in different ways. For example, instead of arranging the data in a matrix format, the elements can be kept in row or column vector format, or the elements can be kept as separate items. Also, some of the elements can be stored in matrix format and some of the elements can be stored in vector format or as separate elements.
- the autocorrelation matrix A and its triangularized result A’ can be stored in a matrix format where and the cross-correlation vectors y cb and y cr and their new forms y’ cb and y’ cr after triangularization of the autocorrelation matrix A can be stored in a vector format.
- the triangularization of the autocorrelation matrix A using row-based scaling operations can be implemented with additional re-ordering of rows (partial pivoting). Changing the order of rows does not change the outcome of the system of equations. However, such re-ordering may be beneficial for the numerical stability of practical implementations.
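The partial-pivoting variant could be sketched as follows (illustrative names, floating point): before eliminating column i, the remaining row with the largest pivot magnitude is swapped up, which reorders equations without changing the solution.

```python
# Sketch of triangularization with partial pivoting: row re-ordering
# leaves the solution unchanged but improves numerical stability.
def pivot_and_eliminate(B):
    n = len(B)
    for i in range(n):
        # pick the row (at or below i) with the largest pivot magnitude
        p = max(range(i, n), key=lambda r: abs(B[r][i]))
        B[i], B[p] = B[p], B[i]          # partial pivoting: swap rows
        for j in range(i + 1, n):
            s = B[j][i] / B[i][i]
            B[j] = [dj - s * si for si, dj in zip(B[i], B[j])]
    return B

B = [[1.0, 3.0, 5.0],
     [4.0, 2.0, 10.0]]
pivot_and_eliminate(B)   # the row with pivot 4 is moved to the top first
# B is now [[4.0, 2.0, 10.0], [0.0, 2.5, 2.5]]
```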
- a set of cross-component prediction filter coefficients is calculated at least in part by reducing a matrix to its triangular form using a set of row-wise scaling operations.
- a set of filter coefficients is determined using matrix triangularization that includes calculating a scaling parameter between an element on a determined row of the matrix and another row of the matrix, multiplying at least one element of the determined row of the matrix with the scaling parameter and deducting the results from at least one element of the other row of the matrix.
- At least two sets of cross-component prediction filter coefficients are calculated by combining an autocorrelation matrix with two or more cross-correlation vectors and reducing the formed mixture matrix into a triangular form using row-based scaling operations.
- triangularization can be terminated early if the autocorrelation matrix is considered degenerate (i.e., non-invertible). In such a case a fixed set of filter coefficients is produced (for example, each coefficient being zero).
- the method according to an embodiment is shown in Figure 3.
- the method generally comprises processing 310 an input video/image; determining 320 a set of filter coefficients, the determining comprising: determining 330 a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying 340 the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining 350 a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining 360 the set of filter parameters from the triangular form of the mixture matrix; using 370 the set of filter coefficients in a filter.
- An apparatus comprises means for processing an input video/image; means for determining a set of filter coefficients, the determining comprising: means for determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; means for modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; means for determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; means for determining the set of filter parameters from the triangular form of the mixture matrix; means for using the set of filter coefficients in a filter.
- the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
- the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 3 according to various embodiments.
- An example of a data processing system for an apparatus is illustrated in Figure 4.
- the data processing system comprises a main processing unit 100, a memory 102, a storage device 104, an input device 106, an output device 108, and a graphics subsystem 110, which are all connected to each other via a data bus 112.
- the main processing unit 100 is a conventional processing unit arranged to process data within the data processing system.
- the main processing unit 100 may comprise or be implemented as one or more processors or processor circuitry.
- the memory 102, the storage device 104, the input device 106, and the output device 108 may include conventional components as recognized by those skilled in the art.
- the memory 102 and storage device 104 store data in the data processing system 100.
- Computer program code resides in the memory 102 for implementing, for example, a method as illustrated in a flowchart of Figure 3 according to various embodiments.
- the input device 106 inputs data into the system while the output device 108 receives data from the data processing system and forwards the data, for example to a display.
- the data bus 112 is a conventional data bus and, while shown as a single line, it may be any combination of the following: a processor bus, a PCI bus, a graphical bus, an ISA bus. Accordingly, a skilled person readily recognizes that the apparatus may be any data processing device, such as a computer device, a personal computer, a server computer, a mobile phone, a smart phone, or an Internet access device, for example an Internet tablet computer.
- the various embodiments can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the method.
- a device may comprise circuitry and electronics for handling, receiving, and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
- a network device like a server may comprise circuitry and electronics for handling, receiving, and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of various embodiments.
- the different functions discussed herein may be performed in a different order and/or concurrently with each other.
- one or more of the above-described functions and embodiments may be optional or may be combined.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present embodiments relate to an encoding method and to technical equipment for implementing the method. The method comprises processing (310) an input video/image; determining (320) a set of filter coefficients, the determining comprising: determining (330) a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying (340) the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and a destination row in the mixture matrix; determining (350) a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining (360) the set of filter parameters from the triangular form of the mixture matrix; and using (370) the set of filter coefficients in a filter.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FI20226106 | 2022-12-15 | ||
| PCT/FI2023/050619 WO2024126889A1 (fr) | 2022-12-15 | 2023-11-09 | Procédé, appareil et produit-programme d'ordinateur pour codage et décodage vidéo |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP4635182A1 true EP4635182A1 (fr) | 2025-10-22 |
Family
ID=91484886
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP23902870.7A Pending EP4635182A1 (fr) | 2022-12-15 | 2023-11-09 | Procédé, appareil et produit-programme d'ordinateur pour codage et décodage vidéo |
Country Status (6)
| Country | Link |
|---|---|
| EP (1) | EP4635182A1 (fr) |
| JP (1) | JP2025539647A (fr) |
| KR (1) | KR20250119647A (fr) |
| CN (1) | CN120419186A (fr) |
| MX (1) | MX2025006461A (fr) |
| WO (1) | WO2024126889A1 (fr) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4676039A1 (fr) * | 2024-07-02 | 2026-01-07 | InterDigital CE Patent Holdings, SAS | Régularisation pour dériver des modèles convolutifs pour une prédiction de bloc |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019107994A1 (fr) * | 2017-11-29 | 2019-06-06 | 한국전자통신연구원 | Procédé et dispositif de codage/décodage d'images employant un filtrage en boucle |
2023
- 2023-11-09 EP EP23902870.7A patent/EP4635182A1/fr active Pending
- 2023-11-09 WO PCT/FI2023/050619 patent/WO2024126889A1/fr not_active Ceased
- 2023-11-09 KR KR1020257023551A patent/KR20250119647A/ko active Pending
- 2023-11-09 JP JP2025535118A patent/JP2025539647A/ja active Pending
- 2023-11-09 CN CN202380086267.4A patent/CN120419186A/zh active Pending
2025
- 2025-06-03 MX MX2025006461A patent/MX2025006461A/es unknown
Also Published As
| Publication number | Publication date |
|---|---|
| JP2025539647A (ja) | 2025-12-05 |
| MX2025006461A (es) | 2025-07-01 |
| WO2024126889A1 (fr) | 2024-06-20 |
| CN120419186A (zh) | 2025-08-01 |
| KR20250119647A (ko) | 2025-08-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20150010068A1 (en) | Method, device, and computer program for pre-encoding and post-decoding high bit-depth content in video encoder and decoder | |
| CN121239841A (zh) | 编码方法、解码方法、编码器、解码器以及存储介质 | |
| CN108353186A (zh) | 用于空间帧内预测的具有帮助数据的视频编码 | |
| US20240187594A1 (en) | Method And An Apparatus for Encoding and Decoding of Digital Image/Video Material | |
| US20250220189A1 (en) | A method, an apparatus and a computer program product for encoding and decoding of digital media content | |
| US20220286709A1 (en) | Methods of coding images/videos with alpha channels | |
| WO2024126889A1 (fr) | Procédé, appareil et produit-programme d'ordinateur pour codage et décodage vidéo | |
| US20250193410A1 (en) | Flexible scaling factors for joint mvd coding | |
| US12537972B2 (en) | Bilateral matching based scaling factor derivation for JMVD | |
| EP3672241A1 (fr) | Procédé, appareil et produit programme informatique pour codage et décodage vidéo | |
| WO2024074753A1 (fr) | Appareil, procédé et programme informatique pour le codage et le décodage de vidéo | |
| US20250350747A1 (en) | A method, an apparatus and a computer program product for encoding and decoding of digital media content | |
| US20250330622A1 (en) | A method, an apparatus and a computer program product for video coding | |
| US12368890B2 (en) | Method, an apparatus and a computer program product for video encoding and video decoding | |
| WO2025003557A1 (fr) | Procédé, appareil et produit-programme informatique de codage et de décodage vidéo | |
| WO2025103656A1 (fr) | Procédé, appareil et produit-programme informatique de codage et de décodage vidéo | |
| WO2025153222A1 (fr) | Déduction de la plage d'écrêtage pour une prédiction linéaire | |
| HK40055509A (en) | A method and an apparatus for encoding and decoding of digital image/video material |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
| 17P | Request for examination filed |
Effective date: 20250715 |
|
| AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR |