EP4635182A1 - A method, an apparatus and a computer program product for video encoding and decoding - Google Patents

A method, an apparatus and a computer program product for video encoding and decoding

Info

Publication number
EP4635182A1
Authority
EP
European Patent Office
Prior art keywords
row
mixture matrix
matrix
determining
determined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP23902870.7A
Other languages
German (de)
French (fr)
Inventor
Jani Lainema
Pekka Astola
Ramin GHAZNAVI YOUVALARI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of EP4635182A1 publication Critical patent/EP4635182A1/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop

Definitions

  • the present solution generally relates to video encoding and video decoding.
  • the present solution relates to determining a set of filter parameters used in encoding/decoding.
  • Background This section is intended to provide a background or context to the invention that is recited in the claims.
  • the description herein may include concepts that could be pursued but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
  • a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
  • an apparatus comprising means for processing an input video/image; means for determining a set of filter coefficients; where a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector is determined; the determined mixture matrix is modified to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; a modified destination row in the mixture matrix is determined by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; the set of filter parameters is determined from the triangular form of the mixture matrix; the apparatus comprising means for using the set of filter coefficients in a filter.
  • a method comprising: processing an input video/image; determining a set of filter coefficients, the determining comprising determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining the set of filter parameters from the triangular form of the mixture matrix; using the set of filter coefficients in a filter.
  • an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: process an input video/image; determine a set of filter coefficients, comprising determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining the set of filter parameters from the triangular form of the mixture matrix; use the set of filter coefficients in a filter.
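The row-based elimination and back-substitution described above can be sketched in floating point (the actual codec would use fixed-point integer arithmetic; all names here are illustrative, not the patent's implementation):

```python
def solve_filter_coefficients(autocorr, crosscorr):
    """Solve A x = b by reducing the mixture matrix [A | b] to upper-triangular
    form with row operations, then back-substituting (floating-point sketch)."""
    n = len(autocorr)
    # Build the mixture matrix: the autocorrelation matrix with the
    # cross-correlation vector appended as an extra column.
    m = [row[:] + [c] for row, c in zip(autocorr, crosscorr)]
    # Forward elimination: determine a scale parameter between the source
    # row and each destination row below it, multiply the source row by the
    # scale and deduct it from the destination row.
    for src in range(n):
        for dst in range(src + 1, n):
            scale = m[dst][src] / m[src][src]
            m[dst] = [d - scale * s for d, s in zip(m[dst], m[src])]
    # Back-substitution on the triangular form yields the filter coefficients.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = m[i][n] - sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = acc / m[i][i]
    return coeffs
```

With two cross-correlation vectors (e.g., one for Cb and one for Cr), the same elimination can be run once on a matrix with two appended columns, followed by one back-substitution pass per column.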
  • the mixture matrix is modified by respective means by scaling at least one value on a determined row using a scaling parameter, a rounding parameter and a bit-shifting parameter, where the scaling parameter, the rounding parameter and the bit-shifting parameter are determined from a diagonal element of the mixture matrix on the determined row.
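The scale/round/shift mechanics mentioned above could be illustrated as follows; the SHIFT value and helper names are assumptions for the sketch, not the codec's actual parameters:

```python
SHIFT = 16  # bit-shifting parameter (illustrative precision)

def fixed_point_params(diag):
    """Derive an integer scaling parameter, rounding parameter and
    bit-shifting parameter from a diagonal element (illustrative)."""
    scale = ((1 << SHIFT) + diag // 2) // diag  # approximate reciprocal of diag
    rounding = 1 << (SHIFT - 1)
    return scale, rounding, SHIFT

def scale_value(v, scale, rounding, shift):
    """Multiply-add-shift replaces an explicit division by the diagonal."""
    return (v * scale + rounding) >> shift
```

The point of the construction is that the division happens once per row (when deriving the scale), after which every value on the row can be normalized with integer multiplies and shifts only.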
  • the set of filter coefficients is used in convolutional cross-component prediction, where luma samples are used as input and predicted chroma samples are used as output.
  • the mixture matrix is determined to include two cross-correlation vectors, wherein a first cross-correlation vector relates to cross-correlation between the luma samples and first chroma samples, and wherein a second cross-correlation vector relates to cross-correlation between the luma samples and second chroma samples.
  • the set of filter parameters is determined from the triangular form of the mixture matrix using a back-substitution process.
  • the back-substitution comprises setting at least one of the filter parameters equal to an element in the mixture matrix.
  • when modifying the determined mixture matrix to the triangular form, it is checked, at least for one diagonal sample, whether the value of said diagonal sample or the absolute value of said diagonal sample is below a predetermined threshold value, and when so, the value of said diagonal sample is set equal to the predetermined threshold value.
  • when modifying the determined mixture matrix to the triangular form, it is checked, at least for one diagonal sample, whether the value of said diagonal sample or the absolute value of said diagonal sample is below a predetermined threshold value, and when so, the modifying process is terminated and the filter coefficients are set to a predetermined set of values.
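The two diagonal safeguards above (clamping the diagonal, or terminating with fallback coefficients) might be sketched as follows; the threshold value and function name are illustrative assumptions:

```python
MIN_DIAG = 1  # predetermined threshold value (illustrative)

def safeguard_diagonal(m, row, clamp=True):
    """Pre-emptive check on a diagonal sample of the mixture matrix.
    If its absolute value falls below the threshold, either clamp it to the
    threshold, or report failure so the caller can terminate elimination and
    fall back to a predetermined set of filter coefficients."""
    if abs(m[row][row]) < MIN_DIAG:
        if not clamp:
            return False  # caller terminates and uses fallback coefficients
        m[row][row] = MIN_DIAG
    return True
```

Either variant guarantees that later row scalings and the back-substitution stage never divide by zero.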
  • the processing is encoding or decoding.
  • the computer program product is embodied on a non-transitory computer readable medium.
  • Description of the Drawings: In the following, various embodiments will be described in more detail with reference to the appended drawings, in which Fig. 1 shows an encoding process according to an embodiment; Fig. 2 shows a decoding process according to an embodiment; Fig. 3 is a flowchart illustrating a method according to an embodiment; and Fig. 4 shows an apparatus according to an embodiment.
  • Description of Example Embodiments: The following description and drawings are illustrative and are not to be construed as unnecessarily limiting. The specific details are provided for a thorough understanding of the disclosure.
  • references to "one embodiment" or "an embodiment" in the present disclosure can be, but are not necessarily, references to the same embodiment, and such references mean at least one of the embodiments.
  • Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure.
  • several embodiments will be described in the context of one video coding arrangement. It is to be noted, however, that the present embodiments are not necessarily limited to this particular arrangement.
  • the embodiments relate to cross-component filter parameter calculation with row- based matrix operations.
  • the Advanced Video Coding standard (which may be abbreviated AVC or H.264/AVC) was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardization Sector of International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).
  • the H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC).
  • extensions to H.264/AVC include Scalable Video Coding (which may be abbreviated SVC) and Multiview Video Coding (which may be abbreviated MVC).
  • the High Efficiency Video Coding standard (which may be abbreviated HEVC or H.265/HEVC) was developed by the Joint Collaborative Team - Video Coding (JCT-VC). The standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, also known as MPEG-H Part 2 High Efficiency Video Coding (HEVC).
  • Extensions to H.265/HEVC include scalable, multiview, three-dimensional, and fidelity range extensions, which may be referred to as SHVC, MV-HEVC, 3D-HEVC, and REXT, respectively.
  • the references in this description to H.265/HEVC, SHVC, MV-HEVC, 3D-HEVC and REXT that have been made for the purpose of understanding definitions, structures or concepts of these standard specifications are to be understood to be references to the latest versions of these standards that were available before the date of this application, unless otherwise indicated.
  • Versatile Video Coding (which may be abbreviated VVC, H.266, or H.266/VVC) is a video compression standard developed as the successor to HEVC.
  • VVC is specified in ITU-T Recommendation H.266 and equivalently in ISO/IEC 23090-3, which is also referred to as MPEG-I Part 3.
  • a specification of the AV1 bitstream format and decoding process was developed by the Alliance for Open Media (AOM). The AV1 specification was published in 2018. AOM is reportedly working on the AV2 specification.
  • a video codec may comprise an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the compressed representation may be referred to as a bitstream or a video bitstream.
  • a video encoder and/or a video decoder may also be separate from each other, i.e., need not form a codec.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
  • the notation “(de)coder” means an encoder and/or a decoder.
  • Hybrid video codecs, for example ITU-T H.263, H.264/AVC and HEVC, may encode the video information in two phases. At first, pixel values in a certain picture area (or "block") are predicted for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner).
  • Secondly, the prediction error, i.e., the difference between the predicted block of pixels and the original block of pixels, is coded.
  • This may be done by transforming the difference in pixel values using a specified transform (e.g., Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients.
  • By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
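The transform-and-quantize step described above can be sketched with a 1-D DCT-II and uniform quantization; this is an illustrative toy, not any standard's exact integer transform:

```python
import math

def dct_1d(block):
    """Orthonormal DCT-II of a 1-D residual block (illustrative)."""
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(block))
        s *= math.sqrt((1 if k == 0 else 2) / n)
        out.append(s)
    return out

def quantize(coeffs, qstep):
    """Uniform quantization: a larger qstep discards more detail
    (lower bitrate, lower quality)."""
    return [round(c / qstep) for c in coeffs]
```

A flat residual block concentrates all its energy in the DC coefficient, which is why the transform tends to decorrelate the signal and make the quantized coefficients cheap to entropy code.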
  • Encoding process according to an embodiment is illustrated in Figure 1.
  • Figure 1 illustrates an image to be encoded (In); a predicted representation of an image block (P'n); a prediction error signal (Dn); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); a transform (T) and inverse transform (T⁻¹); a quantization (Q) and inverse quantization (Q⁻¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F).
  • In video codecs such as H.265/HEVC, video pictures are divided into coding units (CU) covering the area of the picture.
  • a CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU.
  • CU may consist of a square block of samples with a size selectable from a predefined set of possible CU sizes.
  • a CU with the maximum allowed size may be named as LCU (largest coding unit) or CTU (coding tree unit), and the video picture is divided into non-overlapping CTUs.
  • a CTU can be further split into a combination of smaller CUs, e.g., by recursively splitting the CTU and resultant CUs.
  • Each resulting CU may have at least one PU and at least one TU associated with it.
  • Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively.
  • Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g., motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs).
  • each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including e.g., DCT coefficient information). It may be signaled at CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered there are no TUs for the said CU.
  • the division of the image into CUs, and division of CUs into PUs and TUs may be signaled in the bitstream allowing the decoder to reproduce the intended structure of these units.
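The recursive splitting of a CTU into smaller CUs described above can be sketched as a quadtree; the split-decision function and names here are hypothetical:

```python
def split_ctu(x, y, size, min_size, should_split):
    """Recursively quadtree-split a CTU into CUs (illustrative).
    `should_split` stands in for the encoder's rate-distortion decision."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += split_ctu(x + dx, y + dy, half, min_size, should_split)
        return cus
    # Leaf: this block becomes a CU, reported as (x, y, size).
    return [(x, y, size)]
```

The decoder reproduces the same partitioning by reading the split flags from the bitstream instead of evaluating a decision function.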
  • An elementary unit for the input to an encoder and the output of a decoder, respectively, in most cases is a picture.
  • a picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture or a reconstructed picture.
  • the source and decoded pictures are each comprised of one or more sample arrays, such as one of the following sets of sample arrays: Luma (Y) only (monochrome); Luma and two chroma (YCbCr or YCgCo); Green, Blue and Red (GBR, also known as RGB); or arrays representing other unspecified monochrome or tri-stimulus color samplings (for example, YZX, also known as XYZ).
  • these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr; regardless of the actual color representation method in use.
  • the actual color representation method in use can be indicated e.g., in a coded bitstream e.g., using the Video Usability Information (VUI) syntax of HEVC or alike.
  • a component may be defined as an array or single sample from one of the three sample arrays (luma and two chroma) or the array or a single sample of the array that compose a picture in monochrome format.
  • a picture may be defined to be either a frame or a field.
  • a frame comprises a matrix of luma samples and possibly the corresponding chroma samples.
  • a field is a set of alternate sample rows of a frame and may be used as encoder input, when the source signal is interlaced. Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or chroma sample arrays may be subsampled when compared to luma sample arrays.
  • the decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (inverse operation of the prediction error coding recovering the quantized prediction error signal in spatial pixel domain). After applying prediction and prediction error decoding means the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame.
  • the decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence. Decoding process according to an embodiment is illustrated in Figure 2.
  • Figure 2 illustrates a predicted representation of an image block (P'n); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
  • a color palette-based coding can be used.
  • Palette-based coding refers to a family of approaches for which a palette, i.e., a set of colors and associated indexes, is defined and the value for each sample within a coding unit is expressed by indicating its index in the palette.
  • Palette-based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas which are representing computer screen content, like text or simple graphics).
  • palette index prediction approaches can be utilized, or the palette indexes can be run length coded to be able to represent larger homogenous image areas efficiently.
  • escape coding can be utilized.
  • Escape coded samples are transmitted without referring to any of the palette indexes. Instead, their values are indicated individually for each escape coded sample.
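Palette coding with escape samples, as described above, might be sketched as follows (the function and representation are illustrative, not any standard's syntax):

```python
def encode_palette_block(samples, palette):
    """Map each sample to a palette index; samples not covered by the
    palette are escape coded, i.e. their value is transmitted directly."""
    out = []
    for s in samples:
        if s in palette:
            out.append(("index", palette.index(s)))
        else:
            out.append(("escape", s))
    return out
```

In a real codec the resulting index map would additionally be compressed, e.g., with index prediction or run-length coding, to represent large homogeneous areas efficiently.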
  • the motion information may be indicated with motion vectors associated with each motion compensated image block in video codecs. Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures. In order to represent motion vectors efficiently those may be coded differentially with respect to block specific predicted motion vectors.
  • the predicted motion vectors may be created in a predefined way, for example calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
  • Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor.
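The median predictor and differential motion vector coding described above could be sketched as (names illustrative):

```python
def median_mv_predictor(neighbors):
    """Predict a motion vector as the component-wise median of the
    motion vectors of adjacent blocks (illustrative; picks the upper
    median for an even number of neighbors)."""
    xs = sorted(mv[0] for mv in neighbors)
    ys = sorted(mv[1] for mv in neighbors)
    mid = len(neighbors) // 2
    return xs[mid], ys[mid]

def mv_difference(mv, pred):
    """Only the difference to the predictor is entropy coded."""
    return mv[0] - pred[0], mv[1] - pred[1]
```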
  • the reference index of previously coded/decoded picture can be predicted.
  • the reference index may be predicted from adjacent blocks and/or co-located blocks in a temporal reference picture.
  • high efficiency video codecs may employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction.
  • a bitstream may be defined as a sequence of bits or a sequence of syntax structures.
  • a bitstream format may constrain the order of syntax structures in the bitstream.
  • a syntax element may be defined as an element of data represented in the bitstream.
  • a syntax structure may be defined as zero or more syntax elements present together in the bitstream in a specified order.
  • a bitstream may be in the form of a network abstraction layer (NAL) unit stream or a byte stream, which forms the representation of coded pictures and associated data forming one or more coded video sequences.
  • NAL unit may be defined as a syntax structure containing an indication of the type of data to follow and bytes containing that data in the form of an RBSP interspersed as necessary with start code emulation prevention bytes.
  • a raw byte sequence payload (RBSP) may be defined as a syntax structure containing an integer number of bytes that is encapsulated in a NAL unit.
  • An RBSP is either empty or has the form of a string of data bits containing syntax elements followed by an RBSP stop bit and followed by zero or more subsequent bits equal to 0.
  • a NAL unit comprises a header and a payload.
  • the NAL unit header indicates the type of the NAL unit among other things.
  • a bitstream may comprise a sequence of open bitstream units (OBUs).
  • OBU comprises a header and a payload, wherein the header identifies a type of the OBU.
  • the header may comprise a size of the payload in bytes.
  • the phrase along the bitstream (e.g., indicating along the bitstream) or along a coded unit of a bitstream (e.g., indicating along a coded tile) may be used in claims and described embodiments to refer to transmission, signaling, or storage in a manner that the "out-of-band" data is associated with but not included within the bitstream or the coded unit, respectively.
  • the phrase decoding along the bitstream or along a coded unit of a bitstream or alike may refer to decoding the referred out-of-band data (which may be obtained from out-of-band transmission, signaling, or storage) that is associated with the bitstream or the coded unit, respectively.
  • the phrase along the bitstream may be used when the bitstream is contained in a container file, such as a file conforming to the ISO Base Media File Format, and certain file metadata is stored in the file in a manner that associates the metadata to the bitstream, such as boxes in the sample entry for a track containing the bitstream, a sample group for the track containing the bitstream, or a timed metadata track associated with the track containing the bitstream.
  • Video codecs may support motion compensated prediction from one source image (uni-prediction) or two sources (bi-prediction). In the case of uni-prediction a single motion vector is applied, whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from two sources are averaged to create the final sample prediction.
  • the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
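Bi-prediction averaging with adjustable weights and an offset, as described above, might look like the following integer sketch (weights, shift, and names are illustrative):

```python
def bi_predict(p0, p1, w0=1, w1=1, offset=0, shift=1):
    """Blend two motion-compensated prediction blocks sample by sample.
    With w0 = w1 = 1 and shift = 1 this is a plain rounded average;
    other weights/offsets implement weighted bi-prediction."""
    rounding = 1 << (shift - 1)
    return [((w0 * a + w1 * b + rounding) >> shift) + offset
            for a, b in zip(p0, p1)]
```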
  • similar approach can be applied to intra picture prediction.
  • the displacement vector indicates where from the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded.
  • This kind of intra block copying methods can improve the coding efficiency substantially in presence of repeating structures within the frame – such as text or other graphics.
  • the prediction residual after motion compensation or intra prediction may be first transformed with a transform kernel (like DCT) and then coded. The reason for this is that there often still exists some correlation within the residual, and the transform can in many cases help reduce this correlation and provide more efficient coding.
  • Video encoders may utilize Lagrangian cost functions to find optimal coding modes, e.g., the desired Macroblock mode and associated motion vectors.
  • This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area: C = D + λR, where C is the Lagrangian cost to be minimized, D is the image distortion (e.g., Mean Squared Error) with the mode and motion vectors considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
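Mode selection with such a Lagrangian cost function can be sketched as follows; the candidate dictionary structure is a hypothetical representation:

```python
def best_mode(candidates, lam):
    """Pick the coding mode minimizing the Lagrangian cost C = D + lambda * R,
    where D is the distortion and R the rate in bits for each candidate."""
    return min(candidates, key=lambda c: c["D"] + lam * c["R"])
```

A small λ favors low distortion (high quality, many bits), while a large λ favors low rate; the encoder typically derives λ from the quantization parameter.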
  • Scalable video coding refers to a coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions, or frame rates.
  • the receiver can extract the desired representation depending on its characteristics (e.g., resolution that matches best the display device).
  • a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g., the network characteristics or processing capabilities of the receiver.
  • a scalable bitstream may consist of a “base layer” providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer may depend on the lower layers.
  • the motion and mode information of the enhancement layer can be predicted from lower layers.
  • the pixel data of the lower layers can be used to create prediction for the enhancement layer.
  • a scalable video codec for quality scalability (also known as Signal-to-Noise or SNR scalability) and/or spatial scalability may be implemented as follows.
  • For a base layer, a conventional non-scalable video encoder and decoder is used.
  • the reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer.
  • the base layer decoded pictures may be inserted into a reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer. Consequently, the encoder may choose a base-layer reference picture as inter prediction reference and indicate its use e.g., with a reference picture index in the coded bitstream.
  • the decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as inter prediction reference for the enhancement layer.
  • a decoded base-layer picture is used as prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.
  • Spatial scalability: base layer pictures are coded at a lower resolution than enhancement layer pictures.
  • Bit-depth scalability: base layer pictures are coded at lower bit-depth (e.g., 8 bits) than enhancement layer pictures (e.g., 10 or 12 bits).
  • Chroma format scalability: enhancement layer pictures provide higher fidelity in chroma (e.g., coded in 4:4:4 chroma format) than base layer pictures (e.g., 4:2:0 format).
  • base layer information could be used to code enhancement layer to minimize the additional bitrate overhead. Scalability can be enabled in two basic ways.
  • a reference frame-based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.
  • images can be split into independently codable and decodable image segments (slices or tiles).
  • "Slices" in this description may refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while "tiles" may refer to image segments that have been defined as rectangular image regions that are processed at least to some extent as individual frames.
  • Video may be encoded in the YUV or YCbCr color space, since it is found to reflect some characteristics of the human visual system and allows lower quality representation for the Cb and Cr channels, as human perception is less sensitive to the chrominance fidelity those channels represent.
  • Some video coding tools perform filtering operations which convolve a set of reference samples with a set of filter parameters to output, for example, a predicted value for a certain sample in a picture. In some cases, the filter parameters may be predetermined or signaled in the bitstream.
  • CCLM cross-component linear model
  • CCCM cross-component convolutional model
  • the CCLM of VVC/H.266 calculates its parameters by identifying two luma sample values and corresponding chroma sample values, followed by determination of slope and offset parameters using those two sample pairs.
  • A different variant of CCLM is used in the Enhanced Compression Model 6 (ECM 6). In that case, a least-mean-square method is used to find the parameters.
  • ECM 6 Enhanced Compression Model 6
  • autocorrelation matrix is generated and decomposed into triangular matrices using a method resembling an LDL decomposition, which is followed by solving the filter coefficients by a sequence including two back-substitution stages and a scaling stage.
  • the present embodiments are targeted to determine a set of filter parameters by generating a mixture matrix that combines an autocorrelation matrix with one or more cross-correlation vectors and reducing the generated mixture matrix to a diagonal form using row-wise matrix operations.
  • the process is concluded by a single set of back-substitution operations for each set of filter parameters that are being solved.
  • a pre-emptive mechanism applied after a round of iterations of row-wise matrix operations to avoid divisions by zero during future iterations and during the back-substitution stage is disclosed.
  • said set of filter parameters can be used for example by cross-component prediction filters of video and image codecs.
  • the resulting value p can represent for example a predicted sample value in one of the chrominance channels and can be calculated as the dot product of the filter parameters x and the input samples z: p = xᵀz.
  • Input vector z can be configured to include for example luminance sample values, functions of luminance sample values or constants, or a combination of those.
  • Including a constant in input vector z corresponds to adding a constant to the output of filter p.
  • This kind of a constant can be referred to as a bias term or a bias parameter and can be used to represent offsets between input and output values.
  • Items in the autocorrelation matrix A (size N×N) and the cross-correlation vector y with N values can be calculated for example as A = RᵀR and y = Rᵀs, i.e., A[i][j] = Σₘ R[m][i]·R[m][j] and y[i] = Σₘ R[m][i]·s[m], where M is the number of training vectors included in the process, the R matrix contains the input training vectors as its rows, and s represents a vector with the output training samples.
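The accumulation of the correlation statistics above can be sketched as follows. This is a minimal floating-point illustration (the function name `correlation_stats` is chosen for this sketch and is not from the description); the names R, s, A and y follow the notation of this description.

```python
def correlation_stats(R, s):
    """Accumulate autocorrelation matrix A = R^T R and cross-correlation
    vector y = R^T s from M training vectors (rows of R) and the
    corresponding output training samples s."""
    M = len(R)          # number of training vectors
    N = len(R[0])       # number of filter taps
    A = [[0.0] * N for _ in range(N)]
    y = [0.0] * N
    for m in range(M):
        for i in range(N):
            for j in range(N):
                A[i][j] += R[m][i] * R[m][j]   # A[i][j] = sum_m R[m][i]*R[m][j]
            y[i] += R[m][i] * s[m]             # y[i] = sum_m R[m][i]*s[m]
    return A, y
```

Regularization terms, as mentioned below, could be added to the diagonal of A after this accumulation.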
  • the autocorrelation matrix A and the cross-correlation vector y can be calculated in different ways, for example, as described in JVET input contribution JVET-AB0174.
  • the autocorrelation matrix A can also include regularization terms that are added to the diagonal elements of the matrix.
  • a regularization term can be determined to be a constant value, such as 1 or 10, but also other values can be used depending on use case specific requirements for such regularization.
  • a mixture matrix B can be created that contains elements of both the autocorrelation matrix A and the cross-correlation vector y, for example B = [ A | y ], where the values of y are appended as an additional column of A. This can be triangularized (also referred to as “diagonalized”) using a Gaussian-elimination-like method by multiplying elements on a specific row and deducting the resulting values from another row until the mixture matrix B has been reduced to its upper triangular form B’ where all elements below the main diagonal are zeros. Parameters of two or more filters can be calculated simultaneously by creating a mixture matrix containing multiple cross-correlation vectors, for example B = [ A | ycb | ycr ].
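The mixture-matrix construction and the row-wise elimination described above can be sketched as follows. This is a floating-point toy without the zero-valued-denominator safeguards discussed elsewhere in this description; the function names are illustrative.

```python
def build_mixture_matrix(A, ys):
    """Append one or more cross-correlation vectors (list ys) as extra
    columns of the autocorrelation matrix A."""
    N = len(A)
    return [A[i][:] + [y[i] for y in ys] for i in range(N)]

def triangularize(B, N):
    """Reduce the N leftmost columns of mixture matrix B to upper
    triangular form using Gaussian-elimination-like row operations:
    scale the source row i and deduct it from each destination row j."""
    cols = len(B[0])
    for i in range(N):                 # source row
        for j in range(i + 1, N):     # destination rows below it
            scale = B[j][i] / B[i][i]
            for k in range(i, cols):
                B[j][k] -= scale * B[i][k]
    return B
```

Appending several cross-correlation vectors means the single elimination pass transforms all of them at once, which is how parameters of two or more filters can be solved from one triangularization.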
  • the multiplication can be implemented using a fixed-point implementation and the division operation can be implemented using an approximated fixed-point approach proposed in JVET document JVET-AB0174.
  • row-wise pointers src and dst can be used to illustrate the row-based nature of the process.
  • src points to the i:th row of the mixture matrix
  • dst points to each of the rows below it one after another.
  • a scale value may be calculated based on the ratio of the i:th element on destination and source rows.
  • the clamping can be expressed as a[i+1][i+1] = max(zeroThr, a[i+1][i+1]), or in pseudo code:
    Iterate over rows of a mixture matrix (source row) {
      Iterate over rows below the source row (destination row) {
        Calculate a scale between destination row and source row
        Multiply source row with the scale
        Deduct the multiplied source row from the destination row
      }
      Set the diagonal element of the next source row to a value not smaller than zeroThr
    }
    This guarantees the diagonal element a[i+1][i+1] on the next source row i+1 to have at least a value of zeroThr.
  • zeroThr can be determined to be for example 1, or it can be determined to be some other positive number larger than zero.
  • the absolute value of the diagonal element on the next source row can be used in comparison with the threshold value zeroThr.
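The pre-emptive clamping can be sketched as follows (an illustrative floating-point version; a fixed-point implementation would use an integer zeroThr such as 1, and the absolute-value variant is indicated as a comment):

```python
def triangularize_clamped(B, N, zero_thr=1.0):
    """Row-wise elimination with a pre-emptive clamp of the next
    diagonal element, avoiding divisions by zero during future
    iterations and during back-substitution."""
    cols = len(B[0])
    for i in range(N):
        for j in range(i + 1, N):
            scale = B[j][i] / B[i][i]
            for k in range(i, cols):
                B[j][k] -= scale * B[i][k]
        # clamp the diagonal element of the next source row
        if i + 1 < N and B[i + 1][i + 1] < zero_thr:
            # variant: compare abs(B[i+1][i+1]) against zero_thr instead
            B[i + 1][i + 1] = zero_thr
    return B
```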
  • it is possible that the autocorrelation matrix A is degenerate (or non-invertible) and no solution to the system of equations exists. If at any point during triangularization a row in the autocorrelation matrix A becomes all zeros (or equivalently, if each element of said row is less than zeroThr) while the respective elements in the cross-correlation vectors are non-zero (or larger than zeroThr), then the autocorrelation matrix A can be considered degenerate, and the elimination can be terminated early. In such a case, a fixed set of filter coefficients (for example all zeros) can be output.
  • all the division operations use diagonal elements a[i][i] of the triangularized mixture matrix as denominators, and no additional checks for zero-valued denominators need to be performed if those were done during the triangularization phase as suggested above.
  • the column parameter col determines which column of the triangularized mixture matrix is used as the target vector in the back-substitution process. If, following the examples above, cross-correlation vector ycb is included as the N:th column of the mixture matrix and cross-correlation vector ycr as the N+1:th column of the mixture matrix, col can be set to N when solving xcb coefficients for the filter used for predicting samples of a first color component Cb, and col can be set to N+1 when solving xcr coefficients for the filter used for predicting samples of a second color component Cr.
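A minimal back-substitution over the upper triangular mixture matrix can be sketched as follows (floating point; with 0-based indexing the first cross-correlation column sits at index N, so col would be N for Cb and N+1 for Cr):

```python
def back_substitute(B, N, col):
    """Solve filter parameters x from the upper triangular mixture
    matrix B, using column `col` as the target vector."""
    x = [0.0] * N
    for i in range(N - 1, -1, -1):
        acc = B[i][col]
        for j in range(i + 1, N):
            acc -= B[i][j] * x[j]
        x[i] = acc / B[i][i]   # denominator is always a diagonal element
    return x
```

Running this once per column solves one set of filter parameters per cross-correlation vector from the same triangularized matrix.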
  • dynamic range of the intermediate values of the mixture matrix can be controlled in different ways.
  • values on a specific row of the mixture matrix can be shifted up or down to avoid overflows that can happen when the scale parameter gets large due to differences between the values on the source row i and destination row j.
  • An example of such control mechanism is given in the pseudo code below.
  • the log2 operation refers to the base-2 logarithm of the input and can also include rounding the output value down to an integer value.
  • the difference between the source row scale srcScale and destination row scale dstScale is used to determine the effective scale value which is used in the row- wise multiplication.
  • the k:th value on the destination row j (a[j][k]) is updated by shifting it down by difScale bits to compensate for the modified scale value.
  • the destination row value a[j][i] could be shifted down instead of shifting the source row value a[i][i] down to achieve a scale close to 1.
  • the source row values could be shifted up to compensate for the modification of the scale parameter.
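The shift-based range control can be illustrated roughly as follows. This is a hypothetical integer sketch of the idea only, not the ECM implementation; the names srcScale, dstScale and difScale follow the description, and floor(log2) is taken via `bit_length`.

```python
def eliminate_row_with_range_control(a, i, j, cols):
    """Update destination row j using source row i, shifting the
    destination row down by difScale bits so that the effective
    scale used in the row-wise multiplication stays close to 1."""
    src_scale = max(a[i][i], 1).bit_length() - 1   # floor(log2) of source pivot
    dst_scale = max(a[j][i], 1).bit_length() - 1   # floor(log2) of destination element
    dif_scale = max(dst_scale - src_scale, 0)
    # effective scale computed from the down-shifted destination element
    scale = (a[j][i] >> dif_scale) // max(a[i][i], 1)
    for k in range(i, cols):
        # shift destination value down by difScale bits to compensate
        a[j][k] = (a[j][k] >> dif_scale) - scale * a[i][k]
```

Since the mixture matrix represents a system of linear equations, scaling a whole row this way does not change the solution, only the dynamic range of the intermediate values.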
  • the mixture matrix represents a system of linear equations
  • the rows of the matrix can be further multiplied, or rows scaled with multiplicative operation can be added to other rows to enable additional functionality or to control the dynamic range of the matrix elements.
  • the rows of the mixture matrix can be normalized so that the diagonal elements will have a value of 1 (or equivalently 1 << DECIM_BITS, i.e., 2^DECIM_BITS, where DECIM_BITS is the number of bits describing the decimal part of a fixed-point number).
  • This may be useful, as later steps of the process require divisions by the diagonal elements.
  • once the diagonal elements have been forced to have values of one, the division operations become trivial, and the divisions can be substituted by taking the value of the numerator as the output of the division operation.
  • the result of the triangularization operation can be given as an upper triangular mixture matrix whose diagonal elements are equal to one. This can be calculated, for example, using the pseudo code below.
  • a diagonal value is determined on the i:th row of the mixture matrix. That value can be clipped to a certain range to avoid divisions by zero. For example, it can be determined to have a minimum value of 1 or 10 in the case fixed point arithmetic is used and it can be determined to have a different minimum value, such as 0.001 or 0.0001, if floating point arithmetic is used. This is followed by scaling a set of remaining values of that row with the determined diagVal.
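The normalizing variant can be sketched as follows (a floating-point illustration; the clip value 1e-4 follows the floating-point example above, and diag_val mirrors the diagVal of the description):

```python
def triangularize_normalized(B, N, min_diag=1e-4):
    """Elimination that also scales each source row by its (clipped)
    diagonal value so the diagonal becomes 1; subsequent divisions by
    diagonal elements then become trivial assignments."""
    cols = len(B[0])
    for i in range(N):
        diag_val = max(B[i][i], min_diag)   # clip to avoid division by zero
        for k in range(i, cols):
            B[i][k] /= diag_val             # diagonal element becomes 1
        for j in range(i + 1, N):
            scale = B[j][i]                 # division by the unit diagonal is trivial
            for k in range(i, cols):
                B[j][k] -= scale * B[i][k]
    return B
```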
  • JVET contribution JVET-AB0174 teaches how the division operation can be approximated by determining a scaling parameter (scale), rounding parameter (round) and a bit-shifting parameter (shift).
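The general shape of such a division approximation can be illustrated as follows. This is a generic reciprocal-multiply sketch under an assumed 16-bit precision, not the actual JVET-AB0174 formulation; it replaces n / d by a multiplication with a precomputed scale followed by an addition of a rounding term and a bit-shift.

```python
SHIFT = 16  # assumed fixed-point precision for this sketch

def div_params(d):
    """Precompute (scale, round, shift) so that n // d is approximated
    by (n * scale + round) >> shift for a positive divisor d."""
    scale = (1 << SHIFT) // d
    rnd = 1 << (SHIFT - 1)
    return scale, rnd, SHIFT

def approx_div(n, params):
    scale, rnd, shift = params
    return (n * scale + rnd) >> shift
```

Precomputing the parameters once per divisor (here, once per diagonal element) lets every subsequent division on that row be replaced with a multiply-add-shift sequence.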
  • the divisions by the diagonal elements a[i][i] can be advantageously substituted with straight-forward assignment operations.
  • the filter parameter x[N-1] can directly be set equal to the corresponding a[N-1][col] element in the mixture matrix.
  • the parameter vector x could also be calculated directly from the mixture matrix if also the upper triangular elements of the matrix are pushed to zero during the elimination process.
  • the filter parameters can be read from the extension columns N and N+1. This represents a straight-forward back-substitution, where the filter parameters are directly set equal to the corresponding elements in the mixture matrix.
  • the mixture matrix format is given here as an example to clarify the operations used to calculate the filter coefficient in the suggested way.
  • the mixture matrix can be implemented in different ways. For example, instead of arranging the data in a matrix format, the elements can be kept in row or column vector format, or the elements can be kept as separate items. Also, some of the elements can be stored in matrix format and some of the elements can be stored in vector format or as separate elements.
  • the autocorrelation matrix A and its triangularized result A’ can be stored in a matrix format, and the cross-correlation vectors ycb and ycr and their new forms y’cb and y’cr after triangularization of the autocorrelation matrix A can be stored in a vector format.
  • the triangularization of the autocorrelation matrix A using row-based scaling operations can be implemented with additional re-ordering of rows (partial pivoting). Changing the order of rows does not change the outcome of the system of equations. However, such re-ordering may be beneficial for the numerical stability of practical implementations.
  • a set of cross-component prediction filter coefficients is calculated at least in part by reducing a matrix to its triangular form using a set of row-wise scaling operations.
  • a set of filter coefficients is determined using matrix triangularization that includes calculating a scaling parameter between an element on a determined row of the matrix and another row of the matrix, multiplying at least one element of the determined row of the matrix with the scaling parameter and deducting the results from at least one element of the other row of the matrix.
  • At least two sets of cross-component prediction filter coefficients are calculated by combining an autocorrelation matrix with two or more cross-correlation vectors and reducing the formed mixture matrix into a triangular form using row-based scaling operations.
  • triangularization can be terminated early if the autocorrelation matrix is considered degenerate (i.e., non-invertible). In such a case, a fixed set of filter coefficients is produced (for example each coefficient being zero).
  • the method according to an embodiment is shown in Figure 3.
  • the method generally comprises processing 310 an input video/image; determining 320 a set of filter coefficients, the determining comprising: determining 330 a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying 340 the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining 350 a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining 360 the set of filter parameters from the triangular form of the mixture matrix; using 370 the set of filter coefficients in a filter.
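The steps 330–370 above can be sketched end-to-end as follows (a self-contained floating-point toy with illustrative names; the single cross-correlation column case is shown):

```python
def solve_filter(A, y):
    """Steps 330-360: form the mixture matrix, triangularize it with
    row-wise scale-and-deduct operations, then back-substitute."""
    N = len(A)
    B = [A[i][:] + [y[i]] for i in range(N)]          # step 330: mixture matrix
    for i in range(N):                                 # steps 340-350
        for j in range(i + 1, N):
            scale = B[j][i] / B[i][i]                  # scale parameter
            B[j] = [B[j][k] - scale * B[i][k] for k in range(N + 1)]
    x = [0.0] * N                                      # step 360: back-substitution
    for i in range(N - 1, -1, -1):
        acc = B[i][N] - sum(B[i][j] * x[j] for j in range(i + 1, N))
        x[i] = acc / B[i][i]
    return x

def apply_filter(x, z):
    """Step 370: use the coefficients in a filter, p = x^T z."""
    return sum(xi * zi for xi, zi in zip(x, z))
```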
  • An apparatus comprises means for processing an input video/image; means for determining a set of filter coefficients, the determining comprising: means for determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; means for modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; means for determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; means for determining the set of filter parameters from the triangular form of the mixture matrix; means for using the set of filter coefficients in a filter.
  • the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
  • the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 3 according to various embodiments.
  • An example of a data processing system for an apparatus is illustrated in Figure 4.
  • the data processing system comprises a main processing unit 100, a memory 102, a storage device 104, an input device 106, an output device 108, and a graphics subsystem 110, which are all connected to each other via a data bus 112.
  • the main processing unit 100 is a conventional processing unit arranged to process data within the data processing system.
  • the main processing unit 100 may comprise or be implemented as one or more processors or processor circuitry.
  • the memory 102, the storage device 104, the input device 106, and the output device 108 may include conventional components as recognized by those skilled in the art.
  • the memory 102 and storage device 104 store data in the data processing system 100.
  • Computer program code resides in the memory 102 for implementing, for example, a method as illustrated in a flowchart of Figure 3 according to various embodiments.
  • the input device 106 inputs data into the system while the output device 108 receives data from the data processing system and forwards the data, for example to a display.
  • the data bus 112 is a conventional data bus and while shown as a single line it may be any combination of the following: a processor bus, a PCI bus, a graphical bus, an ISA bus. Accordingly, a skilled person readily recognizes that the apparatus may be any data processing device, such as a computer device, a personal computer, a server computer, a mobile phone, a smart phone, or an Internet access device, for example an Internet tablet computer.
  • the various embodiments can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the method.
  • a device may comprise circuitry and electronics for handling, receiving, and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
  • a network device like a server may comprise circuitry and electronics for handling, receiving, and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of various embodiments.
  • the different functions discussed herein may be performed in a different order and/or concurrently with each other.
  • one or more of the above-described functions and embodiments may be optional or may be combined.


Abstract

The present embodiments relate to a method for encoding and a technical equipment for implementing the method. The method comprises processing (310) an input video/image; determining (320) a set of filter coefficients, the determining comprising: determining (330) a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying (340) the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining (350) a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining (360) the set of filter parameters from the triangular form of the mixture matrix; using (370) the set of filter coefficients in a filter.

Description

A METHOD, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR VIDEO ENCODING AND DECODING Technical Field The present solution generally relates to video encoding and video decoding. In particular, the present solution relates to determining a set of filter parameters used in encoding/decoding. Background This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section. A video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. The encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed. Summary The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention. Various aspects include a method, an apparatus and a computer readable medium comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various embodiments are disclosed in the dependent claims.
According to a first aspect, there is provided an apparatus comprising means for processing an input video/image; means for determining a set of filter coefficients; where a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector is determined; the determined mixture matrix is modified to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; a modified destination row in the mixture matrix is determined by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; the set of filter parameters is determined from the triangular form of the mixture matrix; the apparatus comprising means for using the set of filter coefficients in a filter. According to a second aspect, there is provided a method, comprising: processing an input video/image; determining a set of filter coefficients, the determining comprising determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining the set of filter parameters from the triangular form of the mixture matrix; using the set of filter coefficients in a filter. 
According to a third aspect, there is provided an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: process an input video/image; determine a set of filter coefficients, comprising determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining the set of filter parameters from the triangular form of the mixture matrix; use the set of filter coefficients in a filter. According to a fourth aspect, there is provided computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to: process an input video/image; determine a set of filter coefficients, comprising determining a mixture matrix containing an autocorrelation matrix and at least one cross- correlation vector; modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining the set of filter parameters from the triangular form of the mixture matrix; use the set of filter coefficients in a filter. 
According to an embodiment, the mixture matrix is modified by respective means by scaling at least one value on a determined row using a scaling parameter, a rounding parameter and a bit-shifting parameter, where the scaling parameter, the rounding parameter and the bit-shifting parameter are determined from a diagonal element of the mixture matrix on the determined row. According to an embodiment, the set of filter coefficients are used in convolutional cross-component prediction, where luma samples are used as input and predicted chroma samples are used as output. According to an embodiment, the mixture matrix is determined to include two cross-correlation vectors, wherein a first cross-correlation vector relates to cross-correlation between the luma samples and first chroma samples, and wherein a second cross-correlation vector relates to cross-correlation between the luma samples and second chroma samples. According to an embodiment, the set of filter parameters is determined from the triangular form of the mixture matrix using a back-substitution process. According to an embodiment, the back-substitution comprises setting at least one of the filter parameters equal to an element in the mixture matrix. According to an embodiment, when modifying the determined mixture matrix to the triangular form, at least for one diagonal sample it is checked if the value of said diagonal sample or the absolute value of said diagonal sample is below a predetermined threshold value, and when so, the value of said diagonal sample is set equal to the predetermined threshold value. According to an embodiment, when modifying the determined mixture matrix to the triangular form, at least for one diagonal sample it is checked if the value of said diagonal sample or the absolute value of said diagonal sample is below a predetermined threshold value, and when so, terminating the modifying process and setting the filter coefficients to a predetermined set of values.
According to an embodiment, the processing is encoding or decoding. According to an embodiment, the computer program product is embodied on a non-transitory computer readable medium. Description of the Drawings In the following, various embodiments will be described in more detail with reference to the appended drawings, in which Fig.1 shows an encoding process according to an embodiment. Fig.2 shows a decoding process according to an embodiment. Fig.3 is a flowchart illustrating a method according to an embodiment. Fig.4 shows an apparatus according to an embodiment. Description of Example Embodiments The following description and drawings are illustrative and are not to be construed as unnecessarily limiting. The specific details are provided for a thorough understanding of the disclosure. However, in certain instances, well- known or conventional details are not described in order to avoid obscuring the description. References to one or an embodiment in the present disclosure can be, but not necessarily are, reference to the same embodiment and such references mean at least one of the embodiments. Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. In the following, several embodiments will be described in the context of one video coding arrangement. It is to be noted, however, that the present embodiments are not necessarily limited to this particular arrangement. The embodiments relate to cross-component filter parameter calculation with row- based matrix operations. 
The Advanced Video Coding standard (which may be abbreviated AVC or H.264/AVC) was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardization Sector of International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC). The H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC). There have been multiple versions of the H.264/AVC standard, each integrating new extensions or features to the specification. These extensions include Scalable Video Coding (SVC) and Multiview Video Coding (MVC). The High Efficiency Video Coding standard (which may be abbreviated HEVC or H.265/HEVC) was developed by the Joint Collaborative Team - Video Coding (JCT-VC) of VCEG and MPEG. The standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, also known as MPEG-H Part 2 High Efficiency Video Coding (HEVC). Extensions to H.265/HEVC include scalable, multiview, three-dimensional, and fidelity range extensions, which may be referred to as SHVC, MV-HEVC, 3D-HEVC, and REXT, respectively. The references in this description to H.265/HEVC, SHVC, MV-HEVC, 3D-HEVC and REXT that have been made for the purpose of understanding definitions, structures or concepts of these standard specifications are to be understood to be references to the latest versions of these standards that were available before the date of this application, unless otherwise indicated. Versatile Video Coding (which may be abbreviated VVC, H.266, or H.266/VVC) is a video compression standard developed as the successor to HEVC. 
VVC is specified in ITU-T Recommendation H.266 and equivalently in ISO/IEC 23090-3, which is also referred to as MPEG-I Part 3. A specification of the AV1 bitstream format and decoding process was developed by the Alliance for Open Media (AOM). The AV1 specification was published in 2018. AOM is reportedly working on the AV2 specification. Some key definitions, bitstream and coding structures, and concepts of H.264/AVC, HEVC, VVC, and/or AV1 and some of their extensions are described in this section as an example of a video encoder, decoder, encoding method, decoding method, and a bitstream structure, wherein the embodiments may be implemented. The aspects of various embodiments are not limited to H.264/AVC, HEVC, VVC, and/or AV1 or their extensions, but rather the description is given for one possible basis on top of which the present embodiments may be partly or fully realized. A video codec may comprise an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. The compressed representation may be referred to as a bitstream or a video bitstream. A video encoder and/or a video decoder may also be separate from each other, i.e., need not form a codec. The encoder may discard some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate). The notation “(de)coder” means an encoder and/or a decoder. Hybrid video codecs, for example ITU-T H.263, H.264/AVC and HEVC, may encode the video information in two phases. At first, pixel values in a certain picture area (or “block”) are predicted for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner).
Then, the prediction error, i.e., the difference between the predicted block of pixels and the original block of pixels, is coded. This may be done by transforming the difference in pixel values using a specified transform (e.g., Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, encoder can control the balance between the accuracy of the pixel representation (picture quality) and size of the resulting coded video representation (file size or transmission bitrate). Encoding process according to an embodiment is illustrated in Figure 1. Figure 1 illustrates an image to be encoded (In); a predicted representation of an image block (P’n); a prediction error signal (Dn); a reconstructed prediction error signal (D’n); a preliminary reconstructed image (I’n); a final reconstructed image (R’n); a transform (T) and inverse transform (T-1); a quantization (Q) and inverse quantization (Q-1); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F). In some video codecs, such as H.265/HEVC, video pictures are divided into coding units (CU) covering the area of the picture. A CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU. CU may consist of a square block of samples with a size selectable from a predefined set of possible CU sizes. A CU with the maximum allowed size may be named as LCU (largest coding unit) or CTU (coding tree unit), and the video picture is divided into non-overlapping CTUs. A CTU can be further split into a combination of smaller CUs, e.g., by recursively splitting the CTU and resultant CUs. 
Each resulting CU may have at least one PU and at least one TU associated with it. Each PU and TU can be further split into smaller PUs and TUs in order to increase the granularity of the prediction and prediction error coding processes, respectively. Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g., motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs). Similarly, each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including e.g., DCT coefficient information). It may be signaled at CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered there are no TUs for the said CU. The division of the image into CUs, and the division of CUs into PUs and TUs, may be signaled in the bitstream, allowing the decoder to reproduce the intended structure of these units.

An elementary unit for the input to an encoder and the output of a decoder, respectively, is in most cases a picture. A picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture or a reconstructed picture. The source and decoded pictures are each comprised of one or more sample arrays, such as one of the following sets of sample arrays:
– Luma (Y) only (monochrome).
– Luma and two chroma (YCbCr or YCgCo).
– Green, Blue and Red (GBR, also known as RGB).
– Arrays representing other unspecified monochrome or tri-stimulus color samplings (for example, YZX, also known as XYZ).
In the following, these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr, regardless of the actual color representation method in use.
The actual color representation method in use can be indicated e.g., in a coded bitstream, e.g., using the Video Usability Information (VUI) syntax of HEVC or the like. A component may be defined as an array or a single sample from one of the three sample arrays (luma and two chroma), or the array or a single sample of the array that composes a picture in monochrome format.

A picture may be defined to be either a frame or a field. A frame comprises a matrix of luma samples and possibly the corresponding chroma samples. A field is a set of alternate sample rows of a frame and may be used as encoder input when the source signal is interlaced. Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or chroma sample arrays may be subsampled when compared to luma sample arrays.

The decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (the inverse operation of the prediction error coding, recovering the quantized prediction error signal in the spatial pixel domain). After applying the prediction and prediction error decoding means, the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame. The decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as a prediction reference for the forthcoming frames in the video sequence. A decoding process according to an embodiment is illustrated in Figure 2.
Figure 2 illustrates a predicted representation of an image block (P’n); a reconstructed prediction error signal (D’n); a preliminary reconstructed image (I’n); a final reconstructed image (R’n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).

Instead of, or in addition to, approaches utilizing sample value prediction and transform coding for indicating the coded sample values, color palette-based coding can be used. Palette-based coding refers to a family of approaches for which a palette, i.e., a set of colors and associated indexes, is defined and the value for each sample within a coding unit is expressed by indicating its index in the palette. Palette-based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas representing computer screen content, like text or simple graphics). In order to improve the coding efficiency of palette coding, different kinds of palette index prediction approaches can be utilized, or the palette indexes can be run-length coded to be able to represent larger homogenous image areas efficiently. Also, in the case the CU contains sample values that are not recurring within the CU, escape coding can be utilized. Escape coded samples are transmitted without referring to any of the palette indexes. Instead, their values are indicated individually for each escape coded sample.

The motion information may be indicated with motion vectors associated with each motion compensated image block in video codecs. Each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures.
In order to represent motion vectors efficiently, those may be coded differentially with respect to block specific predicted motion vectors. The predicted motion vectors may be created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks. Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor. In addition to predicting the motion vector values, the reference index of a previously coded/decoded picture can be predicted. The reference index may be predicted from adjacent blocks and/or co-located blocks in a temporal reference picture.

Moreover, high efficiency video codecs may employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes the motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction. Similarly, predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures, and the used motion field information is signaled among a motion field candidate list filled with the motion field information of available adjacent/co-located blocks.

A bitstream may be defined as a sequence of bits or a sequence of syntax structures. A bitstream format may constrain the order of syntax structures in the bitstream. A syntax element may be defined as an element of data represented in the bitstream. A syntax structure may be defined as zero or more syntax elements present together in the bitstream in a specified order.
In some coding formats or standards, a bitstream may be in the form of a network abstraction layer (NAL) unit stream or a byte stream, which forms the representation of coded pictures and associated data forming one or more coded video sequences. A NAL unit may be defined as a syntax structure containing an indication of the type of data to follow and bytes containing that data in the form of an RBSP, interspersed as necessary with start code emulation prevention bytes. A raw byte sequence payload (RBSP) may be defined as a syntax structure containing an integer number of bytes that is encapsulated in a NAL unit. An RBSP is either empty or has the form of a string of data bits containing syntax elements followed by an RBSP stop bit and followed by zero or more subsequent bits equal to 0. A NAL unit comprises a header and a payload. The NAL unit header indicates the type of the NAL unit, among other things.

In some coding formats, such as AV1, a bitstream may comprise a sequence of open bitstream units (OBUs). An OBU comprises a header and a payload, wherein the header identifies the type of the OBU. Furthermore, the header may comprise the size of the payload in bytes.

The phrase along the bitstream (e.g., indicating along the bitstream) or along a coded unit of a bitstream (e.g., indicating along a coded tile) may be used in claims and described embodiments to refer to transmission, signaling, or storage in a manner that the "out-of-band" data is associated with but not included within the bitstream or the coded unit, respectively. The phrase decoding along the bitstream or along a coded unit of a bitstream or the like may refer to decoding the referred out-of-band data (which may be obtained from out-of-band transmission, signaling, or storage) that is associated with the bitstream or the coded unit, respectively.
For example, the phrase along the bitstream may be used when the bitstream is contained in a container file, such as a file conforming to the ISO Base Media File Format, and certain file metadata is stored in the file in a manner that associates the metadata to the bitstream, such as boxes in the sample entry for a track containing the bitstream, a sample group for the track containing the bitstream, or a timed metadata track associated with the track containing the bitstream.

Video codecs may support motion compensated prediction from one source image (uni-prediction) or from two sources (bi-prediction). In the case of uni-prediction a single motion vector is applied, whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from the two sources are averaged to create the final sample prediction. In the case of weighted prediction, the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.

In addition to applying motion compensation for inter picture prediction, a similar approach can be applied to intra picture prediction. In this case the displacement vector indicates from where in the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded. This kind of intra block copying method can improve the coding efficiency substantially in the presence of repeating structures within the frame, such as text or other graphics.

The prediction residual after motion compensation or intra prediction may be first transformed with a transform kernel (like DCT) and then coded. The reason for this is that often there still exists some correlation within the residual, and the transform can in many cases help reduce this correlation and provide more efficient coding.

Video encoders may utilize Lagrangian cost functions to find optimal coding modes, e.g., the desired macroblock mode and associated motion vectors.
This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area:

C = D + λR (Eq. 1)

where C is the Lagrangian cost to be minimized, D is the image distortion (e.g., Mean Squared Error) with the mode and motion vectors considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).

Scalable video coding refers to a coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions, or frame rates. In these cases, the receiver can extract the desired representation depending on its characteristics (e.g., the resolution that matches best the display device). Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g., the network characteristics or processing capabilities of the receiver. A scalable bitstream may consist of a “base layer” providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer may depend on the lower layers. E.g., the motion and mode information of the enhancement layer can be predicted from lower layers. Similarly, the pixel data of the lower layers can be used to create a prediction for the enhancement layer.

A scalable video codec for quality scalability (also known as Signal-to-Noise or SNR scalability) and/or spatial scalability may be implemented as follows. For a base layer, a conventional non-scalable video encoder and decoder is used.
The reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer. In H.264/AVC, HEVC, and similar codecs using reference picture list(s) for inter prediction, the base layer decoded pictures may be inserted into the reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer. Consequently, the encoder may choose a base-layer reference picture as an inter prediction reference and indicate its use e.g., with a reference picture index in the coded bitstream. The decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as an inter prediction reference for the enhancement layer. When a decoded base-layer picture is used as a prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.

In addition to quality scalability, the following scalability modes exist:
– Spatial scalability: Base layer pictures are coded at a lower resolution than enhancement layer pictures.
– Bit-depth scalability: Base layer pictures are coded at a lower bit-depth (e.g., 8 bits) than enhancement layer pictures (e.g., 10 or 12 bits).
– Chroma format scalability: Enhancement layer pictures provide higher fidelity in chroma (e.g., coded in 4:4:4 chroma format) than base layer pictures (e.g., 4:2:0 format).
In all of the above scalability cases, base layer information can be used to code the enhancement layer to minimize the additional bitrate overhead.

Scalability can be enabled in two basic ways: either by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation, or by placing the lower layer pictures in the reference picture buffer (decoded picture buffer, DPB) of the higher layer. The first approach is more flexible and thus can provide better coding efficiency in most cases.
However, the second, reference-frame-based scalability approach can be implemented very efficiently with minimal changes to single layer codecs while still achieving the majority of the coding efficiency gains available. A reference-frame-based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.

In order to be able to utilize parallel processing, images can be split into independently codable and decodable image segments (slices or tiles). “Slices” in this description may refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while “tiles” may refer to image segments that have been defined as rectangular image regions that are processed at least to some extent as individual frames.

Video may be encoded in the YUV or YCbCr color space, since it is found to reflect some characteristics of the human visual system and allows a lower quality representation for the Cb and Cr channels, as human perception is less sensitive to the chrominance fidelity those channels represent.

Some video coding tools perform filtering operations which convolve a set of reference samples with a set of filter parameters to output, for example, a predicted value for a certain sample in a picture. In some cases, the filter parameters may be predetermined or signaled in the bitstream. In other cases, such as when cross-component linear model (CCLM) or cross-component convolutional model (CCCM) prediction is used, the parameters are calculated using a set of reference samples in both the encoder and the decoder. Generally, the calculation of such filter parameters involves inversion of an auto-correlation matrix, which is a computationally challenging operation.
Also, when there are many filter parameters to be determined, the size of the auto-correlation matrix becomes large, which can cause numerical stability issues (overflows or underflows).

Filter parameters for filters used in video coding have been calculated in different ways. For example, the CCLM of VVC/H.266 calculates its parameters by identifying two luma sample values and corresponding chroma sample values, followed by determination of slope and offset parameters using those two sample pairs. A different variant of CCLM is used in Enhanced Compression Model 6 (ECM 6). In that case, a least-mean-square method is used to find the parameters. In the case of the CCCM prediction of ECM 6, an autocorrelation matrix is generated and decomposed into triangular matrices using a method resembling an LDL decomposition, which is followed by solving the filter coefficients by a sequence including two back-substitution stages and a scaling stage.

The present embodiments are targeted at determining a set of filter parameters by generating a mixture matrix that combines an autocorrelation matrix with one or more cross-correlation vectors and reducing the generated mixture matrix to a diagonal form using row-wise matrix operations. The process is concluded by a single set of back-substitution operations for each set of filter parameters that is being solved. In addition, a pre-emptive mechanism applied after a round of iterations of row-wise matrix operations to avoid divisions by zero during future iterations and during the back-substitution stage is disclosed.

In a method according to present embodiments, said set of filter parameters can be used for example by cross-component prediction filters of video and image codecs. It is appreciated that the set of parameters may be used to determine parameters for other types of filters, such as filters used in motion compensated prediction or filters used to improve sample values on sample block boundaries.
The filter parameters to be determined can be represented by a vector x:

x = [x0 x1 … xN-1]^T

The filter parameters can be convolved with an input vector z, which can include, for example, luminance input samples and can be written as:

z = [z0 z1 … zN-1]^T

The resulting value p can represent for example a predicted sample value in one of the chrominance channels and is calculated as follows using the filter parameters x and the input samples z:

p = x^T z

Input vector z can be configured to include for example luminance sample values, functions of luminance sample values or constants, or a combination of those. Including a constant in input vector z corresponds to adding a constant to the output p of the filter. This kind of a constant can be referred to as a bias term or a bias parameter and can be used to represent offsets between input and output values.

Filter parameters x can generally be calculated by finding a solution to a set of equations that can be represented in the matrix form as:

Ax = y

where A represents the autocorrelation matrix of a determined set of input reference samples or “training” samples used in the process, and y represents the cross-correlation vector between the input training samples and the corresponding output training samples. Items in the autocorrelation matrix A (size N x N) and the cross-correlation vector y with N values can be calculated for example as follows:

A(i, j) = Σm R(m, i) R(m, j), for m = 0 … M-1
y(i) = Σm R(m, i) s(m), for m = 0 … M-1

(i.e., A = R^T R and y = R^T s), where M is the number of training vectors included in the process, the matrix R contains the input training vectors as its rows, and s represents a vector with the output training samples.

The autocorrelation matrix A and the cross-correlation vector y can be calculated in different ways, for example, as described in JVET input contribution JVET-AB0174. The autocorrelation matrix A can also include regularization terms that are added to the diagonal elements of the matrix.
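As a minimal floating-point sketch of the above (illustrative helper name; not the codec's actual fixed-point implementation), the items of A and y can be accumulated from training data, including an optional regularization constant on the diagonal:

```python
def build_mixture_inputs(R, s, reg=1):
    """Build the autocorrelation matrix A = R^T R (with a regularization
    constant added to the diagonal) and the cross-correlation vector
    y = R^T s from M training vectors (rows of R) and outputs s."""
    M = len(R)          # number of training vectors
    N = len(R[0])       # number of filter parameters
    A = [[sum(R[m][i] * R[m][j] for m in range(M)) for j in range(N)]
         for i in range(N)]
    for i in range(N):
        A[i][i] += reg  # regularization term on the diagonal
    y = [sum(R[m][i] * s[m] for m in range(M)) for i in range(N)]
    return A, y
```

With `reg=0` this yields exactly A = RᵀR and y = Rᵀs; a non-zero `reg` adds the constant regularization term of the text to each diagonal element.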
A regularization term can be determined to be a constant value, such as 1 or 10, but other values can also be used depending on use case specific requirements for such regularization.

In order to solve the Ax = y equation for the filter parameters x, a mixture matrix B can be created that contains the elements of both the autocorrelation matrix A and the cross-correlation vector y as follows:

B = [ A | y ]

This can be triangularized (also referred to as “diagonalized”) using a Gaussian-elimination-like method by multiplying elements on a specific row and deducting the resulting values from another row until the mixture matrix B has been reduced to its upper triangular form B', where all elements below the main diagonal are zeros:

B' = [ A' | y' ]

Parameters of two or more filters can be calculated simultaneously by creating a mixture matrix containing multiple cross-correlation vectors. This may be useful when calculating luma to chroma cross-component prediction filters for two chroma sample blocks when both of those chroma sample blocks use the same luma reference samples. This allows using an identical autocorrelation matrix A for both cases, and thus a single mixture matrix can halve the row-wise operations needed in the triangularization process.
In such a case the mixture matrix C can be constructed as follows using the luma-based autocorrelation matrix A and the cross-correlation vectors ycb and ycr of the color channels Cb and Cr, respectively:

C = [ A | ycb | ycr ]

And after the triangularization:

C' = [ A' | y'cb | y'cr ]

As the mixture matrix C has been reduced to its upper triangular form, also its submatrix A has been reduced to its upper triangular form A', and the filter coefficients for the two filters xcb and xcr can be solved with back-substitution using the two triangular systems:

A'xcb = y'cb
A'xcr = y'cr

Triangularization of a mixture matrix C with dimensions N x (N+2) (N rows corresponding to the rows of the autocorrelation matrix A, and N+2 columns corresponding to the N columns of the autocorrelation matrix A plus the two extra columns corresponding to the two cross-correlation vectors ycb and ycr included in the mixture matrix) can be calculated e.g., using the example pseudo code below:

for( i = 0; i < N-1; i++ ) {
  for( j = i + 1; j < N; j++ ) {
    scale = divide(a[j][i], a[i][i])
    for( k = i + 1; k < N+2; k++ ) {
      a[j][k] -= multiply(scale, a[i][k])
    }
  }
}

The divide and multiply functions can be implemented in various ways considering e.g., floating point, fixed point, or integer implementations, or a mixture of those. For example, multiply can be implemented using a fixed-point implementation and the division operation can be implemented using an approximated fixed point approach proposed in JVET document JVET-AB0174.

For simplicity, row-wise pointers src and dst can be used to illustrate the row-based nature of the process. In the example below, src points to the i:th row of the mixture matrix, and dst points to each of the rows below it, one after another. For each destination row j, pointed to by dst, a scale value may be calculated based on the ratio of the i:th elements on the destination and source rows. By multiplying the source row with the scale and deducting the result from the destination row j, the i:th element on the j:th row is pushed to 0.
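The triangularization pseudo code above can be mirrored in plain floating-point Python as a sketch (illustrative only; a codec implementation would use the fixed-point divide and multiply variants discussed in the text):

```python
def triangularize(a, N):
    """Reduce an N x (N+2) mixture matrix [A | ycb | ycr] to its upper
    triangular form in place with row-wise scale-and-subtract operations."""
    for i in range(N - 1):
        for j in range(i + 1, N):
            scale = a[j][i] / a[i][i]       # ratio of the i:th elements
            for k in range(i + 1, N + 2):
                a[j][k] -= scale * a[i][k]  # deduct the scaled source row
            a[j][i] = 0.0                   # eliminated by construction
    return a
```

For example, `triangularize([[4.0, 2.0, 8.0, 6.0], [2.0, 3.0, 5.0, 4.0]], 2)` leaves the second row as `[0.0, 2.0, 1.0, 1.0]`.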
For reducing the number of computational operations used in the process, the calculation of that element can be omitted, as the result is determined to be zero. Similarly, there is no need to calculate any elements with k < i on the j:th destination row, as all the values with k < i have already been determined to be zero on both the source row i and the destination row j.

for( i = 0; i < N-1; i++ ) {
  src = a[i]
  for( j = i + 1; j < N; j++ ) {
    dst = a[j]
    scale = divide(dst[i], src[i])
    for( k = i + 1; k < N+2; k++ ) {
      dst[k] -= multiply(scale, src[k])
    }
  }
}

Or:

Iterate over rows of a mixture matrix (source row) {
  Iterate over rows below the source row (destination row) {
    Calculate a scale between destination row and source row
    Multiply source row with the scale
    Deduct the multiplied source row from the destination row
  }
}

In order to avoid divisions by zero, and to keep the diagonal values of the autocorrelation matrix A positive even after triangularization, different approaches can be used. A check for the diagonal element of the next processing row can be performed after iterating over the rows below the current source row, as shown below:

for( i = 0; i < N-1; i++ ) {
  for( j = i + 1; j < N; j++ ) {
    scale = divide(a[j][i], a[i][i])
    for( k = i + 1; k < N+2; k++ ) {
      a[j][k] -= multiply(scale, a[i][k])
    }
  }
  a[i+1][i+1] = a[i+1][i+1] < zeroThr ? zeroThr : a[i+1][i+1]
}

Or:

Iterate over rows of a mixture matrix (source row) {
  Iterate over rows below the source row (destination row) {
    Calculate a scale between destination row and source row
    Multiply source row with the scale
    Deduct the multiplied source row from the destination row
  }
  Set the diagonal element of the next source row to a value not smaller than zeroThr
}

This guarantees the diagonal element a[i+1][i+1] on the next source row i+1 to have at least a value of zeroThr. zeroThr can be determined to be for example 1, or it can be determined to be some other positive number larger than zero.
As another example, the absolute value of the diagonal element on the next source row can be used in the comparison with the threshold value zeroThr.

In some embodiments, during triangularization, it may be desirable to know if the autocorrelation matrix A is degenerate (or non-invertible) and no solution to the system of equations exists. If at any point during triangularization a row in the autocorrelation matrix A becomes all zero (or equivalently, if each element of said row is less than zeroThr) while the respective elements in the cross-correlation vectors are non-zero (or larger than zeroThr), then the autocorrelation matrix A can be considered degenerate, and the elimination can be terminated early. In such a case, a fixed set of filter coefficients (for example all zero) can be output. Similarly, if a diagonal element of the autocorrelation matrix A becomes zero or has a negative value, the matrix can be considered degenerate, and the elimination can be terminated early.

After the mixture matrix has been reduced to its triangular form, back-substitution operations can be used to solve the filter parameters xcb and xcr. This can be done, for example, as illustrated by the pseudo code below:

x[N-1] = divide(a[N-1][col], a[N-1][N-1])
for( i = N-2; i >= 0; i-- ) {
  val = a[i][col]
  for( j = i+1; j < N; j++ ) {
    val -= multiply(a[i][j], x[j])
  }
  x[i] = divide(val, a[i][i])
}

As is shown above, all the division operations use diagonal elements a[i][i] of the triangularized mixture matrix as denominators, and no additional checks for zero valued denominators need to be performed if those were done during the triangularization phase as suggested above. The column parameter col determines which column of the triangularized mixture matrix is used as the target vector in the back-substitution process.
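Putting the stages together, the following self-contained floating-point sketch (illustrative structure and names, not the normative process) triangularizes a shared mixture matrix and then back-substitutes once per cross-correlation column:

```python
def solve_filters(A, ys):
    """Solve A x = y for each target vector in `ys` via a shared mixture
    matrix [A | y0 | y1 ...]: one triangularization, then one
    back-substitution per extension column (floating-point sketch)."""
    N = len(A)
    E = len(ys)  # number of cross-correlation vectors (filters)
    a = [list(A[i]) + [y[i] for y in ys] for i in range(N)]
    for i in range(N - 1):                   # triangularize once
        for j in range(i + 1, N):
            scale = a[j][i] / a[i][i]
            for k in range(i, N + E):
                a[j][k] -= scale * a[i][k]
    xs = []
    for col in range(N, N + E):              # back-substitute per filter
        x = [0.0] * N
        for i in range(N - 1, -1, -1):
            val = a[i][col] - sum(a[i][j] * x[j] for j in range(i + 1, N))
            x[i] = val / a[i][i]
        xs.append(x)
    return xs
```

For a 2x2 system with A = [[2, 1], [1, 3]] and the two target vectors [3, 4] and [5, 10], this returns the parameter vectors [1, 1] and [1, 3], which can be verified by substituting back into Ax = y.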
Continuing the examples above, with the cross-correlation vector ycb included as the N:th column of the mixture matrix and the cross-correlation vector ycr as the (N+1):th column, col can be set to N when solving the xcb coefficients for the filter used for predicting samples of a first color component Cb, and col can be set to N+1 when solving the xcr coefficients for the filter used for predicting samples of a second color component Cr.

During the triangularization process, the dynamic range of the intermediate values of the mixture matrix can be controlled in different ways. For example, values on a specific row of the mixture matrix can be shifted up or down to avoid overflows that can happen when the scale parameter gets large due to differences between the values on the source row i and the destination row j. An example of such a control mechanism is given in the pseudo code below. In this example the log2 operation refers to the base-2 logarithm of the input and can also include rounding the output value down to an integer value. The difference between the source row scale srcScale and the destination row scale dstScale is used to determine the effective scale value which is used in the row-wise multiplication. Also, before the multiplication operation, the k:th value on the destination row j (a[j][k]) is updated by shifting it down by difScale bits to compensate for the modified scale value.

for( i = 0; i < N-1; i++ ) {
  for( j = i + 1; j < N; j++ ) {
    dstScale = log2(a[j][i] < 0 ? -a[j][i] : a[j][i])
    srcScale = log2(a[i][i] < 0 ? -a[i][i] : a[i][i])
    difScale = dstScale - srcScale > 0 ? dstScale - srcScale : 0
    difRound = difScale ? 1 << (difScale - 1) : 0
    scale = divide(a[j][i], a[i][i] << difScale)
    for( k = i + 1; k < N+2; k++ ) {
      a[j][k] = (a[j][k] + difRound) >> difScale
      a[j][k] -= multiply(scale, a[i][k])
    }
  }
  a[i+1][i+1] = a[i+1][i+1] < zeroThr ? zeroThr : a[i+1][i+1]
}

It is appreciated that also for the dynamic range control, different approaches can be used. For example, when calculating the scale parameter, the destination row value a[j][i] could be shifted down instead of shifting the source row value a[i][i] down to achieve a scale close to 1. Similarly, when deducting a scaled version of the source row from the destination row, the source row values could be shifted up to compensate for the modification of the scale parameter. In general, as the mixture matrix represents a system of linear equations, the rows of the matrix can be further multiplied, or rows scaled with a multiplicative operation can be added to other rows, to enable additional functionality or to control the dynamic range of the matrix elements.

As a further example, the rows of the mixture matrix can be normalized so that the diagonal elements will have a value of 1 (or equivalently 1 << DECIM_BITS, or 2^DECIM_BITS, where DECIM_BITS is the number of bits describing the decimal part of a fixed-point number). This may be useful, as later steps of the process require divisions by the diagonal elements. In the case the diagonal elements have been forced to have values of one each, the division operations become trivial and can be substituted by taking the value of the numerator as the output of the division operation. In such a case, the result of the triangularization operation can be given as:

C' = [ A' | y'cb | y'cr ], where A' is upper triangular with all diagonal elements equal to 1

This can be calculated, for example, using the pseudo code below. In this case, there is no need to calculate the diagonal elements as those are determined to be all ones, and the operations needing the values of the diagonal elements, such as the divisions in the back-substitution process, can ignore the diagonal elements and use a value of one instead. First a diagonal value is determined on the i:th row of the mixture matrix. That value can be clipped to a certain range to avoid divisions by zero.
For example, it can be determined to have a minimum value of 1 or 10 in the case fixed point arithmetic is used, and it can be determined to have a different minimum value, such as 0.001 or 0.0001, if floating point arithmetic is used. This is followed by dividing a set of remaining values of that row by the determined diagVal. All the values on that row can also be scaled, although the values of the elements with index k smaller than i + 1 are known without the need to calculate those (as the elements with index k smaller than i have been pushed to zero, and the element with index k equal to i is pushed to have a value of one, as it would be divided by itself if the arithmetic operation was carried out).

for( i = 0; i < N; i++ ) {
  diagVal = a[i][i] < 1 ? 1 : a[i][i]
  for( k = i + 1; k < N+2; k++ ) {
    a[i][k] = divide(a[i][k], diagVal)
  }
  for( j = i + 1; j < N; j++ ) {
    scale = a[j][i]
    for( k = i + 1; k < N+2; k++ ) {
      a[j][k] -= multiply(scale, a[i][k])
    }
  }
}

Or:

Iterate over rows of a mixture matrix (source row) {
  Scale the source row using the diagonal element of the row
  Iterate over rows below the source row (destination row) {
    Determine a scale value to be equal to a leading value in the destination row
    Multiply the source row with the scale
    Deduct the multiplied source row from the destination row
  }
}

A division operation is often difficult to implement in practice, and different approximations can be used to make the operation faster. For example, JVET contribution JVET-AB0174 teaches how the division operation can be approximated by determining a scaling parameter (scale), a rounding parameter (round) and a bit-shifting parameter (shift).
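A floating-point sketch of this diagonal-normalizing variant (illustrative only; the clipping threshold follows the floating-point minimum values suggested above, and a fixed-point implementation would clip to 1 instead):

```python
def triangularize_unit_diag(a, N, min_diag=1e-4):
    """Reduce an N x (N+2) mixture matrix to upper triangular form with
    ones on the diagonal: each source row is divided by its (clipped)
    diagonal element before eliminating the rows below it."""
    for i in range(N):
        diag_val = max(a[i][i], min_diag)  # clip to avoid division by zero
        for k in range(i + 1, N + 2):
            a[i][k] /= diag_val
        a[i][i] = 1.0                      # known result; not computed
        for j in range(i + 1, N):
            scale = a[j][i]                # leading value of destination row
            for k in range(i + 1, N + 2):
                a[j][k] -= scale * a[i][k]
            a[j][i] = 0.0                  # eliminated by construction
    return a
```

With a unit diagonal, the subsequent back-substitution needs no divisions: for example, the mixture matrix [[2, 1, 3, 5], [1, 3, 4, 10]] reduces to [[1, 0.5, 1.5, 2.5], [0, 1, 1, 3]], and the last filter parameter for each column can be read off directly from the bottom row.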
In such a case the division of a numerator (num) by a denominator (denom) can be approximated as:

result = (num * scale + round) >> shift

where the scale, round and shift parameters depend on the denominator denom and can generally be determined by a function that estimates those parameters at a desired accuracy using a desired method (e.g., as suggested in JVET-AB0174), given here as outputs of the function getDivParams:

scale, round, shift = getDivParams( denom )

Using that notation, and considering that all the elements in row i of the mixture matrix are divided by the same value diagVal, it may be enough to calculate the parameters needed to approximate the division operation only once per row, and the actual scaling operations can advantageously be implemented using only multiply, addition and bit-shifting operations, as illustrated in the example pseudo code below:

for( i = 0; i < N; i++ ) {
  diagVal = a[i][i] < 1 ? 1 : a[i][i]
  scale, round, shift = getDivParams(diagVal)
  for( k = i + 1; k < N+2; k++ ) {
    a[i][k] = (a[i][k] * scale + round) >> shift
  }
  for( j = i + 1; j < N; j++ ) {
    scale = a[j][i]
    for( k = i + 1; k < N+2; k++ ) {
      a[j][k] -= multiply(scale, a[i][k])
    }
  }
}

As the above processing forces the diagonal elements of the mixture matrix to have values of one, the back-substitution process can also be simplified considering this restriction. More specifically, the divisions by the diagonal elements a[i][i] can advantageously be substituted with straightforward assignment operations. For example, the filter parameter x[N-1] can directly be set equal to the corresponding a[N-1][col] element in the mixture matrix.
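One possible way to realize such a division approximation is sketched below in Python. The fixed shift of 16 bits and the rounding choices are illustrative assumptions; JVET-AB0174 selects the parameters per denominator for better accuracy.

```python
def get_div_params(denom, shift=16):
    """Return (scale, round, shift) such that
    (num * scale + round) >> shift approximates num / denom
    using only integer multiply, add and shift operations."""
    scale = ((1 << shift) + denom // 2) // denom   # rounded reciprocal in Q16
    rnd = 1 << (shift - 1)                         # rounding offset for the final shift
    return scale, rnd, shift

def approx_divide(num, denom):
    scale, rnd, shift = get_div_params(denom)
    return (num * scale + rnd) >> shift
```

For example, approx_divide(100, 4) uses only a multiply, an addition and a shift, and because get_div_params depends only on the denominator, its result can be computed once per row and reused for every element of that row.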
x[N-1] = a[N-1][col]
for( i = N-2; i >= 0; i-- ) {
  val = a[i][col]
  for( j = i+1; j < N; j++ ) {
    val -= multiply(a[i][j], x[j])
  }
  x[i] = val
}

Or without using the intermediate parameter val:

x[N-1] = a[N-1][col]
for( i = N-2; i >= 0; i-- ) {
  x[i] = a[i][col]
  for( j = i+1; j < N; j++ ) {
    x[i] -= multiply(a[i][j], x[j])
  }
}

As an alternative to the back-substitution process, the parameter vector x could also be calculated directly from the mixture matrix if also the upper triangular elements of the matrix are pushed to zeros with further row-based scale and subtract operations. This can be implemented, for example, using the pseudo code below.

for( i = N-2; i >= 0; i-- ) {
  for( j = i; j >= 0; j-- ) {
    scale = a[j][i+1]
    for( k = N; k < N+2; k++ ) {
      a[j][k] -= multiply(scale, a[i+1][k])
    }
  }
}

As the resulting mixture matrix is in this example in a diagonal form, with non-zero elements only on the diagonal of the matrix and in the extension columns N and higher (in the case of two filters, on columns N and N+1), and all the diagonal elements have a value of 1, the solution for the filter coefficient vector or vectors becomes trivial and the resulting filter parameters can be directly read from the extension columns. For example, in the case where two filters, one for the first color component Cb and one for the second color component Cr, are calculated, the filter coefficients can be assigned as follows:

xCb[i] = a'[i][N]
xCr[i] = a'[i][N+1]

As shown in the following pseudo code, the filter parameters can be read from the extension columns N and N+1. This represents a straightforward back-substitution, where the filter parameters are directly set equal to the corresponding elements in the mixture matrix.

for( i = 0; i < N; i++ ) {
  xCb[i] = a[i][N]
  xCr[i] = a[i][N+1]
}

It should be understood that the mixture matrix format is given here as an example to clarify the operations used to calculate the filter coefficients in the suggested way.
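For illustration, the back-substitution with a unit diagonal described above can be sketched in Python as follows (the function name is an assumption). Note that no division by a[i][i] is needed, only assignments, multiplies and subtractions.

```python
def back_substitute_unit_diag(a, n, col):
    """Solve for the filter parameters from an upper-triangular mixture
    matrix whose diagonal elements are one; the division by a[i][i]
    reduces to a plain assignment."""
    x = [0.0] * n
    x[n - 1] = a[n - 1][col]          # direct assignment, no division
    for i in range(n - 2, -1, -1):
        val = a[i][col]
        for j in range(i + 1, n):
            val -= a[i][j] * x[j]
        x[i] = val
    return x
```

With two extension columns, calling the function once with col = N yields the Cb coefficients and once with col = N+1 the Cr coefficients, reusing the same triangularized matrix.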
In practice, the mixture matrix can be implemented in different ways. For example, instead of arranging the data in a matrix format, the elements can be kept in row or column vector format, or the elements can be kept as separate items. Also, some of the elements can be stored in matrix format and some of the elements can be stored in vector format or as separate elements. For example, the autocorrelation matrix A and its triangularized result A' can be stored in a matrix format, and the cross-correlation vectors yCb and yCr and their new forms y'Cb and y'Cr after triangularization of the autocorrelation matrix A can be stored in a vector format. It should also be understood that in practice the triangularization of the autocorrelation matrix A using row-based scaling operations can be implemented with additional re-ordering of rows (partial pivoting). Changing the order of rows does not change the outcome of the system of equations. However, such re-ordering may be beneficial for the numerical stability of practical implementations. Although some of the examples illustrate how the triangularization sets elements of the matrix below the main diagonal to zero, the operations can naturally also be applied in inverted row order, generating a triangular matrix with zeros above the main diagonal. In an embodiment, a set of cross-component prediction filter coefficients is calculated at least in part by reducing a matrix to its triangular form using a set of row-wise scaling operations. In an embodiment, a set of filter coefficients is determined using matrix triangularization that includes calculating a scaling parameter between an element on a determined row of the matrix and another row of the matrix, multiplying at least one element of the determined row of the matrix with the scaling parameter and deducting the results from at least one element of the other row of the matrix.
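The partial pivoting mentioned above can be sketched as a simple row swap performed before each elimination step (illustrative Python; the function name is an assumption):

```python
def pivot_rows(a, n, i):
    """Partial pivoting: swap row i with the row at or below it whose
    leading element a[r][i] has the largest absolute value.
    Re-ordering rows does not change the solution of the system of
    equations but improves numerical stability."""
    best = max(range(i, n), key=lambda r: abs(a[r][i]))
    if best != i:
        a[i], a[best] = a[best], a[i]
    return a
```

In a practical implementation this swap would be invoked at the start of each iteration of the outer elimination loop, so that a small or zero diagonal element is replaced by the largest available pivot in the column.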
In an embodiment, at least two sets of cross-component prediction filter coefficients are calculated by combining an autocorrelation matrix with two or more cross-correlation vectors and reducing the formed mixture matrix into a triangular form using row-based scaling operations. In an embodiment, triangularization can be terminated early if the autocorrelation matrix is considered degenerate (i.e., non-invertible). In such a case a fixed set of filter coefficients is produced (for example, each coefficient being zero). The method according to an embodiment is shown in Figure 3. The method generally comprises processing 310 an input video/image; determining 320 a set of filter coefficients, the determining comprising: determining 330 a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; modifying 340 the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; determining 350 a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; determining 360 the set of filter parameters from the triangular form of the mixture matrix; and using 370 the set of filter coefficients in a filter.
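The early-termination embodiment described above can be sketched end to end as follows (illustrative Python; the threshold ZERO_THR and the all-zero fallback are example choices consistent with the text):

```python
ZERO_THR = 1e-9  # example degeneracy threshold

def solve_or_fallback(a, n, col):
    """Triangularize the n x (n+2) mixture matrix and back-substitute;
    if a diagonal element collapses below ZERO_THR during elimination,
    the autocorrelation matrix is treated as degenerate and a fixed
    all-zero coefficient set is returned instead."""
    for i in range(n):
        if abs(a[i][i]) < ZERO_THR:
            return [0.0] * n                # degenerate: fixed coefficients
        diag = a[i][i]
        for k in range(i + 1, n + 2):       # normalize the source row
            a[i][k] /= diag
        for j in range(i + 1, n):           # eliminate rows below
            s = a[j][i]
            for k in range(i + 1, n + 2):
                a[j][k] -= s * a[i][k]
    x = [0.0] * n                           # back-substitution, unit diagonal
    x[n - 1] = a[n - 1][col]
    for i in range(n - 2, -1, -1):
        v = a[i][col]
        for j in range(i + 1, n):
            v -= a[i][j] * x[j]
        x[i] = v
    return x
```

A degenerate input, such as a mixture matrix with two identical rows, triggers the fallback, while a well-conditioned input yields the solved filter coefficients.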
An apparatus according to an embodiment comprises means for processing an input video/image; means for determining a set of filter coefficients, the determining comprising: means for determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector; means for modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix; means for determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row; means for determining the set of filter parameters from the triangular form of the mixture matrix; means for using the set of filter coefficients in a filter. The means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry. The memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 3 according to various embodiments. An example of a data processing system for an apparatus is illustrated in Figure 4. Several functionalities can be carried out with a single physical device, e.g., all calculation procedures can be performed in a single processor if desired. The data processing system comprises a main processing unit 100, a memory 102, a storage device 104, an input device 106, an output device 108, and a graphics subsystem 110, which are all connected to each other via a data bus 112. The main processing unit 100 is a conventional processing unit arranged to process data within the data processing system. The main processing unit 100 may comprise or be implemented as one or more processors or processor circuitry. 
The memory 102, the storage device 104, the input device 106, and the output device 108 may include conventional components as recognized by those skilled in the art. The memory 102 and storage device 104 store data in the data processing system 100. Computer program code resides in the memory 102 for implementing, for example, a method as illustrated in a flowchart of Figure 3 according to various embodiments. The input device 106 inputs data into the system while the output device 108 receives data from the data processing system and forwards the data, for example to a display. The data bus 112 is a conventional data bus and while shown as a single line it may be any combination of the following: a processor bus, a PCI bus, a graphical bus, an ISA bus. Accordingly, a skilled person readily recognizes that the apparatus may be any data processing device, such as a computer device, a personal computer, a server computer, a mobile phone, a smart phone, or an Internet access device, for example an Internet tablet computer. The various embodiments can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the method. For example, a device may comprise circuitry and electronics for handling, receiving, and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment. Yet further, a network device like a server may comprise circuitry and electronics for handling, receiving, and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of various embodiments. If desired, the different functions discussed herein may be performed in a different order and/or concurrently with one another.
Furthermore, if desired, one or more of the above-described functions and embodiments may be optional or may be combined. Although various aspects of the embodiments are set out in the independent claims, other aspects comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims. It is also noted herein that while the above describes example embodiments, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present disclosure as defined in the appended claims.

Claims

1. An apparatus, the apparatus comprising:
- means for processing an input video/image;
- means for determining a set of filter coefficients; where
o a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector is determined;
o the determined mixture matrix is modified to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix;
o a modified destination row in the mixture matrix is determined by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row;
o the set of filter parameters is determined from the triangular form of the mixture matrix;
- means for using the set of filter coefficients in a filter.
2. The apparatus according to claim 1, further comprising means for modifying the mixture matrix by scaling at least one value on a determined row using a scaling parameter, a rounding parameter and a bit-shifting parameter, where the scaling parameter, the rounding parameter and the bit-shifting parameter are determined from a diagonal element of the mixture matrix on the determined row.
3. The apparatus according to claim 1 or 2, wherein the set of filter coefficients is used in convolutional cross-component prediction, where luma samples are used as input and predicted chroma samples are used as output.
4. The apparatus according to claim 3, further comprising means for determining the mixture matrix to include two cross-correlation vectors, wherein a first cross-correlation vector relates to cross-correlation between the luma samples and first chroma samples, and wherein a second cross-correlation vector relates to cross-correlation between the luma samples and second chroma samples.
5. The apparatus according to any of the claims 1 to 4, further comprising means for determining the set of filter parameters from the triangular form of the mixture matrix using a back-substitution process.
6. The apparatus according to claim 5, wherein the back-substitution comprises at least one of the filter parameters to be set equal to an element in the mixture matrix.

7. The apparatus according to any of the claims 1 to 6, wherein when modifying the determined mixture matrix to the triangular form, the apparatus comprises means for checking at least for one diagonal sample if the value of said diagonal sample or the absolute value of said diagonal sample is below a predetermined threshold value, and when so, means for setting the value of said diagonal sample equal to the predetermined threshold value.

8. The apparatus according to any of the claims 1 to 6, wherein when modifying the determined mixture matrix to the triangular form, the apparatus comprises means for checking at least for one diagonal sample if the value of said diagonal sample or the absolute value of said diagonal sample is below a predetermined threshold value, and when so, means for terminating the modifying process and means for setting the filter coefficients to a predetermined set of values.

9. The apparatus according to any of the claims 1 to 8, wherein the processing is encoding or decoding.

10. A method comprising: processing an input video/image; determining a set of filter coefficients, the determining comprising
o determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector;
o modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix;
o determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row;
o determining the set of filter parameters from the triangular form of the mixture matrix;
and using the set of filter coefficients in a filter.
11. An apparatus comprising at least one processor and memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: process an input video/image; determine a set of filter coefficients, comprising
o determining a mixture matrix containing an autocorrelation matrix and at least one cross-correlation vector;
o modifying the determined mixture matrix to a triangular form by determining at least one scale parameter between a source row and destination row in the mixture matrix;
o determining a modified destination row in the mixture matrix by multiplying the source row by the scale parameter and deducting the scaled source row from the destination row;
o determining the set of filter parameters from the triangular form of the mixture matrix;
and use the set of filter coefficients in a filter.
EP23902870.7A 2022-12-15 2023-11-09 A method, an apparatus and a computer program product for video encoding and decoding Pending EP4635182A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20226106 2022-12-15
PCT/FI2023/050619 WO2024126889A1 (en) 2022-12-15 2023-11-09 A method, an apparatus and a computer program product for video encoding and decoding

Publications (1)

Publication Number Publication Date
EP4635182A1 true EP4635182A1 (en) 2025-10-22

Family

ID=91484886

Family Applications (1)

Application Number Title Priority Date Filing Date
EP23902870.7A Pending EP4635182A1 (en) 2022-12-15 2023-11-09 A method, an apparatus and a computer program product for video encoding and decoding

Country Status (6)

Country Link
EP (1) EP4635182A1 (en)
JP (1) JP2025539647A (en)
KR (1) KR20250119647A (en)
CN (1) CN120419186A (en)
MX (1) MX2025006461A (en)
WO (1) WO2024126889A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4676039A1 (en) * 2024-07-02 2026-01-07 InterDigital CE Patent Holdings, SAS Regularization for deriving convolutional models for block prediction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019107994A1 (en) * 2017-11-29 2019-06-06 한국전자통신연구원 Image encoding/decoding method and device employing in-loop filtering

Also Published As

Publication number Publication date
JP2025539647A (en) 2025-12-05
MX2025006461A (en) 2025-07-01
WO2024126889A1 (en) 2024-06-20
CN120419186A (en) 2025-08-01
KR20250119647A (en) 2025-08-07

Similar Documents

Publication Publication Date Title
US20150010068A1 (en) Method, device, and computer program for pre-encoding and post-decoding high bit-depth content in video encoder and decoder
CN121239841A (en) Encoding method, decoding method, encoder, decoder, and storage medium
CN108353186A (en) Video coding with helper data for spatial intra prediction
US20240187594A1 (en) Method And An Apparatus for Encoding and Decoding of Digital Image/Video Material
US20250220189A1 (en) A method, an apparatus and a computer program product for encoding and decoding of digital media content
US20220286709A1 (en) Methods of coding images/videos with alpha channels
WO2024126889A1 (en) A method, an apparatus and a computer program product for video encoding and decoding
US20250193410A1 (en) Flexible scaling factors for joint mvd coding
US12537972B2 (en) Bilateral matching based scaling factor derivation for JMVD
EP3672241A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
WO2024074753A1 (en) An apparatus, a method and a computer program for video coding and decoding
US20250350747A1 (en) A method, an apparatus and a computer program product for encoding and decoding of digital media content
US20250330622A1 (en) A method, an apparatus and a computer program product for video coding
US12368890B2 (en) Method, an apparatus and a computer program product for video encoding and video decoding
WO2025003557A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
WO2025103656A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
WO2025153222A1 (en) On the derivation of the clipping range for linear prediction
HK40055509A (en) A method and an apparatus for encoding and decoding of digital image/video material

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20250715

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR