AU2020397503A1 - Matrix based intra prediction with mode-global settings - Google Patents

Info

Publication number
AU2020397503A1
Authority
AU
Australia
Prior art keywords
matrix
prediction
based intra
vector
prediction mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU2020397503A
Other versions
AU2020397503B2 (en
Inventor
Benjamin Bross
Philipp Helle
Tobias Hinz
Detlev Marpe
Philipp Merkle
Jonathan PFAFF
Heiko Schwarz
Michael Schäfer
Mischa Siekmann
Björn STALLENBERGER
Thomas Wiegand
Martin Winken
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Original Assignee
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV filed Critical Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Publication of AU2020397503A1 publication Critical patent/AU2020397503A1/en
Application granted granted Critical
Publication of AU2020397503B2 publication Critical patent/AU2020397503B2/en
Priority to AU2024200696A priority Critical patent/AU2024200696A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/156 Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Abstract

Apparatus for decoding a predetermined block (18) of a picture using intra-prediction, configured to read, from the data stream (12), a mode index (200), the mode index pointing to one out of a list (204) of matrix-based intra-prediction modes. Additionally, the apparatus is configured to predict samples (108) of the predetermined block (18) by computing a matrix-vector product (206) between an input vector (102) derived from reference samples (17) in a neighborhood of the predetermined block (18) and a prediction matrix (19) associated with the matrix-based intra-prediction mode (k) pointed to by the mode index (200) and associating components (210) of an output vector (208) obtained by the matrix-vector product (206) onto sample positions (104) of the predetermined block. For each matrix-based intra-prediction mode, all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode are represented by a fixed point representation of a predetermined bit-depth, the predetermined bit-depth being equal for the matrix-based intra-prediction modes. Furthermore, the apparatus is configured to, for each matrix-based intra-prediction mode, compute the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (k) by performing, for each component of the output vector, a right shift by a number of bits which is equal for the matrix-based intra-prediction modes.

Description

Matrix based intra prediction with mode-global settings
Embodiments according to the invention relate to matrix-based intra prediction with mode-global settings for picture and video encoding/decoding.
Typical block-based image or video codecs usually operate by predictive coding. Thus, when the receiver of a coded image or video signal generates that signal on a given block, it first constructs a prediction signal out of information already available from the coded data. This prediction signal serves as a first approximation of the signal on that block. In a second step, a prediction residual is decoded from the bit-stream and added to the prediction signal. The better the prediction signal is, the smaller the number of bits needed to transmit the prediction residual becomes. Thus, the quality of the prediction signal greatly affects the efficiency of the overall codec.
Typically, there are two methods of generating the prediction signal. The first method, used only in video-codecs, is inter-prediction. Here, the prediction signal is generated out of reconstructed samples that belong to a frame different from the current one. The second method is intra prediction. Here, the prediction signal is generated out of reconstructed samples that belong to the same frame and are typically spatially adjacent to the given block.
In classical codecs, intra-prediction is performed using either the angular prediction modes or the DC- and the planar modes. The angular prediction modes copy the reconstructed samples left and above the block along a specific direction defined by an angular parameter, where for fractional angle-positions, an interpolation filter is used. The DC-mode generates the prediction signal as the mean sample value of the adjacent samples left and above the block. Finally, the planar mode generates the prediction signal as a linear combination of predictions along the horizontal and the vertical directions. Optionally, a post filtering of the prediction signal or a pre-smoothing of the reference samples can be applied for any of the aforementioned prediction techniques.
In contrast to the classical intra prediction methods described above, matrix based intra prediction (MIP) was introduced as a new technique to generate intra prediction signals. It is part of the current draft of the evolving Versatile Video Coding (VVC) standard [1]. MIP can be seen as a low-complexity variant of more general data-driven, neural network based intra prediction modes. Each MIP mode generates an intra prediction signal by multiplying a predefined matrix that depends on the prediction mode with a down-sampled version of the top and left boundary samples and then up-sampling the result. For more details, we refer to the section reviewing matrix based intra prediction.
A key property of MIP is that the matrices used for the various MIP modes are determined via a training algorithm that uses a large set of training data. This training algorithm attempts to find matrices that minimize a predefined loss function on the training data, using a stochastic gradient descent approach in which the matrix entries are updated iteratively. Such an approach requires calculations in floating-point arithmetic, and thus the resulting matrix entries are given as floating-point numbers. Thus, after the training, for each MIP-mode i, a matrix with floating-point entries, written here as $A_i^{\mathrm{float}}$, is obtained such that in floating point, for MIP-mode i, the reduced prediction signal is given as
$$\mathrm{pred}_{\mathrm{red}} = A_i^{\mathrm{float}} \cdot r_{\mathrm{red}} \qquad (1)$$
where $r_{\mathrm{red}}$ denotes the down-sampled version of the boundary of the given block and where $\cdot$ denotes matrix-vector multiplication.
On the other hand, for an application in the final standard, each matrix-vector multiplication (1) needs to be approximated by a rule that is specified in integer operations. This means that for each MIP-mode i, a matrix $A_i$ with integral entries and positive integers $c_i$ and $d_i$ have to be specified such that computation of the reduced prediction signal $\mathrm{pred}_{\mathrm{red}}$ is specified as
$$\mathrm{pred}_{\mathrm{red}} = \big((A_i - c_i) \cdot r_{\mathrm{red}} + (1 \ll (d_i - 1))\big) \gg d_i. \qquad (2)$$
Here, $A_i - c_i$ denotes the matrix that arises when subtracting $c_i$ from every entry of $A_i$. Finally, if v and w are vectors, where w has integral entries, $v + (1 \ll (d_i - 1))$ denotes the vector that arises by adding $1 \ll (d_i - 1)$ to every entry of v, and $w \gg d_i$ denotes the vector that arises by shifting each entry of w to the right by $d_i$ bits. By the underlying idea of MIP, (2) has to approximate (1) for all possible input vectors $r_{\mathrm{red}}$.
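As a minimal sketch of the integer rule in equation (2), the following Python function evaluates the offset, rounding and right shift for one mode. The function name and the example numbers are illustrative assumptions and are not taken from the application; the example matrix was chosen by hand so that it represents the floating-point matrix [[0.5, 0.25], [-0.25, 1.0]] at scale 2^6 with offset 32.

```python
def mip_reduced_prediction(A, c, d, r_red):
    """Integer approximation of a matrix-vector product, following equation (2).

    A     : list of rows with integer entries (the matrix A_i)
    c     : positive integer c_i, subtracted from every entry of A
    d     : positive integer d_i, the number of bits of the right shift
    r_red : list of integers (down-sampled boundary samples)
    All names are hypothetical and chosen for illustration only.
    """
    rounding = 1 << (d - 1)  # one half at scale 2**d, so the shift rounds
    out = []
    for row in A:
        acc = sum((a - c) * r for a, r in zip(row, r_red))
        # Python's >> on int is an arithmetic shift (floors negative values).
        out.append((acc + rounding) >> d)
    return out


# Hand-made example: floating-point rows (0.5, 0.25) and (-0.25, 1.0)
A = [[64, 48], [16, 96]]
print(mip_reduced_prediction(A, c=32, d=6, r_red=[100, 40]))  # [60, 15]
```

In floating point the same input gives 60.0 and 15.0, so for this particular example the integer rule reproduces the floating-point result exactly.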
Thus, it is desired to obtain matrices $A_i$ with integral entries for which equation (2) approximates equation (1) reasonably well for variable input vectors $r_{\mathrm{red}}$. Otherwise, the MIP prediction modes which are specified in a codec and which need to execute the matrix-vector product in (2) using the matrices $A_i$ might largely deviate from the behavior of the "true" MIP-modes, i.e. the trained MIP-modes which use a matrix-vector product with the floating-point matrices, see equation (1). Thus, the whole concept of the data-driven approach to intra prediction which stands behind MIP would be violated. Therefore, it is desired to provide concepts for rendering picture coding and/or video coding more efficient to support matrix-based intra-prediction. Additionally, or alternatively, it is desired to reduce a bit stream and thus a signalization cost.
This is achieved by the subject matter of the independent claims of the present application.
Further embodiments according to the invention are defined by the subject matter of the dependent claims of the present application.
In accordance with a first aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to use matrix-based intra prediction modes (MIP-modes) for predicting samples of a predetermined block of a picture stems from the fact that the matrix-vector product, i.e. the matrix-vector multiplication, performed in a MIP-mode needs to be approximated by an integer operation, whereby large deviations between the approximated and the non-approximated matrix-vector product, i.e. the 'true' matrix-vector product, can occur. In the following, the approximated matrix-vector product might be understood as the matrix-vector product of the respective MIP-mode, since, for each MIP-mode, only this approximated matrix-vector product is calculated and not the 'true' matrix-vector product to determine the prediction signal of a predetermined block. According to the first aspect of the present application, this difficulty is overcome by implementing constraints for the computation of the prediction signal by the matrix-vector product. The inventors found that it is advantageous to represent all entries of a prediction matrix associated with a MIP-mode by a fixed point representation of a predetermined bit-depth and to apply the same predetermined bit-depth for all matrix-based intra-prediction modes, e.g. at least for the ones relating to the same block size, but maybe, optionally, for the prediction matrices of all block sizes. This enables an efficient implementation of the matrix-vector product, since if the entries of all prediction matrices have a common fixed predetermined bit-depth, it is possible to use specific multipliers adapted to that predetermined bit-depth and share those specific multipliers across all MIP-modes for the computation of the matrix-vector product.
Moreover, an efficient memory-management when dealing with the prediction matrices is enabled if all entries of all prediction matrices can be stored in a fixed point representation, i.e. in a fixed precision. Additionally, the inventors found that it is advantageous to compute, for each MIP-mode, the matrix-vector product between an input vector and the prediction matrix associated with the respective MIP-mode by performing, for each component of the output vector, a right shift by a number of bits which is equal for all MIP-modes, e.g. at least for the ones relating to the same block size, but maybe, optionally, for the prediction matrices of all block sizes. This is based on the idea that a fixed right shift for all MIP-modes enables an efficient implementation of the shifts in the matrix-vector product: if the shift value does not depend on the MIP-mode, a table lookup is saved and a single fixed shifting operation can be implemented for MIP, which is beneficial for a compact SIMD implementation of the matrix-vector product and which reduces case-dependent handling of the matrix-vector product in a hardware implementation.
Accordingly, in accordance with a first aspect of the present application, an apparatus for decoding a predetermined block of a picture using intra-prediction is configured to read, from a data stream, a mode index, and an apparatus for encoding a predetermined block of a picture using intra-prediction is configured to insert, into the data stream, the mode index. For example, the apparatus for encoding, i.e. an encoder, might have selected this mode by way of a rate-distortion optimization out of the list of modes and, optionally, further modes such as inter-prediction modes. The mode index points to one out of a list of matrix-based intra-prediction modes. Additionally, the apparatuses, i.e. the apparatus for decoding and/or the apparatus for encoding, are/is configured to predict samples of the predetermined block by computing a matrix-vector product between an input vector derived from reference samples in a neighborhood of the predetermined block and a prediction matrix associated with the matrix-based intra-prediction mode pointed to by the mode index and associating components of an output vector obtained by the matrix-vector product onto sample positions of the predetermined block. For each matrix-based intra-prediction mode, all entries of the prediction matrix associated with the respective matrix-based intra-prediction mode are represented by a fixed point representation of a predetermined bit-depth, the predetermined bit-depth being equal for the matrix-based intra-prediction modes, e.g. at least for the ones relating to the same block size, but maybe, optionally, for the matrices of all block sizes.
Additionally, the apparatuses are configured to, for each matrix-based intra-prediction mode, compute the matrix-vector product between the input vector and the prediction matrix associated with the respective matrix-based intra-prediction mode by performing, for each component of the output vector, a right shift by a number of bits which is equal for the matrix-based intra-prediction modes, e.g. at least for the ones relating to the same block size, but maybe, optionally, for the matrices of all block sizes.
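The effect of mode-global settings can be sketched as follows: the shift, the offset and the entry bit-depth are module-level constants rather than per-mode table entries, so selecting a mode only selects a matrix. The constant values, the function names and the two example matrices are assumptions made for illustration, not values prescribed by the application.

```python
MIP_SHIFT = 6        # assumed right shift, identical for every mode
MIP_OFFSET = 32      # assumed offset, identical for every mode
MIP_ENTRY_BITS = 7   # assumed bit-depth of every stored matrix entry


def entries_fit(matrices, bits=MIP_ENTRY_BITS):
    """Return True if every entry of every mode fits in `bits` unsigned bits."""
    limit = 1 << bits
    return all(0 <= a < limit for M in matrices for row in M for a in row)


def predict(matrices, mode_index, x):
    """Fixed-point matrix-vector product using only mode-global constants."""
    M = matrices[mode_index]          # the only mode-dependent lookup
    rounding = 1 << (MIP_SHIFT - 1)
    # sum((a - off) * x_j) == sum(a * x_j) - off * sum(x_j), so the offset
    # is applied once per block instead of once per matrix entry.
    offset_term = MIP_OFFSET * sum(x)
    return [
        (sum(a * xj for a, xj in zip(row, x)) - offset_term + rounding) >> MIP_SHIFT
        for row in M
    ]


# Two hand-made modes sharing the same shift, offset and bit-depth.
modes = [[[64, 48], [16, 96]], [[32, 32], [96, 0]]]
print(entries_fit(modes))            # True
print(predict(modes, 0, [100, 40]))  # [60, 15]
print(predict(modes, 1, [100, 40]))  # [0, 80]
```

Because no per-mode shift or offset table is consulted, the inner loop is identical for all modes, which is the property the text associates with compact SIMD and hardware implementations.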
According to an embodiment, the number of matrix-based intra-prediction modes in the list of matrix-based intra-prediction modes is 12, 16 or 32. According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured so that the matrix-based intra-prediction modes in the list of matrix-based intra-prediction modes have associated therewith 6, 8 or 16 different matrices. For example, there could be different lists for mutually exclusive block size sets, one with 6 different matrices associated with 12 modes for block sizes within a first block size set, one with 8 different matrices associated with 16 modes for smaller block sizes within a second block size set and one with 16 different matrices associated with 32 modes for even smaller block sizes within a third block size set.
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to compute the matrix-vector product between the input vector and the prediction matrix associated with the respective matrix-based intra-prediction mode in fixed point arithmetic with applying the right shift onto an intermediate result obtained by the matrix-vector product for each component of an output vector. The intermediate result, for example, is obtained by the matrix-vector product between the input vector and the prediction matrix or by the matrix-vector product between the input vector and the prediction matrix, which is offset by a positive integer, e.g. the positive integer is subtracted/added from/to every entry of the prediction matrix resulting in an intermediate matrix and the intermediate result is obtained by the matrix-vector product between the input vector and the intermediate matrix.
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to, prior to computing the matrix-vector product, offset, e.g. by addition or by subtraction, for each matrix-based intra-prediction mode, all entries of the prediction matrix associated with the respective matrix-based intra-prediction mode by an offset value which is equal for the matrix-based intra-prediction modes, e.g. at least for the ones relating to the same block size, but maybe, optionally, for the matrices of all block sizes. This is based on the idea that a fixed offset value for all MIP-modes enables an efficient implementation of the matrix-vector product by saving a table-lookup.
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to store, for each matrix-based intra-prediction mode, for each entry of the prediction matrix associated with the respective matrix-based intra-prediction mode, the fixed point representation in the predetermined bit-depth.
According to an embodiment, the apparatus for decoding/apparatus for encoding is configured to decode/encode the picture in 10-bit resolution, store, for each matrix-based intra-prediction mode, a magnitude of the entries of the prediction matrix associated with the respective matrix-based intra-prediction mode in a 7-bit precision, and use 6 bits as the number of bits for the right shift.
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to store, for each matrix-based intra-prediction mode, the entries of the prediction matrix associated with the respective matrix-based intra-prediction mode in 8-bit sign-magnitude representation. Alternatively, in case of the entries of the prediction matrix associated with the respective matrix-based intra-prediction mode being of the same sign, the apparatus for decoding and/or the apparatus for encoding are/is configured to, prior to computing the matrix-vector product, offset, e.g. by addition or by subtraction, for each matrix-based intra-prediction mode, all entries of the prediction matrix associated with the respective matrix-based intra-prediction mode by an offset value which is equal for the matrix-based intra-prediction modes, wherein, for each matrix-based intra-prediction mode, all entries of the prediction matrix associated with the respective matrix-based intra-prediction mode are representable by a signed 8-bit representation. Accordingly, according to this alternative, merely a 7-bit magnitude might be stored for each matrix entry, since it is not necessary to indicate the sign.
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to compute the matrix-vector product between the input vector and the prediction matrix associated with the respective matrix-based intra-prediction mode in fixed point arithmetic, applying the right shift onto an intermediate result obtained by the matrix-vector product for each component of an output vector, the result being represented at a bit precision which is twice as high as the bit precision at which the entries of the prediction matrix associated with the matrix-based intra-prediction modes are stored. For example, the matrix-vector product so computed, e.g. the prediction signal obtained by offsetting the prediction matrix by a positive integer resulting in an intermediate matrix, computing a matrix-vector product between the input vector and the intermediate matrix resulting in the intermediate result, and performing the right shift on the intermediate result, is represented at a bit precision which is twice as high as the bit precision at which the entries of the prediction matrix associated with the matrix-based intra-prediction modes are stored.
According to an embodiment, the list of matrix-based intra-prediction modes comprises one or more pairs of matrix-based intra-prediction modes. Note that the list of matrix-based intra-prediction modes need not be exclusively composed of such pairs of modes; rather, there may also be other modes which are applied exclusively using either a transpose option or a non-transpose option. For each pair of matrix-based intra-prediction modes, the prediction matrix associated with a first matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes is equal to the prediction matrix associated with a second matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes. The apparatuses, i.e. the apparatus for decoding and/or the apparatus for encoding, are/is configured so that, if the matrix-based intra-prediction mode pointed to by the mode index is the first matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes, e.g. a mode with odd mode index, an association of the reference samples in the neighborhood of the predetermined block with components of the input vector and of the sample positions of the predetermined block with the components of the output vector is transposed relative to the association in case of the matrix-based intra-prediction mode pointed to by the mode index being the second matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes, e.g. a mode with even mode index. That is, if a certain component of the input vector is associated with position (x,y) in the former case, with (0,0) denoting the upper left corner sample of the predetermined block, then it is associated with (y,x) in the latter case. The same applies to the components of the output vector.
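The pairing of modes that share one matrix can be sketched as below. The concrete mapping (matrix index equal to the mode index divided by two, odd indices transposed) follows the "odd/even" example in the text but is otherwise an assumption, as are all function names and the square-block simplification.

```python
def split_mode(mode_index):
    """Map a mode index to (matrix index, transposed flag).

    Assumption for illustration: modes 2k and 2k+1 share matrix k,
    and the odd mode is the transposed one.
    """
    return mode_index // 2, mode_index % 2 == 1


def build_input(top, left, transposed):
    """Concatenate boundary samples; the transposed mode swaps top and left."""
    return left + top if transposed else top + left


def place_output(out_vec, size, transposed):
    """Write output components into a size x size block indexed block[y][x]."""
    block = [[0] * size for _ in range(size)]
    for k, value in enumerate(out_vec):
        y, x = divmod(k, size)     # row-major association for the plain mode
        if transposed:
            x, y = y, x            # position (x, y) becomes (y, x)
        block[y][x] = value
    return block


print(split_mode(4), split_mode(5))             # (2, False) (2, True)
print(build_input([1, 2], [3, 4], True))        # [3, 4, 1, 2]
print(place_output([1, 2, 3, 4], 2, False))     # [[1, 2], [3, 4]]
print(place_output([1, 2, 3, 4], 2, True))      # [[1, 3], [2, 4]]
```

With such a mapping, only half as many matrices as modes need to be stored for the paired modes.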
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to use the list of matrix-based intra-prediction modes for a plurality of block dimensions.
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to predict samples of the predetermined block which are offset from the sample positions with which the components of the output vector are associated, by up-sampling and/or interpolation on the basis of the output vector or on the basis of the output vector and the reference samples in the neighborhood of the predetermined block.
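One possible form of such an up-sampling step is linear interpolation along a row, using a boundary reference sample as the left anchor. The sketch below is an assumption for illustration only (the text does not fix the interpolation filter or the sample positions); it assumes non-negative sample values and that the i-th coarse sample sits at position (i + 1) * factor - 1.

```python
def upsample_row(ref, coarse, factor):
    """Linearly interpolate one row of predicted samples.

    ref    : reference sample left of the block (position -1)
    coarse : samples produced by the matrix-vector product
    factor : up-sampling factor (distance between coarse samples)
    Hypothetical helper; names and sample positions are assumptions.
    """
    out = []
    prev = ref
    for cur in coarse:
        for k in range(1, factor + 1):
            # Weighted average with rounding; k == factor reproduces `cur`.
            out.append(((factor - k) * prev + k * cur + factor // 2) // factor)
        prev = cur
    return out


print(upsample_row(10, [20, 40], 2))  # [15, 20, 30, 40]
```

Here the coarse samples 20 and 40 are kept at their positions, and the samples in between are filled from their two neighbours.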
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to derive the input vector from the reference samples in the neighborhood of the predetermined block by down-sampling and/or pooling.
According to an embodiment, the reference samples in the neighborhood of the predetermined block comprise first reference samples above the predetermined block and second reference samples to the left of the predetermined block. The apparatuses, i.e. the apparatus for decoding and/or the apparatus for encoding, are configured to derive the input vector from the reference samples in the neighborhood of the predetermined block by deriving first intermediate components from the first reference samples by down-sampling and/or pooling, deriving second intermediate components from the second reference samples by down-sampling and/or pooling, concatenating the first intermediate components and the second intermediate components to derive a preliminary input vector, and forming the input vector out of the preliminary input vector.
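The derivation of the preliminary input vector can be sketched with average pooling as the down-sampling step. The averaging filter, the rounding and the function names are assumptions for illustration; the text only requires down-sampling and/or pooling followed by concatenation.

```python
def downsample(samples, n_out):
    """Average consecutive groups of samples (pooling) with rounding.

    Assumes len(samples) is a multiple of n_out and samples are non-negative.
    """
    group = len(samples) // n_out
    return [
        (sum(samples[i * group:(i + 1) * group]) + group // 2) // group
        for i in range(n_out)
    ]


def preliminary_input(top, left, n_per_side):
    """Down-sample the top and left references and concatenate them."""
    return downsample(top, n_per_side) + downsample(left, n_per_side)


top = [10, 12, 20, 22]    # first reference samples, above the block
left = [30, 30, 40, 44]   # second reference samples, left of the block
print(preliminary_input(top, left, 2))  # [11, 21, 30, 42]
```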
According to an embodiment, the apparatus for decoding/apparatus for encoding is configured to decode/encode the picture in B-bit resolution. The apparatuses are configured to form the input vector out of the preliminary input vector either by subtracting $2^{B-1}$ from a first component of the preliminary input vector so as to obtain a first component of the input vector and subtracting the first component of the preliminary input vector from further components of the preliminary input vector so as to obtain further components of the input vector, or by subtracting a first component of the preliminary input vector from further components of the preliminary input vector so that the input vector is formed out of the further components. Additionally, the apparatuses are configured to correct the output vector by component-wise addition of the first component of the preliminary input vector.
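Both variants of forming the input vector, and the subsequent correction of the output vector, can be sketched as follows. The function names and the `keep_first` flag are hypothetical and only serve to switch between the two variants described above.

```python
def form_input(prelim, bit_depth, keep_first=True):
    """Form the input vector from the preliminary input vector.

    keep_first=True : first component becomes prelim[0] - 2**(bit_depth - 1),
                      the others become prelim[i] - prelim[0].
    keep_first=False: the input vector consists only of the differences
                      prelim[i] - prelim[0] for i >= 1.
    """
    first = prelim[0]
    rest = [v - first for v in prelim[1:]]
    if keep_first:
        return [first - (1 << (bit_depth - 1))] + rest
    return rest


def correct_output(out_vec, prelim):
    """Add the first preliminary component back to every output component."""
    return [v + prelim[0] for v in out_vec]


prelim = [520, 530, 500, 515]             # example values for B = 10
print(form_input(prelim, 10))             # [8, 10, -20, -5]
print(form_input(prelim, 10, False))      # [10, -20, -5]
print(correct_output([3, -2], prelim))    # [523, 518]
```

Removing the first component this way keeps the input vector entries small in magnitude, and the component-wise addition restores the removed level in the output.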
According to an embodiment, the entries of the prediction matrices of the matrix-based intra-prediction modes in the list of matrix-based intra-prediction modes correspond to the entries in table 2, shown below. Note, however, that a different shift value may be chosen for listing the values in the table, and the values in the table may be represented at another scale.
According to an embodiment, the apparatus for decoding and/or the apparatus for encoding are/is configured to use a trained prediction matrix selected for a predetermined block for predicting samples of the predetermined block by computing a matrix-vector product between the input vector derived from reference samples in the neighborhood of the predetermined block and the trained prediction matrix which is associated with the matrix-based intra-prediction mode selected for the predetermined block and associating components of an output vector obtained by the matrix-vector product onto sample positions of the predetermined block. The trained prediction matrix is, for example, trained by an apparatus for training prediction matrices, e.g., trained by an apparatus according to the second aspect.
In accordance with a second aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to use matrix-based intra prediction modes (MIP-modes) for predicting samples of a predetermined block of a picture stems from the fact that the matrix-vector product, i.e. the matrix-vector multiplication, performed in a MIP-mode needs to be approximated by an integer operation, whereas a trained prediction matrix for such a matrix-vector multiplication is obtained in floating point precision. According to the second aspect of the present application, this difficulty is overcome by implementing the constraints for the computation of the prediction signal by the matrix-vector product already at the training of the prediction matrices for such a matrix-vector product. The inventors found that it is advantageous to optimize entries of a prediction matrix associated with a MIP-mode by using a cost function which depends on a prediction distortion measure associated with setting the entries of the prediction matrices to intermediate values onto which the representative values are mapped using a differentiable function. By this approach it is possible to restrict the range of all prediction matrix entries and to avoid that some entries are not updated during the training. During the training the entries are represented in a floating point representation, and the intermediate values are then quantized onto a fixed point representation with a predetermined bit-depth that is equal for all MIP-modes. This enables an efficient implementation of the matrix-vector product, since the entries of all prediction matrices have a common fixed predetermined bit-depth. Such prediction matrices enable a video or picture encoder/decoder to use specific multipliers adapted to that predetermined bit-depth and to share those multipliers across all MIP-modes for the computation of the matrix-vector product.
Moreover, an efficient memory-management when dealing with the prediction matrices is enabled if all entries of all prediction matrices can be stored in a fixed point representation, i.e. in a fixed precision.
Accordingly, in accordance with a second aspect of the present application, an apparatus for training prediction matrices of a list of matrix-based intra-prediction modes is provided, among which modes one is to be selected for a predetermined block for predicting samples of the predetermined block by computing a matrix-vector product between an input vector derived from reference samples in a neighborhood of the predetermined block and one of the prediction matrices which is associated with the matrix-based intra-prediction mode selected for the predetermined block and associating components of an output vector obtained by the matrix-vector product onto sample positions of the predetermined block. The apparatus is configured to train, e.g., by use of a training set of predetermined blocks of known (e.g. original) samples and their corresponding neighborhood, the prediction matrices of the list of matrix-based intra-prediction modes by, using a gradient descent approach, optimizing representative values for entries of the prediction matrices of the list of matrix-based intra-prediction modes, which are represented in floating point representation, using a cost function which depends on a prediction distortion measure associated with setting the entries of the prediction matrices to intermediate values onto which the representative values are mapped using a differentiable function. The prediction distortion measure, e.g., defines a cost which increases with decreasing quality of the prediction as resulting from applying the differentiable function onto the prediction matrix under training, meaning that the differentiable function is applied onto every entry of the prediction matrix under training. A domain and a codomain of the differentiable function are defined by the floating point representation, an image of the differentiable function has a predetermined dynamic range, and the differentiable function is equal for the matrix-based intra-prediction modes.
Additionally, the apparatus is configured to quantize, e.g. after training, the intermediate values onto a fixed point representation so that, for each matrix-based intra-prediction mode, the prediction matrix associated with the respective matrix-based intra-prediction mode has all entries represented by a fixed point representation of a predetermined bit-depth so that the predetermined bit-depth is equal for the matrix-based intra-prediction modes, and so that, for each matrix-based intra-prediction mode, the matrix-vector product between the input vector and the prediction matrix associated with the respective matrix-based intra-prediction mode is computable by performing, for each component of the output vector, a right shift at a number of bits which is equal for the matrix-based intra-prediction modes.
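The quantization onto a common fixed point representation may be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the chosen bit-depth of 7, the shift of 6 bits, the offset of 32 and the matrix values are all hypothetical and are not the values of the embodiments.

```python
def quantize_matrix(float_matrix, bit_depth, shift, offset):
    """Map float entries onto unsigned fixed point integers of `bit_depth` bits.

    `shift` is the number of fractional bits; `offset` moves signed values
    into the non-negative range. Both are assumed equal for all modes.
    """
    max_entry = (1 << bit_depth) - 1
    scale = 1 << shift
    return [[min(max(round(a * scale) + offset, 0), max_entry) for a in row]
            for row in float_matrix]


def dequantize_entry(q, shift, offset):
    """Recover the real value represented by a stored integer entry."""
    return (q - offset) / (1 << shift)


trained = [[0.5, -0.25], [0.125, 0.75]]  # hypothetical trained entries
stored = quantize_matrix(trained, bit_depth=7, shift=6, offset=32)
print(stored)  # [[64, 16], [40, 80]]
```

Because the shift and the bit-depth are the same for every matrix, a decoder following this scheme needs only one multiplier width and one right shift regardless of the selected mode. For entries that are not clipped, the quantization error is at most half of the quantization step $2^{-\text{shift}}$.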
According to an embodiment, the differentiable function, i.e. a clipping function, has slope 1 at an origin of the image, is strictly monotonically increasing and has horizontal asymptotes at an upper and a lower bound of the image. The horizontal asymptotes at the upper and lower bound of the image of the differentiable function might define the predetermined dynamic range.
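A function with these three properties can be illustrated by a scaled hyperbolic tangent. The following sketch is an assumption made purely for illustration: it is not the differentiable function of the embodiments, and the name `soft_clip` and the bounds used are hypothetical.

```python
import math


def soft_clip(x, lower, upper):
    """Smooth, strictly increasing stand-in for hard clipping to (lower, upper).

    Illustrative tanh-based choice only: slope 1 at the centre of the range,
    horizontal asymptotes at `lower` and `upper`.
    """
    center = 0.5 * (lower + upper)
    half_range = 0.5 * (upper - lower)
    return center + half_range * math.tanh((x - center) / half_range)


lower, upper = -32.0, 31.5  # hypothetical dynamic range
center = 0.5 * (lower + upper)
h = 1e-4
slope = (soft_clip(center + h, lower, upper) - soft_clip(center - h, lower, upper)) / (2 * h)
values = [soft_clip(float(x), lower, upper) for x in range(-50, 51)]
```

Unlike hard clipping, such a function has a non-zero gradient everywhere, so that entries near the bounds of the dynamic range can still be updated by gradient descent.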
According to an embodiment, the differentiable function is represented/defined by wherein α,β,γ and δ are real numbers that depend on the predetermined dynamic range, i.e. a clipping range, and λ is a non-negative integer.
According to an embodiment, the differentiable function, i.e. the clipping function, is parametrizable by a shift parameter, e.g. d, in terms of a shift of the image within the codomain. The apparatus may, but does not have to, be configured to subject the shift parameter to the optimization using the gradient descent approach. The apparatus is configured to derive an offset value, which is equal for the matrix-based intra-prediction modes, from the shift parameter, the offset value being used, prior to the computation of the matrix-vector product, to offset, e.g. by addition or by subtraction, for each matrix-based intra-prediction mode, all entries of the prediction matrix associated with the respective matrix-based intra-prediction mode.
An embodiment is related to a method for decoding a predetermined block of a picture using intra-prediction, comprising reading, from the data stream, a mode index, the mode index pointing to one out of a list of matrix-based intra-prediction modes, and predicting samples of the predetermined block by computing a matrix-vector product between an input vector derived from reference samples in a neighborhood of the predetermined block and a prediction matrix associated with the matrix-based intra-prediction mode pointed to by the mode index and associating components of an output vector obtained by the matrix-vector product onto sample positions of the predetermined block. For each matrix-based intra-prediction mode, all entries of the prediction matrix associated with the respective matrix-based intra-prediction mode are represented by a fixed point representation of a predetermined bit-depth, the predetermined bit-depth being equal for the matrix-based intra-prediction modes, e.g. at least for the ones relating to the same block size, but optionally for the matrices of all block sizes.
Additionally, the method comprises, for each matrix-based intra-prediction mode, computing the matrix-vector product between the input vector and the prediction matrix associated with the respective matrix-based intra-prediction mode by performing, for each component of the output vector, a right shift at a number of bits which is equal for the matrix-based intra-prediction modes, e.g. at least for the ones relating to the same block size, but maybe, optionally, for the matrices of all block sizes.
An embodiment is related to a method for encoding a predetermined block of a picture using intra-prediction, comprising inserting, into the data stream, a mode index, the mode index pointing to one out of a list of matrix-based intra-prediction modes, e.g. this mode might have been selected by way of a rate distortion optimization out of the list of matrix-based intra-prediction modes and, optionally, also out of further modes such as inter prediction modes. The method further comprises predicting samples of the predetermined block by computing a matrix-vector product between an input vector derived from reference samples in a neighborhood of the predetermined block and a prediction matrix associated with the matrix-based intra-prediction mode pointed to by the mode index and associating components of an output vector obtained by the matrix-vector product onto sample positions of the predetermined block. For each matrix-based intra-prediction mode, all entries of the prediction matrix associated with the respective matrix-based intra-prediction mode are represented by a fixed point representation of a predetermined bit-depth, the predetermined bit-depth being equal for the matrix-based intra-prediction modes, e.g. at least for the ones relating to the same block size, but optionally for the matrices of all block sizes. Additionally, the method comprises, for each matrix-based intra-prediction mode, computing the matrix-vector product between the input vector and the prediction matrix associated with the respective matrix-based intra-prediction mode by performing, for each component of the output vector, a right shift at a number of bits which is equal for the matrix-based intra-prediction modes, e.g. at least for the ones relating to the same block size, but optionally for the matrices of all block sizes.
The methods described above are based on the same considerations as the above-described encoder/decoder. The methods can be supplemented with all features and functionalities that are also described with regard to the encoder/decoder.
An embodiment is related to a method for training prediction matrices of a list of matrix-based intra-prediction modes among which one is to be selected for a predetermined block for predicting samples of the predetermined block by computing a matrix-vector product between an input vector derived from reference samples in a neighborhood of the predetermined block and one of the prediction matrices which is associated with the matrix-based intra-prediction mode selected for the predetermined block and associating components of an output vector obtained by the matrix-vector product onto sample positions of the predetermined block. The method comprises training, e.g. by use of a training set of predetermined blocks of known (original) samples and their corresponding neighborhood, the prediction matrices of the list of matrix-based intra-prediction modes by, using a gradient descent approach, optimizing representative values for entries of the prediction matrices of the list of matrix-based intra-prediction modes, which are represented in floating point representation, using a cost function which depends on a prediction distortion measure associated with setting the entries of the prediction matrices to intermediate values onto which the representative values are mapped using a differentiable function, a domain and a codomain of which are defined by the floating point representation, an image of which has a predetermined dynamic range, and which is equal for the matrix-based intra-prediction modes. Additionally, the method comprises quantizing, e.g. after training, the intermediate values onto a fixed point representation so that, for each matrix-based intra-prediction mode, the prediction matrix associated with the respective matrix-based intra-prediction mode has all entries represented by a fixed point representation of a predetermined bit-depth so that the predetermined bit-depth is equal for the matrix-based intra-prediction modes, and so that, for each matrix-based intra-prediction mode, the matrix-vector product between the input vector and the prediction matrix associated with the respective matrix-based intra-prediction mode is computable by performing, for each component of the output vector, a right shift at a number of bits which is equal for the matrix-based intra-prediction modes.
The method described above is based on the same considerations as the above-described apparatus for training prediction matrices. The method can be supplemented with all features and functionalities that are also described with regard to the apparatus for training prediction matrices.
An embodiment is related to a data stream having a picture or a video encoded thereinto using a herein described method for encoding.
An embodiment is related to a computer program having a program code for performing, when running on a computer, a herein described method.
Brief Description of the Drawings
The drawings are not necessarily to scale; emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the following drawings, in which:
Fig. 1 shows an embodiment of an encoding into a data stream;
Fig. 2 shows an embodiment of an encoder;
Fig. 3 shows an embodiment of a reconstruction of a picture;
Fig. 4 shows an embodiment of a decoder;
Fig. 5.1 shows a prediction of a block with a reduced sample value vector according to an embodiment;
Fig. 5.2 shows a prediction of a block using an interpolation of samples according to an embodiment;
Fig. 5.3 shows a prediction of a block with a reduced sample value vector, wherein only some boundary samples are averaged, according to an embodiment;
Fig. 5.4 shows a prediction of a block with a reduced sample value vector, wherein groups of four boundary samples are averaged, according to an embodiment;
Fig. 6.1 shows a matrix-based intra prediction of a predetermined block of a picture based on a mode index;
Fig. 6.2 shows a relationship between a pair of matrix-based intra prediction modes and an application of an inter-sample distance setting;
Fig. 7 shows an apparatus for decoding using a MIP-mode for prediction, according to an embodiment;
Fig. 8 shows an apparatus for training a prediction matrix, according to an embodiment; and
Fig. 9 shows an exemplary differentiable function.
Detailed Description of the Embodiments
Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals even if occurring in different figures.
In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present invention. However, it will be apparent to those skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present invention. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.
In the following, various examples are described which may assist in achieving a more effective compression when using matrix-based intra prediction. The matrix-based intra prediction may be added to other intra-prediction modes heuristically designed, for instance, or may be provided exclusively.
In order to ease the understanding of the following examples, the description starts with a presentation of possible encoders and decoders fitting thereto into which the above outlined examples of the present application could be built. Fig. 1 shows an apparatus for block-wise encoding a picture 10 into a data stream 12. The apparatus is indicated using reference sign 14 and may be a still picture encoder or a video encoder. In other words, picture 10 may be a current picture out of a video 16 when the encoder 14 is configured to encode video 16 including picture 10 into data stream 12, or encoder 14 may encode picture 10 into data stream 12 exclusively.
As mentioned, encoder 14 performs the encoding in a block-wise manner or block-based. To this end, encoder 14 subdivides picture 10 into blocks, in units of which encoder 14 encodes picture 10 into data stream 12. Examples of possible subdivisions of picture 10 into blocks 18 are set out in more detail below. Generally, the subdivision may end up in blocks 18 of constant size, such as an array of blocks arranged in rows and columns, or in blocks 18 of different block sizes, such as by use of a hierarchical multi-tree subdivisioning which starts from the whole picture area of picture 10 or from a pre-partitioning of picture 10 into an array of tree blocks, wherein these examples shall not be treated as excluding other possible ways of subdivisioning picture 10 into blocks 18. Further, encoder 14 is a predictive encoder configured to predictively encode picture 10 into data stream 12. For a certain block 18 this means that encoder 14 determines a prediction signal for block 18 and encodes the prediction residual, i.e. the prediction error by which the prediction signal deviates from the actual picture content within block 18, into data stream 12.
Encoder 14 may support different prediction modes so as to derive the prediction signal for a certain block 18. The prediction modes which are of importance in the following examples are intra-prediction modes according to which the interior of block 18 is predicted spatially from neighboring, already encoded samples of picture 10. The encoding of picture 10 into data stream 12 and, accordingly, the corresponding decoding procedure, may be based on a certain coding order 20 defined among blocks 18. For instance, the coding order 20 may traverse blocks 18 in a raster scan order, such as row-wise from top to bottom with each row being traversed from left to right. In case of hierarchical multi-tree based subdivisioning, raster scan ordering may be applied within each hierarchy level, wherein a depth-first traversal order may be applied, i.e. leaf nodes within a block of a certain hierarchy level may precede blocks of the same hierarchy level having the same parent block according to coding order 20. Depending on the coding order 20, neighboring, already encoded samples of a block 18 may usually be located at one or more sides of block 18. In case of the examples presented herein, for instance, neighboring, already encoded samples of a block 18 are located to the top of, and to the left of, block 18.
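The depth-first coding order with raster scan within each hierarchy level may be sketched as follows for a quadtree. The quadtree restriction, the function names and the split rule are assumptions chosen for illustration; the subdivisioning of the embodiments is not limited thereto.

```python
def coding_order(x, y, size, split):
    """Yield leaf blocks (x, y, size) depth-first, raster scan per level."""
    if split(x, y, size):
        half = size // 2
        for dy in (0, half):          # rows from top to bottom
            for dx in (0, half):      # each row from left to right
                yield from coding_order(x + dx, y + dy, half, split)
    else:
        yield (x, y, size)


def example_split(x, y, size):
    """Hypothetical rule: split the 16x16 root and its top-left 8x8 child."""
    return size == 16 or (size == 8 and (x, y) == (0, 0))


order = list(coding_order(0, 0, 16, example_split))
print(order)
# [(0, 0, 4), (4, 0, 4), (0, 4, 4), (4, 4, 4), (8, 0, 8), (0, 8, 8), (8, 8, 8)]
```

With this order, all four 4x4 leaves of the top-left 8x8 block precede the remaining 8x8 blocks, so that the samples above and to the left of each block have already been coded when the block is reached.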
Intra-prediction modes may not be the only ones supported by encoder 14. In case of encoder 14 being a video encoder, for instance, encoder 14 may also support inter-prediction modes according to which a block 18 is temporally predicted from a previously encoded picture of video 16. Such an inter-prediction mode may be a motion-compensated prediction mode according to which a motion vector is signaled for such a block 18 indicating a relative spatial offset of the portion from which the prediction signal of block 18 is to be derived as a copy. Additionally, or alternatively, other non-intra-prediction modes may be available as well, such as inter-view prediction modes in case of encoder 14 being a multi-view encoder, or non-predictive modes according to which the interior of block 18 is coded as is, i.e. without any prediction.
Before focusing the description of the present application on intra-prediction modes, a more specific example for a possible block-based encoder, i.e. for a possible implementation of encoder 14, is described with respect to Fig. 2, followed by two corresponding examples for a decoder fitting to Figs. 1 and 2, respectively. Fig. 2 shows a possible implementation of encoder 14 of Fig. 1, namely one where the encoder is configured to use transform coding for encoding the prediction residual, although this is merely an example and the present application is not restricted to that sort of prediction residual coding. According to Fig. 2, encoder 14 comprises a subtractor 22 configured to subtract from the inbound signal, i.e. picture 10 or, on a block basis, current block 18, the corresponding prediction signal 24 so as to obtain the prediction residual signal 26 which is then encoded by a prediction residual encoder 28 into a data stream 12. The prediction residual encoder 28 is composed of a lossy encoding stage 28a and a lossless encoding stage 28b. The lossy stage 28a receives the prediction residual signal 26 and comprises a quantizer 30 which quantizes the samples of the prediction residual signal 26. As already mentioned above, the present example uses transform coding of the prediction residual signal 26 and, accordingly, the lossy encoding stage 28a comprises a transform stage 32 connected between subtractor 22 and quantizer 30 so as to spectrally decompose the prediction residual 26, with the quantization of quantizer 30 taking place on the transform coefficients representing the residual signal 26. The transform may be a DCT, DST, FFT, Hadamard transform or the like. The transformed and quantized prediction residual signal 34 is then subject to lossless coding by the lossless encoding stage 28b, which is an entropy coder entropy coding the quantized prediction residual signal 34 into data stream 12.
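The lossy path through subtractor 22, transform stage 32 and quantizer 30, together with its inverse, may be sketched as follows for a one-dimensional block of four samples. The 4-point Hadamard transform, the uniform quantizer with step size `step` and all sample values are assumptions chosen for brevity; entropy coding is omitted.

```python
# 4-point Hadamard matrix; it is symmetric and satisfies H4 * H4 = 4 * I.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def transform(vec):
    """Multiply the vector by the Hadamard matrix."""
    return [sum(h * v for h, v in zip(row, vec)) for row in H4]


def encode_block(block, prediction, step):
    residual = [s - p for s, p in zip(block, prediction)]  # subtractor 22
    coeffs = transform(residual)                           # transform stage 32
    return [round(c / step) for c in coeffs]               # quantizer 30


def reconstruct_block(levels, prediction, step):
    coeffs = [lv * step for lv in levels]                  # dequantization
    residual = [round(c / 4) for c in transform(coeffs)]   # inverse transform
    return [p + r for p, r in zip(prediction, residual)]   # add prediction


block = [100, 104, 98, 97]
prediction = [101, 101, 99, 99]
lossless = reconstruct_block(encode_block(block, prediction, 1), prediction, 1)
lossy = reconstruct_block(encode_block(block, prediction, 4), prediction, 4)
print(lossless)  # [100, 104, 98, 97]
print(lossy)     # [100, 104, 98, 98]
```

With a step size of 1 the block is recovered exactly, whereas a step size of 4 introduces a coding loss in the last sample. Since this loss cannot be undone at the decoder, the encoder bases its prediction on the reconstructed samples rather than on the original ones.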
Encoder 14 further comprises the prediction residual signal reconstruction stage 36 connected to the output of quantizer 30 so as to reconstruct from the transformed and quantized prediction residual signal 34 the prediction residual signal in a manner also available at the decoder, i.e. taking the coding loss of quantizer 30 into account. To this end, the prediction residual reconstruction stage 36 comprises a dequantizer 38 which performs the inverse of the quantization of quantizer 30, followed by an inverse transformer 40 which performs the inverse transformation relative to the transformation performed by transformer 32, such as the inverse of the spectral decomposition, e.g. the inverse of any of the above-mentioned specific transformation examples. Encoder 14 comprises an adder 42 which adds the reconstructed prediction residual signal as output by inverse transformer 40 and the prediction signal 24 so as to output a reconstructed signal, i.e. reconstructed samples. This output is fed into a predictor 44 of encoder 14 which then determines the prediction signal 24 based thereon. It is predictor 44 which supports all the prediction modes already discussed above with respect to Fig. 1. Fig. 2 also illustrates that in case of encoder 14 being a video encoder, encoder 14 may also comprise an in-loop filter 46 which filters completely reconstructed pictures which, after having been filtered, form reference pictures for predictor 44 with respect to inter-predicted blocks.
As already mentioned above, encoder 14 operates block-based. For the subsequent description, the block-based operation of interest is the one subdividing picture 10 into blocks for which the intra-prediction mode is selected out of a set or plurality of intra-prediction modes supported by predictor 44 or encoder 14, respectively, and the selected intra-prediction mode is performed individually. Other sorts of blocks into which picture 10 is subdivided may, however, exist as well.
For instance, the above-mentioned decision whether picture 10 is inter-coded or intra-coded may be done at a granularity or in units of blocks deviating from blocks 18. For instance, the inter/intra mode decision may be performed at a level of coding blocks into which picture 10 is subdivided, and each coding block is subdivided into prediction blocks. Prediction blocks within coding blocks for which it has been decided that intra-prediction is used are each subjected to an intra-prediction mode decision. To this end, for each of these prediction blocks, it is decided which supported intra-prediction mode should be used for the respective prediction block. These prediction blocks will form blocks 18 which are of interest here. Prediction blocks within coding blocks associated with inter-prediction would be treated differently by predictor 44. They would be inter-predicted from reference pictures by determining a motion vector and copying the prediction signal for this block from a location in the reference picture pointed to by the motion vector. Another block subdivisioning pertains to the subdivisioning into transform blocks in units of which the transformations by transformer 32 and inverse transformer 40 are performed. Transform blocks may, for instance, be the result of further subdivisioning coding blocks. Naturally, the examples set out herein should not be treated as being limiting and other examples exist as well. For the sake of completeness only, it is noted that the subdivisioning into coding blocks may, for instance, use multi-tree subdivisioning, and prediction blocks and/or transform blocks may be obtained by further subdividing coding blocks using multi-tree subdivisioning as well.
A decoder 54 or apparatus for block-wise decoding fitting to the encoder 14 of Fig. 1 is depicted in Fig. 3. This decoder 54 does the opposite of encoder 14, i.e. it decodes from data stream 12 picture 10 in a block-wise manner and supports, to this end, a plurality of intra-prediction modes. The decoder 54 may comprise a residual provider 156, for example. All the other possibilities discussed above with respect to Fig. 1 are valid for the decoder 54, too. To this end, decoder 54 may be a still picture decoder or a video decoder, and all the prediction modes and prediction possibilities are supported by decoder 54 as well. The difference between encoder 14 and decoder 54 lies, primarily, in the fact that encoder 14 chooses or selects coding decisions according to some optimization such as, for instance, in order to minimize some cost function which may depend on coding rate and/or coding distortion. One of these coding options or coding parameters may involve a selection of the intra-prediction mode to be used for a current block 18 among available or supported intra-prediction modes. The selected intra-prediction mode may then be signaled by encoder 14 for current block 18 within data stream 12, with decoder 54 redoing the selection using this signalization in data stream 12 for block 18. Likewise, the subdivisioning of picture 10 into blocks 18 may be subject to optimization within encoder 14, and corresponding subdivision information may be conveyed within data stream 12, with decoder 54 recovering the subdivision of picture 10 into blocks 18 on the basis of the subdivision information. Summarizing the above, decoder 54 may be a predictive decoder operating on a block basis and, besides intra-prediction modes, decoder 54 may support other prediction modes such as inter-prediction modes in case of, for instance, decoder 54 being a video decoder. In decoding, decoder 54 may also use the coding order 20 discussed with respect to Fig. 1, and as this coding order 20 is obeyed both at encoder 14 and decoder 54, the same neighboring samples are available for a current block 18 both at encoder 14 and decoder 54. Accordingly, in order to avoid unnecessary repetition, the description of the mode of operation of encoder 14 shall also apply to decoder 54 as far as the subdivision of picture 10 into blocks is concerned, for instance, as far as prediction is concerned and as far as the coding of the prediction residual is concerned. Differences lie in the fact that encoder 14 chooses, by optimization, some coding options or coding parameters and signals within, or inserts into, data stream 12 the coding parameters which are then derived from the data stream 12 by decoder 54 so as to redo the prediction, subdivision and so forth.
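The encoder-side choice of a coding option may be illustrated by the following minimal sketch. It assumes a Lagrangian cost of the form distortion plus multiplier times rate, which is one common choice and is not prescribed by the embodiments; the function name, the candidate values and the multiplier are hypothetical.

```python
def select_mode(candidates, lagrange_multiplier):
    """Return the mode index with the smallest rate-distortion cost.

    `candidates` maps a mode index to a (distortion, rate_in_bits) pair.
    """
    def cost(mode):
        distortion, rate = candidates[mode]
        return distortion + lagrange_multiplier * rate
    return min(candidates, key=cost)


# Hypothetical measurements for three candidate modes of one block.
candidates = {0: (120.0, 3), 1: (90.0, 6), 2: (100.0, 4)}
print(select_mode(candidates, 10.0))  # 2 (costs: 150, 150, 140)
print(select_mode(candidates, 0.0))   # 1 (distortion only)
```

Only the encoder evaluates such a cost. The decoder merely reads the resulting mode index from the data stream and applies the indicated mode.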
Fig. 4 shows a possible implementation of the decoder 54 of Fig. 3, namely one fitting to the implementation of encoder 14 of Fig. 1 as shown in Fig. 2. As many elements of the decoder 54 of Fig. 4 are the same as those occurring in the corresponding encoder of Fig. 2, the same reference signs, provided with an apostrophe, are used in Fig. 4 in order to indicate these elements. In particular, adder 42', optional in-loop filter 46' and predictor 44' are connected into a prediction loop in the same manner as they are in the encoder of Fig. 2. The reconstructed, i.e. dequantized and retransformed, prediction residual signal applied to adder 42' is derived by a sequence of entropy decoder 56, which inverts the entropy encoding of entropy encoder 28b, followed by the residual signal reconstruction stage 36', which is composed of dequantizer 38' and inverse transformer 40' just as is the case on the encoding side. The decoder's output is the reconstruction of picture 10. The reconstruction of picture 10 may be available directly at the output of adder 42' or, alternatively, at the output of in-loop filter 46'. Some post-filter may be arranged at the decoder's output in order to subject the reconstruction of picture 10 to some post-filtering in order to improve the picture quality, but this option is not depicted in Fig. 4. Again, with respect to Fig. 4 the description brought forward above with respect to Fig. 2 shall be valid for Fig. 4 as well, with the exception that merely the encoder performs the optimization tasks and the associated decisions with respect to coding options. However, all the description with respect to block-subdivisioning, prediction, dequantization and retransforming is also valid for the decoder 54 of Fig. 4.
The embodiments described herein make use of a so-called matrix-based intra-prediction. The general concept is outlined below.
Review of matrix based intra prediction
In order to keep the present application self-contained, in this section, the main steps of the current matrix based intra prediction (MIP) method included in the Working Draft 7 of the Versatile Video Coding [1] are described. For more details, it is referred to [1].
Matrix based intra prediction (MIP) is a method for generating an intra prediction signal on a rectangular block of width $W$ and height $H$. Input for the MIP-prediction process are the reconstructed samples $r$ consisting of the reconstructed samples $r_{\mathrm{top}}$ of one row above the block and of the reconstructed samples $r_{\mathrm{left}}$ of one column left of the block, a MIP-mode-index $i$ and the information whether the MIP-mode is to be transposed or not. Then, the MIP-prediction signal is generated using the following three steps:
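1. For specified natural numbers $w_{\mathrm{in,red}}$ and $h_{\mathrm{in,red}}$ that depend on $W$ and $H$ and satisfy $w_{\mathrm{in,red}} \le W$ and $h_{\mathrm{in,red}} \le H$, out of $r_{\mathrm{top}}$ one generates the reduced top input $r_{\mathrm{top,red}}$ of size $w_{\mathrm{in,red}}$ by down-sampling/averaging, and out of $r_{\mathrm{left}}$ one generates the reduced left input $r_{\mathrm{left,red}}$ of size $h_{\mathrm{in,red}}$ by down-sampling/averaging.
Then, one concatenates $r_{\mathrm{top,red}}$ and $r_{\mathrm{left,red}}$ to the reduced input $r_{\mathrm{red,full}}$, which is defined as
$$r_{\mathrm{red,full}} = [\, r_{\mathrm{top,red}},\; r_{\mathrm{left,red}} \,]$$
if the MIP-mode is not to be transposed and as
$$r_{\mathrm{red,full}} = [\, r_{\mathrm{left,red}},\; r_{\mathrm{top,red}} \,]$$
if the MIP-mode is to be transposed. Next, out of $r_{\mathrm{red,full}}$ one defines the reduced input $r_{\mathrm{red}}$. Here, $r_{\mathrm{red}}$ is either of the same size $w_{\mathrm{in,red}} + h_{\mathrm{in,red}}$ as $r_{\mathrm{red,full}}$ or of size $w_{\mathrm{in,red}} + h_{\mathrm{in,red}} - 1$. In the first case, $r_{\mathrm{red}}$ is defined as $r_{\mathrm{red}}[0] = r_{\mathrm{red,full}}[0] - 2^{B-1}$, where $B$ is the bit-depth, and as
$$r_{\mathrm{red}}[k] = r_{\mathrm{red,full}}[k] - r_{\mathrm{red,full}}[0] \quad \text{for } k > 0.$$
In the second case, $r_{\mathrm{red}}$ is defined as
$$r_{\mathrm{red}}[k] = r_{\mathrm{red,full}}[k+1] - r_{\mathrm{red,full}}[0] \quad \text{for } 0 \le k < w_{\mathrm{in,red}} + h_{\mathrm{in,red}} - 1.$$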
2. For specified natural numbers $w_{out,red}$ and $h_{out,red}$ that depend on W and H and satisfy $w_{out,red} \le W$ and $h_{out,red} \le H$, one generates the reduced prediction signal $pred_{red}$ on a block of width $w_{out,red}$ and height $h_{out,red}$ as
$pred_{red} = ((A_i - c_i) \cdot r_{red} + (1 \ll (d_i - 1))) \gg d_i$. (2)
Here $A_i$ is a matrix that depends on W and H and on the MIP-mode-index i, and $c_i$ and $d_i$ are non-negative integers that depend on the MIP-mode-index i, where this dependency is to be removed by the present invention. Moreover, $pred_{red}$ is a $(w_{out,red} \cdot h_{out,red})$-dimensional vector that is identified with a signal on a block of width $w_{out,red}$ and height $h_{out,red}$ in a row-major order, if the MIP-mode does not need to be transposed, and in a column-major order, if the MIP-mode needs to be transposed.
Afterwards, one adds $r_{red,full}[0]$ to $pred_{red}$.
Finally, the result is clipped to the given bit-range $[0, 2^B)$.
3. If $w_{out,red} < W$ or $h_{out,red} < H$, one applies up-sampling/linear interpolation to generate the full MIP-prediction signal out of the reduced prediction signal obtained at the end of the previous step. Here, the reconstructed samples are included in the linear interpolation.
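As a minimal sketch of step 2 above, the following Python function forms the reduced input from the concatenated boundary, evaluates the shifted matrix-vector product, adds back the first boundary component and clips. The function name, the list-of-rows matrix layout and the `drop_first` flag are hypothetical choices made for illustration; the down-sampling of step 1 and the up-sampling of step 3 are deliberately left out.

```python
def mip_reduced_prediction(A, c, d, r_red_full, bit_depth, drop_first=False):
    """Illustrative integer version of step 2 (not a normative implementation).

    A          -- prediction matrix as a list of rows (assumed layout)
    c, d       -- offset and right-shift of the matrix-vector product
    r_red_full -- concatenated reduced boundary produced by step 1
    drop_first -- False: reduced input has the same size as r_red_full
                  True:  reduced input is one entry shorter
    """
    first = r_red_full[0]
    if drop_first:
        r_red = [v - first for v in r_red_full[1:]]
    else:
        r_red = [first - (1 << (bit_depth - 1))]
        r_red += [v - first for v in r_red_full[1:]]

    rounding = 1 << (d - 1)
    max_val = (1 << bit_depth) - 1
    pred = []
    for row in A:
        acc = sum((a - c) * r for a, r in zip(row, r_red))
        val = ((acc + rounding) >> d) + first    # shift, then add r_red_full[0]
        pred.append(min(max(val, 0), max_val))   # clip to [0, 2^B)
    return pred


# Toy data: with d = 6 an entry of 64 acts like a weight of 1.0.
print(mip_reduced_prediction([[64, 0, 0, 0], [0, 64, 0, 0]], 0, 6,
                             [512, 600, 500, 520], 10))
```

Python's `>>` on negative integers rounds towards minus infinity, which is the behaviour assumed here for the right shift.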
Presentation of implementation examples
The general concept has been outlined above. The concept is sometimes called ALWIP (Affine-linear weighted intra prediction) in the following, as an alternative synonym for MIP (Matrix-based Intra Prediction), in order to explain the usage of these modes again in more detail.
The entire process of populating the input vector on the basis of the neighborhood, computing the matrix-vector-multiplication and linear interpolation is illustrated for different block shapes in the subsequent Figs. 5.1-5.4. Note that the remaining shapes are treated as in one of the depicted cases.
Given a 4 x 4 block, ALWIP (or MIP) may take two averages along each axis of the boundary, see Fig. 5.1. As an alternative to averaging, every second sample of the neighborhood may be taken or, to be more general and precise, every component of the input vector for the matrix-vector-multiplication 19 is taken from exactly one sample in the neighborhood. The resulting four input samples enter the matrix-vector-multiplication. The matrices are taken from the set S0, which is a set of matrices for the block size at hand. After adding an offset, this may yield the 16 final prediction samples. Linear interpolation is not necessary for generating the prediction signal. Thus, a total of (4 * 16)/(4 * 4) = 4 multiplications per sample are performed. See, for example, Fig. 5.1 illustrating ALWIP for 4x4 blocks. The exact computation has been explained above.
Given an 8 x 8 block, ALWIP may take four averages along each axis of the boundary, see Fig. 5.2. The resulting eight input samples enter the matrix-vector-multiplication 19. The matrices are taken from the set S1. This yields 16 samples on the odd positions of the prediction block. Thus, a total of (8 * 16)/(8 * 8) = 2 multiplications per sample are performed. After adding an offset, these samples are interpolated vertically by using the reduced top boundary. Horizontal interpolation follows by using the original left boundary. See, for example, Fig. 5.2 illustrating ALWIP for 8x8 blocks.
Given an 8 x 4 block, ALWIP may take four averages along the horizontal axis of the boundary and the four original boundary values on the left boundary, see Fig. 5.3. The resulting eight input samples enter the matrix-vector-multiplication. The matrices are taken from the set S1. This yields 16 samples on the odd horizontal and each vertical positions of the prediction block. Thus, a total of (8 * 16)/(8 * 4) = 4 multiplications per sample are performed. After adding an offset, these samples are interpolated horizontally by using the original left boundary. See, for example, Fig. 5.3 illustrating ALWIP for 8x4 blocks. The transposed case is treated accordingly.
Given a 16 x 16 block, ALWIP may take four averages along each axis of the boundary. The resulting eight input samples enter the matrix-vector-multiplication. The matrices are taken from the set S2. This yields 64 samples on the odd positions of the prediction block. Thus, a total of (8 * 64)/(16 * 16) = 2 multiplications per sample are performed. After adding an offset, these samples are interpolated vertically by using eight averages of the top boundary. Horizontal interpolation follows by using the original left boundary. See, for example, Fig. 5.4 illustrating ALWIP for 16x16 blocks.
For larger shapes, the procedure may be essentially the same and it is easy to check that the number of multiplications per sample is less than two.
For Wx8 blocks, only horizontal interpolation is necessary as the samples are given at the odd horizontal and each vertical positions. Thus, at most (8 * 64)/(16 * 8) = 4 multiplications per sample are performed in these cases.
Finally, for Wx4 blocks with W>8, let Ak be the matrix that arises by leaving out every row that corresponds to an odd entry along the horizontal axis of the downsampled block. Thus, the output size may be 32 and, again, only horizontal interpolation remains to be performed. At most (8 * 32)/(16 * 4) = 4 multiplications per sample may be performed.
The transposed cases may be treated accordingly. This is illustrated in the subsequent figures.
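The per-sample multiplication counts quoted above follow from dividing the size of the matrix (inputs times outputs) by the number of samples in the block. A short Python check, using only the figures stated in the text (the dictionary keys "16x8" and "16x4" stand for the smallest Wx8 and Wx4 cases used in the quoted bounds):

```python
def mults_per_sample(n_inputs, n_outputs, width, height):
    """Multiplications of the matrix-vector product, averaged over the block."""
    return (n_inputs * n_outputs) / (width * height)


# (inputs, outputs, block width, block height) as quoted in the text
CASES = {
    "4x4": (4, 16, 4, 4),
    "8x8": (8, 16, 8, 8),
    "8x4": (8, 16, 8, 4),
    "16x16": (8, 64, 16, 16),
    "16x8": (8, 64, 16, 8),
    "16x4": (8, 32, 16, 4),
}

for shape, args in CASES.items():
    print(shape, mults_per_sample(*args))
```

In every listed case the count stays at or below four multiplications per sample.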
Fig. 6.1 shows an apparatus 54 for decoding a predetermined block 18 of a picture using intra-prediction.
The apparatus 54 is configured to read, from a data stream 12, a mode index 200 using a binarization code 202, the mode index pointing to one out of a list 204 of matrix-based intra-prediction modes. The list 204 of matrix-based intra-prediction modes consists of an even number of matrix-based intra-prediction modes, wherein the matrix-based intra-prediction modes of the list 204 are grouped into pairs 212 of matrix-based intra-prediction modes. Each pair 212 consists of a first matrix-based intra-prediction mode and a second matrix-based intra-prediction mode. The apparatus 54 is configured to read, from the data stream 12, the mode index 200 using the binarization code 202 in a manner so that, for each pair 212 of matrix-based intra-prediction modes, the first matrix-based intra-prediction mode is assigned a first codeword and the second matrix-based intra-prediction mode is assigned a second codeword, and both codewords are equal in length.
Optionally, the binarization code 202 is a variable length code, which comprises codewords of different lengths. Alternatively, the binarization code may be a truncated binary code and the number of matrix-based intra-prediction modes is not a power of two, so that the truncated binary code has codewords of different lengths. A matrix-based intra-prediction mode associated with a first pair 212 of matrix-based intra-prediction modes may be assigned a codeword different in length from a codeword assigned to a matrix-based intra-prediction mode associated with a second pair 212 of matrix-based intra-prediction modes. However, both codewords of a pair 212 of matrix-based intra-prediction modes are equal in length.
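To illustrate why a truncated binary code is compatible with the pair property, the sketch below computes the codeword length of every symbol under the conventional truncated binary assignment (short codewords for the lowest indices; this assignment is an assumption, the text does not fix it). For an even number of modes the number of short codewords is even as well, so the modes 2k and 2k+1 of a pair never straddle the length boundary. The list size 12 is used purely as an example.

```python
def truncated_binary_lengths(n):
    """Codeword length of each symbol 0..n-1 under a truncated binary code."""
    k = n.bit_length() - 1        # floor(log2(n))
    u = (1 << (k + 1)) - n        # number of symbols that get the short k-bit codeword
    return [k if s < u else k + 1 for s in range(n)]


def pairs_have_equal_length(n):
    """True if symbols 2k and 2k+1 always share a codeword length (n even)."""
    lengths = truncated_binary_lengths(n)
    return all(lengths[2 * k] == lengths[2 * k + 1] for k in range(n // 2))


print(truncated_binary_lengths(12))   # 4 short codewords, 8 long ones
print(pairs_have_equal_length(12))
```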
According to an embodiment, the apparatus 54 may be configured to read the mode index 200 from the data stream 12 using an equi-probability bypass mode of a context adaptive binary arithmetic decoder.
Similarly to the apparatus 54 (i.e. a decoder) for decoding the predetermined block 18 of the picture using intra-prediction, an apparatus (i.e. an encoder) for encoding the predetermined block 18 of the picture using intra-prediction can be configured to encode the mode index 200 into the data stream 12 using the binarization code 202 and optionally using the equi-probability bypass mode of a context adaptive binary arithmetic encoder.
The decoder and the encoder are configured to predict samples 108 of the predetermined block 18 by computing a matrix-vector product 206 between an input vector 102 derived from reference samples 17 in a neighborhood of the predetermined block 18 and a prediction matrix 19 associated with the matrix-based intra-prediction mode k pointed to by the mode index 200. The computation of the matrix-vector product 206 results in an output vector 208. Furthermore, the samples 108 of the predetermined block 18 are predicted by associating components 210 of the output vector 208 obtained by the matrix-vector product 206 onto sample positions 104 of the predetermined block 18. This prediction of the samples 108 of the predetermined block 18 may be performed as described with regard to Figs. 5.1 to 5.4.
For each pair 212 of matrix-based intra-prediction modes, the prediction matrix 19 associated with a first matrix-based intra-prediction mode of the respective pair 212 of matrix-based intra-prediction modes is equal to the prediction matrix 19 associated with a second matrix-based intra-prediction mode of the respective pair 212 of matrix-based intra-prediction modes. Thus, for matrix-based intra-prediction modes 2k and 2k+1, the same prediction matrix 19 is used. For each pair 212 of matrix-based intra-prediction modes, the encoder and the decoder are configured so that, if the matrix-based intra-prediction mode pointed to by the mode index 200 is the first matrix-based intra-prediction mode of the respective pair 212 of matrix-based intra-prediction modes, e.g. a mode with odd mode index 2k+1, an association of the reference samples 17 in the neighborhood of the predetermined block 18 with components 214 of the input vector 102 and of the sample positions 104 of the predetermined block 18 with the components 210 of the output vector 208 is transposed relative to the association in case of the matrix-based intra-prediction mode pointed to by the mode index 200 being the second matrix-based intra-prediction mode of the respective pair 212 of matrix-based intra-prediction modes, e.g. a mode with even mode index 2k.
The decoder/encoder might be configured to determine whether the matrix-based intra-prediction mode pointed to by the mode index 200 is the first matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes or the second matrix-based intra-prediction mode of the respective pair 212 of matrix-based intra-prediction modes, based on the parity of the mode index 200. The parity of the mode index 200 might indicate whether the input vector 102 and the output vector 208 are used in a transposed way or not for the prediction of the samples 108 of the predetermined block 18. That is, as shown in Fig. 6.2, if a certain component of the components 1 to n of the input vector 102 is associated with position (x,y) with (0,0) denoting the upper left corner sample AA of the predetermined block 18 in the former case, then it is associated with (y,x) in the latter case. The same applies to the components (AA, AB, AC, BA, CA, ...) of the output vector 208.
Each pair 212 consists of a first matrix-based intra-prediction mode and a second matrix-based intra-prediction mode, which are related to each other by the same prediction matrix 19 and differ only in whether the input vector 102 and the output vector 208 are transposed or not. This is advantageous, since only the mode index 200 is needed in the data stream 12 to indicate the matrix-based intra-prediction mode and whether the matrix-based intra-prediction mode is used in a transposed way or not. No additional index or flag is needed to indicate, for the matrix-vector product 206, that the input vector 102 and the output vector 208 are to be used in a transposed way.
According to an embodiment, the decoder/encoder is configured to index the prediction matrix 19 out of a plurality of prediction matrices using the integer part of the mode index 200 divided by 2. This is based on the idea, that both matrix-based intra-prediction modes of a pair 212 use the same prediction matrix 19 for the prediction of the samples 108 of the predetermined block 18, for which reason the prediction matrix 19 is already sufficiently indicated by pointing with the mode index 200 to the relevant pair 212 in the list 204.
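A small Python sketch of this indexing follows: the matrix index is the integer part of the mode index divided by 2, and the parity selects the transposed association. Treating the odd index as the transposed member follows the example given in the text ("e.g. a mode with odd mode index 2k+1") and is otherwise an assumption; the function names and the row-major/column-major placement helper are hypothetical.

```python
def split_mode_index(mode_index):
    """Derive (matrix index, transposed flag) from a mode index."""
    matrix_index = mode_index >> 1          # integer part of mode_index / 2
    transposed = (mode_index & 1) == 1      # assumed: odd index = transposed use
    return matrix_index, transposed


def place_output(output_vector, width, height, transposed):
    """Associate output-vector components with sample positions of the block."""
    block = [[0] * width for _ in range(height)]
    for n, value in enumerate(output_vector):
        if transposed:
            x, y = divmod(n, height)        # column-major: n = x * height + y
        else:
            y, x = divmod(n, width)         # row-major:    n = y * width + x
        block[y][x] = value
    return block


print(split_mode_index(6), split_mode_index(7))    # same matrix, different flag
print(place_output([0, 1, 2, 3], 2, 2, False))
print(place_output([0, 1, 2, 3], 2, 2, True))
```

Both members of a pair therefore read the same stored matrix, and no extra flag has to be parsed.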
As shown in Figs. 6.1 and 6.2, the decoder/encoder might be configured to set 217 an inter-sample distance 216 of the sample positions 104 of the predetermined block 18 and an inter-sample distance 218 of the reference samples 17 in the neighborhood of the predetermined block 18 horizontally according to a first ratio of a horizontal dimension 220 of the predetermined block 18 relative to a horizontal default dimension and/or vertically according to a second ratio of a vertical dimension 222 of the predetermined block 18 relative to a vertical default dimension. This enables the usage of the list 204 of matrix-based intra-prediction modes for a plurality of block dimensions. The apparatus might fill spaces between the predicted samples by interpolation. The inter-sample distance setting 217 of the inter-sample distance 216 of the sample positions 104 of the predetermined block 18 and of the inter-sample distance 218 of the reference samples 17 in the neighborhood of the predetermined block 18 enables an improved distribution of the predicted samples 108 in the predetermined block 18 and of the reference samples 17 in the neighborhood of the predetermined block 18. Thus, the predicted samples might be equally distributed, enabling an improved interpolation of samples of the predetermined block 18.
According to an embodiment, the decoder/encoder is configured to order the matrix-based intra-prediction modes in the list 204 of matrix-based intra-prediction modes equally for the plurality of block dimensions. Alternatively, the order might be adapted to, for instance, the block being wider than high or vice versa, i.e. higher than wide, or square. This ordering may increase the coding efficiency and reduce the bitstream, since matrix-based intra-prediction modes for common block dimensions may be associated with short codewords and matrix-based intra-prediction modes for rare block dimensions may be associated with longer codewords.
Optionally, the plurality of block dimensions includes at least one block dimension corresponding to an aspect-ratio of larger than 4. The matrix-based intra-prediction might be optimized for a predetermined block 18 whose aspect-ratio of the horizontal dimension 220 to the vertical dimension 222 is larger than 4. That is, the plurality of block dimensions includes a predetermined block with an at least four times larger horizontal dimension 220 than the vertical dimension 222 and/or a predetermined block with an at least four times larger vertical dimension 222 than the horizontal dimension 220. Fig. 6.2 might show a predetermined block 18 with a block dimension corresponding to an aspect-ratio of larger than 4.
According to the embodiments proposed below, the MIP modes are applied in a manner which renders the usage of MIP even more efficient than the usage so far anticipated in the current VVC version. The embodiments in the following will mostly illustrate the features and functionalities in view of a decoder. However, it is clear that the same or similar features and functionalities can be comprised by an encoder, e.g., a decoding performed by a decoder can correspond to an encoding by the encoder. Furthermore, the encoder might comprise the same features as described with regard to the decoder in a feedback loop, e.g., in the prediction stage 36.
Fig. 7 shows an apparatus 54 for decoding a predetermined block 18 of a picture using intra-prediction.
The apparatus 54 is configured to read, from a data stream 12, a mode index 200. The mode index 200 points to one out of a list 204 of matrix-based intra-prediction modes 205_1-205_n, i.e. a MIP-mode. The number n of matrix-based intra-prediction modes 205_1-205_n in the list 204 of matrix-based intra-prediction modes is, e.g., 12, 16 or 32. Although the embodiment is focused on intra-prediction using a MIP-mode 205_1-205_n, it is clear that the mode index 200 might also be usable for indicating further modes, like further intra prediction modes and/or inter prediction modes. The mode index 200 might be inserted into the data stream 12 by an apparatus 14 for encoding the predetermined block 18 of the picture using intra-prediction.
According to an embodiment, the matrix-based intra-prediction modes 205_1-205_n in the list 204 of matrix-based intra-prediction modes have associated therewith 6, 8 or 16 different prediction matrices 19.
According to an embodiment, the list 204 of matrix-based intra-prediction modes comprises MIP-modes for a plurality of block dimensions.
According to an embodiment, there could exist two or more lists 204 of MIP-modes, wherein the two or more lists 204 of MIP-modes differ from each other in terms of a block size of the predetermined block 18 with which the MIP modes are associated. MIP-modes associated with the same or a similar block size are comprised in the same list of the two or more lists 204 of MIP-modes. For example, there could be different lists for mutually exclusive block size sets, one with 6 different matrices associated with 12 modes for block sizes within a first block size set, one with 8 different matrices associated with 16 modes for smaller block sizes within a second block size set and one with 16 different matrices associated with 32 modes for even smaller block sizes within a third block size set. This is only an example and it is clear that also a different number of lists 204 is possible and that each list might comprise MIP-modes associated with a block size set different from the ones described above. The apparatus 54 is configured to predict samples 108 of the predetermined block 18 by computing a matrix-vector product 206 between an input vector 102 derived from reference samples 17 in a neighborhood of the predetermined block 18 and a prediction matrix 19 associated with the matrix-based intra-prediction mode 205 pointed to by the mode index 200 and associating components 210 of an output vector 208 obtained by the matrix-vector product 206 onto sample positions 104 of the predetermined block 18.
For each matrix-based intra-prediction mode 205_1-205_n, all entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 are represented by a fixed point representation 190 of a predetermined bit-depth 192. The fixed point representation 190 might have a fractional part 194 with n bits, an integer part 196 with m bits and optionally a sign bit 198. The predetermined bit-depth 192 is equal for all matrix-based intra-prediction modes 205_1-205_n of the list 204 of MIP modes, as shown in Fig. 7, compare constraint 1 below. In case there exist two or more lists 204 of MIP-modes, for each list of the two or more lists 204 of MIP-modes, the predetermined bit-depth 192, for example, is equal for all matrix-based intra-prediction modes 205_1-205_n of the respective list of MIP modes. In other words, the predetermined bit-depth 192, e.g., is at least the same for the MIP-modes relating to the same block size set.
According to an embodiment, the apparatus 54 is configured to store, for each matrix-based intra-prediction mode 205_1-205_n, for each entry of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205, the fixed point representation in the predetermined bit-depth. This, for example, is illustrated in the table examples below, see listing 1 to listing 4.
For each matrix-based intra-prediction mode 205_1-205_n, the apparatus 54 is configured to compute the matrix-vector product 206 between the input vector 102 and the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 by performing, for each component 210 of the output vector, a right shift 209 by a number 211 of bits which is equal for all matrix-based intra-prediction modes 205_1-205_n of the list 204 of MIP-modes, as shown in Fig. 7. The number 211 of bits for the right shift is, e.g., x bits. In case there exist two or more lists 204 of MIP-modes, for each list of the two or more lists 204 of MIP-modes, the number 211 of bits, for example, is equal for all matrix-based intra-prediction modes 205_1-205_n of the respective list of MIP modes. In other words, the number 211 of bits, e.g., is at least the same for the MIP-modes relating to the same block size set. The right shift 209, e.g., is indicated by >> in the above described equation (2), and for the number 211 of bits compare di, wherein the apparatus 54 applies the constraint that the number 211 of bits is equal for the matrix-based intra-prediction modes 205_1-205_n, compare constraint 2 below.
The following constraints on the matrices Ai, i.e. the prediction matrices 19, and the parameters ci and di in equation (2) are desirable.
1. The range of the matrix entries used for MIP is fixed, e.g., the entries of the prediction matrix 19 are represented by a fixed point representation 190 of a predetermined bit-depth 192. Thus, there exist, e.g., predefined non-negative integers $\mu_{1,low}$, $\mu_{1,up}$ and $\mu_{2,low}$, $\mu_{2,up}$ such that for each MIP mode i 205_1-205_n and for each matrix entry $a_{k,l}$ of the matrix $A_i$, i.e. the prediction matrix 19, one has $-2^{\mu_{1,low}} \le a_{k,l} < 2^{\mu_{1,up}} - 1$ and $-2^{\mu_{2,low}} \le a_{k,l} - c_i \le 2^{\mu_{2,up}} - 1$.
Two particular examples of this constraint are given as follows.
The first example is that for a fixed positive integer $\mu$, for all matrix entries $a_{k,l}$ of all matrices $A_i$ used in the MIP-prediction one has
$0 \le a_{k,l} \le 2^{\mu} - 1$ and $-2^{\mu} < a_{k,l} - c_i \le 2^{\mu} - 1$.
According to the first example, the entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 are of the same sign, e.g. positive, and the apparatus 54 is configured to, prior to computing the matrix-vector product 206, offset, for each matrix-based intra-prediction mode 205_1-205_n, all entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 by an offset value ci which is equal for the matrix-based intra-prediction modes 205_1-205_n, wherein, for each matrix-based intra-prediction mode 205_1-205_n, all entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 are representable by a signed 8-bit representation. Accordingly, according to this alternative, merely a 7-bit magnitude might be stored for each matrix entry. This is due to the fact that the sign bit does not have to be stored.
The second example is that one has $c_i = 0$ for each MIP mode i 205_1-205_n and that there exists a fixed positive integer $\nu$ such that for each MIP mode i 205_1-205_n and each matrix entry $a_{k,l}$ of the matrix $A_i$ one has
$-2^{\nu} \le a_{k,l} \le 2^{\nu} - 1$.
According to the second example, the apparatus 54 is configured to store, for each matrix-based intra-prediction mode 205_1-205_n, the entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 in 8-bit sign-magnitude representation. In this second example the apparatus 54 does not, prior to computing the matrix-vector product 206, offset the entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 by an offset value $c_i$.
2. The shift values $d_i$, i.e. the number 211 of bits for the right shift 209, are independent of the MIP-mode i 205_1-205_n. Thus, there exists a positive integer d such that one has $d_i = d$ for each MIP-mode i 205_1-205_n.
Optionally, also the following constraint may be desirable:
3. The values ci, i.e. an offset to entries of the prediction matrix 19, are independent of the MIP-mode i 205_1-205_n. Thus, there exists a positive integer c such that one has ci = c for each MIP-mode i 205_1-205_n.
Thus, the apparatus 54 might be configured to, prior to computing the matrix-vector product 206, offset, e.g. by addition or by subtraction, for each matrix-based intra-prediction mode 205_1-205_n, all entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 by an offset value, compare c, which is equal for all matrix-based intra-prediction modes 205_1-205_n of the list 204 of MIP-modes.
In case of two or more lists 204 of MIP-modes, for each list of the two or more lists 204 of MIP-modes, the offset value c, for example, is equal for all matrix-based intra-prediction modes 205_1-205_n of the respective list of MIP modes. In other words, the offset value c, e.g., is at least the same for the MIP-modes relating to the same block size set.
The reasons why one imposes these constraints are as follows. Constraint 1 enables an efficient implementation of the matrix-vector multiplication 206 (Ai - ci) · rred of equation (2), since if the entries of all matrices (Ai - ci) have a common fixed bit-depth, i.e. a predetermined bit-depth 192, specific multipliers adapted to that bit-depth can be used and shared across all MIP-modes 205_1-205_n for the computation of the matrix-vector product 206. Moreover, an efficient memory-management when dealing with the matrices Ai is enabled if all entries of all matrices Ai can be stored in fixed precision, i.e. the entries of the prediction matrix 19 are represented in a fixed point representation 190. Here, an important example is that all entries of all matrices Ai can be stored in 8-bit precision, i.e. in one byte. In other words, the entries of the prediction matrix 19 might be represented in a fixed point representation 190 with the predetermined bit-depth 192 being 8-bit.
Constraint 2 enables an efficient implementation of the shifts in the expression (2): if the shift-value, i.e. the number 211 of bits for the right shift, does not depend on the MIP-mode i, a table-lookup is saved and a single fixed shifting operation can be implemented for MIP. This is beneficial for a compact SIMD-implementation of equation (2) and reduces case-dependent implementation of equation (2), i.e. of a matrix-vector product 206 approximating a matrix-vector product with a prediction matrix in floating point precision, in a hardware implementation. Here, a particularly important example is that the value 6 is used as a fixed shift, i.e. as the number 211 of bits. The reason is that for 10-bit content, a clipping to the 10-bit range is applied to predred in the MIP-prediction process. Thus, before down-shifting by 6, i.e. performing the right shift 209 with the number 211 of bits being 6, in equation (2), one can store the term (Ai - ci) · rred + (1 << (di - 1)) in 16 bit, i.e. 2 bytes, wherein this term represents an intermediate result 108’ obtained by the matrix-vector product 206 for each component 210 of the output vector 208.
According to an embodiment, the apparatus 54 is configured to compute the matrix-vector product 206 between the input vector 102 and the prediction matrix 19 associated with the respective matrix-based intra-prediction mode 205 in fixed point arithmetic by applying the right shift 209 to the intermediate result 108’, e.g. (Ai - ci) · rred + (1 << (di - 1)), obtained by the matrix-vector product 206 for each component 210 of the output vector 208. Optionally, the intermediate result 108’ is represented at a bit precision which is at least twice as high as a bit precision at which the entries of the prediction matrix 19 associated with the matrix-based intra-prediction modes are stored, e.g. the intermediate result 108’ might be stored in 16-bit and the entries of the prediction matrix 19 might be stored as a 7-bit magnitude or as a signed 8-bit representation.
According to an embodiment, the apparatus is configured to decode the picture in 10-bit resolution, store, for each matrix-based intra-prediction mode 205_1-205_n, a magnitude of the entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode in a 7-bit precision, and use 6 bits as the number 211 of bits.
Similarly to Constraint 2, Constraint 3 enables a more efficient implementation of equation (2), again saving a table-lookup.
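The benefit of mode-independent parameters can be sketched in Python as follows. With a single shift d and a single offset c, the product needs no per-mode lookup, and the offset can be folded in once per block through the identity (A - c) · r = A · r - c · sum(r). The shift value 6 is the example given in the text; the offset value 32 and all names are hypothetical and chosen only for illustration.

```python
MIP_SHIFT = 6     # fixed shift, the example value named in the text
MIP_OFFSET = 32   # hypothetical mode-independent offset, illustration only


def matvec_direct(A_stored, r_red, c=MIP_OFFSET, d=MIP_SHIFT):
    """((A - c) . r_red + (1 << (d - 1))) >> d, offsetting every entry."""
    rounding = 1 << (d - 1)
    return [(sum((a - c) * r for a, r in zip(row, r_red)) + rounding) >> d
            for row in A_stored]


def matvec_folded(A_stored, r_red, c=MIP_OFFSET, d=MIP_SHIFT):
    """Same result, with the offset applied once via c * sum(r_red)."""
    correction = c * sum(r_red)       # computed once for the whole block
    rounding = 1 << (d - 1)
    return [(sum(a * r for a, r in zip(row, r_red)) - correction + rounding) >> d
            for row in A_stored]


A = [[40, 20, 100, 7], [0, 127, 64, 32]]
r = [3, -5, 10, 2]
print(matvec_direct(A, r), matvec_folded(A, r))
```

Both variants return the same vector; the folded form keeps the stored entries non-negative during the multiplications.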
The problem that the present application intends to solve is that it is not obvious how Constraint 2 or Constraint 3 should be satisfied together with Constraint 1 such that equation (2) serves as an approximation for equation (1).
Assume for example that by Constraint 1, all matrix entries of each MIP-matrix Ai have to be stored in 8-bit precision and that by Constraint 2, the fixed shift di = 6 has to be used for all MIP-modes i in equation (2). Also, assume for simplicity that ci = 0. This would mean that if equation (2) should approximate equation (1), for each floating point matrix that is a result of the training-algorithm for MIP, there has to exist a non-negative integer ci such that each entry of has to satisfy where ε is reasonably small so that putting one can execute equation (2) to reasonably approximate equation (1). Here, the rounding round is applied to each entry of the matrix. Moreover, for a real number x one defines and denotes by clip(A, 2^7), A a matrix, the matrix that arises by applying clip(·, 2^7) to each entry of A. Note that the matrix Ai defined in assignment (4) has to approximate reasonably well if one assumes that there exists a matrix A'i with integral entries in the 8-bit range for which equation (2) approximates equation (1) for variable input vectors rred with the fixed shift 6.
On the other hand, a priori, there is no reason why a training algorithm whose output are MIP-matrices should yield matrices which are approximated by the corresponding matrices Ai defined in (4). The problem is that may contain matrix entries of absolute value greater than and that thus the clipping in equation (4) introduces a substantial difference between (1) and (2) by discarding parts of the most significant matrix entries of the matrices Thus, applying (4) a posteriori to trained matrices may lead to the phenomenon that the MIP prediction modes which are specified in a codec and which need to execute the matrix-vector product in equation (2) using the matrices Ai largely deviate from the behavior of the “true”, i.e. the trained MIP- modes which use a matrix-vector product with the matrices Thus, the whole concept of the data driven approach to intra prediction which stands behind MIP would be violated.
In fact, it can be observed that applying equation (4) to the trained matrices which are the basis of the MIP-modes used in the current VVC-draft [1] significantly changes the behavior of some MIP-modes when compared to the underlying trained modes since some of the matrices contain entries that are much larger than 2.
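The effect described here can be illustrated with a small Python sketch. It assumes, purely for illustration, that a trained floating point entry is scaled by 2^6, rounded and clipped to the signed 8-bit range; this round-then-clip rule is a reading of the surrounding text and not a reproduction of the formulas referred to as (3) and (4). Under that assumption every entry of magnitude above roughly 2 saturates, so the weight actually applied after the shift by 6 is no longer the trained one.

```python
def quantize_entry(a_float, shift=6, bits=8):
    """Assumed rule: scale by 2^shift, round, clip to the signed `bits` range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    q = round(a_float * (1 << shift))
    return min(max(q, lo), hi)


def effective_weight(a_float, shift=6, bits=8):
    """Weight that the integer product actually applies after the right shift."""
    return quantize_entry(a_float, shift, bits) / (1 << shift)


for a in (0.5, 1.75, 3.0, -3.0):
    print(a, quantize_entry(a), effective_weight(a))
```

A trained entry of 3.0, for instance, is stored as 127 and acts as 127/64, i.e. slightly below 2.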
Finally, note that to solve solely Constraint 1 without Constraint 2 is trivial as long as the entries of each matrix lie between $-2^7$ and $2^7-1$, which is the case for the matrices that stand behind the MIP-modes of the current VVC-draft [1]. Here, assuming ci = 0 for the moment, one simply defines the shift-value di such that holds for each matrix-entry of and such that (5) does not hold for any
Further it is to be said, that the apparatus 54 might comprise features and/or functionalities as described with regard to Figs. 6.1 and 6.2. This means, for example, that the list 204 of matrix-based intra-prediction modes 2051-205n comprises one or more pairs 212 of matrix- based intra-prediction modes. Note that the list 204 may not be exclusively composed of such pairs 212 of MIP-modes as they are depicted to be present in list 204 in figure 6.1, rather there may also be other MIP-modes which are either applied using the transpose- option or the non-transpose option exclusively. For each pair 212 of matrix-based intra- prediction modes 2051-205n, the prediction matrix 19 associated with a first matrix-based intra-prediction mode of the respective pair 212 of matrix-based intra-prediction modes is equal to the prediction matrix 19 associated with a second matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes, e.g. for modes 2k and 2k+1 , the same matrix 19 is used. The apparatus is configured so that, if the matrix-based intra-prediction mode 205 pointed to by the mode index 200 is the first matrix-based intra- prediction mode of the respective pair 212 of matrix-based intra-prediction modes an association of the reference samples 17 in the neighborhood of the predetermined block with components 214 of the input vector 112 and of the sample positions 104 of the predetermined block 18 with the components 210 of the output vector 208 is transposed relative to the association in case of the matrix-based intra-prediction mode 205 pointed to by the mode index 200 being the second matrix-based intra-prediction mode of the respective pair 212 of matrix-based intra-prediction modes. That is, if a certain component of input vector 102 is associated with position (x,y) with (0,0) denoting the upper left corner sample of the predetermined block 18 in the former case, then it is associated with (y,x) in the latter case. 
The same applies to the components of the output vector 208. For more details, see description of Fig. 6.1 and 6.2.
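The pairing of modes can be illustrated with a short sketch. The Python snippet below is illustrative only: the function names, the ordering of the boundary samples (top first, then left) and the row-by-row layout of the output vector are assumptions made for this example and are not taken from the description. It shows how modes 2k and 2k+1 could share one matrix while one mode of the pair swaps the two boundaries and writes the output to transposed positions.

```python
def split_mode(mode_index):
    """Map a mode index to (matrix index, transpose flag).

    Assumption: modes 2k and 2k+1 share matrix k and the odd mode is the
    transposed one. Which mode of a pair is transposed is a convention.
    """
    return mode_index // 2, mode_index % 2 == 1


def build_input(top, left, transposed):
    """Concatenate the boundary samples; swap the two boundaries when transposed."""
    if transposed:
        return list(left) + list(top)
    return list(top) + list(left)


def place_output(out_vec, width, height, transposed):
    """Write output components into block[y][x].

    Non-transposed: component i goes to x = i % width, y = i // width.
    Transposed: the matrix was laid out for a block of swapped dimensions,
    so the position (x, y) it would have used becomes (y, x).
    """
    block = [[0] * width for _ in range(height)]
    for i, value in enumerate(out_vec):
        if transposed:
            block[i % height][i // height] = value
        else:
            block[i // width][i % width] = value
    return block


if __name__ == "__main__":
    print(split_mode(4), split_mode(5))
    print(place_output([1, 2, 3, 4], 2, 2, transposed=False))
    print(place_output([1, 2, 3, 4], 2, 2, transposed=True))
```

Sharing one matrix per pair halves the number of stored matrices, which is consistent with lists of 12, 16 or 32 modes being served by 6, 8 or 16 matrices.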
It should further be noted that the apparatus 54 might comprise features and/or functionalities as described with regard to Figs. 5.1 to 5.4.
According to an embodiment, the apparatus 54 is configured to predict samples of the predetermined block 18 which are offset from the sample positions with which the components 210 of the output vector 208 are associated, by up-sampling and/or interpolation on the basis of the output vector 208 or on the basis of the output vector 208 and the reference samples 17 in the neighborhood of the predetermined block 18, as, for example, shown in one of Figs. 5.1 to 5.4.
According to an embodiment, the apparatus 54 is configured to derive the input vector 102 from the reference samples 17 in the neighborhood of the predetermined block 18 by down-sampling and/or pooling, as, for example, shown in one of Figs. 5.1 to 5.4.
According to an embodiment, the reference samples 17 in the neighborhood of the predetermined block 18 comprise first reference samples 17c above the predetermined block 18 and second reference samples 17a to the left of the predetermined block 18. The apparatus 54 is configured to derive the input vector 102 from the reference samples 17 in the neighborhood of the predetermined block 18 by deriving first intermediate components from the first reference samples 17c by down-sampling and/or pooling, deriving second intermediate components from the second reference samples 17a by down-sampling and/or pooling, concatenating the first intermediate components and the second intermediate components to derive a preliminary input vector, and forming the input vector out of the preliminary input vector. According to an embodiment, the apparatus 54 is configured to decode the picture in B-bit resolution. The apparatus 54 might be configured to form the input vector 102 out of the preliminary input vector by subtracting 2^(B-1) from a first component of the preliminary input vector so as to obtain a first component of the input vector 102 and subtracting the first component of the preliminary input vector from further components of the preliminary input vector so as to obtain further components of the input vector 102. Alternatively, the apparatus 54 might be configured to form the input vector 102 out of the preliminary input vector by subtracting a first component of the preliminary input vector from further components of the preliminary input vector so that the input vector 102 is formed out of the further components. Additionally, the apparatus 54 is configured to correct the output vector 208 by component-wise addition of the first component of the preliminary input vector.
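The two ways of forming the input vector can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names are hypothetical, pooling is done by rounded averaging over equally sized groups, and the boundary length is assumed to be a multiple of the pooled length. None of these details are prescribed by the text above.

```python
def pool(samples, out_len):
    """Down-sample a boundary by rounded averaging of equally sized groups.

    Assumes len(samples) is a multiple of out_len.
    """
    group = len(samples) // out_len
    return [
        (sum(samples[i * group:(i + 1) * group]) + group // 2) // group
        for i in range(out_len)
    ]


def preliminary_vector(top, left, out_len):
    """Concatenate the pooled top and left boundaries."""
    return pool(top, out_len) + pool(left, out_len)


def form_input_variant_a(prelim, bit_depth):
    """First component minus 2^(B-1); other components minus the first one."""
    first = prelim[0]
    return [first - (1 << (bit_depth - 1))] + [p - first for p in prelim[1:]]


def form_input_variant_b(prelim):
    """Subtract the first component from the others and drop it."""
    first = prelim[0]
    return [p - first for p in prelim[1:]]


def correct_output(out_vec, first_prelim_component):
    """Add the first preliminary component back to every output component."""
    return [v + first_prelim_component for v in out_vec]


if __name__ == "__main__":
    prelim = preliminary_vector([514, 518, 530, 534], [498, 502, 508, 512], 2)
    print(prelim)
    print(form_input_variant_a(prelim, bit_depth=10))
    print(form_input_variant_b(prelim))
```

Removing the first component before the matrix-vector product keeps the input values small and centered, which fits the goal of a fixed coefficient range; the correction step restores the removed mean afterwards.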
According to an embodiment, the entries of the prediction matrices of the matrix-based intra-prediction modes 2051-205n in the list 204 of matrix-based intra-prediction modes correspond to the entries in table 2 below, see, e.g., listing 2. However, note that another shift value, i.e. another number 211 of bits, may be chosen for listing the values in the table, and the values in the table might be represented at another scale.
The following embodiments will focus on data driven training of matrix based intra prediction modes having a predefined fixed coefficient range and predefined fixed shifts and their application in a codec.
The solution presented in the present invention for the problem of getting a fixed bit-depth, a fixed shift and a fixed offset is to include Constraint 1, Constraint 2 and Constraint 3 already in the training of the MIP-prediction modes, i.e. in the derivation of the matrices. Thus, one restricts the range of all matrix entries already during training, where a gradient descent algorithm is applied to successively steer the matrices towards a (local) optimum with respect to a predefined loss function on a large set of training data.
The simplest way to do this would be to multiply each matrix by 2^d, with d as in Constraint 2, then add the offset c from Constraint 3 (if desired), then clip the result to the desired range of Constraint 1, then subtract the offset c and finally divide the result by 2^d. However, this is infeasible since the clipping function has gradient zero outside the clipping range and thus, in such an approach, every weight that falls outside the clipping range at some point of the stochastic gradient descent would never be updated from then on.
Fig. 8 shows an embodiment of an apparatus 310 for training prediction matrices 19i of a list 204 of matrix-based intra-prediction modes 2051-205n, among which one mode 205i is to be selected for a predetermined block 18 for predicting samples 108 of the predetermined block 18 by computing a matrix-vector product 206 between an input vector 102 derived from reference samples 17 in a neighborhood of the predetermined block 18 and the one of the prediction matrices 19i which is associated with the matrix-based intra-prediction mode 205i selected for the predetermined block 18, and associating components 210 of an output vector 208 obtained by the matrix-vector product 206 onto sample positions 104 of the predetermined block 18.
The apparatus 310 is configured to train 320 the prediction matrices 19i of the list 204 of matrix-based intra-prediction modes 2051-205n using a gradient descent approach 322. The prediction matrices 19i are trained 320, for example, using a training set of predetermined blocks 18 of known original samples and their corresponding neighborhoods 17. The prediction matrices 19i are trained 320 by optimizing representative values for the entries of the prediction matrices 19i of the list 204 of matrix-based intra-prediction modes 2051-205n, which are represented in floating point representation, using a cost function 324 which depends on a prediction distortion measure 326 associated with setting the entries of the prediction matrices 19i to intermediate values onto which the representative values are mapped using a differentiable function 328, compare f(x). The cost function 324 depends on the prediction distortion measure 326, for example, such that a cost 325 increases with decreasing quality of the prediction that results when f(x) is applied to every entry of the matrix under training. The prediction distortion measure 326 might define a deviation between a prediction signal obtainable using a prediction matrix comprising the intermediate values and an original signal associated with a predetermined block of the training set. As can be seen in Fig. 8, the prediction distortion measure 326 is zero in case of the prediction signal being equal to the original signal. The entries of the matrix under training might be the above-mentioned representative values, and the entries of the matrix obtained by applying f(x) to each of them might be the above-mentioned intermediate values. It is to be noted that the gradient descent approach 322 with the cost function 324 and the dependency of the cost 325 on the prediction distortion measure 326 are only shown schematically to illustrate the basic principle underlying the apparatus 310.
A domain, e.g. the x axis 302 in Fig. 9, and a codomain, e.g. the y axis 304 in Fig. 9, of the differentiable function 328 are defined by the floating point representation, an image 300 of the differentiable function 328 has a predetermined dynamic range, and the differentiable function 328 is equal for the matrix-based intra-prediction modes 2051-205n. The predetermined dynamic range, e.g., is defined by max(image 300)/min(image 300). According to an embodiment, max(image 300) is α + δ and min(image 300) is -α + δ, see equation (7) below. It is to be noted that Fig. 9 shows only a graph of an exemplary differentiable function f(x) 328.
Additionally, the apparatus 310 is configured to quantize 330, e.g. after training 320, the intermediate values onto a fixed point representation 190 so that, for each matrix-based intra-prediction mode 2051-205n, the prediction matrix 19i associated with the respective matrix-based intra-prediction mode 205i has all entries represented by a fixed point representation 190 of a predetermined bit-depth 192, so that the predetermined bit-depth 192 is equal for the matrix-based intra-prediction modes 2051-205n, and so that, for each matrix-based intra-prediction mode 2051-205n, the matrix-vector product 206 between the input vector 102 and the prediction matrix 19i associated with the respective matrix-based intra-prediction mode 205i is computable by performing, for each component of the output vector 208, a right shift 209 by a number 211 of bits which is equal for the matrix-based intra-prediction modes 2051-205n, e.g. meaning that the non-shifted-out portion b_(x+1) to b_(y+x) of the fixed-point representation suffices to represent α, see equation (7) below. It is to be noted that Fig. 8 shows a fixed point representation 190 being an (x+y+1)-bit sign-magnitude representation. However, it is also possible that the fixed point representation 190 is an (x+y)-bit magnitude representation, e.g., in case of all intermediate values having the same sign, e.g., all intermediate values might be positive values.
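A matrix-vector product with a mode-independent right shift can be sketched in a few lines. The Python snippet below is an illustration under assumptions: the function name is hypothetical, the rounding term added before the shift is an assumption, and the values 6 (shift) and 32 (offset) are the ones named in this description for the stored-matrix example. The matrix entries are assumed to be stored as non-negative integers from which the common offset is subtracted before multiplication.

```python
SHIFT = 6    # number of bits of the right shift, identical for all modes
OFFSET = 32  # offset removed from every stored entry, identical for all modes


def mip_matvec(stored_matrix, input_vec, shift=SHIFT, offset=OFFSET):
    """Integer matrix-vector product followed by a common right shift.

    stored_matrix: list of rows of non-negative integers (one row per
                   output component).
    input_vec:     list of integers (may be negative).
    """
    rounding = 1 << (shift - 1)  # assumption: round to nearest before shifting
    out = []
    for row in stored_matrix:
        acc = sum((m - offset) * p for m, p in zip(row, input_vec))
        # Python's >> on negative integers is an arithmetic (floor) shift.
        out.append((acc + rounding) >> shift)
    return out


if __name__ == "__main__":
    # Stored entries 96 and 32 represent the weights 1.0 and 0.0;
    # stored entries 0 and 64 represent the weights -0.5 and 0.5.
    print(mip_matvec([[96, 32], [0, 64]], [10, 20]))
```

Because shift and offset do not depend on the mode, no per-mode table lookup is needed for them, and the accumulator width can be fixed once for all modes.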
Thus, as a solution, in the present invention the clipping operation is approximated by a smooth function, e.g., the differentiable function 328. More precisely, out of Constraint 1, Constraint 2 and, optionally, Constraint 3, one computes the range for the unscaled matrix entries, i.e. for the representative values for entries of the prediction matrix, such that if one applies the scaling and, optionally, the offset during training to a current matrix in the training process, the result lies within the range of Constraint 1. Then, during training, one clips each entry to this range by applying a smooth approximation f, i.e. the differentiable function 328, of the clipping function that is realized as given in equation (7), where α, β, γ and δ are real numbers that depend on the clipping range, i.e. the predetermined dynamic range. Moreover, λ is a non-negative integer that might be chosen experimentally. Figure 9 shows an example of the clipping function f(x), i.e. of the differentiable function 328.
According to an embodiment, the differentiable function 328 has slope 1 at the origin, is strictly monotonically increasing and has horizontal asymptotes at the upper and lower bound of the image 300.
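The difference between hard clipping and a smooth approximation can be made concrete with a small sketch. Note that the function used below, a scaled hyperbolic tangent, is a stand-in chosen for this illustration and is not the function of the present description; it merely shares the listed properties (slope 1 at the origin, strictly increasing, horizontal asymptotes at -α + δ and α + δ). All names are hypothetical.

```python
import math


def hard_clip(x, lo, hi):
    """Ordinary clipping to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def hard_clip_grad(x, lo, hi):
    """Derivative of hard_clip: 1 inside the range, 0 outside."""
    return 1.0 if lo < x < hi else 0.0


def smooth_clip(x, alpha, delta=0.0):
    """Stand-in smooth clip (NOT the function of the description).

    Its image is the open interval (-alpha + delta, alpha + delta).
    """
    return alpha * math.tanh(x / alpha) + delta


def smooth_clip_grad(x, alpha):
    """Derivative of smooth_clip: 1 - tanh(x / alpha)^2.

    Positive for every x in exact arithmetic, equal to 1 at x = 0.
    """
    t = math.tanh(x / alpha)
    return 1.0 - t * t


if __name__ == "__main__":
    alpha = 1.5
    for x in (0.0, 1.0, 3.0):
        print(x, hard_clip_grad(x, -alpha, alpha), smooth_clip_grad(x, alpha))
```

A weight at x = 3.0 receives no update through the hard clip, whereas the smooth variant still passes a small gradient, so gradient descent can move the weight back into the useful range.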
According to an embodiment, the differentiable function 328 is parametrizable by a shift parameter, e.g. δ, in terms of a shift of the image 300 within the codomain 304. Additionally, the apparatus 310 might be configured to subject the shift parameter to optimization using the gradient descent approach 322. Furthermore, the apparatus 310 might be configured to derive an offset value, compare c, from the shift parameter so as to be used, prior to the computation of the matrix-vector product 206, to offset, e.g., by addition or by subtraction, for each matrix-based intra-prediction mode, all entries of the prediction matrix 19 associated with the respective matrix-based intra-prediction mode. The derived offset value c is equal for the matrix-based intra-prediction modes 2051-205n.
In summary, the invention of the present application is a realization of MIP having a part given by equation (2) for which Constraint 1 and Constraint 2 are satisfied, or for which Constraint 1, Constraint 2 and Constraint 3 are satisfied. In both cases, Constraint 1 and Constraint 2 and, if desired, also Constraint 3 are employed, as described in this section, in the training algorithm for the floating point matrices that are then quantized to the integer matrices.
The following embodiments describe examples for a stored representation of the prediction matrices.
The following listing shows the floating-point matrices that resulted from a training using the techniques provided in the present application for the MIP-modes used for mipSizeId = 2, [1]. In detail, the parameters of the clipping function, i.e. the differentiable function, were selected in such a way that the training generates matrix coefficients that can be represented using 7-bit unsigned integer numbers with a fixed shift of 6 and a fixed offset of 32.
Listing 1: Floating-point matrix coefficients resulting from the training
The matrix coefficients shown in Listing 1 fulfill the requirements presented in the previous sections. To illustrate this, Listing 2 shows the matrix coefficients after a multiplication by 2^6. The range of these coefficients deviates from the final range only by the fixed offset 32. Thus, the values below are the stored matrix entries in fixed point representation in accordance with an example. The matrices are 7x64 matrices (7-component input vector and 64-component output vector). There are 6 matrices for 6 modes. According to an embodiment, the entries may deviate from the values shown below. For example, the multiplication by 2^6 has been chosen for illustration purposes only and, accordingly, the entries of the matrices as they are shown below might look different when choosing another factor.
Listing 2: Scaled floating-point matrix coefficients
Now, adding the fixed offset of 32 to these matrix coefficients, a set of matrices results with all coefficients equal to or greater than -0.5 and therefore rounded to non-negative values. This set is shown in Listing 3.
Listing 3: Scaled floating-point matrix coefficients after addition of constant offset
Finally, the above matrix coefficients are rounded to integer precision, i.e. the intermediate values are quantized 330 onto the fixed point representation 190. Since the minimum coefficient of the above set is -0.5, the resulting integer coefficients are non-negative and therefore can be represented by unsigned integer numbers. The maximum coefficient of the above matrix set is 127.5. Coefficients of this value would normally be rounded to 128 but are rounded to 127 here, which introduces the same absolute rounding error. Thus, the resulting integer coefficients shown in Listing 4 are from the unsigned 7-bit range. In addition, since the minimum rounded coefficient is 0 and the maximum rounded coefficient is 127, the resulting integer coefficients fully exploit this range.
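The chain from floating-point coefficient to stored integer described above (scale by 2^6, add the offset 32, round, keep within 7 bits) can be sketched as follows. The function names are hypothetical, and the choice of round-half-up followed by clamping is an assumption that reproduces the two boundary cases mentioned in the text (-0.5 becomes 0, 127.5 becomes 127).

```python
import math

SHIFT = 6                # fixed shift named in the description
OFFSET = 32              # fixed offset named in the description
MAX_VAL = (1 << 7) - 1   # largest unsigned 7-bit value, 127


def quantize_entry(w):
    """Map one floating-point coefficient to an unsigned 7-bit integer."""
    scaled = w * (1 << SHIFT) + OFFSET
    q = math.floor(scaled + 0.5)       # round half up (assumption)
    return min(max(q, 0), MAX_VAL)     # 128 is folded back to 127


def dequantize_entry(q):
    """Recover the effective floating-point weight of a stored entry."""
    return (q - OFFSET) / (1 << SHIFT)


if __name__ == "__main__":
    for w in (-0.5078125, 0.0, 1.0, 1.4921875):
        q = quantize_entry(w)
        print(w, q, dequantize_entry(q))
```

`math.floor(x + 0.5)` is used instead of Python's built-in `round`, because `round` rounds halves to the nearest even integer and would therefore treat the boundary cases differently.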
Listing 4: Unsigned 7-bit integer coefficients
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
The inventive data stream can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.
The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer. The methods described herein, or any components of the apparatus described herein, may be performed at least partially by hardware and/or by software.
The above described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
References
[1] B. Bross et al., Versatile Video Coding (Draft 7), Document JVET-P2001, Geneva, October 2019

Claims (41)

1. Apparatus (54) for decoding a predetermined block (18) of a picture using intra-prediction, configured to read, from the data stream (12), a mode index (200), the mode index pointing to one out of a list (204) of matrix-based intra-prediction modes, predict samples (108) of the predetermined block (18) by computing a matrix-vector product (206) between an input vector (102) derived from reference samples (17) in a neighborhood of the predetermined block (18) and a prediction matrix (19) associated with the matrix-based intra-prediction mode (k) pointed to by the mode index (200) and associating components (210) of an output vector (208) obtained by the matrix-vector product (206) onto sample positions (104) of the predetermined block, wherein, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) are represented by a fixed point representation (190) of a predetermined bit-depth (192), the predetermined bit-depth (192) being equal for the matrix-based intra-prediction modes (2051-205n), wherein the apparatus (54) is configured to, for each matrix-based intra-prediction mode (2051-205n), compute the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) by performing, for each component (210) of the output vector (208), a right shift (209) at a number (211) of bits which is equal for the matrix-based intra-prediction modes (2051-205n).
2. Apparatus (54) of claim 1, wherein the number of matrix-based intra-prediction modes (2051-205n) in the list of matrix-based intra-prediction modes is 12, 16 or 32.
3. Apparatus (54) of claim 1 or 2, configured so that the matrix-based intra-prediction modes (2051-205n) in the list of matrix-based intra-prediction modes have associated therewith 6, 8 or 16 different matrices.
4. Apparatus (54) of any previous claim 1 to 3, wherein the apparatus (54) is configured to compute the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) in fixed point arithmetic with applying the right shift (209) onto an intermediate result (108’) obtained by the matrix-vector product (206) for each component (210) of an output vector (208).
5. Apparatus (54) of any of claims 1 to 4, wherein the apparatus (54) is configured to, prior to computing the matrix-vector product (206), offset, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) by an offset value which is equal for the matrix-based intra-prediction modes (2051-205n).
6. Apparatus (54) of any of previous claims 1 to 5, wherein the apparatus (54) is configured to store, for each matrix-based intra-prediction mode (2051-205n), for each entry of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i), the fixed point representation (190) in the predetermined bit-depth (192).
7. Apparatus (54) of any of previous claims 1 to 6, wherein the apparatus (54) is configured to decode the picture in 10-bit resolution, store, for each matrix-based intra-prediction mode (2051-205n), a magnitude of the entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) in a 7-bit precision, and use 6 bits as the number (211) of bits.
8. Apparatus (54) of claim 7, wherein the apparatus (54) is configured to store, for each matrix-based intra-prediction mode (2051-205n), the entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) in 8-bit sign-magnitude representation or wherein the entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) are of the same sign and the apparatus (54) is configured to, prior to computing the matrix-vector product (206), offset, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) by an offset value which is equal for the matrix-based intra-prediction modes (2051-205n) wherein, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) are representable by a signed 8-bit representation.
9. Apparatus (54) of any of previous claims 1 to 8, wherein the apparatus (54) is configured to compute the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) in fixed point arithmetic with applying the right shift (209) onto an intermediate result (108’) obtained by the matrix-vector product (206) for each component (210) of an output vector (208), and being represented at bit precision which is twice as high as a bit precision at which the entries of the prediction matrix (19) associated with the matrix-based intra-prediction modes (2051-205n) are stored.
10. Apparatus (54) of any of previous claims 1 to 9, wherein the list (204) of matrix-based intra-prediction modes (2051-205n) comprises one or more pairs (212) of matrix-based intra-prediction modes, and, for each pair of matrix-based intra-prediction modes, the prediction matrix (19) associated with a first matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes is equal to the prediction matrix (19) associated with a second matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes, and the apparatus (54) is configured so that, if the matrix-based intra-prediction mode pointed to by the mode index is the first matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes, an association of the reference samples (17) in the neighborhood of the predetermined block with components (214) of the input vector (112) and of the sample positions (104) of the predetermined block (18) with the components (210) of the output vector (208) is transposed relative to the association in case of the matrix-based intra-prediction mode pointed to by the mode index being the second matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes.
11. Apparatus (54) of any of previous claims 1 to 10, configured to use the list of matrix-based intra-prediction modes (2051-205n) for a plurality of block dimensions.
12. Apparatus (54) of any of previous claims 1 to 11, wherein the apparatus (54) is configured to predict samples of the predetermined block which are offset from the sample positions with which the components (210) of the output vector (208) are associated, by up-sampling and/or interpolation on the basis of the output vector or on the basis of the output vector and the reference samples in the neighborhood of the predetermined block.
13. Apparatus (54) of any of previous claims 1 to 12, wherein the apparatus (54) is configured to derive the input vector (102) from the reference samples (17) in the neighborhood of the predetermined block (18) by down-sampling and/or pooling.
14. Apparatus (54) of any of previous claims 1 to 13, wherein the reference samples (17) in the neighborhood of the predetermined block (18) comprise first reference samples above the predetermined block and second reference samples to the left of the predetermined block, wherein the apparatus (54) is configured to derive the input vector (102) from the reference samples (17) in the neighborhood of the predetermined block (18) by deriving first intermediate components from the first reference samples by down-sampling and/or pooling, deriving second intermediate components from the second reference samples by down-sampling and/or pooling, concatenating the first intermediate components and the second intermediate components to derive a preliminary input vector, and forming the input vector out of the preliminary input vector.
15. Apparatus (54) of claim 14, configured to decode the picture in B-bit resolution; form the input vector out of the preliminary input vector by subtraction of 2^(B-1) from a first component of the preliminary input vector so as to obtain a first component of the input vector and subtracting the first component of the preliminary input vector from further components of the preliminary input vector so as to obtain further components of the input vector, or subtracting a first component of the preliminary input vector from further components of the preliminary input vector so that the input vector is formed out of the further components, and correct the output vector by component-wise addition of the first component of the preliminary input vector.
16. Apparatus (54) of any of previous claims 1 to 15, wherein the entries of the prediction matrices of the matrix-based intra-prediction modes (2051-205n) in the list of matrix-based intra-prediction modes correspond to the entries in table 2.
17. Apparatus (14) for encoding a predetermined block (18) of a picture using intra-prediction, configured to insert, into the data stream (12), a mode index (200), the mode index pointing to one out of a list (204) of matrix-based intra-prediction modes (2051-205n), predict samples (108) of the predetermined block (18) by computing a matrix-vector product (206) between an input vector (102) derived from reference samples (17) in a neighborhood of the predetermined block (18) and a prediction matrix (19) associated with the matrix-based intra-prediction mode (k) pointed to by the mode index (200) and associating components (210) of an output vector (208) obtained by the matrix-vector product (206) onto sample positions (104) of the predetermined block, wherein, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) are represented by a fixed point representation (190) of a predetermined bit-depth (192), the predetermined bit-depth (192) being equal for the matrix-based intra-prediction modes (2051-205n), wherein the apparatus (14) is configured to, for each matrix-based intra-prediction mode (2051-205n), compute the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) by performing, for each component of the output vector, a right shift (209) at a number (211) of bits which is equal for the matrix-based intra-prediction modes (2051-205n).
18. Apparatus (14) of claim 17, wherein the number of matrix-based intra-prediction modes (2051-205n) in the list of matrix-based intra-prediction modes is 12, 16 or 32.
19. Apparatus (14) of claim 17 or 18, configured so that the matrix-based intra-prediction modes (2051-205n) in the list of matrix-based intra-prediction modes have associated therewith 6, 8 or 16 different matrices.
20. Apparatus (14) of any previous claim 17 to 19, wherein the apparatus (14) is configured to compute the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) in fixed point arithmetic with applying the right shift (209) onto an intermediate result (108’) obtained by the matrix-vector product (206) for each component (210) of an output vector (208).
21. Apparatus (14) of any of claims 17 to 20, wherein the apparatus (14) is configured to, prior to computing the matrix-vector product (206), offset, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) by an offset value which is equal for the matrix-based intra-prediction modes (2051-205n).
22. Apparatus (14) of any of previous claims 17 to 21, wherein the apparatus (14) is configured to store, for each matrix-based intra-prediction mode (2051-205n), for each entry of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i), the fixed point representation (190) in the predetermined bit-depth (192).
23. Apparatus (14) of any of previous claims 17 to 22, wherein the apparatus (14) is configured to encode the picture in 10-bit resolution, store, for each matrix-based intra-prediction mode (2051-205n), a magnitude of the entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) in a 7-bit precision, and use 6 bits as the number (211) of bits.
24. Apparatus (14) of claim 23, wherein the apparatus (14) is configured to store, for each matrix-based intra-prediction mode (2051-205n), the entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) in 8-bit sign-magnitude representation or wherein the entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) are of the same sign and the apparatus (14) is configured to, prior to computing the matrix-vector product (206), offset, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) by an offset value which is equal for the matrix-based intra-prediction modes (2051-205n), wherein, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) are representable by a signed 8-bit representation.
25. Apparatus (14) of any of previous claims 17 to 24, wherein the apparatus (14) is configured to compute the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) in fixed point arithmetic with applying the right shift (209) onto an intermediate result (108’) obtained by the matrix-vector product (206) for each component (210) of an output vector (208), and being represented at bit precision which is twice as high as a bit precision at which the entries of the prediction matrix (19) associated with the matrix-based intra-prediction modes (2051-205n) are stored.
26. Apparatus (14) of any of previous claims 17 to 25, wherein the list (204) of matrix-based intra-prediction modes (2051-205n) comprises one or more pairs (212) of matrix-based intra-prediction modes, and, for each pair of matrix-based intra-prediction modes, the prediction matrix (19) associated with a first matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes is equal to the prediction matrix (19) associated with a second matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes, and the apparatus (14) is configured so that, if the matrix-based intra-prediction mode pointed to by the mode index is the first matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes, an association of the reference samples (17) in the neighborhood of the predetermined block with components (214) of the input vector (112) and of the sample positions (104) of the predetermined block (18) with the components (210) of the output vector (208) is transposed relative to the association in case of the matrix-based intra-prediction mode pointed to by the mode index being the second matrix-based intra-prediction mode of the respective pair of matrix-based intra-prediction modes.
27. Apparatus (14) of any of previous claims 17 to 26, configured to use the list of matrix-based intra-prediction modes (2051-205n) for a plurality of block dimensions.
28. Apparatus (14) of any of previous claims 17 to 27, wherein the apparatus (14) is configured to predict samples of the predetermined block which are offset from the sample positions with which the components (210) of the output vector (208) are associated, by up-sampling and/or interpolation on the basis of the output vector or on the basis of the output vector and the reference samples in the neighborhood of the predetermined block.
29. Apparatus (14) of any of previous claims 17 to 28, wherein the apparatus (14) is configured to derive the input vector (102) from the reference samples (17) in the neighborhood of the predetermined block (18) by down-sampling and/or pooling.
30. Apparatus (14) of any of previous claims 17 to 29, wherein the reference samples (17) in the neighborhood of the predetermined block (18) comprise first reference samples above the predetermined block and second reference samples to the left of the predetermined block, wherein the apparatus (14) is configured to derive the input vector (102) from the reference samples (17) in the neighborhood of the predetermined block (18) by deriving first intermediate components from the first reference samples by down-sampling and/or pooling, deriving second intermediate components from the second reference samples by down-sampling and/or pooling, concatenating the first intermediate components and the second intermediate components to derive a preliminary input vector, and forming the input vector out of the preliminary input vector.
31. Apparatus (14) of claim 30, configured to encode the picture in B-bit resolution; form the input vector out of the preliminary input vector by subtracting 2^(B-1) from a first component of the preliminary input vector so as to obtain a first component of the input vector and subtracting the first component of the preliminary input vector from further components of the preliminary input vector so as to obtain further components of the input vector, or subtracting a first component of the preliminary input vector from further components of the preliminary input vector so that the input vector is formed out of the further components, and correct the output vector by component-wise addition of the first component of the preliminary input vector.
32. Apparatus (14) of any of previous claims 17 to 31, wherein the entries of the prediction matrices of the matrix-based intra-prediction modes (2051-205n) in the list of matrix-based intra-prediction modes correspond to the entries in table 2.
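As an illustration of the fixed-point prediction recited in claims 17, 23 and 31, the following is a minimal Python sketch under stated assumptions. The function names `form_input_vector` and `mip_predict` are hypothetical, the rounding offset added before the shift and the final clipping to the B-bit sample range are assumptions not stated in the claims, and the tiny matrix in the usage note is made up for the example (it is not taken from table 2). The sketch uses the first alternative of claim 31 for forming the input vector.

```python
SHIFT = 6        # number of bits of the right shift, equal for all modes (claim 23)
BIT_DEPTH = 10   # B-bit picture resolution (claim 23)


def form_input_vector(prelim, bit_depth=BIT_DEPTH):
    """First alternative of claim 31: subtract 2^(B-1) from the first
    component and subtract the first component from all further ones."""
    first = prelim[0]
    return [first - (1 << (bit_depth - 1))] + [p - first for p in prelim[1:]]


def mip_predict(matrix, prelim, shift=SHIFT, bit_depth=BIT_DEPTH):
    """Integer matrix-vector product followed by a mode-independent right shift.

    matrix: rows of integer entries (fixed-point weights with `shift`
            fractional bits), one row per output component.
    prelim: preliminary input vector built from the reference samples.
    """
    x = form_input_vector(prelim, bit_depth)
    rounding = 1 << (shift - 1)        # assumption: round to nearest
    max_val = (1 << bit_depth) - 1     # assumption: clip to the sample range
    out = []
    for row in matrix:
        acc = sum(a * v for a, v in zip(row, x))   # intermediate result
        val = (acc + rounding) >> shift            # same shift for every mode
        val += prelim[0]                           # correction step of claim 31
        out.append(min(max(val, 0), max_val))
    return out
```

For example, with the made-up matrix `[[0, 64, 0, 0], [0, 0, 32, 0]]` (64 representing 1.0 with 6 fractional bits) and the preliminary input vector `[512, 520, 500, 512]`, `mip_predict` returns `[520, 506]`. Because the shift is the same for every mode, the same routine can serve all matrices in the list without per-mode parameters.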
33. Apparatus (310) for training prediction matrices of a list (204) of matrix-based intra-prediction modes (2051-205n) among which one is to be selected for a predetermined block for predicting samples (108) of the predetermined block (18) by computing a matrix-vector product (206) between an input vector (102) derived from reference samples (17) in a neighborhood of the predetermined block (18) and one of the prediction matrices (19) which is associated with the matrix-based intra-prediction mode (k) selected for the predetermined block and associating components (210) of an output vector (208) obtained by the matrix-vector product (206) onto sample positions (104) of the predetermined block, wherein the apparatus (310) is configured to train the prediction matrices of the list (204) of matrix-based intra-prediction modes (2051-205n) by, using a gradient descent approach, optimizing representative values for entries of the prediction matrices of the list (204) of matrix-based intra-prediction modes, which are represented in floating point representation, using a cost function which depends on a prediction distortion measure associated with setting the entries of the prediction matrices (19) to intermediate values onto which the representative values are mapped using a differentiable function a domain and a codomain of which are defined by the floating point representation, an image (300) of which has a predetermined dynamic range, and which is equal for the matrix-based intra-prediction modes (2051-205n), quantize the intermediate values onto a fixed point representation (190) so that, for each matrix-based intra-prediction mode (2051-205n), the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) has all entries represented by a fixed point representation (190) of a predetermined bit-depth (192) so that the predetermined bit-depth (192) is equal for the matrix-based intra-prediction modes (2051-205n), and so that, for each matrix-based intra-prediction mode (2051-205n), the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i) is computable by performing, for each component of the output vector, a right shift (209) at a number (211) of bits which is equal for the matrix-based intra-prediction modes (2051-205n).
34. Apparatus (310) according to claim 33, wherein the differentiable function has slope 1 at the origin, is strictly monotonically increasing and has horizontal asymptotes at the upper and lower bound of the image.
35. Apparatus (310) according to claim 33 or 34, wherein wherein α, β, γ and δ are real numbers that depend on the predetermined dynamic range and λ is a non-negative integer.
36. Apparatus (310) according to any of previous claims 33 to 35, wherein the differentiable function is parametrizable by a shift parameter in terms of a shift of the image within the codomain, and the apparatus (310) is configured to subject the shift parameter to optimization using the gradient descent approach, and to derive an offset value which is equal for the matrix-based intra-prediction modes (2051-205n) from the shift parameter so as to be used, prior to the computation of the matrix-vector product (206), to offset, for each matrix-based intra-prediction mode (2051-205n), all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (205; 205i).
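The properties required of the differentiable function in claim 34 (slope 1 at the origin, strictly increasing, horizontal asymptotes at the bounds of its image) can be illustrated with a scaled hyperbolic tangent. This is a stand-in chosen for the sketch and is not the function of claim 35. The names `soft_clip` and `quantize`, the optional `shift_param` (mirroring the shift parameter of claim 36), and the choice of 6 fractional bits with a 7-bit magnitude (borrowed from claim 23) are all assumptions.

```python
import math

FRAC_BITS = 6                              # assumption: matches the 6-bit right shift
HALF_RANGE = 127 / (1 << FRAC_BITS)        # assumption: 7-bit magnitude after quantization


def soft_clip(x, half_range=HALF_RANGE, shift_param=0.0):
    """Smooth, strictly increasing map with slope 1 at the origin.

    The image is the open interval (shift_param - half_range,
    shift_param + half_range); shift_param moves the image as a whole.
    """
    return half_range * math.tanh(x / half_range) + shift_param


def quantize(value, frac_bits=FRAC_BITS):
    """Map a floating-point intermediate value to an integer weight."""
    return int(round(value * (1 << frac_bits)))


# Every trained value, however large, lands on an integer of magnitude <= 127.
weights = [quantize(soft_clip(v)) for v in (-50.0, -0.5, 0.0, 0.5, 50.0)]
```

Because the map is differentiable everywhere, gradients can flow through it during training, while its bounded image keeps every quantized entry inside the common bit-depth shared by all modes.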
37. Method for decoding a predetermined block (18) of a picture using intra-prediction, comprising reading, from the data stream (12), a mode index (200), the mode index pointing to one out of a list (204) of matrix-based intra-prediction modes, predicting samples (108) of the predetermined block (18) by computing a matrix-vector product (206) between an input vector (102) derived from reference samples (17) in a neighborhood of the predetermined block (18) and a prediction matrix (19) associated with the matrix-based intra-prediction mode (k) pointed to by the mode index (200) and associating components (210) of an output vector (208) obtained by the matrix-vector product (206) onto sample positions (104) of the predetermined block, wherein, for each matrix-based intra-prediction mode, all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode are represented by a fixed point representation (190) of a predetermined bit-depth (192), the predetermined bit-depth (192) being equal for the matrix-based intra-prediction modes, wherein the method comprises, for each matrix-based intra-prediction mode, computing the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (k) by performing, for each component of the output vector, a right shift (209) at a number (211) of bits which is equal for the matrix-based intra-prediction modes.
38. Method for encoding a predetermined block (18) of a picture using intra-prediction, comprising inserting, into the data stream (12), a mode index (200), the mode index pointing to one out of a list (204) of matrix-based intra-prediction modes, predicting samples (108) of the predetermined block (18) by computing a matrix-vector product (206) between an input vector (102) derived from reference samples (17) in a neighborhood of the predetermined block (18) and a prediction matrix (19) associated with the matrix-based intra-prediction mode (k) pointed to by the mode index (200) and associating components (210) of an output vector (208) obtained by the matrix-vector product (206) onto sample positions (104) of the predetermined block, wherein, for each matrix-based intra-prediction mode, all entries of the prediction matrix (19) associated with the respective matrix-based intra-prediction mode are represented by a fixed point representation (190) of a predetermined bit-depth (192), the predetermined bit-depth being equal for the matrix-based intra-prediction modes, wherein the method comprises, for each matrix-based intra-prediction mode, computing the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (k) by performing, for each component of the output vector, a right shift at a number of bits which is equal for the matrix-based intra-prediction modes.
39. Method for training prediction matrices of a list (204) of matrix-based intra-prediction modes among which one is to be selected for a predetermined block for predicting samples (108) of the predetermined block (18) by computing a matrix-vector product (206) between an input vector (102) derived from reference samples (17) in a neighborhood of the predetermined block (18) and one of the prediction matrices (19) which is associated with the matrix-based intra-prediction mode (k) selected for the predetermined block and associating components (210) of an output vector (208) obtained by the matrix-vector product (206) onto sample positions (104) of the predetermined block, the method comprising training the prediction matrices of the list (204) of matrix-based intra-prediction modes by, using a gradient descent approach, optimizing representative values for entries of the prediction matrices of the list (204) of matrix-based intra-prediction modes, which are represented in floating point representation, using a cost function which depends on a prediction distortion measure associated with setting the entries of the prediction matrices (19) to intermediate values onto which the representative values are mapped using a differentiable function a domain and a codomain of which are defined by the floating point representation, an image (300) of which has a predetermined dynamic range, and which is equal for the matrix-based intra-prediction modes, quantizing the intermediate values onto a fixed point representation (190) so that, for each matrix-based intra-prediction mode, the prediction matrix (19) associated with the respective matrix-based intra-prediction mode has all entries represented by a fixed point representation (190) of a predetermined bit-depth so that the predetermined bit-depth is equal for the matrix-based intra-prediction modes, and so that, for each matrix-based intra-prediction mode, the matrix-vector product (206) between the input vector (102) and the prediction matrix (19) associated with the respective matrix-based intra-prediction mode (k) is computable by performing, for each component of the output vector, a right shift at a number of bits which is equal for the matrix-based intra-prediction modes.
40. Data stream generated by a method according to claim 38.
41. Computer program having a program code for performing, when running on a computer, a method according to any of claims 37, 38 and 39.
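The mode pairing of claim 26, together with the list sizes of claims 18 and 19 (12, 16 or 32 modes sharing 6, 8 or 16 matrices), can be sketched as follows. The convention that the second half of the mode indices reuses the matrices of the first half in transposed form is a hypothetical choice made for this illustration; the claims only require that paired modes share a matrix and differ by transposition. All function names are likewise assumptions.

```python
def select_matrix(mode_index, num_modes):
    """Return (matrix_id, transposed) for a mode index.

    Hypothetical convention: modes [0, n/2) use matrix mode_index directly,
    modes [n/2, n) reuse matrix mode_index - n/2 with transposed associations.
    """
    if num_modes not in (12, 16, 32):
        raise ValueError("claim 18 lists 12, 16 or 32 modes")
    if not 0 <= mode_index < num_modes:
        raise ValueError("mode index out of range")
    half = num_modes // 2
    transposed = mode_index >= half
    return (mode_index - half if transposed else mode_index), transposed


def build_preliminary_vector(top, left, transposed):
    """Concatenate the reduced top and left reference samples (claim 30),
    swapping their roles for the transposed mode of a pair."""
    return list(left) + list(top) if transposed else list(top) + list(left)


def place_output(out, width, height, transposed):
    """Associate output-vector components with sample positions.

    Returns a list of `height` rows with `width` samples each. For a
    transposed mode the vector is read as the row-major listing of the
    transposed block and flipped back.
    """
    if not transposed:
        return [out[r * width:(r + 1) * width] for r in range(height)]
    return [[out[c * height + r] for c in range(width)] for r in range(height)]
```

With 16 modes, for instance, `select_matrix(3, 16)` and `select_matrix(11, 16)` both select matrix 3, the latter with transposed associations, so only 8 matrices need to be stored.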
AU2020397503A 2019-12-06 2020-12-04 Matrix based intra prediction with mode-global settings Active AU2020397503B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2024200696A AU2024200696A1 (en) 2019-12-06 2024-02-05 Matrix based intra prediction with mode-global settings

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP19214201.6 2019-12-06
EP19214201 2019-12-06
PCT/EP2020/084691 WO2021110943A1 (en) 2019-12-06 2020-12-04 Matrix based intra prediction with mode-global settings

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2024200696A Division AU2024200696A1 (en) 2019-12-06 2024-02-05 Matrix based intra prediction with mode-global settings

Publications (2)

Publication Number Publication Date
AU2020397503A1 AU2020397503A1 (en) 2022-06-23
AU2020397503B2 AU2020397503B2 (en) 2023-11-09

Family

ID=69024094

Family Applications (2)

Application Number Title Priority Date Filing Date
AU2020397503A Active AU2020397503B2 (en) 2019-12-06 2020-12-04 Matrix based intra prediction with mode-global settings
AU2024200696A Pending AU2024200696A1 (en) 2019-12-06 2024-02-05 Matrix based intra prediction with mode-global settings

Country Status (11)

Country Link
US (1) US20230036509A1 (en)
EP (1) EP4070542A1 (en)
JP (1) JP2023504726A (en)
KR (1) KR20220121827A (en)
CN (1) CN115066896A (en)
AU (2) AU2020397503B2 (en)
BR (1) BR112022010988A2 (en)
CA (1) CA3163771A1 (en)
MX (1) MX2022006797A (en)
TW (1) TWI789653B (en)
WO (1) WO2021110943A1 (en)

Also Published As

Publication number Publication date
TWI789653B (en) 2023-01-11
WO2021110943A1 (en) 2021-06-10
MX2022006797A (en) 2022-09-12
CN115066896A (en) 2022-09-16
TW202130173A (en) 2021-08-01
AU2020397503B2 (en) 2023-11-09
EP4070542A1 (en) 2022-10-12
US20230036509A1 (en) 2023-02-02
JP2023504726A (en) 2023-02-06
AU2024200696A1 (en) 2024-02-22
CA3163771A1 (en) 2021-06-10
BR112022010988A2 (en) 2022-08-16
KR20220121827A (en) 2022-09-01

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)