WO2020053262A1 - Hadamard piecewise linear approximation - Google Patents

Hadamard piecewise linear approximation

Info

Publication number
WO2020053262A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
node
piecewise linear
transform
linear function
Application number
PCT/EP2019/074207
Other languages
French (fr)
Inventor
Jacob STRÖM
Per Wennersten
Jack ENHORN
Du LIU
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Publication of WO2020053262A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/17 Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/145 Square transforms, e.g. Hadamard, Walsh, Haar, Hough, Slant transforms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Definitions

  • HTDF is proposed to be applied to the reconstructed video data to reduce noise. It therefore occupies the same place in the video decoding chain as the bilateral filter described by Wennersten et al. [2] did in the joint exploration model, JEM, and it is proposed to replace, rather than be used in conjunction with, the bilateral filter.
  • the HTDF uses the Hadamard transform to convert the pixels into the transform domain.
  • a pixel intensity value, also known as a sample value i0
  • the surrounding intensity values i1, i2 and i3 are also used:
  • HTDF provides a 0.50% bitrate saving at a complexity increase of 5% (encoder) and 4% (decoder) for random access compared to VTM 1.0.
  • R(i) is the spectrum component of the Hadamard transform domain, i.e., R(0) should be identified with R0 above, R(1) with R1, etc., of Equation 0.
  • the threshold (THR) is set to 128, and σ may be provided as one of the following:
  • In addition to introducing the inner minus sign in -LUT(-R(i), σ), Equation 5 also changes the place where the threshold occurs, i.e., the top line uses Abs(R(i)) ≥ THR rather than Abs(R(i)) > THR as in Equation 1.
  • using Equation 1 would make it necessary to store 129 values (0, 1, 2, ..., 128) in the LUT.
  • the LUT in [1] has two dimensions, where one dimension corresponds to different qp values and the other dimension corresponds to different transform coefficient values.
  • the qp values range from 18 to 63, and the filtering is applied to values in [0, 127]. Accordingly, the LUT may consist of 46 rows with 128 values in each row.
  • the LUT-row used for filtering may be as follows:
  • LUT37 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 10, 11, 12, 13, 14, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 82, 83, 84, 85, 86, 87, 88, 89, 91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 102, 103, 104, 105, 106, 107, 108]
  • each LUT entry is of an int32 type, meaning that it requires four bytes. The total number of bytes needed for the LUT therefore becomes 23552 bytes. However, even if a single byte is used for each entry, this still amounts to 5888 bytes.
  • a third problem is the obstacle of an efficient software implementation. It is known to a person skilled in the art that, for code to be able to run efficiently on a CPU, it is important to make it compatible with single instruction, multiple data (“SIMD”) instructions that are available on modern CPUs. The reason is that a SIMD instruction can carry out many parallel operations in a single instruction. As an example, a regular CPU-instruction may be able to add two numbers together in one clock cycle. In contrast, a SIMD-instruction may be able to add eight numbers to eight other numbers in one clock cycle. This means that it may be possible to make the code run eight times faster.
  • SIMD single instruction, multiple data
  • SIMD instructions are not particularly well suited for look-up table operations, at least not if the look-up table is large. If the look-up table is small enough to fit in a single SIMD register, then efficient LUT operations are possible. As an example, if the SIMD registers are 128 bits wide, then it is possible to fit 16 eight-bit numbers in a SIMD register. It may then be possible to do, say, eight parallel look-ups from this small look-up table. However, if the look-up table is 32 eight-bit numbers, two such SIMD operations may be needed. If the table length is 128 eight-bit numbers, eight such SIMD operations may be needed.
  • Certain aspects of the present disclosure and their embodiments may provide solutions to the aforementioned problems.
  • One aspect of the proposed solution is a significant reduction in the number of items that have to be stored.
  • Another one is a highly efficient SIMD implementation.
  • a method for filtering of a sample comprises obtaining a quantization parameter qp associated with said sample.
  • the method comprises generating transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample.
  • the method further comprises obtaining, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces.
  • the method comprises generating transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients.
  • the method comprises obtaining a filtered version of said sample based on at least one of said transformed samples.
  • the node comprises processing means operable to obtain a quantization parameter qp associated with said sample.
  • the node comprises processing means operable to generate transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample.
  • the node comprises processing means operable to obtain, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces.
  • the node comprises processing means operable to generate transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients.
  • the node comprises processing means operable to obtain a filtered version of said sample based on at least one of said transformed samples.
  • a computer program for filtering of a sample.
  • the computer program comprises code means which, when run on a computer, causes the computer to obtain a quantization parameter qp associated with said sample.
  • the computer program comprises code means which, when run on a computer, causes the computer to generate transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample.
  • the computer program comprises code means which, when run on a computer, causes the computer to obtain, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces.
  • the computer program comprises code means which, when run on a computer, causes the computer to generate transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients.
  • the computer program comprises code means which, when run on a computer, causes the computer to obtain a filtered version of said sample based on at least one of said transformed samples.
  • a computer program product comprising computer readable means and a computer program according to the third aspect, stored on the computer readable means.
  • a carrier containing the computer program according to the fourth aspect is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
  • n = 2 and the piecewise linear function is given as …, wherein k_qp and m_qp depend on qp.
  • Certain embodiments may provide one or more of the following technical advantage(s).
  • Another advantage of the embodiments disclosed herein is that they allow the use of highly efficient SIMD implementations on CPUs.
  • FIG. 1 illustrates a chart showing LUT values for a quantization parameter value and an approximated piecewise linear function according to one embodiment.
  • FIG. 2 illustrates a chart showing LUT values for a quantization parameter value and an approximated piecewise linear function according to one embodiment.
  • FIG. 3 illustrates a chart showing LUT values for a quantization parameter value and an approximated piecewise linear function according to one embodiment.
  • FIG. 4 illustrates a chart showing an approximated piecewise linear function according to one embodiment.
  • FIG. 5 is a flow chart illustrating a process according to one embodiment.
  • FIG. 6 is a diagram showing functional units of a node according to one embodiment.
  • FIG. 7 is a block diagram of a node according to one embodiment.
  • the filtering of intensity values is described as an example.
  • the filtering of intensity values normally refers to the Y in YCbCr.
  • the filtering disclosed herein can also be used for chroma values such as Cb and Cr, or any other components from other color spaces such as ICtCp, Lab, Y’u’v’, among others, in alternative embodiments.
  • Equation 7a and Equation 7b are used interchangeably throughout the current disclosure.
  • the function W(R(i), σ) can be calculated for arbitrary values of R(i).
  • the function W(R(i), σ) can be calculated in the possible range for R(i), which is [-4092, 4092] when dealing with 10-bit data that has been transformed using the Hadamard transform. Namely, based on Equation 0 provided above, a person of ordinary skill in the art will realize that if the intensities are in the range [0, 1023], then the largest possible number is 4092 and the smallest possible number is -4092.
  • W(R(i), σ) values are equal to LUT(R(i), σ), and there is no distinction between approximating W(R(i), σ) and approximating LUT(R(i), σ).
  • a method for filtering of a sample comprises a step S1 of obtaining a quantization parameter qp associated with said sample.
  • filtering of the samples may be performed for qp values ranging from 18 to 63.
  • the method further comprises a step S2 of generating transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample.
  • An example of such an area is given in paragraph [003], where the sample to be filtered is i0 and where the surrounding samples are i1, i2 and i3.
  • Equation 0 shows how four transform coefficients R0-R3 are generated by applying a Hadamard transform to an area of four samples i0-i3.
  • the method further comprises a step S3 of obtaining, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces.
  • Obtaining a filtered transform coefficient from a transform coefficient x using a piecewise linear function y is equivalent to approximating a function W(R(i), σ) or, equivalently, W(x, σ), which is also often denoted the W-function throughout the description.
  • the method comprises a step S4 of generating transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients.
  • the method further comprises a step S5 of obtaining a filtered version of said sample based on at least one of said transformed samples.
  • a filtered version of said sample may be the corresponding transformed sample itself or it may be a combination of transformed samples surrounding said transformed sample.
  • the obtaining (S3) may be applied to the transform coefficients having an absolute value smaller than a threshold THR, wherein at least one piece of the piecewise linear function y has a slope different from zero.
  • THR may be a power of 2, for example 128, as will be described below.
  • the value of threshold may be as high as the maximum value of a transform coefficient, for example 4092 in case of ten bits used for representation of samples. This basically means that filtered transform coefficients are obtained in step S3 for all the transform coefficients, regardless of their value.
  • the piecewise linear function y may be either continuous or discontinuous, as will be described below.
  • the current disclosure describes seven embodiments of a method for filtering a sample value.
  • the next two embodiments use a piecewise linear function with n pieces to obtain filtered transform coefficients, where the piecewise linear function may be non-continuous.
  • the following two embodiments also use the n piecewise linear functions, where the piecewise linear functions are connected at threshold points.
  • the seventh embodiment is an efficient way to ensure that the piecewise linear function is continuous without spending excessive bits in the calculation.
  • a two-piece piecewise linear function is used to approximate the W-function, i.e.
  • the LUT given in [1] can be expressed by LUT(x, qp), which depends on qp and the transform coefficient x.
  • the first variable for LUT i.e., x
  • the second variable for LUT i.e., qp
  • the input x may correspond to luma or chroma coefficients.
  • 63 may be equal to:
  • k_qp = [1.0326, 1.0354, 1.0352, 1.0418, 1.0469, 1.0509, 1.0535, 1.0595, 1.06, 1.0666, 1.0678,
  • 0.94113, 0.90866, 0.87668, 0.83445, 0.788, 0.73877, 0.68586, 0.64118, 0.58756, 0.5387, 0.4912, 0.44032, 0.3957, 0.35398, 0.31849, 0.28188, 0.24798, 0.21923, 0.19435, 0.16846, 0.14658, 0.12954, 0.1131, 0.09927], and
  • m_qp = [-4.5179, -5.0774, -5.4391, -6.2224, -6.9851, -7.7923, -8.6039, -9.6161, -10.4341, -11.724, -12.7417, -14.0196, -15.2738, -16.7861, -17.8652, -19.3854, -20.6147, -21.7579, -23.1179, -24.2107, -25.3656, -25.6423, -26.2748, -26.7235, -27.5008, -27.282, -26.838, -26.1413, -25.1188, -24.609, -23.2322, -22.1601, -20.9494, -19.2451, -17.8859, -16.5127, -15.5019, -14.0914, -12.7722, -
  • the k qp and m qp values have been obtained by minimizing the mean squared error between the nonzero elements in the LUT and the linear function.
  • the number of bins can be chosen as needed.
  • the range of x does not have to be limited to [0, 127]. The third embodiment can be applied to x with a larger range, e.g., [0, 1023], if needed. For the purpose of explanation and the sake of simplicity, x is assumed to be between 0 and 127 in the following description.
  • the thresholds of each of the n bins may be denoted as t0, t1, t2, ..., tn.
  • a pair of k_qp,bin and m_qp,bin values is needed.
  • a value of x ∈ [t1, t2] may never cause k_qp,bin1·x + m_qp,bin1 to be negative.
  • the max-operation may be omitted, thereby preserving computation resources.
  • the approximation is depicted in FIG. 2.
  • the approximated piecewise linear functions shown in FIG. 2 may consist of 8 (possibly disconnected) linear pieces.
  • a fourth embodiment is similar to the third embodiment, in that an n-piece piecewise linear function is used. However, in the fourth embodiment, the approximation is applied for the entire range of x. As described above, this range of x can be [-4092, 4092] for 10-bit values that have been transformed with the Hadamard transform. In some embodiments, the range of x may be [0, 4092] if W(x) is symmetric.
  • the third embodiment is similar to the second embodiment and, like the second embodiment, may be discontinuous at the thresholds. Accordingly, a solution is provided as a fifth embodiment in which the piecewise linear function is continuous.
  • the value of y may differ for x < ti and x > ti.
  • the equation is modified as shown below:
  • k_qp,bin and m_qp,bin are provided by Equation 10 and Equation 11, respectively.
  • Equation 12 and Equation 13 may be used to retrieve the LUT values.
  • LUT_m_qp = [0, -6, -20, -26, -34, -34, -34, -27.4667]
  • the approximation is depicted in FIG. 3.
  • the approximated piecewise linear functions have 8 connected piecewise linear functions.
  • a sixth embodiment is provided similar to the fifth embodiment.
  • the piecewise linear function with n pieces is used for the entire range of x.
  • this range of x can be [-4092, 4092] for 10-bit values that have been transformed with the Hadamard transform.
  • FIG. 4 illustrates approximating the curve W(x) (shown in solid lines) using a piecewise linear line segment (shown in dotted lines).
  • D = 16 may be used, since division by D can then simply be replaced with a right shift.
  • the approximate value of W(x) may now be calculated as the value $W_k$ plus $\Delta x$ steps along the slope $(W_{k+1} - W_k)/D$, where $\Delta x = x - x_k$: $W(x) \approx W_k + \Delta x \, (W_{k+1} - W_k)/D$ (Equation 22a)
  • Equation 22a may also be rewritten as: $W(x) \approx \frac{W_{k+1} - W_k}{D}\, x + \left(W_k - \frac{W_{k+1} - W_k}{D}\, x_k\right)$ (Equation 22b).
  • since $W_{k+1} - W_k$ is small, it may be advantageous to store it in a separate LUT for parallel fetching with $W_k$.
  • $(W_{k+1} - W_k)/D$ may be identified as the slope k value, and $W_k - x_k (W_{k+1} - W_k)/D$ could be identified as the m value.
  • calculating $(W_{k+1} - W_k)/D$ with enough precision may require 5 integer bits (to hold 32) and four fractional bits (to represent steps of 1/16), or 9 bits in total. This would be multiplied by x, which would be a 12-bit number. Hence a 9-by-12-bit multiplication would be required, which is much more than the 5-by-4-bit multiplication described in the other embodiments above.
  • the seventh embodiment is less costly in this regard.
  • This seventh embodiment may be used to approximate the LUT up to THR, or it can be used to approximate the entire function W(x).
  • SIMD operations allow the execution of several operations simultaneously on a modern CPU. As an example, if a normal machine code instruction can add two numbers to each other, a SIMD operation can add eight numbers to eight other numbers in parallel. This can improve performance considerably.
  • the LUT37 array described as an example above has 128 items, and it would therefore be too big to implement using a single SIMD operation on current hardware. In contrast, it is easy to execute arithmetic operations used in Equation 2, for example, using SIMD instructions.
  • Equation 6 contains several executions of Equation 6, each execution including a multiplication and addition followed by a max operation.
  • parallel computations of Equation 6 can be carried out in just two instructions; one for the multiply and add and one for the max-operation.
  • FIG. 6 is a diagram showing functional units of a node 602 for filtering of a sample according to one embodiment.
  • Node 602 may for example be an encoder.
  • node 602 may be a decoder.
  • Node 602 includes an obtaining unit 604 for obtaining a quantization parameter qp associated with said sample.
  • Node 602 includes a generating unit 606 for generating transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample.
  • Node 602 further includes an obtaining unit 608 for obtaining, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces.
  • Node 602 further includes a generating unit 610 for generating transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients.
  • Node 602 includes an obtaining unit 612 for obtaining a filtered version of said sample based on at least one of said transformed samples.
  • FIG. 7 is a block diagram of a node 602 for filtering of a sample according to one embodiment.
  • Node 602 may for example be an encoder. Alternatively, node 602 may be a decoder.
  • node 602 may comprise: processing circuitry (PC) 702, which may include one or more processors (P) 755 (e.g., a general purpose microprocessor and/or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like); a network interface 748 comprising a transmitter (Tx) 745 and a receiver (Rx) 747 for enabling node 602 to transmit data to and receive data from other nodes connected to a network 710 (e.g., an Internet Protocol (IP) network) to which network interface 748 is connected; and a local storage unit (a.k.a. “data storage system”) 708, which may include one or more non-volatile storage devices and/or one or more volatile storage devices.
  • PC processing circuitry
  • P processors
  • CPP 741 includes a computer readable medium (CRM) 742 storing a computer program (CP) 743 comprising computer readable instructions (CRI) 744.
  • CRM 742 may be a non-transitory computer readable medium, such as, magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like.
  • the CRI 744 of computer program 743 is configured such that when executed by PC 702, the CRI causes node 602 to perform steps and the embodiments described herein (e.g., steps described herein with reference to the flow charts).
  • node 602 may be configured to perform steps described herein without the need for code. That is, for example, PC 702 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software.
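The fifth embodiment's connected pieces and the seventh embodiment's shift-based node interpolation can both be sketched from the LUT37 row in [009]. The Python below joins the LUT37 values at x = 0, 16, ..., 112 and 127 with straight lines. The resulting intercepts reproduce the LUT_m_qp list quoted above, [0, -6, -20, -26, -34, -34, -34, -27.4667], which suggests (the text does not say so explicitly) that this list belongs to qp = 37. The seventh-embodiment part assumes nodes at multiples of D = 16, so the division becomes `>> 4`. All names are hypothetical. Exact `Fraction` arithmetic is used only to keep the comparison clean; a hardware or SIMD version would use fixed-point integers.

```python
from fractions import Fraction

# Node values read from the LUT37 row in [009] at x = 0, 16, ..., 112 and at x = 127.
NODES_X = [0, 16, 32, 48, 64, 80, 96, 112, 127]
NODES_Y = [0, 1, 8, 22, 38, 56, 74, 92, 108]


def connected_pieces(xs, ys):
    """Slope k_bin and intercept m_bin of each line joining consecutive nodes.

    Neighbouring pieces share their end node, so the resulting piecewise
    linear function is continuous at every threshold.
    """
    pieces = []
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        k = Fraction(y1 - y0, x1 - x0)
        pieces.append((k, y0 - k * x0))
    return pieces


def evaluate(pieces, xs, x):
    """y = k_bin * x + m_bin, using the first bin whose upper threshold is >= x."""
    for (k, m), upper in zip(pieces, xs[1:]):
        if x <= upper:
            return k * x + m
    raise ValueError("x is outside the approximated range")


def interpolate_shift(w_nodes, x, log2_d=4):
    """W_k plus dx steps of (W_{k+1} - W_k) / D, with D = 2**log2_d.

    w_nodes[k] is assumed to hold W(k * D); dividing by D is a right shift,
    and only a small dx (4 bits for D = 16) multiplies the node difference.
    """
    k = x >> log2_d
    dx = x - (k << log2_d)
    return w_nodes[k] + ((dx * (w_nodes[k + 1] - w_nodes[k])) >> log2_d)


pieces = connected_pieces(NODES_X, NODES_Y)
print([round(float(m), 4) for _, m in pieces])  # [0.0, -6.0, -20.0, -26.0, -34.0, -34.0, -34.0, -27.4667]
print(evaluate(pieces, NODES_X, 127))           # 108
print(interpolate_shift(NODES_Y[:8], 72))       # 47, same as LUT37[72]
```

For x = 72 the interpolation computes 38 + ((8 · 18) >> 4) = 47. This needs only a 4-bit by 5-bit product, which is the cost argument made for the seventh embodiment.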

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

There are provided mechanisms for filtering of a sample. The method comprises obtaining a quantization parameter qp associated with said sample. The method comprises generating transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample. The method further comprises obtaining, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces. The method comprises generating transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients. The method comprises obtaining a filtered version of said sample based on at least one of said transformed samples.
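As a rough illustration of how the five abstract steps fit together, here is a minimal Python sketch. It rests on several assumptions that the abstract does not state:

- the two-piece function y = max(k·x + m, 0), since the Definitions above describe a multiplication and addition followed by a max operation;
- pass-through of coefficients whose magnitude is at least THR = 128;
- sign symmetry for negative coefficients;
- leaving the DC coefficient R0 unfiltered.

The table `K_M` holds only the first entries of the k_qp and m_qp lists, assumed here to belong to qp = 18. `K_M`, `two_piece` and `filter_sample` are hypothetical names.

```python
THR = 128

# Hypothetical per-qp parameters for the assumed two-piece function y = max(k*x + m, 0).
# The values are the first entries of the k_qp and m_qp lists, assumed to belong to qp = 18.
K_M = {18: (1.0326, -4.5179)}


def hadamard4(a, b, c, d):
    """4-point Hadamard transform written as the butterflies of Equation 0."""
    y0, y1, y2, y3 = a + c, b + d, a - c, b - d
    return (y0 + y1, y0 - y1, y2 + y3, y2 - y3)


def two_piece(x, k, m):
    """Filter one coefficient: magnitudes >= THR pass through, the sign is preserved."""
    if abs(x) >= THR:
        return x
    y = max(k * abs(x) + m, 0.0)
    return y if x >= 0 else -y


def filter_sample(area, qp):
    """Steps S1-S5 for the sample area[0], with area[1:] as its neighbours i1, i2, i3."""
    k, m = K_M[qp]                                    # S1: parameters depend on qp
    r = hadamard4(*area)                              # S2: forward transform
    f = [r[0]] + [two_piece(c, k, m) for c in r[1:]]  # S3: R0 kept as-is (assumption)
    back = hadamard4(*f)                              # S4: inverse = same butterflies, then / 4
    return round(back[0] / 4)                         # S5: filtered version of area[0]


print(filter_sample((102, 100, 100, 98), 18))   # 100: small AC coefficients are zeroed
print(filter_sample((300, 100, 100, 100), 18))  # 300: AC coefficients >= THR pass through
```

In the first call the AC coefficients are 4, 4 and 0, all below the intercept, so the small deviation of the centre sample is smoothed away. In the second call every AC coefficient equals 200, so the strong edge survives unchanged.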

Description

HADAMARD PIECEWISE LINEAR APPROXIMATION
TECHNICAL FIELD
[001] Disclosed are embodiments related to video compression and filtering.
BACKGROUND
[002] The Hadamard transform domain filter (HTDF) has been proposed by Stepin et al. [1] as a filtering step in video encoding and decoding. HTDF is proposed to be applied to the reconstructed video data to reduce noise. It therefore occupies the same place in the video decoding chain as the bilateral filter described by Wennersten et al. [2] did in the joint exploration model (JEM), and it is proposed to replace, rather than be used in conjunction with, the bilateral filter.
[003] First, the HTDF uses the Hadamard transform to convert the pixels into the transform domain. To filter a pixel intensity value, also known as a sample value i0, the surrounding intensity values i1, i2 and i3 are also used.
[004] The Hadamard transform coefficients are then calculated as:
$$
\begin{aligned}
y_0 &= i_0 + i_2, & R_0 &= y_0 + y_1,\\
y_1 &= i_1 + i_3, & R_1 &= y_0 - y_1,\\
y_2 &= i_0 - i_2, & R_2 &= y_2 + y_3,\\
y_3 &= i_1 - i_3, & R_3 &= y_2 - y_3.
\end{aligned}
\qquad \text{(Equation 0)}
$$
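Equation 0 is a small butterfly network, so the forward transform costs eight additions. The 4x4 Hadamard matrix H is symmetric and satisfies H·H = 4I. The inverse is therefore the same butterflies followed by a division by 4. This Python sketch assumes round-to-nearest via `(v + 2) >> 2` for that division, which is a choice of the sketch rather than something the text specifies. The function names are illustrative.

```python
def hadamard4(i0, i1, i2, i3):
    """Forward 4-point Hadamard transform, written as the butterflies of Equation 0."""
    y0 = i0 + i2
    y1 = i1 + i3
    y2 = i0 - i2
    y3 = i1 - i3
    return (y0 + y1, y0 - y1, y2 + y3, y2 - y3)


def inverse_hadamard4(r0, r1, r2, r3):
    """Inverse transform: the same butterflies followed by a division by 4.

    Adding 2 before the right shift rounds to the nearest integer
    (halves upward); this rounding rule is an assumption of the sketch.
    """
    return tuple((v + 2) >> 2 for v in hadamard4(r0, r1, r2, r3))


coeffs = hadamard4(10, 20, 30, 40)
print(coeffs)                              # (100, -20, -40, 0)
print(inverse_hadamard4(*coeffs))          # (10, 20, 30, 40)
print(hadamard4(1023, 1023, 1023, 1023))   # (4092, 0, 0, 0): the 10-bit maximum of R0
```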
[005] Then, the HTDF filters the transform coefficients. Finally, the HTDF transforms the filtered coefficients back to the pixel domain using an inverse Hadamard transform. It is shown in [1] that HTDF provides a 0.50% bitrate saving at a complexity increase of 5% (encoder) and 4% (decoder) for random access compared to VTM 1.0.
[006] The implementation in [1] uses a look-up table (LUT) to store the filtering results. The LUT described in [1] filters a pixel according to the following equation: (Equation 1)
where (Equation 2)
Here, R(i) is the spectrum component of the Hadamard transform domain, i.e., R(0) should be identified with R0 above, R(1) with R1, etc., of Equation 0. In some instances, the threshold (THR) is set to 128, and σ may be provided as one of the following:
$\sigma = 2^{\,1 + 0.126\,(qp - 27)}$ (Equation 3) or $\sigma = 2 \cdot 2.64 \cdot 2^{\,0.1269\,(qp - 11)}$ (Equation 4).
However, since using Equation 1 would change the sign for negative R(i) values, an improved version of Equation 1 can be provided as shown below:
[007] In addition to introducing the inner minus sign in -LUT(-R(i), σ), Equation 5 also changes the place where the threshold occurs, i.e., the top line uses Abs(R(i)) ≥ THR rather than Abs(R(i)) > THR as in Equation 1. The reason for this is that if THR is a power of two, such as THR = 128 as in [1], using Equation 1 would make it necessary to store 129 values (0, 1, 2, ..., 128) in the LUT. Normally, it is desirable to have a power-of-two number of items in a LUT, and this can be achieved by using Abs(R(i)) ≥ THR.
[008] The LUT in [1] has two dimensions, where one dimension corresponds to different qp values and the other dimension corresponds to different transform coefficient values. As an example, there may be one row of the LUT for each qp value, and each row may contain THR values mapping an unfiltered value to a filtered value. The qp values range from 18 to 63, and the filtering is applied to values in [0, 127]. Accordingly, the LUT may consist of 46 rows with 128 values in each row.
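The look-up rule described in [007] can be sketched as follows. The images of Equations 1 and 5 are not reproduced, so this Python sketch encodes only what the prose states:

- coefficients with Abs(R(i)) ≥ THR pass through unchanged;
- non-negative coefficients below THR index the qp row directly;
- negative coefficients use -LUT(-R(i), σ).

The contents of `toy_row` are made up for illustration, and `filter_coefficient` is a hypothetical name.

```python
THR = 128  # power-of-two threshold, as in [1]


def filter_coefficient(r, lut_row):
    """Filter one Hadamard coefficient R(i) with the LUT row for the current qp.

    Assumed reading of Equation 5:
      Abs(R(i)) >= THR  -> R(i) unchanged
      0 <= R(i) < THR   -> LUT(R(i), sigma)
      R(i) < 0          -> -LUT(-R(i), sigma), so the sign of R(i) is kept
    """
    if len(lut_row) != THR:
        raise ValueError("expected a LUT row with exactly THR entries")
    if abs(r) >= THR:
        return r
    if r >= 0:
        return lut_row[r]
    return -lut_row[-r]


# Hypothetical toy row that halves small coefficients, used only to exercise the logic.
toy_row = [x // 2 for x in range(THR)]
print([filter_coefficient(r, toy_row) for r in (-200, -128, -127, -10, 0, 10, 127, 128)])
# [-200, -128, -63, -5, 0, 5, 63, 128]
```

With the ≥ comparison, the largest index ever used is THR - 1 = 127, so a 128-entry (power-of-two) row suffices. With > the value 128 would also need an entry.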
[009] For example, for qp = 37, the LUT-row used for filtering may be as follows:
LUT37 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 10, 11, 12, 13, 14, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 82, 83, 84, 85, 86, 87, 88, 89, 91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 102, 103, 104, 105, 106, 107, 108]
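The LUT37 row can be regenerated from a closed form, which hints at what the LUT stores. The sketch below is a hypothesis, not the formula of [1]. It assumes LUT(x, σ) = floor(x³ / (x² + σ²)), the same expression the text uses for F(i, σ) in [0011]. It takes σ from Equation 4 read as 2 · 2.64 · 2^(0.1269 · (qp - 11)), where the exponent constant is an assumed reading of a garbled source line. For qp = 37 this gives σ ≈ 52, and the spot-checked entries below agree with LUT37; only those entries are checked. The function names are hypothetical.

```python
import math


def sigma_eq4(qp):
    """Equation 4 as read here: sigma = 2 * 2.64 * 2**(0.1269 * (qp - 11))."""
    return 2 * 2.64 * 2 ** (0.1269 * (qp - 11))


def lut_row(qp, thr=128):
    """Hypothetical LUT row: floor(x**3 / (x**2 + sigma**2)) for x in [0, thr)."""
    s2 = sigma_eq4(qp) ** 2
    return [math.floor(x ** 3 / (x * x + s2)) for x in range(thr)]


row37 = lut_row(37)
# Spot checks against LUT37: last zero, first one, some interior points, and the end.
for x in (14, 15, 22, 64, 69, 127):
    print(x, row37[x])
```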
SUMMARY
[0010] In [1], the Hadamard LUT has a size of 46x128=5888 entries, where 46 is the number of different qp values and 128 is the number of different transform coefficient values. In [1] each LUT entry is of an int32 type, meaning that it requires four bytes. The total number of bytes needed for the LUT therefore becomes 23552 bytes. However, even if a single byte is used for each entry, this still amounts to 5888 bytes. For full custom ASIC implementations, many copies of the Hadamard filter may be needed in order to increase parallelism. Firstly, one needs to filter four coefficients per pixel, and this means four instantiations of the LUT if one wants to do this in parallel, meaning 23552 bytes. Furthermore, if eight pixels need to be filtered in parallel, this would amount to 23552*8 = 188416 bytes. This is costly to implement. It is therefore of interest to reduce the complexity of the filter in terms of LUT size.
[0011] Another problem is that the filter given in Equations 1 and 5 is discontinuous at the THR threshold point. For example, assuming a THR value of 128, this means that when R(i) < 128, the filtered coefficient is smaller than R(i), whereas for R(i) > 128 the output is always R(i). Exactly before the discontinuity, at R(i) = 127, we have
$$F(i, \sigma) = \frac{127^3}{127^2 + \sigma^2},$$
whereas right after the discontinuity, at R(i) = 128, we have $F(i, \sigma) = R(i) = 128$. For larger qp values, this gap, $128 - \frac{127^3}{127^2 + \sigma^2}$, gets larger. This could introduce a discontinuity in the pixel domain, and the effect of the discontinuity may be visible.
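The growth of this gap with qp can be checked numerically. This Python sketch evaluates 128 - 127³/(127² + σ²) with σ taken from Equation 4. The exponent constant 0.1269 is an assumed reading of the garbled source line, and `threshold_gap` is a hypothetical name.

```python
def sigma_eq4(qp):
    # Assumed reading of Equation 4: sigma = 2 * 2.64 * 2**(0.1269 * (qp - 11)).
    return 2 * 2.64 * 2 ** (0.1269 * (qp - 11))


def threshold_gap(qp, thr=128):
    """Jump at R(i) = thr: thr minus the filtered value (thr-1)**3 / ((thr-1)**2 + sigma**2)."""
    x = thr - 1
    return thr - x ** 3 / (x ** 2 + sigma_eq4(qp) ** 2)


for qp in (22, 27, 32, 37, 42):
    print(qp, round(threshold_gap(qp), 1))
```

Because σ increases with qp, the filtered value just below the threshold shrinks and the jump widens. At qp = 37 it is about 19 levels in the coefficient domain.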
[0012] A third problem is the obstacle of an efficient software implementation. It is known to a person skilled in the art that, for code to be able to run efficiently on a CPU, it is important to make it compatible with single instruction, multiple data (“SIMD”) instructions that are available on modern CPUs. The reason is that a SIMD instruction can carry out many parallel operations in a single instruction. As an example, a regular CPU-instruction may be able to add two numbers together in one clock cycle. In contrast, a SIMD-instruction may be able to add eight numbers to eight other numbers in one clock cycle. This means that it may be possible to make the code run eight times faster.
[0013] SIMD instructions, however, are not particularly well suited for look-up table operations, at least not if the look-up table is large. If the look-up table is small enough to fit in a single SIMD register, then efficient LUT operations are possible. As an example, if the SIMD registers are 128 bits wide, then it is possible to fit 16 eight-bit numbers in a SIMD register. It may then be possible to do, say, eight parallel look-ups from this small look-up table. However, if the look-up table is 32 eight-bit numbers, two such SIMD operations may be needed. If the table length is 128 eight-bit numbers, eight such SIMD operations may be needed. But if we need eight instructions to carry out eight parallel look-ups, we may not have gained much compared to carrying out eight regular (i.e., non-SIMD) instructions, each of which can do a look-up from 128 numbers. It may therefore not be possible to speed up the code using SIMD operations.
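The cost argument above can be made concrete by emulating a 16-entry register shuffle in plain Python. The sketch mimics the semantics of the SSSE3 PSHUFB instruction (the `_mm_shuffle_epi8` intrinsic): each lane selects `table16[i & 15]`, or zero when bit 7 of its index is set. A longer table must be split into 16-entry chunks, one shuffle per chunk, so 128 entries cost eight shuffles for the same eight look-ups. The lane count, table contents and function names are illustrative assumptions.

```python
def shuffle16(table16, indices):
    """One emulated 16-entry byte shuffle: lane value is table16[i & 0x0F], or 0 if bit 7 is set."""
    assert len(table16) == 16
    return [0 if i & 0x80 else table16[i & 0x0F] for i in indices]


def lookup(table, indices):
    """Look up `indices` in a table of 8-bit values using one shuffle per 16-entry chunk.

    Returns the looked-up values and the number of shuffles that were needed.
    """
    assert len(table) % 16 == 0
    result = [0] * len(indices)
    shuffles = 0
    for chunk in range(len(table) // 16):
        base = 16 * chunk
        # Lanes whose index lies outside this chunk get 0x80, so the shuffle zeroes them.
        local = [i - base if base <= i < base + 16 else 0x80 for i in indices]
        picked = shuffle16(table[base:base + 16], local)
        result = [r | p for r, p in zip(result, picked)]  # exactly one chunk contributes per lane
        shuffles += 1
    return result, shuffles


table = [(3 * x) & 0xFF for x in range(128)]  # arbitrary 8-bit contents
idx = [0, 5, 17, 40, 63, 64, 100, 127]        # eight parallel look-ups
for size in (16, 32, 128):
    _, n = lookup(table[:size], [i % size for i in idx])
    print(size, n)  # shuffles needed: 1, 2 and 8
```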
[0014] Certain aspects of the present disclosure and their embodiments may provide solutions to the aforementioned problems. One aspect of the proposed solution is a significant reduction in the number of items that have to be stored. Another is a highly efficient SIMD implementation.
[0015] The proposed solutions disclosed herein for at least the problems noted above approximate the filtering equation that is currently implemented by a LUT in [1] by qp-dependent piecewise linear functions, such that the look-up table can be reduced significantly or even removed completely.
[0016] According to a first aspect of the embodiments, there is provided a method for filtering of a sample. The method comprises obtaining a quantization parameter qp associated with said sample. The method comprises generating transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample. The method further comprises obtaining, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces. The method comprises generating transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients. The method comprises obtaining a filtered version of said sample based on at least one of said transformed samples.
[0017] According to a second aspect of the embodiments, there is provided a node (encoder or decoder) for filtering of a sample. The node comprises processing means operable to obtain a quantization parameter qp associated with said sample. The node comprises processing means operable to generate transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample. The node comprises processing means operable to obtain, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces. The node comprises processing means operable to generate transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients. The node comprises processing means operable to obtain a filtered version of said sample based on at least one of said transformed samples.
[0018] According to a third aspect of the embodiments, there is provided a computer program, for filtering of a sample. The computer program comprises code means which, when run on a computer, causes the computer to obtain a quantization parameter qp associated with said sample. The computer program comprises code means which, when run on a computer, causes the computer to generate transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample. The computer program comprises code means which, when run on a computer, causes the computer to obtain, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces. The computer program comprises code means which, when run on a computer, causes the computer to generate transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients. The computer program comprises code means which, when run on a computer, causes the computer to obtain a filtered version of said sample based on at least one of said transformed samples.
[0019] According to a fourth aspect of the embodiments, there is provided a computer program product comprising computer readable means and a computer program according to the third aspect, stored on the computer readable means.
[0020] According to a fifth aspect of the embodiments, there is provided a carrier containing the computer program according to the third aspect. The carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
[0021] According to an embodiment, n = 2 and the piecewise linear function is given as
$y_{qp} = \max(k_{qp}x + m_{qp}, 0)$
wherein kqp and mqp depend on qp.
[0022] Certain embodiments may provide one or more of the following technical advantages. One advantage of the proposed solution is a significant reduction in the number of items that need to be stored. For example, when n = 2, only two values (kqp and mqp) need to be stored per qp value, instead of 128. Accordingly, for qp ranging from 18 to 63, using a piecewise linear function consisting of two linear pieces, the number of values that need to be stored is (63 − 17) × 2 = 92. In addition, if kqp and mqp need 10 bits each, the total amount of storage needed is 92 × 10 / 8 = 115 bytes, which is a reduction of (5888 − 115)/5888 ≈ 98% compared to [1].
Another advantage of the embodiments disclosed herein is that they allow the use of highly efficient SIMD implementations on CPUs.
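The storage figures above can be reproduced with a few lines of arithmetic. This sketch assumes, as the 5888-byte figure suggests, that the LUT of [1] holds 128 one-byte entries for each of the 46 qp values; the function names are illustrative only.

```python
QP_MIN, QP_MAX = 18, 63
NUM_QP = QP_MAX - QP_MIN + 1  # 46 qp values


def lut_bytes(entries_per_qp: int = 128, bits_per_entry: int = 8) -> float:
    """Storage of a full per-qp look-up table (assumed 128 x 8-bit entries)."""
    return NUM_QP * entries_per_qp * bits_per_entry / 8


def piecewise_bytes(pieces_stored: int, bits_per_value: int) -> float:
    """Storage when every stored linear piece needs one k and one m value."""
    return NUM_QP * 2 * pieces_stored * bits_per_value / 8


full = lut_bytes()                  # 46 * 128 bytes
two_piece = piecewise_bytes(1, 10)  # max(k*x + m, 0): one (k, m) pair per qp
eight_bins = piecewise_bytes(8, 8)  # n = 8 bins with 8-bit k and m
print(full, two_piece, eight_bins, f"{(full - two_piece) / full:.1%}")
# 5888.0 115.0 736.0 98.0%
```

The zero piece of max(k·x + m, 0) costs nothing to store, which is why the two-piece function needs only one (k, m) pair per qp value.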
[0023] Generally, all terms used herein are to be interpreted according to their ordinary meaning in the relevant technical field, unless a different meaning is clearly given and/or is implied from the context in which it is used. All references to a/an/the element, apparatus, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any methods disclosed herein do not have to be performed in the exact order disclosed, unless a step is explicitly described as following or preceding another step and/or where it is implicit that a step must follow or precede another step. Any feature of any of the embodiments disclosed herein may be applied to any other embodiment, wherever appropriate. Likewise, any advantage of any of the embodiments may apply to any other embodiments, and vice versa. Other objectives, features, and advantages of the enclosed embodiments will be apparent from the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
[0025] FIG. 1 illustrates a chart showing LUT values for a quantization parameter value and an approximated piecewise linear function according to one embodiment.
[0026] FIG. 2 illustrates a chart showing LUT values for a quantization parameter value and an approximated piecewise linear function according to one embodiment.
[0027] FIG. 3 illustrates a chart showing LUT values for a quantization parameter value and an approximated piecewise linear function according to one embodiment.
[0028] FIG. 4 illustrates a chart showing an approximated piecewise linear function according to one embodiment.
[0029] FIG. 5 is a flow chart illustrating a process according to one embodiment.
[0030] FIG. 6 is a diagram showing functional units of a node according to one embodiment.
[0031] FIG. 7 is a block diagram of a node according to one embodiment.
DETAILED DESCRIPTION
[0032] Throughout this disclosure, the filtering of intensity values is described as an example. The filtering of intensity values normally refers to the Y in YCbCr. However, this is not required, and the filtering disclosed herein can also be used for chroma values such as Cb and Cr, or for components of other color spaces such as ICtCp, Lab, or Y’u’v’, among others, in alternative embodiments.
[0033] The general goal of the current disclosure is to approximate the function: (Equation 7a)
Figure imgf000010_0001
which the LUT tabulates for the first 128 values. Equation 7a can also be written as: (Equation 7b)
Figure imgf000010_0002
Equation 7a and Equation 7b are used interchangeably throughout the current disclosure.
While the LUT only tabulates 128 values, the function W(R(i), σ) can be calculated for arbitrary values of R(i). For example, W(R(i), σ) can be calculated over the whole possible range of R(i), which is [−4092, 4092] when dealing with 10-bit data that has been transformed using the Hadamard transform. Namely, based on Equation 0 provided above, a person of ordinary skill in the art will realize that if the intensities are in the range [0, 1023], then the largest possible number is 4 × 1023 = 4092 and the smallest possible number is −4092.
[0034] In some embodiments, for values of R(i) between 0 and 127, W(R(i), σ) is equal to LUT(R(i), σ), and there is no distinction between approximating W(R(i), σ) and approximating LUT(R(i), σ).
[0035] According to one aspect, a method for filtering of a sample is provided, as shown in FIG. 5. The method comprises a step S1 of obtaining a quantization parameter qp associated with said sample. As already mentioned, filtering of the samples may be performed for qp values ranging from 18 to 63. The method further comprises a step S2 of generating transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample. An example of such an area is given in paragraph [003], where the sample to be filtered is i0 and the surrounding samples are i1, i2 and i3.
Equation 0 shows how four transform coefficients R0-R3 are generated by applying a Hadamard transform to an area of four samples i0-i3.
[0036] The method further comprises a step S3 of obtaining, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces. Obtaining a filtered transform coefficient from a transform coefficient x using a piecewise linear function y is equivalent to approximating the function W(R(i), σ) or, equivalently, W(x, σ), which is also often called the W-function throughout the description. Thus, applying a piecewise linear function y with n ≥ 2 pieces to a transform coefficient x is intended to approximate the function W(x). Therefore, the terms “approximating the function W(x)” and “obtaining a filtered transform coefficient from a transform coefficient x using a piecewise linear function y” are used interchangeably throughout the rest of the application.
[0037] The method comprises a step S4 of generating transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients. The method further comprises a step S5 of obtaining a filtered version of said sample based on at least one of said transformed samples. For example, a filtered version of said sample may be the corresponding transformed sample itself or it may be a combination of transformed samples surrounding said transformed sample.
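Steps S2 to S5 can be sketched for the four-sample area of Equation 0. This is a minimal sketch, not the reference implementation of [1]: the row order of the hypothetical matrix H, the rounding in the inverse transform, and passing the DC coefficient R0 through unfiltered by default (the `filter_dc` flag) are all assumptions. The callable `coeff_filter` stands in for whichever piecewise linear function is used in step S3.

```python
from typing import Callable, Sequence

# Hypothetical 4x4 Hadamard matrix for the area (i0, i1, i2, i3); row order assumed.
H = (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)


def hadamard(v: Sequence[float]) -> list:
    """Unnormalized transform R = H v. H is symmetric and H H = 4 I."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def filter_sample(samples: Sequence[int],
                  coeff_filter: Callable[[int], float],
                  filter_dc: bool = False) -> int:
    """Steps S2-S5 for one area; samples[0] is the sample being filtered (i0)."""
    r = hadamard(samples)                                  # S2: R0..R3
    f = [c if (j == 0 and not filter_dc) else coeff_filter(c)
         for j, c in enumerate(r)]                         # S3: filter coefficients
    back = hadamard(f)                                     # S4: inverse up to a factor 4
    transformed = [round(v / 4) for v in back]             # undo the factor 4
    return transformed[0]                                  # S5: filtered version of i0


# 10-bit input: the DC coefficient reaches 4 * 1023 = 4092.
print(hadamard([1023] * 4)[0])
print(filter_sample([100, 102, 98, 101], lambda c: c))     # identity filter
```

With an identity filter the sample comes back unchanged, which is a quick check that the forward and inverse transforms match.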
[0038] According to some embodiments, the obtaining (S3) may be applied to the transform coefficients having an absolute value smaller than a threshold THR, wherein at least one piece of the piecewise linear function y has a nonzero slope. The value of THR may be a power of 2, for example 128, as will be described below. The threshold may also be as high as the maximum value of a transform coefficient, for example 4092 when ten bits are used to represent the samples. In that case, filtered transform coefficients are obtained in step S3 for all transform coefficients, regardless of their value.
[0039] The piecewise linear function y may be either continuous or discontinuous, as will be described below.
[0040] The current disclosure describes seven embodiments of a method for filtering a sample value. The first two embodiments use a continuous piecewise linear function with n = 2 pieces to obtain filtered transform coefficients or, equivalently, to approximate the W-function. The next two embodiments use a piecewise linear function with n pieces to obtain filtered transform coefficients, where the piecewise linear function may be discontinuous. The following two embodiments also use piecewise linear functions with n pieces, where the pieces are connected at the threshold points. The seventh embodiment is an efficient way to ensure that the piecewise linear function is continuous without spending excessive bits in the calculation.
[0041] In a first embodiment, a two-piece piecewise linear function is used to approximate the W-function, i.e., to obtain filtered transform coefficients for transform coefficient values between 0 and THR − 1 = 127 (i.e., THR = 128). For transform coefficient values larger than 127, filtering is not applied. Accordingly, in this embodiment the two-piece linear function only approximates the part of the W-function that is tabulated by the LUT. The LUT given in [1] can be expressed as LUT(x, qp), which depends on qp and the transform coefficient x. In the context of the current disclosure, the first variable of the LUT, i.e., x, denotes the column and the second variable, i.e., qp, denotes the row, which is opposite to how it is indicated in, for instance, MATLAB. In some embodiments, the input x may correspond to luma or chroma coefficients.
[0042] A two-piece piecewise linear function with nonnegative values may be provided by $y_{qp} = \max(k_{qp}x + m_{qp}, 0)$. This function is an example of a continuous function. Given the LUT, it is possible to find a linear function that fits the LUT for each qp value. To illustrate this, FIG. 1 depicts an example for LUT(x, 37), i.e., qp = 37, together with the approximating function $y_{qp} = \max(k_{qp}x + m_{qp}, 0)$ using the parameters k37 = 1.0194 and m37 = −24.2107.
[0043] In the first embodiment, the values of kqp and mqp for qp values from 18 to 63 may be equal to:
kqp = [1.0326, 1.0354, 1.0352, 1.0418, 1.0469, 1.0509, 1.0535, 1.0595, 1.06, 1.0666, 1.0678, 1.0703, 1.0712, 1.0728, 1.0682, 1.0658, 1.0585, 1.0471, 1.036, 1.0194, 1.0002, 0.97052, 0.94113, 0.90866, 0.87668, 0.83445, 0.788, 0.73877, 0.68586, 0.64118, 0.58756, 0.5387, 0.4912, 0.44032, 0.3957, 0.35398, 0.31849, 0.28188, 0.24798, 0.21923, 0.19435, 0.16846, 0.14658, 0.12954, 0.1131, 0.09927], and
mqp = [-4.5179, -5.0774, -5.4391, -6.2224, -6.9851, -7.7923, -8.6039, -9.6161, -10.4341, -11.724, -12.7417, -14.0196, -15.2738, -16.7861, -17.8652, -19.3854, -20.6147, -21.7579, -23.1179, -24.2107, -25.3656, -25.6423, -26.2748, -26.7235, -27.5008, -27.282, -26.838, -26.1413, -25.1188, -24.609, -23.2322, -22.1601, -20.9494, -19.2451, -17.8859, -16.5127, -15.5019, -14.0914, -12.7722, -11.6584, -10.7233, -9.4997, -8.536, -7.8398, -7.0491, -6.4188], respectively.
[0044] In some embodiments, the kqp and mqp values have been obtained by minimizing the mean squared error between the nonzero elements in the LUT and the linear function. In some embodiments, the kqp and mqp values may be stored in a fixed-point representation. As an example, with nine bits of fractional resolution for k, k would be stored in steps of 1/512. Then 10 bits would be sufficient to cover the entire range, since the maximum stored number, 1023, would represent 1023/512 = 1.9980, which is larger than all the k values in the list provided above. Likewise, if five bits of fractional resolution are sufficient for m, then the m values may be stored in steps of 1/32. Without counting the sign bit, 10 bits would then be sufficient to store −m, since the largest value would be 27.5008, which is smaller than the largest representable value 1023/32 = 31.96875.
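The fixed-point representation described in [0044] can be exercised as follows for qp = 37. The 1/512 and 1/32 scales are those given above; aligning m to the 1/512 scale with a left shift, and rounding by adding half an LSB before the final right shift, are assumptions of this sketch rather than details taken from the text.

```python
K_FRAC_BITS = 9  # k stored in steps of 1/512
M_FRAC_BITS = 5  # m stored in steps of 1/32


def to_fixed(k: float, m: float) -> tuple:
    """Quantize (k, m) to integers; both magnitudes fit in 10 bits."""
    return round(k * (1 << K_FRAC_BITS)), round(m * (1 << M_FRAC_BITS))


def two_piece_fixed(x: int, k_fix: int, m_fix: int) -> int:
    """y = max(k*x + m, 0) using only integer multiply, add and shifts."""
    # Move m from the 1/32 scale to the 1/512 scale of k*x (assumed alignment).
    acc = k_fix * x + (m_fix << (K_FRAC_BITS - M_FRAC_BITS))
    # Drop the fractional bits, rounding to nearest (assumed rounding).
    y = (acc + (1 << (K_FRAC_BITS - 1))) >> K_FRAC_BITS
    return max(y, 0)


k37, m37 = to_fixed(1.0194, -24.2107)  # qp = 37 values from the lists above
print(k37, m37, two_piece_fixed(100, k37, m37), two_piece_fixed(10, k37, m37))
# 522 -775 78 0
```

The floating-point result for x = 100 is 1.0194 · 100 − 24.2107 = 77.7293, so the integer path lands on the nearest integer.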
[0045] While THR is set to 128 in the above description of the first embodiment, this is not required; the THR value may be set to different values in different embodiments. If a better result can be obtained at THR = 256 or THR = 53, the THR value may be set accordingly. There is a tradeoff, however: a higher THR reduces the problem caused by the discontinuity at x = THR, but at the same time the approximation of the function W becomes less accurate for the values below the higher THR.
[0046] Another aspect of the current disclosure is that THR does not need to be the same for every qp value. Namely, it may be advantageous to have different values of THR for different qp values according to some embodiments. For example, for low qp values, such as qp = 18, σ is very small and hence the function W(x) is close to a straight line, W(x) ≈ x, especially for large values of x. Hence, for qp = 18 it may make sense to use a small value of THR, such as THR = 32. However, for high qp values, such as qp = 63, the difference between W(x) and x will be large unless x is large. In this case it may make sense to use a larger THR, such as THR = 2048.
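A floating-point sketch of the first embodiment with a qp-dependent threshold follows. The `EXAMPLE_THR` table only encodes the example values above (THR = 32 for qp = 18 and THR = 2048 for qp = 63) with 128 as a default, so it is hypothetical, as is the handling of negative coefficients through an assumed odd symmetry W(−x) = −W(x).

```python
EXAMPLE_THR = {18: 32, 63: 2048}  # hypothetical per-qp thresholds


def thr_for_qp(qp: int, default: int = 128) -> int:
    return EXAMPLE_THR.get(qp, default)


def filter_coeff_emb1(x: int, k: float, m: float, thr: int) -> float:
    """First embodiment: approximate W(x) only for |x| < THR."""
    if abs(x) >= thr:
        return float(x)             # at or above THR: left unfiltered
    y = max(k * abs(x) + m, 0.0)    # two-piece function on |x|
    return y if x >= 0 else -y      # assumed odd symmetry W(-x) = -W(x)


thr = thr_for_qp(37)                # falls back to 128
below = filter_coeff_emb1(127, 1.0194, -24.2107, thr)
above = filter_coeff_emb1(128, 1.0194, -24.2107, thr)
print(round(below, 2), above)       # a jump of roughly 23 at x = THR
```

The jump between x = 127 and x = 128 is the discontinuity discussed in [0045]; raising THR moves it to where W(x) is closer to x, which makes it smaller.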
[0047] In a second embodiment, a two-piece piecewise linear function $y_{qp} = \max(k_{qp}x + m_{qp}, 0)$ is used for all values of x, not only for the values up to THR as described above in the first embodiment. This has the advantage that there is no discontinuity at x = THR (i.e., x = 128), so problems that arise due to the discontinuity are avoided. This is equivalent to setting THR = 4093 in the previous embodiment.
[0048] In a third embodiment, there is provided a variant of the first embodiment, where a piecewise linear function with n pieces is used for x up to THR − 1, such as THR − 1 = 127. Since x is between 0 and 127, the range of input x is divided into n bins, each covering a range of 128/n. A linear function is used to approximate each bin. The number of bins may be small enough that efficient SIMD operations can be used. As an example, if the SIMD architecture allows for look-up from a 128-bit register, and the linear values k and m are stored using 8 bits each, it may be good to use n = 16, i.e., 16 different pieces in the piecewise linear function. This way one SIMD operation can be used to obtain k and another SIMD operation can be used to obtain m. Alternatively, n = 8 may be used, and both the k and m values may be obtained in a single SIMD operation. In some embodiments, the number of bins can be chosen as needed. In some embodiments, the range of x does not have to be limited to [0, 127]; the third embodiment can be applied to x with a larger range, e.g., [0, 1023], if needed. For the purpose of explanation and for the sake of simplicity, x is assumed to be between 0 and 127 in the following description.
[0049] The thresholds of the n bins may be denoted as t0, t1, t2, ..., tn. Accordingly, [t0, t1, t2, ..., tn] = [0, 128/n, 2·128/n, ..., n·128/n]. For a given qp value, a linear function is used to approximate the LUT within each bin, as shown below in Equation 9:
$y = \max(k_{qp,i}\,x + m_{qp,i},\ 0), \quad t_i \le x < t_{i+1}$ (Equation 9)
Thus, for each bin, a pair of $k_{qp,i}$ and $m_{qp,i}$ values is needed. In some embodiments, a value of x ∈ [t_i, t_{i+1}) may never cause $k_{qp,i}x + m_{qp,i}$ to be negative. In such embodiments, the max-operation may be omitted, thereby saving computational resources. With the LUT(qp, x) in [1], we can compute
$k_{qp,i} = \dfrac{LUT(qp, t_{i+1} - 1) - LUT(qp, t_i)}{t_{i+1} - 1 - t_i}$ for i = 0, ..., n − 1. (Equation 10)
Accordingly,
$m_{qp,i} = LUT(qp, t_i) - k_{qp,i}\,t_i$ (Equation 11).
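Equations 10 and 11 draw, in each bin, the line through the LUT values at the first and last integer position of the bin. The sketch below assumes the bin width 128/n is an integer. The LUT samples it uses are not quoted from [1]: they were back-computed from the LUTk_qp and LUTm_qp values for qp = 37 given in [0052], so running the function should reproduce those tables.

```python
from typing import Mapping


def fit_bins_third(lut: Mapping[int, float], n: int = 8, size: int = 128):
    """Per-bin (k, m) from Equations 10 and 11."""
    width = size // n                    # 128 / n, assumed to be an integer
    ks, ms = [], []
    for i in range(n):
        t_lo = i * width                 # t_i
        t_hi = (i + 1) * width - 1       # t_{i+1} - 1, the last x inside bin i
        k = (lut[t_hi] - lut[t_lo]) / (t_hi - t_lo)  # Equation 10
        ks.append(k)
        ms.append(lut[t_lo] - k * t_lo)              # Equation 11
    return ks, ms


# LUT(37, x) at the bin edges, back-computed from the qp = 37 tables in [0052].
LUT37_EDGES = {0: 0, 15: 1, 16: 1, 31: 8, 32: 8, 47: 21, 48: 22, 63: 37,
               64: 38, 79: 55, 80: 56, 95: 73, 96: 74, 111: 91, 112: 92, 127: 108}

ks, ms = fit_bins_third(LUT37_EDGES)
print([round(k, 4) for k in ks])
print([round(m, 4) for m in ms])
```

Only the LUT values at the bin edges are needed, so any mapping that covers those 2n positions is enough input.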
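[0050] The values of k and m can be stored in LUTk_qp and LUTm_qp, respectively, and the bin index can be obtained by shifting x right by log2(128/n) bits, as shown below in Equations 12 and 13:
$k_{qp,i} = LUT_{k\_qp}[x \gg \log_2(128/n)]$ (Equation 12)
$m_{qp,i} = LUT_{m\_qp}[x \gg \log_2(128/n)]$ (Equation 13)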
[0051] For each qp value, 2n values need to be stored. For qp values ranging from 18 to 63, (63 − 17) × 2n = 92n values need to be stored. If n = 8 and each value is stored using 8 bits, this gives 736 bytes, which is significantly smaller than the 5888 bytes needed for the LUT in [1].
[0052] Taking n = 8 and qp = 37 as an example, the values of LUTk_qp and LUTm_qp may be LUTk_qp = [0.066667, 0.46667, 0.86667, 1, 1.1333, 1.1333, 1.1333, 1.0667] and LUTm_qp = [0, -6.4667, -19.7333, -26, -34.5333, -34.6667, -34.8, -27.4667]. The approximation is depicted in FIG. 2. In some embodiments, the approximated piecewise linear function shown in FIG. 2 may have 8 (possibly disconnected) linear pieces.
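Evaluating the third embodiment then needs one shift, two table reads, a multiply-add and a max. The tables below are the qp = 37 values quoted above; the name `approx_w_bins` is illustrative, and the sketch only covers 0 ≤ x ≤ 127.

```python
LUTK_37 = [0.066667, 0.46667, 0.86667, 1, 1.1333, 1.1333, 1.1333, 1.0667]
LUTM_37 = [0, -6.4667, -19.7333, -26, -34.5333, -34.6667, -34.8, -27.4667]
SHIFT = 4  # log2(128 / n) with n = 8


def approx_w_bins(x: int, lutk=LUTK_37, lutm=LUTM_37, shift=SHIFT) -> float:
    """Equation 9, with the bin index obtained by a right shift (Equations 12, 13)."""
    if not 0 <= x < 128:
        raise ValueError("this sketch only covers 0 <= x <= 127")
    i = x >> shift                        # bin index
    return max(lutk[i] * x + lutm[i], 0.0)


print([round(approx_w_bins(x), 3) for x in (0, 40, 127)])
```

Because every bin index comes from the same shift, eight coefficients can fetch their k and m from one 16-entry register in parallel, which is the SIMD argument of [0048].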
[0053] A fourth embodiment is similar to the third embodiment in that an n-piece piecewise linear function is used. However, in the fourth embodiment, the approximation is applied to the entire range of x. As described above, this range of x can be [-4092, 4092] for 10-bit values that have been transformed with the Hadamard transform. In some embodiments, the range of x may be [0, 4092] if W(x) is symmetric.
[0054] The piecewise linear functions of the third and fourth embodiments may be discontinuous at the thresholds. Accordingly, a solution is provided as a fifth embodiment, in which the piecewise linear function is continuous. In Equation 9, the value of y may differ just below and at a threshold, i.e., for x < t_i and x ≥ t_i. In the fifth embodiment, the equation is modified as shown below:
Figure imgf000016_0002
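Thus, for i = 0, ..., n − 2, y is provided as:
$y = LUT(qp, t_i) + (x - t_i)\,\dfrac{LUT(qp, t_{i+1}) - LUT(qp, t_i)}{t_{i+1} - t_i}$ (Equation 15)
and the k and m values may be expressed as
$k_{qp,i} = \dfrac{LUT(qp, t_{i+1}) - LUT(qp, t_i)}{t_{i+1} - t_i}, \quad m_{qp,i} = LUT(qp, t_i) - k_{qp,i}\,t_i$ (Equation 16)
For i = n − 1, $k_{qp,i}$ and $m_{qp,i}$ are provided by Equation 10 and Equation 11, respectively, since LUT(qp, t_n) = LUT(qp, 128) lies outside the 128-entry table.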
[0055] Similarly, we store the values of k and m in LUTk_qp and LUTm_qp, respectively. Equation 12 and Equation 13 may be used to retrieve the LUT values.
[0056] Taking again n = 8 and qp = 37 as an example, the values of LUTk_qp and LUTm_qp may be LUTk_qp = [0.0625, 0.4375, 0.875, 1, 1.125, 1.125, 1.125, 1.0667] and LUTm_qp = [0, -6, -20, -26, -34, -34, -34, -27.4667]. The approximation is depicted in FIG. 3. In some embodiments, the approximated piecewise linear function has 8 connected linear pieces.
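For the continuous variant, the line in bin i runs through LUT(qp, t_i) and LUT(qp, t_{i+1}) (Equations 15 and 16), except in the last bin, which ends at x = 127. As in the third-embodiment case, the LUT samples here are back-computed from the qp = 37 tables in [0056] rather than taken from [1]; the loop at the end checks that neighbouring pieces meet at every interior threshold.

```python
from typing import Mapping


def fit_bins_fifth(lut: Mapping[int, float], n: int = 8, size: int = 128):
    """Continuous per-bin (k, m): Equation 16, Equations 10-11 for the last bin."""
    width = size // n
    ks, ms = [], []
    for i in range(n):
        t_lo = i * width
        # LUT(size) is outside the table, so the last bin ends at size - 1.
        t_hi = (i + 1) * width if i < n - 1 else size - 1
        k = (lut[t_hi] - lut[t_lo]) / (t_hi - t_lo)
        ks.append(k)
        ms.append(lut[t_lo] - k * t_lo)
    return ks, ms


# LUT(37, x) at the thresholds, back-computed from the qp = 37 tables in [0056].
LUT37_NODES = {0: 0, 16: 1, 32: 8, 48: 22, 64: 38, 80: 56, 96: 74, 112: 92, 127: 108}
ks, ms = fit_bins_fifth(LUT37_NODES)

# Continuity: pieces i and i + 1 agree at their shared threshold t_{i+1}.
for i in range(len(ks) - 1):
    t = (i + 1) * 16
    assert abs((ks[i] * t + ms[i]) - (ks[i + 1] * t + ms[i + 1])) < 1e-9
print([round(k, 4) for k in ks], [round(m, 4) for m in ms])
```

Compared with the third embodiment, each bin needs only n + 1 LUT samples in total, since neighbouring bins share their threshold sample.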
[0057] A sixth embodiment is similar to the fifth embodiment. In the sixth embodiment, the piecewise linear function with n pieces is used for the entire range of x. As described above, this range of x can be [-4092, 4092] for 10-bit values that have been transformed with the Hadamard transform.
[0058] The seventh embodiment addresses cases where it may be advantageous not to calculate the piecewise linear approximation according to y = kx + m, since doing so may require high resolution in fixed-point implementations. As an example, let us assume that the curve W(x) shown in FIG. 4 needs to be approximated using piecewise linear approximation. FIG. 4 illustrates approximating the curve W(x) (shown in solid lines) using piecewise linear line segments (shown in dotted lines).
[0059] Assume, for simplicity, that the difference between x_k and x_{k+1} is a constant Δ, i.e., x_{k+1} − x_k = Δ. In some embodiments, this constraint may be relaxed, thereby allowing for denser line segments near zero, but for the purpose of explanation, the difference is assumed to be constant.
[0060] The function W(x) has been tabulated at the endpoints of the line segments. Hence, for x = x_0, x_1, x_2, ..., x_k, ..., the function values W(x_0), W(x_1), W(x_2), ..., W(x_k), ... are known.
[0061] Let us now assume that the approximate value Ŵ(x) needs to be calculated at a general position x. First, the index k of the largest node x_k that is not larger than x is calculated:
k = x div Δ (Equation 17)
where div performs division and rounding down. Also, x_k = k · Δ.
[0062] In practice, it is advantageous to use a Δ that is a power of two, for example Δ = 16, since the division can then be replaced by a right shift. In the context of the current disclosure, Δ = 16 is used as an example, but this is not required, and different values of Δ may be used for different qp values in alternative embodiments. In this instance, the calculation simplifies to
k = x » 4, (Equation 18)
x_k = k « 4 (Equation 19)
where » denotes a right shift and « denotes a left shift. The values W_k = W(x_k) and W_{k+1} = W(x_{k+1}) may now be obtained by indexing a look-up table with k:
Wk = LUTw{k )
+1 = LUTw(k + 1) (Equation 20)
The difference between x and xk may also be calculated as: Ax = x— xk (Equation 21)
[0063] The approximate value Ŵ(x) may now be calculated as the value W_k plus Δx steps along the slope (W_{k+1} − W_k)/Δ:
$\hat W(x) = W_k + \Delta x \cdot \dfrac{W_{k+1} - W_k}{\Delta}$ (Equation 22a)
Equation 22a may also be rewritten as:
$\hat W(x) = W_k + \dfrac{\Delta x}{\Delta}\,(W_{k+1} - W_k)$ (Equation 22b)
[0064] For sufficient precision, Δx/Δ may be represented using steps of 1/16. This requires four fractional bits. This is multiplied by the difference W_{k+1} − W_k, which in theory may be very large, but since W(x) is close to x, this difference is always positive and should never be bigger than two times Δ, in this case 32. This means that only five bits are needed to represent the difference.
[0065] In practice the following operations may take place:
Figure imgf000018_0003
(Equation 23)
Ŵ(x) = W_k + diff (Equation 24).
Accordingly, there may be no need for any large multiplications. Furthermore, since W_{k+1} − W_k is small, it may be advantageous to store it in a separate LUT for parallel fetching with W_k.
[0066] In summary, the entire calculation may be as follows:
k = x » 4
x_k = k « 4
Δx = x − x_k
W_k = LUT_w(k)
W_{k+1} − W_k = LUT_other(k)
Figure imgf000018_0004
Ŵ(x) = W_k + diff
[0067] This seventh embodiment may be viewed as equivalent to the other embodiments described above, because Equation 22a may be written as:
$\hat W(x) = W_k + (x - x_k)\,\dfrac{W_{k+1} - W_k}{\Delta}$
and then further rewritten as:
$\hat W(x) = \dfrac{W_{k+1} - W_k}{\Delta}\,x + \left(W_k - \dfrac{W_{k+1} - W_k}{\Delta}\,x_k\right)$
where (W_{k+1} − W_k)/Δ may be identified as the slope value k and W_k − x_k (W_{k+1} − W_k)/Δ could be identified as the m value. However, calculating (W_{k+1} − W_k)/Δ with enough precision may require 5 integer bits (to hold 32) and four fractional bits (to represent steps of 1/16), or 9 bits in total. This would be multiplied by x, which would be a 12-bit number. Hence a 9-bit by 12-bit multiplication would be required, which is much more than the 5-bit by 4-bit multiplication described in paragraphs [0064] and [0065]. Hence, the seventh embodiment is less costly in this regard. This seventh embodiment may be used to approximate the LUT up to THR, or it can be used to approximate the entire function W(x).
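The integer steps of the seventh embodiment can be sketched as below with Δ = 16. The node values in `W_NODES` are made up purely to exercise the code (monotone, with differences well below 2Δ, as assumed in [0064]). Adding 8 before the final shift, i.e., rounding instead of truncating, is an assumption of this sketch rather than a detail stated in the text.

```python
DELTA_SHIFT = 4  # Delta = 16 = 1 << 4


def build_tables(w_nodes):
    """LUT_w[k] = W(x_k); LUT_other[k] = W_{k+1} - W_k in its own small table."""
    lut_w = list(w_nodes)
    lut_other = [b - a for a, b in zip(lut_w, lut_w[1:])]
    return lut_w, lut_other


def approx_w_interp(x: int, lut_w, lut_other) -> int:
    k = x >> DELTA_SHIFT            # Equation 18
    if k >= len(lut_other):         # x on the last node (larger x is clamped)
        return lut_w[-1]
    xk = k << DELTA_SHIFT           # Equation 19
    dx = x - xk                     # Equation 21: 0..15, i.e. dx/Delta in 1/16 steps
    step = lut_other[k]             # W_{k+1} - W_k, a small (5-bit) number
    diff = (dx * step + 8) >> DELTA_SHIFT  # 4-bit x 5-bit product, rounded
    return lut_w[k] + diff          # Equation 24


W_NODES = [0, 2, 9, 22, 38]         # hypothetical W(0), W(16), W(32), W(48), W(64)
lut_w, lut_other = build_tables(W_NODES)
print([approx_w_interp(x, lut_w, lut_other) for x in (0, 20, 31, 32, 64)])
# [0, 4, 9, 9, 38]
```

The only multiplication is dx · step; the large k·x product of the y = kx + m form never appears, which is the cost argument made in [0067].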
[0068] One advantage of the proposed solution is a significant reduction in the number of items that need to be stored. For example, when n = 2, only two values (kqp and mqp) need to be stored per qp value, instead of 128. Accordingly, for qp ranging from 18 to 63, using a piecewise linear function consisting of two linear pieces, the number of values that need to be stored is (63 − 17) × 2 = 92. In addition, if kqp and mqp need 10 bits each, the total amount of storage needed is 92 × 10 / 8 = 115 bytes, which is a reduction of (5888 − 115)/5888 ≈ 98% compared to [1].
[0069] Another advantage of the embodiments disclosed herein is that they allow the use of highly efficient SIMD implementations on CPUs. SIMD operations allow the execution of several operations simultaneously on a modern CPU. As an example, if a normal machine code instruction can add two numbers to each other, a SIMD operation can add eight numbers to eight other numbers in parallel. This can improve performance considerably. There are SIMD operations for performing table look-ups. However, such SIMD operations need the entire LUT to be stored in a single SIMD register. Such registers are typically 128 bits wide. If 8-bit values are used, the largest number of items that such an operation can handle is 128/8 = 16. The LUT37 array described as an example above has 128 items, and it would therefore be too big to implement using a single SIMD operation on current hardware. In contrast, it is easy to execute the arithmetic operations used in Equation 2, for example, using SIMD instructions.
[0070] More specifically, while k37 and m37 would be obtained from a LUT, this happens once per block and is therefore not in the inner loop, where SIMD optimization is necessary. The inner loop instead contains several executions of Equation 6, each execution including a multiplication and an addition followed by a max operation. On many modern CPUs it is possible to perform a multiplication followed by an addition as a single instruction, meaning that, say, eight parallel computations of Equation 6 can be carried out in just two instructions: one for the multiply-and-add and one for the max operation.
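The inner-loop structure described above, with k and m fetched once per block and then one multiply-add and one max applied to every lane, can be mimicked with NumPy array operations. This only illustrates the data flow and is not SIMD intrinsic code; the fixed-point scales follow [0044] (k in steps of 1/512, m in steps of 1/32), while the rounding offset and the function name `filter_block` are assumptions.

```python
import numpy as np


def filter_block(coeffs: np.ndarray, k_fix: int, m_fix: int) -> np.ndarray:
    """y = max(k*x + m, 0) applied to every coefficient of a block at once."""
    x = coeffs.astype(np.int32)
    # k_fix and m_fix are looked up once per block, outside this function.
    acc = k_fix * x + (m_fix << 4)  # multiply-add on all lanes (m moved to 1/512 scale)
    y = (acc + 256) >> 9            # drop the 9 fractional bits, rounding
    return np.maximum(y, 0)         # max on all lanes


block = np.array([0, 10, 100, 127, 64, 32, 16, 120])  # eight lanes
print(filter_block(block, 522, -775))  # qp = 37: k = 522/512, m = -775/32
```

No per-lane table look-up is needed, which is why this form maps onto SIMD hardware more easily than the 128-entry LUT.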
[0071] FIG. 6 is a diagram showing functional units of a node 602 for filtering of a sample according to one embodiment. Node 602 may for example be an encoder. Alternatively, node 602 may be a decoder.
[0072] Node 602 includes an obtaining unit 604 for obtaining a quantization parameter qp associated with said sample. Node 602 includes a generating unit 606 for generating transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample. Node 602 further includes an obtaining unit 608 for obtaining, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces. Node 602 further includes a generating unit 610 for generating transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients. Node 602 includes an obtaining unit 612 for obtaining a filtered version of said sample based on at least one of said transformed samples.
[0073] FIG. 7 is a block diagram of a node 602 for filtering of a sample according to one embodiment. Node 602 may for example be an encoder. Alternatively, node 602 may be a decoder.
[0074] As shown in FIG. 7, node 602 may comprise: processing circuitry (PC) 702, which may include one or more processors (P) 755 (e.g., a general purpose microprocessor and/or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like); a network interface 748 comprising a transmitter (Tx) 745 and a receiver (Rx) 747 for enabling node 602 to transmit data to and receive data from other nodes connected to a network 710 (e.g., an Internet Protocol (IP) network) to which network interface 748 is connected; and a local storage unit (a.k.a. “data storage system”) 708, which may include one or more non-volatile storage devices and/or one or more volatile storage devices. In embodiments where PC 702 includes a programmable processor, a computer program product (CPP) 741 may be provided. CPP 741 includes a computer readable medium (CRM) 742 storing a computer program (CP) 743 comprising computer readable instructions (CRI) 744. CRM 742 may be a non-transitory computer readable medium, such as magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like. In some embodiments, the CRI 744 of computer program 743 is configured such that when executed by PC 702, the CRI causes node 602 to perform steps and the embodiments described herein (e.g., steps described herein with reference to the flow charts). In other embodiments, node 602 may be configured to perform steps described herein without the need for code. That is, for example, PC 702 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software.
[0075] While various embodiments of the present disclosure are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context. [0076] Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.
REFERENCES
[1] V. Stepin et al.: “CE2 related: Hadamard Transform Domain Filter”, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, July 2018, document JVET-K0068-v3.
[2] J. Ström et al.: “CE2 related: Reduced complexity bilateral filter”, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, July 2018, document JVET-K0274-v4.

Claims

1. A method for filtering of a sample, the method comprising:
obtaining (S1) a quantization parameter qp associated with said sample;
generating (S2) transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample;
obtaining (S3), based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces;
generating (S4) transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients; and
obtaining (S5) a filtered version of said sample based on at least one of said transformed samples.
2. The method of claim 1, wherein the obtaining (S3) is applied on the transform coefficients having an absolute value smaller than a threshold THR and wherein at least one piece of the piecewise linear function y has a slope different than zero.
3. The method of any of claims 1-2, wherein the piecewise linear function y is continuous.
4. The method of claim 3, wherein n = 2 and the piecewise linear function is given as
$y_{qp} = \max(k_{qp}x + m_{qp}, 0)$
wherein kqp and mqp depend on qp.
5. The method of claim 4, wherein the obtaining (S3) is applied on the transform coefficients in the entire range of their values.
6. The method of claim 4, wherein the threshold THR = 128.
7. The method of any of claims 1-6, wherein the method is performed by an encoder.
8. The method of any of claims 1-7, wherein the method is performed by a decoder.
9. A node (602) for filtering of a sample, the node configured to:
obtain a quantization parameter qp associated with said sample;
generate transform coefficients by applying a Hadamard transform to an area comprising said sample and at least one sample surrounding said sample;
obtain, based on qp, a filtered transform coefficient from a transform coefficient x using a piecewise linear function y with n ≥ 2 pieces;
generate transformed samples by applying an inverse Hadamard transform on the filtered transform coefficients; and
obtain a filtered version of said sample based on at least one of said transformed samples.
10. The node (602) of claim 9, wherein the node is configured to obtain filtered transform coefficients from the transform coefficients having an absolute value smaller than a threshold THR, and wherein at least one piece of the piecewise linear function y has a slope different from zero.
11. The node (602) of any of claims 9-10, wherein the piecewise linear function y is continuous.
12. The node (602) of claim 11, wherein n = 2 and the piecewise linear function is given as
Figure imgf000024_0001
wherein $k_{qp}$ and $m_{qp}$ depend on qp.
13. The node (602) of claim 12, wherein the node is configured to obtain filtered transform coefficients over the entire range of values of the transform coefficients.
14. The node (602) of claim 12, wherein the threshold THR = 128.
15. The node (602) of any of claims 9-14, wherein the node is an encoder.
16. The node (602) of any of claims 9-14, wherein the node is a decoder.
17. A computer program comprising instructions which, when executed by processing circuitry of a node, cause the node to perform the method of any one of claims 1-8.
18. A carrier containing the computer program of claim 17, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium.
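The following Python sketch walks through steps S1 to S5 of claim 1 for a single 2x2 area. It uses the 4-point Hadamard transform, which equals the separable 2x2 Hadamard transform when the area is read in raster order. The sketch is illustrative only, and several of its choices are hypothetical rather than taken from the patent: placing the sample in the top-left corner of the area, leaving the DC coefficient unfiltered, and the helpers `params_for_qp` and `piecewise_linear`, including the mapping from qp to the slope k and the dead zone m. The claimed function itself is given only as a figure. With k = 1, the hypothetical function reduces to ordinary soft thresholding.

```python
from typing import List, Tuple

# 4-point Hadamard matrix in Sylvester order. For a 2x2 area read in
# raster order [a, b, c, d] it equals the separable 2-D 2x2 Hadamard
# transform. H4 is symmetric and H4 * H4 = 4 * I.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def hadamard4(v: List[float]) -> List[float]:
    """Forward (unnormalized) 4-point Hadamard transform."""
    return [sum(h * s for h, s in zip(row, v)) for row in H4]


def inverse_hadamard4(c: List[float]) -> List[float]:
    """Inverse transform: apply H4 again and divide by 4."""
    return [x / 4 for x in hadamard4(c)]


def params_for_qp(qp: int) -> Tuple[float, float]:
    """HYPOTHETICAL qp -> (k, m) mapping, for illustration only.

    A larger qp gives a wider dead zone m, i.e. stronger filtering.
    """
    return 1.0, float(qp)


def piecewise_linear(x: float, k: float, m: float) -> float:
    """HYPOTHETICAL two-piece function mirrored around zero:
    0 for |x| < m, slope k beyond m (continuous at |x| = m)."""
    mag = max(0.0, k * (abs(x) - m))
    return mag if x >= 0 else -mag


def filter_sample(area: List[List[float]], qp: int) -> float:
    """Filter area[0][0] using the 2x2 area it spans (steps S1-S5)."""
    k, m = params_for_qp(qp)                        # S1: qp -> parameters
    v = [area[0][0], area[0][1], area[1][0], area[1][1]]
    coeffs = hadamard4(v)                           # S2: forward transform
    # S3: filter the AC coefficients; DC (coeffs[0]) is kept as-is here.
    filtered = [coeffs[0]] + [piecewise_linear(c, k, m) for c in coeffs[1:]]
    samples = inverse_hadamard4(filtered)           # S4: inverse transform
    return samples[0]                               # S5: filtered sample
```

For example, `filter_sample([[12, 10], [10, 10]], 32)` returns 10.5. All three AC coefficients equal 2, so they fall inside the dead zone and are zeroed, which leaves only the mean of the area. With qp = 0 the dead zone vanishes and the sample is returned unchanged.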
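Claims 2 to 6 narrow step S3: they add a threshold THR, require the function to be continuous, and use two pieces whose parameters depend on qp. The claimed formula is given only as a figure, so the sketch below assumes one possible continuous two-piece form limited to |x| < THR = 128. The specific form, the parameter names `k` and `m`, the pass-through of coefficients with |x| ≥ THR, and the helper `slope_for_continuity` are all assumptions made for illustration.

```python
THR = 128  # threshold value stated in claims 6 and 14


def filter_coefficient(x: float, k: float, m: float, thr: float = THR) -> float:
    """Filter one transform coefficient x.

    HYPOTHETICAL two-piece form (the patent's own formula is given only
    as a figure): y = 0 for |x| < m and y = k * (|x| - m) for
    m <= |x| < thr, mirrored for negative x. Both pieces give 0 at
    |x| = m, so y is continuous. Coefficients with |x| >= thr are
    assumed to pass through unchanged.
    """
    if abs(x) >= thr:
        return float(x)
    mag = max(0.0, k * (abs(x) - m))
    return mag if x >= 0 else -mag


def slope_for_continuity(m: float, thr: float = THR) -> float:
    """Slope k at which the sloped piece reaches thr exactly at |x| = thr,
    so the filtered and pass-through ranges also join continuously."""
    return thr / (thr - m)
```

With m = 64, `slope_for_continuity(64)` gives k = 2.0. The outputs for x = 10, -100, 127, 128 and 300 are then 0.0, -72.0, 126.0, 128.0 and 300.0. Small coefficients are zeroed, mid-range ones are reshaped, and large ones pass through.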

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862730877P 2018-09-13 2018-09-13
US62/730,877 2018-09-13

Publications (1)

Publication Number Publication Date
WO2020053262A1 (en) 2020-03-19

Family

ID=68084759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/074207 WO2020053262A1 (en) 2018-09-13 2019-09-11 Hadamard piecewise linear approximation

Country Status (1)

Country Link
WO (1) WO2020053262A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07250262A (en) * 1994-03-08 1995-09-26 Sharp Corp Noise reducing device for image signal


Non-Patent Citations (4)

Title
HIROTA TORU: "Machine translation of JPH07250262", 26 September 1995 (1995-09-26), XP055650300, Retrieved from the Internet <URL:www.epo.org> [retrieved on 20191206] *
J. STROM ET AL.: "CE2 related: Reduced complexity bilateral filter", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, July 2018 (2018-07-01)
V. STEPIN ET AL.: "CE2 related: Hadamard Transform Domain Filter", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, July 2018 (2018-07-01)
VICTOR STEPIN ET AL: "CE2 related: Hadamard Transform Domain Filter", 11TH MEETING: LJUBLJANA, SI, 10-18 JULY 2018, 10 July 2018 (2018-07-10), pages 1 - 5, XP055650161, Retrieved from the Internet <URL:http://phenix.it-sudparis.eu/jvet/doc_end_user/documents/11_Ljubljana/wg11/JVET-K0068-v4.zip> [retrieved on 20191205] *

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19779373; country of ref document: EP; kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 19779373; country of ref document: EP; kind code of ref document: A1