GB2323493A

GB2323493A - Quantisation for compression systems

Info

Publication number: GB2323493A
Application number: GB9703831A
Authority: GB
Inventors: Oliver Hartwig Werner
Original assignee: British Broadcasting Corp
Current assignee: British Broadcasting Corp
Priority date: 1997-02-25
Filing date: 1997-02-25
Publication date: 1998-09-23
Also published as: GB9703831D0

Abstract

In compression encoding of a digital signal, such as MPEG-2, transform coefficients are quantised with the lower bound of each interval being controlled by a parameter $\lambda$. In the MPEG-2 reference coder, for example, $\lambda = 0.75$. Because the quantised coefficients are variable length coded, improved quality or reduced bit rates can be achieved by controlling $\lambda$ so as to vary dynamically the bound of each interval with respect to the associated representation level. The parameter $\lambda$ can vary with coefficient amplitude, with frequency and with quantisation step size. In a transcoding operation, $\lambda$ can also vary with parameters in the initial coding operation.

Description

QUANTISATION FOR COMPRESSION SYSTEMS

This invention relates to compression.

Known video compression techniques such as MPEG-2 (ISO/IEC 13818-2:1996) do not allow a perfect reconstruction of the original signal; this is due to the presence of quantisation, which maps a range of original amplitudes onto the same representation level. The quantisation process is therefore irreversible. MPEG-2 (in common with other compression standards such as MPEG-1 (ISO/IEC 11172-2:1993), JPEG (ISO/IEC 10918), CCITT/ITU-T Rec. H.261 and ITU-T Rec. H.263) defines representation levels and leaves undefined the manner in which the original amplitudes are mapped onto a given set of representation levels.

It is an object of one aspect of the present invention to provide an improved quantiser and quantisation method.

Accordingly, the present invention consists in one aspect in a method for partitioning the amplitude range of values to be quantised into a set of adjacent intervals whereby each interval is mapped onto a representation level such that the lower bound of each interval is controlled by a parameter $\lambda$ that varies dynamically.

The invention will now be described by way of example with reference to the accompanying figure, which is a diagram illustrating the relationships between representation levels, decision levels and the value of $\lambda$.

In the mentioned compression standards, the original amplitude $x$ results from a discrete cosine transform (DCT) and is thus related to a horizontal frequency index $f_{hor}$ and a vertical frequency index $f_{ver}$. Whilst this approach is taken as an example in what follows, the invention is not restricted in this regard.

In general, a quantiser describes a mapping from an original amplitude $x$ of frequencies $f_{hor}$ and $f_{ver}$ onto an amplitude $y = Q(x)$. The mapping performed by the quantiser is fully determined by the set of representation levels $\{r_i\}$ and by the corresponding decision levels $\{d_i\}$, as illustrated in the figure. All original amplitudes in the range $d_i \le x < d_{i+1}$ are mapped onto the same representation level $y = Q(x) = r_i$. As can be seen from the accompanying figure, consecutive decision levels are related by the quantisation step size $q$:

$$d_{i+1} = d_i + q \tag{1}$$

and for a given representation level $r_i$, the corresponding decision level is calculated as:

$$d_i = r_i - \frac{\lambda}{2} \cdot q \tag{2}$$

The quantiser is fully specified by the quantisation step-size $q$ and the parameter $\lambda$ for a given set of representation levels $\{r_i\}$. Therefore, a quantiser that complies with equations (1) and (2) can be referred to as a $(q, \lambda)$ quantiser. Note that in general $q$ and $\lambda$ can each be a function of the amplitude to be quantised. This more general quantiser is also referred to as a $(q, \lambda)$ quantiser.

Currently proposed quantisers, as described in the reference coders for the H.261, H.263, MPEG-1 and MPEG-2 standards, all apply a special type of $(q, \lambda)$ quantiser in that a fixed value of $\lambda$ is used: for example $\lambda = 0.75$ in the MPEG-2 reference coder or $\lambda = 1.0$ in the MPEG-1 reference coder for quantisation of intra-DCT-coefficients.
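The effect of a fixed $\lambda$ on the decision levels of eq. (2) can be illustrated with a few lines of Python. This is only a sketch: it assumes uniformly spaced representation levels $r_i = i \cdot q$, and the function name `decision_levels` and the step size 16 are hypothetical choices made for the example.

```python
def decision_levels(q, lam, count):
    """Return d_1..d_count from eq. (2), assuming r_i = i * q (an assumption)."""
    return [i * q - (lam / 2.0) * q for i in range(1, count + 1)]


# lambda = 0.75 (MPEG-2 reference coder) versus lambda = 1.0 (MPEG-1 reference coder)
mpeg2_levels = decision_levels(16, 0.75, 3)
mpeg1_levels = decision_levels(16, 1.0, 3)
print(mpeg2_levels, mpeg1_levels)
```

A smaller $\lambda$ moves every decision level upwards, so more amplitudes fall into the lower of two neighbouring intervals.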

According to one aspect of this invention, $\lambda$ is not constant but is a function that depends on the horizontal frequency index $f_{hor}$, the vertical frequency index $f_{ver}$, the quantisation step-size $q$ and the amplitude $x$:

$$\lambda = \lambda(f_{hor}, f_{ver}, q, x) \tag{3}$$

Examples of ways in which the function may usefully be derived to improve picture quality in video compression at a given bit-rate - or to reduce the required bit-rate at a given picture quality - will be set out below.

The invention extends also to the case of transcoding, when a first generation amplitude $y_1 = Q_1(x)$ is mapped onto a second generation amplitude $y_2 = Q_2(y_1)$ to further reduce the bit-rate from the first to the second generation without having access to the original amplitude $x$. In this case the first generation quantiser $Q_1$ and the second generation quantiser $Q_2$ are described as a $(q_1, \lambda_1)$-type quantiser and a $(q_2, \lambda_2)$-type quantiser, respectively. The second generation $\lambda_2$ value is described as a function:

$$\lambda_2 = \lambda_2(f_{hor}, f_{ver}, q_1, \lambda_1, q_2, \lambda_{2,ref}, y_1) \tag{4}$$

The parameter $\lambda_{2,ref}$ that appears in eq. (4) is applied in a reference $(q_2, \lambda_{2,ref})$-type quantiser. This reference quantiser bypasses the first generation and directly maps an original amplitude $x$ onto a second generation reference amplitude $y_{2,ref} = Q_{2,ref}(x)$.

The functional relationship of eq. (4) can be used to minimise the error $(y_2 - y_{2,ref})$ or the error $(y_2 - x)$. In the first case, the resulting second generation quantiser may be called a maximum a-posteriori (MAP) quantiser. In the second case, the resulting second generation quantiser may be called a mean squared error (MSE) quantiser. Examples of the second generation $(q_2, \lambda_{2,MAP})$-type and $(q_2, \lambda_{2,MSE})$-type quantisers are given below. For a more detailed explanation of the theoretical background, reference is directed to the Appendix hereto.

More detailed consideration will next be given to the single stage coding case.

A class of MPEG-2 compatible quantisers is introduced in the Appendix for intra frame coding. Non-negative original dct-coefficients $x$ are mapped onto the representation levels as

$$y = Q(x) = \left\lfloor \frac{x}{q} + \frac{\lambda}{2} \right\rfloor \cdot q \tag{5}$$

The floor function $\lfloor a \rfloor$ extracts the integer part of the given argument $a$. Negative values are mirrored:

$$y = -Q(|x|) \tag{6}$$

The amplitude range of the quantisation step-size $q$ in eq. (5) is standardised; $q$ has to be transmitted as side information in every MPEG-2 bit stream. This does not hold for the parameter $\lambda$ in eq. (5). This parameter is not needed for reconstructing the dct-coefficients from the bit stream, and is therefore not transmitted. However, the $\lambda$-value controls the mapping of the original dct-coefficients $x$ onto the given set of representation levels

$$r_l = l \cdot q \tag{7}$$

According to eq. (5), the (positive) x-axis is partitioned by the decision levels

$$d_l = \left(l - \frac{\lambda}{2}\right) \cdot q, \qquad l = 1, 2, \ldots \tag{8}$$

Each $x \in [d_l, d_{l+1})$ is mapped onto the representation level $y = r_l$.

As a special case, the interval $[0, d_1)$ is mapped onto $y = 0$.
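The quantiser characteristic of eqs. (5) and (6) is short enough to write out directly. The following Python sketch is an illustration only; the function name `quantise` and the sample values are assumptions made for the example.

```python
import math


def quantise(x, q, lam):
    """Eqs. (5) and (6): floor(|x|/q + lam/2) * q, with the sign of x restored."""
    level = math.floor(abs(x) / q + lam / 2.0)
    y = level * q
    return -y if x < 0 else y


# With q = 16 and lam = 0.75 the first decision level is d_1 = (1 - 0.375) * 16 = 10.
print(quantise(9.9, 16, 0.75), quantise(10, 16, 0.75), quantise(-10, 16, 0.75))
```

Amplitudes just below $d_1$ map to zero and amplitudes at or above it map to the first representation level, which matches the special case for the interval $[0, d_1)$.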

The parameter $\lambda$ can be adjusted for each quantisation step-size $q$, resulting in a distortion rate optimised quantisation: the mean-squared-error

$$D = E\left[(x - y)^2\right] \tag{9}$$

is minimised under a bit rate constraint imposed on the coefficients $y$. In order to simplify the analysis, the first order source entropy

$$H = -\sum_l P_l \cdot \log_2 P_l \tag{10}$$

of the coefficients $y$ instead of the MPEG-2 codeword table is taken to calculate the bit rate. It has been verified in the Appendix that the entropy $H$ can be used to derive a reliable estimate for the number of bits that result from the MPEG-2 codeword table. In eq. (10), $P_l$ denotes the probability for the occurrence of the coefficient $y = r_l$.

The above constrained minimisation problem can be solved by applying the Lagrange multiplier method, introducing the Lagrange multiplier $\mu$. One then gets the basic equation to calculate the quantisation parameter $\lambda$:

$$\frac{\partial D}{\partial \lambda} + \mu \cdot \frac{\partial H}{\partial \lambda} = 0 \tag{11}$$

Note that the solution for $\lambda$ that one obtains from eq. (11) depends on the value of $\mu$. The value of $\mu$ is determined by the bit rate constraint

$$H \le H_0 \tag{12}$$

where $H_0$ specifies the maximum allowed bit rate for encoding the coefficients $y$. In general, the amplitude range of the Lagrange multiplier is $0 \le \mu < \infty$. In the special case of $H_0 \to \infty$, one obtains $\mu \to 0$. Conversely, for $H_0 \to 0$, one obtains in general $\mu \to \infty$.

It is shown in the Appendix that the Laplacian probability density function (pdf) is an appropriate model for describing the statistical distribution of the amplitudes of the original dct-coefficients. This model is now applied to evaluate eq. (11) analytically. One then obtains a distortion-rate optimised quantiser characteristic by inserting the resulting value for $\lambda$ in eq. (5).

Due to the symmetric quantiser characteristic for positive and negative amplitudes in eqs. (5) and (6), we introduce a pdf $p$ for describing the distribution of the absolute original amplitudes $|x|$. The probability $P_0$ for the occurrence of the coefficient $y = 0$ can then be specified as

$$P_0 = \int_0^{d_1} p(x)\,dx, \qquad d_1 = \left(1 - \frac{\lambda}{2}\right) \cdot q \tag{13}$$

Similarly, the probability $P_l$ for the coefficient $|y| = l \cdot q$ becomes

$$P_l = \int_{d_l}^{d_{l+1}} p(x)\,dx, \qquad l \ge 1 \tag{14}$$

With eqs. (13) and (14), the partial derivative of the entropy H of eq. (10) can be written after a straightforward calculation as

From eq. (9) one can first deduce

and further from eq. (16)

It can be seen from eq. (17) that

$$\frac{\partial D}{\partial \lambda} \le 0 \quad \text{if } 0 \le \lambda \le 1 \tag{18}$$

Thus, when $\lambda$ is increased from zero to one, the resulting distortion $D$ is monotonically decreasing until the minimum value is reached for $\lambda = 1$.
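The sign of the derivative in (18) can be checked with a short calculation. The following is a sketch and is not taken from the original text: it assumes that $D$ is written as a sum of integrals over the decision intervals of eq. (8), with $p$ the pdf of $|x|$.

```latex
\begin{align*}
D(\lambda) &= \int_0^{d_1} x^2\, p(x)\,dx
  + \sum_{l \ge 1} \int_{d_l}^{d_{l+1}} (x - l q)^2\, p(x)\,dx,
  \qquad d_l = \Bigl(l - \tfrac{\lambda}{2}\Bigr) q,
  \quad \frac{\partial d_l}{\partial \lambda} = -\frac{q}{2} \\
\frac{\partial D}{\partial \lambda}
  &= \frac{q}{2} \sum_{l \ge 1} p(d_l)
     \Bigl[ (d_l - l q)^2 - \bigl(d_l - (l-1) q\bigr)^2 \Bigr] \\
  &= \frac{q}{2} \sum_{l \ge 1} p(d_l)\, q^2
     \Bigl[ \tfrac{\lambda^2}{4} - \bigl(1 - \tfrac{\lambda}{2}\bigr)^2 \Bigr]
   = -\frac{q^3}{2}\,(1 - \lambda) \sum_{l \ge 1} p(d_l) \;\le\; 0
   \quad \text{for } \lambda \le 1 .
\end{align*}
```

Under these assumptions the derivative vanishes only at $\lambda = 1$ (or where the pdf is zero at all decision levels), which agrees with the statement that the distortion reaches its minimum there.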

The latter is the solution to the unconstrained minimisation of the mean-squared-error; however, the resulting entropy $H$ will in general not fulfil the bit rate constraint of eq. (12).

Under the assumption of $P_{l-1} \ge P_l$ in eq. (15), we see that $\partial H / \partial \lambda \ge 0$.

Thus, there is a monotonic behaviour: when $\lambda$ is increased from zero to one, the resulting distortion $D$ monotonically decreases, while at the same time the resulting entropy $H$ monotonically increases. An iterative algorithm follows immediately from this monotonic behaviour. The parameter $\lambda$ is initially set to $\lambda = 1$, and the resulting entropy $H$ is computed. If $H$ is larger than the target bit rate $H_0$, the value of $\lambda$ is decreased in further iteration steps until the bit rate constraint, eq. (12), is fulfilled. While this iterative procedure forms the basis of a simplified distortion-rate method proposed for transcoding of I-frames, we continue to derive an analytical solution for $\lambda$.
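The iterative procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the method of the Appendix: the entropy is measured from relative frequencies of the quantised level indices, $\lambda$ is lowered in fixed steps of 0.05, and the test signal is a hypothetical set of Laplacian-like amplitudes. All function names are invented for the example.

```python
import math
import random
from collections import Counter


def quantise_index(x, q, lam):
    """Level index of eqs. (5)/(6): floor(|x|/q + lam/2), with the sign of x."""
    level = math.floor(abs(x) / q + lam / 2.0)
    return -level if x < 0 else level


def entropy_bits(symbols):
    """First-order entropy in bits per symbol, from relative frequencies."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def lower_lambda_until(coeffs, q, h0, step=0.05):
    """Start at lam = 1 and decrease it until the entropy is at most h0."""
    lam = 1.0
    while True:
        h = entropy_bits([quantise_index(x, q, lam) for x in coeffs])
        if h <= h0 or lam <= 0.0:
            return lam, h
        lam = max(0.0, lam - step)


# Hypothetical test signal: exponentially distributed magnitudes, random signs.
rng = random.Random(0)
coeffs = [rng.choice((-1, 1)) * rng.expovariate(0.1) for _ in range(2000)]
lam_free, h_free = lower_lambda_until(coeffs, q=16, h0=float("inf"))
lam_tight, h_tight = lower_lambda_until(coeffs, q=16, h0=h_free - 0.05)
print(lam_free, h_free, lam_tight, h_tight)
```

Without a rate constraint the search stops at $\lambda = 1$; with a tighter target it returns the first smaller $\lambda$ on the step grid whose measured entropy meets the target.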

Eqs. (15) and (17) can be evaluated for the Laplacian model pdf proposed in the Appendix,

$$p(x) = c \cdot \alpha \cdot e^{-\alpha x} \quad \text{if } x \ge d_1 = \left(1 - \frac{\lambda}{2}\right) \cdot q \tag{19}$$

where $c$ stands for the scale factor whose symbol is not legible in the source text. After inserting the model pdf of eq. (19) in eqs. (15) and (17), it can be shown that the basic equation (11) then leads to the analytical solution for $\lambda$,

$$\lambda = 1 - \frac{\mu}{q^2} \cdot \left[ h(z) + (1 - z) \cdot \log_2 \frac{P_0}{1 - P_0} \right] \tag{20}$$

with $z = e^{-\alpha q}$ and the 'z'-entropy

$$h(z) = -z \cdot \log_2 z - (1 - z) \cdot \log_2 (1 - z) \tag{21}$$

Eq. (20) provides only an implicit solution for $\lambda$, as the probability $P_0$ on the right hand side depends on $\lambda$ according to eq. (13). In general, the value of $P_0$ can be determined only for known $\lambda$ by applying the quantiser characteristic of eqs. (5) and (6) and counting the relative frequency of the event $y = 0$.

However, eq. (20) is a fixed-point equation for $\lambda$, which becomes more obvious if the right hand side is described by the function

$$g(\lambda) = 1 - \frac{\mu}{q^2} \cdot \left[ h(z) + (1 - z) \cdot \log_2 \frac{P_0}{1 - P_0} \right] \tag{22}$$

resulting in the classical fixed-point form $\lambda = g(\lambda)$. Thus, it follows from the famous fixed point theorem of Stefan Banach that the solution for $\lambda$ can be found by an iterative procedure with

$$\lambda_{j+1} = g(\lambda_j) \tag{23}$$

in the $(j + 1)$-th iteration step. The iteration of (23) converges towards the solution for an arbitrary initial value $\lambda_0$ if the function $g$ is 'self-contracting', i.e. Lipschitz-continuous with a Lipschitz-constant smaller than one. As an application of the mean value theorem of differential calculus, it is not difficult to prove that $g$ is always 'self-contracting' if the absolute value of the partial derivative is less than one. This yields the convergence condition

$$1 > \left| \frac{\partial g}{\partial \lambda} \right| = \frac{\mu}{2 \cdot \ln(2)} \cdot \frac{\alpha}{q} \cdot (1 - z) \cdot \frac{1}{P_0} \tag{24}$$

A distortion-rate optimised quantisation method is now derived based on the results obtained above. As an example, a technique is outlined for quantising the AC-coefficients of MPEG-2 intra frames. It is straightforward to modify this technique for quantising the dct-coefficients of MPEG-2 inter frames, i.e. P- and B-frames.
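The fixed-point iteration of eqs. (22) and (23) can be sketched in Python as below. Several assumptions are made for the sake of a runnable example: eq. (20) is read as $\lambda = 1 - (\mu / q^2) \cdot [h(z) + (1 - z) \cdot \log_2(P_0 / (1 - P_0))]$, and $P_0$ is computed from a pure exponential model of $|x|$ that is also applied below $d_1$, whereas the text determines $P_0$ by counting. The parameter values and function names are hypothetical.

```python
import math


def binary_entropy(z):
    """h(z) of eq. (21)."""
    return -z * math.log2(z) - (1.0 - z) * math.log2(1.0 - z)


def g(lam, mu, q, alpha):
    """Right-hand side of eq. (22) under the stated model assumptions."""
    z = math.exp(-alpha * q)
    d1 = (1.0 - lam / 2.0) * q
    p0 = 1.0 - math.exp(-alpha * d1)  # assumption: exponential model below d1 too
    bracket = binary_entropy(z) + (1.0 - z) * math.log2(p0 / (1.0 - p0))
    return 1.0 - (mu / q ** 2) * bracket


def solve_lambda(mu, q, alpha, lam0=1.0, tol=1e-12, max_iter=200):
    """Iterate lam_{j+1} = g(lam_j), eq. (23), until successive values agree."""
    lam = lam0
    for _ in range(max_iter):
        nxt = g(lam, mu, q, alpha)
        if abs(nxt - lam) < tol:
            return nxt
        lam = nxt
    return lam


lam_star = solve_lambda(mu=20.0, q=16.0, alpha=0.1)
print(lam_star)
```

For these example values the derivative in (24) is well below one, so the iteration contracts quickly; for other parameter choices the convergence condition should be checked first.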

Firstly, one has to take into account that the 63 AC-coefficients of an 8x8 dct-block do not share the same distribution. Thus, an individual Laplacian model pdf according to eq. (19) with parameter $\alpha_i$ is assigned to each AC-frequency index $i$. This results in an individual quantiser characteristic according to eqs. (5) and (6) with parameter $\lambda_i$. Furthermore, the quantisation step-size $q_i$ depends on the visual weight $w_i$ and a frequency-independent qscale parameter as

$$q_i = \frac{w_i \cdot \mathrm{qscale}}{16} \tag{25}$$

For a given step-size $q_i$ the quantisation results in a distortion $D_i(\lambda_i)$ and a bit rate $H_i(\lambda_i)$ for the AC-coefficients of the same frequency index $i$.

As the dct is an orthogonal transform, and as the distortion is measured by the mean-squared-error, the resulting distortion $D$ in the spatial (sample/pixel) domain can be written as

$$D = c \cdot \sum_i D_i(\lambda_i) \tag{26}$$

with some positive normalising constant $c$.

Similarly, the total bit rate $H$ becomes

$$H = \sum_i H_i(\lambda_i) \tag{27}$$

For a distortion rate optimised quantisation, the 63 parameters $\lambda_i$ have to be adjusted such that the cost function

$$D + \mu \cdot H \tag{28}$$

is minimised. The non-negative Lagrange multiplier $\mu$ is determined by the bit rate constraint

$$H \le H_0 \tag{29}$$

Additionally, the qscale parameter can be changed to meet the bit rate constraint of eq. (29). In principle, the visual weights $w_i$ offer another degree of freedom but for simplicity we assume a fixed weighting matrix as in the MPEG-2 reference decoder. This results in the following distortion rate optimised quantisation technique which can be stated in a 'C'-language-like form:

    /* Begin of quantising the AC-coefficients in MPEG-2 intra frames */
    Dmin = ∞;
    for (qscale = qmin; qscale ≤ qmax; qscale = qscale + 2)  /* linear qscale table */
    {
        μ = 0;
        do {
            Step 1: determine λ1, λ2, ..., λ63 by minimising D + μ·H;
            Step 2: calculate H = Σ Hi(λi);
            μ = μ + δ;  /* δ to be selected appropriately */
        } while (H > H0);
        Step 3: calculate D = c·Σ Di(λi);
        if (D < Dmin) {
            qscaleopt = qscale;
            for (i = 1; i ≤ 63; i = i + 1)
                λi,opt = λi;
            Dmin = D;
        }
    }
    for (i = 1; i ≤ 63; i = i + 1) {
        qi,opt = wi · qscaleopt / 16;
        quantise all AC-coefficients of frequency-index i by
            y = Qi(x) = floor(|x| / qi,opt + λi,opt / 2) · qi,opt · sgn(x)
    }
    /* End of quantising the AC-coefficients in MPEG-2 intra frames */

There are several options for performing Step 1 - Step 3:

1. Options for performing Step 1

The parameters $\lambda_1, \lambda_2, \ldots, \lambda_{63}$ can be determined
a) analytically by applying eqs. (20)-(23).
b) iteratively by dynamic programming of $D + \mu H$, where either of the options described in the next points can be used to calculate $D$ and $H$.

2. Options for performing Step 2

$H = \sum_i H_i(\lambda_i)$ can be calculated
a) by applying the Laplacian model pdf, resulting in eq. (32), where $h(P_{0,i})$ and $h(z_i)$ are the entropies as defined in eq. (21) of $P_{0,i}$ (eq. (13)) and $z_i = e^{-\alpha_i q_i}$, respectively. Note that $P_{0,i}$ in eq. (32) can be determined by counting for each dct-frequency index $i$ the relative frequency of the zero-amplitude $y = Q_i(x) = 0$. Interestingly, eq. (32) shows that the impact of the quantisation parameters $\lambda_i$ on the resulting bit rate $H$ only consists in controlling the zero-amplitude probabilities $P_{0,i}$.
b) from a histogram of the original dct-coefficients, resulting with eqs. (10), (13) and (14) in eq. (33).
c) by applying the MPEG-2 codeword table.

3. Options for performing Step 3

$D = c \cdot \sum_i D_i(\lambda_i)$ can be calculated
a) by applying the Laplacian model pdf of eq. (19) and evaluating eq. (16).
b) by calculating $D = E[(x - y)^2]$ directly from a histogram of the original dct-coefficients $x$.

Depending on which options are chosen for Step 1 - Step 3, the proposed method results in a single pass encoding scheme if the Laplacian model pdf is chosen or in a multi pass scheme if the MPEG-2 codeword table is chosen. Furthermore, the method can be applied on a frame, macroblock or 8x8-block basis, and the options can be chosen appropriately. The latter is of particular interest for any rate control scheme that sets the target bit rate $H_0$ either locally on a macroblock basis or globally on a frame basis.
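The overall search over qscale, $\mu$ and the $\lambda_i$ can be sketched in Python as below. This is a simplified illustration rather than the procedure of the text: Step 1 is done by exhaustive search over a coarse grid of $\lambda$ values, Steps 2 and 3 use measured entropy and mean-squared-error of sample data (in the spirit of the histogram options), the constant $c$ is taken as 1, and only three hypothetical frequency indices with invented weights are used. All names are assumptions made for the example.

```python
import math
import random
from collections import Counter

LAMBDA_GRID = [k / 10.0 for k in range(11)]  # candidate lambda values 0.0 .. 1.0


def quantise_index(x, q, lam):
    level = math.floor(abs(x) / q + lam / 2.0)
    return -level if x < 0 else level


def entropy_bits(symbols):
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def rd_table(coeffs, q):
    """(lambda, D_i, H_i) for each candidate lambda, measured on the samples."""
    rows = []
    for lam in LAMBDA_GRID:
        idx = [quantise_index(x, q, lam) for x in coeffs]
        d = sum((x - i * q) ** 2 for x, i in zip(coeffs, idx)) / len(coeffs)
        rows.append((lam, d, entropy_bits(idx)))
    return rows


def optimise(coeffs_by_freq, weights, h0, qscales, mu_step=1.0, mu_max=500.0):
    """Return (D, H, qscale, lambdas) with the lowest D meeting H <= h0, or None."""
    best = None
    for qscale in qscales:
        tables = [rd_table(c, w * qscale / 16.0)
                  for c, w in zip(coeffs_by_freq, weights)]
        mu = 0.0
        while True:
            # Step 1: per-frequency minimisation of D_i + mu * H_i (c = 1 assumed).
            picks = [min(t, key=lambda r: r[1] + mu * r[2]) for t in tables]
            # Step 2: total rate.
            h = sum(r[2] for r in picks)
            if h <= h0 or mu >= mu_max:
                break
            mu += mu_step
        if h > h0:
            continue  # rate target not reached for this qscale
        d = sum(r[1] for r in picks)  # Step 3: total distortion
        if best is None or d < best[0]:
            best = (d, h, qscale, [r[0] for r in picks])
    return best


# Hypothetical data: three frequency indices with different Laplacian-like spreads.
rng = random.Random(1)
data = [[rng.choice((-1, 1)) * rng.expovariate(a) for _ in range(500)]
        for a in (0.05, 0.1, 0.2)]
weights = [16, 16, 32]
loose = optimise(data, weights, h0=float("inf"), qscales=[2, 4, 8])
tight = optimise(data, weights, h0=loose[1] - 1.0, qscales=[2, 4, 8])
print(loose[2], loose[3], tight)
```

Precomputing the distortion and rate for every candidate $\lambda$ once per qscale keeps the loop over $\mu$ cheap, which is one possible design choice among the options listed above.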

Furthermore, we note that the proposed method automatically skips high-frequency dct-coefficients if this is the best option in the rate-distortion sense. This is indicated if the final quantisation parameter $\lambda_{i,opt}$ has a value close to one for low-frequency indices $i$ but a small value, e.g. zero, for high-frequency indices.

A distortion-rate optimised quantisation method for MPEG-2 compatible coding has been described, with several options for an implementation. The invention can immediately be applied to standalone (first generation) coding. In particular, the results help in designing a sophisticated rate control scheme.

The quantiser characteristic of eqs. (5) and (6) can be generalised to

$$y = Q(x) = r(x) + \left\lfloor \frac{x - r(x)}{q(x)} + \frac{\lambda(x)}{2} \right\rfloor \cdot q(x) \tag{34}$$

for non-negative amplitudes $x$. The floor-function $\lfloor a \rfloor$ in eq. (34) returns the integer part of the argument $a$. Negative amplitudes are mirrored,

$$y = -Q(|x|) \tag{35}$$

The generalisation is reflected by the amplitude dependent values $\lambda(x)$, $q(x)$, $r(x)$ in eq. (34). For a given set of representation levels $\ldots < r_{l-1} < r_l < r_{l+1} < \ldots$ and a given amplitude $x$, the pair of consecutive representation levels is selected that fulfils

$$r_{l-1} \le x < r_l \tag{36}$$

The value of the local representation level is then set to

$$r(x) = r_{l-1} \tag{37}$$

The value of the local quantisation step-size results from

$$q(x) = q_l = r_l - r_{l-1} \tag{38}$$

A straightforward extension of the rate-distortion concept detailed above yields for the local lambda parameter, very similar to eq. (20),

$$\lambda(x) = \lambda_l = 1 - \frac{\mu}{q_l^2} \cdot \log_2 \frac{P_{l-1}}{P_l}, \qquad (l = 1, \ldots, L) \tag{39}$$

Similar to eqs. (13), (14), the probabilities in eq. (39) depend on the lambda parameters (eqs. (40) and (41)).

Therefore, eq. (39) represents a system of non-linear equations for determining the lambda parameters $\lambda_1, \ldots, \lambda_L$. In general, this system can only be solved numerically.
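The generalised characteristic of eqs. (34) to (38) can be illustrated with a small Python sketch. It assumes that eq. (34) reads $y = r(x) + \lfloor (x - r(x)) / q(x) + \lambda(x)/2 \rfloor \cdot q(x)$, that the representation levels start at zero, and that amplitudes beyond the last level are clamped to it. The clamping, the function name and the example levels are assumptions made here, not statements from the text.

```python
import bisect
import math


def quantise_general(x, levels, lambdas):
    """levels: sorted r_0 = 0 < r_1 < ... < r_L; lambdas[l] applies to [r_{l-1}, r_l)."""
    a = abs(x)
    if a >= levels[-1]:
        y = levels[-1]  # assumption: clamp amplitudes beyond the last level
    else:
        l = bisect.bisect_right(levels, a)  # index with r_{l-1} <= a < r_l, eq. (36)
        r_low = levels[l - 1]               # local representation level, eq. (37)
        q_l = levels[l] - r_low             # local step size, eq. (38)
        y = r_low + math.floor((a - r_low) / q_l + lambdas[l] / 2.0) * q_l
    return -y if x < 0 else y


levels = [0, 4, 10]          # non-uniform example levels (hypothetical)
lambdas = [None, 1.0, 0.5]   # lambdas[0] is unused
print([quantise_general(v, levels, lambdas) for v in (1.9, 2, 8, 8.5, -8.5, 12)])
```

With non-uniform levels each interval has its own step size and its own $\lambda_l$, so the decision threshold inside each interval can be placed independently.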

However, eq. (39) can be simplified if the term $\log_2(P_{l-1}/P_l)$ is interpreted as the difference

$$I_l - I_{l-1} = \log_2 \frac{P_{l-1}}{P_l} \tag{42}$$

of optimum codeword lengths

$$I_l = -\log_2 P_l, \qquad I_{l-1} = -\log_2 P_{l-1} \tag{43}$$

associated with the representation levels $r_l$, $r_{l-1}$. Instead of using variable codeword lengths that depend on the current probabilities according to eq. (43), a fixed table of variable codeword lengths $C_0, \ldots, C_L$ can be applied to simplify the algorithm. The values of $C_0, \ldots, C_L$ can be determined in advance by designing a single variable length code, i.e. a Huffman code, for a set of training signals and bit rates. Then, eq. (39) changes to

$$\lambda(x) = \lambda_l = 1 - \frac{\mu}{q_l^2} \cdot (C_l - C_{l-1}), \qquad (l = 1, \ldots, L) \tag{44}$$

The resulting distortion-rate optimised quantisation algorithm is essentially the same as detailed previously except that the lambda parameters are calculated either from eq. (39) or eq. (44) for each pair of horizontal and vertical frequency indices.
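With a fixed table of codeword lengths, the lambda parameters follow from simple arithmetic. The sketch below assumes that eq. (44) reads $\lambda_l = 1 - (\mu / q_l^2) \cdot (C_l - C_{l-1})$; the function name, the levels and the codeword lengths are hypothetical values chosen so that the result is easy to check by hand.

```python
def lambdas_from_code_lengths(mu, levels, code_lengths):
    """Return [lambda_1, ..., lambda_L] from the assumed form of eq. (44)."""
    result = []
    for l in range(1, len(levels)):
        q_l = levels[l] - levels[l - 1]                    # local step size, eq. (38)
        delta_c = code_lengths[l] - code_lengths[l - 1]    # C_l - C_{l-1}
        result.append(1.0 - mu * delta_c / q_l ** 2)
    return result


# mu / q^2 = 8 / 16 = 0.5, codeword lengths grow by 1 bit and then by 2 bits.
example = lambdas_from_code_lengths(8.0, [0, 4, 8], [1, 2, 4])
print(example)
```

A larger jump in codeword length between neighbouring levels lowers $\lambda_l$, which makes the quantiser less willing to choose the more expensive level.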

Additionally, the invention is applicable to transcoding and switching.

The question will now be addressed of a two-stage quantiser. This problem is addressed in detail in the Appendix, which sets out the theory of so-called maximum a-posteriori (MAP) and mean squared error (MSE) quantisers. By way of further exemplification there will now be described an implementation of the MAP and MSE quantiser for transcoding of MPEG-2 intra AC-coefficients that result from an 8x8 discrete cosine transform (dct).

In this section the basic equations are recalled from the Appendix. A class of MPEG-2 compatible quantisers has been introduced for intra-frame coding. Firstly, the quantisation step-size $q_1$ is calculated from the visual weight $w_1$ and the $\mathrm{qscale}_1$-value,

$$q_1 = \frac{w_1 \cdot \mathrm{qscale}_1}{16} \tag{45}$$

Secondly, non-negative original dct-coefficients $x$ are mapped onto the first generation coefficients $y_1$ as

$$y_1 = Q_1(x) = \left\lfloor \frac{x}{q_1} + \frac{\lambda_1}{2} \right\rfloor \cdot q_1 \tag{46}$$

The floor-function $\lfloor \ldots \rfloor$ extracts the integer part of the argument.

Thirdly, negative values are mirrored,

$$y_1 = -Q_1(|x|) \tag{47}$$

The class of the first generation quantisers $y_1 = Q_1(x)$ specified by these equations is spanned by the quantisation step-size $q_1$ and the parameter $\lambda_1$ with the amplitude range $0 \le \lambda_1 \le 2$. For convenient notation, such a quantiser is called a $(q_1, \lambda_1)$-type quantiser.

In the transcoder, the first generation coefficients $y_1$ are mapped onto the second generation coefficients $y_2 = Q_2(y_1)$ to further reduce the bit rate.

Under the assumption of a $(q_1, \lambda_1)$-type quantiser in the first generation, e.g. the MPEG-2 reference coder TM5, it follows from the results set out in the Appendix that the MAP quantiser $Q_{2,map}$ and the MSE quantiser $Q_{2,mse}$ can be implemented as a $(q_2, \lambda_{2,map})$-type and a $(q_2, \lambda_{2,mse})$-type quantiser, respectively. For both the MAP and the MSE quantiser, the second generation step-size $q_2$ is calculated from the second generation parameters $w_2$ and $\mathrm{qscale}_2$.

However, there are different equations for calculating $\lambda_{2,map}$ and $\lambda_{2,mse}$. With the results of the Appendix, it follows that $\lambda_{2,map}$ can be calculated as

$$\lambda_{2,map} = \lambda_{2,ref} + (\mu_{map} - \lambda_1) \cdot \frac{q_1}{q_2} \tag{48}$$

and $\lambda_{2,mse}$ as

$$\lambda_{2,mse} = 1 + (\mu_{mse} - \lambda_1) \cdot \frac{q_1}{q_2} \tag{49}$$

The parameter $\lambda_{2,ref}$ can be changed in the range $0 \le \lambda_{2,ref} \le 1$ for adjusting the bit rate and the resulting signal-to-noise-ratio. This gives an additional degree of freedom for the MAP quantiser compared with the MSE quantiser. The parameter $\mu_{map}$ and the parameter $\mu_{mse}$ are calculated from the first generation quantisation step-size $q_1$ and a z-value,

$$\mu_{map} = -\frac{2}{\ln(z^{q_1})} \cdot \ln\left(\frac{2}{1 + z^{q_1}}\right) \tag{50}$$

$$\mu_{mse} = -\frac{2}{\ln(z^{q_1})} \cdot \frac{1 - \left(1 - \ln(z^{q_1})\right) \cdot z^{q_1}}{1 - z^{q_1}} \tag{51}$$

where $\ln(a)$ returns the natural logarithm of the argument $a$. The amplitude range of these values can be limited to the range $0 \le \mu_{map}, \mu_{mse} \le 2$. Similarly, the amplitude range of the resulting values of eqs. (48) and (49) can be limited to $0 \le \lambda_{2,map}, \lambda_{2,mse} \le 2$.
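The second generation parameters can be computed with a few lines of Python. This sketch rests on a particular reading of eqs. (48) to (51), which is an assumption: $\lambda_{2,map} = \lambda_{2,ref} + (\mu_{map} - \lambda_1) \cdot q_1/q_2$, $\lambda_{2,mse} = 1 + (\mu_{mse} - \lambda_1) \cdot q_1/q_2$, $\mu_{map} = -\frac{2}{\ln(z^{q_1})} \ln\frac{2}{1 + z^{q_1}}$ and $\mu_{mse} = -\frac{2}{\ln(z^{q_1})} \cdot \frac{1 - (1 - \ln(z^{q_1})) z^{q_1}}{1 - z^{q_1}}$. The signs are chosen so that both $\mu$ values are positive, in line with the stated range of 0 to 2. Function names and the example step sizes are hypothetical.

```python
import math


def clip(value, low=0.0, high=2.0):
    """Limit a value to the range stated in the text."""
    return max(low, min(high, value))


def mu_map(z, q1):
    zq = z ** q1
    return -2.0 / math.log(zq) * math.log(2.0 / (1.0 + zq))


def mu_mse(z, q1):
    zq = z ** q1
    ln_zq = math.log(zq)
    return -2.0 / ln_zq * (1.0 - (1.0 - ln_zq) * zq) / (1.0 - zq)


def lambda2_map(z, q1, lam1, q2, lam2_ref):
    """Assumed form of eq. (48), clipped to [0, 2]."""
    return clip(lam2_ref + (clip(mu_map(z, q1)) - lam1) * q1 / q2)


def lambda2_mse(z, q1, lam1, q2):
    """Assumed form of eq. (49), clipped to [0, 2]."""
    return clip(1.0 + (clip(mu_mse(z, q1)) - lam1) * q1 / q2)


z = 252 / 256.0  # a normalised z-value of 252 out of 256
print(mu_map(z, 8), mu_mse(z, 8))
print(lambda2_map(z, 8, 0.75, 16, 0.75), lambda2_mse(z, 8, 0.75, 16))
```

Under this reading, $\mu_{map}$ and $\mu_{mse}$ both approach one as $z^{q_1}$ approaches one, so the correction terms in (48) and (49) vanish when $\lambda_1$ is also one.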

The z-value has a normalised amplitude range, i.e. $0 \le z \le 1$, and can be calculated either from the first generation dct-coefficients $y_1$ or from the original dct-coefficients $x$ as described in the Appendix. In the latter case, the z-value is transmitted as additional side information, e.g. user data, along with the first generation bit stream so that no additional calculation of $z$ is required in the transcoder. Alternatively, a default z-value may be used. An individual z-value is assigned to each pair of horizontal and vertical frequency indices. This results in 63 different values for the AC coefficients of an 8x8 dct. As a consequence of the frequency dependent z-values, the parameters $\lambda_{2,map}$ and $\lambda_{2,mse}$ are also frequency dependent, resulting in 63 $(q_2, \lambda_{2,map})$-type quantisers and 63 $(q_2, \lambda_{2,mse})$-type quantisers, respectively. Additionally, there are different parameter sets for the luminance and the chrominance components. The default z-values for the luminance and chrominance components are shown in Table 1 and Table 2 respectively.

TABLE 1 Normalised z-values, i.e. 256 x z, for luminance (default)

f_ver \ f_hor     0     1     2     3     4     5     6     7
0                 -   252   250   247   244   239   233   232
1               251   249   247   244   240   235   233   231
2               249   247   245   242   238   236   234   233
3               247   245   243   240   236   235   234   233
4               246   242   239   237   235   233   232   235
5               242   238   235   234   230   229   231   233
6               237   231   226   225   222   222   226   231
7               222   211   210   205   202   208   214   222

TABLE 2 Normalised z-values, i.e. 256 x z, for chrominance (default)

f_ver \ f_hor     0     1     2     3     4     5     6     7
0                 -   248   242   230   212   176   [illegible]   179
1               246   240   233   219   193   154   156   177
2               239   233   224   209   180   148   154   173
3               229   221   211   196   163   141   150   166
4               219   208   198   181

Claims

1. A method for partitioning the amplitude range of values to be quantised into a set of adjacent intervals whereby each interval is mapped onto a representation level such that the lower bound of each interval is controlled by a parameter $\lambda$ that varies dynamically.
2. A quantisation method according to Claim 1, wherein the value to be quantised is a transform coefficient and $\lambda$ is a function of the quantity represented by the coefficient.
3. A quantisation method according to Claim 2, wherein the value to be quantised is a DCT coefficient and $\lambda$ is a function of horizontal and vertical frequency.
4. A quantisation method according to any one of the preceding claims, wherein $\lambda$ is a function of the quantisation step size.
5. A quantisation method according to any one of the preceding claims, wherein $\lambda$ is a function of the value to be quantised.
6. A quantisation method according to any one of the preceding claims having quantisation step size $q = q_2$ and a value of $\lambda = \lambda_2$, in which the value to be quantised has previously been quantised using a quantisation step size $q = q_1$ and a value of $\lambda = \lambda_1$.
7. A quantisation method according to Claim 6, wherein $\lambda_2$ is a function of $q_1$ and $\lambda_1$.
8. A quantisation method according to Claim 6 or Claim 7, wherein $\lambda_2$ is a function of $\lambda_{2,ref}$, where $\lambda_{2,ref}$ is the value of $\lambda$ that would have been selected in a method according to Claim 1 operating with a quantisation step size $q = q_2$ upon the value prior to quantisation with the quantisation step size $q = q_1$.
9. A quantisation method according to any one of the preceding claims, wherein the quantisation step size $q$ is independent of the input value, otherwise than for the zero quantisation level.
10. A $(q, \lambda)$ quantiser operating on a set of amplitudes $x_k$ representative of respective parameters $f_k$, in which $\lambda$ is dynamically controlled in dependence upon the values of $x_k$ and $f_k$.
11. A quantiser according to Claim 10, wherein the parameters $f_k$ are frequency indices.
12. A quantiser according to Claim 10 or Claim 11, in which $\lambda$ is dynamically controlled to minimise a cost function $D + \mu H$ where $D$ is a measure of the distortion introduced by the quantisation in the uncompressed domain and $H$ is a measure of compressed bit rate.
13. In a compression transcoder, operating on a compressed signal quantised in a first $(q_1, \lambda_1)$-type quantiser, a second $(q_2, \lambda_2)$-type quantiser in which the second generation $\lambda_2$ value is controlled as a function: $\lambda_2 = \lambda_2(f_{hor}, f_{ver}, q_1, \lambda_1, q_2, \lambda_{2,ref}, y_1)$, where the parameter $\lambda_{2,ref}$ represents a notional reference $(q_2, \lambda_{2,ref})$-type quantiser which bypasses the first generation coding and directly maps an original amplitude onto a second generation reference amplitude.