8x8 TRANSFORM AND QUANTIZATION
CROSS-REFERENCE TO RELATED APPLICATIONS This application claims priority from provisional application No. 60/505,575, filed 09/24/2003. The following co-assigned pending patent application discloses related subject matter: provisional Appl. No. 60/524,831, filed 11/25/2003.
BACKGROUND OF THE INVENTION The present invention relates to digital image and video signal processing, and more particularly to block transformation and/or quantization plus inverse quantization and/or inverse transformation. Various applications for digital video communication and storage exist, and corresponding international standards have been and are continuing to be developed. Low bit rate communications, such as video telephony and conferencing, plus large video file compression, such as motion pictures, led to various video compression standards: H.261 , H.263, MPEG-1, MPEG-2, AVS, and so forth. These compression methods rely upon the discrete cosine transform (DCT) or an analogous transform plus quantization of transform coefficients to reduce the number of bits required to encode. DCT-based compression methods decompose a picture into macroblocks where each macroblock contains four 8x8 luminance blocks plus two 8x8 chrominance blocks, although other block sizes and transform variants could be used. Figure 2 depicts the functional blocks of DCT-based video encoding. In order to reduce the bit-rate, 8x8 DCT is used to convert the 8x8 blocks (luminance and chrominance) into the frequency domain. Then, the 8x8 blocks of DCT-coefficients are quantized, scanned into a 1-D sequence, and coded by using variable length coding (VLC). For predictive coding in which motion compensation (MC) is involved, inverse-quantization and IDCT are needed for the feedback loop. Except for MC, all the function blocks in Figure 2 operate on an 8x8 block basis. The rate-control unit in Figure 2 is responsible for generating
the quantization step (qp) in an allowed range and according to the target bit-rate and buffer-fullness to control the DCT-coefficients quantization unit. Indeed, a larger quantization step implies more vanishing and/or smaller quantized coefficients, which means fewer and/or shorter codewords and consequently smaller bit rates and files. There are two kinds of coded macroblocks. An INTRA-coded macroblock is coded independently of previous reference frames. In an INTER-coded macroblock, the motion-compensated prediction block from the previous reference frame is first generated for each block (of the current macroblock), then the prediction error block (i.e., the difference block between the current block and the prediction block) is encoded. For INTRA-coded macroblocks, the first (0,0) coefficient in an INTRA-coded 8x8 DCT block is called the DC coefficient, and the remaining 63 DCT-coefficients in the block are AC coefficients; while for INTER-coded macroblocks, all 64 DCT-coefficients of an INTER-coded 8x8 DCT block are treated as AC coefficients. The DC coefficients may be quantized with a fixed value of the quantization step, whereas the AC coefficients have quantization steps adjusted according to the bit rate control, which compares the bits used so far in the encoding of a picture to the allocated number of bits to be used. Further, a quantization matrix (e.g., as in MPEG-4) allows for varying quantization steps among the DCT coefficients. In particular, the 8x8 two-dimensional DCT is defined as:
F(u,v) = (1/4) C(u) C(v) Σ_{x=0..7} Σ_{y=0..7} f(x,y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]
where f(x,y) is the input 8x8 sample block and F(u,v) the output 8x8 transformed block, with u, v, x, y = 0, 1, ..., 7; and C(u), C(v) = 1/√2 for u, v = 0, and 1 otherwise.
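For reference, the definition above can be implemented directly in double precision. The following C sketch does so; the routine name and array layout are illustrative and not part of any standard.

#include <math.h>

/* Double-precision 8x8 forward DCT as defined above; a reference sketch,
 * not an optimized implementation.  f is the input sample block and F the
 * output transform-coefficient block. */
void dct8x8_double(const double f[8][8], double F[8][8])
{
    const double PI = 3.14159265358979323846;
    for (int u = 0; u < 8; u++) {
        for (int v = 0; v < 8; v++) {
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < 8; x++)
                for (int y = 0; y < 8; y++)
                    sum += f[x][y] * cos((2 * x + 1) * u * PI / 16.0)
                                   * cos((2 * y + 1) * v * PI / 16.0);
            F[u][v] = 0.25 * cu * cv * sum;   /* rounded to integer afterwards */
        }
    }
}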
The transform is performed in double precision, and the final transform coefficients are rounded to integer values. Next, define the quantization of the transform coefficients as
QF(u,v) = F(u,v) / QP
where QP is the quantization factor computed in double precision from the quantization step as:
QP = 2^(qp/6)
with the quantization step in the range qp = 0, 1, ..., 51. The quantized coefficients are rounded to integer values. Then the inverse quantization becomes:
F'(u,v) = QF(u,v) * QP
with double precision values rounded to integer values. And the inverse transformation (reconstructed sample block) is:
f'(x,y) = (1/4) Σ_{u=0..7} Σ_{v=0..7} C(u) C(v) F'(u,v) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]
again with double precision values rounded to integer values. The 32-bit AVS simplifies the double precision method by using integer transforms as follows. First, define an 8x8 integer transform matrix,
T_8x8, as:
  1   1   1   1   1   1   1   1
 10   9   6   2  -2  -6  -9 -10
  2   1  -1  -2  -2  -1   1   2
  9  -2 -10  -6   6  10   2  -9
  1  -1  -1   1   1  -1  -1   1
  6 -10   2   9  -9  -2  10  -6
  1  -2   2  -1  -1   2  -2   1
  2  -6   9 -10  10  -9   6  -2
Now, let f_8x8 and F_8x8 be the input 8x8 sample block and the output 8x8 transform-coefficients block, respectively. Thus the forward 8x8 integer transform is defined as:
F_8x8 = T_8x8 x f_8x8 x T^t_8x8
where "x" denotes 8x8 matrix multiplication, and the 8x8 matrix T^t_8x8 is the transpose of the 8x8 matrix T_8x8. The quantization of the transformed coefficients proceeds as follows. First, let F_8x8 = {F_ij : i,j = 0, 1, ..., 7} denote the 8x8 DCT-coefficients block, and let QF_8x8 = {QF_ij : i,j = 0, 1, ..., 7} denote the quantized DCT-coefficients block; then the integer quantization is defined as:
QF_ij = (F_ij * AVS_Q_tabs[qp%6][i*8+j] + α*2^(20+qp/6)) » (21 + qp/6),  i,j = 0, 1, ..., 7
where α has a value in the range 0.3 ~ 0.5 (and thus only rounds up from 0.75 ~ 0.85 to 1 rather than from 0.5 to 1), qp = 0, 1, 2, ..., 51 is the quantization step, and the tables AVS_Q_tabs are essentially six scaling matrices (one for each possible remainder of the integer division qp/6) and are defined as: int AVS_Q_tabs[6][64]={ {
98256, 13203, 62138, 13203, 98256, 13203, 62138, 13203, 13203, 1832, 8377, 1832, 13203, 1832, 8377, 1832, 62138, 8377, 39331, 8377, 62138, 8377, 39331, 8377, 13203, 1832, 8377, 1832, 13203, 1832, 8377, 1832, 98256, 13203, 62138, 13203, 98256, 13203, 62138, 13203, 13203, 1832, 8377, 1832, 13203, 1832, 8377, 1832, 62138, 8377, 39331, 8377, 62138, 8377, 39331, 8377, 13203, 1832, 8377, 1832, 13203, 1832, 8377, 1832,
}, {
87495, 11793, 55348, 11793, 87495, 11793, 55348, 11793, 11793, 1570, 7475, 1570,11793, 1570, 7475, 1570, 55348, 7475, 34975, 7475, 55348, 7475, 34975, 7475, 11793, 1570, 7475, 1570,11793, 1570, 7475, 1570, 87495, 11793, 55348, 11793, 87495, 11793, 55348, 11793, 11793, 1570, 7475, 1570,11793, 1570, 7475, 1570, 55348, 7475, 34975, 7475, 55348, 7475, 34975, 7475, 11793, 1570, 7475, 1570,11793, 1570, 7475, 1570,
},
{
77943, 10471, 49254, 10471, 77943, 10471, 49254, 10471, 10471, 1374, 6656, 1374, 10471, 1374, 6656, 1374, 49254, 6656,31213, 6656,49254, 6656,31213, 6656, 10471, 1374, 6656, 1374, 10471, 1374, 6656, 1374, 77943, 10471, 49254, 10471, 77943, 10471, 49254, 10471, 10471, 1374, 6656, 1374, 10471, 1374, 6656, 1374, 49254, 6656,31213, 6656,49254, 6656,31213, 6656, 10471, 1374, 6656, 1374, 10471, 1374, 6656, 1374,
},
{
69399, 9343, 43934, 9343, 69399, 9343, 43934, 9343, 9343, 1293, 5925, 1293, 9343, 1293, 5925, 1293, 43934, 5925,27745, 5925,43934, 5925,27745, 5925, 9343, 1293, 5925, 1293, 9343, 1293, 5925, 1293, 69399, 9343, 43934, 9343, 69399, 9343, 43934, 9343, 9343, 1293, 5925, 1293, 9343, 1293, 5925, 1293, 43934, 5925,27745, 5925,43934, 5925,27745, 5925, 9343, 1293, 5925, 1293, 9343, 1293, 5925, 1293,
},
{
61851, 8319,39131, 8319,61851, 8319,39131, 8319, 8319, 1099, 5281, 1099, 8319, 1099, 5281, 1099, 39131, 5281,24741, 5281,39131, 5281,24741, 5281, 8319, 1099, 5281, 1099, 8319, 1099, 5281, 1099, 61851, 8319,39131, 8319,61851, 8319,39131, 8319, 8319, 1099, 5281, 1099, 8319, 1099, 5281, 1099, 39131, 5281,24741, 5281,39131, 5281,24741, 5281, 8319, 1099, 5281, 1099, 8319, 1099, 5281, 1099, }, {
55098, 7406, 34862, 7406, 55098, 7406, 34862, 7406, 7406, 999, 4672, 999, 7406, 999, 4672, 999, 34862, 4672, 22048, 4672, 34862, 4672, 22048, 4672, 7406, 999, 4672, 999, 7406, 999, 4672, 999, 55098, 7406,34862, 7406,55098, 7406,34862, 7406, 7406, 999, 4672, 999, 7406, 999, 4672, 999, 34862, 4672,22048, 4672,34862, 4672,22048, 4672, 7406, 999, 4672, 999, 7406, 999, 4672, 999,
}
};
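To make the 32-bit AVS quantization above concrete, the following C sketch applies the formula directly to one block; the function name, the 64-bit intermediate product, and passing α as a double are assumptions of the sketch rather than details of the AVS reference implementation.

#include <stdint.h>

/* Sketch of the 32-bit AVS quantization described above.  AVS_Q_tabs is the
 * 6x64 table listed above; alpha is the round-off parameter (0.3 ~ 0.5) and
 * qp/6 is integer division.  An arithmetic right shift of negative values is
 * assumed; the sign-handling variant discussed in the text below takes the
 * absolute value of F_ij first. */
void avs_quantize_8x8(const int32_t F[8][8], int32_t QF[8][8],
                      int qp, double alpha, const int AVS_Q_tabs[6][64])
{
    int shift = 21 + qp / 6;
    int64_t offset = (int64_t)(alpha * (double)((int64_t)1 << (20 + qp / 6)));
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            QF[i][j] = (int32_t)(((int64_t)F[i][j] * AVS_Q_tabs[qp % 6][i * 8 + j]
                                   + offset) >> shift);
}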
Of course, the round-off could also be modified to round negative numbers down by first taking the absolute value of F_ij, next multiplying by the corresponding element of AVS_Q_tabs[qp%6], then rounding off, and lastly reinserting the sign of F_ij. Further, the round-off parameter, α, could be varied according to the type of block. Note that an increase of qp by 6 leaves AVS_Q_tabs[qp%6][i*8+j] invariant and reduces QF_ij by a factor of 2 due to the right shift. Compute the inverse quantization of QF_8x8 = {QF_ij : i,j = 0, 1, ..., 7}, the 8x8 quantized DCT-coefficients block, to give F'_8x8 = {F'_ij : i,j = 0, 1, ..., 7}, the inverse quantized DCT-coefficients block, by:
F'_ij = (QF_ij * AVS_IQ_tabs[qp%6][i*8+j]) « (qp/6),  i,j = 0, 1, ..., 7
where qp = 0, 1, 2, ..., 51 is the quantization step, qp/6 is integer division (no remainder), and the tables AVS_IQ_tabs are defined as: int AVS_IQ_tabs[6][64]={
{ 683, 92, 432, 92, 683, 92, 432, 92, 92, 12, 58, 12, 92, 12, 58, 12, 432, 58, 273, 58, 432, 58, 273, 58, 92, 12, 58, 12, 92, 12, 58, 12, 683, 92, 432, 92, 683, 92, 432, 92, 92, 12, 58, 12, 92, 12, 58, 12, 432, 58, 273, 58, 432, 58, 273, 58, 92, 12, 58, 12, 92, 12, 58, 12,
},
{ 767, 103, 485, 103, 767, 103, 485, 103, 103, 14, 65, 14, 103, 14, 65, 14, 485, 65, 307, 65, 485, 65, 307, 65, 103, 14, 65, 14, 103, 14, 65, 14, 767, 103, 485, 103, 767, 103, 485, 103, 103, 14, 65, 14, 103, 14, 65, 14, 485, 65, 307, 65, 485, 65, 307, 65, 103, 14, 65, 14, 103, 14, 65, 14,
}, { 861, 116, 545, 116, 861, 116, 545, 116, 116, 16, 73, 16, 116, 16, 73, 16, 545, 73, 344, 73, 545, 73, 344, 73, 116, 16, 73, 16, 116, 16, 73, 16, 861, 116, 545, 116, 861, 116, 545, 116, 116, 16, 73, 16, 116, 16, 73, 16, 545, 73, 344, 73, 545, 73, 344, 73, 116, 16, 73, 16, 116, 16, 73, 16, }, { 967, 130, 611, 130, 967, 130, 611, 130, 130, 17, 82, 17, 130, 17, 82, 17, 611, 82, 387, 82, 611, 82, 387, 82, 130, 17, 82, 17, 130, 17, 82, 17, 967, 130, 611, 130, 967, 130, 611, 130,
130, 17, 82, 17, 130, 17, 82, 17, 611, 82, 387, 82, 611, 82, 387, 82, 130, 17, 82, 17, 130, 17, 82, 17,
}, { 1085, 146, 686, 146, 1085, 146, 686, 146, 146, 20, 92, 20, 146, 20, 92, 20, 686, 92, 434, 92, 686, 92, 434, 92, 146, 20, 92, 20, 146, 20, 92, 20, 1085, 146, 686, 146, 1085, 146, 686, 146, 146, 20, 92, 20, 146, 20, 92, 20, 686, 92, 434, 92, 686, 92, 434, 92, 146, 20, 92, 20, 146, 20, 92, 20,
}, { 1218, 164, 770, 164, 1218, 164, 770, 164, 164, 22, 104, 22, 164, 22, 104, 22, 770, 104, 487, 104, 770, 104, 487, 104, 164, 22, 104, 22, 164, 22, 104, 22, 1218, 164, 770, 164, 1218, 164, 770, 164, 164, 22, 104, 22, 164, 22, 104, 22, 770, 104, 487, 104, 770, 104, 487, 104, 164, 22, 104, 22, 164, 22, 104, 22,
}
};
Lastly, compute the inverse transform: let f'_8x8 and F'_8x8 denote the output 8x8 reconstructed sample block and the input 8x8 inverse quantized DCT-coefficients block, respectively. Then the 8x8 integer inverse transform is defined as:
f'_8x8 = [T^t_8x8 x F'_8x8 x T_8x8] > 11
where "> n" means matrix right rounding shift by n; that is, for each matrix element add 2^(n-1) and then right shift by n (i.e., divide by 2^n and discard the remainder). The following table summarizes the data precision and bit-shifting of the 32-bit AVS method. The total table size used here is 8x8 (transform matrix) + 64x24 (quantization tables) + 64x12 (inverse-quantization tables) + 8x8 (inverse-transform matrix) = 2432 bytes.
Note that each entry in the "input data precision" column has two numbers: the first is the precision of the input data (e.g., 8-bit RGB or luminance pixel data increased to 9 bits from motion compensation), and the second number is either the precision of the 8x8 transform matrix (4 bit magnitude plus a sign bit) or the precision of the AVS_Q_tabs entries (positive integers of 17 bits) or the AVS_IQ_tabs entries (positive integers of 11 bits). However, even the 32-bit AVS method has high computational complexity.
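For later comparison with the preferred embodiment, the 32-bit AVS inverse quantization above can be sketched as follows; the function and parameter names are illustrative and are not taken from the AVS reference software.

#include <stdint.h>

/* Sketch of the 32-bit AVS inverse quantization described above: a per-
 * coefficient table lookup followed by a LEFT shift of qp/6 (integer
 * division).  AVS_IQ_tabs is the 6x64 table listed above.  The shift is
 * written as « (qp/6) to mirror the formula; for negative QF values a
 * multiplication by (1 << (qp/6)) is the strictly portable equivalent. */
void avs_inverse_quantize_8x8(const int32_t QF[8][8], int32_t Frec[8][8],
                              int qp, const int AVS_IQ_tabs[6][64])
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            Frec[i][j] = (QF[i][j] * AVS_IQ_tabs[qp % 6][i * 8 + j]) << (qp / 6);
}

The contrast to note is that this inverse quantization needs a per-coefficient 8x8 table for each qp%6 residue plus a left shift by qp/6, whereas the preferred embodiment below uses a single scalar IQTAB[qp] and an adaptive right shift SHIFT[qp].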
SUMMARY OF THE INVENTION The present invention provides low-complexity transformation plus inverse transformation and/or quantization plus inverse quantization with an inverse quantization using adaptive right shifting and small lookup tables. The preferred embodiment methods provide for 16-bit operations with close to 32-bit accuracy for 8x8 DCT-type transformation plus quantization and inverses which are useful in video coding with motion compensation.
BRIEF DESCRIPTION OF THE DRAWINGS Figures 1a-1b are flow diagrams. Figure 2 illustrates motion-compensated video compression with DCT transformation and quantization. Figure 3 shows method comparisons.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
1. Overview The preferred embodiment low-complexity methods provide both an 8x8 block transform/inverse transform and quantization/inverse quantization; the quantization has a range of quantization steps and the inverse quantization uses a shift which adapts to the quantization step. The quantization/inverse quantization provide high performance with low-complexity 16-bit arithmetic and small table size. The methods have application to video compression which operates on 8x8 blocks of (motion-compensated) pixels with DCT transformation and quantization of the DCT-coefficients where the quantization can vary widely. As illustrated in Figure 2, fullness feedback from the bitstream buffer may determine the quantization factor, which typically varies in the range from 1 to 200-500. Preferred embodiment systems perform preferred embodiment methods with digital signal processors (DSPs) or general purpose programmable processors or application specific circuitry or systems on a chip (SoC) such as both a DSP and RISC processor on the same chip with the RISC processor controlling. In particular, digital still cameras (DSCs) with video clip capabilities or cell phones with video capabilities could include the preferred embodiment methods. A stored program could be in an onboard ROM or external flash EEPROM for a DSP or programmable processor to perform the signal processing of the preferred embodiment methods. Analog-to-digital converters and digital-to-analog converters provide coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) provide coupling for transmission waveforms.
2. First preferred embodiment In order to reduce the transform and quantization complexity and avoid IDCT mismatch on the decoder side, the first preferred embodiments provide
both a 16-bit based integer transform/inverse transform and a quantization/inverse quantization with an adaptive shifting in the inverse quantization.
(a) forward transform
Define an integer 8x8-transform matrix, T_8x8, as:
 1448  1448  1448  1448  1448  1448  1448  1448
 2009  1703  1138   400  -400 -1138 -1703 -2009
 1892   784  -784 -1892 -1892  -784   784  1892
 1703  -400 -2009 -1138  1138  2009   400 -1703
 1448 -1448 -1448  1448  1448 -1448 -1448  1448
 1138 -2009   400  1703 -1703  -400  2009 -1138
  784 -1892  1892  -784  -784  1892 -1892   784
  400 -1138  1703 -2009  2009 -1703  1138  -400
Let f_8x8 and F_8x8 be the input 8x8-sample block and the output 8x8 DCT-coefficient block, respectively. The forward 8x8-integer transform is defined as:
F_8x8 = {[(T_8x8 x f_8x8) > 8] x T^t_8x8} > 16
again where "x" denotes 8x8 matrix multiplication, T^t_8x8 is the transposed matrix of T_8x8, and "> n" denotes matrix right rounding shift by n bits (n = 8 and 16 in this case), which is defined as follows. Let m_8x8 = {m_ij : i,j = 0, 1, ..., 7} be the resulting matrix after operating > n on the matrix M_8x8 = {M_ij : i,j = 0, 1, ..., 7}; then the matrix right rounding shift m_8x8 = M_8x8 > n is defined as:
m_ij = (M_ij + 2^(n-1)) » n for i,j = 0, 1, ..., 7.
Of course, the rounding could be in both positive and negative directions by taking the absolute value of M_ij, adding the 2^(n-1), shifting, and then applying the sign of M_ij. The forward integer transform (f_8x8 to F_8x8) performs the horizontal transform first (with matrix right rounding shift of 8 bits), followed by the vertical transform (with matrix right rounding shift of 16 bits). In contrast, the 32-bit AVS method performs both matrix multiplications with no rounding or shifting.
(b) quantization
The preferred embodiment 16-bit quantization of the 8x8 DCT-coefficient block F_8x8 to give the 8x8 quantized DCT-coefficient block QF_8x8 is defined as:
QF_ij = (F_ij * QTAB[qp] + α*2^15) » 16,  i,j = 0, 1, ..., 7
where α has a value in the range 0.3 ~ 0.5, qp = 0, 1, 2, ..., 51 is the quantization step, and the QTAB table is defined as:
Note that this quantization is a scalar multiplication, contrary to 32-bit AVS which has an 8x8 scaling matrix for each qp/6 residue. Also contrary to the periodicity of AVS_IQ_tabs[qp%6][i*8+j] with respect to qp, the preferred embodiment QTAB[qp] is not periodic in qp; in fact, it approximately decreases by a factor of 2 when qp is increased by 6. Of course, QTAB[qp] is a much smaller table than AVS_IQ_tabs[qp%6][i*8+j].
(c) inverse quantization
The preferred embodiment 16-bit inverse quantization of the 8x8 quantized DCT-coefficients block, QF_8x8 = {QF_ij : i,j = 0, 1, ..., 7}, to give the inverse quantized DCT-coefficients block, F'_8x8 = {F'_ij : i,j = 0, 1, ..., 7}, uses an adaptive shifting and is defined as:
F'_ij = (QF_ij * IQTAB[qp] + 2^(SHIFT[qp]-1)) » SHIFT[qp],  i,j = 0, 1, ..., 7
again where qp = 0, 1, 2, ..., 51 is the quantization step, and the IQTAB and SHIFT tables are defined as:
Note that IQTAB[qp] is a 16-bit positive integer (no sign bit) with a most significant bit (MSB) equal to 1 for all qp, and SHIFT[qp] is in the range from 7 to 15. Also, contrary to the left shift of the 32-bit AVS inverse quantization, the preferred embodiment inverse quantization shift is a right shift and may include round-off such as the term + 2^(SHIFT[qp]-1) of the preferred embodiment. The preferred embodiments use this right shifting which adapts to qp to maintain the highest fidelity under the constraints of the 16-bit arithmetic. Of course, the IQTAB table could be adjusted to accommodate a different range of qp values; in the foregoing, every increment/decrement of 6 in qp results in roughly a factor of 2 change in the quantization.
(d) inverse transform
Let f'_8x8 and F'_8x8 be the output 8x8 reconstructed sample block and the input 8x8 inverse quantized DCT-coefficients block, respectively. The 8x8 integer inverse transform is defined as:
f'_8x8 = {[(T^t_8x8 x F'_8x8) > 8] x T_8x8} > 16
The inverse integer transform performs the horizontal inverse transform first (with matrix right rounding shift of 8 bits), followed by the vertical inverse transform (with matrix right rounding shift of 16 bits).
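As a concrete sketch of parts (a)-(c), the following C code shows one possible organization of the 16-bit path; the function names, the extern declarations for QTAB, IQTAB and SHIFT (whose values are given in the tables referenced above and are not reproduced here), and the use of 32-bit accumulators for the 16x16-bit products are assumptions of this sketch, not requirements of the preferred embodiment.

#include <stdint.h>

extern const int16_t T8[8][8];      /* 8x8 transform matrix of part (a) */
extern const uint16_t QTAB[52];     /* quantization table of part (b) */
extern const uint16_t IQTAB[52];    /* inverse-quantization table of part (c) */
extern const uint8_t  SHIFT[52];    /* adaptive shift table of part (c) */

/* Matrix right rounding shift "> n": add 2^(n-1), then right shift by n
 * (an arithmetic shift of negative values is assumed). */
static int32_t rshift_round(int32_t v, int n)
{
    return (v + (1 << (n - 1))) >> n;
}

/* Forward transform of part (a): F = {[(T x f) > 8] x T^t} > 16. */
void fwd_transform_16bit(const int16_t f[8][8], int16_t F[8][8])
{
    int32_t tmp[8][8];
    for (int i = 0; i < 8; i++)        /* first stage: (T x f) > 8 */
        for (int j = 0; j < 8; j++) {
            int32_t s = 0;
            for (int k = 0; k < 8; k++)
                s += (int32_t)T8[i][k] * f[k][j];
            tmp[i][j] = rshift_round(s, 8);
        }
    for (int i = 0; i < 8; i++)        /* second stage: (tmp x T^t) > 16 */
        for (int j = 0; j < 8; j++) {
            int32_t s = 0;
            for (int k = 0; k < 8; k++)
                s += tmp[i][k] * (int32_t)T8[j][k];
            F[i][j] = (int16_t)rshift_round(s, 16);
        }
}

/* Quantization of part (b) and adaptive-shift inverse quantization of part
 * (c); alpha2 is the integer offset alpha * 2^15, precomputed by the caller. */
void quant_dequant_16bit(const int16_t F[8][8], int16_t QF[8][8],
                         int16_t Frec[8][8], int qp, int32_t alpha2)
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            QF[i][j]   = (int16_t)(((int32_t)F[i][j] * QTAB[qp] + alpha2) >> 16);
            Frec[i][j] = (int16_t)(((int32_t)QF[i][j] * IQTAB[qp]
                                     + (1 << (SHIFT[qp] - 1))) >> SHIFT[qp]);
        }
}

The inverse transform of part (d) follows the same two-stage pattern with T^t_8x8 applied first, so the same rounding-shift helper applies there as well.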
The following table summarizes the data precision and bit-shift used in the 16-bit preferred embodiment method. The total table size used here is 8x8x2 (transform) + 52x2 (quantization) + 52x3 (inverse-quantization) + 8x8x2 (inverse-transform) = 516 bytes.
Again, each entry in the "input data precision" column has two numbers: the first is the precision of the input data (8-bit RGB pixel data increased to 9 bits from motion compensation), and the second number is either the precision of the 8x8 transform matrix (11 bit magnitude plus a sign bit) or the precision of the QTAB or the IQTAB entries (positive integers of 16 bits). The inverse quantization with a lookup table for both a scalar, IQTAB, and a right shift, SHIFT, provides for lower complexity with good performance for 16-bit arithmetic.
4. Experimental results Performance comparison is done for the preferred embodiment 16-bit based transform and quantization versus both the double-precision method and the AVS 32-bit based method. Figure 3 illustrates the block diagram used for the performance comparison between the double-precision and 16-bit based transform and quantization. To compare the performance, the two methods are applied to the same 8x8 residual blocks. After transform, quantization, inverse-quantization
and inverse-transform, the PSNR values between the reconstructed blocks and the original residual blocks are computed for the two methods (PSNR_D(QP) and PSNR_16(QP) in Figure 3) separately. The double-precision based transform and quantization method was described in the background, while the 16-bit based preferred embodiment method uses the foregoing equations. The residual blocks are random data in the range of [-255, 255]. For each qp, 6000 random 8x8 blocks are used. The results are listed in the following table.
The foregoing table reveals that the preferred embodiment 16-bit transform and quantization method has almost the same performance as the double-precision method, except that for qp = 0 (QP = 1, lossless coding) there is a 0.46 dB loss. The performance comparison between the 16-bit based and the AVS 32-bit based transform and quantization was also conducted. The same comparison approach shown in Figure 3 (i.e., just replace the double-precision method in
Figure 3 with the 32-bit AVS method described by the equations in the background) is used. One exception is that the AVS uses QP = 2.5 * 2^(qp/6) as the quantization scale. The preferred embodiment 16-bit method is adjusted to the AVS QPs to ensure a fair comparison. The results are listed in the following table.
The foregoing table reveals that the preferred 16-bit method outperforms the 32-bit based AVS transform and quantization in terms of PSNR values over the typical operating bit-rates (qp = 16 - 28). At this stage, the rate-distortion comparison between those two is not clear because of the lack of 32-bit AVS reference software. It is expected that the preferred embodiment method has comparable coding efficiency to the AVS one, because the preferred embodiment 16-bit method is almost identical to the double-precision implementation in terms of coding efficiency, and the 32-bit AVS transform and
quantization was also derived from the double-precision 8x8 DCT and quantization method described in the background. In short, the preferred embodiment 16-bit based 8x8 transform and quantization method for enhanced video coding provides coding quality comparable to the double-precision implementation and to the 32-bit based transform and quantization used in the AVS standard, but requires a much smaller table size (516 bytes vs. 2432 bytes) and lower computational complexity. It completely avoids use of 32-bit memory accesses and multiplies.
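As an illustration of the comparison methodology of Figure 3 described above, the following C sketch pushes random residual blocks through two block-processing routines and accumulates PSNR; the routine names, the per-block PSNR averaging, and the peak value of 255 are assumptions of the sketch rather than details of the actual test software.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholders for the double-precision (or 32-bit AVS) path and the 16-bit
 * preferred embodiment path; each runs transform, quantization, inverse
 * quantization and inverse transform on one block. */
void process_block_double(const int16_t in[8][8], int16_t out[8][8], int qp);
void process_block_16bit(const int16_t in[8][8], int16_t out[8][8], int qp);

static double psnr_8x8(const int16_t a[8][8], const int16_t b[8][8])
{
    double mse = 0.0;
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            double d = (double)a[i][j] - b[i][j];
            mse += d * d;
        }
    mse /= 64.0;
    return 10.0 * log10(255.0 * 255.0 / (mse > 0.0 ? mse : 1e-12));
}

void compare_methods(int qp, int nblocks)    /* e.g. nblocks = 6000 */
{
    double sum_d = 0.0, sum_16 = 0.0;
    int16_t f[8][8], rd[8][8], r16[8][8];
    for (int n = 0; n < nblocks; n++) {
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 8; j++)
                f[i][j] = (int16_t)(rand() % 511 - 255);   /* [-255, 255] */
        process_block_double(f, rd, qp);
        process_block_16bit(f, r16, qp);
        sum_d  += psnr_8x8(f, rd);
        sum_16 += psnr_8x8(f, r16);
    }
    printf("qp=%d  PSNR_D=%.2f dB  PSNR_16=%.2f dB\n",
           qp, sum_d / nblocks, sum_16 / nblocks);
}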
3. Modifications The preferred embodiment methods can be modified in various ways while retaining one or more of the features of the 16-bit integer 8x8 transformation plus inverse transformation and/or small-table quantization plus inverse quantization with adaptive right shifting. That is, the transformation/inverse transformation and the quantization/inverse quantization can be used independently. For example, the scale from quantization step qp to quantization factor QP could be changed from QP = 2^(qp/6) to QP = 2^(qp/n) for another integer n with corresponding changes in the tables QTAB and IQTAB/SHIFT. Even scalings other than log-linear could be used. The transform/inverse transform matrices could be varied, and entries in the quantization/inverse quantization tables could be varied and even made adaptive to the type of 8x8 block being processed, and the round-off could also be made adaptive.