
WO1997002535A1 - Image processing circuit and method

Info

Publication number: WO1997002535A1
Application number: PCT/EP1996/002816
Authority: WO
Inventors: Patrick Clement; Fathy Fouad Yassa
Original assignee: Motorola Inc.
Priority date: 1995-07-01
Filing date: 1996-06-27
Publication date: 1997-01-23
Also published as: GB2303012B; GB2303012A; GB9513431D0; JPH10505445A; HK1013699A1; EP0799453A1

Abstract

An image processing circuit for processing DCT coefficients includes an input latch (20) for receiving a DCT coefficient and associated coordinates. A multiplying stage (30) receives an unsigned value of the DCT coefficient and multiplies it by a number of DCT functions to produce a number of products. A decoder (50) provides control signals based on the coordinates and the sign of the DCT coefficient. A multiplexer (60) selects a number of pairs of the products in dependence upon the control signals. An accumulator (80) is arranged to store partial picture element values and adds or subtracts each element of the pair to a selected one of the partial picture element values in dependence upon the control signals.

Description

IMAGE PROCESSING CIRCUIT AND METHOD

Field of the Invention

This invention relates to image processing and particularly but not exclusively to compression and decompression of video images.

Background of the Invention

Image compression and decompression techniques, which compress images in order to facilitate efficient transmission, are well known. A number of standards exist which define particular methods, for example the JPEG standard for still images and the MPEG1 and MPEG2 standards for moving images.

As part of the compression method, typically, Discrete Cosine Transform (DCT) functions are used to transform an image into a number of DCT coefficients. The image is typically sub-divided into a number of nominal blocks containing picture elements, and the functions are performed on the picture elements of each block. Corresponding inverse DCT (I-DCT) functions are used for subsequent decompression of the DCT coefficients to reconstruct the picture elements of the blocks and hence the image.

A dedicated processor which performs the decompression process typically handles the coefficients using butterfly computations or a row-column method to reconstruct the picture elements. These methods require a number of cascaded computations, each of which has specific combinations of operands, which are then multiplied by cosine functions. The first computation processes the coefficients together in first specific combinations to provide intermediate results. These are then subject to a second computation, according to a second specific combination, in order to provide further intermediate results, and so on. The intermediate results must be stored after each computation.

A problem with the above method is that the intermediate results must be re-ordered in order to achieve the different specific combinations required by each computation, giving rise to processing delay. Furthermore, the multiplication by the cosine functions requires the use of at least one full multiplier within the dedicated processor, which reduces the speed of the processing even further.

This invention seeks to provide an image processing circuit and method which mitigate the above mentioned disadvantages.

Summary of the Invention

According to a first aspect of the present invention there is provided an image processing circuit for processing transform function coefficients to generate an image, the circuit comprising: an input latch for receiving a transform function coefficient and an associated set of predetermined coordinates; a multiplying stage coupled to receive an unsigned value of the transform function coefficient and for multiplying the unsigned value by a plurality of transform functions to produce a plurality of products; a decoder coupled to receive the predetermined coordinates and the sign of the transform function coefficient from the latch, for providing control signals; a multiplexer coupled to receive the plurality of products, for selecting a number of pairs of products in dependence upon the control signals; and, an accumulator arranged to store partial picture element values and coupled to receive the number of pairs and with respect to each of the number, for adding or subtracting each element of the pair to a selected one of the partial picture element values in dependence upon the control signals.

Preferably the multiplying stage comprises a number of shifters, a number of adder/subtractors and a number of multiplexers in order to provide the plurality of products. The decoder preferably further comprises a plurality of cells, each cell providing a partial control signal.

According to a second aspect of the invention there is provided a method for processing a set of transform function coefficients to produce picture elements of an image, comprising the steps of: receiving a transform function coefficient and an associated set of predetermined coordinates; multiplying an unsigned value of the transform function coefficient by a plurality of transform functions to produce a plurality of products; decoding the predetermined coordinates and the sign of the transform function coefficient to provide control signals; selecting a number of pairs from the plurality of products in dependence upon the control signals; adding or subtracting each element of the pair to a selected one of the partial picture element values in dependence upon the control signals, whereby processing the last transform function coefficient of the set produces the picture elements from the partial picture element values.

Preferably the transform is an inverse discrete cosine transform. Alternatively the transform is preferably an inverse discrete sine transform.

In this way the processing delay associated with the prior art is substantially avoided.
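The method of the second aspect can be illustrated, for the inverse discrete cosine transform, by the following minimal Python sketch. It is a sketch under stated assumptions rather than the circuit itself: it uses floating-point cosines instead of shifted sums, assumes the conventional 2/N normalisation, and the name `accumulate_block` is hypothetical. It shows each coefficient being split into an unsigned value and a sign, with the sign deciding whether the products are added to or subtracted from the partial picture element values.

```python
import math

def accumulate_block(coefficients, N=8):
    """Add each coefficient's contribution to every picture element in turn.

    `coefficients` is an iterable of (u, v, value) triples; the returned N x N
    array holds the picture elements once the last triple has been processed.
    """
    acc = [[0.0] * N for _ in range(N)]   # partial picture element values
    for u, v, value in coefficients:
        magnitude, negative = abs(value), value < 0
        scale = 2.0 / N                   # assumed normalisation
        scale *= 1 / math.sqrt(2) if u == 0 else 1.0
        scale *= 1 / math.sqrt(2) if v == 0 else 1.0
        for j in range(N):
            for k in range(N):
                basis = (math.cos((2 * j + 1) * u * math.pi / (2 * N))
                         * math.cos((2 * k + 1) * v * math.pi / (2 * N)))
                contribution = scale * magnitude * basis
                # The sign of the coefficient selects addition or subtraction.
                acc[j][k] += -contribution if negative else contribution
    return acc

# A block containing only a DC coefficient reconstructs to a flat block.
flat_block = accumulate_block([(0, 0, 8.0)])
```

No intermediate results need to be re-ordered between coefficients; the only state carried from one coefficient to the next is the array of partial picture element values.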

Brief Description of the Drawings

An exemplary embodiment of the invention will now be described with reference to the drawings, in which:

FIG.1 shows a preferred embodiment of an image processing circuit in accordance with the invention.

FIG.2 shows a multiplying stage forming part of the circuit of FIG.1.

FIG.3 shows a decoder cell forming part of the circuit of FIG.1.

Detailed Description of a Preferred Embodiment

The inverse discrete cosine transform function (I-DCT) may be used to transform DCT coefficients into a two-dimensional image. Such coefficients use less space when stored or transmitted electronically. Typically the transform function is performed on a block of DCT coefficients, according to the following equation:-

$$f(j,k) = \frac{2}{N}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\,\cos\frac{(2j+1)u\pi}{2N}\,\cos\frac{(2k+1)v\pi}{2N}$$

Equation 1

where f(j,k) is a reconstructed picture element for each j and k,

N is the size of the block on which the I-DCT is performed,

u and v are the transform domain coordinates of the DCT coefficients,

j and k are the spatial coordinates of the picture elements,

F(u,v) is a DCT coefficient for each u and v, and

C(u) and C(v) are constants, being:

$$C(w) = \frac{1}{\sqrt{2}}\ \text{for}\ w = 0, \qquad C(w) = 1\ \text{otherwise}.$$
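As an illustration of Equation 1, the following minimal Python sketch evaluates the transform directly, one picture element at a time. It assumes the conventional normalisation, namely a 2/N prefactor and C(0) = 1/√2; the names `c` and `idct_direct` are hypothetical and form no part of the described circuit.

```python
import math

def c(w):
    """Normalisation constant C(w): 1/sqrt(2) for w == 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if w == 0 else 1.0

def idct_direct(F, N=8):
    """Evaluate Equation 1 directly for an N x N coefficient block F[u][v]."""
    f = [[0.0] * N for _ in range(N)]
    for j in range(N):
        for k in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * F[u][v]
                              * math.cos((2 * j + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * k + 1) * v * math.pi / (2 * N)))
            f[j][k] = 2.0 / N * total
    return f

# A block holding only a DC coefficient reconstructs to a flat block.
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 8.0
flat = idct_direct(dc_only)
```

Evaluated this way, each picture element costs N² multiplications by cosine values, which is the cost the circuit described below sets out to avoid.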

Taking N=8 as the block size, according to the MPEG2 standard, and using trigonometric manipulation, equation 1 becomes:-

$$f(j,k) = \frac{1}{8}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)\left[\cos\frac{\bigl((2j+1)u+(2k+1)v\bigr)\pi}{16} + \cos\frac{\bigl((2j+1)u-(2k+1)v\bigr)\pi}{16}\right]$$

Equation 2
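The trigonometric manipulation leading from Equation 1 to Equation 2 is the product-to-sum identity. Written out for N = 8, and assuming the 2/N prefactor in Equation 1, the step is:

```latex
\begin{aligned}
\cos A \cos B &= \tfrac{1}{2}\bigl[\cos(A+B) + \cos(A-B)\bigr],\\
A &= \frac{(2j+1)u\pi}{16}, \qquad B = \frac{(2k+1)v\pi}{16},\\
\frac{2}{8}\cos A \cos B &= \frac{1}{8}\left[\cos\frac{\bigl((2j+1)u+(2k+1)v\bigr)\pi}{16}
  + \cos\frac{\bigl((2j+1)u-(2k+1)v\bigr)\pi}{16}\right].
\end{aligned}
```

Each remaining cosine therefore has an argument that is an integer multiple of π/16.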

This may also be written as:-

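$$f(j,k) = \frac{1}{8}\sum_{u=0}^{7}\sum_{v=0}^{7}\left\{(-1)^{\mathrm{sf}}\left[(-1)^{\mathrm{sa}}\,\mathrm{za}\,\mathrm{muxa} + (-1)^{\mathrm{sb}}\,\mathrm{zb}\,\mathrm{muxb}\right]\right\}$$

Equation 3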

where sf, sa, sb, za, and zb are either 0 or 1, as functions of u, v, j and k,

and where

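$$\begin{aligned}
\mathrm{muxa} ={}& \overline{\mathrm{b2a}}\cdot\overline{\mathrm{b1a}}\cdot\Bigl(\overline{\mathrm{ip}}\cdot|F(u,v)| + \mathrm{ip}\cdot|F(u,v)|\cos\frac{\pi}{16}\Bigr)\\
&+ \overline{\mathrm{b2a}}\cdot\mathrm{b1a}\cdot\Bigl(\overline{\mathrm{ip}}\cdot|F(u,v)|\cos\frac{2\pi}{16} + \mathrm{ip}\cdot|F(u,v)|\cos\frac{3\pi}{16}\Bigr)\\
&+ \mathrm{b2a}\cdot\overline{\mathrm{b1a}}\cdot\Bigl(\overline{\mathrm{ip}}\cdot|F(u,v)|\cos\frac{4\pi}{16} + \mathrm{ip}\cdot|F(u,v)|\cos\frac{5\pi}{16}\Bigr)\\
&+ \mathrm{b2a}\cdot\mathrm{b1a}\cdot\Bigl(\overline{\mathrm{ip}}\cdot|F(u,v)|\cos\frac{6\pi}{16} + \mathrm{ip}\cdot|F(u,v)|\cos\frac{7\pi}{16}\Bigr)
\end{aligned}$$

Equation 4

$$\begin{aligned}
\mathrm{muxb} ={}& \overline{\mathrm{b2b}}\cdot\overline{\mathrm{b1b}}\cdot\Bigl(\overline{\mathrm{ip}}\cdot|F(u,v)| + \mathrm{ip}\cdot|F(u,v)|\cos\frac{\pi}{16}\Bigr)\\
&+ \overline{\mathrm{b2b}}\cdot\mathrm{b1b}\cdot\Bigl(\overline{\mathrm{ip}}\cdot|F(u,v)|\cos\frac{2\pi}{16} + \mathrm{ip}\cdot|F(u,v)|\cos\frac{3\pi}{16}\Bigr)\\
&+ \mathrm{b2b}\cdot\overline{\mathrm{b1b}}\cdot\Bigl(\overline{\mathrm{ip}}\cdot|F(u,v)|\cos\frac{4\pi}{16} + \mathrm{ip}\cdot|F(u,v)|\cos\frac{5\pi}{16}\Bigr)\\
&+ \mathrm{b2b}\cdot\mathrm{b1b}\cdot\Bigl(\overline{\mathrm{ip}}\cdot|F(u,v)|\cos\frac{6\pi}{16} + \mathrm{ip}\cdot|F(u,v)|\cos\frac{7\pi}{16}\Bigr)
\end{aligned}$$

Equation 5

where an overbar denotes the logical complement; b2a, b1a, b2b and b1b = 0 or 1, as functions of u, v, j and k; ip = 1 when u and v have opposite parities and ip = 0 when u and v have like parities; and |F(u,v)| represents the unsigned value of a DCT coefficient.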
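The selection expressed by equations 4 and 5 can be sketched in Python as follows. The sketch assumes that the two control bits and ip together form a three-bit index n = 4·b2 + 2·b1 + ip into the products |F(u,v)|·cos(nπ/16), with complemented control bits selecting the lower indices; that assignment of complements is an assumption, and the names `products` and `mux` are hypothetical.

```python
import math

def products(abs_f):
    """The eight candidate values |F|*cos(n*pi/16), n = 0..7 (n = 0 is |F|)."""
    return [abs_f * math.cos(n * math.pi / 16) for n in range(8)]

def mux(abs_f, b2, b1, ip):
    """Sum-of-products selection for one output (muxa or muxb)."""
    p = products(abs_f)
    nb2, nb1, nip = 1 - b2, 1 - b1, 1 - ip   # logical complements
    return (nb2 * nb1 * (nip * p[0] + ip * p[1])
            + nb2 * b1 * (nip * p[2] + ip * p[3])
            + b2 * nb1 * (nip * p[4] + ip * p[5])
            + b2 * b1 * (nip * p[6] + ip * p[7]))

# Exactly one of the eight products survives for any setting of the bits.
selected = mux(100.0, b2=1, b1=0, ip=1)      # picks 100*cos(5*pi/16)
```

Because exactly one of the four bit combinations is true and ip picks one value inside the corresponding pair, the expression behaves as an eight-way multiplexer rather than as an arithmetic sum.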

Therefore the I-DCT may be performed with the computation of just seven products:-

$|F(u,v)|\cos\frac{n\pi}{16}$ with n = 1 to 7.

F(u,v) is used in all seven cases and the cosine functions are fixed, therefore the seven multiplications may be replaced by a small number of additions and subtractions.

Expressing the cosine functions with binary weights and limiting to 6 bits, we have :-

$$|F(u,v)|\cos\frac{\pi}{16} = |F(u,v)| - \frac{|F(u,v)|}{64}$$

Equation 6

Equations 7 to 12 express $|F(u,v)|\cos\frac{n\pi}{16}$ for n = 2 to 7 in the same manner, each as a sum or difference of |F(u,v)| divided by powers of two. [The right-hand sides of Equations 7 to 12 are not legible in this copy.]

It will be appreciated that more or less than 6 bits may be used, depending on the accuracy required.
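The replacement of multiplications by shifts and additions can be illustrated by the following minimal Python sketch. It is not the circuit of FIG.2: it assumes that each cosine is rounded to six fractional bits and expanded as a plain binary sum, which may differ from the particular sums and differences used in Equations 6 to 12. The names `shifts_for` and `approx_product` are hypothetical.

```python
import math

BITS = 6  # number of fractional bits, as in the text

def shifts_for(n, bits=BITS):
    """Right-shift amounts whose terms sum to cos(n*pi/16) rounded to `bits` bits."""
    q = round(math.cos(n * math.pi / 16) * (1 << bits))
    # Bit position p of q (counted from the LSB) corresponds to a shift of bits - p.
    return [bits - p for p in range(bits, -1, -1) if (q >> p) & 1]

def approx_product(abs_f, n):
    """Approximate |F| * cos(n*pi/16) using only right shifts and additions."""
    return sum(abs_f >> s for s in shifts_for(n))

# Compare the shift-and-add result with the exact product for |F| = 1024.
for n in range(1, 8):
    exact = 1024 * math.cos(n * math.pi / 16)
    print(n, shifts_for(n), approx_product(1024, n), round(exact, 1))
```

For n = 1 the plain binary expansion needs six shifted terms, whereas the form |F(u,v)| - |F(u,v)|/2⁶ reaches the same value with a single subtraction, which is why a subtractor can be preferable to a chain of adders.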

Referring to FIG.1, there is shown an image processing circuit 10. A latch arrangement 20 of the circuit 10 comprises a first latch 22 coupled to receive first and second coordinate values via input terminals 14 and 16 respectively, and a second latch 24, coupled to receive transform function values via an input terminal 12.

A supermultiplier block 30 is coupled to receive the latched transform function values from the second latch 24, for providing a number of weighted output values for each transform function value |F(u,v)|.

Referring now also to FIG.2, the supermultiplier 30 is shown in greater detail. An input 100 of the supermultiplier 30 is coupled to receive the digital value |F(u,v)|. In digital processing, |F(u,v)| is provided in binary form. Divisions by powers of two are therefore easily performed by shifting the corresponding digital word to the right. Five shifters 102, 104, 106, 108 and 110 are coupled to the input terminal and are arranged to perform the divisions by powers of two. The seven required multiplications of equations 6-12 above may then be simply replaced by seven additions and two subtractions, which are performed in the supermultiplier 30 by seven adders 122, 124, 126, 128, 130, 142 and 144, and two subtractors 120 and 140.

A control input terminal 160 provides an ip value corresponding to the ip coefficient of equations 4 and 5. Four multiplexers 150, 152, 154 and 156 are coupled to receive output values from the adders and subtractors 120-144 and also to receive the ip value from the control input terminal 160.

The subtractor 120 is coupled to receive the |F(u,v)| value from the input terminal 100 and a shifted value of |F(u,v)| from the shifter 110, which corresponds to a division by 2⁶. The multiplexer 150 receives the |F(u,v)| value from the input terminal 100 and the calculated value from the subtractor 120. Under the control of the ip value, the multiplexer 150 is therefore able to provide the value |F(u,v)| and the value of equation 6, in the combination required by the expression in parentheses on the first lines of equations 4 and 5.

Similarly the multiplexers 152, 154 and 156 provide the appropriate combinations of derived |F(u,v)| values in order to achieve the remaining expressions in parentheses of equations 4 and 5.

A register 40 of FIG. 1 is coupled to receive the combinations of derived |F(u,v)| values from the supermultiplier 30, for providing the values under the control of a clocking signal.

A multiplexer block 60 is coupled to receive the clocked values from the register 40, and to receive control values to be further described below, for multiplexing the values in accordance with the expressions in parentheses with the control values in order to provide muxa and muxb values in accordance with equations 4 and 5. The control values are the calculated values b2b, b1b, b2a and b1a in accordance with equations 4 and 5.

An adder/subtractor block 70 is coupled to receive the muxa and muxb values from the multiplexer 60, and to receive control values from the control block 50, for performing the addition or subtraction of the muxa and the muxb calculations according to the expression in square parentheses of equation 3. The control values received by the adder/subtractor block 70 correspond to the values sf, sa, sb, za and zb.

A control block 50 provides control values to the supermultiplier 30, the multiplexer 60 and the adder/subtractor 70. The u and v values may be expressed as:-

u = u2 u1 u0 and v = v2 v1 v0, where u0 and v0 are the least significant bits of u and v.

Then the ip value may be calculated as:-

ip = u0 XOR v0 Equation 13
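Equation 13 can be checked with a short Python sketch. It assumes only the definitions given above; the names `parity_flag` and `angle_is_odd` are hypothetical. It shows that the single bit ip also gives the parity of the multiple of π/16 appearing in the cosine arguments, whatever the values of j and k.

```python
def parity_flag(u, v):
    """ip of Equation 13: XOR of the least significant bits of u and v."""
    return (u & 1) ^ (v & 1)

def angle_is_odd(u, v, j, k):
    """Parity of (2j+1)u + (2k+1)v, the multiple of pi/16 in the sum-angle term."""
    return ((2 * j + 1) * u + (2 * k + 1) * v) & 1

# ip does not depend on j or k, so it can be derived once per coefficient.
same_everywhere = all(angle_is_odd(u, v, j, k) == parity_flag(u, v)
                      for u in range(8) for v in range(8)
                      for j in range(8) for k in range(8))
```

This is why ip can select between the even and odd multiples of π/16 inside each pair of equations 4 and 5 before the spatial coordinates are considered.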

A parity and sign register 58 of the control block 50 is coupled to receive the u and v values from the u and v latch 22, and a value from the latch 24 in order to derive the ip values according to equation 13, to be used by the supermultiplier 30.

A cosine decoder 52 is also coupled to receive the u and v values from the latch 22, for providing the control values for the multiplexer 60 and the adder/subtractor 70. The cosine decoder 52 comprises a number of cells 54, to be further described below. The control values for the multiplexer 60 consist of calculated values b2b, b1b, b2a and b1a in accordance with equations 4 and 5. The control values for the adder/subtractor 70 consist of the values sf, sa, sb, za and zb.

The control values b2b, b1b, b2a and b1a are related to C(u), C(v) and to the expressions (2j+1)u + (2k+1)v and (2j+1)u - (2k+1)v. As the cosine function is periodic with a period of 2π, and the angles are multiples of π/16, the above expressions have only to be computed modulo 32.

The cosine decoder 52 thus performs the following computations:-

(2j+1)u_c + (2k+1)v_c for the range j=0 to 7 and k=0 to 7 Equation 14

(2j+1)u_c - (2k+1)v_c for the range j=0 to 7 and k=0 to 7 Equation 15

where u_c = u for u ≠ 0, and u_c = 4 for u = 0, and v_c = v for v ≠ 0, and v_c = 4 for v = 0.

The substitutions of u_c and v_c for u and v are to take account of the effect of C(u) and C(v).

Due to trigonometric properties, it is necessary only to perform the calculations of equations 14 and 15 in the ranges j = 0, 1, 2 and 3 and k = 0, 1, 2 and 3, which results in 32 expressions :-

u_c + v_c      3u_c + v_c      5u_c + v_c      7u_c + v_c
u_c + 3v_c     3u_c + 3v_c     5u_c + 3v_c     7u_c + 3v_c
u_c + 5v_c     3u_c + 5v_c     5u_c + 5v_c     7u_c + 5v_c
u_c + 7v_c     3u_c + 7v_c     5u_c + 7v_c     7u_c + 7v_c

u_c - v_c      3u_c - v_c      5u_c - v_c      7u_c - v_c
u_c - 3v_c     3u_c - 3v_c     5u_c - 3v_c     7u_c - 3v_c
u_c - 5v_c     3u_c - 5v_c     5u_c - 5v_c     7u_c - 5v_c
u_c - 7v_c     3u_c - 7v_c     5u_c - 7v_c     7u_c - 7v_c
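The 32 expressions listed above can be generated with the following minimal Python sketch, which applies the substitution of equations 14 and 15 and reduces each result modulo 32. The names `substituted` and `angle_indices` are hypothetical.

```python
def substituted(w):
    """u_c or v_c of Equations 14 and 15: 4 replaces 0, other values are unchanged."""
    return 4 if w == 0 else w

def angle_indices(u, v):
    """The 32 values of Equations 14 and 15 for j, k = 0..3, reduced modulo 32."""
    uc, vc = substituted(u), substituted(v)
    # j varies fastest, matching the row-by-row order of the list in the text.
    sums = [((2 * j + 1) * uc + (2 * k + 1) * vc) % 32
            for k in range(4) for j in range(4)]
    diffs = [((2 * j + 1) * uc - (2 * k + 1) * vc) % 32
             for k in range(4) for j in range(4)]
    return sums, diffs

sums, diffs = angle_indices(1, 2)
```

Each returned value fits in five bits, which is the word width handled by a decoder cell.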

Referring now also to FIG.3, a cell 54 of the cosine decoder 52 is shown in more detail. Each cell evaluates one of the 32 expressions above. Input terminals 202 and 204 are coupled to receive the u and v values from the latch 22. An adder 210 calculates the expression, the result being a 5 bit word. Using the terminology expressed above, the five bit word contains bits b4, b3, b2, b1 and b0, where b0 is the least significant bit.

A two's complement block 220 is coupled to receive the 5 bit word, and calculates the two's complement of that word, which will be referred to as the word composed of bits b4c, b3c, b2c, b1c and b0c.

A multiplexer 230 is coupled to receive the two's complemented word, and the 5 bit word from the adder 210, for providing the following products at outputs 254 and 256 respectively:-

$(\mathrm{b2c}\cdot\mathrm{b3}) + (\mathrm{b2}\cdot\overline{\mathrm{b3}})$ Equation 16

$(\mathrm{b1c}\cdot\mathrm{b3}) + (\mathrm{b1}\cdot\overline{\mathrm{b3}})$ Equation 17

Logic gates 240 and 242 are arranged to receive the two's complemented word, the 5 bit word from the adder 210, and a further input from input terminal 244, for providing the following products at outputs 250 and 252 respectively:-

sa = b4 XOR b3 Equation 18

za = b3 · b3c Equation 19

In this case input terminal 244 receives a zero value. Other cells in the cosine decoder 52 are arranged to receive values appropriate for the calculations of that cell, and to provide the other values sb, zb.
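The folding performed by a decoder cell can be sketched in Python as follows. The sketch rests on two readings that are assumptions: that equations 16 and 17 pass the complemented bits when b3 is set and the uncomplemented bits otherwise, and that the product b3 · b3c marks the case in which the cosine is zero. How the circuit applies that bit in equation 3 is not modelled, and the name `decoder_cell` is hypothetical.

```python
import math

def decoder_cell(m):
    """Fold a 5-bit angle index m (angle = m*pi/16) into a first-quadrant index.

    Returns (index, sign, zero) such that cos(m*pi/16) equals
    0 if zero else (-1)**sign * cos(index*pi/16).
    """
    m &= 0x1F                          # 5-bit word b4..b0 from the adder
    mc = (-m) & 0x1F                   # two's complement word b4c..b0c
    b4, b3 = (m >> 4) & 1, (m >> 3) & 1
    b3c = (mc >> 3) & 1
    index = (mc if b3 else m) & 0x7    # Equations 16 and 17, plus the shared LSB
    sign = b4 ^ b3                     # Equation 18
    zero = b3 & b3c                    # product b3 . b3c of Equation 19
    return index, sign, zero

# Check the folding against the cosine itself for every 5-bit word.
max_error = 0.0
for m in range(32):
    index, sign, zero = decoder_cell(m)
    folded = 0.0 if zero else (-1) ** sign * math.cos(index * math.pi / 16)
    max_error = max(max_error, abs(folded - math.cos(m * math.pi / 16)))
```

The least significant bit is unchanged by two's complementation, so it can be taken from either word; this is consistent with ip being derived separately from u0 and v0.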

A cosine register 56 is arranged to receive and hold the control values output from the cosine decoder 52, for providing the control values to the multiplexer 60 and the adder/subtractor 70 under the control of the clock signal.

An adder/accumulator 80 receives the muxa, muxb values according to the expressions in square parentheses of equation 3 from the adder/subtractor 70. The values output from the adder/subtractor 70 are required to be added and accumulated over the range u = 0 to 7 and v = 0 to 7.

However, this would require the adder/accumulator 80 to be very large. In order to reduce the number of adder stages within the adder/accumulator 80, four partial sums are performed instead:-

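$$f_a(j,k) = \frac{1}{8}\sum_{u\ \mathrm{even}}\ \sum_{v\ \mathrm{even}}\left\{(-1)^{\mathrm{sf}}\left[(-1)^{\mathrm{sa}}\,\mathrm{za}\,\mathrm{muxa} + (-1)^{\mathrm{sb}}\,\mathrm{zb}\,\mathrm{muxb}\right]\right\}$$

Equation 20

$$f_b(j,k) = \frac{1}{8}\sum_{u\ \mathrm{even}}\ \sum_{v\ \mathrm{odd}}\left\{(-1)^{\mathrm{sf}}\left[(-1)^{\mathrm{sa}}\,\mathrm{za}\,\mathrm{muxa} + (-1)^{\mathrm{sb}}\,\mathrm{zb}\,\mathrm{muxb}\right]\right\}$$

Equation 21

$$f_c(j,k) = \frac{1}{8}\sum_{u\ \mathrm{odd}}\ \sum_{v\ \mathrm{even}}\left\{(-1)^{\mathrm{sf}}\left[(-1)^{\mathrm{sa}}\,\mathrm{za}\,\mathrm{muxa} + (-1)^{\mathrm{sb}}\,\mathrm{zb}\,\mathrm{muxb}\right]\right\}$$

Equation 22

$$f_d(j,k) = \frac{1}{8}\sum_{u\ \mathrm{odd}}\ \sum_{v\ \mathrm{odd}}\left\{(-1)^{\mathrm{sf}}\left[(-1)^{\mathrm{sa}}\,\mathrm{za}\,\mathrm{muxa} + (-1)^{\mathrm{sb}}\,\mathrm{zb}\,\mathrm{muxb}\right]\right\}$$

Equation 23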

These partial calculations are completed once all 64 DCT coefficients, F(u,v) of one block have been processed, the partial results being the final values stored in the elements of the adder/accumulator 80. Then the partial results are output to an output stage 90.

The output stage 90 comprises a partial sums latch 92 and an output adder 96. The partial sums latch latches the partial results received from the adder/accumulator 80, at which point the adder/accumulator 80 may be reset, ready to receive the muxa, muxb values from the next block to be processed.

The output adder 96 is arranged to combine the partial sums stored in the latch 92, according to their j and k coordinates as follows:-

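f(j,k) = f_a(j,k) + f_b(j,k) + f_c(j,k) + f_d(j,k) for j=0 to 3 and k=0 to 3

f(j,k) = f_a(j,7-k) - f_b(j,7-k) + f_c(j,7-k) - f_d(j,7-k) for j=0 to 3 and k=4 to 7

f(j,k) = f_a(7-j,k) + f_b(7-j,k) - f_c(7-j,k) - f_d(7-j,k) for j=4 to 7 and k=0 to 3

f(j,k) = f_a(7-j,7-k) - f_b(7-j,7-k) - f_c(7-j,7-k) + f_d(7-j,7-k) for j=4 to 7 and k=4 to 7

The partial sums over odd v change sign when k is replaced by 7-k, and the partial sums over odd u change sign when j is replaced by 7-j, because cos((2(7-k)+1)vπ/16) = (-1)^v cos((2k+1)vπ/16), and likewise for j and u.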

In this way the DCT coefficients of the block are processed according to an inverse transform method, to reconstruct the picture elements of the block.
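The recombination of the four partial sums can be checked numerically with the following minimal Python sketch. It works from the product form of the transform with floating-point cosines, assumes the 2/N normalisation with N = 8, and applies to the mirrored positions the signs that follow from cos((2(7-k)+1)vπ/16) = (-1)^v cos((2k+1)vπ/16). The names `term`, `partial_sums` and `combine` are hypothetical.

```python
import math

def term(F, j, k, u, v):
    """One summand of the 8 x 8 inverse transform in product form."""
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    cv = 1 / math.sqrt(2) if v == 0 else 1.0
    return (0.25 * cu * cv * F[u][v]
            * math.cos((2 * j + 1) * u * math.pi / 16)
            * math.cos((2 * k + 1) * v * math.pi / 16))

def partial_sums(F):
    """Four parity-split sums over the quadrant j, k = 0..3, keyed 'a'..'d'."""
    classes = {'a': (0, 0), 'b': (0, 1), 'c': (1, 0), 'd': (1, 1)}  # (u, v) parity
    return {name: [[sum(term(F, j, k, u, v)
                        for u in range(pu, 8, 2) for v in range(pv, 8, 2))
                    for k in range(4)] for j in range(4)]
            for name, (pu, pv) in classes.items()}

def combine(p):
    """Rebuild all 64 picture elements from the 4 x 16 stored partial sums."""
    f = [[0.0] * 8 for _ in range(8)]
    for j in range(8):
        for k in range(8):
            jj, su = (j, 1) if j < 4 else (7 - j, -1)   # odd-u sums flip sign
            kk, sv = (k, 1) if k < 4 else (7 - k, -1)   # odd-v sums flip sign
            f[j][k] = (p['a'][jj][kk] + sv * p['b'][jj][kk]
                       + su * p['c'][jj][kk] + su * sv * p['d'][jj][kk])
    return f
```

Only 64 accumulated values (four sums for each of 16 quadrant positions) are held, yet all 64 picture elements are recovered by the final additions and subtractions.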

The clock signal controls the clocking of the first and second latches 22 and 24, the register 40 and the adder/accumulator 80. The provision of registers and latches in the circuit 10 allows for pipelined operations within the circuit 10, which increases the speed and efficiency of the image processing.

It will be appreciated by a person skilled in the art that alternative embodiments to the one described above are possible. For example, the circuit is not limited to inverse discrete cosine transform functions. It is envisaged that the circuit 10 could be simply adapted to process images according to other functions, such as the inverse discrete sine transform function.

Furthermore, the supermultiplier 30 may be arranged to perform substantially the same function, but using different elements, such as the use of fewer addition and subtraction blocks, in exchange for more mux blocks.

Finally, it will be appreciated that this method may be applied to block sizes other than N=8, assuming that N remains a power of 2.

Claims

1. An image processing circuit, for processing transform function coefficients to generate an image, the circuit comprising: an input latch for receiving a transform function coefficient and an associated set of predetermined coordinates; a multiplying stage coupled to receive an unsigned value of the transform function coefficient and for multiplying the unsigned value by a plurality of transform functions to produce a plurality of products; a decoder coupled to receive the predetermined coordinates and the sign of the transform function coefficient from the latch, for providing control signals; a multiplexer coupled to receive the plurality of products, for selecting a number of pairs of products in dependence upon the control signals; and, an accumulator arranged to store partial picture element values and coupled to receive the number of pairs and with respect to each of the number, for adding or subtracting each element of the pair to a selected one of the partial picture element values in dependence upon the control signals.

2. A circuit as claimed in claim 1 wherein the multiplying stage comprises a number of shifters, a number of adder/subtractors and a number of multiplexers in order to provide the plurality of products.

3. A circuit as claimed in claim 1 or claim 2 wherein the decoder further comprises a plurality of cells, each cell providing a partial control signal.

4. A circuit as claimed in any preceding claim wherein the transform function is an inverse discrete sine transform function.

5. A circuit as claimed in any of the claims 1 to 3 wherein the transform function is an inverse discrete cosine transform function.

6. A method for processing a set of transform function coefficients to produce picture elements of an image, comprising the steps of: receiving a transform function coefficient and an associated set of predetermined coordinates; multiplying an unsigned value of the transform function coefficient by a plurality of transform functions to produce a plurality of products; decoding the predetermined coordinates and the sign of the transform function coefficient to provide control signals; selecting a number of pairs from the plurality of products in dependence upon the control signals; adding or subtracting each element of the pair to a selected one of the partial picture element values in dependence upon the control signals, whereby processing the last transform function coefficient of the set produces the picture elements from the partial picture element values.

7. A method as claimed in claim 6 wherein the transform function is an inverse discrete cosine transform function.

8. A method as claimed in claim 6 wherein the transform function is an inverse discrete sine transform function.

9. An image processing circuit substantially as hereinbefore described and with reference to the drawings.

10. An image processing method substantially as hereinbefore described and with reference to the drawings.