EP1018082A1 - Variable block size 2-dimensional inverse discrete cosine transform engine - Google Patents

Variable block size 2-dimensional inverse discrete cosine transform engine

Info

Publication number
EP1018082A1
EP1018082A1 EP98942197A EP98942197A EP1018082A1 EP 1018082 A1 EP1018082 A1 EP 1018082A1 EP 98942197 A EP98942197 A EP 98942197A EP 98942197 A EP98942197 A EP 98942197A EP 1018082 A1 EP1018082 A1 EP 1018082A1
Authority
EP
European Patent Office
Prior art keywords
idct
butterfly
stages
serial
transform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP98942197A
Other languages
German (de)
French (fr)
Inventor
Kenneth D. Easton
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of EP1018082A1 publication Critical patent/EP1018082A1/en
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform

Definitions

  • the present invention relates to digital signal processing. More particularly, the present invention relates to a novel and improved variable block size 2-dimensional (2-D) inverse discrete cosine transform (IDCT) engine.
  • 2-D 2-dimensional inverse discrete cosine transform
  • the 2-dimensional discrete cosine transform (IDCT) and inverse discrete cosine transform (IDCT) are important signal processing operations in digital image compression.
  • One such digital image compression application is in the area of high definition television (HDTV).
  • HDTV high definition television
  • ADC analog-to- digital converter
  • the resultant sampled data is then digitally processed to minimize the amount of data which must be transmitted and/or stored while retaining high picture quality.
  • a key element of the compression process is the 2-D discrete cosine transform wherein an NxN block of sampled data, or an image, is transformed from the time domain to the frequency domain.
  • the transformed data can be further processed with block codes, such as Huffman code, run length codes, and /or error correcting codes, such as convolutional codes and Reed-Solomon codes.
  • block codes such as Huffman code, run length codes, and /or error correcting codes, such as convolutional codes and Reed-Solomon codes.
  • An exemplary HDTV image compression scheme is disclosed in U.S. Patent No. 5,452,104, U.S. Patent No. 5,107,345, and U.S. Patent No. 5,021,891, all three entitled “ADAPTIVE BLOCK SIZE IMAGE COMPRESSION METHOD AND SYSTEM", and U.S. Patent No. 5,576,767, entitled • TNTERFRAME VIDEO ENCODING AND DECODING SYSTEM", all four patents are assigned to the assignee of the present invention and incorporated by reference herein.
  • the digitally encoded video waveform is transmitted and/or stored.
  • the reverse of the digital signal processing is performed to reconstruct the pixels of the original image.
  • the recovered image is provided to a digital-to-analog converter (DAC) which converts the reconstructed image back to an analog video waveform which can be displayed on a monitor or television.
  • DAC digital-to-analog converter
  • An important element in the decoding process is the inverse discrete cosine transform which transforms the frequency domain data back to the time domain.
  • the IDCT engine is required to run at a high output rate to reconstruct the original image at real time. Furthermore, since the IDCT engine is typically located within a consumer product, cost is a major consideration. The IDCT engine needs to be designed to operate at high speed with minimal complexity.
  • the digital image compression system typically processes the video signal on a frame-by-frame basis. Each video frame is further partitioned into NxN blocks. In most compression systems, the block size is fixed by the system design to simplify the implementation of the DCT and the IDCT engines.
  • variable block sizes can enhance the performance of the compression system under certain conditions to allow for optimal compression of the image and/or to improve the quality of the reconstructed image.
  • Variable block sizes can be used to take advantages of certain characteristics of the image.
  • variable block size DCT and IDCT engines are designed with banks of transform processors of various sizes. Each processor computes a different block size transform on the same data block. The transforms from the various processors are then combined into the desired composite transformed block. This approach can be unwieldy because of the large amount of required hardware and the complexity in coordinating the various hardware blocks.
  • the present invention is a novel and improved variable block size 2- dimensional (2-D) inverse discrete cosine transform (IDCT) engine.
  • the NxN data block is transformed by columns by a first 1-D IDCT processor.
  • the intermediate results from the first IDCT processor are temporarily stored in a transposition memory. Once all the columns have been processed, the intermediate results are transformed by rows by a second 1-D IDCT processor.
  • the output from the 2nd IDCT processor comprises the transformed output of the IDCT engine.
  • each data block is either a 16x16 transform or a mix of any combinations of 8x8 transforms, 4x4 transforms, and/or 2x2 transforms.
  • a 21-bit control signal precisely describes the desired partition and informs the IDCT engine to compute the proper combination of transforms.
  • different combinations of transforms can be easily performed by correctly ordering the input data, selectively combining the data before the butterfly stages, and controlling the additions and multiplications at each stage of butterfly. The unnecessary butterflies are placed in the bypass mode.
  • Serial adders and bit-serial multipliers greatly simplify the design since the computation is performed on only one bit of data at a time.
  • Serial computations also greatly simplify the routing crossbars between successive stages of butterflies. Because of the pipelined structure of the IDCT engine of the present invention, the throughput rate is maintained at the rate of one transformed point or pixel per clock cycle. This is the same throughput rate as with parallel computations. Only the processing delay is increased because of the serial nature of the computations.
  • the data block is first transformed by columns by the 1-D IDCT processor and the intermediate results are temporarily stored by columns in a transposition memory.
  • the second 1-D transform is performed only after all the columns have been transformed.
  • the intermediate results are concurrently written to memory by columns and read from memory by rows.
  • the memory is transposed, or alternates between column major and row major, over successive NxN blocks.
  • the intermediate result is read from a memory location and a new result is written to the same memory location within the same clock cycle.
  • the transposition memory reduces the memory requirement to one bank of memory of the same size as one NxN data block.
  • FIG. 1A, IB, and 1C are diagrams of an exemplary NxN image, a partitioned image, and a tree diagram corresponding to the partitioned image of the present invention, respectively;
  • FIG. 2 is a block diagram of an exemplary variable block-size 2-D IDCT engine of the present invention
  • FIGS. 3A-D are exemplary diagrams of a 2-point IDCT trellis, a 4-point IDCT trellis, an 8-point IDCT trellis, and a 16-point IDCT trellis of the present invention, respectively;
  • FIG. 4 is a block diagram of an exemplary 1-D IDCT processor of the present invention.
  • FIG. 5A and 5B are graphical diagram of a serial butterfly and block diagram of an exemplary implementation of the serial butterfly of the present invention, respectively;
  • FIG. 6A and 6B are block diagrams of an exemplary bit-serial multiplier of the present invention in word-wide representation and bit- wide representation, respectively;
  • FIG. 7 is a block diagram of an exemplary serial adder of the present invention.
  • FIG. 8 is a block diagram of an exemplary I/O buffer of the present invention.
  • Discrete cosine transform DCT
  • IDCT inverse discrete cosine transform
  • DCT transform is typically performed on the sampled data as one of a series of digital signal processing operations. Other operations are performed on the transformed data, including quantization, data compression, and error correcting coding.
  • the IDCT transforms the data from the frequency domain back into the time domain according to the following equation:
  • the DCT and IDCT transforms are separable transforms. This means that a 2-D transform can be broken down into two 1-D transforms.
  • a 2-D IDCT transform can be performed on a data block by first performing a 1-D IDCT transform on the columns of the data block. The intermediate results from the first IDCT transform are temporarily stored in a memory element. A second IDCT transform is then performed on the rows of intermediate results. The output from the second IDCT transform comprises the reconstructed pixels of the original image.
  • FIG. 1A illustrates a diagram of an exemplary data block.
  • N is equal to 16 although the present invention can be easily extended to other values of N.
  • FIG. 2 illustrates an exemplary block diagram of the 2-D IDCT engine 10 of the present invention.
  • the input data block comprising of IDCT coefficients
  • IDCT processors 20a and 20b are identical 1-D IDCT processors which perform the IDCT transform on the input data according to equation (2).
  • the intermediate results from IDCT processor 20a are provided to memory element 22 where they are temporarily stored by columns.
  • the intermediate results are then provided by rows to IDCT processor 20b.
  • IDCT processor 20b performs the 1-D IDCT transform and provides the transformed output, or the reconstructed image, to the subsequent digital signal processing block (not shown in FIG. 2).
  • the input data block is provided to IDCT processor 20a by columns and the intermediate results are provided to IDCT processor 20b by rows.
  • the data block can be provided to IDCT processor 20a by rows and to IDCT processor 20b by columns.
  • IDCT processors 20a and 20b are pipelined such that both IDCT processors 20 are active at the same time.
  • An important property of the IDCT is that a larger transform can be created by arranging the input data points, computing the sum on selective combinations of data points, and performing serial butterfly on the output of two smaller transforms. Serial butterfly is an operation which is described in detail below.
  • a 16-point IDCT is a butterfly of two 8-point IDCTs
  • each 8-point IDCT is a butterfly of two 4-point IDCTs
  • each 4-point IDCT is a butterfly of two 2-point IDCTs.
  • This property of the IDCT is well known in the art and is best illustrated by a trellis diagram.
  • a trellis diagram of a 2- point IDCT is shown in FIG. 3A
  • a 4-point IDCT is shown in FIG. 3B
  • an 8-point IDCT is shown in FIG. 3C.
  • the 2-point IDCT comprises a single butterfly stage. As shown in FIGS.
  • the 4-point IDCT comprises a stage of two 2-point IDCTs, a stage of serial add before the 2-point IDCT stage, and a butterfly stage after the 2-point IDCT stage.
  • the 8-point IDCT comprises a stage of two 4-point IDCTs, a stage of serial add before the 4-point IDCT stage, and a butterfly stage after the 4-point IDCT stage.
  • FIG. 3D The diagram of the 16-point IDCT trellis 100 of the present invention is illustrated in FIG. 3D.
  • Trellis 100 is derived by B.G. Lee and described in detail in a book by K.R. Rao, entitled “Discrete Cosine Transform : Algorithms, Advantages, and Applications", Academic Press, 1990.
  • the 16- point IDCT trellis 100 comprises three stages of serial add and four stages of serial butterfly, with each stage of butterfly comprising eight serial butterflies.
  • the interconnects between successive stages are fixed, thus limiting the IDCT processor to performing only 16-point IDCT transforms.
  • the stages are interconnected using reconfigurable trellis cross-connects.
  • 16-point IDCT 110 is a butterfly of two 8-point IDCTs 108, with the data points to the lower 8-point IDCT selectively combined.
  • Two 8-point IDCTs 108 is a butterfly of four 4-point IDCTs 106, with the data points to the lower 4-point IDCTs selectively combined.
  • Four 4-point IDCTs 106 is a butterfly of eight 2-point IDCTs 104, with the data points to the lower 2-point IDCTs selectively combined.
  • the reconfigurable trellis cross-connects in combination with the bypass mode of the serial butterflies allow the 2-D IDCT engine 10 of the present invention to compute any arbitrary mix of transforms within an NxN block.
  • IDCT processor 20 can perform two 8-point transforms, eight 2-point transforms, one 8-point transform and two 4-point transforms, or one 8-point, one 4-point and two 2-point transforms.
  • IDCT processor 20 can perform two 8-point transforms, eight 2-point transforms, one 8-point transform and two 4-point transforms, or one 8-point, one 4-point and two 2-point transforms.
  • the serial butterflies operate on two input bit streams and provides two output bit streams.
  • the serial butterfly comprises one greatly simplified bit-serial multiplier and two serial adders.
  • the serial structure of IDCT processors 20 permits the routing crossbars between successive stages of serial butterfly to be implemented using only 1- bit wide data buses.
  • IDCT engine 10 computes the transform of the 16x16 block in 256 clock cycles. For each clock cycle, one IDCT coefficient is supplied to IDCT engine 10 and one output pixel is extracted from IDCT engine 10.
  • IDCT processors 20a and 20b are pipelined such that both processors are active concurrently. Each IDCT processor 20 receives one input data point and provides one transformed data point for each clock cycle.
  • I/O buffers 52 can be implemented as a parallel-to-serial shift registers as described below.
  • Serial adders 56 receive the data bits and perform serial additions of the bits in the manner described below.
  • Serial adders 56 are enabled by ADD_ENABLE which, in the exemplary embodiment, comprises 7-bits and corresponds to the first three stages of add shown in trellis 100 in FIG. 3D. Each serial add is represented by encircled dots 112 (only one encircled dot is labeled for simplicity). In the first stage, there are seven serial adds 112 which require four bits to enable/disable. In the second stage, there are two sets of three serial adds 112 which require two bits to control.
  • serial adders 56 can be controlled to compute the serial adds 112 as required by the first three stages of trellis 100 in FIG. 3D.
  • the bank of 16 serial adders 56 in FIG. 4 symbolically represents the functionality required by the first three stages of trellis 100.
  • serial adders 56 are provided to the first stage 58 of eight serial butterflies which perform the functions shown within eight 2- point IDCTs 104 in FIG. 3D.
  • Each serial butterfly is shown by the block diagram in FIG. 5B.
  • An exemplary implementation of the serial butterfly is described in detail below.
  • the first stage 58 of serial butterfly is always enabled such that IDCT processor 20 performs at least 2-point transforms.
  • the outputs from the first stage 58 are provided to the second stage 62 of serial butterfly through routing crossbar 60. Routing crossbar 60 interconnects the first two stages as shown by IDCT trellis 100.
  • the second stage 62 of serial butterfly can be selectively enabled to provide 4-point transforms. Since there are four sets of butterfly (see FIG. 3D), a 4-bit control is necessary to individually enable each set. The 4-bit control is part of the control signal labeled as MAP in FIG. 4.
  • the outputs from the second stage 62 of serial butterfly are provided to the third stage 66 through routing crossbar 64.
  • the third stage 66 comprises two sets of four serial butterflies as shown within two 8-point IDCTs 108 in FIG. 3D. Each set can be individually enabled by a 2-bit control.
  • the outputs from the third stage 66 are provided to the fourth stage 70 through routing crossbar 68.
  • the fourth stage 70 comprises one set of eight serial butterflies as shown within 16-point IDCTs 110 in FIG. 3D.
  • the serial butterflies can be selectively enabled by a 1-bit control.
  • the serial transformed data from the fourth stage 70 comprises the output from IDCT processor 20.
  • the 1-bit serial transformed data from the fourth stage 70 is routed to a bank of serial-to-parallel output buffers.
  • IDCT processor 20 provides the IDCT output in word serial fashion such that one output word is provided for each clock cycle.
  • the output buffers can be combined with the input buffers to form I/O buffer 52 as described in detail below.
  • controller 26 provides the control signals to IDCT processors 20a and 20b and memory element 22. These control signals synchronize IDCT processors 20a and 20b and memory element 22 and determine the reconstructed composite image. Controller 26 receives the ADDRESS input and the PQR input. The ADDRESS input informs controller 26 of the start of the data block. The PQR input comprises the three commands P, Q, and R which inform controller 26 of the desired block partition.
  • R equals to "1" indicates that the 16x16 block is to be divided into smaller 8x8 transform blocks
  • Q indicates that the 8x8 block is to be divided into smaller 4x4 transform blocks
  • P indicates that the 4x4 block is to be divided into smaller 2x2 transform blocks.
  • each block can be individually divided without regard to other blocks in the image.
  • 1-bit control is required for R since there is only one 16x16 transform block within the 16x16 data block
  • 4-bit control is required for Q since there can be four 8x8 transform blocks within the 16x16 data block
  • 16-bit control is required for P since there can be sixteen 4x4 transform blocks within the 16x16 data block.
  • FIG. IB An exemplary partition of a data block 4 is shown in FIG. IB and an exemplary graphical illustration, e.g. tree diagram 6, of the PQR control corresponding to the image partition is shown in FIG. IC.
  • the 21-bit control PQR can be provided to controller 26 serially or in parallel.
  • the PQR input is a 2-D representation of the desired block partition.
  • Controller 26 parses the PQR input into 1-D column and row control signals. These column and row control signals are then used to generate the control signals to command IDCT processors 20a and 20b to perform the proper mix of transforms.
  • controller 26 commands IDCT processor 20a to perform two 4-point transforms and one 8- point transform for the first four columns of data.
  • controller 26 commands IDCT processor 20a to perform one 4-point transform, two 2-point transforms, and one 8-point transform. The process continues until all columns are processed.
  • the intermediate results from IDCT processor 20a are stored by columns in memory element 22.
  • controller 26 commands IDCT processor 20b to perform the proper mix of transforms on the rows of intermediate results from memory element 22. After all columns have been processed by IDCT processor 20a, controller 26 commands IDCT processor 20b to perform two 4- point transforms and one 8-point transform for the first four rows of intermediate results. For the next two rows, controller 26 commands IDCT processor 20b to perform one 4-point transform, two 2-point transforms, and one 8-point transform. Again, the process continues until all rows are processed.
  • control signals generated by controller 26 for IDCT processors 20a and 20b include WRITE_ENABLE, READ_ENABLE, ADD_ENABLE, and MAP.
  • WRITE-ENABLE controls the writing of the input data points into the proper I/O buffers 52 such that the input data points are arranged in the correct order (see FIG. 3D).
  • READ_ENABLE controls the order in which the transformed data is read from IDCT processor 20. In the exemplary embodiment, the transformed data can be sequentially read from IDCT processor 20.
  • ADD_ENABLE controls the first set of serial adders 56 which perform the adds in the first three stages of trellis 100. ADD_ENABLE is dependent on the desired mix of transforms and is generated in accordance with the PQR input.
  • MAP controls the last three stages 62, 66 and 70 of serial butterfly to produce the desired mix of transforms. MAP is also generated in accordance with the PQR input.
  • Four control bits are needed for the second stage 62 to individually enable or disable each of the four sets of butterfly (see FIG. 3D).
  • two control bits are needed for the third stage 66 and one control bit is needed for the fourth stage 70.
  • no control signal is needed for the first stage 58 since IDCT processor 20 always performs at least a 2- point transform.
  • a control signal can be generated to provide bypass of the first stage 58 if this is desired or necessary. Since the 2-D transform of the present invention is performed serially using two 1-D transforms, controller 26 delays the control signals to IDCT processor 20b, relative to IDCT processor 20a, to synchronize the control signals with the input data.
  • Controller 26 can be implemented as a combination of combinatory logic and state machine. Alternatively, controller 26 can be implemented with a micro-controller or a micro-processor running a microcode. Different implementations of controller 26 to perform the functions as described herein are within the scope of the present invention.
  • memory element 22 can be implemented as a transposition memory.
  • a 2-D transform is achieved by performing a 1-D transform on the columns of the input data block, storing the intermediate results, and performing a 1-D transform on the rows of the intermediate results.
  • the 1-D transform on the rows cannot be performed until all columns have been transformed.
  • the two 1-D transforms are pipelined such that both operate concurrently.
  • Memory element 22 can be implemented as a block of memory as illustrated in FIG. 1A. Assume that the intermediate results from IDCT processor 20a are initially written by columns to memory element 22. IDCT processor 20b cannot operate on the rows of intermediate results until all columns are operated on by IDCT processor 20a. Once the last column of memory element 22 is filled, the intermediate results are provided to IDCT processor 20b in rows. However, because of the pipelined structure, IDCT processor 20a provides one column of data for each row of data retrieved by IDCT processor 20b. This column of data cannot be written over a previous column since some data points in the previous column are still needed by IDCT processor 20b. To resolve this problem, the new column of intermediate results is written over the row of data just retrieved by IDCT processor 20b.
  • memory element 22 can be implemented with read- modify-write capability such that the same memory location can be read from and written to on the same clock cycle.
  • one data point can be read from one location of memory element 22 by IDCT processor 20b and that same location can be written to by IDCT processor 20a.
  • memory element 22 is transposed, or alternates between column major and row major, over successive 16x16 blocks. The transposition reduces the memory requirement to only one bank of memory.
  • the control signals to implement memory element 22 as a transposition memory are provided by controller 26. Controller 26 has the necessary timing information and can synchronize IDCT processors 20a and 20b and memory element 22 with the input data block.
  • Memory element 22 can be implemented using storage element or one of any number of memory devices that are known in the art, such as RAM memory devices, latches, or other types of memory devices.
  • FIG. 5A is a graphical illustration of the serial butterfly and FIG. 5B is a block diagram of the same serial butterfly.
  • Serial butterfly 140 operates on two inputs XI and X2. Input XI is delayed by delay element 148 to align the top and bottom signal paths.
  • serial adders 160a and 160b The outputs from delay element 148 and multipliers 150 are provided to serial adders 160a and 160b.
  • Serial adder 160a adds the output from multiplier 150 to the output from delay element 148 and serial adder 160b subtracts the output from multiplier 150 from the output from delay element 148.
  • the outputs from the serial adders 160a and 160b comprise the serial butterfly outputs Zl and Z2, respectively.
  • serial adders 160a and 160b are designed such that the adders can be turned off to allow Yl and Y2 to pass through as Zl and Z2, respectively.
  • serial butterfly 140 operates on two input bit streams and provides two output bit streams.
  • FIGS. 6A and 6B An exemplary block diagram of bit-serial multiplier 150 is shown in FIGS. 6A and 6B.
  • FIG. 6A illustrates bit-serial multiplier 150 in a word-wide representation
  • FIG. 6B illustrates the same multiplier 150 in a bit-wide representation.
  • a bit-serial multiply of X with C is achieved by successively adding C to the intermediate product term and shifting the result by one binary bit position. This is shown by the block diagram in FIG. 6A.
  • Latch 212 is cleared by the LD signal, which is enabled for one cycle out of every 16 clock cycles, to prepare latch 212 for the next multiply.
  • the LD signal also loads parallel-to-serial shift register 214 with the product term of the just completed multiply from adder 210. The product term is then shifted out serially from register 214 during the next multiply.
  • the precision of the input data X, the constant C, and the product Y is 16-bit.
  • 16-bit of precision results in less arithmetic error than that specified by "IEEE Standard 1180-1990 : Specifications for the Implementations of 8x8 Inverse Discrete Cosine Transform.”
  • the 16-bit representation can comprise 1 sign bit, 9 magnitude bits, and 6 fractional bits. Other representations of less than 16-bit or more than 16-bit can be contemplated and are within scope of the present invention.
  • adder 210, latch 212, and register 214 are all implemented with 16-bit. For each clock cycle, one bit of X is shifted into bit-serial multiplier 150, with the LSB bit first. Depending on the value of the input bit and the LD signal, the constant C is added to the intermediate product term which is stored in latch 212.
  • AND gate 204 determines whether C is to be added to the intermediate product term based on the input bit and the LD signal.
  • the intermediate product term from adder 210 is then shifted by one bit position and stored back into latch 212, in bit positions D[14..0].
  • bit-serial multiplier 150 can be implemented with the same amount of hardware as an accumulator, which is very compact for IC design.
  • Bit-serial multiplier 150 is shown in further detail in FIG. 6B.
  • Adder 210, latch 212, and register 214 are shown in bit form. Depending on the value of input bit X and the LD signal, the constant C can be added to the intermediate product term which is stored in latch 212.
  • Each adder 210 receives a carry-in (Ci) input from latch 212 of the next less significant bit and provides a carry-out (Co) output to adder 210 of the next more significant bit. This is a standard carry chain of an adder.
  • the simple truncation of the LSB bits produces a slight negative bias on the twos complement output product term.
  • This slight negative bias can be offset by adding an LSB on the second to last adder 210a, which produces a half LSB of positive offset in the output product term.
  • the overall offset can be minimized by alternating the truncation and positive offset over successive multipliers 150.
  • the offset is controlled by the ROUND signal which can be hardwired high or low depending on the desired result.
  • An exemplary block diagram of serial adder 160 is shown in FIG. 7A.
  • Serial adder 160 serially receives two inputs Yl and Y2, with the LSB bit first.
  • the bypass mode allows IDCT processor 20 of the present invention to perform different mix of transforms.
  • the inputs Yl and Y2 are provided serially to AND gate 240 and XOR gate 242, respectively.
  • ADD_EN is also provided to AND gate 240. When ADD_EN is low, the output from AND gate 240 is low and Yl is not provided to adder 244. When ADD_EN is high, Yl is provided to adder 244. INVERT signal is provided to XOR gate 242 and register 246. To perform a subtraction, the input Y2 is converted to a negative number and added to the other operand. Conversion of a twos complement number to a negative number requires inverting all the bits in the original number and adding a one to the LSB bit position.
  • the inversion of the bits is performed by XOR gate 242 when the INVERT signal is high.
  • the one is added to the LSB bit position of the input number by storing a one at the start of the serial add, when the LD signal is enabled the INVERT signal is high, and adding this value to the carry-in (Ci) of adder 244.
  • the carry-out (Co) from the prior 1-bit add is stored in register 246.
  • the carry-out is added with the next set of bits from the two inputs Yl and Y2.
  • the sum output S from adder 244 represents the output from serial adder 160.
  • the constant C can be hardwired or mask programmable. Since the first stage 58 of butterfly is always performed in the exemplary embodiment, the constant C for bit-serial multipliers 150 for this stage can be hardwired. However, for the remaining stages 62, 66, and 70 of butterfly, the constant C can be mask programmable to allow multipliers 150 to perform multiplies of the input X2 with either 1/(2C*) or 1, when serial butterfly 140 is placed in the bypass mode. Multipliers 150 can also be loaded with other values of C to perform scaling or normalization of the input X2.
  • serial adder 160 can perform adds, subtracts, or bypass of the two inputs.
  • Serial adder 160 can be modified to perform the functions as required by serial butterfly 140. For example, referring to FIG.
  • serial adder 160a only performs adds or bypass. Therefore, serial adder
  • serial adder 160b only performs subtracts or bypass. Therefore, the
  • INVERT signal in serial adder 160 can be tied to a high reference.
  • Serial adder 160 can also be used to perform serial adds and bypass as required by serial adders 56 in FIG. 4 which implement the serial adds 112 required by the first three stages of trellis 100 as shown in FIG. 3D.
  • delay element 148 can be implemented with a series of latches. The number of latches is selected to match the processing delay of multiplier 150. V. I/O Buffer
  • a bank of 16 I/O buffers 52 receives the input data and provides the transformed data.
  • the input and output from IDCT processor 20 are provided in a word serial manner, or one complete data point per clock cycle.
  • the 16 data points are loaded into the 16 I/O buffers 52 in 16 clock cycles. Once all I/O buffers 52 are loaded, the 16 data points are provided to the IDCT trellis in a bit serial manner, one bit per clock cycle.
  • I/O buffers 52 also receive the transformed data bits from the final stage 70 of serial butterfly. The transformed data is provided serially to I/O buffer 52,
  • I/O buffer 52 comprises 16-bit latch 262, 16-bit parallel-to-serial shift register 264, 16-bit latch 266, and output buffer 268.
  • the IDCT input is provided to all latches 262 within the 16 I/O buffers 52.
  • Each I/O buffer 52 latches the IDCT input when directed by the control signal WR(w).
  • WR(w) is decoded from the WRITE_ENABLE signal which originates from controller 26.
  • Latch 262 within each I/O buffer 52 is enabled for only one out of every sixteen clock cycles. After the 16 data points have been latched by latches 262, the LD signal is enabled and the values stored in latches 262 are provided to registers 264.
  • the LSB register 264a serially shifts the data point to routing crossbar 54, one bit per clock cycle with the LSB bit shifted out first.
  • one transformed data bit is serially shifted into the MSB register 264q, one bit per clock cycle with the LSB shifted in first.
  • the LD signal loads register 264 with the next data point and loads latch 266 with the transformed data point. The transformed data is stored in latch 266 until it is read out through output buffer 268.
  • Output buffer 268 is selectively enabled such that the transformed data is provided serially from the 16 I/O buffers 52, one transformed data point per clock cycle.
  • the order of the read is controlled by the RD(w) signal which is decoded from READ_ENABLE.
  • the block diagram in FIG. 8 represents one implementation of I/O buffer 52.
  • Other implementations which perform the same functions as described herein can be contemplated and are within the scope of the present invention.
  • the present invention has been described in the context of a 2-D IDCT engine, the inventive concept can be extended to other transforms such as discrete Fourier transform (DFT), inverse discrete Fourier transform (IDFT), fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), discrete cosine transform (DCT), and Hadamard transform. Therefore, application of the inventive concept as described herein to other transforms are within the scope of the present invention.
  • DFT discrete Fourier transform
  • IDFT inverse discrete Fourier transform
  • FFT fast Fourier transform

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Discrete Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

A variable block size 2-D IDCT engine (10) which can compute any arbitrary mix of transforms. A first 1-D IDCT processor (20a) computes the transform of the data block by columns and stores the intermediate results in a transposition memory. A second 1-D IDCT processor (20b) computes the transform of the intermediate results by rows. Different mix of transforms can be easily performed by correctly ordering the input data, selectively combining the input data before the butterfly stages, and controlling the additions and multiplications at each stage of butterfly. The unnecessary butterflies are placed in the bypass mode. The butterflies can be implemented with serial adders (56) and bit-serial multipliers to greatly simplify the hardware design and minimize the routing requirements between successive stages of butterfly. The fully pipelined structure allows the IDCT engine to maintain a throughput rate of one pixel per clock cycle.

Description

VARIABLE BLOCK SIZE 2-DIMENSIONAL INVERSE DISCRETE COSINE TRANSFORM ENGINE
BACKGROUND OF THE INVENTION
I. Field of the Invention
The present invention relates to digital signal processing. More particularly, the present invention relates to a novel and improved variable block size 2-dimensional (2-D) inverse discrete cosine transform (IDCT) engine.
II. Description of the Related Art The 2-dimensional discrete cosine transform (IDCT) and inverse discrete cosine transform (IDCT) are important signal processing operations in digital image compression. One such digital image compression application is in the area of high definition television (HDTV). In HDTV, the analog video waveform is conditioned and digitized by an analog-to- digital converter (ADC). The resultant sampled data is then digitally processed to minimize the amount of data which must be transmitted and/or stored while retaining high picture quality. Typically, a key element of the compression process is the 2-D discrete cosine transform wherein an NxN block of sampled data, or an image, is transformed from the time domain to the frequency domain. The transformed data can be further processed with block codes, such as Huffman code, run length codes, and /or error correcting codes, such as convolutional codes and Reed-Solomon codes. An exemplary HDTV image compression scheme is disclosed in U.S. Patent No. 5,452,104, U.S. Patent No. 5,107,345, and U.S. Patent No. 5,021,891, all three entitled "ADAPTIVE BLOCK SIZE IMAGE COMPRESSION METHOD AND SYSTEM", and U.S. Patent No. 5,576,767, entitled TNTERFRAME VIDEO ENCODING AND DECODING SYSTEM", all four patents are assigned to the assignee of the present invention and incorporated by reference herein. The digitally encoded video waveform is transmitted and/or stored.
At the receiver, the reverse of the digital signal processing is performed to reconstruct the pixels of the original image. The recovered image is provided to a digital-to-analog converter (DAC) which converts the reconstructed image back to an analog video waveform which can be displayed on a monitor or television. An important element in the decoding process is the inverse discrete cosine transform which transforms the frequency domain data back to the time domain. The IDCT engine is required to run at a high output rate to reconstruct the original image at real time. Furthermore, since the IDCT engine is typically located within a consumer product, cost is a major consideration. The IDCT engine needs to be designed to operate at high speed with minimal complexity.
The digital image compression system typically processes the video signal on a frame-by-frame basis. Each video frame is further partitioned into NxN blocks. In most compression systems, the block size is fixed by the system design to simplify the implementation of the DCT and the IDCT engines.
Permitting variable block sizes can enhance the performance of the compression system under certain conditions to allow for optimal compression of the image and/or to improve the quality of the reconstructed image. Variable block sizes can be used to take advantages of certain characteristics of the image. In the prior art, variable block size DCT and IDCT engines are designed with banks of transform processors of various sizes. Each processor computes a different block size transform on the same data block. The transforms from the various processors are then combined into the desired composite transformed block. This approach can be unwieldy because of the large amount of required hardware and the complexity in coordinating the various hardware blocks.
SUMMARY OF THE INVENTION
The present invention is a novel and improved variable block size 2- dimensional (2-D) inverse discrete cosine transform (IDCT) engine. In accordance with the present invention, the NxN data block is transformed by columns by a first 1-D IDCT processor. The intermediate results from the first IDCT processor are temporarily stored in a transposition memory. Once all the columns have been processed, the intermediate results are transformed by rows by a second 1-D IDCT processor. The output from the 2nd IDCT processor comprises the transformed output of the IDCT engine.
It is an object of the present invention to provide a 2-D IDCT engine capable of computing any arbitrary mix of transforms within an NxN data block. In the exemplary embodiment, each data block is either a 16x16 transform or a mix of any combinations of 8x8 transforms, 4x4 transforms, and/or 2x2 transforms. In the exemplary embodiment, a 21-bit control signal precisely describes the desired partition and informs the IDCT engine to compute the proper combination of transforms. In the present invention, different combinations of transforms can be easily performed by correctly ordering the input data, selectively combining the data before the butterfly stages, and controlling the additions and multiplications at each stage of butterfly. The unnecessary butterflies are placed in the bypass mode.
It is another object of the present invention to simplify the design of the 2-D IDCT engine by providing for serial computations. Serial adders and bit-serial multipliers greatly simplify the design since the computation is performed on only one bit of data at a time. Serial computations also greatly simplify the routing crossbars between successive stages of butterflies. Because of the pipelined structure of the IDCT engine of the present invention, the throughput rate is maintained at the rate of one transformed point or pixel per clock cycle. This is the same throughput rate as with parallel computations. Only the processing delay is increased because of the serial nature of the computations.
It is yet another object of the present invention to minimize the memory requirement. For a 2-D IDCT, the data block is first transformed by columns by the 1-D IDCT processor and the intermediate results are temporarily stored by columns in a transposition memory. The second 1-D transform is performed only after all the columns have been transformed. Because of the pipelined structure of the IDCT engine, the intermediate results are concurrently written to memory by columns and read from memory by rows. To avoid writing over memory locations containing data which is needed later, the memory is transposed, or alternates between column major and row major, over successive NxN blocks. Using a read- modify-write cycle, the intermediate result is read from a memory location and a new result is written to the same memory location within the same clock cycle. The transposition memory reduces the memory requirement to one bank of memory of the same size as one NxN data block.
BRIEF DESCRIPTION OF THE DRAWINGS
The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein: FIG. 1A, IB, and 1C are diagrams of an exemplary NxN image, a partitioned image, and a tree diagram corresponding to the partitioned image of the present invention, respectively;
FIG. 2 is a block diagram of an exemplary variable block-size 2-D IDCT engine of the present invention;
FIGS. 3A-D are exemplary diagrams of a 2-point IDCT trellis, a 4-point IDCT trellis, an 8-point IDCT trellis, and a 16-point IDCT trellis of the present invention, respectively;
FIG. 4 is a block diagram of an exemplary 1-D IDCT processor of the present invention;
FIG. 5A and 5B are graphical diagram of a serial butterfly and block diagram of an exemplary implementation of the serial butterfly of the present invention, respectively;
FIG. 6A and 6B are block diagrams of an exemplary bit-serial multiplier of the present invention in word-wide representation and bit- wide representation, respectively;
FIG. 7 is a block diagram of an exemplary serial adder of the present invention; and
FIG. 8 is a block diagram of an exemplary I/O buffer of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) are important complementary digital signal processing operations. The DCT transforms the sampled data from the time domain into the frequency domain according to the following equation:
V/I , 2C(k) ^A , . π(2n + ϊ)k
X(k) = — ^_Z > χ(ή) • cos— i '— , (1)
N 2N
where Ν is the dimension of the transform, C(0) = 1/ Λ/2 and C(k) = 1 for k=l, 2, 3...Ν-1. DCT transform is typically performed on the sampled data as one of a series of digital signal processing operations. Other operations are performed on the transformed data, including quantization, data compression, and error correcting coding. A discussion of an exemplary digital image compression technique is described in detail in the aforementioned U.S. Patent No. 5,452,104.
The IDCT transforms the data from the frequency domain back into the time domain according to the following equation:
x(n) = ∑ C(k) • *(*) • cos * 2" * 1** . (2)
*=o N
The DCT and IDCT transforms are separable transforms. This means that a 2-D transform can be broken down into two 1-D transforms. A 2-D IDCT transform can be performed on a data block by first performing a 1-D IDCT transform on the columns of the data block. The intermediate results from the first IDCT transform are temporarily stored in a memory element. A second IDCT transform is then performed on the rows of intermediate results. The output from the second IDCT transform comprises the reconstructed pixels of the original image.
Referring to the figures, FIG. 1A illustrates a diagram of an exemplary data block. Data block 2 is of size NxN where N is a power of two, or N=2 where x is an integer 1, 2, 3... When this condition is satisfied, equations (1) and (2) can be simplified significantly. In the exemplary embodiment, N is equal to 16, although the present invention can be easily extended to other values of N.
FIG. 2 illustrates an exemplary block diagram of the 2-D IDCT engine 10 of the present invention. In the exemplary embodiment, the input data block, comprising of IDCT coefficients, is provided by columns to IDCT processor 20a. IDCT processors 20a and 20b are identical 1-D IDCT processors which perform the IDCT transform on the input data according to equation (2). The intermediate results from IDCT processor 20a are provided to memory element 22 where they are temporarily stored by columns. The intermediate results are then provided by rows to IDCT processor 20b. IDCT processor 20b performs the 1-D IDCT transform and provides the transformed output, or the reconstructed image, to the subsequent digital signal processing block (not shown in FIG. 2). In the exemplary embodiment, the input data block is provided to IDCT processor 20a by columns and the intermediate results are provided to IDCT processor 20b by rows. Alternatively, the data block can be provided to IDCT processor 20a by rows and to IDCT processor 20b by columns. In the exemplary embodiment, IDCT processors 20a and 20b are pipelined such that both IDCT processors 20 are active at the same time. An important property of the IDCT is that a larger transform can be created by arranging the input data points, computing the sum on selective combinations of data points, and performing serial butterfly on the output of two smaller transforms. Serial butterfly is an operation which is described in detail below. Thus, a 16-point IDCT is a butterfly of two 8-point IDCTs, each 8-point IDCT is a butterfly of two 4-point IDCTs, and each 4-point IDCT is a butterfly of two 2-point IDCTs. This property of the IDCT is well known in the art and is best illustrated by a trellis diagram. A trellis diagram of a 2- point IDCT is shown in FIG. 3A, a 4-point IDCT is shown in FIG. 3B, and an 8-point IDCT is shown in FIG. 3C. The 2-point IDCT comprises a single butterfly stage. As shown in FIGS. 3A-3B, the 4-point IDCT comprises a stage of two 2-point IDCTs, a stage of serial add before the 2-point IDCT stage, and a butterfly stage after the 2-point IDCT stage. Similarly, as shown in FIGS. 3B-3C, the 8-point IDCT comprises a stage of two 4-point IDCTs, a stage of serial add before the 4-point IDCT stage, and a butterfly stage after the 4-point IDCT stage.
The diagram of the 16-point IDCT trellis 100 of the present invention is illustrated in FIG. 3D. Trellis 100 is derived by B.G. Lee and described in detail in a book by K.R. Rao, entitled "Discrete Cosine Transform : Algorithms, Advantages, and Applications", Academic Press, 1990. The 16- point IDCT trellis 100 comprises three stages of serial add and four stages of serial butterfly, with each stage of butterfly comprising eight serial butterflies. In the IDCT processors of the prior art, the interconnects between successive stages are fixed, thus limiting the IDCT processor to performing only 16-point IDCT transforms. In the present invention, the stages are interconnected using reconfigurable trellis cross-connects.
As shown in FIG. 3D, 16-point IDCT 110 is a butterfly of two 8-point IDCTs 108, with the data points to the lower 8-point IDCT selectively combined. Two 8-point IDCTs 108 is a butterfly of four 4-point IDCTs 106, with the data points to the lower 4-point IDCTs selectively combined. Four 4-point IDCTs 106 is a butterfly of eight 2-point IDCTs 104, with the data points to the lower 2-point IDCTs selectively combined. The reconfigurable trellis cross-connects in combination with the bypass mode of the serial butterflies allow the 2-D IDCT engine 10 of the present invention to compute any arbitrary mix of transforms within an NxN block. By correctly ordering the input data, selectively combining the input data before the butterfly stages, and controlling the additions and multiplications at each stage of the trellis, any combination of 2-point, 4-point, 8-point, and 16-point IDCT transforms can be performed. For example, IDCT processor 20 can perform two 8-point transforms, eight 2-point transforms, one 8-point transform and two 4-point transforms, or one 8-point, one 4-point and two 2-point transforms. In the present invention, there is no need for a combining stage to assemble the different transform output into a composite transformed block since this happens automatically when IDCT engine 10 has been configured to do the appropriate mix of transforms. Any time a smaller transform is computed, the serial adders and butterflies not needed to do the higher order transforms revert to delay latches. Thus, the outputs from each IDCT processor 20 are time aligned regardless of the transform mix.
In the exemplary embodiment, the serial butterflies operate on two input bit streams and provides two output bit streams. The serial butterfly comprises one greatly simplified bit-serial multiplier and two serial adders. The serial structure of IDCT processors 20 permits the routing crossbars between successive stages of serial butterfly to be implemented using only 1- bit wide data buses.
In the exemplary embodiment, IDCT engine 10 computes the transform of the 16x16 block in 256 clock cycles. For each clock cycle, one IDCT coefficient is supplied to IDCT engine 10 and one output pixel is extracted from IDCT engine 10. IDCT processors 20a and 20b are pipelined such that both processors are active concurrently. Each IDCT processor 20 receives one input data point and provides one transformed data point for each clock cycle.
I. IDCT Processor
An exemplary block diagram of IDCT processor 20 of the present invention is shown in FIG. 4. Every 16 (N=16) clock cycles, the 16 I/O buffers 52 receive 16 input data points, one data point per clock cycle and one data point per I/O buffer 52. The order in which the data points are loaded into I/O buffers 52 depends on the mix of transforms being performed and is controlled by controller 26 through the 4-bit WRITE_ENABLE signal. Each data point comprises q-bits which are loaded in parallel into the proper I/O buffer 52 based on WRITE_ENABLE. I/O buffers 52 then serially shift the 16 data points out together, one bit per clock cycle with the LSB first, through routing crossbar 54 to serial adders 56. I/O buffers 52 can be implemented as a parallel-to-serial shift registers as described below. Serial adders 56 receive the data bits and perform serial additions of the bits in the manner described below. Serial adders 56 are enabled by ADD_ENABLE which, in the exemplary embodiment, comprises 7-bits and corresponds to the first three stages of add shown in trellis 100 in FIG. 3D. Each serial add is represented by encircled dots 112 (only one encircled dot is labeled for simplicity). In the first stage, there are seven serial adds 112 which require four bits to enable/disable. In the second stage, there are two sets of three serial adds 112 which require two bits to control. And in the third stage, there are four sets of single serial add 112 which require one bit to control. With the 7-bit ADD_ENABLE signal, serial adders 56 can be controlled to compute the serial adds 112 as required by the first three stages of trellis 100 in FIG. 3D. The bank of 16 serial adders 56 in FIG. 4 symbolically represents the functionality required by the first three stages of trellis 100.
The outputs from serial adders 56 are provided to the first stage 58 of eight serial butterflies which perform the functions shown within eight 2- point IDCTs 104 in FIG. 3D. Each serial butterfly is shown by the block diagram in FIG. 5B. The serial butterfly receives two serial stream inputs, XI and X2, and produces two serial outputs Z1=X1+C»X2 and Z2=X1-C»X2, where C is a fixed scalar which is defined according to the position of the butterfly within IDCT trellis 100. An exemplary implementation of the serial butterfly is described in detail below. In the exemplary embodiment, the first stage 58 of serial butterfly is always enabled such that IDCT processor 20 performs at least 2-point transforms. The outputs from the first stage 58 are provided to the second stage 62 of serial butterfly through routing crossbar 60. Routing crossbar 60 interconnects the first two stages as shown by IDCT trellis 100. In the exemplary embodiment, the second stage 62 of serial butterfly can be selectively enabled to provide 4-point transforms. Since there are four sets of butterfly (see FIG. 3D), a 4-bit control is necessary to individually enable each set. The 4-bit control is part of the control signal labeled as MAP in FIG. 4.
The outputs from the second stage 62 of serial butterfly are provided to the third stage 66 through routing crossbar 64. The third stage 66 comprises two sets of four serial butterflies as shown within two 8-point IDCTs 108 in FIG. 3D. Each set can be individually enabled by a 2-bit control. The outputs from the third stage 66 are provided to the fourth stage 70 through routing crossbar 68. The fourth stage 70 comprises one set of eight serial butterflies as shown within 16-point IDCTs 110 in FIG. 3D. The serial butterflies can be selectively enabled by a 1-bit control. The serial transformed data from the fourth stage 70 comprises the output from IDCT processor 20.
The 1-bit serial transformed data from the fourth stage 70 is routed to a bank of serial-to-parallel output buffers. In the exemplary embodiment, IDCT processor 20 provides the IDCT output in word serial fashion such that one output word is provided for each clock cycle. The output buffers can be combined with the input buffers to form I/O buffer 52 as described in detail below.
II. Controller
Referring to FIG. 2, controller 26 provides the control signals to IDCT processors 20a and 20b and memory element 22. These control signals synchronize IDCT processors 20a and 20b and memory element 22 and determine the reconstructed composite image. Controller 26 receives the ADDRESS input and the PQR input. The ADDRESS input informs controller 26 of the start of the data block. The PQR input comprises the three commands P, Q, and R which inform controller 26 of the desired block partition. In the exemplary embodiment, R equals to "1" indicates that the 16x16 block is to be divided into smaller 8x8 transform blocks, Q equals to "1" indicates that the 8x8 block is to be divided into smaller 4x4 transform blocks, and P equals to "1" indicates that the 4x4 block is to be divided into smaller 2x2 transform blocks. In the exemplary embodiment, each block can be individually divided without regard to other blocks in the image. Thus, 1-bit control is required for R since there is only one 16x16 transform block within the 16x16 data block, 4-bit control is required for Q since there can be four 8x8 transform blocks within the 16x16 data block, and 16-bit control is required for P since there can be sixteen 4x4 transform blocks within the 16x16 data block. An exemplary partition of a data block 4 is shown in FIG. IB and an exemplary graphical illustration, e.g. tree diagram 6, of the PQR control corresponding to the image partition is shown in FIG. IC. The 21-bit control PQR can be provided to controller 26 serially or in parallel.
The PQR input is a 2-D representation of the desired block partition. Controller 26 parses the PQR input into 1-D column and row control signals. These column and row control signals are then used to generate the control signals to command IDCT processors 20a and 20b to perform the proper mix of transforms. For the exemplary partition 4 shown in FIG. IB, controller 26 commands IDCT processor 20a to perform two 4-point transforms and one 8- point transform for the first four columns of data. For the next two columns of data, controller 26 commands IDCT processor 20a to perform one 4-point transform, two 2-point transforms, and one 8-point transform. The process continues until all columns are processed. The intermediate results from IDCT processor 20a are stored by columns in memory element 22. In the similar manner, controller 26 commands IDCT processor 20b to perform the proper mix of transforms on the rows of intermediate results from memory element 22. After all columns have been processed by IDCT processor 20a, controller 26 commands IDCT processor 20b to perform two 4- point transforms and one 8-point transform for the first four rows of intermediate results. For the next two rows, controller 26 commands IDCT processor 20b to perform one 4-point transform, two 2-point transforms, and one 8-point transform. Again, the process continues until all rows are processed.
Referring to FIG. 4, the control signals generated by controller 26 for IDCT processors 20a and 20b include WRITE_ENABLE, READ_ENABLE, ADD_ENABLE, and MAP. WRITE-ENABLE controls the writing of the input data points into the proper I/O buffers 52 such that the input data points are arranged in the correct order (see FIG. 3D). READ_ENABLE controls the order in which the transformed data is read from IDCT processor 20. In the exemplary embodiment, the transformed data can be sequentially read from IDCT processor 20. ADD_ENABLE controls the first set of serial adders 56 which perform the adds in the first three stages of trellis 100. ADD_ENABLE is dependent on the desired mix of transforms and is generated in accordance with the PQR input. MAP controls the last three stages 62, 66 and 70 of serial butterfly to produce the desired mix of transforms. MAP is also generated in accordance with the PQR input. Four control bits are needed for the second stage 62 to individually enable or disable each of the four sets of butterfly (see FIG. 3D). Similarly, two control bits are needed for the third stage 66 and one control bit is needed for the fourth stage 70. In the exemplary embodiment, no control signal is needed for the first stage 58 since IDCT processor 20 always performs at least a 2- point transform. However, a control signal can be generated to provide bypass of the first stage 58 if this is desired or necessary. Since the 2-D transform of the present invention is performed serially using two 1-D transforms, controller 26 delays the control signals to IDCT processor 20b, relative to IDCT processor 20a, to synchronize the control signals with the input data.
Controller 26 can be implemented as a combination of combinatory logic and state machine. Alternatively, controller 26 can be implemented with a micro-controller or a micro-processor running a microcode. Different implementations of controller 26 to perform the functions as described herein are within the scope of the present invention.
HI. Transposition Memory
In the exemplary embodiment, memory element 22 can be implemented as a transposition memory. A 2-D transform is achieved by performing a 1-D transform on the columns of the input data block, storing the intermediate results, and performing a 1-D transform on the rows of the intermediate results. The 1-D transform on the rows cannot be performed until all columns have been transformed. In the exemplary embodiment, the two 1-D transforms are pipelined such that both operate concurrently.
Memory element 22 can be implemented as a block of memory as illustrated in FIG. 1A. Assume that the intermediate results from IDCT processor 20a are initially written by columns to memory element 22. IDCT processor 20b cannot operate on the rows of intermediate results until all columns are operated on by IDCT processor 20a. Once the last column of memory element 22 is filled, the intermediate results are provided to IDCT processor 20b in rows. However, because of the pipelined structure, IDCT processor 20a provides one column of data for each row of data retrieved by IDCT processor 20b. This column of data cannot be written over a previous column since some data points in the previous column are still needed by IDCT processor 20b. To resolve this problem, the new column of intermediate results is written over the row of data just retrieved by IDCT processor 20b. In fact, memory element 22 can be implemented with read- modify-write capability such that the same memory location can be read from and written to on the same clock cycle. Within one clock cycle, one data point can be read from one location of memory element 22 by IDCT processor 20b and that same location can be written to by IDCT processor 20a. Implemented in this manner, memory element 22 is transposed, or alternates between column major and row major, over successive 16x16 blocks. The transposition reduces the memory requirement to only one bank of memory. The control signals to implement memory element 22 as a transposition memory are provided by controller 26. Controller 26 has the necessary timing information and can synchronize IDCT processors 20a and 20b and memory element 22 with the input data block. Memory element 22 can be implemented using storage element or one of any number of memory devices that are known in the art, such as RAM memory devices, latches, or other types of memory devices.
IV. Serial Butterfly
The serial butterfly is shown in FIG. 5A and 5B. FIG. 5A is a graphical illustration of the serial butterfly and FIG. 5B is a block diagram of the same serial butterfly. Serial butterfly 140 operates on two inputs XI and X2. Input XI is delayed by delay element 148 to align the top and bottom signal paths.
( kκ Input X2 is scaled by 1/I2C*) by bit-serial multiplier 150. C* denotes cos — .
V n ) The outputs from delay element 148 and multipliers 150 are provided to serial adders 160a and 160b. Serial adder 160a adds the output from multiplier 150 to the output from delay element 148 and serial adder 160b subtracts the output from multiplier 150 from the output from delay element 148. The outputs from the serial adders 160a and 160b comprise the serial butterfly outputs Zl and Z2, respectively. In the present invention, serial adders 160a and 160b are designed such that the adders can be turned off to allow Yl and Y2 to pass through as Zl and Z2, respectively. In the exemplary embodiment, serial butterfly 140 operates on two input bit streams and provides two output bit streams.
An exemplary block diagram of bit-serial multiplier 150 is shown in FIGS. 6A and 6B. FIG. 6A illustrates bit-serial multiplier 150 in a word-wide representation and FIG. 6B illustrates the same multiplier 150 in a bit-wide representation. A bit-serial multiply of X with C is achieved by successively adding C to the intermediate product term and shifting the result by one binary bit position. This is shown by the block diagram in FIG. 6A. Latch 212 is cleared by the LD signal, which is enabled for one cycle out of every 16 clock cycles, to prepare latch 212 for the next multiply. The LD signal also loads parallel-to-serial shift register 214 with the product term of the just completed multiply from adder 210. The product term is then shifted out serially from register 214 during the next multiply.
In the exemplary embodiment, the precision of the input data X, the constant C, and the product Y is 16-bit. 16-bit of precision results in less arithmetic error than that specified by "IEEE Standard 1180-1990 : Specifications for the Implementations of 8x8 Inverse Discrete Cosine Transform." The 16-bit representation can comprise 1 sign bit, 9 magnitude bits, and 6 fractional bits. Other representations of less than 16-bit or more than 16-bit can be contemplated and are within scope of the present invention.
In the exemplary embodiment, adder 210, latch 212, and register 214 are all implemented with 16-bit. For each clock cycle, one bit of X is shifted into bit-serial multiplier 150, with the LSB bit first. Depending on the value of the input bit and the LD signal, the constant C is added to the intermediate product term which is stored in latch 212. Within logic circuit
200, AND gate 204 determines whether C is to be added to the intermediate product term based on the input bit and the LD signal. The intermediate product term from adder 210 is then shifted by one bit position and stored back into latch 212, in bit positions D[14..0]. The LSB bit from adder 210 is discarded and the MSB bit in latch 212 is sign extended, e.g. D[15] = Co[15] where C[15] is the carry-out from the MSB of adder 210. As shown in FIG.
6A, bit-serial multiplier 150 can be implemented with the same amount of hardware as an accumulator, which is very compact for IC design.
Bit-serial multiplier 150 is shown in further detail in FIG. 6B. Adder 210, latch 212, and register 214 are shown in bit form. Depending on the value of input bit X and the LD signal, the constant C can be added to the intermediate product term which is stored in latch 212. Each adder 210 receives a carry-in (Ci) input from latch 212 of the next less significant bit and provides a carry-out (Co) output to adder 210 of the next more significant bit. This is a standard carry chain of an adder.
The simple truncation of the LSB bits produces a slight negative bias on the twos complement output product term. This slight negative bias can be offset by adding an LSB on the second to last adder 210a, which produces a half LSB of positive offset in the output product term. The overall offset can be minimized by alternating the truncation and positive offset over successive multipliers 150. The offset is controlled by the ROUND signal which can be hardwired high or low depending on the desired result. An exemplary block diagram of serial adder 160 is shown in FIG. 7A.
Serial adder 160 serially receives two inputs Yl and Y2, with the LSB bit first.
Serial adder 160 can add the two inputs (Y1+Y2), subtract one input from the other input (Y1-Y2), or bypass one input through to the output (Z=Y2). The addition or subtraction depends on the location of serial adder 160 within the IDCT trellis, e.g. whether serial adder 160 is located in the upper leg or the lower leg of the butterfly. The bypass mode allows IDCT processor 20 of the present invention to perform different mix of transforms.
The inputs Yl and Y2 are provided serially to AND gate 240 and XOR gate 242, respectively. ADD_EN is also provided to AND gate 240. When ADD_EN is low, the output from AND gate 240 is low and Yl is not provided to adder 244. When ADD_EN is high, Yl is provided to adder 244. INVERT signal is provided to XOR gate 242 and register 246. To perform a subtraction, the input Y2 is converted to a negative number and added to the other operand. Conversion of a twos complement number to a negative number requires inverting all the bits in the original number and adding a one to the LSB bit position. The inversion of the bits is performed by XOR gate 242 when the INVERT signal is high. The one is added to the LSB bit position of the input number by storing a one at the start of the serial add, when the LD signal is enabled the INVERT signal is high, and adding this value to the carry-in (Ci) of adder 244.
For each subsequent clock cycle, the carry-out (Co) from the prior 1-bit add is stored in register 246. The carry-out is added with the next set of bits from the two inputs Yl and Y2. The sum output S from adder 244 represents the output from serial adder 160.
The constant C can be hardwired or mask programmable. Since the first stage 58 of butterfly is always performed in the exemplary embodiment, the constant C for bit-serial multipliers 150 for this stage can be hardwired. However, for the remaining stages 62, 66, and 70 of butterfly, the constant C can be mask programmable to allow multipliers 150 to perform multiplies of the input X2 with either 1/(2C*) or 1, when serial butterfly 140 is placed in the bypass mode. Multipliers 150 can also be loaded with other values of C to perform scaling or normalization of the input X2.
As represented in FIG. 7, serial adder 160 can perform adds, subtracts, or bypass of the two inputs. Serial adder 160 can be modified to perform the functions as required by serial butterfly 140. For example, referring to FIG.
5B, serial adder 160a only performs adds or bypass. Therefore, serial adder
160 in FIG. 7 can be modified by providing Yl directly to the B input of adder
244, eliminating XOR gate 242, and providing Y2 to AND gate 240. The INVERT signal can be removed since adder 160a only performs adds.
Similarly, serial adder 160b only performs subtracts or bypass. Therefore, the
INVERT signal in serial adder 160 can be tied to a high reference.
Serial adder 160 can also be used to perform serial adds and bypass as required by serial adders 56 in FIG. 4 which implement the serial adds 112 required by the first three stages of trellis 100 as shown in FIG. 3D.
Referring to FIG. 5B, delay element 148 can be implemented with a series of latches. The number of latches is selected to match the processing delay of multiplier 150. V. I/O Buffer
In the exemplary embodiment, within each IDCT processor 20, a bank of 16 I/O buffers 52 receives the input data and provides the transformed data. The input and output from IDCT processor 20 are provided in a word serial manner, or one complete data point per clock cycle. The 16 data points are loaded into the 16 I/O buffers 52 in 16 clock cycles. Once all I/O buffers 52 are loaded, the 16 data points are provided to the IDCT trellis in a bit serial manner, one bit per clock cycle. For each clock cycle, I/O buffers 52 also receive the transformed data bits from the final stage 70 of serial butterfly. The transformed data is provided serially to I/O buffer 52,
An exemplary block diagram of one I/O buffer 52 is shown in FIG. 8. I/O buffer 52 comprises 16-bit latch 262, 16-bit parallel-to-serial shift register 264, 16-bit latch 266, and output buffer 268. The IDCT input is provided to all latches 262 within the 16 I/O buffers 52. Each I/O buffer 52 latches the IDCT input when directed by the control signal WR(w). WR(w) is decoded from the WRITE_ENABLE signal which originates from controller 26. Latch 262 within each I/O buffer 52 is enabled for only one out of every sixteen clock cycles. After the 16 data points have been latched by latches 262, the LD signal is enabled and the values stored in latches 262 are provided to registers 264.
For each I/O buffer 52, the LSB register 264a serially shifts the data point to routing crossbar 54, one bit per clock cycle with the LSB bit shifted out first. For each clock cycle, one transformed data bit is serially shifted into the MSB register 264q, one bit per clock cycle with the LSB shifted in first. After 16 clock cycles, all 16 data bits are shifted out to routing crossbar 54 and all 16 transformed data bits are shifted into register 264. Every 16 clock cycles, the LD signal loads register 264 with the next data point and loads latch 266 with the transformed data point. The transformed data is stored in latch 266 until it is read out through output buffer 268. Output buffer 268 is selectively enabled such that the transformed data is provided serially from the 16 I/O buffers 52, one transformed data point per clock cycle. The order of the read is controlled by the RD(w) signal which is decoded from READ_ENABLE. The block diagram in FIG. 8 represents one implementation of I/O buffer 52. Other implementations which perform the same functions as described herein can be contemplated and are within the scope of the present invention. Although the present invention has been described in the context of a 2-D IDCT engine, the inventive concept can be extended to other transforms such as discrete Fourier transform (DFT), inverse discrete Fourier transform (IDFT), fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), discrete cosine transform (DCT), and Hadamard transform. Therefore, application of the inventive concept as described herein to other transforms are within the scope of the present invention.
The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A variable block size IDCT processor comprising : a bank of adders for receiving a plurality of input data points and a first control signal, said first control signal commanding said bank of adders to perform additions on selected combinations of input data points; a plurality of stages of butterfly; and a plurality of routing crossbars, one routing crossbar interposed between said bank of adders and a first stage of butterfly and one routing crossbar interposed between successive stages of butterfly; wherein said plurality of stages of butterfly receive a second control signal commanding said plurality of stages of butterfly to perform butterfly operations on selected inputs to said plurality of stages of butterfly.
2. The IDCT processor of claim 1 wherein said bank of adders and said plurality of stages of butterfly are implemented with serial adders and bit-serial multipliers.
3. The IDCT processor of claim 2 further comprising : a bank of I/O buffers for receiving said plurality of input data points in a word serial format and for providing said bank of adders with said input data points in a bit serial format.
4. The IDCT processor of claim 3 wherein said plurality of stages of butterfly are pipelined such that all stages are active concurrently.
5. The IDCT processor of claim 4 wherein said multiplicands of said bit-serial multipliers can be programmed with a mask.
6. A variable block size 2-dimensional IDCT engine comprising : a first IDCT processor, said first IDCT processor receiving input data points; a memory element connected to said first IDCT processor; a second IDCT processor connected to said memory element; and a controller connected to and providing control signals to said first
IDCT processor, said second IDCT processor, and said memory element, said controller receiving input signals and generating control signals in accordance with said input signals.
7. The IDCT engine of claim 6 wherein said IDCT processors comprise : a bank of adders for receiving said plurality of input data points and a first control signal, said first control signal commanding said bank of adders to perform additions on selected combinations of input data points; a plurality of stages of butterfly; and a plurality of routing crossbars, one routing crossbar interposed between said bank of adders and a first stage of butterfly and one routing crossbar interposed between successive stages of butterfly; wherein said plurality of stages of butterfly receive a second control signal commanding said plurality of stages of butterfly to perform butterfly operations on selected inputs to said plurality of stages of butterfly.
8. The IDCT engine of claim 7 wherein said bank of adders and said plurality of stages of butterfly are implemented with serial adders and bit-serial multipliers.
9. The IDCT engine of claim 8 wherein said IDCT processors further comprise : a bank of I/O buffers for receiving said plurality of input data points in a word serial format and for providing said bank of adders with said input data points in a bit serial format.
10. The IDCT engine of claim 9 wherein said IDCT processors are pipelined such that both IDCT processors are active concurrently.
11. The IDCT engine of claim 10 wherein said first stage of butterfly is always enabled.
12. The IDCT engine of claim 11 wherein said multiplicands of said bit-serial multipliers can be programmed with a mask.
13. The IDCT engine of claim 12 wherein said IDCT engine has a throughput rate of one output pixel per clock cycle.
14. The IDCT engine of claim 13 wherein said serial adders and bit- serial multipliers have resolution of greater than 8-bit.
15. The IDCT engine of claim 14 wherein said serial adders and bit- serial multipliers have resolution of 16-bit.
16. The IDCT engine of claim 15 wherein said memory element comprises a transposition memory.
17. An apparatus for performing a variable block size 2- dimensional IDCT transform comprising : first IDCT transform means for performing a 1-dimensional IDCT transform of a plurality of input data points; memory means for storing intermediate results from said first IDCT transform means; and second IDCT transform means for performing a 1-dimensional IDCT transform of said intermediate results; and controller means for providing control signals to said first IDCT transform means, said second IDCT transform means, and said memory means, said controller means receiving input signals and generating said control signals in accordance with said input signals.
18. The apparatus of claim 17 wherein said IDCT transform means comprises : a stage of adder means for receiving said plurality of input data points and a first control signal, said first control signal commanding said adder means to perform additions on selected combinations of input data points; a plurality of stages of butterfly means for performing butterfly operations on pairs of input data; routing means for routing signals between said stage of adder means and said plurality of stages of butterfly means; wherein said plurality of stages of butterfly means receive a second control signal commanding said plurality of stages of butterfly means to perform butterfly operations on selected pairs of input to said plurality of stages of butterfly means.
19. A transform engine comprising : a first transform processor, said first transform processor receiving input data points; a memory element connected to said first processor; a second transform processor connected to said memory memory; and a controller connected to and providing control signals to said first transform processor, said second transform processor, and said memory element, said controller receiving input signals and generating control signals in accordance with said input signals.
20. The transform engine of claim 19 wherein said transform processors comprise : a plurality of stages of butterfly; and a plurality of routing crossbars, one routing crossbar interposed between successive stages of butterfly; wherein said plurality of stages of butterfly receive a second control signal commanding said plurality of stages of butterfly to perform butterfly operations on selected inputs to said plurality of stages of butterfly.
EP98942197A 1997-08-25 1998-08-24 Variable block size 2-dimensional inverse discrete cosine transform engine Withdrawn EP1018082A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US91809097A 1997-08-25 1997-08-25
US918090 1997-08-25
PCT/US1998/017423 WO1999010818A1 (en) 1997-08-25 1998-08-24 Variable block size 2-dimensional inverse discrete cosine transform engine

Publications (1)

Publication Number Publication Date
EP1018082A1 true EP1018082A1 (en) 2000-07-12

Family

ID=25439787

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98942197A Withdrawn EP1018082A1 (en) 1997-08-25 1998-08-24 Variable block size 2-dimensional inverse discrete cosine transform engine

Country Status (5)

Country Link
EP (1) EP1018082A1 (en)
KR (1) KR20010023031A (en)
CN (1) CN1268231A (en)
AU (1) AU9030298A (en)
WO (1) WO1999010818A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003198504A (en) * 2001-12-27 2003-07-11 Mitsubishi Electric Corp Despreading processing method, despread code assignment method, terminal for moving object and base station
JP2003223433A (en) 2002-01-31 2003-08-08 Matsushita Electric Ind Co Ltd Method and apparatus for orthogonal transformation, encoding method and apparatus, method and apparatus for inverse orthogonal transformation, and decoding method and apparatus
US7096245B2 (en) 2002-04-01 2006-08-22 Broadcom Corporation Inverse discrete cosine transform supporting multiple decoding processes
US7725516B2 (en) * 2005-10-05 2010-05-25 Qualcomm Incorporated Fast DCT algorithm for DSP with VLIW architecture
US9110849B2 (en) 2009-04-15 2015-08-18 Qualcomm Incorporated Computing even-sized discrete cosine transforms
CN101605259B (en) * 2009-05-31 2012-11-21 华亚微电子(上海)有限公司 Device and method for transforming coding and decoding for multimedia data
US9069713B2 (en) 2009-06-05 2015-06-30 Qualcomm Incorporated 4X4 transform for media coding
US8762441B2 (en) 2009-06-05 2014-06-24 Qualcomm Incorporated 4X4 transform for media coding
CN101646080B (en) * 2009-06-18 2013-09-25 杭州高特信息技术有限公司 Method for fast switching parallel pipeline IDCT based on AVS and device thereof
US9081733B2 (en) 2009-06-24 2015-07-14 Qualcomm Incorporated 16-point transform for media data coding
US9075757B2 (en) 2009-06-24 2015-07-07 Qualcomm Incorporated 16-point transform for media data coding
US9118898B2 (en) 2009-06-24 2015-08-25 Qualcomm Incorporated 8-point transform for media data coding
US8451904B2 (en) 2009-06-24 2013-05-28 Qualcomm Incorporated 8-point transform for media data coding
CN102065309B (en) * 2010-12-07 2012-12-05 青岛海信信芯科技有限公司 DCT (Discrete Cosine Transform) realizing method and circuit
US9824066B2 (en) * 2011-01-10 2017-11-21 Qualcomm Incorporated 32-point transform for media data coding
US20160044314A1 (en) * 2014-08-08 2016-02-11 Qualcomm Incorporated System and method for reusing transform structure for multi-partition transform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2608808B1 (en) * 1986-12-22 1989-04-28 Efcis INTEGRATED CIRCUIT FOR DIGITAL SIGNAL PROCESSING

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9910818A1 *

Also Published As

Publication number Publication date
AU9030298A (en) 1999-03-16
KR20010023031A (en) 2001-03-26
CN1268231A (en) 2000-09-27
WO1999010818A1 (en) 1999-03-04

Similar Documents

Publication Publication Date Title
US5535288A (en) System and method for cross correlation with application to video motion vector estimator
US4777614A (en) Digital data processor for matrix-vector multiplication
EP1018082A1 (en) Variable block size 2-dimensional inverse discrete cosine transform engine
JP2630802B2 (en) Television transfer system using transform coding
EP0736205B1 (en) Method and apparatus for performing a fast hadamard transform
KR19990044305A (en) A multiplication-addition operation unit
US5867601A (en) Inverse discrete cosine transform processor using parallel processing
US5452466A (en) Method and apparatus for preforming DCT and IDCT transforms on data signals with a preprocessor, a post-processor, and a controllable shuffle-exchange unit connected between the pre-processor and post-processor
US20050004963A1 (en) Parallel adder-based DCT/IDCT design using cyclic convolution
US6574651B1 (en) Method and apparatus for arithmetic operation on vectored data
JPH08235159A (en) Inverse cosine transformation device
US20050125469A1 (en) Method and system for discrete cosine transforms/inverse discrete cosine transforms based on pipeline architecture
US6317767B2 (en) Methods and systems for performing short integer chen IDCT algorithm with fused multiply/add
JP6357345B2 (en) Data processing apparatus and method for performing conversion between spatial domain and frequency domain when processing video data
US5805482A (en) Inverse discrete cosine transform processor having optimum input structure
US6003058A (en) Apparatus and methods for performing arithimetic operations on vectors and/or matrices
US5625713A (en) Apparatus and method for increasing the throughput of an acoustic or image compression system
US6460061B1 (en) 2-dimensional discrete cosine transform using a polynomial transform
JPH07121507A (en) Discrete cosine transforming unit
US5671169A (en) Apparatus for two-dimensional inverse discrete cosine transform
EP0913778A1 (en) A commutator circuit
US5999958A (en) Device for computing discrete cosine transform and inverse discrete cosine transform
JP3575991B2 (en) Orthogonal transform circuit
KR100306745B1 (en) Half-Band Sub-Band DC / ID Circuits Using LACs and Methods
JP3046115B2 (en) Discrete cosine transformer

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000223

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20020301