CN1268231A - Variable block size 2-dimensional inverse discrete cosine transform engine - Google Patents

Publication number
CN1268231A
CN1268231A CN98808477A CN98808477A CN1268231A CN 1268231 A CN1268231 A CN 1268231A CN 98808477 A CN98808477 A CN 98808477A CN 98808477 A CN98808477 A CN 98808477A CN 1268231 A CN1268231 A CN 1268231A
Authority
CN
China
Prior art keywords
idct
butterfly operation
processor
serial
transform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN98808477A
Other languages
Chinese (zh)
Inventor
K·D·伊斯顿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147 Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform

Abstract

A variable block size 2-D IDCT engine (10) which can compute any arbitrary mix of transforms. A first 1-D IDCT processor (20a) computes the transform of the data block by columns and stores the intermediate results in a transposition memory. A second 1-D IDCT processor (20b) computes the transform of the intermediate results by rows. Different mixes of transforms can be performed easily by correctly ordering the input data, selectively combining the input data before the butterfly stages, and controlling the additions and multiplications at each butterfly stage. The unnecessary butterflies are placed in bypass mode. The butterflies can be implemented with serial adders (56) and bit-serial multipliers to greatly simplify the hardware design and minimize the routing requirements between successive butterfly stages. The fully pipelined structure allows the IDCT engine to maintain a throughput rate of one pixel per clock cycle.

Description

Variable block size 2-dimensional inverse discrete cosine transform engine
Background of the invention
I. Field of the invention
The present invention relates to digital signal processing. More particularly, the present invention relates to a novel and improved variable block size 2-dimensional (2-D) inverse discrete cosine transform (IDCT) engine.
II. Description of the related art
The 2-dimensional discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) are important signal processing operations in digital image compression. One application of digital image compression is high-definition television (HDTV). In HDTV, the analog video waveform is sampled and digitized by an analog-to-digital converter (ADC). The resulting sampled data is then digitally processed to reduce the amount of data that must be transmitted and/or stored while maintaining high image quality. In particular, a key element of the compression process is the 2-D discrete cosine transform, in which an N × N block, or image, of sampled data is transformed from the time domain to the frequency domain. The transformed data can be further processed by block codes such as Huffman codes and run-length codes, and/or by error-correcting codes such as convolutional codes and Reed-Solomon codes. An exemplary HDTV image compression scheme is disclosed in three U.S. patents entitled "Adaptive block size image compression method and system", U.S. Pat. Nos. 5,452,104, 5,107,345 and 5,021,891, and in U.S. Pat. No. 5,576,767, entitled "Interframe video encoding and decoding system". All four patents are assigned to the assignee of the present invention and are incorporated herein by reference.
The digitally encoded video waveform is transmitted and/or stored. At the receiver, the inverse digital signal processing is performed to reconstruct each pixel of the original image. The reconstructed image is provided to a digital-to-analog converter (DAC), which converts it back to an analog video waveform that can be displayed on a monitor or television set.
A key element of the decoding process is the inverse discrete cosine transform, which transforms the frequency-domain data back to the time domain. The IDCT engine must operate at a high throughput rate to reconstruct the original image in real time. In addition, the IDCT engine is typically built into consumer products, so cost is a major consideration. The IDCT engine must therefore be designed for high-speed operation with low complexity.
Such digital image compression systems typically process the video signal frame by frame. Each video frame is further divided into N × N blocks of data. In most compression systems the block size is fixed by the system design to simplify the implementation of the DCT and IDCT engines.
Allowing a variable block size can, under certain conditions, enhance the performance of the compression system by permitting better compression of the image and/or improving the quality of the reconstructed image. A variable block size can be used to exploit certain characteristics of the image. In the prior art, variable block size DCT and IDCT engines are designed using a bank of transform processors of different sizes. Each processor computes a transform of a different block size on the same block of data. The outputs of the different processors are then combined into the desired composite transformed block. This approach is inflexible because of the large amount of hardware required and the complexity of coordinating the various hardware modules.
Summary of the invention
The present invention is a novel and improved variable block size 2-dimensional (2-D) inverse discrete cosine transform (IDCT) engine. According to the invention, the N × N block of data is first transformed by columns by a first 1-D IDCT processor. The intermediate results output by the first IDCT processor are temporarily stored in a transposition memory. Once all columns have been processed, the intermediate results are transformed by rows by a second 1-D IDCT processor. The output of the second IDCT processor comprises the transformed output of the IDCT engine.
It is an object of the present invention to provide a 2-D IDCT engine that can compute an arbitrary mix of transforms within an N × N block of data. In the exemplary embodiment, each block can be a 16 × 16 transform or any combination of 8 × 8, 4 × 4 and/or 2 × 2 transforms. In the exemplary embodiment, a 21-bit control signal describes the desired partition exactly and tells the IDCT engine which combination of transforms to compute. In the present invention, different combinations of transforms are readily performed by correctly ordering the input data, selectively combining the data before the butterfly stages, and controlling the additions and multiplications at each butterfly stage. Unnecessary butterflies are placed in bypass mode.
It is another object of the present invention to simplify the design of the 2-D IDCT engine by performing the computation serially. Because only one bit of data is processed at a time, serial adders and bit-serial multipliers greatly simplify the design. Serial computation also greatly simplifies the cross-routing between successive butterfly stages. Because of the pipelined structure of the IDCT engine, the throughput rate is maintained at one transformed point, or pixel, per clock cycle. This is the same throughput rate as for parallel computation. Only the processing delay increases, owing to the serial nature of the computation.
It is yet another object of the present invention to reduce the memory requirement. For the 2-D IDCT, the block is first transformed by columns by a 1-D IDCT processor, and the intermediate results are temporarily stored by columns in the transposition memory. The second 1-D transform is performed only after all columns have been transformed. Because of the pipelined structure of the IDCT engine, intermediate results are written to the memory by columns and read from the memory by rows concurrently. To avoid overwriting memory locations that hold data still needed, the memory is transposed for successive N × N blocks, that is, it alternates between column-major and row-major order. Using a read-modify-write cycle, an intermediate result is read from a memory location and a new result is written to the same location in the same clock cycle. The transposition memory reduces the memory requirement to a single bank the size of one N × N block.
Brief description of the drawings
The features, objects and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding parts throughout and in which:
Figs. 1A, 1B and 1C show, respectively, an exemplary N × N image, the partitioned image, and the tree diagram corresponding to the partitioned image;
Fig. 2 is a block diagram of the exemplary variable block size 2-D IDCT engine of the present invention;
Figs. 3A-3D are diagrams of the 2-point, 4-point, 8-point and 16-point IDCT trellises of the present invention, respectively;
Fig. 4 is a block diagram of the exemplary 1-D IDCT processor of the present invention;
Figs. 5A and 5B are, respectively, an exemplary diagram of the serial butterfly of the present invention and a block diagram of an exemplary embodiment of that serial butterfly;
Figs. 6A and 6B are block diagrams of the exemplary bit-serial multiplier of the present invention, shown at the word level and at the bit level, respectively;
Fig. 7 is a block diagram of the exemplary serial adder of the present invention; and
Fig. 8 is a block diagram of the exemplary I/O buffer of the present invention.
Detailed description of the preferred embodiments
The discrete cosine transform (DCT) and the inverse discrete cosine transform (IDCT) are important, complementary digital signal processing operations. The DCT transforms sampled data from the time domain to the frequency domain according to the following equation:
[Equation (1), the 1-D DCT: equation image A9880847700081 is not reproduced in this text.]
where N is the dimension of the transform, C(0) = 1/√2, and C(k) = 1 for k = 1, 2, 3, ..., N-1. The DCT is usually performed on the sampled data as one of a series of digital signal processing operations. The transformed data is then subjected to other operations, including quantization, data compression and error-correction coding. The aforementioned U.S. Pat. No. 5,452,104 discusses exemplary digital image compression in detail.
The IDCT transforms the data from the frequency domain back to the time domain according to the following equation:
[Equation (2), the 1-D IDCT: equation image A9880847700091 is not reproduced in this text.]
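Because the two equation images are missing here, the following is only a hedged reconstruction. It shows one common way to write a 1-D DCT/IDCT pair that is consistent with the definitions of N and C(k) given in the text. The 2/N normalization and its placement in the forward transform are assumptions and are not taken from the source.

```latex
\begin{align}
X(k) &= \frac{2}{N}\, C(k) \sum_{n=0}^{N-1} x(n)\,
        \cos\frac{(2n+1)k\pi}{2N}, & k &= 0, 1, \dots, N-1 \tag{1}\\
x(n) &= \sum_{k=0}^{N-1} C(k)\, X(k)\,
        \cos\frac{(2n+1)k\pi}{2N}, & n &= 0, 1, \dots, N-1 \tag{2}
\end{align}
```

With C(0) = 1/√2 and C(k) = 1 otherwise, applying (2) to the output of (1) returns the original samples x(n).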
The DCT and the IDCT are separable transforms. This means that a 2-D transform can be decomposed into two 1-D transforms. A 2-D IDCT of a block can be performed by first taking the 1-D IDCT of each column of the block. The intermediate results of the first IDCT are temporarily stored in a memory element. A second IDCT is then taken of each row of the intermediate results. The output of the second IDCT comprises the reconstructed pixels of the original image.
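The separability described above can be sketched in a few lines of Python. This is a floating-point illustration, not the patented hardware: the function names are hypothetical, and the IDCT normalization (C(k) applied inside the sum, no 2/N factor) is an assumption because the equation image is not available.

```python
import math


def idct_1d(coeffs):
    """Direct 1-D IDCT: x(n) = sum_k C(k) X(k) cos((2n+1) k pi / (2N)).

    The normalization is an assumption (see the lead-in above).
    """
    n_pts = len(coeffs)
    out = []
    for n in range(n_pts):
        acc = 0.0
        for k, value in enumerate(coeffs):
            scale = 1.0 / math.sqrt(2.0) if k == 0 else 1.0  # C(k)
            acc += scale * value * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
        out.append(acc)
    return out


def transpose(block):
    """Swap rows and columns of a square block given as a list of rows."""
    return [list(row) for row in zip(*block)]


def idct_2d(block):
    """2-D IDCT as a column pass followed by a row pass."""
    # Column pass: transform every column of the input block.
    columns = [idct_1d(col) for col in transpose(block)]
    # Re-index the intermediate results by row.
    intermediate = transpose(columns)
    # Row pass: transform every row of the intermediate results.
    return [idct_1d(row) for row in intermediate]
```

Running the row pass first and the column pass second gives the same result, which is what separability means in practice.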
Referring to the drawings, Fig. 1A shows an exemplary block of data. Block 2 has dimension N × N, where N is a power of 2, that is, N = 2^x with x an integer 1, 2, 3, ... When this condition is met, equations (1) and (2) can both be simplified considerably. In the exemplary embodiment N equals 16, but the invention extends readily to other values of N.
Fig. 2 shows an exemplary block diagram of 2-D IDCT engine 10 of the present invention. In the exemplary embodiment, the input block, comprising IDCT coefficients, is provided to IDCT processor 20a by columns. IDCT processors 20a and 20b are identical 1-D IDCT processors that transform the input data according to equation (2). The intermediate results output by IDCT processor 20a are provided to memory element 22, where they are temporarily stored by columns. The intermediate results are subsequently provided to IDCT processor 20b by rows. IDCT processor 20b performs the 1-D IDCT and provides the transformed output, that is, the reconstructed image, to a subsequent digital signal processing block (not shown in Fig. 2). In the exemplary embodiment, the input block is provided to IDCT processor 20a by columns and the intermediate results are provided to IDCT processor 20b by rows. Alternatively, the block can be provided to IDCT processor 20a by rows and the intermediate results to IDCT processor 20b by columns. In the exemplary embodiment, IDCT processors 20a and 20b are pipelined so that both IDCT processors 20 are active simultaneously.
An important property of the IDCT is that a larger transform can be created by arranging the input data points, computing sums of selected combinations of the data points, and performing serial butterflies on the outputs of two smaller transforms. The serial butterfly is an operation described in detail below. Thus a 16-point IDCT is a butterfly of two 8-point IDCTs, each 8-point IDCT is a butterfly of two 4-point IDCTs, and each 4-point IDCT is a butterfly of two 2-point IDCTs. This property of the IDCT is well known in the art and is best illustrated by trellis diagrams. Fig. 3A shows the trellis of a 2-point IDCT, Fig. 3B that of a 4-point IDCT, and Fig. 3C that of an 8-point IDCT. The 2-point IDCT comprises a single butterfly stage. As Figs. 3A-3B show, the 4-point IDCT comprises a stage of two 2-point IDCTs, a serial add stage preceding the 2-point IDCT stage, and a butterfly stage following it. Likewise, as Figs. 3B-3C show, the 8-point IDCT comprises a stage of two 4-point IDCTs, a serial add stage preceding the 4-point IDCT stage, and a butterfly stage following it.
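The recursive structure described above (reorder the inputs, add selected inputs, run two half-size transforms, then apply a butterfly stage) can be sketched as follows. This is a floating-point sketch of the well-known recursive IDCT decomposition attributed to B. G. Lee, not the exact trellis of Fig. 3D. The function names are hypothetical, and the normalization (C(0) = 1/√2 folded into the first input, no 2/N factor) is an assumption.

```python
import math


def idct_core(y):
    """Return f(n) = sum_k y[k] * cos((2n+1) k pi / (2N)) by halving N."""
    n_pts = len(y)
    if n_pts == 1:
        return [y[0]]
    half = n_pts // 2
    # Input ordering: even-indexed inputs feed the upper half-size transform.
    g_in = [y[2 * k] for k in range(half)]
    # Add stage: neighbouring odd-indexed inputs are summed for the lower one.
    h_in = [y[2 * k + 1] + (y[2 * k - 1] if k > 0 else 0.0) for k in range(half)]
    g = idct_core(g_in)
    h = idct_core(h_in)
    out = [0.0] * n_pts
    for n in range(half):
        # Butterfly: scale the lower path by 1 / (2 cos(.)), then add and subtract.
        scale = 1.0 / (2.0 * math.cos((2 * n + 1) * math.pi / (2 * n_pts)))
        out[n] = g[n] + scale * h[n]
        out[n_pts - 1 - n] = g[n] - scale * h[n]
    return out


def idct_fast(coeffs):
    """N-point IDCT (N a power of 2) with C(0) = 1/sqrt(2) folded into the input."""
    y = [float(v) for v in coeffs]
    y[0] /= math.sqrt(2.0)
    return idct_core(y)
```

For a 16-point input the recursion passes through 8-point, 4-point and 2-point transforms, mirroring the nesting of transforms described in the text. The result agrees with the direct sum to within floating-point rounding.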
Fig. 3D shows a diagram of 16-point IDCT trellis 100 of the present invention. Trellis 100 is derived from B. G. Lee and is described in detail in the book by K. R. Rao, "Discrete Cosine Transform: Algorithms, Advantages, Applications" (Academic Press, 1990). 16-point IDCT trellis 100 comprises three stages of serial adds and four stages of serial butterflies, each butterfly stage comprising eight serial butterflies. In prior-art IDCT processors, the interconnections between successive stages are fixed, which restricts the IDCT processor to performing 16-point IDCTs only. In the present invention, the stages are interconnected by a reconfigurable trellis cross-connect.
As shown in Fig. 3D, 16-point IDCT 110 is a butterfly of two 8-point IDCTs 108, with the data points to the lower 8-point IDCT selectively combined. The two 8-point IDCTs 108 are butterflies of four 4-point IDCTs 106, with the data points to the lower 4-point IDCTs selectively combined. The four 4-point IDCTs 106 are butterflies of eight 2-point IDCTs 104, with the data points to the lower 2-point IDCTs selectively combined. The reconfigurable trellis cross-connect, together with the bypass mode of the serial butterflies, allows 2-D IDCT engine 10 of the present invention to compute any mix of transforms within an N × N block. Any combination of 2-point, 4-point, 8-point and 16-point IDCTs is accomplished by correctly ordering the input data, selectively combining the input data before the butterfly stages, and controlling the additions and multiplications at each stage of the trellis. For example, IDCT processor 20 can perform two 8-point transforms; eight 2-point transforms; one 8-point transform and two 4-point transforms; or one 8-point, one 4-point and two 2-point transforms. In the present invention no combining stage is needed to assemble the different transform outputs into a composite transformed block, because this happens automatically when IDCT engine 10 is configured for the proper mix of transforms. When smaller transforms are computed, the serial adders and butterflies that are not needed for the higher-order transforms act as delay latches. Thus the outputs of each IDCT processor 20 are always time-aligned regardless of the transform mix.
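To make "transform mix" concrete, here is a minimal functional model, assuming a direct floating-point IDCT and hypothetical function names. It only shows what result a given mix should produce for one 16-point column or row. It does not model the input reordering, the selective additions, the bypass mode or the timing of the trellis.

```python
import math


def idct_1d(coeffs):
    """Direct 1-D IDCT; the normalization is an assumption."""
    n_pts = len(coeffs)
    return [
        sum(
            (1.0 / math.sqrt(2.0) if k == 0 else 1.0) * coeffs[k]
            * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
            for k in range(n_pts)
        )
        for n in range(n_pts)
    ]


def idct_mix(points, sizes):
    """Apply independent IDCTs of the given sizes to consecutive segments."""
    if sum(sizes) != len(points):
        raise ValueError("segment sizes must add up to the number of points")
    if any(size not in (2, 4, 8, 16) for size in sizes):
        raise ValueError("only 2-, 4-, 8- and 16-point transforms are modelled")
    out = []
    start = 0
    for size in sizes:
        out.extend(idct_1d(points[start:start + size]))
        start += size
    return out
```

For example, `idct_mix(column, [8, 4, 2, 2])` corresponds to the "one 8-point, one 4-point and two 2-point transforms" case mentioned in the text.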
In the exemplary embodiment, a serial butterfly operates on two input bit streams and provides two output bit streams. The serial butterfly comprises one bit-serial multiplier and two serial adders, which is a considerable simplification. The serial architecture of IDCT processor 20 allows the cross-routing between successive serial butterfly stages to be implemented with 1-bit-wide data buses.
In the exemplary embodiment, IDCT engine 10 computes the transform of a 16 × 16 block in 256 clock cycles. Each clock cycle, one IDCT coefficient is provided to IDCT engine 10 and one output pixel is taken from IDCT engine 10. IDCT processors 20a and 20b are pipelined so that both processors are active simultaneously. Each IDCT processor 20 receives one input data point and provides one transformed data point per clock cycle.
I. IDCT processor
Fig. 4 shows an exemplary block diagram of IDCT processor 20 of the present invention. Every 16 clock cycles (N = 16), the sixteen I/O buffers 52 receive 16 input data points, one data point per clock cycle and one data point per I/O buffer 52. The order in which the data points are loaded into I/O buffers 52 depends on the transform mix being performed and is controlled by controller 26 through the 4-bit WRITE_ENABLE signal. Each data point comprises q bits, which are loaded in parallel into the corresponding I/O buffer 52 according to the WRITE_ENABLE signal. I/O buffers 52 then shift the data out serially, least significant bit (LSB) first and one bit per clock cycle, so that the 16 data points move together through cross-routing 54 to serial adders 56. I/O buffer 52 can be implemented as a parallel-to-serial shift register, as described below.
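The LSB-first serialization can be illustrated with a small software model. The function names are hypothetical, and the two's-complement word format is an assumption. The serial adder shown is the textbook one-bit adder with a carry register. It is included only to show why LSB-first order suits serial arithmetic, and it is not a description of the adder of Fig. 7.

```python
def to_serial_lsb_first(value, width):
    """Return the bits of a two's-complement word, least significant bit first."""
    word = value & ((1 << width) - 1)
    return [(word >> i) & 1 for i in range(width)]


def from_serial_lsb_first(bits):
    """Rebuild the signed value from an LSB-first list of bits."""
    word = sum(bit << i for i, bit in enumerate(bits))
    if bits[-1]:                      # the last bit received is the sign bit
        word -= 1 << len(bits)
    return word


def serial_add(bits_a, bits_b):
    """Add two equal-length LSB-first bit streams using a 1-bit carry register."""
    carry = 0
    out = []
    for a, b in zip(bits_a, bits_b):  # one loop iteration per clock cycle
        total = a + b + carry
        out.append(total & 1)
        carry = total >> 1
    return out                        # the final carry is dropped (wrap-around)
```

Because the carry always moves from lower to higher bit positions, a sum bit can be emitted as soon as the matching input bits arrive. This is why only a 1-bit datapath is needed between stages.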
Serial adders 56 receive the data bits and perform serial additions on them in the manner described below. Serial adders 56 are enabled by ADD_ENABLE, which in the exemplary embodiment comprises 7 bits and corresponds to the first three stages of additions shown in trellis 100 of Fig. 3D. Each serial addition is represented by a small circle 112 (for simplicity, only one circle is labeled). The first stage has seven serial additions 112, which require 4 bits to enable or disable. The second stage has two groups of three serial additions 112, which require 2 bits of control. The third stage has four groups of a single serial addition 112, which require 1 bit of control. Under control of the 7-bit ADD_ENABLE signal, serial adders 56 compute the serial additions 112 required by the first three stages of trellis 100 in Fig. 3D. The bank of 16 serial adders 56 in Fig. 4 symbolically represents the function required by the first three stages of trellis 100.
The outputs of serial adders 56 are provided to first stage 58 of eight serial butterflies, which performs the function shown in the eight 2-point IDCTs 104 of Fig. 3D. Each serial butterfly is shown in the block diagram of Fig. 5B. A serial butterfly receives two serial input streams X1 and X2 and generates two serial outputs Z1 = X1 + C·X2 and Z2 = X1 - C·X2, where C is a fixed scalar defined by the position of the butterfly in IDCT trellis 100. An exemplary embodiment of the serial butterfly is described in detail below. In the exemplary embodiment, first stage 58 of serial butterflies is always enabled, so that IDCT processor 20 performs at least 2-point transforms. The outputs of first stage 58 are provided through cross-routing 60 to second stage 62 of serial butterflies. Cross-routing 60 interconnects the first two stages as shown in IDCT trellis 100. In the exemplary embodiment, second stage 62 of serial butterflies can be selectively enabled to provide 4-point transforms. Since there are four groups of butterflies (see Fig. 3D), 4 control bits are needed to enable each group individually. These 4 control bits are part of the control signal labeled MAP in Fig. 4.
The outputs of second stage 62 of serial butterflies are provided through cross-routing 64 to third stage 66. Third stage 66 comprises two groups of four serial butterflies, as shown by the two 8-point IDCTs 108 in Fig. 3D. Each group can be enabled individually by 2 control bits. The outputs of third stage 66 are provided through cross-routing 68 to fourth stage 70. Fourth stage 70 comprises one group of eight serial butterflies, as shown by 16-point IDCT 110 in Fig. 3D. These serial butterflies can be selectively enabled by 1 control bit. The serially transformed data output by fourth stage 70 comprises the output of IDCT processor 20.
The 1-bit serial transformed data output by fourth stage 70 is routed to a bank of serial-to-parallel output buffers. In the exemplary embodiment, IDCT processor 20 provides the IDCT output word-serially, one output word per clock cycle. The output buffers can be combined with the input buffers to form I/O buffers 52, described in detail below.
II. Controller
Referring to Fig. 2, controller 26 provides control signals to IDCT processors 20a and 20b and to memory element 22. These control signals synchronize IDCT processors 20a and 20b with memory element 22 and determine the composite image that is reconstructed. Controller 26 receives an address input and a PQR input. The address input informs controller 26 of the starting position of the block. The PQR input comprises three commands, P, Q and R, which inform controller 26 of the desired partition of the block. In the exemplary embodiment, R equal to "1" indicates that the 16 × 16 block is to be divided into smaller 8 × 8 transform blocks, Q equal to "1" indicates that an 8 × 8 block is to be divided into smaller 4 × 4 transform blocks, and P equal to "1" indicates that a 4 × 4 block is to be divided into smaller 2 × 2 transform blocks. In the exemplary embodiment, each block can be partitioned independently of the other blocks in the image. Thus 1 control bit is needed for R, because a 16 × 16 block contains only one 16 × 16 transform block; 4 control bits are needed for Q, because a 16 × 16 block can contain four 8 × 8 transform blocks; and 16 control bits are needed for P, because a 16 × 16 block can contain sixteen 4 × 4 transform blocks. Fig. 1B shows an exemplary partition of block 4, and Fig. 1C shows an exemplary view, such as a tree diagram, of the PQR control corresponding to that partition. The 21 PQR control bits can be provided to controller 26 serially or in parallel.
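The 1 + 4 + 16 = 21 control bits form a three-level quadtree. The sketch below decodes them into a list of transform blocks. The function name, the raster ordering of quadrants and the indexing of the Q and P bits are all assumptions made for illustration, because the source does not specify a bit ordering.

```python
def decode_pqr(r_bit, q_bits, p_bits):
    """Return the transform blocks of one 16x16 block as (row, col, size) tuples.

    Assumed (hypothetical) ordering: q_bits[i] refers to 8x8 quadrant i in
    raster order, and p_bits[4 * i + j] to 4x4 sub-block j of quadrant i.
    Q and P bits below an unsplit parent are ignored.
    """
    if len(q_bits) != 4 or len(p_bits) != 16:
        raise ValueError("expected 4 Q bits and 16 P bits")
    if not r_bit:
        return [(0, 0, 16)]                      # one 16x16 transform
    blocks = []
    for i in range(4):
        r8, c8 = 8 * (i // 2), 8 * (i % 2)
        if not q_bits[i]:
            blocks.append((r8, c8, 8))           # quadrant stays 8x8
            continue
        for j in range(4):
            r4, c4 = r8 + 4 * (j // 2), c8 + 4 * (j % 2)
            if not p_bits[4 * i + j]:
                blocks.append((r4, c4, 4))       # sub-block stays 4x4
                continue
            for m in range(4):                   # split into four 2x2 blocks
                blocks.append((r4 + 2 * (m // 2), c4 + 2 * (m % 2), 2))
    return blocks
```

Whatever the bit pattern, the decoded blocks tile the 16 × 16 area exactly once, so their areas always sum to 256.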
The PQR input is a 2-D representation of the desired block partition. Controller 26 parses the PQR input into 1-D row and column control signals. These row and column control signals are then used to generate the control signals that command IDCT processors 20a and 20b to perform the corresponding transform mix. For the exemplary partition 4 shown in Fig. 1B, controller 26 commands IDCT processor 20a to perform two 4-point transforms and one 8-point transform on the first four columns of data. Controller 26 then commands IDCT processor 20a to perform one 4-point transform, two 2-point transforms and one 8-point transform on the next two columns of data. Processing continues until all columns have been processed. The intermediate results output by IDCT processor 20a are stored by columns in memory element 22.
Controller 26 likewise commands IDCT processor 20b to perform the corresponding transform mix on each row of intermediate results output by memory element 22. After all columns have been processed by IDCT processor 20a, controller 26 commands IDCT processor 20b to perform two 4-point transforms and one 8-point transform on the first four rows of intermediate results. For the next two rows, controller 26 commands IDCT processor 20b to perform one 4-point transform, two 2-point transforms and one 8-point transform. Processing again continues until all rows have been processed.
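The step from a 2-D partition to per-column and per-row transform mixes can be sketched as below. The function names are hypothetical. The example partition in the usage note is also hypothetical: Fig. 1B is not available here, so it was chosen only to reproduce the mixes quoted in the text.

```python
def column_mix(blocks, col):
    """List the 1-D transform sizes down one column, from top to bottom.

    `blocks` holds (row, col, size) tuples that tile the 16x16 block.
    """
    hits = [(r0, size) for r0, c0, size in blocks if c0 <= col < c0 + size]
    return [size for _, size in sorted(hits)]


def row_mix(blocks, row):
    """List the 1-D transform sizes along one row, from left to right."""
    hits = [(c0, size) for r0, c0, size in blocks if r0 <= row < r0 + size]
    return [size for _, size in sorted(hits)]


# Hypothetical partition (not taken from Fig. 1B).
EXAMPLE_BLOCKS = [
    (0, 0, 4), (4, 0, 4), (0, 4, 4),
    (4, 4, 2), (4, 6, 2), (6, 4, 2), (6, 6, 2),
    (8, 0, 8), (0, 8, 8), (8, 8, 8),
]
```

With this partition, `column_mix(EXAMPLE_BLOCKS, 0)` gives `[4, 4, 8]` and `column_mix(EXAMPLE_BLOCKS, 4)` gives `[4, 2, 2, 8]`, matching the two mixes named in the text. `row_mix` gives the same lists for rows 0 and 4.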
Referring to Fig. 4, the control signals generated by controller 26 for IDCT processors 20a and 20b comprise WRITE_ENABLE, READ_ENABLE, ADD_ENABLE and MAP. WRITE_ENABLE controls the writing of input data points to the corresponding I/O buffers 52 so that the input data points are arranged in the proper order (see Fig. 3D). READ_ENABLE controls the order in which transformed data is read from IDCT processor 20. In the exemplary embodiment, the transformed data can be read from IDCT processor 20 sequentially. ADD_ENABLE controls the first bank of serial adders 56, which perform the additions in the first three stages of trellis 100. ADD_ENABLE depends on the required transform mix and is generated from the PQR input. MAP controls the last three stages 62, 66 and 70 of serial butterflies to produce the required transform mix. MAP is also generated from the PQR input. Second stage 62 requires 4 control bits to enable or disable each of the four groups of butterflies individually (see Fig. 3D). Likewise, third stage 66 requires 2 control bits and fourth stage 70 requires 1 control bit. In the exemplary embodiment, first stage 58 requires no control signal, because IDCT processor 20 always performs at least 2-point transforms. If desired, however, a control signal can be generated to provide a bypass of first stage 58. Because the 2-D transform of the present invention is performed as two 1-D transforms in series, controller 26 delays the control signals to IDCT processor 20b relative to those to IDCT processor 20a, so that the control signals remain synchronized with the input data.
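As an illustration of how the 4 + 2 + 1 MAP bits might follow from a 1-D transform mix, consider the sketch below. The function name and the mapping rule are assumptions: group g of a stage is taken to cover 16-point positions in natural order, and a group is enabled when the transform covering it is at least as large as the transform that stage completes. The actual bit assignment depends on the data ordering in Fig. 3D, which is not reproduced here.

```python
def map_bits(sizes):
    """Derive hypothetical MAP enables from a 1-D mix of transform sizes.

    Returns (stage2, stage3, stage4): four bits for the 4-point groups,
    two bits for the 8-point groups and one bit for the 16-point group.
    """
    size_at = []                                  # transform size covering each point
    for size in sizes:
        if size not in (2, 4, 8, 16) or len(size_at) % size != 0:
            raise ValueError("sizes must be 2, 4, 8 or 16 and aligned to their own length")
        size_at.extend([size] * size)
    if len(size_at) != 16:
        raise ValueError("sizes must add up to 16")
    stage2 = [int(size_at[4 * g] >= 4) for g in range(4)]
    stage3 = [int(size_at[8 * g] >= 8) for g in range(2)]
    stage4 = int(size_at[0] == 16)
    return stage2, stage3, stage4
```

Under these assumptions the mix `[4, 2, 2, 8]` enables the first, third and fourth 4-point groups and the second 8-point group, and leaves the 16-point stage in bypass.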
Controller 26 can be implemented as a combination of combinational logic and a state machine. Alternatively, controller 26 can be implemented with a microcontroller or microprocessor running microcode. All implementations of controller 26 that perform the functions described herein are within the scope of the present invention.
III. Transposition memory
In the exemplary embodiment, memory element 22 can be implemented as a transposition memory. The 2-D transform is accomplished by performing a 1-D transform on each column of the input block, storing the intermediate results, and performing a 1-D transform on each row of the intermediate results. The 1-D transform on the rows is not performed until all columns have been transformed. In the exemplary embodiment, the two 1-D transforms are pipelined so that they operate concurrently.
Memory element 22 can be implemented by the memory block shown in Fig. 1A. Assume that the intermediate results output by IDCT processor 20a are initially written to memory element 22 by columns. After IDCT processor 20a has operated on all columns, IDCT processor 20b operates on the rows of intermediate results. Once the last column of memory element 22 has been filled, the intermediate results are provided to IDCT processor 20b by rows. Because of the pipelined structure, however, IDCT processor 20a provides one column of data for each row of data retrieved by IDCT processor 20b. This column of data cannot overwrite a previous column, because IDCT processor 20b still needs some of the data points in the previous columns. To solve this problem, the new column of intermediate results is written over the row of data that IDCT processor 20b has just retrieved. In practice, memory element 22 can be implemented with read-modify-write capability, so that the same memory location can be read and written in the same clock cycle. In one clock cycle, a data point can be read from a location of memory element 22 by IDCT processor 20b and a new data point written to the same location by IDCT processor 20a. Implemented in this manner, memory element 22 is transposed, that is, it alternates between row-major and column-major order for successive 16 × 16 blocks. This transposition reduces the memory requirement to a single memory bank.
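The alternating row-major and column-major addressing can be checked with a short simulation. This is a behavioural sketch with a hypothetical function name. Each read and the write to the same cell stand for one read-modify-write cycle, and timing is modelled only at the level of "one row read and one column written per pass".

```python
def run_transposition_memory(blocks):
    """Stream N x N blocks through a single N x N memory.

    `blocks` is a list of blocks, each a list of rows, as produced by the
    column processor. Returns the rows seen by the row processor.
    """
    n = len(blocks[0])
    mem = [[None] * n for _ in range(n)]
    transposed = False        # True: stored element (i, j) sits at physical (j, i)
    read_back = []
    for index in range(len(blocks) + 1):          # one extra pass drains the memory
        incoming = blocks[index] if index < len(blocks) else None
        rows_out = []
        for i in range(n):
            row = []
            for j in range(n):
                # Physical location of stored element (row i, column j).
                pr, pc = (j, i) if transposed else (i, j)
                row.append(mem[pr][pc])           # read the old intermediate result
                if incoming is not None:
                    # Write element j of column i of the new block into the
                    # cell that was just read.
                    mem[pr][pc] = incoming[j][i]
            rows_out.append(row)
        if index > 0:
            read_back.append(rows_out)
        transposed = not transposed               # orientation flips every block
    return read_back
```

Each new column lands exactly on the row that has just been read, so no stored data is lost and a single N × N bank is enough. The price is that the physical orientation of the stored block flips from one block to the next.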
Memory element 22 is operated as a transposition memory by control signals provided by controller 26. Controller 26 has the required timing information and keeps IDCT processors 20a and 20b and memory element 22 synchronized with the input block.
Memory element 22 can be implemented using one or any number of storage elements well known in the art, such as RAM devices, latches or other types of memory devices.
IV. Serial butterfly
Figs. 5A and 5B illustrate the serial butterfly. Fig. 5A is an exemplary diagram of the serial butterfly, and Fig. 5B is a block diagram of the same serial butterfly. Serial butterfly 140 operates on two inputs, X1 and X2. Input X1 is delayed by delay element 148 so that the upper and lower signal paths are time-aligned. Input X2 is scaled by 1/(2·C_n^k) by bit-serial multiplier 150, where C_n^k denotes cos(kπ/n). The outputs of delay element 148 and multiplier 150 are provided to serial adders 160a and 160b. Serial adder 160a adds the output of multiplier 150 to the output of delay element 148, and serial adder 160b subtracts the output of multiplier 150 from the output of delay element 148. The outputs of serial adders 160a and 160b comprise the serial butterfly outputs Z1 and Z2, respectively. In the present invention, serial adders 160a and 160b are designed so that they can be disabled, allowing Y1 and Y2 to pass through as Z1 and Z2, respectively. In the exemplary embodiment, serial butterfly 140 operates on two input bit streams and provides two output bit streams.
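A word-level, floating-point model of one butterfly is sketched below. The function name and the `enabled` flag are hypothetical. The bypass behaviour is an assumption: the text names the passed-through signals Y1 and Y2 without defining them in this excerpt, so the sketch simply passes both inputs through unchanged. Bit-serial timing and the delay element are not modelled.

```python
import math


def butterfly(x1, x2, n, k, enabled=True):
    """Return (z1, z2) for one butterfly with scale 1 / (2 * cos(k * pi / n))."""
    if not enabled:
        # Assumed bypass behaviour: both inputs pass through unchanged.
        return x1, x2
    scaled = x2 / (2.0 * math.cos(k * math.pi / n))   # multiplier path
    return x1 + scaled, x1 - scaled                   # adder and subtractor paths
```

For example, with n = 4 and k = 1 the scale factor is 1/√2, so `butterfly(X0 / math.sqrt(2), X1, 4, 1)` yields the two outputs of a 2-point IDCT under the normalization assumed here.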
Fig. 6A and Fig. 6B show exemplary block diagrams of bit-serial multiplier 150. Fig. 6A shows bit-serial multiplier 150 at the word level, and Fig. 6B shows the same multiplier 150 at the bit level. The bit-serial multiplication of X by C is performed by repeatedly adding C to the intermediate product and shifting the result by one bit, as illustrated by the block diagram in Fig. 6A. Latch 212 is cleared by the LD signal, which is active for one cycle in every 16 clock cycles, to prepare latch 212 for the next multiplication. The LD signal also loads parallel-to-serial shift register 214 with the output of adder 210, which is the product of the multiplication just completed. This product is then shifted serially out of register 214 during the next multiplication.
In the exemplary embodiment, the input data X, the constant C, and the resulting product Y each have 16 bits of precision. The arithmetic error resulting from 16-bit precision is below the limit specified in IEEE Standard 1180-1990, "IEEE Standard Specifications for the Implementations of 8×8 Inverse Discrete Cosine Transform". The 16-bit representation can comprise 1 sign bit, 9 magnitude bits, and 6 fractional bits. Other representations with fewer or more than 16 bits can be implemented and are within the scope of the present invention.
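To make the 16-bit format concrete, the sketch below converts between real values and a 16-bit word with 6 fractional bits. It assumes a two's complement interpretation, so that the sign bit and 9 magnitude bits cover the integer range -512 to 511. The helper names `to_fixed` and `from_fixed`, and the choice to saturate out-of-range values, are assumptions for this example.

```python
FRAC_BITS = 6    # fractional bits in the 16-bit word
WIDTH = 16       # 1 sign bit + 9 magnitude bits + 6 fractional bits

def to_fixed(value):
    """Quantize a real value to a signed 16-bit word, saturating on overflow."""
    scaled = round(value * (1 << FRAC_BITS))
    lowest = -(1 << (WIDTH - 1))
    highest = (1 << (WIDTH - 1)) - 1
    return max(lowest, min(highest, scaled))

def from_fixed(word):
    """Convert a signed 16-bit word back to a real value."""
    return word / (1 << FRAC_BITS)
```

With this layout one least significant bit corresponds to 1/64, so `to_fixed(1.5)` is 96 and `from_fixed(96)` is 1.5.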
In the exemplary embodiment, adder 210, latch 212, and register 214 are all 16 bits wide. On each clock cycle, one bit of X is shifted into bit-serial multiplier 150, least significant bit first. Depending on the value of the input bit and the LD signal, the constant C is added to the intermediate product stored in latch 212. In logic circuit 200, AND gates 204 determine, based on the input bit and the LD signal, whether C is added to the intermediate product. The intermediate product output by adder 210 is then shifted by one bit and stored back into latch 212 as bits D[14..0]. The least significant bit of the adder 210 output is discarded, and the most significant bit of latch 212 is sign-extended, e.g., D[15] = Co[15], where Co[15] is the carry output of the most significant bit of adder 210. As shown in Fig. 6A, bit-serial multiplier 150 can be implemented with a modest amount of hardware, such as adders, which makes it compact for IC design.
Bit-serial multiplier 150 is shown in further detail in Fig. 6B, in which adder 210, latch 212, and register 214 are drawn at the bit level. Depending on the value of input bit X and the LD signal, the constant C is added to the intermediate product stored in latch 212. Each adder 210 receives a carry input (Ci) from the next less significant bit position and provides a carry output (Co) to the adder 210 of the next more significant bit position; this is the standard carry chain of an adder.
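The shift-and-add scheme described above can be modelled at the word level as follows. This is a minimal sketch under stated assumptions, not the circuit of Fig. 6A or Fig. 6B. In particular, treating the last input bit as a sign bit with negative weight is an assumption of this example (the text does not say how signed inputs are handled), Python integers stand in for the 16-bit adder and latch, and the function name `bit_serial_multiply` is hypothetical.

```python
def bit_serial_multiply(x, c, width=16):
    """Multiply x by the constant c, consuming x one bit per cycle, LSB first.

    x and c are signed integers in the width-bit two's complement range.
    The return value is the adder output on the final cycle, which equals
    (x * c) >> (width - 1) with Python's flooring shift.
    """
    acc = 0      # intermediate product held in the latch
    total = 0    # adder output for the current cycle
    for i in range(width):
        bit = (x >> i) & 1
        if not bit:
            total = acc
        elif i == width - 1:
            total = acc - c   # assumption: the sign bit of x carries negative weight
        else:
            total = acc + c
        acc = total >> 1      # drop the LSB and sign-extend (arithmetic shift)
    return total
```

Because every shift discards a bit by flooring, the result is biased slightly toward negative infinity relative to the exact product.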
Simply discarding the least significant bit produces a slight negative bias in the two's complement output product. This negative bias can be compensated by adding in the least significant bit with the adder that precedes final-stage adder 210a, which produces a positive bias of half a least significant bit in the output product. By alternating between truncation and positive bias in successive multipliers 150, the overall bias can be reduced. The bias can be controlled with a ROUND signal that is hard-wired high or low (e.g., to ground) according to the desired result.
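The direction of the two biases can be checked with a small numerical experiment. The sketch below considers only a single discarded bit, so it shows the sign of each bias and how alternating the two modes cancels them; it does not reproduce the exact magnitudes that arise in multiplier 150. All names in it are hypothetical.

```python
def truncate(v):
    """Discard the least significant bit (arithmetic shift, rounds down)."""
    return v >> 1

def round_up(v):
    """Add one least significant bit before discarding it."""
    return (v + 1) >> 1

def mean_error(shift_fn, values):
    """Average of (shifted result - exact half) over the given integers."""
    return sum(shift_fn(v) - v / 2 for v in values) / len(values)

values = list(range(-8, 8))
truncation_bias = mean_error(truncate, values)   # negative
rounding_bias = mean_error(round_up, values)     # positive
```

Over a symmetric range of inputs the two biases have equal magnitude and opposite sign, which is why alternating them in successive stages reduces the accumulated offset.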
Fig. 7A shows an exemplary block diagram of serial adder 160. Serial adder 160 receives two inputs, Y1 and Y2, serially, least significant bit first. Serial adder 160 can add the two inputs (Y1+Y2), subtract one input from the other (Y1-Y2), or bypass one input to the output (Z=Y2). Whether addition or subtraction is performed depends on the position of serial adder 160 within the IDCT trellis, e.g., whether serial adder 160 is on the upper or the lower branch of a butterfly. The bypass mode allows IDCT processor 20 of the present invention to perform a mix of different transforms.
Inputs Y1 and Y2 are provided serially to AND gate 240 and exclusive-OR (XOR) gate 242, respectively. ADD_EN is also provided to AND gate 240. When ADD_EN is low, the output of AND gate 240 is low and Y1 is not provided to adder 244. When ADD_EN is high, Y1 is provided to adder 244. The INVERT signal is provided to XOR gate 242 and register 246. To perform a subtraction, input Y2 is negated and added to the other operand. Negating a two's complement number requires inverting all bits of the original number and adding "1" at the least significant bit. When the INVERT signal is high, XOR gate 242 conveniently inverts each bit. When the LD signal is active and the INVERT signal is high, a "1" is loaded as the initial carry of the serial adder and is added through the carry input (Ci) of adder 244, so that "1" is added at the least significant bit of the input number.
On each subsequent clock cycle, the carry output (Co) of the previous one-bit addition is stored in register 246. This carry is added to the next pair of bits of inputs Y1 and Y2. The sum output S of adder 244 is the output of serial adder 160.
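The add, subtract, and bypass modes of the serial adder can be sketched bit by bit as below. The sketch follows the gating described in the text (Y1 gated by ADD_EN, Y2 inverted when INVERT is high, and the carry preset to 1 for subtraction), but the function name `serial_adder` and its Python signature are assumptions for illustration.

```python
def serial_adder(y1, y2, add_en=True, invert=False, width=16):
    """Bit-serial add / subtract / bypass, least significant bit first.

    add_en=True,  invert=False -> Y1 + Y2
    add_en=True,  invert=True  -> Y1 - Y2
    add_en=False, invert=False -> Y2 (bypass)
    Results wrap to width-bit two's complement.
    """
    carry = 1 if invert else 0           # preset carry supplies the "+1" of negation
    z = 0
    for i in range(width):
        a = ((y1 >> i) & 1) if add_en else 0          # AND gate: gate Y1 with ADD_EN
        b = ((y2 >> i) & 1) ^ (1 if invert else 0)    # XOR gate: invert Y2 on demand
        s = a ^ b ^ carry                             # one-bit full-adder sum
        carry = (a & b) | (a & carry) | (b & carry)   # carry kept for the next bit
        z |= s << i
    if z >= 1 << (width - 1):            # reinterpret the bit pattern as signed
        z -= 1 << width
    return z
```

For instance, `serial_adder(5, 3)` gives 8, `serial_adder(5, 3, invert=True)` gives 2, and `serial_adder(5, 3, add_en=False)` passes 3 through unchanged.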
The constant C can be hard-wired or mask-programmable. Because the first butterfly stage 58 is always performed in the exemplary embodiment, the constant C of the serial multipliers 150 in this stage can be hard-wired. For the remaining butterfly stages 62, 66, and 70, in which serial butterfly 140 may be placed in bypass mode, the constant C is mask-programmable so that multiplier 150 can multiply input X2 by either 1/(2C_n^k) or 1. Multiplier 150 can also be loaded with other values of C to scale or normalize input X2.
As shown in Fig. 7, serial adder 160 can add, subtract, or bypass its two inputs. Serial adder 160 can be modified to perform only the functions required by serial butterfly 140. For example, referring to Fig. 5B, serial adder 160a performs only addition or bypass. Serial adder 160 of Fig. 7 can therefore be modified by removing XOR gate 242, providing Y1 directly to the B input of adder 244, and providing Y2 to AND gate 240. The INVERT signal can be removed because adder 160a performs only addition. Likewise, serial adder 160b performs only subtraction or bypass, so the INVERT signal of serial adder 160 can be tied to a logic-high reference.
Serial adder 160 can also be used to perform the serial addition and bypass required of serial adders 56 in Fig. 4, which implement the serial additions 112 required by the first three stages of trellis 100 shown in Fig. 3D.
Referring to Fig. 5B, delay element 148 can be implemented with a chain of latches. The number of latches is chosen to match the processing delay of multiplier 150.
V. I/O buffer
In the exemplary embodiment, each IDCT processor 20 has a bank of 16 I/O buffers 52 that receives the input data and provides the transformed data. The input and output of IDCT processor 20 are word-serial, i.e., one data point per clock cycle. Sixteen data points are loaded into the 16 I/O buffers 52 in 16 clock cycles. Once all I/O buffers 52 are loaded, the 16 data points are provided bit-serially to the IDCT trellis, one bit per clock cycle. On each clock cycle, I/O buffers 52 also receive the transformed data bits output by the last serial butterfly stage 70. The transformed data are provided serially to I/O buffers 52.
Fig. 8 shows an exemplary block diagram of one I/O buffer 52. I/O buffer 52 comprises 16-bit latch 262, 16-bit parallel/serial shift register 264, 16-bit latch 266, and output buffer 268. The IDCT input is provided to latches 262 of all 16 I/O buffers 52. Each I/O buffer 52 latches the IDCT input when directed by control signal WR(w), which is decoded from the WRITE_ENABLE signal issued by controller 26. Latch 262 in each I/O buffer 52 is enabled for only one clock cycle in every 16 clock cycles. After 16 data points have been latched by latches 262, the LD signal is asserted and the values held in latches 262 are provided to registers 264.
For each I/O buffer 52, data are shifted out serially to cross router 54, one bit per clock cycle, least significant bit first. On each clock cycle, one transformed data bit is also shifted serially, least significant bit first, into the most significant bit register 264q. After 16 clock cycles, all 16 data bits have been shifted out to cross router 54 and all 16 transformed data bits have been shifted into register 264. Every 16 clock cycles, the LD signal loads the next data point into register 264 and loads the transformed data point into latch 266. The transformed data are held in latch 266 until they are read out through output buffer 268. Output buffers 268 are selectively enabled so that the 16 I/O buffers 52 provide the transformed data word-serially, one transformed data point per clock cycle. The read order is controlled by the RD(w) signals decoded from READ_ENABLE.
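The simultaneous shift-out and shift-in performed by register 264 can be sketched as follows. The register is modelled as a Python integer; the function name `shift_exchange` and its arguments are assumptions made for this example.

```python
def shift_exchange(word, incoming_bits, width=16):
    """Shift a loaded word out LSB first while shifting new bits in at the MSB.

    Returns the list of bits sent out (first element is the LSB of the
    original word) and the final register contents.
    """
    reg = word & ((1 << width) - 1)
    sent = []
    for bit_in in incoming_bits:
        sent.append(reg & 1)                          # LSB leaves toward the router
        reg = (reg >> 1) | (bit_in << (width - 1))    # new bit enters at the MSB
    return sent, reg
```

After `width` cycles the register holds the incoming word in normal bit order, since the first bit to arrive has been shifted all the way down to the least significant position.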
Fig. 8 shows one embodiment of I/O buffer 52. Other embodiments that perform the functions described above can also be implemented and are within the scope of the present invention.
Although the present invention has been described in terms of a 2-D IDCT engine, the inventive concept can be extended to other transforms, such as the discrete Fourier transform (DFT), inverse discrete Fourier transform (IDFT), fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), discrete cosine transform (DCT), and Hadamard transform. Application of the inventive concept described above to other transforms is within the scope of the present invention.
The preceding description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles described above may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not limited to the embodiments shown here but is to be accorded the widest scope consistent with the principles and novel features disclosed in this description.

Claims (19)

1. A variable block size IDCT processor, characterized in that it comprises:
A bank of adders that receives a plurality of input data points and a first control signal, the first control signal directing the bank of adders to add selected combinations of the input data points;
A plurality of butterfly stages; and
A plurality of cross routers, one cross router interposed between the bank of adders and the first butterfly stage, and one cross router interposed between each pair of successive butterfly stages,
Wherein the plurality of butterfly stages receives a second control signal that directs the butterfly stages to perform butterfly operations on selected inputs to the butterfly stages.
2. The IDCT processor as claimed in claim 1, characterized in that the bank of adders and the plurality of butterfly stages are implemented with serial adders and bit-serial multipliers.
3. The IDCT processor as claimed in claim 2, characterized in that it further comprises:
A bank of I/O buffers that receives the plurality of input data points in word-serial form and provides the input data points to the bank of adders in bit-serial form.
4. The IDCT processor as claimed in claim 3, characterized in that the plurality of butterfly stages is pipelined so that all stages are active concurrently.
5. The IDCT processor as claimed in claim 4, characterized in that the multiplicand of the bit-serial multipliers is mask-programmable.
6. A variable block size 2-D IDCT engine, characterized in that it comprises:
A first IDCT processor that receives input data points;
A memory component connected to the first IDCT processor;
A second IDCT processor connected to the memory component; and
A controller connected to the first IDCT processor, the second IDCT processor, and the memory component to provide control signals to them, the controller receiving an input signal and generating the control signals according to the input signal.
7. The IDCT engine as claimed in claim 6, characterized in that the IDCT processor comprises:
A bank of adders that receives the plurality of input data points and a first control signal, the first control signal directing the bank of adders to add selected combinations of the input data points;
A plurality of butterfly stages; and
A plurality of cross routers, one cross router interposed between the bank of adders and the first butterfly stage, and one cross router interposed between each pair of successive butterfly stages,
Wherein the plurality of butterfly stages receives a second control signal that directs the butterfly stages to perform butterfly operations on selected inputs to the butterfly stages.
8. The IDCT engine as claimed in claim 7, characterized in that the bank of adders and the plurality of butterfly stages are implemented with serial adders and bit-serial multipliers.
9. The IDCT engine as claimed in claim 8, characterized in that the IDCT processor further comprises:
A bank of I/O buffers that receives the plurality of input data points in word-serial form and provides the input data points to the bank of adders in bit-serial form.
10. The IDCT engine as claimed in claim 9, characterized in that the IDCT processors are pipelined so that both IDCT processors are active concurrently.
11. The IDCT engine as claimed in claim 10, characterized in that the first butterfly stage is always active.
12. The IDCT engine as claimed in claim 11, characterized in that the multiplicand of the bit-serial multipliers is mask-programmable.
13. The IDCT engine as claimed in claim 12, characterized in that the IDCT engine has a throughput of one pixel per clock cycle.
14. The IDCT engine as claimed in claim 13, characterized in that the serial adders and bit-serial multipliers have a resolution greater than 8 bits.
15. The IDCT engine as claimed in claim 14, characterized in that the serial adders and bit-serial multipliers have 16-bit resolution.
16. The IDCT engine as claimed in claim 15, characterized in that the memory component comprises a transpose memory.
17. An apparatus for performing a variable block size 2-D IDCT, characterized in that it comprises:
First IDCT means for performing a 1-D IDCT on a plurality of input data points;
Memory means for storing the intermediate results output by the first IDCT means;
Second IDCT means for performing a 1-D IDCT on the intermediate results; and
Control means for providing control signals to the first IDCT means, the second IDCT means, and the memory means, the control means receiving an input signal and generating the control signals according to the input signal.
18. The apparatus as claimed in claim 17, characterized in that the IDCT means comprises:
An adder stage that receives a plurality of input data points and a first control signal, the first control signal directing the adder stage to add selected combinations of the input data points;
Multi-stage butterfly means for performing butterfly operations on pairs of input data;
Routing means for routing signals between the adder stage and the multi-stage butterfly means;
Wherein the multi-stage butterfly means receives a second control signal that directs the multi-stage butterfly means to perform butterfly operations on selected pairs of inputs to the multi-stage butterfly means.
19. A transform engine, characterized in that it comprises:
A first transform processor that receives input data points;
A memory component connected to the first transform processor;
A second transform processor connected to the memory component; and
A controller connected to the first transform processor, the second transform processor, and the memory component to provide control signals to them, the controller receiving an input signal and generating the control signals according to the input signal.
20. The transform engine as claimed in claim 19, characterized in that the transform processor comprises:
A plurality of butterfly stages; and
A plurality of cross routers, one cross router interposed between successive butterfly stages;
Wherein the plurality of butterfly stages receives a second control signal that directs the butterfly stages to perform butterfly operations on selected inputs to the butterfly stages.
CN98808477A 1997-08-25 1998-08-24 Variable block size 2-dimensional inverse discrete cosine transform engine Pending CN1268231A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US91809097A 1997-08-25 1997-08-25
US08/918,090 1997-08-25

Publications (1)

Publication Number Publication Date
CN1268231A true CN1268231A (en) 2000-09-27

Family

ID=25439787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN98808477A Pending CN1268231A (en) 1997-08-25 1998-08-24 Variable block size 2-dimensional inverse discrete cosine transform engine

Country Status (5)

Country Link
EP (1) EP1018082A1 (en)
KR (1) KR20010023031A (en)
CN (1) CN1268231A (en)
AU (1) AU9030298A (en)
WO (1) WO1999010818A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003198504A (en) * 2001-12-27 2003-07-11 Mitsubishi Electric Corp Despreading processing method, despread code assignment method, terminal for moving object and base station
JP2003223433A (en) * 2002-01-31 2003-08-08 Matsushita Electric Ind Co Ltd Method and apparatus for orthogonal transformation, encoding method and apparatus, method and apparatus for inverse orthogonal transformation, and decoding method and apparatus
US7096245B2 (en) 2002-04-01 2006-08-22 Broadcom Corporation Inverse discrete cosine transform supporting multiple decoding processes
US9110849B2 (en) 2009-04-15 2015-08-18 Qualcomm Incorporated Computing even-sized discrete cosine transforms
US8762441B2 (en) 2009-06-05 2014-06-24 Qualcomm Incorporated 4X4 transform for media coding
US9069713B2 (en) 2009-06-05 2015-06-30 Qualcomm Incorporated 4X4 transform for media coding
US9081733B2 (en) * 2009-06-24 2015-07-14 Qualcomm Incorporated 16-point transform for media data coding
US8451904B2 (en) 2009-06-24 2013-05-28 Qualcomm Incorporated 8-point transform for media data coding
US9075757B2 (en) 2009-06-24 2015-07-07 Qualcomm Incorporated 16-point transform for media data coding
US9118898B2 (en) 2009-06-24 2015-08-25 Qualcomm Incorporated 8-point transform for media data coding
US9824066B2 (en) * 2011-01-10 2017-11-21 Qualcomm Incorporated 32-point transform for media data coding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2608808B1 (en) * 1986-12-22 1989-04-28 Efcis INTEGRATED CIRCUIT FOR DIGITAL SIGNAL PROCESSING

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101351792B (en) * 2005-10-05 2010-12-22 高通股份有限公司 Fast dct algorithm for dsp with vliw architecture
CN101605259A (en) * 2009-05-31 2009-12-16 华亚微电子(上海)有限公司 Multi-medium data is carried out the device and method of conversion coding and decoding
CN101646080B (en) * 2009-06-18 2013-09-25 杭州高特信息技术有限公司 Method for fast switching parallel pipeline IDCT based on AVS and device thereof
CN102065309A (en) * 2010-12-07 2011-05-18 青岛海信信芯科技有限公司 DCT (Discrete Cosine Transform) realizing method and circuit
CN102065309B (en) * 2010-12-07 2012-12-05 青岛海信信芯科技有限公司 DCT (Discrete Cosine Transform) realizing method and circuit
CN106663085A (en) * 2014-08-08 2017-05-10 高通股份有限公司 System and method for reusing transform structure for multi-partition transform

Also Published As

Publication number Publication date
KR20010023031A (en) 2001-03-26
AU9030298A (en) 1999-03-16
WO1999010818A1 (en) 1999-03-04
EP1018082A1 (en) 2000-07-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C01 Deemed withdrawal of patent application (patent law 1993)
WD01 Invention patent application deemed withdrawn after publication