CN101043625B

CN101043625B - Apparatus and method for high-speed decompression of digital data

Info

Publication number: CN101043625B
Application number: CN 200710088795
Authority: CN
Inventors: 琼·拉沃恩·米切尔; 菲利普·凯斯·霍斯金斯
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2006-03-23
Filing date: 2007-03-22
Publication date: 2010-10-20
Anticipated expiration: 2027-03-22
Also published as: CN101043625A

Abstract

Operations for decompressing compressed data are performed in parallel and in a pipelined manner, generating addresses into a memory on the fly rather than using a large look-up table. The logic circuits required are thereby reduced to the point where they can be formed by suitably programming a field programmable gate array (FPGA), while achieving a substantial increase in processing speed beyond the speed increases attributable to higher clock rates.

Description

Apparatus and method for high-speed decompression of digital data

Technical field

The present invention relates generally to high-speed decompression of compressed data and, more particularly, to high-speed decompression of JPEG-compressed image data for reconstruction of the image by a printer or display.

Background technology

Digital storage and transmission of data is becoming increasingly widespread, at least because digital signals are less susceptible to noise, can often be error-corrected, archival storage media are more efficient and economical, and the electronic infrastructure for transmitting and processing digital signals is readily available. However, for some types of data, such as image data, the number of bytes needed to represent a single image in digital form is often very large, which increases the cost of the storage media or the transmission time required to accommodate this relatively large amount of data.

As a solution to these problems, numerous data compression techniques have been developed in recent years. In particular, the technique developed by the Joint Photographic Experts Group (JPEG) has become the industry standard for image data compression. For a given degree of data compression, this technique provides considerable flexibility in controlling the size of the resulting compressed data while maximizing the fidelity of the reconstructed image. JPEG can be implemented in hardware, in software, or in a combination of the two. In general, because the data processing for JPEG compression or decompression/reconstruction is relatively complex, dedicated hardware was initially preferred, and remains so, for some especially demanding applications such as high-speed printers, copiers, or displays, although JPEG is currently implemented more often in software where speed is less critical. Where the speed of reconstructing an image from compressed data is critical, however, dedicated hardware remains the preferred means of decompression.

Many software and hardware solutions exist that manage decompression with very different algorithms, but all of them entail larger memory requirements and larger amounts of circuitry. The first JPEG chip was produced in 1989 by the start-up company C-Cube and was its springboard into MPEG hardware. C-Cube no longer offers JPEG hardware. Several other previously available JPEG chips have reached the end of their service life and, in any case, can no longer meet current processing-speed requirements. In addition, because an application-specific integrated circuit (ASIC) that meets the required decompression speed would need a relatively large chip to provide sufficient memory, such ASICs are difficult and expensive to design, and because the number of applications requiring such decompression speed is relatively limited, a large ASIC of this kind is currently considered economically infeasible. As an alternative to ASICs, however, so-called field programmable gate arrays (FPGAs) of greater size and speed have recently emerged, and their cost is marginally acceptable because they are manufactured in a generic, freely programmable form suited to a wide range of applications. Large FPGAs with high clock speeds have therefore been investigated for use in high-speed decompression of image data. An FPGA is essentially an array of many logic gates that can be selectively interconnected to perform a desired combined function in accordance with signals stored in a memory structure, preferably a non-volatile one.

Unfortunately, it has been found that, even at the increased clock rates available in current FPGAs, decoding of JPEG Huffman codes (and the corresponding codes used by other compression techniques) may not complete at the required speed. In addition, the look-up tables used in decoding such codes (which are necessarily variable-length) are usually very large and cannot be accommodated by the smaller, more economical FPGAs currently available.

Summary of the invention

It is therefore an object of the present invention to increase processing speed in decoding JPEG Huffman codes and the corresponding codes of other data compression techniques, beyond the increases attributable to higher clock rates, in a design that can be implemented on an FPGA at less than the FPGA clock rates now available. Another object of the invention is to provide improved JPEG decompression in a form that adapts readily to many operating system environments and hardware. This object is achieved at least in part by using a well-known industry component, the FPGA, which is well suited to the intended purpose of the invention. Because the technology of such general-purpose FPGA devices is well known, the invention described here can readily be implemented and widely adopted by interested parties.

To achieve these and other objects, the invention reduces the memory capacity needed to decode JPEG Huffman codes by generating the indices into the Huffman table in real time (rather than obtaining them from a large look-up table), so that the design fits in a smaller (and thus less expensive) field programmable gate array (FPGA) or a smaller ASIC module. In addition, it pipelines the decompression process so that byte (or sample) data is output at the fastest clock rate at which the technology will run. Because no large look-up table is used in decoding the JPEG Huffman codes, the design can fit on an FPGA.

The decompressor accepts JPEG compressed data, deletes the stuffed 0x00 bytes, and then examines every possible Huffman code word length simultaneously and selects the code length N that applies. The decoder generates all 16 possible indices into the Huffman symbol table simultaneously (i.e., in parallel), one for each possible code length. Once the code length N has been determined, it simply selects the corresponding index. The decoded JPEG RS symbol for that code length is then found in the Huffman symbol table created from the JPEG Define Huffman Table (DHT) marker. Meanwhile, the code length N is used to shift the input compressed data past the code word to the "additional bits" that determine the value of the non-zero quantized transform coefficient. (The RS symbol is formed from a 4-bit run of zero coefficients and a 4-bit count of the additional bits needed for the actual non-zero transform coefficient.) The size of the additional bits (0-11) is taken from the low-order 4 bits of the Huffman table output and is used to generate the quantized coefficient. Although the invention is disclosed here with reference to JPEG Huffman tables, it can also be used with the Huffman tables of other standards. Special handling of 1-bit and 2-bit code words ensures that a 1-bit code is processed in one cycle and a 2-bit code in two cycles, with all other codes (3-32 bits) processed in three cycles.

The scaled quantization table data holds the JPEG quantization table values, pre-scaled by the values appropriate to the scaled inverse discrete cosine transform (IDCT) to be used. This data is multiplied by the quantized coefficient, and the product is taken out of zigzag order by storing it through a zigzag table. The resulting dequantized transform coefficients are then subjected to one-dimensional (1D) inverse discrete cosine transforms (IDCTs), first on columns and then on rows, to produce the reconstructed image data.

In summary, to provide the above and other advantages, a method is provided comprising the steps of: receiving an input bit stream; performing comparisons to determine a code word length; performing, in parallel with the comparison step, additions of offsets to a number of bits of the bit stream; selecting a result of the addition step; and, after the comparison step has completed, discarding a number of bits of the bit stream equal to the length of the code word. An FPGA can be programmed to operate in this way.

Brief description of the drawings

The foregoing and other objects, aspects, and advantages of the invention will be better understood from the following detailed description of preferred embodiments of the invention with reference to the accompanying drawings, in which:

Fig. 1 is a block diagram of an apparatus according to the invention;

Figs. 2A, 2B and 2C show details of a portion of the apparatus of Fig. 1, in particular the portion that performs decoding and coefficient generation;

Figs. 3A and 3B show details of a portion of the apparatus of Fig. 1, in particular the portion that performs code length selection;

Fig. 4 shows details of a portion of the apparatus of Fig. 1, in particular the parallel implementation of Huffman address generation; and

Fig. 5 shows the control of, and inputs to, the IDCT column buffer.

Detailed description of embodiments

Although the invention is described more fully below with reference to the accompanying drawings, which show preferred embodiments of the invention, it should be understood at the outset that persons skilled in the relevant art may modify the invention described here while still achieving its favorable results. The following description should therefore be understood as a broad teaching directed to persons skilled in the relevant art, and not as limiting the invention.

JPEG Huffman codes are designed to be "canonical" Huffman codes. This means that the N-bit prefix of any code word longer than N bits is numerically greater than every code word of N bits or fewer. (Note that this assumes the convention that the shortest code words begin with 0; the opposite holds if the shortest code words begin with 1.) This property of canonical Huffman codes allows the number of bits in each code word to be determined from the code word boundaries, which can in turn be determined from a few bits of the code, even though the boundaries are not otherwise marked in the continuous bit stream. The invention takes full advantage of this property by performing many comparisons in parallel, enough to cover a code word of maximum length. This assumes that the comparisons are made against code left-justified in an N-bit register, with shorter codes padded to N bits by whatever compressed data bits follow. The canonical format allows the Huffman code to be generated from 16 values (Li) giving the number of code words of each length i (i = 1, 2, ..., 16) together with a table of the actual symbols of up to 256 bytes. This information is sent to the decoder in the JPEG Define Huffman Table (DHT) marker (see JPEG: Still Image Data Compression Standard, Pennebaker and Mitchell, Van Nostrand Reinhold, New York, 1993, pp. 116-117 and 392-394, the entire contents of which are incorporated herein by reference).
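The construction of code words from the 16 length counts and the symbol list can be sketched in software as follows. This is a minimal illustration of the usual canonical assignment, not the circuit of the invention; the function name `canonical_codes` and the example counts are assumptions made for this sketch.

```python
def canonical_codes(counts, symbols):
    """Assign canonical Huffman codes (shortest codes start with 0).

    counts[i] is the number of code words of length i + 1 (i = 0..15);
    symbols lists the symbol values in order of increasing code length.
    Returns a dict mapping symbol -> (code, length).
    """
    if sum(counts) != len(symbols):
        raise ValueError("counts and symbols disagree")
    table = {}
    code = 0   # next unused code of the current length
    k = 0      # position in the symbol list
    for length, n in enumerate(counts, start=1):
        for _ in range(n):
            table[symbols[k]] = (code, length)
            code += 1
            k += 1
        code <<= 1   # extend to the next length by appending a 0 bit
    return table


# Illustrative counts only (12 symbols, lengths 2 to 9).
example_counts = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
example_table = canonical_codes(example_counts, list(range(12)))
```

Because each longer code continues from the first unused shorter code with a 0 appended, every N-bit prefix of a longer code exceeds all codes of N bits or fewer, which is the property the parallel comparisons rely on.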

As JPEG Huffman code words become longer, the codes tend to have more and more leading '1' bits, as illustrated by the example DC and AC Huffman tables on pages 509-517 of the text cited above. Note that JPEG codes are designed never to use the all-'1's code. Because markers always begin with a 0xFF byte, such a code is illegal, which makes the next marker easy to recognize. Because JPEG code words can nevertheless occasionally produce 0xFF bytes, every such byte in the entropy-coded data is followed by a stuffed 0x00 byte. These stuffed bytes must be deleted before entropy decoding.
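Deletion of the stuffed bytes can be illustrated with a short software sketch. It assumes the input holds entropy-coded data only, with no markers inside it; the function name `remove_stuffed_bytes` is hypothetical.

```python
def remove_stuffed_bytes(data: bytes) -> bytes:
    """Drop the 0x00 byte that follows each 0xFF in entropy-coded data.

    Assumes no markers are present in `data`; marker handling is left
    to whatever interprets the JPEG stream before this stage.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        out.append(b)
        if b == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            i += 1   # skip the stuffed zero byte
        i += 1
    return bytes(out)
```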

A prior patent commonly owned with the present invention is U.S. Pat. 6,373,412 to Mitchell et al., the entire contents of which are incorporated herein by reference; it discloses fast parallel JPEG Huffman decoding. The Huffman decoding portion of the present invention improves on that prior disclosure. The prior disclosure showed the code length N of a Huffman code being determined by parallel comparisons of the first 1 to 16 bits of the compressed data against 1- to 16-bit prefixes derived from the codes of the next greater length. The present invention recognizes that only the last 8 bits of each code can be other than 1. This allows the comparators to be reduced in size: for potential codes longer than 8 bits, the leading bits are merely confirmed to be all 1s and only the last 8 bits are compared.
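The parallel comparison can be emulated in software as 16 independent tests of a left-justified 16-bit window against per-length limits. This is a sketch under assumptions: the limit for length n is taken here to be the first unused n-bit code, left-justified to 16 bits, and the names `length_limits` and `code_length` are hypothetical. The hardware reduction to 8-bit comparators is not modeled.

```python
def length_limits(counts):
    """For each length 1..16, the first unused code left-justified to 16 bits.

    A 16-bit window begins with a code of at most n bits exactly when the
    window is numerically below the limit for n.
    """
    limits = []
    code = 0
    for length, n in enumerate(counts, start=1):
        code += n                          # first unused code of this length
        limits.append(code << (16 - length))
        code <<= 1
    return limits


def code_length(window16, limits):
    """Return N for the code at the top of window16, or None if illegal."""
    hits = [window16 < lim for lim in limits]   # 16 independent comparisons
    return hits.index(True) + 1 if any(hits) else None


# Illustrative counts only (codes of lengths 2 to 9).
example_limits = length_limits([0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
```

The comparison results are monotonic in n, so the position of the single transition from false to true gives the code length directly.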

In the prior disclosure, the index was calculated as a subsequent operation once the size of the code word was known. The present invention calculates all 16 possible indices in parallel with the determination of N, using minimal hardware that fits within the capacity of available FPGAs. In addition, the invention can exploit the fact that, for baseline JPEG, the output index is at most 8 bits, and it uses modulo arithmetic on 8 bits for lengths greater than 8. For N greater than 8, the precomputed offsets into the Huffman table differ from those proposed in the above-mentioned U.S. Pat. 6,373,412.
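One way to picture the parallel index calculation is to add a per-length offset to the N-bit prefix for every candidate N at once and keep only the low 8 bits. The sketch below is an assumption-laden illustration: the offsets are derived here as (index of the first symbol of that length) minus (first code of that length), which is not necessarily how the precomputed offsets of the invention are formed, and the names `index_offsets` and `candidate_indices` are hypothetical. The code length N is supplied by the caller.

```python
def index_offsets(counts):
    """Per-length offsets, modulo 256, such that prefix + offset = symbol index."""
    offsets = []
    code = 0          # first code of the current length
    first_index = 0   # symbol-table index of that first code
    for n in counts:
        offsets.append((first_index - code) & 0xFF)
        first_index += n
        code = (code + n) << 1
    return offsets


def candidate_indices(window16, offsets):
    """Compute all 16 candidate symbol indices; only entry N - 1 is meaningful."""
    return [((window16 >> (16 - n)) + off) & 0xFF
            for n, off in enumerate(offsets, start=1)]


# Illustrative counts only (12 symbols, lengths 2 to 9).
example_offsets = index_offsets([0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
```

Because the sum is reduced modulo 256, only the low 8 bits of a prefix longer than 8 bits influence the result, which is why 8-bit adders suffice in this sketch.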

The prior disclosure further assumed that the AC and DC tables were independent. The present invention loads the symbols found in the DHT marker for a DC table (as identified there) at the start of the table, followed by the symbols of the AC table having the same table number. (JPEG Huffman tables are numbered 0, 1, 2, 3. In baseline JPEG, only tables 0 and 1 are allowed.) The space occupied by the DC table is added to the offsets precomputed for the AC table. The space allocated is fixed at 256 bytes. Because a baseline Huffman AC table needs fewer than 160 bytes and the DC table takes at most 11 bytes, this is not a severe restriction. To handle 12-bit sample data (e.g., medical images), the offsets must be calculated with modulo arithmetic on 9 bits, and the table expands to 272 bytes. The parallel decoding of symbol length and index calculation according to the invention can be modified. For example, in the first cycle, indices may be calculated only for the short code lengths. Longer code words can then be calculated serially in a shared modulo adder once the length is known. It is assumed that a processor or other hardware exists to interpret the JPEG markers. From the Define Huffman Table (DHT) marker, this processor precomputes the constants needed to decode the Huffman codes and loads the Huffman symbol table. From the Define Quantization Table (DQT) marker, it likewise precomputes the scaled Q values in zigzag order and loads the scaled version of the quantization values. The entropy-coded data that follows the Start of Scan marker is fed into the FPGA to be decoded.
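The shared table layout described above can be sketched as a single fixed-size buffer with the DC symbols first and the AC symbols immediately after, with the DC size returned as the base to fold into the AC offsets. The function name `pack_symbol_tables` and the example symbol values are assumptions for illustration.

```python
def pack_symbol_tables(dc_symbols: bytes, ac_symbols: bytes, size: int = 256):
    """Place DC symbols at the start of one buffer, AC symbols right after.

    Returns (table, ac_base), where ac_base is the number of DC entries
    and would be added to every precomputed AC offset.
    """
    if len(dc_symbols) + len(ac_symbols) > size:
        raise ValueError("tables do not fit in the allocated space")
    table = bytearray(size)
    table[:len(dc_symbols)] = dc_symbols
    ac_base = len(dc_symbols)
    table[ac_base:ac_base + len(ac_symbols)] = ac_symbols
    return bytes(table), ac_base


# Illustrative symbol lists only.
packed, ac_base = pack_symbol_tables(bytes(range(11)), bytes([0x01, 0x02, 0x00]))
```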

More specifically, Fig. 1 shows a component (e.g., FPGA hardware programmed to perform a given function) labeled "Table and register load control" 110, where the preloading is performed. The same 32-bit data bus and control 120 is used to input the compressed data to the first stage of the invention, which deletes the stuffed 0x00 byte following any 0xFF byte in the JPEG entropy-coded data (130) and supplies the data to the code and coefficient generator 140 (see also Figs. 2A, 2B and 2C). The code and coefficient generator 140 is controlled by a control state machine 150. The code length selection component 160 (shown in more detail in Figs. 3A and 3B) feeds the number of bits N in the Huffman code to the Huffman address generation 170 (shown in more detail in Fig. 4). The resulting address points into the DC and AC Huffman tables 180, which output the run/size (RS) byte. The coefficient generation and storage component 190 uses the high-order R nibble (4 bits) to skip the run of zero coefficients. The low-order S nibble (4 bits) allows S bits to be taken from the compressed data. Details of the format in which the additional bits are coded can be found in the JPEG book by Pennebaker and Mitchell cited above. The column buffer 192 holds four 8x8 blocks of coefficients, each arranged as vertical columns. The column IDCT and store 194 performs the inverse discrete cosine transform (IDCT) on the columns and writes the results to the row buffer 196, which comprises four blocks of 64 words x 32 bits each. The row IDCT and final output block 198 is similar to the column IDCT and store, except that it outputs the final rounded 8-bit reconstructed image values.

Fig. 2 A to Fig. 2 C has constituted code and the coefficient generating assembly 140 of Fig. 1 discussed above.Should admit that the assembly set function is fast parallel ground simple process data shown in Fig. 2 A and Fig. 2 B, form that can serial-shift when becoming variable length codewords and having discerned keeps the sixteen bit left-justify simultaneously forever and can be used for decoding.Fig. 2 C shows with the next be identified as from bit stream at decode operation and so handles when extracting additional bit.

Fig. 2 A is the rough schematic view of the assembly that can develop on FPGA, it shown 32 packed datas how can be in one-period left-justify is to HW0 on half-word (being double byte) border, and new data is ready in HW1 so that export (A).32 input words " packed data " are loaded in the 4 byte registers 210 and latch.Two MUX in top (220a, 220b) feed back to byte 0 and the byte 1 that latchs in the register 230 among two MUX in bottom.The known many modes of those skilled in the art are extracted the nybble that newly is input in the register, the nybble of interior register can be pressed byte shift in one-period then, is ready for the nearly displacement of sixteen bit so output has the packed data of nybble.The output of two MUX (220c, 220d) is loaded among HW0 or the HW1, and the two all is 16 bit registers.They have created 32 output register 240 together, and wherein at least 16 any moment that must select under the assembly preferable case of developing according to Fig. 2 B all use.

Fig. 2 B has showed how 32 outputs from register 240 are latched in the register 250, and it provides stable input to MUX 260, in 16 of the height of MUX 260 mask registers 270 arbitrary group.The SHIFT AMOUNT (side-play amount) that enters this MUX select data (below will explain its origin) the left-justify next one untapped (as still untapped, follow previous coded word) 16 of packed datas so that export at (B).

Fig. 2 C has shown how feed-in additional bit of the data selection piece 290 from (B), and wherein the size nibble SSSS of the R/S byte of the previous decode codewords generation of basis selects 12 at the most.These data by " LOAD COEFF " signal latch in register 295 and be output as the coefficient data of 12 quantifications.

Also be input to Fig. 3 A from 16 packed datas of Fig. 2 B (B) output, its a part of detail display is in Fig. 3 B.By the AC/DC control line that transmits signal from processor 16 DC

code length registers

310 or 16 the AC code length registers 320 of loading data of having selected prestrain.(according to Joint Photographic Experts Group, first coefficient will be the DC coefficient in the piece, and all the other will be the AC coefficients, but other agreements also can adopt.) these coefficients still be the selection of DC coefficient according to expectation AC coefficient in data stream, and be input to 16 comparator circuits 340 (referring to Fig. 3 B) that (D) locate and 16 of the next ones of untapped (as still untapped) packed data of locating to import with (B) are compared.Shown in Fig. 3 B, under the preferable case AC coefficient and DC coefficient all are latched in the register 341, and use traffic pilot 342 selectively they to be delivered to comparer 343.The output of comparer 343 is unit lines, and whether they indicate the numerical value of prestrain more than comparing numerical value.If be that so the figure place N in the coded word is just greater than this concrete comparison so in any given comparer 343.Size from vacation to genuine conversion table Ming Dynasty code word between the comparer output 344.8 of the high positions being compared 16 and the result of comparator circuit have been used when determining this point.Do like this and allow comparer 343 to constitute by still less fpga logic unit.Simultaneously, this operation detection entirely zero allows complete zero coded word of decoding in the single cycle because coding zero will be decoded as zero.From very being easy to be transformed to the code of one of online 345 N that export to false conversion (as the comparer that can only get one is exported the conversion between " 1 " and the output " 0 "), and 5 codes 280 of expression same numbers N, it feeds back to the traffic pilot 285 among Fig. 2 B discussed above.Piece/assembly 170 takes place together with the Huffman address that selector switch outputs among Fig. 
1 in 5 the N values (or export 345, depend on that the form of address selection traffic pilot 450 among Fig. 4 is preferred) that comprise N=1-16.

Conclude Fig. 3 A and Fig. 3 B, the code length that has shown 16 comparator circuits among Fig. 3 B and received its output is selected the details of the preferred form that logic 350 repeats.Having proposed some important novel key element also should gain attention.Register has loaded interweaving of DC and AC fiducial value when initialization.So top 1 bit length comparer must check that one is DC or AC code.MUX 343 selects between two input positions from register 341, to create a carry-out bit.The input meeting that its ability loads from 1 for being advanced to 16.But, according to Fig. 3 B, even coded word reach 16 need be more than 8 yet.This is due to the fact that only has 8 of low levels can be not to be complete 1 (or complete 0, depend on agreement) at the most.Being input to code length selects 8 of the high positions of logic to be used for confirming that it is complete 1 that code length is selected logic 350 high-order N-8 positions.Do like this to have simplified significantly and obtain exporting 280 and 345 decoding.When table 1 has shown nearly 16 from needn't prestrain, identification and the comparison figure place of saving more than least-significant byte.

Table 1. Savings from comparing no more than 8 bits

N       Bits compared   Bits saved
1       1               0
2       2               0
3       3               0
4       4               0
5       5               0
6       6               0
7       7               0
8       8               0
9       8               1
10      8               2
11      8               3
12      8               4
13      8               5
14      8               6
15      8               7
16      8               8
Total:
136     100             36
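The totals in Table 1 follow from capping each comparison at 8 bits, and can be checked with a few lines of arithmetic. The function name `comparison_savings` is hypothetical.

```python
def comparison_savings(max_len: int = 16, width: int = 8):
    """Rows of (N, bits compared, bits saved) plus the column totals."""
    rows = [(n, min(n, width), n - min(n, width))
            for n in range(1, max_len + 1)]
    totals = tuple(sum(col) for col in zip(*rows))
    return rows, totals


table_rows, table_totals = comparison_savings()
```

The three totals are the sum of the lengths 1 through 16, the bits actually compared, and the bits that need not be compared.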

Fig. 4 shows the parallel implementation of the Huffman address generation 170. It should be appreciated that the organization of Fig. 4 is similar to that of Fig. 3B (which preferably simplifies programming when implemented in one or more FPGAs). The main differences are that adders 443 are used rather than comparators 343, and that their inputs are the AC and DC offsets for the particular Huffman table, rather than code word boundaries, obtained from latches 441 through multiplexers 442. The same 16 bits of code data (i.e., compressed data (B)) are input to each adder 443. The same logic is used to preload the offsets, which are selected by the load-AC or load-DC line. Another difference is that each offset is at least 5 bits, because the DC and AC tables are packed into a single 256-byte buffer. The largest offset allows the AC index to skip over the 11 DC entries. The Huffman address is selected and output by the address selection multiplexer 450 according to the value of N determined in code length selection logic 350. In other words, the offset values are prepared and added in parallel in adders 443 to compute 16 addresses, of which only the one corresponding to the applicable code word length is selected, according to the output of code length selection logic 350. If output 280 is used, a simple array of AND gates will form a suitable selector. Alternatively, if output 345 is used, the multiplexer is likely to take the form of, for example, a crossbar switch.

For the IDCT column buffer discussed above in connection with Fig. 1, Fig. 5 gives more information about its control and inputs. While the code length is being generated, the quantization data is read from the Q table according to the k of the coefficient (the index, 0-63, of the coefficient within the block), which is tracked in the control logic and zigzag table, so that the quantization data is ready by the time the coefficient is complete. The coefficient is multiplied by the quantization data in the A-times-B block 510 and, if it is the DC term (k=0), the DC PRED value (i.e., the previous DC coefficient) is added at 520. The result of the addition is the scaled, dequantized DC coefficient. It is stored in the DC PRED register 540 and input to Mux 530. Adder 520 is controlled by a DC/AC select line (not shown in Fig. 5). When the coefficient is a DC difference, the addition takes place so that the DC difference plus the DC PRED value is passed on. When the coefficient is AC, adder 520 simply passes the output of A-times-B 510. DC PRED is normally the last DC output for the same image color component. After a JPEG RESTART marker code, however, the value is reinitialized. Whenever the RST LOAD line resets the DC PRED register, the register is preloaded with a value that precompensates for the level shift and the rounding constant. Because a scaled DCT is used, the level shift and rounding constant are also scaled, so that they are expressed in the correct units after the IDCT calculation. For baseline input with an 8-bit forward DCT, this reset value is the scaled DC quantization value shifted left by 15 bits.

Whether the output of adder 520 is a non-zero DC or AC coefficient depends on whether the run count has run down to zero (not shown). The high nibble of the RS symbol output by the DC and AC Huffman tables 180 of Fig. 1 sets the run count. The run is the number of zero coefficients preceding the next non-zero coefficient. The result is stored at the location given by the contents of the zigzag table (i.e., the k input 540 to address generator 550 is used internally as an index into the zigzag table and produces an address in one of the four possible block buffers of the column buffer). The value of k 540 is increased by the upper nibble of the Huffman data plus 1. The lower nibble is used to create the 12 bits of quantized coefficient data output in Fig. 2C. In Fig. 5, this data is the COEFF input to A-times-B 510. As each new column is addressed in the column buffer, a one is added to the four-register current-block buffer to keep track of which columns of the block contain data. If the Huffman data is 0x00 or k is greater than 63, the end of the block has been reached: nothing is stored, k is set to zero, and a new block begins.
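The coefficient path of Fig. 5 (multiply by the quantization value, add the DC prediction for k = 0, advance k by the run plus one, and store through the zigzag table) can be sketched in software as follows. This is a simplified model under assumptions: scaling, level shift, rounding constants, and the four-block buffering are omitted, the input is a list of already decoded (RS byte, coefficient) pairs, and the name `rebuild_block` is hypothetical. `ZIGZAG` is the standard JPEG zigzag scan order.

```python
# Standard JPEG zigzag order: ZIGZAG[k] is the raster position of coefficient k.
ZIGZAG = [
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63,
]


def rebuild_block(symbols, q_zigzag, dc_pred):
    """Dequantize one block from (rs_byte, coeff) pairs; the first pair is DC.

    q_zigzag holds the 64 quantization values in zigzag order.
    Returns (block in raster order, updated DC prediction).
    """
    block = [0] * 64
    rs, coeff = symbols[0]
    dc = coeff * q_zigzag[0] + dc_pred    # DC difference plus prediction
    block[0] = dc
    k = 1
    for rs, coeff in symbols[1:]:
        if rs == 0x00:                    # end of block
            break
        k += rs >> 4                      # skip the run of zero coefficients
        if k > 63:
            break
        block[ZIGZAG[k]] = coeff * q_zigzag[k]
        k += 1
    return block, dc
```

A caller would pass the returned prediction into the next block of the same color component, and reset it after a restart marker.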

The erase logic uses the CB1:4 registers to determine which of the 32 addresses in the column buffer are to be erased (i.e., eight columns in each of four blocks). As soon as the column IDCT and store indicates that it has finished with a block, the erase buffer control begins the erase process. It uses the CB1/2/3/4 register of the released block to erase the data, emptying the buffer and releasing it to the store logic so that a new block can be written.

After the dequantized data for a column has been read, the IDCT algorithm is first applied, such as the algorithm shown in Figure 4-8 on page 52 of the JPEG book by Pennebaker and Mitchell cited above. F(N) in the equations is actually the input, because decompression runs from right to left whereas compression runs from left to right. The result of this process is stored in the row buffer 196 (see Fig. 1), which can hold four blocks of data. Because every location is written, no erasure is associated with this buffer. The same logic can therefore be used for the row data, which is stored with the address incremented at bits 5-3, where bit 0 is the least significant bit, so that bit 3 is binary B'1000' = decimal 8 (i.e., the output of the column IDCT is placed with a stride of 8 so that it is ready for the row IDCT). The logic is then used for the rows, with the final output modified to provide only bits 19:12, which removes the 12 bits added to the quantization table when it was converted to the scaled quantization table (in software this would be a right shift by 12; in hardware it is done by not outputting the 12 bits numbered 11-0). Note that this assumes a particular scaling convention. Those skilled in the art will understand how to remove the correct number of bits for other scaling conventions. The data is rounded according to bit 11. This data is the reconstructed data, delivered at one byte per clock cycle. To achieve this, the multiplies and adds of the algorithm shown in Figure 4-8 on page 52 of the Pennebaker and Mitchell book cited above are pipelined in four stages, with additional registers and a state machine. Further details of fast and accurate scaled IDCTs can be found in "A Fast and Accurate Inverse Discrete Cosine Transform" by Hinds and Mitchell, Proceedings of the IEEE Workshop on Signal Processing Systems, Athens, Greece, pp. 87-93, November 1-3, 2005, the entire contents of which are also incorporated herein by reference.
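The final output step, keeping bits 19:12 and rounding according to bit 11, can be sketched as below. This assumes the particular scaling convention described above (12 fractional bits) and a non-negative intermediate value; saturation of out-of-range results is outside this sketch, and the function name `final_sample` is hypothetical.

```python
def final_sample(value: int) -> int:
    """Return bits 19:12 of `value`, rounded up when bit 11 is set.

    Equivalent in software to (value + 2048) >> 12 for values whose
    bits above 19 are zero. No clamping to 0..255 is performed here.
    """
    field = (value >> 12) & 0xFF      # bits 19:12
    round_bit = (value >> 11) & 1     # bit 11 decides the rounding
    return field + round_bit
```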

In view of the foregoing, it can be seen that the invention provides real-time address generation for accessing the Huffman table without a large look-up table, and does so very rapidly, because the code word length and the address calculations are determined simultaneously, in parallel and in pipelined fashion, within three clock cycles. As soon as the current code word length has been determined, the data shift for determining the next code word length can begin. In addition, for a given code word, because a multiplexer is used to select the index for the N-bit code word (determined in the first cycle), the decoded symbol can be output at the end of the second cycle, while the N compressed bits are discarded by shifting and the following bits are left-justified. One-bit and two-bit codes can be special cases in the multiplexer 450, for which address determination can complete in one or two cycles, respectively. In particular, recognizing the DC diff=0, AC EOB and AC ZRL codes when they occur as a single '0' bit allows 1-bit codes to be processed in one cycle, and recognizing 2-bit codes for AC EOB, AC ZRL and DC diff=0 allows 2-bit codes to be processed in two cycles. Code words of all other lengths are processed in at most three cycles. In addition, the low-order four bits of the decoded symbol are the size value, which is also available at this time (the end of the second cycle) to control the processing of the additional bits needed to reconstruct the non-zero transform coefficient. The invention thus not only provides addressing of the Huffman table, or of the corresponding table of other compression techniques, using available FPGAs, but does so by supporting parallel operations executed in pipelined fashion in a minimal number of clock cycles, thereby achieving a significant acceleration of the decoding process well beyond what higher clock rates alone provide.

The preferred embodiments of the invention have been set forth in the drawings and specification and, although specific terms are employed, the description thus given uses terminology in a generic and descriptive sense only and not for purposes of limitation. While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

Claims

1. An apparatus for decompressing compressed data, comprising a field programmable gate array (FPGA), the apparatus comprising:

a receiving unit for receiving an input bit stream;

a comparing unit for performing a comparison to determine a code word length;

an adding unit for performing, in parallel with the comparing unit performing the comparison, an addition of an offset and a number of bits of the bit stream;

a selecting unit for selecting a result of the adding unit performing the addition; and

a discarding unit for discarding, after the comparing unit has completed the comparison, a number of bits of the bit stream equal to the code word length.

2. The apparatus according to claim 1, wherein the comparison is performed on the least significant bits of the number of bits of the bit stream, and the selection is performed partly in response to the high-order bits of the number of bits of the bit stream.

3. The apparatus according to claim 1, wherein the number of bits of the bit stream is equal in number to the maximum number of bits of the code word.

4. The apparatus according to claim 1, further comprising: a bank select bit, for storing binary words used for comparison with bits of the bit stream and for selecting among the binary words.

5. The apparatus according to claim 1, further comprising: a bank select bit, for storing offsets used for addition with bits of the bit stream and for selecting among the offsets.

6. The apparatus according to claim 1, wherein the result of the selecting unit is used to address a memory.

7. The apparatus according to claim 6, wherein the memory contains both a DC table and an AC table.

8. The apparatus according to claim 7, wherein, for baseline JPEG decoding, the DC table and the AC table are limited to 256 or fewer entries.

9. The apparatus according to claim 1, wherein the field programmable gate array (FPGA) is programmed to include logic for the special cases of one-bit and two-bit code word lengths.

10. The apparatus according to claim 1, further comprising a unit for selecting bits of the bit stream as additional bits.

11. A method of decompressing compressed data using a field programmable gate array (FPGA), the method comprising the steps of:

receiving an input bit stream;

performing a comparison to determine a code word length;

performing, in parallel with the step of performing the comparison, an addition of an offset and a number of bits of the bit stream;

selecting a result of the step of performing the addition; and

discarding, after the step of performing the comparison has been completed, a number of bits of the bit stream equal to the code word length.

12. The method according to claim 11, wherein the comparison is performed on the least significant bits of the number of bits of the bit stream, and the selection is performed partly in response to the high-order bits of the number of bits of the bit stream.

13. The method according to claim 11, wherein the number of bits of the bit stream is equal in number to the maximum number of bits of the code word.

14. The method according to claim 11, further comprising the step of: storing binary words for comparison with bits of the bit stream, and selecting among the binary words.

15. The method according to claim 11, further comprising the step of: storing offsets for addition with bits of the bit stream, and selecting among the offsets.

16. The method according to claim 11, wherein the result of the step of selecting the result of the step of performing the addition is used to address a memory.

17. The method according to claim 16, wherein the memory contains both a DC table and an AC table.

18. The method according to claim 17, wherein, for baseline JPEG decoding, the DC table and the AC table are limited to 256 entries.

19. The method according to claim 11, wherein the field programmable gate array (FPGA) is programmed to include logic for the special cases of one-bit and two-bit code word lengths.

20. The method according to claim 11, further comprising a step of selecting bits of the bit stream as additional bits.