CN101729076A - Nonperfect code table based Huffman decoding method for analyzing code length - Google Patents


Info

Publication number
CN101729076A
CN101729076A CN200810218565A CN200810218565A CN101729076A CN 101729076 A CN101729076 A CN 101729076A CN 200810218565 A CN200810218565 A CN 200810218565A CN 200810218565 A CN200810218565 A CN 200810218565A CN 101729076 A CN101729076 A CN 101729076A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200810218565A
Other languages
Chinese (zh)
Other versions
CN101729076B (en)
Inventor
裴少芳
苏丹
叶广明
胡胜发
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ankai Microelectronics Co.,Ltd.
Original Assignee
ANKAI (GUANGZHOU) SOFTWARE TECHN Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ANKAI (GUANGZHOU) SOFTWARE TECHN Co Ltd
Priority to CN2008102185651A
Publication of CN101729076A
Application granted
Publication of CN101729076B
Legal status: Active

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a Huffman decoding method that resolves code length using a non-perfect code table, comprising the following steps: constructing all code tables used for level comparison; determining the critical code length L of the non-perfect code table; constructing an L-bit non-perfect code table; reading a code stream value of maximum code length and retrieving the level-(L+1) codeword from the corresponding codeword lookup table, whose entries take the minimum Huffman codeword of each level as a prefix; resolving the length of the first codeword, by table lookup if it is no longer than L and otherwise by level comparison beyond level L; looking up the symbol value corresponding to the first codeword to complete the parsing of the first codeword in the code stream; and removing the parsed codeword from the current code stream and repeating these steps until all Huffman codes are decoded. The invention greatly reduces storage space and speeds up code-length resolution. When the maximum code length is 16, the space complexity is only about 1/256 of that of the perfect code table parsing method.

Description

A Huffman decoding method that resolves code length using a non-perfect code table
Technical field
The present invention relates to a Huffman decoding method, and in particular to a Huffman decoding method that quickly resolves code length using a non-perfect (non-complete) code table.
Background art
The Huffman algorithm encodes and decodes data according to the probability with which each element occurs in the data to be compressed, and losslessly reduces the space the data occupies. Fig. 1 shows an example of a Huffman codeword tree: the codewords are its leaves, and the depth of a leaf is the level to which the codeword belongs.
When parsing Huffman codes, the decoder first determines the length of the first codeword in the code stream and extracts that codeword; the symbol table supplied with the Huffman code then yields the data element corresponding to it. The first codeword is removed from the code stream and the remaining stream is parsed in the same way, codeword by codeword, until Huffman decoding is complete. Because Huffman coding is a variable-length code, determining the length of each codeword is a problem that must be solved throughout decoding. Two common solutions for resolving code length in Huffman decoding are described below:
1. Level-comparison parsing method. Based on the Huffman codeword tree, a leaf key is built that, for each level, gives the nearest following level that contains Huffman codewords (a level is equivalent to a code length). For each such level, the minimum codeword of that level is taken as the prefix bits and the remaining bits are padded with 0 to extend it to the maximum code length. With the level of the Huffman prefix code as the index and the extended code as the indexed value, a fixed-length codeword lookup table is built whose entries are prefixed by the minimum Huffman codeword of each level. To resolve a code length, the next level is first retrieved from the leaf key starting at level 0, and that next level is used to look up a codeword in the fixed-length codeword lookup table. A code stream value of maximum code length is taken and compared with this codeword. If the code stream value is not less than the codeword, the current level in the leaf key is replaced by the next level, the new current level is used to retrieve the following level that contains leaves, and the codeword for that level is looked up and compared with the code stream value again. This repeats until the code stream value is less than the codeword from the lookup table; at that point the current level in the leaf key equals the length of the codeword in the code stream.
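The level-comparison method described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: it assumes a canonical Huffman code in which codewords are assigned in increasing numeric order by length (as in JPEG-style tables), and the names `build_level_tables`, `code_length_by_levels`, `counts`, and the sentinel level `max_len + 1` are hypothetical.

```python
def build_level_tables(counts, max_len):
    """Build the leaf key and the fixed-length codeword lookup table.

    counts[l] is the number of codewords of length l (counts[0] is unused).
    Assumption: canonical code assignment, shorter codewords first.
    """
    leaf_key = {}   # level -> nearest following level that has codewords
    min_code = {}   # level -> minimum codeword of that level, zero-padded to max_len bits
    code = 0        # first codeword of the current level (unpadded)
    prev = 0        # most recent populated level (level 0 is the start)
    for length in range(1, max_len + 1):
        if counts[length]:
            leaf_key[prev] = length
            min_code[length] = code << (max_len - length)
            prev = length
            code += counts[length]
        code <<= 1
    # Sentinel (an assumption of this sketch): a virtual level above the
    # deepest one whose "codeword" exceeds every max_len-bit stream value.
    leaf_key[prev] = max_len + 1
    min_code[max_len + 1] = 1 << max_len
    return leaf_key, min_code


def code_length_by_levels(window, leaf_key, min_code):
    """Resolve the length of the first codeword in a max_len-bit stream value."""
    current = 0
    nxt = leaf_key[current]
    while window >= min_code[nxt]:   # not less: move on to the next level
        current = nxt
        nxt = leaf_key[current]
    return current                   # less: current level is the code length
```

For a code with two 2-bit, three 3-bit and two 4-bit codewords, the loop needs two, three and four comparisons respectively, which illustrates why longer codewords cost more time with this method.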
2. Perfect code table parsing method. Based on the leaf codewords of all Huffman codeword trees used in the code stream, a perfect code-length table is built. Each index of this table consists of a Huffman codeword as prefix, extended to the maximum code length according to a fixed rule; the indexed value is the code length of the corresponding Huffman prefix code. For the Huffman code stream currently being parsed, the part of the table corresponding to the Huffman codeword tree of the codeword to be parsed is selected. The stream is truncated to the maximum code length, and the resulting code stream value is used as an index into that part of the table; the value retrieved is the length of the first codeword in the stream being parsed.
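A perfect code-length table of this kind might be built as in the sketch below. It assumes the same canonical code assignment (shorter codewords first, numerically increasing), and takes the "fixed rule" for extension to mean filling the remaining bits with every value from all 0s to all 1s; both points, and the name `build_perfect_length_table`, are assumptions made for illustration.

```python
def build_perfect_length_table(counts, max_len):
    """Return a table with 2**max_len entries mapping a stream value to a code length.

    counts[l] is the number of codewords of length l (counts[0] is unused).
    """
    table = [0] * (1 << max_len)
    code = 0
    for length in range(1, max_len + 1):
        pad = max_len - length
        for _ in range(counts[length]):
            base = code << pad              # codeword as prefix, rest all 0s
            for ext in range(1 << pad):     # every filling of the remaining bits
                table[base + ext] = length
            code += 1
        code <<= 1
    return table
```

A single lookup `table[window]` then yields the code length, at the cost of 2^N entries per Huffman codeword tree (65536 entries when N is 16).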
Once the code length is known, the first codeword is extracted and the corresponding data is found in the symbol table of the current Huffman codeword tree. The parsed part is removed from the code stream and the same operations are repeated on the remaining stream until all Huffman codes have been parsed.
The level-comparison method in common use exploits the codeword probabilities reflected in the coding table and optimizes the search according to which levels of the Huffman codeword tree contain leaves. For most codewords, however, several comparisons are needed to determine the code length, and this takes up a large share of the total decoding time. Resolving code length with a perfect code table is faster, but its space complexity is too high to meet the hardware storage constraints of embedded systems.
Huffman-based encoding and decoding algorithms are widely used in embedded systems for audio and video. In the Huffman algorithm, codewords are variable-length binary prefix codes, so the length of a codeword must be resolved before the codeword itself can be parsed. Conventional code-length resolution algorithms are either too slow or require too large a code table. Reducing code-length resolution time without inflating the code table is therefore of great significance for Huffman decoding.
Summary of the invention
The object of the present invention is to provide a Huffman decoding method that resolves code length using a non-perfect code table; the method significantly reduces the storage occupied by code tables and speeds up decoding.
This object is achieved by the following scheme: a Huffman decoding method based on a non-perfect code table, comprising the steps of:
1. Using the level-comparison parsing method, build all code tables needed for level comparison, namely the leaf key and the fixed-length codeword lookup table prefixed by the minimum Huffman codeword of each level;
2. Determine the critical code length L of the non-perfect code table: select a value L between the minimum code length and the maximum code length as the critical code length for building the non-perfect code table;
3. Based on the leaf codewords of no more than L bits in all Huffman codeword trees used in the code stream, build an L-bit non-perfect code table prefixed by the Huffman codewords;
4. According to the Huffman codeword tree to which the part of the current code stream to be parsed belongs, read a code stream value of maximum code length and, in the corresponding fixed-length codeword lookup table prefixed by the minimum Huffman codeword of each level, retrieve the codeword of level (code length) L+1;
5. Compare the code stream value with the level-(L+1) fixed-length codeword just retrieved. If the code stream value is less than that codeword, use the first L bits of the code stream value as a new index into the corresponding part of the non-perfect code-length table; the value retrieved is the length of the first codeword in the part of the stream to be parsed. Otherwise, keep the original code stream value as the object of comparison and resolve the length of the first codeword beyond level L by the level-comparison parsing method;
6. Using the resolved code length, extract the codeword from the current code stream and look up its symbol value in the symbol table corresponding to the codeword, which completes the parsing of the first codeword in the code stream;
7. Remove the parsed codeword from the current code stream and repeat steps 4, 5 and 6 on the remaining stream until all Huffman codes are decoded.
The non-perfect code-length table is built one Huffman codeword tree at a time, in the same way for every tree used in the code stream. For each Huffman codeword tree, every Huffman codeword of no more than L bits is taken as a prefix and the remaining bits are expanded from all 0s to all 1s up to a length of L bits, forming the indices of the non-perfect code-length table; the value stored at each index is the code length of the Huffman prefix code.
The part of the code-length table corresponding to each Huffman codeword tree is built as follows. First, for the corresponding tree, indices covering every L-bit value from all 0s to all 1s are created and all indexed values (the code-length table values) are initialized to 0. Then every leaf codeword of no more than L bits in the tree is taken as a prefix, its remaining bits are filled from all 0s to all 1s up to L bits, and the table value of each such expanded code is set to the length of the corresponding Huffman prefix code.
The present invention significantly reduces storage space and speeds up code-length resolution. For example, when the maximum code length is N (typically 16), the time complexity of code-length resolution by the level-comparison parsing method is given by a formula that appears only as an image in the original publication (Figure G2008102185651D0000031), where p_i is the statistical probability of codewords of length i, and the space complexity of the code-length table for each descriptor is O(N). The perfect code table method resolves code length in O(1) time, with a space complexity of O(2^N) for the perfect code-length table of each descriptor. Compared with these two techniques, the time complexity of the non-perfect code table method (Figure G2008102185651D0000032, likewise an image in the original) approaches that of the perfect code table method, while its space complexity for each descriptor is O(2^8 + N). For the common code length of 16, its space complexity is only a small fraction of that of the perfect code table method (Figure G2008102185651D0000041 in the original; the abstract gives the ratio as 1/256), which greatly saves storage space.
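As a quick check on the space figures, taking L = 8 and N = 16 as in the text and counting table entries only (ignoring the width of each entry is an assumption of this calculation), the ratio of the two table sizes works out as follows:

```latex
\[
\frac{2^{L} + N}{2^{N}}
  = \frac{2^{8} + 16}{2^{16}}
  = \frac{272}{65536}
  \approx \frac{1}{241},
\qquad
\frac{2^{8}}{2^{16}} = \frac{1}{256}.
\]
```

The second ratio, which leaves out the N entries of the level-comparison tables, matches the 1/256 figure quoted in the abstract.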
Description of drawings
Fig. 1 shows a Huffman codeword tree of the prior art;
Fig. 2 is a flowchart of generating the non-perfect code-length table corresponding to a single Huffman codeword tree according to the invention;
Fig. 3 is a flowchart of code-length resolution according to the invention.
Embodiment
First, a code table for retrieving fixed-length codewords is built on the basis of the level-comparison parsing method: following the prior-art level-comparison method, all code tables needed for level comparison are constructed, namely the leaf key and the fixed-length codeword lookup table prefixed by the minimum Huffman codeword of each level.
Next, an L-bit non-perfect code table prefixed by Huffman codewords is built. A non-perfect code-length table is set up from the leaf codewords of no more than L bits in all Huffman codeword trees used in the code stream. L is the critical code length of the non-perfect code table and is chosen between the minimum and maximum code lengths; for a code table whose maximum code length is 16, a critical code length of L = 8 is recommended. Each index of the table takes a Huffman codeword of no more than L bits as its prefix, with the remaining bits expanded from all 0s to all 1s up to L bits; the indexed value is the code length of the corresponding Huffman prefix code. The table is built one Huffman codeword tree at a time, in the same way for every tree used in the code stream.
The part of the code-length table corresponding to each Huffman codeword tree is built as follows. First, for the corresponding tree, indices covering every L-bit value from all 0s to all 1s are created and all indexed values (the code-length table values) are initialized to 0. Then every leaf codeword of no more than L bits in the tree is taken as a prefix, its remaining bits are filled from all 0s to all 1s up to L bits, and the table value of each such expanded code is set to the length of the corresponding Huffman prefix code.
Fig. 2 shows the processing for a single Huffman codeword tree during construction of an 8-bit non-perfect code-length table. The current level, the current leaf number and the position of the current codeword within the current level all start from 0. The steps are as follows:
1. Initialize: set all code lengths to 0 and set the current level to 1;
2. Compute the total number of leaves at the current level, then set the current leaf position within the level to 0;
3. Take the codeword corresponding to the leaf position as the prefix code, pad it with 0s to L bits, and compute the total number of its expanded codewords; then set the expanded-codeword position for the current leaf to 0;
4. Add the position to the expanded codeword to form the index, and set the indexed value to the current level;
5. Check whether the expanded-codeword position is less than the total number of expanded codewords. If yes, increment the position by 1 and return to step 4; if no, go to the next step;
6. Add 1 to the prefix code to obtain the next prefix code, and check whether the current leaf position is less than the total number of leaves at this level. If yes, increment the leaf position by 1 and return to step 3; if no, go to the next step;
7. Check whether the current codeword level is not greater than L. If yes, return to step 2 for the next level; if no, table generation is complete.
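The table-generation flow above can be sketched in a few lines. This is an illustrative sketch rather than the flowchart of Fig. 2 itself: it assumes a canonical code in which codewords are assigned in increasing numeric order, shorter codewords first, and the names `build_nonperfect_length_table`, `counts` and `crit_len` (for L) are hypothetical. Comments map the loops to the numbered steps.

```python
def build_nonperfect_length_table(counts, crit_len):
    """Return a 2**crit_len table mapping the first crit_len stream bits to a code length.

    counts[l] is the number of codewords of length l (counts[0] is unused).
    Entries that stay 0 belong to codewords longer than crit_len bits.
    """
    table = [0] * (1 << crit_len)              # step 1: all code lengths start at 0
    code = 0                                   # prefix code of the current leaf
    for level in range(1, crit_len + 1):       # step 7: stop once the level exceeds L
        pad = crit_len - level
        for _leaf in range(counts[level]):     # steps 2 and 6: each leaf of this level
            base = code << pad                 # step 3: pad the prefix with 0s to L bits
            for pos in range(1 << pad):        # steps 4 and 5: each expanded codeword
                table[base + pos] = level
            code += 1                          # step 6: next prefix code
        code <<= 1                             # first prefix code of the next level
    return table
```

With L = 8 the table has 256 entries per Huffman codeword tree regardless of the maximum code length, which is where the space saving over a perfect code table comes from.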
Once the two code tables have been built, code lengths are resolved as follows. According to the Huffman codeword tree to which the part of the current code stream to be parsed belongs, a code stream value of maximum code length is read, and the codeword of level (code length) L+1 is retrieved from the corresponding fixed-length codeword lookup table prefixed by the minimum Huffman codeword of each level.
The code stream value is compared with the codeword just retrieved. If it is less, the first L bits of the code stream value are used as a new index into the part of the non-perfect code-length table corresponding to the current Huffman codeword tree; the value retrieved is the length of the first codeword in the part of the stream to be parsed. Otherwise, the original code stream value is kept as the object of comparison and, following the level-comparison parsing method, the length of the first codeword to be parsed is searched for beyond level L of the corresponding Huffman codeword tree. A run of bits of the resolved code length is then extracted from the current code stream, and the corresponding data is found in the symbol table of the Huffman codeword tree to which the current codeword belongs.
The parsed codeword is removed from the current code stream and the remaining stream is parsed in the same way until all Huffman codes are decoded.
The detailed decoding flow is shown in Fig. 3. First, the bits A of maximum code length are extracted from the current bit stream, together with the codeword B of level L+1 from the fixed-length codeword table, and A and B are compared. If A is less than B, the first L bits of A are used as an index into the corresponding part of the non-perfect code table; the code length is the indexed value, and code-length resolution ends. If A is not less than B, the leaf key is searched for the next level beyond level L of the Huffman codeword tree that contains codewords, and the codeword C corresponding to that next level is looked up in the fixed-length code table. If A is not less than C, the current level is replaced by the next level, the leaf key is searched for the following level that contains codewords, and codeword C is looked up again. If A is less than C, the code length is the current level value, and code-length resolution ends.
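The flow of Fig. 3, together with symbol lookup and stream advance, can be sketched end to end as below. Everything here is an illustrative assumption rather than the patented implementation: the code is taken to be canonical (shorter codewords first, numerically increasing), the stream is a string of '0' and '1' characters that ends on a codeword boundary, B is taken from the first populated level above L when level L+1 itself has no codewords, and the names `build_tables`, `resolve_length`, `decode` and the sentinel level `max_len + 1` are hypothetical.

```python
def build_tables(counts, max_len, crit_len):
    """counts[l] is the number of codewords of length l (counts[0] is unused)."""
    # leaf_key[l]: nearest level above l that has codewords; max_len + 1 is a sentinel.
    leaf_key = [0] * (max_len + 1)
    nearest = max_len + 1
    for level in range(max_len, -1, -1):
        leaf_key[level] = nearest
        if level > 0 and counts[level]:
            nearest = level
    min_code = {max_len + 1: 1 << max_len}   # sentinel larger than any stream value
    first_code = {}                          # smallest codeword per level, unpadded
    short_table = [0] * (1 << crit_len)      # non-perfect code-length table
    code = 0
    for level in range(1, max_len + 1):
        if counts[level]:
            first_code[level] = code
            min_code[level] = code << (max_len - level)
        if level <= crit_len:
            pad = crit_len - level
            for prefix in range(code, code + counts[level]):
                for pos in range(1 << pad):
                    short_table[(prefix << pad) + pos] = level
        code = (code + counts[level]) << 1
    return leaf_key, min_code, first_code, short_table


def resolve_length(window, max_len, crit_len, leaf_key, min_code, short_table):
    b = min_code[leaf_key[crit_len]]   # codeword B (first populated level above L)
    if window < b:                     # A < B: one lookup with the first L bits of A
        return short_table[window >> (max_len - crit_len)]
    current, nxt = crit_len, leaf_key[crit_len]
    while window >= min_code[nxt]:     # A >= C: replace current level with the next
        current, nxt = nxt, leaf_key[nxt]
    return current                     # A < C: the current level is the code length


def decode(bits, counts, symbols, max_len, crit_len):
    """symbols lists the symbols in codeword order (an assumption of this sketch)."""
    leaf_key, min_code, first_code, short_table = build_tables(counts, max_len, crit_len)
    base, total = {}, 0                # index of each level's first symbol
    for level in range(1, max_len + 1):
        base[level] = total
        total += counts[level]
    out, pos = [], 0
    while pos < len(bits):
        # Stream value A: the next max_len bits, zero-padded at the end of the stream.
        window = int(bits[pos:pos + max_len].ljust(max_len, "0"), 2)
        n = resolve_length(window, max_len, crit_len, leaf_key, min_code, short_table)
        codeword = window >> (max_len - n)
        out.append(symbols[base[n] + codeword - first_code[n]])
        pos += n                       # remove the parsed codeword from the stream
    return "".join(out)
```

In this sketch, codewords of at most L bits cost one comparison and one table lookup, while longer codewords fall back to level comparison starting above level L, so the number of comparisons for a codeword of length i is bounded by the number of populated levels between L and i.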

Claims (3)

1. A Huffman decoding method that resolves code length using a non-perfect code table, characterized in that it comprises the steps of:
(a) using the level-comparison parsing method, building all code tables needed for level comparison, namely the leaf key and the fixed-length codeword lookup table prefixed by the minimum Huffman codeword of each level;
(b) determining the critical code length L of the non-perfect code table: selecting a value L between the minimum code length and the maximum code length as the critical code length for building the non-perfect code table;
(c) based on the leaf codewords of no more than L bits in all Huffman codeword trees used in the code stream, building an L-bit non-perfect code table prefixed by the Huffman codewords;
(d) according to the Huffman codeword tree to which the part of the current code stream to be parsed belongs, reading a code stream value of maximum code length and, in the corresponding fixed-length codeword lookup table prefixed by the minimum Huffman codeword of each level, retrieving the codeword of level (code length) L+1;
(e) comparing the code stream value with the level-(L+1) fixed-length codeword just retrieved; if the code stream value is less than that codeword, using the first L bits of the code stream value as a new index into the corresponding part of the non-perfect code-length table, the value retrieved being the length of the first codeword in the part of the stream to be parsed; otherwise, keeping the original code stream value as the object of comparison and resolving the length of the first codeword beyond level L by the level-comparison parsing method;
(f) using the resolved code length, extracting the codeword from the current code stream and looking up its symbol value in the symbol table corresponding to the codeword, thereby completing the parsing of the first codeword in the code stream;
(g) removing the parsed codeword from the current code stream and repeating steps (d), (e) and (f) on the remaining stream until all Huffman codes are decoded.
2. The Huffman decoding method that resolves code length using a non-perfect code table according to claim 1, characterized in that the non-perfect code-length table is built one Huffman codeword tree at a time, in the same way for every tree used in the code stream; for each Huffman codeword tree, every Huffman codeword of no more than L bits is taken as a prefix and the remaining bits are expanded from all 0s to all 1s up to a length of L bits, forming the indices of the non-perfect code-length table; the value stored at each index is the code length of the Huffman prefix code.
3. The Huffman decoding method that resolves code length using a non-perfect code table according to claim 1, characterized in that the part of the code-length table corresponding to each Huffman codeword tree is built as follows: first, for the corresponding tree, indices covering every L-bit value from all 0s to all 1s are created and all indexed values (the code-length table values) are initialized to 0; then every leaf codeword of no more than L bits in the tree is taken as a prefix, its remaining bits are filled from all 0s to all 1s up to L bits, and the table value of each such expanded code is set to the length of the corresponding Huffman prefix code.
CN2008102185651A 2008-10-22 2008-10-22 Nonperfect code table based Huffman decoding method for analyzing code length Active CN101729076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102185651A CN101729076B (en) 2008-10-22 2008-10-22 Nonperfect code table based Huffman decoding method for analyzing code length

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102185651A CN101729076B (en) 2008-10-22 2008-10-22 Nonperfect code table based Huffman decoding method for analyzing code length

Publications (2)

Publication Number Publication Date
CN101729076A true CN101729076A (en) 2010-06-09
CN101729076B CN101729076B (en) 2012-11-21

Family

ID=42449416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102185651A Active CN101729076B (en) 2008-10-22 2008-10-22 Nonperfect code table based Huffman decoding method for analyzing code length

Country Status (1)

Country Link
CN (1) CN101729076B (en)

Also Published As

Publication number Publication date
CN101729076B (en) 2012-11-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP03 Change of name, title or address

Address after: 510663 no.301-303, 401-402, area C1, No.182, Science City, Guangzhou hi tech Industrial Development Zone, Guangzhou City, Guangdong Province

Patentee after: Guangzhou Ankai Microelectronics Co.,Ltd.

Address before: 301-303 401-402, zone C1, No. 182, science Avenue, Science City, Guangzhou high tech Industrial Development Zone

Patentee before: ANYKA (GUANGZHOU) MICROELECTRONICS TECHNOLOGY Co.,Ltd.

CP03 Change of name, title or address
CP02 Change in the address of a patent holder

Address after: 510555 No. 107 Bowen Road, Huangpu District, Guangzhou, Guangdong

Patentee after: Guangzhou Ankai Microelectronics Co.,Ltd.

Address before: 510663 no.301-303, 401-402, area C1, No.182, Science City, Guangzhou hi tech Industrial Development Zone, Guangzhou City, Guangdong Province

Patentee before: Guangzhou Ankai Microelectronics Co.,Ltd.

CP02 Change in the address of a patent holder