CN101282121B - Method for decoding Haffmann based on conditional probability - Google Patents

Info

Publication number
CN101282121B
Authority
CN
China
Prior art keywords
code word
code
coupling
conditional probability
original encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2007100274251A
Other languages
Chinese (zh)
Other versions
CN101282121A (en)
Inventor
冯云庆
姜喆
苏丹
周林均
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ankai Microelectronics Co.,Ltd.
Original Assignee
Anyka Guangzhou Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anyka Guangzhou Microelectronics Technology Co Ltd
Priority to CN2007100274251A
Publication of CN101282121A
Application granted
Publication of CN101282121B
Active
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a Huffman decoding method based on conditional probability, comprising: (1) collecting statistics on the conditional probability with which each code word occurs in the data to be compressed, and building a conditional probability table for each code word; (2) sorting the code words by probability in descending order and, for each code word in the top 50%, taking it as the previous code word and building a corresponding sub-coding table, where the sub-coding table consists of the code words whose conditional probability in that previous code word's conditional probability table is greater than 0.1, together with their index values; (3) performing a table-lookup match for each code word: depending on whether a previous code word exists and whether it has a sub-coding table from step (2), selecting either that sub-coding table or the original coding table of the data to be compressed as the current code table, searching it in order for the matching code word, and obtaining the associated information to complete decoding. Compared with the sequential search method and the index search method, the invention significantly reduces the number of lookups and improves decoding speed.

Description

A Huffman decoding method based on conditional probability
Technical field
The present invention relates to a decoding method, and in particular to a Huffman decoding method based on conditional probability.
Background art
The Huffman algorithm is very widely used in data compression, and in particular in the encoding and decoding algorithms of various audio and video formats. It encodes and decodes according to the probability with which each element occurs in the data to be compressed, and losslessly reduces the space the data occupies. The encoding process is illustrated below.
Data to be compressed: AABCABAADACB
Coding without compression: 00-A, 01-B, 10-C, 11-D
Representation without compression: 00 00 01 10 00 01 00 00 11 00 10 01
Suppose the probability of occurrence of each element is: P(A)=0.6, P(B)=0.25, P(C)=0.15, P(D)=0.05
Coding table: 1-A, 01-B, 001-C, 000-D
Code stream after compression: 11 01 001 1 01 11 000 1 001 01
Storing these data uncompressed takes 24 bits, whereas the compressed form takes only 21 bits, a saving of 3 bits. The probability of occurrence of each element is prior data; it is generally calculated from the characteristics of the data to be compressed, or obtained by collecting statistics over a large amount of such data.
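The bit counts in this example can be checked with a short sketch. The two tables are the ones given above; the function name `encode` and the variable names are illustrative only and do not come from the patent.

```python
# Code tables taken from the example above.
FIXED_TABLE = {"A": "00", "B": "01", "C": "10", "D": "11"}
HUFFMAN_TABLE = {"A": "1", "B": "01", "C": "001", "D": "000"}


def encode(data, table):
    """Concatenate the code word of every symbol in data."""
    return "".join(table[symbol] for symbol in data)


data = "AABCABAADACB"
fixed_bits = encode(data, FIXED_TABLE)      # uncompressed representation
huffman_bits = encode(data, HUFFMAN_TABLE)  # compressed code stream

print(len(fixed_bits), len(huffman_bits))
print(huffman_bits)
```

The compressed stream printed here is the same bit sequence as in the example, only without the spaces used there for readability.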
Decoding is the inverse of encoding: the decoder repeatedly searches the code stream for code words from the coding table, restores each to the information it represents, and assembles the results in order to recover the original data. The compressed stream of the example above is decoded as follows:
Compressed code stream: 11 01 001 1 01 11 000 1 001 01
Decoded data: A A B C A B A A D A C B
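As a minimal sketch of this inverse process, the decoder below reads the stream one bit at a time and emits a symbol as soon as the accumulated bits equal a code word. This works because no code word in the table is a prefix of another. The names `DECODE_TABLE` and `decode` are illustrative assumptions, not part of the patent.

```python
# Reverse mapping of the coding table from the example: code word -> symbol.
DECODE_TABLE = {"1": "A", "01": "B", "001": "C", "000": "D"}


def decode(bits):
    """Accumulate bits until they form a code word, then emit its symbol."""
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in DECODE_TABLE:
            symbols.append(DECODE_TABLE[buffer])
            buffer = ""
    if buffer:
        # Leftover bits mean the stream stopped in the middle of a code word.
        raise ValueError("stream ends inside a code word")
    return "".join(symbols)


stream = "11 01 001 1 01 11 000 1 001 01".replace(" ", "")
decoded = decode(stream)
print(decoded)
```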
The entropy coding stage of audio and video codecs generally uses the Huffman algorithm. During encoding, the intermediate data produced by the preceding stages are converted into code words through the coding table, and the final code stream is then generated in a given format. During decoding, the code words of the coding table are first located one after another in the code stream and converted back into the corresponding intermediate results, which are then processed by the subsequent stages. Coding tables used in practice generally contain several hundred code words and are designed according to the probability with which each element occurs. Huffman decoding is therefore essentially a matching process involving a large number of table lookups, and reducing the number of lookups needed per match speeds up decoding. The common approaches to lookup matching in Huffman decoding are described below.
(1) Sequential search. Code words are taken from the coding table one at a time. For each, a binary sequence of the same length as the code word is taken from the head of the code stream and compared with it; if they are identical the match succeeds, otherwise the next code word is taken and the process repeats. Sequential search is inefficient: for a code table with several hundred code words, each match generally needs dozens of comparisons. Its advantage is that the code table takes little storage space.
(2) Optimized sequential search. Before decoding, the coding table is reordered by descending probability of occurrence of each code word, and the sequential search is then run on the reordered table. Code words with a high probability of occurrence are found after few comparisons, so the overall number of lookups falls and decoding is faster than with plain sequential search.
(3) Index search. All code words in the code table are classified by their first few bits into several relatively small sub-coding tables plus an index table. During decoding, the first few bits of the code stream are taken and looked up in the index table to determine which sub-coding table the code word belongs to, and a sequential search is then performed within that sub-coding table. Because each sub-coding table holds fewer code words, this method reduces the number of lookups.
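The three approaches can be compared on the small example stream with the sketch below, which counts code word comparisons. It is only an illustration under stated assumptions: the initial table order, the one-bit index prefix, and all function names are hypothetical choices, and the index lookup itself is not counted as a comparison.

```python
# Code table from the example above, deliberately stored in an unfavourable order.
TABLE = [("000", "D"), ("001", "C"), ("01", "B"), ("1", "A")]
PROB = {"A": 0.6, "B": 0.25, "C": 0.15, "D": 0.05}  # probabilities from the example


def sequential_decode(bits, table):
    """Methods (1)/(2): compare against code words in table order, counting comparisons."""
    symbols, pos, comparisons = [], 0, 0
    while pos < len(bits):
        for codeword, symbol in table:
            comparisons += 1
            if bits.startswith(codeword, pos):
                symbols.append(symbol)
                pos += len(codeword)
                break
        else:
            raise ValueError(f"no code word matches at bit {pos}")
    return "".join(symbols), comparisons


def index_decode(bits, table, prefix_len=1):
    """Method (3): pick a sub-table by the leading bits, then search it sequentially."""
    index = {}
    for codeword, symbol in table:
        index.setdefault(codeword[:prefix_len], []).append((codeword, symbol))
    symbols, pos, comparisons = [], 0, 0
    while pos < len(bits):
        sub_table = index.get(bits[pos:pos + prefix_len], [])
        for codeword, symbol in sub_table:
            comparisons += 1
            if bits.startswith(codeword, pos):
                symbols.append(symbol)
                pos += len(codeword)
                break
        else:
            raise ValueError(f"no code word matches at bit {pos}")
    return "".join(symbols), comparisons


stream = "110100110111000100101"
# Method (2) reorders the table by descending probability before searching.
by_probability = sorted(TABLE, key=lambda entry: PROB[entry[1]], reverse=True)

plain = sequential_decode(stream, TABLE)
optimized = sequential_decode(stream, by_probability)
indexed = index_decode(stream, by_probability)
print(plain, optimized, indexed)
```

On this 12-symbol stream the three calls need 38, 22 and 16 comparisons respectively. With a table of only four code words the differences are small; the text above is concerned with tables of several hundred code words.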
The Huffman decoding algorithms in general use today mainly exploit the occurrence probabilities of code words embodied in the coding table, and they still require many lookups. The probability associated with each code word is the independent probability that this code word occurs, whereas in practice two consecutive code words often stand in a definite conditional probability relationship. Making full use of this relationship can reduce the number of table lookups and speed up Huffman decoding.
Summary of the invention
The object of the present invention is to provide a Huffman decoding method based on conditional probability that significantly reduces the number of table lookups and greatly improves decoding speed.
To achieve this object, the Huffman decoding method based on conditional probability provided by the invention comprises the following steps:
1) Obtain the conditional probability with which each code word occurs in the data to be compressed, and build a conditional probability table for each code word. The conditional probability is the probability that a given code word occurs at the current position given that a particular code word occurred as the previous code word.
2) With the code words sorted by probability in descending order, take each code word in the top 50% as a previous code word and build a corresponding sub-coding table for it. Each sub-coding table consists of the code words whose conditional probability in that previous code word's conditional probability table is greater than 0.1, together with their index values. Within a sub-coding table, these code words are arranged by the probability with which they occur after that particular previous code word.
3) Perform a table-lookup match for each code word: depending on whether a previous code word exists and whether it has a sub-coding table from step 2), select either that sub-coding table or the original coding table of the data to be compressed as the current code table, find the matching code word by searching in order, and obtain the associated information to complete decoding. The lookup matching process comprises:
3.a) Set the previous code word to empty.
3.b) Determine whether the previous code word is empty. If it is empty, select the original coding table as the current code table; otherwise select the sub-coding table corresponding to the previous code word as the current code table.
3.c) Check whether that sub-coding table exists. If it exists, search the current code table sequentially for the matching code word; otherwise select the original coding table as the current code table and search it sequentially for the matching code word.
3.d) Check whether a matching code word was found. If so, set the previous code word to the current matching code word, obtain the associated information from the original coding table according to the corresponding index value, and return to step 3.b). If not, determine whether the current code table is the original coding table.
3.e) Determine whether the current code table is the original coding table. If it is not, continue by searching the original coding table sequentially for the matching code word, obtain the associated information, and return to step 3.b). If the current code table is the original coding table, or no matching code word can be found in the original coding table, report an error and stop.
The index value in step 2) is the position of the corresponding code word in the original coding table of the data to be compressed, and is used to look up all the information associated with that code word in the original coding table.
Compared with sequential search, the invention significantly reduces the number of lookups and noticeably improves decoding speed. Compared with index search, which simply classifies and indexes code words by their first few bits, the invention builds several sub-coding tables from the conditional probabilities with which code words occur, so that common code words are matched sooner; this reduces the number of lookups and improves overall decoding speed.
Brief description of the drawings
Fig. 1 is a flow chart of collecting code word conditional probability statistics according to the invention;
Fig. 2 is a flow chart of the Huffman table lookup using conditional probability according to the invention.
Detailed description
The Huffman decoding method based on conditional probability according to the invention consists mainly of the following three parts:
I. Constructing the conditional probability table
The conditional probability is the probability that a given code word occurs at the current position given that a particular code word occurred as the previous code word. For example, the conditional probability P(A|B) of code word B denotes the probability that code word B occurs after code word A; in this notation the first argument is the previous code word and the second is the current code word. The conditional probability of each code word can be obtained with an existing Huffman decoding algorithm by collecting statistics over a large number of actual decoding runs; Fig. 1 shows the flow of this statistics collection. The algorithm of Fig. 1 yields a set of count entries, each giving the number of times the current code word was Y when the previous code word was X, written p(X, Y), where X and Y can be any code words in the code table. Let T(X) be the total number of occurrences of all current code words when the previous code word is X. Then T(X) = p(X, A) + p(X, B) + p(X, C) + p(X, D), where A, B, C and D stand for all the code words in the code table; any further code words are included in the sum as well. The conditional probability is then P(X|Y) = p(X, Y) / T(X), where X and Y can be any code words in the code table.
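The counting scheme described above can be sketched as follows. The function name `conditional_probabilities` is hypothetical, and the 12-symbol sequence from the background example is used only to keep the illustration small; the text calls for statistics over a large number of actual decoding runs.

```python
from collections import Counter, defaultdict


def conditional_probabilities(symbols):
    """Return table[X][Y] = p(X, Y) / T(X) for previous code word X and current code word Y."""
    # p(X, Y): number of times Y occurs immediately after X.
    pair_counts = Counter(zip(symbols, symbols[1:]))
    # T(X): total number of code words observed immediately after X.
    totals = Counter()
    for (prev, _cur), count in pair_counts.items():
        totals[prev] += count
    table = defaultdict(dict)
    for (prev, cur), count in pair_counts.items():
        table[prev][cur] = count / totals[prev]
    return dict(table)


table = conditional_probabilities("AABCABAADACB")
for prev, row in sorted(table.items()):
    print(prev, row)
```

Each row of the result sums to 1, since T(X) is by definition the sum of p(X, Y) over all Y.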
Once the conditional probability of each code word has been obtained, the code word conditional probability table is constructed as shown in Table 1.
Table 1. Code word conditional probability table (an image in the original, Figure B2007100274251D00051)
II. Constructing the new coding table set
With the code words sorted by probability in descending order, the code words in the top 50% are selected, and a sub-coding table is constructed for each of them from the conditional probability table (Table 1). Suppose code words A and C both lie in the top 50% of this ordering and that P(A|B) > P(A|C) > P(A|A) > P(A|D) and P(C|A) > P(C|C) > P(C|B) > P(C|D). Two sub-coding tables are then constructed in descending order of conditional probability, as shown in Table 2 and Table 3 below. The index value Sx is the position of code word X in the original coding table and is used to look up all the information associated with that code word there.
Because the storage capacity of a Huffman decoder is limited, constructing a sub-coding table for every code word would consume a large amount of storage. Taking into account both this storage limit and the probability with which code words occur, no sub-coding table is constructed for code words in the bottom 50% of the ordering. For example, code word D lies in the bottom 50%, so no sub-coding table is constructed with D as the previous code word. In addition, code words whose conditional probability is less than 0.1 are not entered into the corresponding sub-coding table; for example, code word D is entered in neither Table 2 nor Table 3. All the sub-coding tables together with the original coding table form the new coding table set.
Code word    Index value
B            Sb
C            Sc
A            Sa
Table 2. Sub-coding table A

Code word    Index value
A            Sa
C            Sc
B            Sb
Table 3. Sub-coding table C
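The construction of Tables 2 and 3 can be sketched as below. The numeric probabilities are invented for illustration: they are chosen only so that A and C fall in the top 50% and so that the conditional probabilities satisfy the orderings assumed in the text, with D below the 0.1 threshold. The function name `build_sub_tables` and its parameters are likewise hypothetical.

```python
# Original coding table in the order A B C D, as (code word, symbol).
ORIGINAL_TABLE = [("1", "A"), ("01", "B"), ("001", "C"), ("000", "D")]

# Assumed occurrence probabilities (illustrative values, not from the patent).
PROB = {"A": 0.40, "C": 0.30, "B": 0.20, "D": 0.10}

# Assumed conditional probabilities COND[previous][current] (illustrative values).
COND = {
    "A": {"B": 0.40, "C": 0.30, "A": 0.25, "D": 0.05},
    "C": {"A": 0.50, "C": 0.25, "B": 0.20, "D": 0.05},
    "B": {"A": 0.60, "B": 0.20, "C": 0.15, "D": 0.05},
    "D": {"A": 0.70, "B": 0.15, "C": 0.10, "D": 0.05},
}


def build_sub_tables(original_table, prob, cond, top_fraction=0.5, threshold=0.1):
    """Build one sub-coding table per frequent previous code word."""
    # Index value Sx: 1-based position of each code word in the original table.
    position = {symbol: i + 1 for i, (_cw, symbol) in enumerate(original_table)}
    codeword_of = {symbol: cw for cw, symbol in original_table}
    # Keep only the most probable code words as candidate previous code words.
    ranked = sorted(prob, key=prob.get, reverse=True)
    chosen = ranked[: int(len(ranked) * top_fraction)]
    sub_tables = {}
    for prev in chosen:
        row = cond[prev]
        # Keep code words above the threshold, most probable successor first.
        kept = sorted((s for s in row if row[s] > threshold), key=row.get, reverse=True)
        sub_tables[prev] = [(codeword_of[s], position[s]) for s in kept]
    return sub_tables


sub_tables = build_sub_tables(ORIGINAL_TABLE, PROB, COND)
for prev, entries in sub_tables.items():
    print(prev, entries)
```

With these assumed values the result has exactly two sub-tables, for A and C, whose entries follow the order B, C, A and A, C, B shown in Tables 2 and 3; D appears in neither.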
III. Table lookup using conditional probability
Fig. 2 shows the flow of the Huffman table lookup using conditional probability according to the invention. The lookup matching process comprises:
1) Determine whether the previous code word is empty. If it is empty, select the original coding table as the current code table; otherwise select the sub-coding table corresponding to the previous code word as the current code table.
2) Check whether that sub-coding table exists. If it exists, search the current code table sequentially for the matching code word; otherwise select the original coding table as the current code table and search it sequentially for the matching code word.
3) Check whether a matching code word was found. If so, set the previous code word to the current matching code word and obtain the associated information from the original coding table according to the corresponding index value. If not, determine whether the current code table is the original coding table.
4) Determine whether the current code table is the original coding table. If it is not, continue by searching the original coding table sequentially for the matching code word and obtain the associated information. If the current code table is the original coding table, or no matching code word can be found in the original coding table, report an error and stop.
After one such matching pass is completed, repeating the process carries out the entire decoding. For example, let the encoded stream be 11 01 001 1 01 11 000 1 001 01, let the associated information of the code words be stored in the original coding table in the order A B C D, and let the coding table be 1-A, 01-B, 001-C, 000-D. For the first code word, the previous code word is empty, so the original coding table is searched sequentially directly; code word A is found at position 1, its index value is 1, and the associated information is obtained from that index value. For the second code word, the previous code word is A, so sub-coding table A is searched sequentially; A is found in the sub-coding table with index value Sa = 1, and the associated information of code word A is found at position 1 of the original coding table. For the fourth code word, C, the previous code word is B, for which no sub-coding table exists, so the original coding table is selected as the current code table and searched sequentially; the rest of the process is the same as for the first code word. For the ninth code word, D, the previous code word is A, so sub-coding table A is selected and searched sequentially; code word D is not in this sub-coding table, so no match is found, and the original coding table is then selected as the current code table and searched again; the match is found at position 4 and the associated information is obtained.
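The lookup flow and the worked example can be traced with the following sketch. It is one possible reading of the flow in Fig. 2, not the patent's implementation: the sub-tables are those of Tables 2 and 3 with index values filled in for the table order A B C D, the "associated information" is represented simply by the symbol letter, and the names `find`, `decode` and `tables_used` are hypothetical.

```python
# Original coding table in the order A B C D, as (code word, associated information).
ORIGINAL_TABLE = [("1", "A"), ("01", "B"), ("001", "C"), ("000", "D")]

# Sub-coding tables keyed by previous code word, as (code word, index value Sx).
SUB_TABLES = {
    "A": [("01", 2), ("001", 3), ("1", 1)],  # Table 2: B, C, A
    "C": [("1", 1), ("001", 3), ("01", 2)],  # Table 3: A, C, B
}


def find(bits, pos, codewords):
    """Sequential search: 0-based position of the first code word matching at pos, or None."""
    for i, codeword in enumerate(codewords):
        if bits.startswith(codeword, pos):
            return i
    return None


def decode(bits):
    symbols, tables_used = [], []
    pos, prev = 0, None  # the previous code word starts out empty
    while pos < len(bits):
        sub_table = SUB_TABLES.get(prev)  # None if prev is empty or has no sub-table
        index, used = None, "original"
        if sub_table is not None:
            hit = find(bits, pos, [cw for cw, _ in sub_table])
            if hit is not None:
                index, used = sub_table[hit][1], f"sub-table {prev}"
            else:
                used = f"sub-table {prev}, then original"
        if index is None:
            # No usable sub-table, or no match in it: search the original table.
            hit = find(bits, pos, [cw for cw, _ in ORIGINAL_TABLE])
            if hit is None:
                raise ValueError(f"no code word matches at bit {pos}")
            index = hit + 1
        codeword, info = ORIGINAL_TABLE[index - 1]  # fetch information via the index value
        symbols.append(info)
        tables_used.append(used)
        pos += len(codeword)
        prev = info
    return "".join(symbols), tables_used


text, tables_used = decode("110100110111000100101")
print(text)
for n, used in enumerate(tables_used, start=1):
    print(n, used)
```

The trace agrees with the walkthrough above: the first and fourth code words are resolved in the original table, the second in sub-table A, and the ninth falls back from sub-table A to the original table.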
Because the common code words are all entered in the sub-coding tables and each sub-coding table holds only a few code words, most lookups complete their match within a sub-coding table; only a few uncommon code words need to be searched for in the original coding table, which holds more code words. Most code words are therefore matched after few lookups, and the total number of lookups is significantly lower than with sequential search.

Claims (2)

1. A Huffman decoding method based on conditional probability, characterized in that it comprises the following steps:
1) Obtain the conditional probability with which each code word occurs in the data to be compressed, and build a conditional probability table for each code word. The conditional probability is the probability that a given code word occurs at the current position given that a particular code word occurred as the previous code word.
2) With the code words sorted by probability in descending order, take each code word in the top 50% as a previous code word and build a corresponding sub-coding table for it. Each sub-coding table consists of the code words whose conditional probability in that previous code word's conditional probability table is greater than 0.1, together with their index values. Within a sub-coding table, these code words are arranged by the probability with which they occur after the previous code word.
3) Perform a table-lookup match for each code word: depending on whether a previous code word exists and whether it has a sub-coding table from step 2), select either that sub-coding table or the original coding table of the data to be compressed as the current code table, find the matching code word by searching in order, and obtain the associated information to complete decoding. The lookup matching process comprises:
3.a) Set the previous code word to empty.
3.b) Determine whether the previous code word is empty. If it is empty, select the original coding table as the current code table; otherwise select the sub-coding table corresponding to the previous code word as the current code table.
3.c) Check whether that sub-coding table exists. If it exists, search the current code table sequentially for the matching code word; otherwise select the original coding table as the current code table and search it sequentially for the matching code word.
3.d) Check whether a matching code word was found. If so, set the previous code word to the current matching code word, obtain the associated information from the original coding table according to the corresponding index value, and return to step 3.b). If not, determine whether the current code table is the original coding table.
3.e) Determine whether the current code table is the original coding table. If it is not, continue by searching the original coding table sequentially for the matching code word, obtain the associated information, and return to step 3.b). If the current code table is the original coding table, or no matching code word can be found in the original coding table, report an error and stop.
2. The Huffman decoding method according to claim 1, characterized in that the index value in step 2) is the position of the corresponding code word in the original coding table of the data to be compressed.
CN2007100274251A 2007-04-05 2007-04-05 Method for decoding Haffmann based on conditional probability Active CN101282121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007100274251A CN101282121B (en) 2007-04-05 2007-04-05 Method for decoding Haffmann based on conditional probability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007100274251A CN101282121B (en) 2007-04-05 2007-04-05 Method for decoding Haffmann based on conditional probability

Publications (2)

Publication Number Publication Date
CN101282121A CN101282121A (en) 2008-10-08
CN101282121B true CN101282121B (en) 2010-10-06

Family

ID=40014446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007100274251A Active CN101282121B (en) 2007-04-05 2007-04-05 Method for decoding Haffmann based on conditional probability

Country Status (1)

Country Link
CN (1) CN101282121B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 510663 301-303, 401-402, zone C1, 182 science Avenue, Science City, Guangzhou high tech Industrial Development Zone, Guangdong Province

Patentee after: Guangzhou Ankai Microelectronics Co.,Ltd.

Address before: 510663 301-303, 401-402, zone C1, 182 science Avenue, Science City, Guangzhou high tech Industrial Development Zone, Guangdong Province

Patentee before: ANYKA (GUANGZHOU) MICROELECTRONICS TECHNOLOGY Co.,Ltd.

CP02 Change in the address of a patent holder

Address after: 510555 No. 107 Bowen Road, Huangpu District, Guangzhou, Guangdong

Patentee after: Guangzhou Ankai Microelectronics Co.,Ltd.

Address before: 510663 301-303, 401-402, zone C1, 182 science Avenue, Science City, Guangzhou high tech Industrial Development Zone, Guangdong Province

Patentee before: Guangzhou Ankai Microelectronics Co.,Ltd.