A Huffman decoding method based on conditional probability
Technical field
The present invention relates to a decoding method, and in particular to a Huffman decoding method based on conditional probability.
Background art
The Huffman algorithm is very widely used in the field of data compression, in particular in the encoding and decoding algorithms of various audio and video formats. It encodes and decodes according to the probability with which each element occurs in the data to be compressed, and it losslessly reduces the space that the data occupies. An example of the encoding process follows.
Data to be compressed: AABCABAADACB
Fixed-length coding without compression: 00-A, 01-B, 10-C, 11-D
Representation without compression: 00 00 01 10 00 01 00 00 11 00 10 01
Suppose the probability of occurrence of each element is: P(A)=0.6, P(B)=0.25, P(C)=0.15, P(D)=0.05
Coding table: 1-A, 01-B, 001-C, 000-D
Code stream after compression: 11 01 001 1 01 11 000 1 001 01
Storing these data without compression requires 24 bits, whereas the compressed form requires only 21 bits, a saving of 3 bits. The probability of occurrence of each element is prior data; it is generally calculated from the characteristics of the data to be compressed, or obtained by statistics over a large amount of data to be compressed.
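As an illustration only, the encoding example above may be sketched in Python as follows. The names `CODE_TABLE` and `encode` are assumptions introduced for this sketch and do not appear in the original description.

```python
# Coding table from the example above: one code word per element.
CODE_TABLE = {"A": "1", "B": "01", "C": "001", "D": "000"}


def encode(data, table):
    """Concatenate the code word of every element in `data`."""
    return "".join(table[symbol] for symbol in data)


data = "AABCABAADACB"
stream = encode(data, CODE_TABLE)

# The fixed-length code (00-A, 01-B, 10-C, 11-D) spends 2 bits per element.
fixed_bits = 2 * len(data)
print(stream, len(stream), fixed_bits)
```

The sketch compares only the bit counts of the two representations; it does not derive the coding table from the probabilities.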
Decoding is the inverse of encoding. The method is to repeatedly search the code stream for the code words of the coding table, restore each one to the information it stands for, and finally assemble the results in order into the original data. The compressed code stream of the above example is decoded as follows:
Compressed code stream: 11 01 001 1 01 11 000 1 001 01
Decoded data: A A B C A B A A D A C B
The entropy coding stage in audio and video encoding and decoding algorithms generally uses the Huffman algorithm. During encoding, the intermediate data produced by the preceding stages are converted into the corresponding code words by means of the coding table, and the final code stream is then generated in a prescribed format. During decoding, the code words of the coding table are first searched for in the code stream one after another and converted into the corresponding intermediate results, which are then processed by the subsequent stages. A coding table in practical use generally contains several hundred code words and is drawn up according to the probability with which each element occurs. Huffman decoding is therefore in effect a matching process comprising a large number of table lookup operations, and reducing the number of lookups needed for a match speeds up Huffman decoding. The common solutions for table lookup matching in Huffman decoding are described below.
(1) Sequential search method. Code words are taken from the coding table one after another. For each code word, a binary sequence of the same length as that code word is taken from the head of the code stream and compared with it. If the two are identical, the match succeeds; otherwise the next code word is taken and the process is repeated. The efficiency of the sequential search method is low: for a code table with several hundred code words, each match generally requires dozens of comparisons. Its advantage is that it saves the space occupied by the code table.
(2) Optimized sequential search method. Before decoding, the coding table is first rearranged in descending order of the probability with which each code word occurs, and the rearranged coding table is then used in the sequential search method. A code word with a high probability of occurrence is found after only a few comparisons, so the overall number of lookups decreases and the speed is higher than that of the plain sequential search method.
(3) Index search method. All code words in the code table are classified by their first several bits, forming several relatively small sub-coding tables and one index table. During decoding, the first several bits of the code stream are taken and compared against the index table to obtain the sub-coding table to which the code word belongs, and a sequential search is then carried out in that sub-coding table. Because each sub-coding table contains fewer code words, this method reduces the number of lookups.
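A minimal sketch of the sequential search method, written in Python for illustration. The function name `decode_sequential` and the comparison counter are assumptions added for this sketch; the coding table is the one from the earlier example.

```python
def decode_sequential(stream, code_table):
    """Decode `stream` by trying the code words in table order.

    `code_table` is a list of (code word, element) pairs. Returns the
    decoded elements and the number of comparisons that were made.
    """
    decoded, comparisons, pos = [], 0, 0
    while pos < len(stream):
        for word, element in code_table:
            comparisons += 1
            # Compare the code word with the same number of bits at the stream head.
            if stream.startswith(word, pos):
                decoded.append(element)
                pos += len(word)
                break
        else:
            raise ValueError(f"no code word matches at bit {pos}")
    return "".join(decoded), comparisons


table = [("1", "A"), ("01", "B"), ("001", "C"), ("000", "D")]
text, count = decode_sequential("110100110111000100101", table)
print(text, count)
```

With this four-entry table the example stream is decoded in 22 comparisons. The cost grows with the position of each code word in the table, which is why a table with several hundred entries makes the method slow.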
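The index search method may be sketched as follows, again in Python and for illustration only. The names `build_index` and `decode_indexed`, the choice of a 2-bit prefix, and the zero-padding of the key at the end of the stream are assumptions of this sketch, not details given in the description.

```python
from itertools import product


def build_index(code_table, prefix_bits):
    """Group code words into sub-tables keyed by their first `prefix_bits` bits."""
    index = {}
    for word, element in code_table:
        if len(word) >= prefix_bits:
            keys = [word[:prefix_bits]]
        else:
            # A code word shorter than the prefix belongs to every key it can start.
            pad = prefix_bits - len(word)
            keys = [word + "".join(tail) for tail in product("01", repeat=pad)]
        for key in keys:
            index.setdefault(key, []).append((word, element))
    return index


def decode_indexed(stream, index, prefix_bits):
    """Pick a sub-table from the leading bits, then search it sequentially."""
    decoded, comparisons, pos = [], 0, 0
    while pos < len(stream):
        # Pad with zeros near the end of the stream so the key has full length.
        key = stream[pos:pos + prefix_bits].ljust(prefix_bits, "0")
        for word, element in index.get(key, []):
            comparisons += 1
            if stream.startswith(word, pos):
                decoded.append(element)
                pos += len(word)
                break
        else:
            raise ValueError(f"no code word matches at bit {pos}")
    return "".join(decoded), comparisons


table = [("1", "A"), ("01", "B"), ("001", "C"), ("000", "D")]
index = build_index(table, prefix_bits=2)
text, count = decode_indexed("110100110111000100101", index, prefix_bits=2)
print(text, count)
```

Each sub-table here holds at most two code words, so fewer comparisons are needed than when the whole table is searched in order.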
The Huffman decoding algorithms in general use today rely mainly on the probability of code word occurrence that is reflected in the coding table, and they all require many table lookups. The probability of occurrence of each code word reflects only the independent probability of that code word, whereas in practice two successive code words often stand in a certain conditional probability relationship. Making full use of this relationship can reduce the number of table lookup operations and raise the speed of the Huffman decoding algorithm.
Summary of the invention
The object of the present invention is to provide a Huffman decoding method based on conditional probability that significantly reduces the number of table lookups and greatly improves decoding speed.
To achieve this object, the Huffman decoding method based on conditional probability provided by the present invention comprises the following steps:
1) Obtain the conditional probability with which each code word occurs in the data to be compressed, and build the conditional probability table of each code word. The conditional probability is the probability that a given code word occurs at the current position after a given code word has occurred as the previous code word;
2) Arrange the code words in descending order of probability and, for each code word in the top 50%, taken as the previous code word, construct a corresponding sub-coding table. The sub-coding table consists of those code words whose conditional probability value in the conditional probability table of that previous code word is greater than 0.1, together with their index values. Within the sub-coding table, these code words are arranged in descending order of the probability with which they appear after that specific previous code word;
3) Perform table lookup matching for each code word: depending on the previous code word and on whether the sub-coding table of step 2) exists, select either the sub-coding table or the original coding table of the data to be compressed as the current code table, search it sequentially for the matching code word, obtain the relevant information, and complete decoding. The table lookup matching process comprises:
3.a) Set the previous code word to empty;
3.b) Determine whether the previous code word is empty; if it is empty, select the original coding table as the current code table; otherwise select the sub-coding table corresponding to the previous code word as the current code table;
3.c) Check whether that sub-coding table exists; if it exists, search the current code table sequentially for the matching code word; otherwise select the original coding table as the current code table and then search it sequentially for the matching code word;
3.d) Check whether a matching code word has been found; if so, set the previous code word to the current matching code word, obtain the relevant information from the original coding table according to the corresponding index value, and return to step 3.b); if not, determine whether the current code table is the original coding table;
3.e) Determine whether the current code table is the original coding table; if it is not, continue by searching the original coding table sequentially for the matching code word, obtain the relevant information, and return to step 3.b); if the current code table is the original coding table, or no matching code word can be found in the original coding table, report an error and end.
The index value in step 2) of the present invention is the position of the corresponding code word in the original coding table of the data to be compressed, and is used to look up all relevant information of that code word in the original coding table.
Compared with the sequential search method, the present invention can significantly reduce the number of table lookups and noticeably improve decoding speed. Compared with the index search method, which simply classifies and indexes code words by their first several bits, the present invention constructs multiple sub-coding tables from the conditional probabilities of code word occurrence, so that common code words complete matching sooner, the number of table lookups is reduced, and decoding speed improves overall.
Description of drawings
Fig. 1 is a schematic flow chart of the collection of code word conditional probability statistics according to the present invention;
Fig. 2 is a schematic flow chart of Huffman table lookup using conditional probability according to the present invention.
Embodiment
The Huffman decoding method based on conditional probability according to the present invention mainly comprises the following three parts:
I. Constructing the conditional probability table
The conditional probability is the probability that a given code word occurs at the current position after a given code word has occurred as the previous code word. For example, the conditional probability P(A|B) of code word B is the probability that code word B occurs after code word A. The conditional probability of each code word can be obtained statistically by running an existing Huffman decoding algorithm over a large number of actual decoding processes; Fig. 1 is the schematic flow chart of this statistics procedure. The algorithm of Fig. 1 yields a set of numeric entries, each of which records the number of times the current code word is Y when the previous code word is X, written p(X, Y), where X and Y may be any code words in the code table. Let T(X) be the total number of occurrences of all current code words when the previous code word is X. Then T(X) = p(X, A) + p(X, B) + p(X, C) + p(X, D), where X may be any code word in the code table and A, B, C, D stand for all code words in the code table; if there are further code words, their counts are added to the sum as well. The conditional probability is then P(X|Y) = p(X, Y) / T(X), where X and Y may be any code words in the code table.
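The counting scheme described above may be sketched in Python as follows. The function name `conditional_probabilities` is an assumption of this sketch, and the short sample sequence is used only to make the sketch runnable; the description calls for statistics over a large number of actual decoding processes. The result keeps the notation of the description, in which P(X|Y) is the probability that Y follows X.

```python
from collections import Counter, defaultdict


def conditional_probabilities(symbols):
    """Return {X: {Y: P(X|Y)}}, where P(X|Y) = p(X, Y) / T(X)."""
    # p(X, Y): number of times the current code word is Y when the previous one is X.
    pair_counts = Counter(zip(symbols, symbols[1:]))
    # T(X): total occurrences of all current code words when the previous one is X.
    totals = Counter()
    for (prev, _cur), n in pair_counts.items():
        totals[prev] += n
    table = defaultdict(dict)
    for (prev, cur), n in pair_counts.items():
        table[prev][cur] = n / totals[prev]
    return table


probs = conditional_probabilities("AABCABAADACB")
for prev in sorted(probs):
    print(prev, probs[prev])
```

Pairs that never occur are simply absent from the result, which corresponds to a conditional probability of zero.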
After the conditional probability of each code word has been obtained, the code word conditional probability table is constructed as shown in Table 1.
Table 1: Code word conditional probability table
II. Constructing the new coding table set
From the code words arranged in descending order of probability, the code words in the top 50% are selected, and the corresponding sub-coding tables are constructed according to the conditional probability table (Table 1). Suppose that the probabilities of occurrence of code word A and code word C lie in the top 50% of the ordering and that P(A|B) > P(A|C) > P(A|A) > P(A|D) and P(C|A) > P(C|C) > P(C|B) > P(C|D). Two sub-coding tables are then constructed in descending order of conditional probability, as shown in Table 2 and Table 3 below. The index value Sx is the position of code word X in the original coding table and is used to look up all relevant information of that code word in the original coding table.
Because the memory capacity of a Huffman decoder is limited, constructing a sub-coding table for every code word would occupy a large amount of storage. In view of the memory limit and the probabilities of code word occurrence, no sub-coding table is constructed for code words in the bottom 50% of the descending probability ordering. For example, code word D lies in the bottom 50% of the ordering, so no sub-coding table is constructed with D as the previous code word. Code words whose conditional probability value is less than 0.1 may be left out of the corresponding sub-coding table; for example, neither Table 2 nor Table 3 includes code word D. All sub-coding tables together with the original coding table constitute the new coding table set.
Table 2: Sub-coding table A
Code word | Index value
B | Sb
C | Sc
A | Sa
Table 3: Sub-coding table C
Code word | Index value
A | Sa
C | Sc
B | Sb
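The construction of the sub-coding tables may be sketched in Python as follows. All numeric probabilities below are hypothetical values chosen only so that they satisfy the orderings assumed in the text (A and C in the top 50%, and D below the 0.1 threshold); they are not taken from Table 1. The function name `build_sub_tables` and its parameters are likewise assumptions of this sketch.

```python
def build_sub_tables(word_probs, cond_probs, original_order,
                     top_fraction=0.5, threshold=0.1):
    """Build one sub-coding table per frequent previous code word.

    Each sub-table lists (code word, index value) pairs in descending
    order of conditional probability; the index value is the 1-based
    position of the code word in the original coding table.
    """
    ranked = sorted(word_probs, key=word_probs.get, reverse=True)
    selected = ranked[:int(len(ranked) * top_fraction)]
    sub_tables = {}
    for prev in selected:
        row = cond_probs[prev]
        kept = [word for word in row if row[word] > threshold]
        kept.sort(key=row.get, reverse=True)
        sub_tables[prev] = [(word, original_order.index(word) + 1) for word in kept]
    return sub_tables


# Hypothetical probabilities, for illustration only.
word_probs = {"A": 0.4, "B": 0.2, "C": 0.3, "D": 0.1}
cond_probs = {
    "A": {"A": 0.20, "B": 0.45, "C": 0.30, "D": 0.05},
    "C": {"A": 0.50, "B": 0.15, "C": 0.30, "D": 0.05},
}
tables = build_sub_tables(word_probs, cond_probs, original_order=["A", "B", "C", "D"])
print(tables)
```

With these assumed values the result has the same code word order as Tables 2 and 3, with the index values Sa = 1, Sb = 2 and Sc = 3.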
III. Huffman table lookup using conditional probability
Fig. 2 is the schematic flow chart of Huffman table lookup using conditional probability according to the present invention. The table lookup matching process comprises:
1) Determine whether the previous code word is empty; if it is empty, select the original coding table as the current code table; otherwise select the sub-coding table corresponding to the previous code word as the current code table;
2) Check whether that sub-coding table exists; if it exists, search the current code table sequentially for the matching code word; otherwise select the original coding table as the current code table and then search it sequentially for the matching code word;
3) Check whether a matching code word has been found; if so, set the previous code word to the current matching code word and obtain the relevant information from the original coding table according to the corresponding index value; if not, determine whether the current code table is the original coding table;
4) Determine whether the current code table is the original coding table; if it is not, continue by searching the original coding table sequentially for the matching code word and obtain the relevant information; if the current code table is the original coding table, or no matching code word can be found in the original coding table, report an error and end.
After one matching process as above is completed, repeating the process completes all decoding operations. For example, let the encoded code stream be 11 01 001 1 01 11 000 1 001 01, let the relevant information of the code words in the original coding table be arranged in the order A B C D, and let the coding table be 1-A, 01-B, 001-C, 000-D. For the first code word, the previous code word is empty, so the original coding table is searched sequentially straight away; code word A is found at the 1st position, its index value is 1, and the relevant information of the code word is obtained from the index value. For the second code word, the previous code word is A, so sub-coding table A is searched sequentially; A is found in the sub-coding table with index value Sa = 1, and the relevant information of code word A is found at the 1st position of the original coding table according to the index value. For the 4th code word, C, the previous code word is B, for which no sub-coding table exists; the original coding table is selected as the current code table and searched sequentially, and the rest of the process is the same as for the first code word. For the 9th code word, D, the previous code word is A, so sub-coding table A is selected and searched sequentially; code word D is not included in this sub-coding table, so no matching code word is found, and the original coding table is then selected as the current code table and searched once more; the matching code word is found at the 4th position and the relevant information is obtained.
Because common code words are included in the sub-coding tables and each sub-coding table holds relatively few code words, most table lookups complete matching within a sub-coding table; only some uncommon code words need to be searched for in the original coding table, which holds more code words. Most code words are therefore matched after relatively few lookups, and the overall number of lookups is clearly lower than with the sequential search method.
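The worked example above may be traced with the following Python sketch of the lookup flow. The names `ORIGINAL`, `SUB_TABLES`, `find` and `decode`, and the trace labels "original", "sub" and "fallback", are assumptions introduced for this sketch. The sub-coding tables follow Tables 2 and 3 with Sa = 1, Sb = 2 and Sc = 3.

```python
# Original coding table; the 1-based position of an entry is its index value.
ORIGINAL = [("1", "A"), ("01", "B"), ("001", "C"), ("000", "D")]
# Sub-coding tables A and C as (code word name, index value) pairs.
SUB_TABLES = {"A": [("B", 2), ("C", 3), ("A", 1)],
              "C": [("A", 1), ("C", 3), ("B", 2)]}


def find(stream, pos, indices):
    """Sequentially test the code words at the given 1-based index values."""
    for index in indices:
        if stream.startswith(ORIGINAL[index - 1][0], pos):
            return index
    return None


def decode(stream):
    decoded, trace, pos, prev = [], [], 0, None
    all_indices = range(1, len(ORIGINAL) + 1)
    while pos < len(stream):
        # None when the previous code word is empty or has no sub-coding table.
        sub = SUB_TABLES.get(prev)
        index, source = None, "original"
        if sub is not None:
            index = find(stream, pos, [i for _, i in sub])
            source = "sub" if index is not None else "fallback"
        if index is None:
            # Search the original coding table; a miss here is an error.
            index = find(stream, pos, all_indices)
            if index is None:
                raise ValueError(f"no code word matches at bit {pos}")
        word, element = ORIGINAL[index - 1]
        decoded.append(element)
        trace.append(source)
        pos += len(word)
        prev = element
    return "".join(decoded), trace


text, trace = decode("110100110111000100101")
print(text)
print(trace)
```

The trace records where each code word was matched: directly in the original coding table, in a sub-coding table, or in the original coding table after a miss in the sub-coding table, as happens for the 9th code word D. With a table of only four code words the sketch shows the control flow rather than any saving in lookups.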