CN102811113A - Character-type message compression method - Google Patents


Info

Publication number
CN102811113A
Authority
CN
China
Prior art keywords
character
message
coding
frequency
interval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012102412204A
Other languages
Chinese (zh)
Other versions
CN102811113B (en
Inventor
常传文
李玮
茅文深
鉴福升
林明
夏宁
吴杰
姚浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute
Priority to CN201210241220.4A
Publication of CN102811113A
Application granted
Publication of CN102811113B
Legal status: Active

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a character-type message compression method and provides an optimized adaptive method for updating the frequency table. In the first mode, the table is updated character by character during encoding, i.e., the frequency table is updated after each character of the current message is arithmetically coded. Updating the frequency table consumes a certain amount of computation, so the first mode may be unsuitable when computing resources are limited. The invention therefore also provides a second mode, in which the frequency table is updated in units of several messages: while a single message is arithmetically coded character by character, only the number of occurrences of each character is recorded, and the frequency table is updated from these records once a set number of messages has been coded. The method achieves lossless compression of messages and alleviates problems such as long delay, excess bandwidth usage, and large storage consumption in applications such as message sharing, storage, and distribution, so that the compression ratio approaches or reaches the theoretical maximum of entropy coding.

Description

Character-type message compression method
Technical field
The present invention relates to an effective compression method for character-type messages. Exploiting the fact that character-type messages draw on a finite character set, it introduces a static frequency table that is updated adaptively and combines it with related techniques such as arithmetic coding to obtain a good compression effect.
The present invention is applicable to any scenario that compresses messages based on a finite character set, such as sharing, storage, and transmission. Practical verification shows that it meets application needs well, particularly where message transmission has demanding real-time requirements.
Background
Data compression methods fall into two classes according to whether information is lost in compression: lossy and lossless. In lossy compression, the data reconstructed (restored, decompressed) from the compressed data differ from the original data; in lossless compression, the reconstructed data are identical to the original data. The method described in this patent is a lossless compression method.
Lossless data compression can be divided by implementation technique into three major classes: predictive, dictionary, and statistical. Predictive coding exploits the correlation between discrete signals: one or more preceding signals are used to predict the next signal, and the difference between the actual and predicted values (the prediction error) is encoded. Typical methods are DPCM and ADPCM, which are better suited to compressing sound and image data. Dictionary coding exploits the fact that data often contain many repeated strings; its basic principle is to keep extracting new strings from the character stream and to replace each string with a code, thereby achieving compression. A typical method is LZW coding, which dynamically builds a string table during encoding and replaces longer strings with shorter codes. Statistical coding, also called entropy coding, compresses according to the distribution of character occurrence probabilities; typical methods include run-length coding, Huffman coding, and arithmetic coding. Run-length coding replaces a run of consecutive symbols of equal value with a symbol value or a run length, so that the result is shorter than the original data; it suits cases where the same symbol occurs many times in succession. Huffman coding assigns short codewords to information symbols with high occurrence probability and long codewords to symbols with low occurrence probability.
The concept of arithmetic coding was proposed by Peter Elias in 1960; although it was mathematically sound, it could not be implemented on a computer and found no practical application at the time. In 1976, R. Pasco and J. Rissanen each implemented finite-precision arithmetic coding with fixed-length registers, making it realizable on computers. Its basic principle is to represent the message being encoded as an interval between the real numbers 0 and 1. The longer the message, the smaller the interval that represents it and the more binary digits are needed to represent that interval. Symbols with higher probability narrow the interval more slowly during encoding and therefore contribute fewer bits to the result. The whole encoding process replaces a string of input symbols with a single floating-point number rather than replacing each input symbol with a specific codeword, which avoids the problem in Huffman coding that the number of bits per symbol must be an integer. By comparison, arithmetic coding is more efficient, especially when the source contains few symbols: with only two symbols, for example, arithmetic coding has a clear advantage, whereas Huffman coding achieves almost no compression.
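The two-symbol case mentioned above can be made concrete with a small calculation. The sketch below compares the Shannon entropy of a skewed binary source, which an arithmetic coder can approach, with the one bit per symbol that Huffman coding must spend. The probabilities 0.95 and 0.05 and the function name `entropy_bits` are illustrative assumptions, not values from the patent.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits per symbol for a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical two-symbol source, heavily skewed toward one symbol.
probs = [0.95, 0.05]

# With only two symbols, Huffman coding has to give each a 1-bit codeword.
huffman_bits = 1.0

print(f"entropy bound : {entropy_bits(probs):.3f} bits/symbol")
print(f"Huffman coding: {huffman_bits:.3f} bits/symbol")
```

The gap between the two figures is the saving that whole-bit codewords cannot reach.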
Communication messages (hereinafter simply "messages") are in very common use, for example radar target information, position information, and time information, and consist mainly of characters. Characters are the letters, digits, symbols, and so on used in computers; each occupies one byte of storage, as detailed in the ASCII table. With the arrival of the information age, the volume of stored messages has become massive, which puts considerable pressure on sharing, storage, and distribution. Consider a vehicle (bus, taxi) monitoring and dispatching system covering a whole city: each vehicle transmits its own attributes (such as position and status) to the center in a dedicated message format, and its mobility dictates that it communicate wirelessly. At the same time, the center may build a historical database for each vehicle, and the enormous quantity of vehicle information burdens both communication and storage. In practice, message formats with character features are widely used to ease observation and interaction. One example is the widely used NMEA-0183 format, a standard format defined by the National Marine Electronics Association (NMEA) of the United States for marine electronic equipment, which has become the unified standard protocol for GPS navigation equipment.
At present, character-type message formats are for the most part used (e.g., transmitted and stored) directly without compression. According to the existing literature and public materials, the compression schemes that have been adopted are:
1. Compressing messages with BCD code
BCD code, also called binary-coded decimal, is a binary encoding of numbers suited to handling the ten digits 0 to 9; it uses a fixed 4 bits to represent each of the ten digits.
This scheme has a restricted scope of application: it is suitable only for compressing numeric characters and not for letters and other characters.
2. Compressing messages with extended BCD code
All characters in the character set are binarized, and each character is represented by its binarized value to achieve compression. For example, for a character set of 100 characters, binarization assigns 7 binary digits to each character.
This scheme is in effect a typical equiprobable Huffman coding method: it treats every character as equally likely and ignores the probability characteristics of the characters, so bits are wasted and the compression ratio is limited.
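The bit widths quoted for schemes 1 and 2 follow from the size of the character set alone. A minimal sketch, assuming a hypothetical helper named `fixed_length_bits`:

```python
def fixed_length_bits(charset_size):
    """Smallest number of bits that gives every character a distinct fixed-length code."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, without floating point.
    return max(1, (charset_size - 1).bit_length())

print(fixed_length_bits(10))    # BCD: the ten digits 0 to 9
print(fixed_length_bits(100))   # extended BCD: a 100-character set
```

Because the width depends only on the number of characters, a frequent character costs exactly as many bits as a rare one, which is the waste the text describes.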
3. Compressing messages with Huffman coding
Huffman coding encodes source symbols with a variable-length code table, which is derived by estimating the frequency of occurrence of each source symbol: symbols that occur frequently receive shorter codes, and the rest receive longer codes. Traditional Huffman coding is a static method: it counts the frequency of each character in the original data, builds a Huffman tree from those counts, and then encodes the original data. This approach has severe limitations in real application systems, especially real-time transmission and processing systems such as communications, so it has not been widely used for message compression. Adaptive Huffman coding is a dynamic variant of the above method and has been applied to message compression. The Huffman tree on which it bases its encoding changes dynamically: character N+1 is encoded using the Huffman tree obtained from the first N characters of the original data, and each time a character is read in, that character's count is adjusted and the Huffman tree is updated, so as to keep coding efficiency as high as possible.
This scheme does not take joint probability into account, and because the number of bits per symbol must be an integer in the encoding process, compression efficiency is reduced and output bits are wasted.
Summary of the invention
Object of the invention: In view of the above problems encountered in handling character-type message formats, the present invention proposes a general lossless message compression method for character-type message formats. The method is based on arithmetic coding and introduces a static frequency table and an adaptive frequency table. It achieves lossless compression of messages and alleviates the high delay, excess bandwidth usage, and large storage consumption encountered in applications such as message sharing, storage, and distribution, so that the compression ratio approaches or reaches the theoretical maximum of entropy coding.
Technical solution: A character-type message compression method comprises the following steps.
Let the character set of the character-type message be $A$ with $n$ characters, and let the probability of character $a_i$ be $P_i$. Then
$a_i \in A$
$\sum_{i=1}^{n} P_i = 1$, where $1 \le i \le n$
(1) Preprocessing
When the character-type message format is used for encoding for the first time, the frequency table _Adapt_Table must be initialized. There are two ways to do so. The first is to assign specific values to $P_i$ according to the characteristics of the message's character set and the concrete operating environment, thereby creating the empirical static frequency table _Exper_Table for the character set, and to assign its values to the frequency table _Adapt_Table. The second is to create the equiprobable static frequency table _EqualPro_Table, i.e.,
$P_i = \frac{1}{n}$
and to assign it to the frequency table _Adapt_Table. In practice, the initialization mode can be chosen according to actual needs.
(2) Receive a message
Let the received message be Message, with character sequence $B$ of $m$ elements, i.e.,
$b_j \in A$, where $1 \le j \le m$
(3) Read in a character
The characters of the received message Message are read in one by one. Let the character read in be $b_j$, $1 \le j \le m$, with probability $P_{b_j}$.
(4) Arithmetic coding
The character is arithmetically coded according to the current frequency table _Adapt_Table and the current character frequency $P_{b_j}$.
(5) Decide whether to update the frequency table
Depending on actual needs, the frequency table _Adapt_Table can be updated character by character during encoding, i.e., the frequency table is updated after each character of the message Message is arithmetically coded. Alternatively, it can be updated in units of several messages: while a single message is arithmetically coded character by character, only the number of occurrences of each character is recorded, and the frequency table is updated from these records once the set number of messages has been coded. If the frequency table _Adapt_Table needs to be updated, proceed to step (6); otherwise jump to step (7).
(6) Update the frequency table
Update the frequency $P_{b_j}$ of character $b_j$, and thereby the frequency table _Adapt_Table.
(7) Check whether encoding of this message is finished
If encoding of the message Message is not finished, jump to step (3) and encode the next character; otherwise proceed to step (8).
(8) Check whether there is a next message
If so, execute step (9); otherwise execute step (11), i.e., end this encoding run.
(9) Decide whether to update the frequency table
When the frequency table _Adapt_Table is updated in units of several messages, then after encoding of the message Message ends: if the frequency table is to be updated, proceed to the next step; otherwise jump to step (2), read in the next message, and continue encoding.
(10) Update the frequency table
Update the frequency table _Adapt_Table using the recorded character occurrence counts.
(11) End
End this encoding run.
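The control flow of steps (1) to (11) can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the class `AdaptTable`, the function `encode_stream`, the `batch_size` parameter, and the sample messages are all hypothetical, and step (4) is replaced by the ideal code length $-\log_2 P_{b_j}$ so that the two update modes can be compared without a full arithmetic coder.

```python
import math
from collections import Counter

class AdaptTable:
    """Adaptive frequency table holding an integer count per character."""

    def __init__(self, initial_counts):
        self.counts = dict(initial_counts)
        self.pending = Counter()      # occurrences recorded but not yet applied

    def prob(self, ch):
        return self.counts[ch] / sum(self.counts.values())

    def update_char(self, ch):        # step (6): character-by-character mode
        self.counts[ch] += 1

    def record(self, ch):             # step (5): only note the occurrence
        self.pending[ch] += 1

    def flush(self):                  # step (10): apply the recorded counts
        for ch, k in self.pending.items():
            self.counts[ch] += k
        self.pending.clear()

def encode_stream(messages, table, per_char=True, batch_size=4):
    """Return the ideal total code length in bits for a sequence of messages."""
    total_bits = 0.0
    for index, message in enumerate(messages, start=1):   # steps (2) and (8)
        for ch in message:                                # steps (3) and (7)
            total_bits += -math.log2(table.prob(ch))      # stand-in for step (4)
            if per_char:
                table.update_char(ch)
            else:
                table.record(ch)
        if not per_char and index % batch_size == 0:      # step (9)
            table.flush()
    return total_bits

# Step (1): equiprobable initialization over a toy three-character set.
alphabet = "AB,"
messages = ["AAB,", "AAA,"] * 4
equal_table = {ch: 1 for ch in alphabet}

print(encode_stream(messages, AdaptTable(equal_table), per_char=True))
print(encode_stream(messages, AdaptTable(equal_table), per_char=False, batch_size=2))
```

In the batched mode the per-character work is a single counter increment, and the table itself changes only once every `batch_size` messages, which is the computational saving the text attributes to the second mode.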
The arithmetic coding in step (4) comprises the following steps.
Let the initial coding interval used by the arithmetic coder be [0, Max], where Max is the interval maximum, typically set to 0xFFFF. During encoding the interval is [Low, High] and its range is Range, where Low is the lower bound of the interval, initially 0, and High is the upper bound, initially Max. The character read in is $b_j$, its frequency is $P_{b_j}$, and its cumulative frequency is $CumP_{b_j}$, i.e., the sum of the frequencies of all symbols whose value is less than that of this symbol.
(41) Initialization
Initialize the coding interval [0, Max] and set up the frequency table.
(42) Read in character $b_j$
The characters of the message Message are read in one by one. Let the character read in be $b_j$, $1 \le j \le m$, with probability $P_{b_j}$.
(43) Update the interval
Update the interval [Low, High] according to the current frequency table, $P_{b_j}$, and $CumP_{b_j}$, using the following formulas:
$Range = High - Low + 1$
$High = Low + Range \times (CumP_{b_j} + P_{b_j}) - 1$
$Low = Low + Range \times CumP_{b_j}$
(44) Normalization
Test whether the interval [Low, High] satisfies the condition for continuing to encode. If it does, continue encoding; otherwise normalize the interval [Low, High].
(45) Decide whether to update the frequency table
If so, proceed to step (46); otherwise jump to step (47).
(46) Update the frequency table
Update the frequency $P_{b_j}$ of the coded character and the corresponding cumulative frequency $CumP_{b_j}$, i.e., update the frequency table.
(47) Check whether to finish
If so, end this encoding; otherwise jump to step (42) and encode the next character.
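The interval update of step (43) can be tried out in integer arithmetic. The sketch below assumes that frequencies are stored as integer counts, so that $P_{b_j}$ and $CumP_{b_j}$ become `freq / total` and `cum / total`; the function name `update_interval` and the three-character table are hypothetical. Both new bounds are computed from the value of Low before the update, as the order of the formulas implies.

```python
MAX = 0xFFFF  # interval maximum, as in the text

def update_interval(low, high, cum, freq, total):
    """Step (43) with integer counts: cum and freq are counts, total is their sum over all characters."""
    rng = high - low + 1
    new_high = low + rng * (cum + freq) // total - 1
    new_low = low + rng * cum // total
    return new_low, new_high

# Hypothetical table: 'A' occurs 2 times, 'B' once, ',' once (total 4).
counts = {"A": 2, "B": 1, ",": 1}
order = ["A", "B", ","]
total = sum(counts.values())

def cumulative(ch):
    """Sum of the counts of all characters that precede ch in the table order."""
    return sum(counts[c] for c in order[:order.index(ch)])

print(update_interval(0, MAX, cumulative("A"), counts["A"], total))
print(update_interval(0, MAX, cumulative("B"), counts["B"], total))
```

The more frequent character 'A' keeps half of the interval, while 'B' keeps a quarter, so coding 'A' costs fewer output bits.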
In step (44), normalization of the interval [Low, High] is divided into the following three cases:
Case one: the highest bit of the upper bound is 1 and its second-highest bit is 0, while the highest bit of the lower bound is 0 and its second-highest bit is 1. The second-highest bit is shifted out, i.e., discarded, and the number of times a second-highest bit has been discarded is recorded in Case1Num.
Case two: the highest bits of the upper and lower bounds are both 0. Both bounds are shifted left by 1 bit, 1 is added to the upper bound, and the shifted-out bit is appended to the output code stream. Case1Num is then checked: if it is not 0, the inverse of the highest bit, called Case1Bit, is written to the output code stream Case1Num times.
Case three: the highest bits of the upper and lower bounds are both 1. Both bounds are shifted left by 1 bit, 1 is added to the upper bound, and the shifted-out bit is appended to the output code stream. Case1Num is then checked: if it is not 0, the inverse of the highest bit, called Case1Bit, is written to the output code stream Case1Num times.
The purpose of normalization is to prevent the interval from becoming so narrow as encoding proceeds that encoding and decoding errors occur.
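The three cases can be written as a small loop over 16-bit bounds. This is a sketch under assumptions: the function name `normalize` and the list `out` are hypothetical, the cases are applied repeatedly until none matches, and Case1Num is reset to 0 after its bits are written, which the text does not state explicitly but which the bookkeeping appears to require.

```python
BITS = 16
MAX = (1 << BITS) - 1        # 0xFFFF
HALF = 1 << (BITS - 1)       # mask of the highest bit
QUARTER = 1 << (BITS - 2)    # mask of the second-highest bit

def normalize(low, high, case1_num, out):
    """Apply cases one to three until none matches; returns (low, high, case1_num)."""
    while True:
        if (low & HALF) == (high & HALF):
            # Cases two and three: the highest bits agree, so that bit is final.
            bit = 1 if low & HALF else 0
            out.append(bit)
            out.extend([1 - bit] * case1_num)   # Case1Num copies of Case1Bit
            case1_num = 0                       # assumption: counter is cleared
            low = (low << 1) & MAX
            high = ((high << 1) & MAX) | 1
        elif (low & QUARTER) and not (high & QUARTER):
            # Case one: low = 01..., high = 10...; drop the second-highest bit.
            case1_num += 1
            low = (low - QUARTER) << 1
            high = ((high - QUARTER) << 1) | 1
        else:
            return low, high, case1_num

out = []
print(normalize(0x0000, 0x3FFF, 0, out), out)   # case two fires twice
out = []
print(normalize(0x7000, 0x8FFF, 0, out), out)   # case one fires, nothing is output yet
```

In the second call no bit can be emitted because the bounds straddle the midpoint, so the decision is deferred and only the counter grows.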
Following the technical solution in this section, decoding is the inverse process of encoding and is not described further.
Beneficial effects: Practical application and demonstration show that the present invention has the following beneficial effects:
(1) Exploiting the finite character set of the messages, the invention applies arithmetic coding to their lossless compression and takes full advantage of arithmetic coding; compared with compression schemes such as BCD code and Huffman coding, it achieves a higher compression ratio and efficiency.
(2) Using an empirical frequency table allows good compression to be achieved from the very start of the compression process.
(3) Two methods of dynamically updating the frequency table are introduced. Character-by-character updating during encoding takes full account of character probabilities and increases the message compression ratio as far as possible; updating in units of several messages is particularly suited to environments with limited computing resources.
Description of drawings
Fig. 1 is the flow chart of the embodiment of the invention;
Fig. 2 is the flow chart of the arithmetic coding in the embodiment of the invention.
Embodiment
The present invention is further illustrated below with reference to the drawings and a specific embodiment. It should be understood that the embodiment serves only to illustrate the invention and not to limit its scope; after reading this disclosure, modifications of the invention in various equivalent forms by those skilled in the art all fall within the scope defined by the claims appended to this application.
As shown in Fig. 1, this scheme initializes the frequency table in the preprocessing step, using either the equiprobable static frequency table or the empirical static frequency table. For updating the frequency table, an optimized adaptive update method is provided, embodied in steps (6) and (10), which takes one of two forms. The first is to update character by character during encoding, i.e., the frequency table is updated after each character of the message is arithmetically coded. Updating the frequency table consumes a certain amount of computation, so the first form may be unsuitable when computing resources are limited. The invention can then adopt the other form, in which the frequency table is updated in units of several messages: while a single message is arithmetically coded character by character, only the number of occurrences of each character is recorded, and the frequency table is updated from these records once the set number of messages has been coded.
The variables used in this scheme are as follows:
1. _Exper_Table: the static frequency table built from empirical values;
2. _EqualPro_Table: the equiprobable static frequency table, in which every character of the character set has the same probability;
3. _Adapt_Table: the adaptive frequency table used during encoding.
Let the set of all possible characters in the character-type message be $A$ with $n$ elements, where character $a_i$ occurs with probability $P_i$. Then:
$a_i \in A$
$\sum_{i=1}^{n} P_i = 1$, where $1 \le i \le n$
The steps of the technical solution adopted by the present invention are as follows; the flow chart is given in Fig. 1:
(1) Preprocessing
When this message format is used for encoding for the first time, the frequency table _Adapt_Table must be initialized. There are two ways to do so. The first is to assign specific values to $P_i$ according to the characteristics of the message's character set and the concrete operating environment, thereby creating the empirical static frequency table _Exper_Table for the character set, and to assign its values to _Adapt_Table. The second is to create the equiprobable static frequency table _EqualPro_Table, i.e.,
$P_i = \frac{1}{n}$
and to assign it to _Adapt_Table. In practice, the initialization mode can be chosen according to actual needs.
(2) Receive a message
Let this message be Message, with character sequence $B$ of $m$ elements, i.e.,
$b_j \in B$, where $1 \le j \le m$
(3) Read in a character
The characters of the message Message are read in one by one. Let the character read in be $b_j$, $1 \le j \le m$, with probability $P_{b_j}$.
(4) Arithmetic coding
The character is arithmetically coded according to the current frequency table and the current character frequency $P_{b_j}$.
(5) Decide whether to update the frequency table
Depending on actual needs, the frequency table _Adapt_Table can be updated character by character during encoding, i.e., the frequency table is updated after each character of the message Message is arithmetically coded. Alternatively, it can be updated in units of several messages: while a single message is arithmetically coded character by character, only the number of occurrences of each character is recorded, and the frequency table is updated from these records once the set number of messages has been coded.
Specifically, if the frequency table _Adapt_Table needs to be updated, proceed to step (6); otherwise jump to step (7).
(6) Update the frequency table
Update the frequency $P_{b_j}$ of character $b_j$, and thereby the frequency table _Adapt_Table.
(7) Check whether encoding of this message is finished
If encoding of the message Message is not finished, jump to step (3) and encode the next character; otherwise proceed to step (8).
(8) Check whether there is a next message
If so, execute step (9); otherwise execute step (11), i.e., end this encoding run.
(9) Decide whether to update the frequency table
When the second method of updating the frequency table _Adapt_Table described above is used, then after encoding of the message Message ends: if the frequency table is to be updated, proceed to the next step; otherwise jump to step (2), read in the next message, and continue encoding.
(10) Update the frequency table
Specifically, update the frequency table _Adapt_Table using the recorded character occurrence counts.
(11) End
End this encoding run.
The detailed procedure of the arithmetic coding of step (4) in the technical solution adopted by the present invention is as follows; the flow chart is given in Fig. 2:
Let the initial coding interval used by the arithmetic coder be [0, Max], where Max is the interval maximum, typically set to 0xFFFF. During encoding the interval is [Low, High] and its range is Range, where Low is the lower bound of the interval, initially 0, and High is the upper bound, initially Max. The character read in is $b_j$, its frequency is $P_{b_j}$, and its cumulative frequency is $CumP_{b_j}$, i.e., the sum of the frequencies of all symbols whose value is less than that of this symbol.
(41) Initialization
Initialize the coding interval [0, Max], set up the frequency table, and so on.
(42) Read in character $b_j$
The characters of the message Message are read in one by one. Let the character read in be $b_j$, $1 \le j \le m$, with probability $P_{b_j}$.
(43) Update the interval
Update the interval [Low, High] according to the current frequency table, $P_{b_j}$, and $CumP_{b_j}$, using the following formulas:
$Range = High - Low + 1$
$High = Low + Range \times (CumP_{b_j} + P_{b_j}) - 1$
$Low = Low + Range \times CumP_{b_j}$
(44) Normalization
Test whether the interval [Low, High] satisfies the condition for continuing to encode. If it does, continue encoding; otherwise normalize the interval [Low, High], which is divided into three cases:
Case one: the highest bit of the upper bound is 1 and its second-highest bit is 0, while the highest bit of the lower bound is 0 and its second-highest bit is 1. The second-highest bit is shifted out, i.e., discarded, and the number of times a second-highest bit has been discarded is recorded in Case1Num.
Case two: the highest bits of the upper and lower bounds are both 0. Both bounds are shifted left by 1 bit, 1 is added to the upper bound, and the shifted-out bit is appended to the output code stream. Case1Num is then checked: if it is not 0, the inverse of the highest bit, called Case1Bit, is written to the output code stream Case1Num times.
Case three: the highest bits of the upper and lower bounds are both 1. Both bounds are shifted left by 1 bit, 1 is added to the upper bound, and the shifted-out bit is appended to the output code stream. Case1Num is then checked: if it is not 0, the inverse of the highest bit, called Case1Bit, is written to the output code stream Case1Num times.
The purpose of normalization is to prevent the interval from becoming so narrow as encoding proceeds that encoding and decoding errors occur.
(45) Decide whether to update the frequency table
If so, proceed to step (46); otherwise jump to step (47).
(46) Update the frequency table
Update the frequency $P_{b_j}$ of the coded character and the corresponding cumulative frequency $CumP_{b_j}$, i.e., update the frequency table.
(47) Check whether to finish
If so, end this encoding; otherwise jump to step (42) and encode the next character.
Following the technical solution in this section, decoding is the inverse process of encoding and is not described further.
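Since decoding is left to the reader, a round-trip sketch may help show how the decoder mirrors steps (41) to (47). It follows the well-known finite-precision scheme of Witten, Neal, and Cleary rather than any code from the patent, and it makes several assumptions: frequencies are integer counts, the table starts equiprobable and is updated character by character, the message length is passed to the decoder instead of an end-of-message symbol, and the total count stays below 2^14 so that every character keeps a non-empty sub-interval. The names `narrow`, `encode`, and `decode` are hypothetical.

```python
BITS = 16
MAX = (1 << BITS) - 1                 # 0xFFFF, as in the text
HALF, QUARTER = 1 << (BITS - 1), 1 << (BITS - 2)

def narrow(low, high, counts, idx):
    """Step (43): shrink [low, high] to the sub-interval of symbol idx."""
    total, cum = sum(counts), sum(counts[:idx])
    rng = high - low + 1
    return low + rng * cum // total, low + rng * (cum + counts[idx]) // total - 1

def encode(message, alphabet):
    counts = [1] * len(alphabet)      # equiprobable initial table
    low, high, case1_num, out = 0, MAX, 0, []

    def emit(bit):
        nonlocal case1_num
        out.append(bit)
        out.extend([1 - bit] * case1_num)
        case1_num = 0

    for ch in message:
        idx = alphabet.index(ch)
        low, high = narrow(low, high, counts, idx)
        while True:                   # step (44): normalization
            if high < HALF:           # case two
                emit(0)
            elif low >= HALF:         # case three
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < HALF + QUARTER:   # case one
                case1_num += 1
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
        counts[idx] += 1              # steps (45) and (46), per-character mode
    case1_num += 1                    # flush enough bits to pin down the interval
    emit(0 if low < QUARTER else 1)
    return out

def decode(bits, length, alphabet):
    counts = [1] * len(alphabet)
    stream = iter(bits)
    low, high, value = 0, MAX, 0
    for _ in range(BITS):             # prime the code value; pad with zeros
        value = 2 * value + next(stream, 0)
    chars = []
    for _ in range(length):
        total = sum(counts)
        target = ((value - low + 1) * total - 1) // (high - low + 1)
        idx, cum = 0, 0
        while cum + counts[idx] <= target:   # find the symbol whose range holds target
            cum += counts[idx]
            idx += 1
        chars.append(alphabet[idx])
        low, high = narrow(low, high, counts, idx)
        while True:                   # mirror the encoder's normalization
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + next(stream, 0)
        counts[idx] += 1              # same update as the encoder
    return "".join(chars)

message = "$GPGLL,4250.5589,S,14718.5084,E,092204.999,A*2D"
alphabet = sorted(set(message))
bits = encode(message, alphabet)
assert decode(bits, len(message), alphabet) == message
print(len(bits), "bits versus", 8 * len(message), "bits uncompressed")
```

The decoder applies exactly the same table updates as the encoder, which is what keeps the two adaptive tables in step without transmitting them.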
The technical solution of the present invention is described in detail below, taking the NMEA-0183 format widely used for positioning information as an example; the scope of protection of the invention is not limited to this embodiment.
Specifically, take the NMEA-0183 message format for geographic positioning information as an example, and let the message Message be "$GPGLL,4250.5589,S,14718.5084,E,092204.999,A*2D". The fields of the message are separated by commas, and each field carries the following information:
Field 0: $GPGLL, the sentence ID, indicating that this sentence is Geographic Position (GLL) information;
Field 1: latitude, ddmm.mmmm, in degrees and minutes (padded with leading zeros if there are too few digits);
Field 2: latitude hemisphere, N (north) or S (south);
Field 3: longitude, dddmm.mmmm, in degrees and minutes (padded with leading zeros if there are too few digits);
Field 4: longitude hemisphere, E (east) or W (west);
Field 5: UTC time, in hhmmss.sss format;
Field 6: status, A = position fixed, V = no fix;
Field 7: checksum.
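The field layout above can be checked against the example message with a few lines of code. The function names `parse_gll` and `nmea_checksum` and the dictionary keys are hypothetical; the checksum rule used here, the XOR of all characters between `$` and `*` written as two hexadecimal digits, is the usual NMEA-0183 convention and is not spelled out in the patent.

```python
def parse_gll(sentence):
    """Split a GLL sentence into the fields 0 to 7 described above."""
    body, _, checksum = sentence.partition("*")
    names = ["id", "lat", "lat_hemi", "lon", "lon_hemi", "utc", "status"]
    record = dict(zip(names, body.split(",")))
    record["checksum"] = checksum
    return record

def nmea_checksum(sentence):
    """XOR of every character between '$' and '*', as two uppercase hex digits."""
    value = 0
    for ch in sentence[1:sentence.index("*")]:
        value ^= ord(ch)
    return f"{value:02X}"

message = "$GPGLL,4250.5589,S,14718.5084,E,092204.999,A*2D"
record = parse_gll(message)
print(record)
print("checksum matches:", nmea_checksum(message) == record["checksum"])
```

The message draws on a small character set dominated by digits and commas, which is the property the encoding steps exploit.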
The encoding steps for this message are as follows:
(1) Preprocessing
Given features of the message character set such as the frequent occurrence of digits and commas, and taking the concrete operating environment into account, create the empirical static frequency table _Exper_Table for the character set and initialize _Adapt_Table to _Exper_Table.
(2) Read in a character
The characters of the message Message are read in one by one.
(3) Arithmetic coding
The character is arithmetically coded according to the current frequency table and the current character frequency.
(4) Decide whether to update the frequency table
Depending on actual needs, the frequency table _Adapt_Table can be updated character by character during encoding, i.e., the frequency table is updated after each character of the message Message is arithmetically coded. Alternatively, it can be updated in units of several messages: while a single message is arithmetically coded character by character, only the number of occurrences of each character is recorded, and the frequency table is updated from these records once the set number of messages has been coded.
Specifically, if the frequency table _Adapt_Table needs to be updated, proceed to step (5); otherwise jump to step (6).
(5) Update the frequency table
Update the frequency of the coded character, and thereby the frequency table _Adapt_Table.
(6) Check whether encoding of this message is finished
If encoding of the message Message is not finished, jump to step (2) and encode the next character; otherwise proceed to step (7).
(7) End
End this encoding run.
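The effect of the initialization choice in step (1) can be estimated without running a coder by summing the ideal code length $-\log_2 P$ over the message. In the sketch below the 40-character set and the weights in `exper_table` (8 for digits and the comma, 3 for the period, 1 for everything else) are invented purely for illustration; the patent does not give concrete empirical values.

```python
import math
import string

# Hypothetical character set: digits, uppercase letters, and four punctuation marks.
charset = string.digits + string.ascii_uppercase + "$,.*"

# _EqualPro_Table: every character has the same weight.
equal_table = {ch: 1 for ch in charset}

# _Exper_Table: illustrative weights favouring digits and commas.
exper_table = dict(equal_table)
for ch in string.digits + ",":
    exper_table[ch] = 8
exper_table["."] = 3

def ideal_bits(message, table):
    """Sum of -log2(P) over the message for a static table of weights."""
    total = sum(table.values())
    return sum(-math.log2(table[ch] / total) for ch in message)

message = "$GPGLL,4250.5589,S,14718.5084,E,092204.999,A*2D"
print(f"uncompressed     : {8 * len(message)} bits")
print(f"_EqualPro_Table  : {ideal_bits(message, equal_table):.1f} bits")
print(f"_Exper_Table     : {ideal_bits(message, exper_table):.1f} bits")
```

Even rough weights lower the estimate for the first message, before any adaptive update has taken place, which is the benefit the text attributes to the empirical table.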

Claims (3)

1. a character type message compression method is characterized in that: comprise the steps:
The character set of supposing this character type message is A, and its character number is n, and character probabilities is P i, then have
a i∈A
Figure FDA00001877645400011
be 1≤i≤n wherein
(1) Preprocessing
When the character-type message format is encoded for the first time, a frequency table must be initialized and assigned to the frequency table _Adapt_Table;
(2) Receive a message
Suppose the received message is Message, its character sequence is B, and the number of elements in the sequence is m, i.e.
b_j ∈ A, where 1≤j≤m
(3) Read in a character
Read in the characters of the received message Message one by one; suppose the character read in is b_j, 1≤j≤m, and its probability is P_bj
(4) Arithmetic coding
Arithmetically encode the character according to the current frequency table _Adapt_Table and the current character frequency P_bj;
The specific steps of arithmetic coding are:
Suppose the initial coding interval used by arithmetic coding is [0, Max], where Max is the interval maximum and is set to 0xFFFF. During encoding the interval is [Low, High] and the interval range is Range, where Low is the lower bound of the interval, initially 0, and High is the upper bound of the interval, initially Max. The character read in is b_j, its frequency is P_bj, and its cumulative frequency is CumP_bj, i.e. the sum of the frequencies of all symbols whose symbol value is less than that of this symbol;
(41) Initialization
Initialize the coding interval [0, Max] and set up the frequency table;
(42) Read in character b_j
Read in the characters of the message Message one by one; suppose the character read in is b_j, 1≤j≤m, and its probability is P_bj
(43) Update the interval
Update the interval [Low, High] according to the current frequency table, P_bj and CumP_bj, using the following formulas:
Range = High - Low + 1
High = Low + Range*(CumP_bj + P_bj) - 1
Low = Low + Range*CumP_bj
(44) Normalization
Test whether the interval [Low, High] satisfies the condition for continuing to encode; if it does, continue encoding; otherwise normalize the interval [Low, High];
Normalization of the interval [Low, High] distinguishes the following three cases:
Case one: the most significant bit of the upper bound is 1 and its second most significant bit is 0, while the most significant bit of the lower bound is 0 and its second most significant bit is 1. The second most significant bit is shifted out, i.e. discarded, and the number of discarded second most significant bits is recorded in Case1Num;
Case two: the most significant bits of the upper and lower bounds are both 0. Both bounds are shifted left by 1 bit, 1 is added to the upper bound, and the shifted-out bit is appended to the output code stream. Then Case1Num is checked: if it is not 0, the inverse of the most significant bit, called Case1Bit, is output to the output code stream Case1Num times;
Case three: the most significant bits of the upper and lower bounds are both 1. Both bounds are shifted left by 1 bit, 1 is added to the upper bound, and the shifted-out bit is appended to the output code stream. Then Case1Num is checked: if it is not 0, the inverse of the most significant bit, called Case1Bit, is output to the output code stream Case1Num times.
(45) Decide whether to update the frequency table
If so, proceed to step (46); otherwise jump to step (47);
(46) Update the frequency table
Update the frequency P_bj of the encoded character and the corresponding cumulative frequency CumP_bj, i.e. update the frequency table;
(47) Decide whether to finish
If so, end this encoding; otherwise jump to step (42) and continue encoding the next character.
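Steps (43) and (44) can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation. It assumes that frequencies are held as integer counts, so that P_bj = freq/total and CumP_bj = cum/total, and that Case1Num is reset to 0 once its bits have been output, which the text does not state explicitly. The function names and the case1_num and out parameters are hypothetical.

```python
MAX = 0xFFFF      # initial upper bound, as given in the text
HALF = 0x8000     # mask of the most significant bit
QUARTER = 0x4000  # mask of the second most significant bit

def update_interval(low, high, cum, freq, total):
    """Step (43), with P_bj = freq/total and CumP_bj = cum/total."""
    rng = high - low + 1
    new_high = low + rng * (cum + freq) // total - 1
    new_low = low + rng * cum // total
    return new_low, new_high

def normalize(low, high, case1_num, out):
    """Step (44): apply the three cases until none matches; bits go to out."""
    while True:
        if (low & HALF) == (high & HALF):
            # Cases two and three: both most significant bits are equal.
            bit = low >> 15
            out.append(bit)                      # the shifted-out bit
            out.extend([1 - bit] * case1_num)    # Case1Bit, Case1Num times
            case1_num = 0                        # assumed reset
        elif (low & QUARTER) and not (high & QUARTER):
            # Case one: Low = 01..., High = 10...; drop the second bit.
            case1_num += 1
            low &= QUARTER - 1
            high |= QUARTER
        else:
            return low, high, case1_num
        low = (low << 1) & MAX
        high = ((high << 1) & MAX) | 1           # "add 1 to the upper bound"

# Usage: counts a=2, b=1, c=1 (total 4); encoding "b" from the full interval.
out = []
low, high = update_interval(0, MAX, cum=2, freq=1, total=4)  # (0x8000, 0xBFFF)
low, high, n = normalize(low, high, 0, out)                  # out == [1, 0]
```

Computing High before Low matters here, because both formulas use the old value of Low.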
(5) Decide whether to update the frequency table
Depending on actual requirements, the frequency table _Adapt_Table is updated during encoding in one of two ways. Either it is updated character by character, i.e. the frequency table is updated after each character of the message Message has been arithmetically encoded; or it is updated in units of several messages, i.e. after the characters of a single message have been arithmetically encoded one by one, only the number of occurrences of each character is recorded, and the frequency table is updated from these records once the set number of messages has been encoded. If the frequency table _Adapt_Table needs to be updated, proceed to step (6); otherwise jump to step (7);
(6) Update the frequency table
Update the frequency P_bj of character b_j, and thereby update the frequency table _Adapt_Table;
(7) Check whether encoding of this message has finished
If encoding of the message Message has not finished, jump to step (3) and continue with the next character; otherwise proceed to step (8);
(8) Check whether there is a next message
If so, execute step (9); otherwise execute step (11), i.e. end this encoding;
(9) Decide whether to update the frequency table
This step applies when the frequency table _Adapt_Table is updated in units of several messages. After encoding of the message Message has ended, if the frequency table is to be updated, proceed to the next step; otherwise jump to step (2), read in the next message, and continue encoding;
(10) Update the frequency table
Update the frequency table _Adapt_Table using the recorded character occurrence counts;
(11) End
End this encoding.
2. The character-type message compression method as claimed in claim 1, characterized in that: two ways of initializing the frequency table are proposed, by empirical value and by equal probability. In the empirical-value way, specific values are assigned to P_i according to the characteristics of the message character set and the concrete usage environment, thereby creating the empirical static frequency table _Exper_Table of the character set, which is assigned to the frequency table _Adapt_Table. In the equal-probability way, the equiprobable static frequency table _EqualPro_Table is created, i.e.
Figure FDA00001877645400031
and is likewise assigned to the frequency table _Adapt_Table.
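The two initialization modes of claim 2 can be sketched in Python as follows. The function names are hypothetical and integer counts stand in for probabilities. Deriving the empirical values from sample messages is an assumption made for illustration only; the claim says merely that the values are chosen from the characteristics of the character set and the usage environment.

```python
from collections import Counter

def equal_probability_table(alphabet):
    """Stand-in for _EqualPro_Table: every character gets the same count."""
    return {ch: 1 for ch in alphabet}

def empirical_table(alphabet, sample_messages):
    """Stand-in for _Exper_Table, estimated here from sample messages.

    One is added to every count so that no character has zero frequency.
    """
    seen = Counter(ch for msg in sample_messages for ch in msg)
    return {ch: seen[ch] + 1 for ch in alphabet}

def probabilities(table):
    """Convert counts to the probabilities P_i."""
    total = sum(table.values())
    return {ch: n / total for ch, n in table.items()}

# Either table can then be copied into the adaptive table before encoding.
adapt_table = dict(empirical_table("abc", ["aab", "ab"]))
```

An empirical start usually helps short messages most, since an adaptive table has little time to learn the distribution before the message ends.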
3. The character-type message compression method as claimed in claim 1, characterized in that: decoding is the inverse process of encoding.
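What "inverse process" can mean in practice is sketched below as a Python round trip. Several things here are assumptions rather than part of the claims: the frequency table is static (integer counts) for brevity, the decoder is told the message length, the final two-bit flush is a common textbook choice, and all function names are hypothetical. With an adaptive table, the decoder would have to apply the same table updates at the same points as the encoder.

```python
BITS = 16
MAX = (1 << BITS) - 1          # 0xFFFF
HALF = 1 << (BITS - 1)
QUARTER = 1 << (BITS - 2)

def cumulative(counts):
    """Map each symbol to (cumulative count, count), in sorted symbol order."""
    table, total = {}, 0
    for sym in sorted(counts):
        table[sym] = (total, counts[sym])
        total += counts[sym]
    return table, total

def encode(message, counts):
    table, total = cumulative(counts)
    low, high, pending, out = 0, MAX, 0, []
    for sym in message:
        cum, freq = table[sym]
        rng = high - low + 1
        high = low + rng * (cum + freq) // total - 1
        low = low + rng * cum // total
        while True:
            if (low & HALF) == (high & HALF):        # equal top bits: emit
                bit = low >> (BITS - 1)
                out.append(bit)
                out.extend([1 - bit] * pending)
                pending = 0
            elif (low & QUARTER) and not (high & QUARTER):  # 01.. / 10..
                pending += 1
                low &= QUARTER - 1
                high |= QUARTER
            else:
                break
            low = (low << 1) & MAX
            high = ((high << 1) & MAX) | 1
    # Flush two more bits so the code value lies inside the final interval.
    bit = 0 if low < QUARTER else 1
    out.append(bit)
    out.extend([1 - bit] * (pending + 1))
    return out

def decode(bits, length, counts):
    table, total = cumulative(counts)
    stream = iter(bits)
    value = 0
    for _ in range(BITS):                            # missing bits read as 0
        value = (value << 1) | next(stream, 0)
    low, high, out = 0, MAX, []
    for _ in range(length):
        rng = high - low + 1
        target = ((value - low + 1) * total - 1) // rng
        sym = next(s for s, (c, f) in table.items() if c <= target < c + f)
        cum, freq = table[sym]
        out.append(sym)
        high = low + rng * (cum + freq) // total - 1
        low = low + rng * cum // total
        while True:                                  # mirror the encoder
            if (low & HALF) == (high & HALF):
                pass
            elif (low & QUARTER) and not (high & QUARTER):
                low &= QUARTER - 1
                high |= QUARTER
                value -= QUARTER
            else:
                break
            low = (low << 1) & MAX
            high = ((high << 1) & MAX) | 1
            value = ((value << 1) & MAX) | next(stream, 0)
    return "".join(out)
```

The decoder tracks the same [Low, High] interval as the encoder and applies the same three normalization cases, reading one bit wherever the encoder would have shifted one out.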
CN201210241220.4A 2012-07-12 2012-07-12 Character-type message compression method Active CN102811113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210241220.4A CN102811113B (en) 2012-07-12 2012-07-12 Character-type message compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210241220.4A CN102811113B (en) 2012-07-12 2012-07-12 Character-type message compression method

Publications (2)

Publication Number Publication Date
CN102811113A true CN102811113A (en) 2012-12-05
CN102811113B CN102811113B (en) 2014-12-10

Family

ID=47234703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210241220.4A Active CN102811113B (en) 2012-07-12 2012-07-12 Character-type message compression method

Country Status (1)

Country Link
CN (1) CN102811113B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1249083A (en) * 1996-12-30 2000-03-29 艾利森电话股份有限公司 Method and means for handling information
US20040075596A1 (en) * 2000-09-28 2004-04-22 Richard Price Huffman data compression method
CN101282121A (en) * 2007-04-05 2008-10-08 安凯(广州)软件技术有限公司 Method for decoding Haffmann based on conditional probability
CN101534124A (en) * 2008-12-16 2009-09-16 北京航空航天大学 Compression algorithm for short natural language

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107846263A (en) * 2017-11-01 2018-03-27 中国电子科技集团公司第二十八研究所 Information source binary arithmetic coding method and coding system based on segmented calculation
CN107846263B (en) * 2017-11-01 2020-07-14 南京莱斯电子设备有限公司 Information source binary arithmetic coding method and coding system based on segmented calculation
WO2024060351A1 (en) * 2022-09-20 2024-03-28 Hong Kong Applied Science and Technology Research Institute Company Limited Hardware implementation of frequency table generation for asymmetric-numeral-system-based data compression
CN116015312A (en) * 2023-03-28 2023-04-25 山东奔虎智能科技有限公司 Gas alarm system data storage method based on Internet of things platform
CN116702708A (en) * 2023-08-04 2023-09-05 陕西交通电子工程科技有限公司 Road pavement construction data management system
CN116702708B (en) * 2023-08-04 2023-11-03 陕西交通电子工程科技有限公司 Road pavement construction data management system
CN116896769A (en) * 2023-09-11 2023-10-17 深圳市久实电子实业有限公司 Optimized transmission method for motorcycle Bluetooth sound data
CN116896769B (en) * 2023-09-11 2023-11-10 深圳市久实电子实业有限公司 Optimized transmission method for motorcycle Bluetooth sound data

Also Published As

Publication number Publication date
CN102811113B (en) 2014-12-10

Similar Documents

Publication Publication Date Title
CN101501999B (en) Data coding
CN102811113B (en) Character-type message compression method
CN102811114B (en) Character-type communication message compression method adopting inter-frame coding
CN112953550B (en) Data compression method, electronic device and storage medium
US8159374B2 (en) Unicode-compatible dictionary compression
CN101165510A (en) Spaceborne synthetic aperture radar variable digit BAQ compression system and method
CN116506073B (en) Industrial computer platform data rapid transmission method and system
CN102122960A (en) Multi-character combination lossless data compression method for binary data
CN103236847A (en) Multilayer Hash structure and run coding-based lossless compression method for data
CN106170922A (en) The source code of data and the equipment of decoding and method
CN101449462A (en) High-speed data compression based on set associative cache mapping techniques
CN100367316C (en) Window idle frame memory compression
CN101483779A (en) Compressing method for two-dimension vector map
Howard et al. Parallel lossless image compression using Huffman and arithmetic coding
US10897270B2 (en) Dynamic dictionary-based data symbol encoding
CN105306951A (en) Pipeline parallel acceleration method for data compression encoding and system architecture thereof
CN104156990A (en) Lossless compressed encoding method and system supporting oversize data window
CN104468044A (en) Data compression method and device applied to network transmission
CN101534124B (en) Compression algorithm for short natural language
CN113220651B (en) Method, device, terminal equipment and storage medium for compressing operation data
CN101469989B (en) Compression method for navigation data in mobile phone network navigation
CN103152054A (en) Method and apparatus for arithmetic coding
CN103078646A (en) Dictionary lookup compression and decompression method and device
CN102394718B (en) Sensing network data compression coding/decoding method
CN101657973B (en) Recorded medium having program for coding and decoding using bit-precision, and apparatus thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant