CN106559084B - A lossless data compression coding method based on arithmetic coding - Google Patents
- Publication number: CN106559084B (also published as CN106559084A; application CN201611026314.4A)
- Authority: CN (China)
- Prior art keywords: rational, unit interval, symbol, character string, data compression
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
- H03M7/4006—Conversion to or from arithmetic code
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
A lossless data compression coding method based on arithmetic coding, comprising the following procedure. 1) Encoding: assume the probability distribution of the n source symbols is {0 < p1 <= p2 <= ... <= pn < 1}, arranged in monotonically increasing order of subscript. The n symbols are assigned the labels {n-1, n-2, ..., 1, 0}, so that the symbol with probability pi receives the label n-i, i = 1, 2, ..., n. Assume the input is a character string J1J2J3...JK-1JK of length K, where Ji ∈ {0, 1, 2, ..., n-1}. The string is mapped to a rational number τ in the unit interval [0, 1]; τ is regarded as a decimal number, proven mathematically to lie in [0, 1], and converted to binary representation, and the resulting binary sequence is the encoded output bit stream. 2) Decoding: the bit stream is converted back into the corresponding decimal rational number in the unit interval [0, 1]. The invention requires no division, is entropy-optimal, does not depend on grouped (block) extension, and is flexible to use.
Description
Technical field
The invention belongs to the field of data compression and relates to a lossless data compression coding method.
Background art
Arithmetic coding is a classic and well-known lossless data compression coding method. Data compression occupies an extremely important place in information technology, and compression methods are divided into lossy and lossless compression. Lossy compression methods are generally based on the characteristics of the human perceptual organs (the auditory and visual systems): they delete large amounts of perceptually redundant data without affecting the perceived listening or viewing quality. For example, MP3 audio coding exploits the time-domain and frequency-domain masking properties of the human auditory system and the threshold of audibility, while MPEG video coding exploits the frame sampling rate of the human eye and compensates effectively through inter-frame motion estimation, so that a large amount of inter-frame redundancy is removed and the data are compressed. Lossless compression is typically used for computer files and medical image data, where strictly lossless recovery is required. In addition, lossless compression also serves as the back end of a lossy compression system, applying further lossless compression to the lossy-compressed data stream it receives, as shown in Figure 1.
Huffman coding is a widely used entropy coding algorithm in lossless compression. Compared with arithmetic coding, its biggest problem is that it encodes the character stream by grouped (block) extension: as the group width increases, the coding redundancy decreases and the code approaches the entropy optimum. However, a larger group width makes the code table more complex to design, and a fixed group width offers poor scalability and flexibility in practice, so Huffman coding is in fact inferior to arithmetic coding. Arithmetic coding is a naturally streaming code with no notion of grouped extension: an input character stream of any length is mapped to a rational number (or to one of a family of mutually disjoint subintervals) in the unit interval [0, 1], so its applicability and flexibility are higher than those of Huffman coding. Given an accurate estimate of the probability distribution of the character source, arithmetic coding is also entropy-optimal: the more accurate the estimate, the lower the redundancy and the closer the code is to the entropy optimum. Because of these advantages, arithmetic coding is the most thoroughly studied technique and has the most granted patents; although many of these patents lapsed in the 1990s, many new arithmetic coding patents have been filed and granted in recent years. Conversely, because arithmetic coding is so widely patented, considerable room for industrial application has been left to Huffman coding.
The idea of classical arithmetic coding is to map the input character stream to a rational number (in decimal form) in the interval [0, 1] and then to write this number in binary as a bit stream, which is the coding result. Decoding converts the bit stream back into a decimal fraction in [0, 1] and, using the probability distribution of the symbol source, recovers the characters one by one from first to last. The encoding process is as follows. Let the symbol source be $\{1, 2, 3, \dots, n\}$ with probability distribution $\{p_1, p_2, \dots, p_n\}$, and let the input character string be $J_1J_2J_3 \dots J_{K-1}J_K$ of length K, where $J_i \in \{1, 2, \dots, n\}$. The left and right endpoints of the interval corresponding to the string are respectively
$$\tau_l = \sum_{i=1}^{K} \Big(\sum_{j=1}^{J_i-1} p_j\Big) \prod_{m=1}^{i-1} p_{J_m}, \qquad \tau_r = \tau_l + \prod_{m=1}^{K} p_{J_m},$$
where an empty sum is taken to be 0 and an empty product to be 1. The width of the interval corresponding to the string is therefore $\prod_{m=1}^{K} p_{J_m}$, and the rational number τ corresponding to the string may be defined as the midpoint of the interval, $\tau = (\tau_l + \tau_r)/2$.
The decoding process is as follows. First introduce the notation $F_i = \sum_{j=1}^{i} p_j$, where $F_0 = 0$ and $F_n = 1$. Let $\tau_l^L$ and $\tau_r^L$ denote the left and right endpoints of the interval corresponding to the substring formed by the first L characters of the input string, with $\tau_l^0 = 0$ and $\tau_r^0 = 1$. Set L = 1 and decode in the following four steps:
1) Compute $\tau^* = (\tau - \tau_l^{L-1}) / (\tau_r^{L-1} - \tau_l^{L-1})$.
2) Determine the index $i^*$ satisfying $F_{i^*-1} \le \tau^* \le F_{i^*}$; then $J_L = i^*$.
3) Set $\tau_l^L = \tau_l^{L-1} + F_{i^*-1}(\tau_r^{L-1} - \tau_l^{L-1})$ and $\tau_r^L = \tau_l^{L-1} + F_{i^*}(\tau_r^{L-1} - \tau_l^{L-1})$.
4) If L < K, set L = L + 1 and return to step 1; otherwise stop.
The above is the method framework of arithmetic coding and decoding; actual implementations involve further details of software coding.
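The classical framework above can be sketched in a few lines, assuming exact rational arithmetic with Python's `fractions.Fraction` and no finite-precision renormalization. The function names `cumulative`, `classic_encode` and `classic_decode` are hypothetical and chosen only for illustration; the sketch is meant to show where the division in step 1 arises, not to reproduce any particular implementation.

```python
from fractions import Fraction

def cumulative(probs):
    """Return [F_0, ..., F_n] with F_0 = 0 and F_i = p_1 + ... + p_i."""
    F = [Fraction(0)]
    for p in probs:
        F.append(F[-1] + p)
    return F

def classic_encode(symbols, probs):
    """Map symbols (each in 1..n) to the midpoint of their interval."""
    F = cumulative(probs)
    left, width = Fraction(0), Fraction(1)
    for s in symbols:
        left += F[s - 1] * width   # move to the sub-interval of symbol s
        width *= probs[s - 1]      # shrink by the probability of symbol s
    return left + width / 2

def classic_decode(tau, probs, length):
    """Recover `length` symbols from tau using the four steps above."""
    F = cumulative(probs)
    left, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        t = (tau - left) / width   # step 1: this is the division
        s = next(i for i in range(1, len(F)) if F[i - 1] <= t < F[i])  # step 2
        out.append(s)
        left += F[s - 1] * width   # step 3: new left endpoint
        width *= probs[s - 1]      # step 3: new interval width
    return out

probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
message = [3, 2, 1, 2]
tau = classic_encode(message, probs)
print(tau, classic_decode(tau, probs, len(message)))  # 163/256 [3, 2, 1, 2]
```

Because τ is the midpoint of the final interval, the rescaled value in step 1 always falls strictly inside one sub-interval, so the half-open test in step 2 is sufficient in this sketch.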
Existing arithmetic coding methods remain at the level of software implementation details, or apply arithmetic coding in the design of other systems.
Summary of the invention
To overcome the reliance on division in existing arithmetic coding methods, the present invention provides a lossless data compression coding method based on arithmetic coding that requires no division, is entropy-optimal like existing arithmetic coding, does not use grouped extension, and is flexible to use.
The technical solution adopted by the present invention to solve the technical problems is:
A lossless data compression coding method based on arithmetic coding, comprising the following procedure:
1) Encoding: assume the probability distribution of the n source symbols is $\{0 < p_1 \le p_2 \le \dots \le p_n < 1\}$, arranged in monotonically increasing order of subscript. Assign the labels $\{n-1, n-2, \dots, 1, 0\}$ to the n symbols, so that the symbol with probability $p_i$ receives the label $n-i$, $i = 1, 2, \dots, n$. Assume the input is a character string $J_1J_2J_3 \dots J_{K-1}J_K$ of length K, where $J_i \in \{0, 1, 2, \dots, n-1\}$. The rational number τ in the unit interval [0, 1] corresponding to the string is then defined as
$$\tau = \sum_{L=1}^{K} J_L \, p_{n-J_L}^{L}.$$
Regard τ as a decimal number, which is proven mathematically to lie in the unit interval [0, 1], and convert it to binary representation; the resulting binary sequence is the encoded output bit stream;
2) Decoding: convert the bit stream into the corresponding decimal rational number in the unit interval [0, 1], denoted τ. Set L = 1 and $\tau_1 = \tau$, then decode in the following three steps:
2.1) In the set $\{k\,p_{n-k}^{L} : k = n-1, \dots, 1, 0\}$, find the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$, and denote it $k^*$;
2.2) $J_L = k^*$;
2.3) If L < K, set $\tau_{L+1} = \tau_L - k^*\,p_{n-k^*}^{L}$ and L = L + 1, and return to step 2.1); otherwise stop.
Further, in step 2.1) a k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ always exists, since k = 0 always does, so a maximum $k^*$ exists at every step; the resulting $J_1J_2J_3 \dots J_{K-1}J_K$ is the decoded result.
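As a minimal sketch of the encoding step, assume the mapping is τ = Σ J_L · p(J_L)^L, where p(k) is the probability of the symbol labelled k; this form is consistent with the subtraction performed in step 2.3) and with the worked examples. The names `encode_tau`, `to_bits` and `label_prob` are hypothetical, and the probabilities are those of the three-symbol table in Example one.

```python
from fractions import Fraction

# Probability of each label (label k has probability p_{n-k}); Example one values.
label_prob = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

def encode_tau(labels, label_prob):
    """tau = sum over positions L of J_L * p(J_L)**L; multiplications only."""
    return sum((j * label_prob[j] ** pos
                for pos, j in enumerate(labels, start=1)), Fraction(0))

def to_bits(tau, max_bits=64):
    """Binary expansion of a rational in [0, 1), truncated at max_bits digits."""
    bits = []
    while tau and len(bits) < max_bits:
        tau *= 2
        bit = int(tau >= 1)   # next binary digit
        bits.append(str(bit))
        tau -= bit
    return "".join(bits)

tau = encode_tau([0, 1, 2, 1], label_prob)   # label sequence of CBAB
print(tau, to_bits(tau))                     # 25/256 00011001
```

The expansion terminates on its own only when the denominator of τ is a power of two, which is why the sketch carries a `max_bits` cap; how the bit stream is truncated for other probability sets is not specified here.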
The beneficial effects of the present invention are mainly as follows: it is also a streaming code that does not depend on grouped extension, so it is flexible to use; in addition, it is proven mathematically to be entropy-optimal.
Brief description of the drawings
Fig. 1 is a schematic diagram of lossless compression used as the back-end module of a lossy compression system.
Specific embodiments
The invention is further described below.
A lossless data compression coding method based on arithmetic coding which, from a certain point of view, can be regarded as breaking out of the method framework of the original arithmetic coding in its basic idea, comprising the following procedure:
1) Encoding: assume the probability distribution of the n source symbols is $\{0 < p_1 \le p_2 \le \dots \le p_n < 1\}$; note that the probabilities are arranged in monotonically increasing order of subscript (classical arithmetic coding imposes no such ordering requirement). Assign the labels $\{n-1, n-2, \dots, 1, 0\}$ to the n symbols, so that the symbol with probability $p_i$ receives the label $n-i$, $i = 1, 2, \dots, n$. It is proven mathematically that [formula not reproduced in this text]. Assume the input is a character string $J_1J_2J_3 \dots J_{K-1}J_K$ of length K, where $J_i \in \{0, 1, 2, \dots, n-1\}$. The rational number τ in the unit interval [0, 1] corresponding to the string is then defined as
$$\tau = \sum_{L=1}^{K} J_L \, p_{n-J_L}^{L}.$$
Regard τ as a decimal number, which is proven mathematically to lie in the unit interval [0, 1], and convert it to binary representation; the resulting binary sequence is the encoded output bit stream;
2) Decoding: convert the above bit stream into the corresponding decimal rational number in the unit interval [0, 1], denoted τ. Set L = 1 and $\tau_1 = \tau$, then decode in the following three steps:
2.1) In the set $\{k\,p_{n-k}^{L} : k = n-1, \dots, 1, 0\}$, find the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$, and denote it $k^*$;
2.2) $J_L = k^*$;
2.3) If L < K, set $\tau_{L+1} = \tau_L - k^*\,p_{n-k^*}^{L}$ and L = L + 1, and return to step 2.1); otherwise stop.
In step 2.1), a k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ always exists, since k = 0 always does, so a maximum $k^*$ exists at every step; the resulting $J_1J_2J_3 \dots J_{K-1}J_K$ is the decoded result.
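The three decoding steps can be sketched as follows, assuming exact rationals via Python's `fractions.Fraction` and the label probabilities of the three-symbol table in Example one. The names `from_bits`, `decode_labels` and `label_prob` are hypothetical. The sketch applies the rule of step 2.1) literally (largest k whose threshold does not exceed the running remainder) and is only checked here against the bit streams of Example one.

```python
from fractions import Fraction

# Probability of each label (label k has probability p_{n-k}); Example one values.
label_prob = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

def from_bits(bits):
    """Read a bit string b1 b2 ... as the binary fraction 0.b1 b2 ..."""
    return Fraction(int(bits, 2), 2 ** len(bits))

def decode_labels(tau, label_prob, length):
    """Recover `length` labels using comparisons, multiplications and subtractions."""
    n = len(label_prob)
    labels = []
    for pos in range(1, length + 1):
        # Step 2.1: largest k with tau_L >= k * p(k)**L; k = 0 always qualifies.
        k_star = max(k for k in range(n) if tau >= k * label_prob[k] ** pos)
        labels.append(k_star)                       # step 2.2
        tau -= k_star * label_prob[k_star] ** pos   # step 2.3: no division
    return labels

print(decode_labels(from_bits("00011001"), label_prob, 4))  # [0, 1, 2, 1]
```

Unlike the classical decoder, the loop never rescales the remainder by an interval width, which is where the division-free property claimed for the method shows up in code.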
Example one: a lossless data compression coding method based on arithmetic coding, whose process is as follows:
Symbol | A | B | C |
---|---|---|---|
Probability | 1/4 | 1/4 | 1/2 |
Label | 2 | 1 | 0 |
1) Input the character string CBAB.
The corresponding label sequence is 0121, the corresponding rational number is 25/256 = 0.09765625, and the corresponding bit stream is 00011001. To decode, first convert 00011001 into the decimal rational number 25/256. In the set {1/2, 1/4, 0}, the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ is 0, so $J_1 = 0$. In the set {1/8, 1/16, 0}, the largest such k is 1, so $J_2 = 1$. For the remainder 9/256 and the set {1/32, 1/64, 0}, the largest such k is 2, so $J_3 = 2$. Finally, for the remainder 1/256 and the set {1/128, 1/256, 0}, the largest such k is 1, so $J_4 = 1$. Decoding therefore yields $J_1J_2J_3J_4 = 0121$, which corresponds to the character string CBAB.
2) Input the character string CACCB.
The corresponding label sequence is 02001, the corresponding rational number is 0.1259765625, and the corresponding bit stream is 0010000001. To decode, the rational number corresponding to 0010000001 is 0.1259765625. In {1/2, 1/4, 0}, the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ is 0, so $J_1 = 0$; in the set {1/8, 1/16, 0}, the largest k meeting the requirement is 2, so $J_2 = 2$; similarly, $J_3 = J_4 = 0$ and $J_5 = 1$.
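The two rational numbers quoted in Example one can be checked by hand, assuming the mapping τ = Σ J_L · p(J_L)^L, where p(k) is the probability of the symbol labelled k (a form inferred from the decoding subtraction rather than quoted from the text):

$$
\begin{aligned}
\text{CBAB} \to 0121:\quad \tau &= 0\cdot\tfrac{1}{2} + 1\cdot\left(\tfrac{1}{4}\right)^{2} + 2\cdot\left(\tfrac{1}{4}\right)^{3} + 1\cdot\left(\tfrac{1}{4}\right)^{4}
 = \tfrac{16 + 8 + 1}{256} = \tfrac{25}{256} = 0.00011001_2,\\
\text{CACCB} \to 02001:\quad \tau &= 0\cdot\tfrac{1}{2} + 2\cdot\left(\tfrac{1}{4}\right)^{2} + 0 + 0 + 1\cdot\left(\tfrac{1}{4}\right)^{5}
 = \tfrac{128 + 1}{1024} = \tfrac{129}{1024} = 0.0010000001_2 .
\end{aligned}
$$

Both values agree with the decimal numbers 0.09765625 and 0.1259765625 and with the bit streams 00011001 and 0010000001 given in the example.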
Example two: a lossless data compression coding method based on arithmetic coding, whose process is as follows:
The input character string ABCCCCE corresponds to the label sequence 2100003, and the corresponding rational number is 0.4400003. Decoding proceeds as follows. In {0.3, 0.4, 0.2, 0}, the selected k is 2 (0.4 is the largest element not exceeding 0.4400003). Next, in {0.03, 0.08, 0.04, 0}, the selected k is 1. In {0.003, 0.016, 0.008, 0}, k is 0, and it remains 0 through the sixth step (four consecutive zeros in all). At the seventh step, in {0.0000003, 0.0000256, 0.0000128, 0}, k is 3. Final decoding yields 2100003.
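The numbers in Example two can be reproduced with a short sketch under two stated assumptions. First, the label probabilities are inferred from the candidate sets quoted above (0.1 for label 3, 0.2 for labels 2 and 1, and 0.5 for label 0 so that the total is 1); they are not given explicitly in the text. Second, the selection picks the candidate whose threshold is the largest one not exceeding the running remainder, since that choice is what yields the quoted k values here (the thresholds are not monotone in k for this probability set). All names are hypothetical.

```python
from fractions import Fraction

# Assumed label probabilities, inferred from the candidate sets in Example two.
label_prob = {0: Fraction(1, 2), 1: Fraction(1, 5),
              2: Fraction(1, 5), 3: Fraction(1, 10)}
labels = [2, 1, 0, 0, 0, 0, 3]   # label sequence quoted for ABCCCCE

tau = sum((j * label_prob[j] ** pos
           for pos, j in enumerate(labels, start=1)), Fraction(0))

def thresholds(pos):
    """Candidate set {k * p(k)**L} at step L = pos, listed for k = 3, 2, 1, 0."""
    return [k * label_prob[k] ** pos for k in (3, 2, 1, 0)]

def decode_by_largest_threshold(tau, length):
    """Select, at each step, the k whose threshold is the largest one <= tau."""
    out = []
    for pos in range(1, length + 1):
        candidates = [(k * label_prob[k] ** pos, k) for k in label_prob]
        t, k = max(c for c in candidates if c[0] <= tau)
        out.append(k)
        tau -= t   # subtract the selected threshold from the remainder
    return out

print(float(tau))                            # 0.4400003
print([float(t) for t in thresholds(1)])     # [0.3, 0.4, 0.2, 0.0]
print(decode_by_largest_threshold(tau, 7))   # [2, 1, 0, 0, 0, 0, 3]
```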
Claims (1)
1. A lossless data compression coding method based on arithmetic coding, characterized in that the coding method comprises the following procedure:
1) Encoding: assume the probability distribution of the n source symbols is $\{0 < p_1 \le p_2 \le \dots \le p_n < 1\}$, arranged in monotonically increasing order of subscript; assign the labels $\{n-1, n-2, \dots, 1, 0\}$ to the n symbols, so that the symbol with probability $p_i$ receives the label $n-i$, $i = 1, 2, \dots, n$; assume the input is a character string $J_1J_2J_3 \dots J_{K-1}J_K$ of length K, where $J_i \in \{0, 1, 2, \dots, n-1\}$; the rational number τ in the unit interval [0, 1] corresponding to the string is then defined as
$$\tau = \sum_{L=1}^{K} J_L \, p_{n-J_L}^{L};$$
regard τ as a decimal number, which is proven mathematically to lie in the unit interval [0, 1], and convert it to binary representation; the resulting binary sequence is the encoded output bit stream;
2) Decoding: convert the bit stream into the corresponding decimal rational number in the unit interval [0, 1], denoted τ; set L = 1 and $\tau_1 = \tau$, then decode in the following three steps:
2.1) In the set $\{k\,p_{n-k}^{L} : k = n-1, \dots, 1, 0\}$, find the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$, and denote it $k^*$; a k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ always exists, since k = 0 always does, so a maximum $k^*$ exists at every step; the resulting $J_1J_2J_3 \dots J_{K-1}J_K$ is the decoded result;
2.2) $J_L = k^*$;
2.3) If L < K, set $\tau_{L+1} = \tau_L - k^*\,p_{n-k^*}^{L}$ and L = L + 1, and return to step 2.1); otherwise stop.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611026314.4A CN106559084B (en) | 2016-11-15 | 2016-11-15 | A kind of lossless data compression coding method based on arithmetic coding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106559084A | 2017-04-05 |
CN106559084B | 2019-07-30 |
Family
ID=58444626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611026314.4A (Active; CN106559084B) | A lossless data compression coding method based on arithmetic coding | 2016-11-15 | 2016-11-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106559084B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117095752B (en) * | 2023-08-21 | 2024-03-19 | 基诺创物(武汉市)科技有限公司 | DNA protein coding region streaming data storage method capable of keeping codon preference |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102438145A (en) * | 2011-11-22 | 2012-05-02 | 广州中大电讯科技有限公司 | Image lossless compression method on basis of Huffman code |
CN102684703A (en) * | 2012-04-26 | 2012-09-19 | 北京师范大学 | Efficient lossless compression method for digital elevation model data |
CN105191145A (en) * | 2013-03-01 | 2015-12-23 | 古如罗技微系统公司 | Data encoder, data decoder and method |
Non-Patent Citations (1)
Title |
---|
LOSSLESS IMAGE COMPRESSION USING INTEGER COEFFICIENT FILTER BANKS AND CLASS-WISE ARITHMETIC CODING; I. Balasingham et al.; Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98; 1998-05-15; vol. 6; pp. 1349-1352 * |
Also Published As
Publication number | Publication date |
---|---|
CN106559084A (en) | 2017-04-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | | Effective date of registration: 2022-01-27. Address after: 310000 5-1006, No. 501, No. 2 Street, Baiyang Street, Qiantang New Area, Hangzhou, Zhejiang Province. Patentee after: Hangzhou Markov Technology Co.,Ltd. Address before: 310014 No. 18 Chaowang Road, Zhaohui Six District, Hangzhou, Zhejiang Province. Patentee before: ZHEJIANG UNIVERSITY OF TECHNOLOGY |