CN106559084B - A kind of lossless data compression coding method based on arithmetic coding - Google Patents

A kind of lossless data compression coding method based on arithmetic coding

Info

Publication number
CN106559084B
Authority
CN
China
Prior art keywords
rational
unit interval
symbol
character string
data compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611026314.4A
Other languages
Chinese (zh)
Other versions
CN106559084A (en)
Inventor
陆成刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Markov Technology Co ltd
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201611026314.4A
Publication of CN106559084A
Application granted
Publication of CN106559084B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 Conversion to or from arithmetic code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A lossless data compression coding method based on arithmetic coding, comprising the following process. 1) Encoding: assume the probabilities of the n source symbols satisfy $0 < p_1 \le p_2 \le \dots \le p_n < 1$, i.e. they are arranged in monotonically increasing order by subscript. The n symbols are assigned the labels {n-1, n-2, ..., 1, 0}, so that the symbol with probability $p_i$ has label n-i, i = 1, 2, ..., n. Assume the input is a character string $J_1 J_2 J_3 \dots J_{K-1} J_K$ of length K, where $J_i \in \{0, 1, 2, \dots, n-1\}$. The string is defined to correspond to a rational number τ in the unit interval [0,1]. τ is treated as a decimal number, which is proved mathematically to lie in the unit interval [0,1], and is converted to binary; the resulting binary sequence is the encoded output bit stream. 2) Decoding: the bit stream is converted back into the corresponding decimal rational number in the unit interval [0,1]. The invention requires no division, is an entropy-optimal code, does not depend on block extension, and is flexible to use.

Description

A kind of lossless data compression coding method based on arithmetic coding
Technical field
The invention belongs to field of data compression, are related to a kind of lossless data compression coding method.
Background art
Arithmetic coding is a classical and well-known lossless data compression coding method. Data compression plays an extremely important role in information technology, and compression methods are divided into lossy and lossless compression. Lossy compression methods are generally based on the characteristics of the human perceptual organs (the auditory and visual systems): they discard a large amount of perceptually redundant data without affecting the perceived quality of what people hear and see. For example, MP3 audio technology exploits the time-domain and frequency-domain masking characteristics and the hearing-threshold characteristics of the human auditory system, while MPEG video coding exploits the frame-sampling-rate characteristics of the human eye and uses inter-frame motion estimation for effective compensation, thereby removing a large amount of inter-frame redundancy to achieve compression. Lossless compression is typically used for computer files and medical image data, where the data must be recovered exactly. In addition, lossless compression can serve as the back end of a lossy compression system, further compressing the lossy-compressed data stream it receives, as shown in Fig. 1.
Huffman coding is a widely used entropy coding algorithm in lossless compression. Compared with arithmetic coding, its biggest problem is that it encodes the character stream by block extension: as the block width increases, the coding redundancy decreases and the code approaches the entropy optimum. However, a larger block width makes the code table more complex to design, and a fixed block width scales poorly and is inflexible in practice, so Huffman coding is in fact inferior to arithmetic coding. Arithmetic coding is a naturally streaming code with no notion of block extension: an input character stream of any length can be mapped to a rational number (or to mutually disjoint sub-intervals) in the unit interval [0,1], so it is more widely applicable and more flexible than Huffman coding. Given an accurate estimate of the source's probability distribution, arithmetic coding is also entropy-optimal: the more accurate the estimate, the lower the redundancy of the arithmetic code and the closer it is to the entropy optimum. Because of these advantages, arithmetic coding has been studied most thoroughly and has the most granted patents. Although many patents from the 1980s and 1990s have expired, many arithmetic-coding patents have again been filed and granted in recent years. Conversely, because arithmetic coding is so widely patented, considerable room for industrial application has been left to Huffman coding.
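The trade-off between block width and redundancy can be made concrete with the standard textbook bound for Huffman coding over blocks of N symbols. This is a sketch for a memoryless source; the symbols N, H and $\bar{\ell}_N$ are notation introduced here for illustration and do not appear in the patent text.

```latex
% H: source entropy in bits per symbol
% \bar{\ell}_N: expected Huffman codeword length for a block of N symbols
H \;\le\; \frac{\bar{\ell}_N}{N} \;<\; H + \frac{1}{N},
\qquad \text{code table size} = n^{N}.
```

The per-symbol redundancy shrinks like 1/N, while the code table for an n-symbol alphabet grows as $n^N$, which is the design-complexity cost mentioned above.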
The idea of classical arithmetic coding is to map the input character stream to a rational number (in decimal form) in the interval [0,1] and then express that number in binary as a bit stream, which is the coding result. Decoding converts the bit stream back into a decimal fraction in [0,1] and, using the probability distribution of the symbol source, recovers the characters one by one from first to last. The encoding process is as follows. Let the symbol source be {1, 2, 3, ..., n} with probability distribution $\{p_1, p_2, \dots, p_n\}$, and let the input character string be $J_1 J_2 J_3 \dots J_{K-1} J_K$ of length K, where $J_i \in \{1, 2, \dots, n\}$. The left and right endpoints of the interval corresponding to the string are, respectively,
$$\tau_l = \sum_{L=1}^{K} \Big(\sum_{i=1}^{J_L-1} p_i\Big) \prod_{j=1}^{L-1} p_{J_j}, \qquad \tau_r = \tau_l + \prod_{j=1}^{K} p_{J_j},$$
where an empty sum is taken to be 0 and an empty product to be 1. The width of the interval corresponding to the string is therefore $\tau_r - \tau_l = \prod_{j=1}^{K} p_{J_j}$. The rational number τ corresponding to the string may be defined as the midpoint of the interval, $\tau = (\tau_l + \tau_r)/2$.
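The decoding process is as follows. First introduce the notation $F_i = \sum_{j=1}^{i} p_j$, with $F_0 = 0$ and $F_n = 1$. Let $\tau_l^{L}$ and $\tau_r^{L}$ be the left and right endpoints of the interval corresponding to the prefix of length L of the input string, with $\tau_l^{0} = 0$ and $\tau_r^{0} = 1$. Set L = 1 and decode in the following four steps:
1) Compute $\tau^* = (\tau - \tau_l^{L-1}) / (\tau_r^{L-1} - \tau_l^{L-1})$.
2) Determine the index $i^*$ satisfying $F_{i^*-1} \le \tau^* \le F_{i^*}$; then $J_L = i^*$.
3) Update $\tau_l^{L} = \tau_l^{L-1} + F_{i^*-1}\,(\tau_r^{L-1} - \tau_l^{L-1})$ and $\tau_r^{L} = \tau_l^{L-1} + F_{i^*}\,(\tau_r^{L-1} - \tau_l^{L-1})$.
4) If L < K, set L = L + 1 and return to step 1; otherwise stop.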
The above is the framework of arithmetic encoding and decoding; an actual implementation involves further details of how the software code is organized.
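The classical scheme described in this section can be sketched in Python with exact fractions. This is a minimal illustration, not the patent's code: the function names are hypothetical, and the interval update in step 3 uses the standard textbook form, which is an assumption where the text is incomplete.

```python
from fractions import Fraction

def cumulative(probs):
    """Return [F_0, F_1, ..., F_n] with F_0 = 0 and F_i = p_1 + ... + p_i."""
    F = [Fraction(0)]
    for p in probs:
        F.append(F[-1] + p)
    return F

def classic_encode(symbols, probs):
    """Map symbols (each in 1..n) to the midpoint of their interval in [0, 1]."""
    F = cumulative(probs)
    left, width = Fraction(0), Fraction(1)
    for s in symbols:
        left += F[s - 1] * width   # move to the sub-interval of symbol s
        width *= probs[s - 1]      # shrink by the symbol's probability
    return left + width / 2        # midpoint (tau_l + tau_r) / 2

def classic_decode(tau, probs, K):
    """Recover K symbols from tau with the four steps; step 1 needs a division."""
    F = cumulative(probs)
    left, right = Fraction(0), Fraction(1)
    out = []
    for _ in range(K):
        t = (tau - left) / (right - left)                       # step 1
        i = next(i for i in range(1, len(F))
                 if F[i - 1] <= t <= F[i])                      # step 2
        out.append(i)
        left, right = (left + F[i - 1] * (right - left),
                       left + F[i] * (right - left))            # step 3
    return out

probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
tau = classic_encode([3, 2, 1, 2], probs)
print(tau, classic_decode(tau, probs, 4))
```

The division in step 1 of `classic_decode` is the operation that the invention described below sets out to avoid.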
Existing arithmetic coding methods either remain at the level of software implementation details or apply arithmetic coding to the design of other systems.
Summary of the invention
To overcome the dependence on division in existing arithmetic coding methods, the present invention provides a lossless data compression coding method based on arithmetic coding that requires no division, is an entropy-optimal code like existing arithmetic coding, is not based on block extension, and is flexible to use.
The technical solution adopted by the present invention to solve this technical problem is:
A lossless data compression coding method based on arithmetic coding, comprising the following process:
1) Encoding: assume the probabilities of the n source symbols satisfy $0 < p_1 \le p_2 \le \dots \le p_n < 1$, i.e. they are arranged in monotonically increasing order by subscript. Assign the n symbols the labels {n-1, n-2, ..., 1, 0}, so that the symbol with probability $p_i$ has label n-i, i = 1, 2, ..., n. Assume the input is a character string $J_1 J_2 J_3 \dots J_{K-1} J_K$ of length K, where $J_i \in \{0, 1, 2, \dots, n-1\}$; the string is then defined to correspond to the rational number $\tau = \sum_{L=1}^{K} J_L\, p_{n-J_L}^{L}$ in the unit interval [0,1];
Treating τ as a decimal number, which is proved mathematically to lie in the unit interval [0,1], convert it to binary; the resulting binary sequence is the encoded output bit stream;
2) Decoding: convert the bit stream into the corresponding decimal rational number in the unit interval [0,1], denoted τ. Set L = 1 and $\tau_1 = \tau$, then decode in the following three steps:
2.1) In the set $\{k\,p_{n-k}^{L} : k = n-1, n-2, \dots, 1, 0\}$, find the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ and denote it $k^*$;
2.2) Set $J_L = k^*$;
2.3) If L < K, set $\tau_{L+1} = \tau_L - k^*\,p_{n-k^*}^{L}$ and L = L + 1, then return to step 2.1); otherwise stop.
Further, in step 2.1) a k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ always exists, since k = 0 always satisfies it, so a largest $k^*$ exists at every step. The resulting $J_1 J_2 J_3 \dots J_{K-1} J_K$ is the decoded result.
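Encoding step 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the rule $\tau = \sum_{L=1}^{K} J_L\, p_{n-J_L}^{L}$ is inferred from the decoding recurrence in step 2.3) and from the worked examples, and the names `encode_tau`, `to_bits` and `max_bits` are hypothetical.

```python
from fractions import Fraction

def encode_tau(labels, probs):
    """Assumed rule: tau = sum over L of J_L * p_{n-J_L}**L.

    probs holds p_1 <= ... <= p_n (index 0 is p_1); labels are in 0..n-1.
    """
    n = len(probs)
    tau = Fraction(0)
    for L, J in enumerate(labels, start=1):
        p = probs[n - J - 1]       # p_{n-J} in the 1-based notation of the text
        tau += J * p ** L
    return tau

def to_bits(tau, max_bits=64):
    """Binary expansion of tau in [0, 1), truncated after max_bits digits."""
    bits = []
    while tau and len(bits) < max_bits:
        tau *= 2
        bit = int(tau >= 1)        # next binary digit
        bits.append(str(bit))
        tau -= bit
    return "".join(bits)

# Example one: A, B, C with probabilities 1/4, 1/4, 1/2 and labels 2, 1, 0.
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
tau = encode_tau([0, 1, 2, 1], probs)   # string CBAB
print(tau, to_bits(tau))                # 25/256 00011001
```

Exact fractions are used so that the printed values can be compared directly with the worked examples; a practical encoder would need a finite-precision strategy, which the text does not specify.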
The beneficial effects of the present invention are mainly as follows: it is also a streaming code that does not depend on block extension and is flexible to use; in addition, it is proved mathematically to be an entropy-optimal code.
Brief description of the drawings
Fig. 1 is a schematic diagram of lossless compression used as the back-end module of a lossy compression system.
Detailed description of the embodiments
The invention is further described below.
A lossless data compression coding method based on arithmetic coding, which from a certain point of view can be regarded as breaking away from the framework of the original arithmetic coding in its basic idea, comprises the following process:
1) Encoding: assume the probabilities of the n source symbols satisfy $0 < p_1 \le p_2 \le \dots \le p_n < 1$; note that they are arranged in monotonically increasing order by subscript (classical arithmetic coding has no such ordering requirement). Assign the n symbols the labels {n-1, n-2, ..., 1, 0}, so that the symbol with probability $p_i$ has label n-i, i = 1, 2, ..., n (a mathematical proof is referenced at this point, but its formula is missing from the text). Assume the input is a character string $J_1 J_2 J_3 \dots J_{K-1} J_K$ of length K, where $J_i \in \{0, 1, 2, \dots, n-1\}$; the string is then defined to correspond to the rational number $\tau = \sum_{L=1}^{K} J_L\, p_{n-J_L}^{L}$ in the unit interval [0,1];
Treating τ as a decimal number, which is proved mathematically to lie in the unit interval [0,1], convert it to binary; the resulting binary sequence is the encoded output bit stream;
2) Decoding: convert the above bit stream into the corresponding decimal rational number in the unit interval [0,1], denoted τ. Set L = 1 and $\tau_1 = \tau$, then decode in the following three steps:
2.1) In the set $\{k\,p_{n-k}^{L} : k = n-1, n-2, \dots, 1, 0\}$, find the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ and denote it $k^*$;
2.2) Set $J_L = k^*$;
2.3) If L < K, set $\tau_{L+1} = \tau_L - k^*\,p_{n-k^*}^{L}$ and L = L + 1, then return to step 2.1); otherwise stop.
In step 2.1), a k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ always exists, since k = 0 always satisfies it, so a largest $k^*$ exists at every step. The resulting $J_1 J_2 J_3 \dots J_{K-1} J_K$ is the decoded result.
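Decoding steps 2.1) to 2.3) can be sketched as follows. This is a hedged illustration that applies the "largest k" rule literally. The function names are hypothetical, and the recurrence $\tau_{L+1} = \tau_L - k^* p_{n-k^*}^{L}$ is taken from the claims.

```python
from fractions import Fraction

def from_bits(bits):
    """Read a bit string b1 b2 ... as the binary fraction 0.b1b2... in [0, 1)."""
    return sum(Fraction(int(b), 2 ** i) for i, b in enumerate(bits, start=1))

def decode_labels(tau, probs, K):
    """Recover K labels from tau; probs holds p_1 <= ... <= p_n (index 0 is p_1)."""
    n = len(probs)
    powers = list(probs)            # powers[i] holds p_{i+1}**L for the current L
    out = []
    for _ in range(K):
        # step 2.1: largest k with tau_L >= k * p_{n-k}**L (k = 0 always qualifies)
        k_star = max(k for k in range(n) if tau >= k * powers[n - k - 1])
        out.append(k_star)                                # step 2.2
        tau -= k_star * powers[n - k_star - 1]            # step 2.3
        powers = [q * p for q, p in zip(powers, probs)]   # advance to L + 1
    return out

# Example one: A, B, C with probabilities 1/4, 1/4, 1/2 and labels 2, 1, 0.
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
print(decode_labels(from_bits("00011001"), probs, 4))     # [0, 1, 2, 1]
```

The loop uses only comparisons, multiplications and subtractions, in line with the stated aim of avoiding division. The string length K is passed in explicitly because the text does not say how the decoder learns it.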
Example one: a lossless data compression coding method based on arithmetic coding, whose process is as follows:
Symbol       A    B    C
Probability  1/4  1/4  1/2
Label        2    1    0
1) Input character string CBAB
The corresponding label sequence is 0121, the corresponding rational number is 25/256 = 0.09765625, and the corresponding bit stream is 00011001. To decode, first convert 00011001 into the decimal rational number 25/256. In the set {1/2, 1/4, 0} the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ is 0, so $J_1 = 0$. In the set {1/8, 1/16, 0} the largest such k is 1, so $J_2 = 1$. For the remainder 9/256, in the set {1/32, 1/64, 0} the largest such k is 2, so $J_3 = 2$. Finally, for the remainder 1/256, in the set {1/128, 1/256, 0} the largest such k is 1, so $J_4 = 1$. Decoding therefore yields $J_1 J_2 J_3 J_4 = 0121$, which corresponds to the character string CBAB.
2) Input character string CACCB
The corresponding label sequence is 02001, the corresponding rational number is 0.1259765625, and the corresponding bit stream is 0010000001. In decoding, the rational number corresponding to 0010000001 is 0.1259765625. In the set {1/2, 1/4, 0} the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ is 0, so $J_1 = 0$. In the set {1/8, 1/16, 0} the largest such k is 2, so $J_2 = 2$. Similarly, $J_3 = J_4 = 0$ and $J_5 = 1$.
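The two rational numbers in Example one can be checked by hand. The sums below read the encoding rule as $\tau = \sum_{L} J_L\, p_{n-J_L}^{L}$, an assumption that matches the decoding recurrence in step 2.3); label 0 has probability 1/2 and labels 1 and 2 each have probability 1/4.

```latex
\tau(0121)  = 0\cdot\tfrac{1}{2} + 1\cdot\left(\tfrac{1}{4}\right)^{2}
            + 2\cdot\left(\tfrac{1}{4}\right)^{3} + 1\cdot\left(\tfrac{1}{4}\right)^{4}
            = \tfrac{16 + 8 + 1}{256} = \tfrac{25}{256} = 0.00011001_2

\tau(02001) = 2\cdot\left(\tfrac{1}{4}\right)^{2} + 1\cdot\left(\tfrac{1}{4}\right)^{5}
            = \tfrac{128 + 1}{1024} = \tfrac{129}{1024} = 0.1259765625 = 0.0010000001_2
```

Both binary expansions agree with the bit streams 00011001 and 0010000001 given in the example.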
Example two: a lossless data compression coding method based on arithmetic coding, whose process is as follows:
The input character string ABCCCCE corresponds to the label sequence 2100003, and the corresponding rational number is 0.4400003. Decoding proceeds as follows. In {0.3, 0.4, 0.2, 0} the selected k is 2, since its term 0.4 is the largest in the set that does not exceed τ = 0.4400003. Next, in {0.03, 0.08, 0.04, 0} the selected k is 1. In {0.003, 0.016, 0.008, 0} k is 0, and k remains 0 for four consecutive steps in all (steps 3 to 6). At the seventh step, in {0.0000003, 0.0000256, 0.0000128, 0}, k is 3. Decoding finally yields 2100003.
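Example two can be re-traced with exact fractions. The sketch below is an illustration under stated assumptions, not the patent's procedure: the per-label probabilities are inferred from the first set {0.3, 0.4, 0.2, 0}, the helper names are hypothetical, and the selection rule used (the label whose term is the largest one not exceeding τ) is the reading that reproduces the labels stated in the example.

```python
from fractions import Fraction

# Per-label probabilities inferred from the first set {0.3, 0.4, 0.2, 0}:
# 3*q[3] = 0.3, 2*q[2] = 0.4, 1*q[1] = 0.2. Label 0 always contributes 0,
# so its probability is not needed here.
q = {3: Fraction(1, 10), 2: Fraction(1, 5), 1: Fraction(1, 5)}

def term(k, L):
    """k * (probability of label k)**L, the step-L term for label k."""
    return 0 if k == 0 else k * q[k] ** L

def encode(labels):
    """Assumed encoding rule: sum of the step-L terms of the labels."""
    return sum(term(J, L) for L, J in enumerate(labels, start=1))

def decode_largest_term(tau, K):
    """Assumed rule: pick the label whose term is the largest one <= tau."""
    out = []
    for L in range(1, K + 1):
        k = max((k for k in range(4) if term(k, L) <= tau),
                key=lambda k: (term(k, L), k))
        out.append(k)
        tau -= term(k, L)
    return out

labels = [2, 1, 0, 0, 0, 0, 3]          # ABCCCCE
tau = encode(labels)
print(float(tau))                       # 0.4400003
print(decode_largest_term(tau, 7))      # [2, 1, 0, 0, 0, 0, 3]
```

With these numbers, applying the "largest k" wording literally at the first step would select k = 3, because 0.3 does not exceed 0.4400003. The two readings coincide in Example one, where the terms decrease as k decreases.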

Claims (1)

1. A lossless data compression coding method based on arithmetic coding, characterized in that the coding method comprises the following process:
1) Encoding: assume the probabilities of the n source symbols satisfy $0 < p_1 \le p_2 \le \dots \le p_n < 1$, i.e. they are arranged in monotonically increasing order by subscript; assign the n symbols the labels {n-1, n-2, ..., 1, 0}, so that the symbol with probability $p_i$ has label n-i, i = 1, 2, ..., n; assume the input is a character string $J_1 J_2 J_3 \dots J_{K-1} J_K$ of length K, where $J_i \in \{0, 1, 2, \dots, n-1\}$; the string is then defined to correspond to the rational number $\tau = \sum_{L=1}^{K} J_L\, p_{n-J_L}^{L}$ in the unit interval [0,1];
Treating τ as a decimal number, which is proved mathematically to lie in the unit interval [0,1], convert it to binary; the resulting binary sequence is the encoded output bit stream;
2) Decoding: convert the bit stream into the corresponding decimal rational number in the unit interval [0,1], denoted τ; set L = 1 and $\tau_1 = \tau$, then decode in the following three steps:
2.1) In the set $\{k\,p_{n-k}^{L} : k = n-1, n-2, \dots, 1, 0\}$, find the largest k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ and denote it $k^*$; a k satisfying $\tau_L \ge k\,p_{n-k}^{L}$ always exists, since k = 0 always satisfies it, so a largest $k^*$ exists at every step; the resulting $J_1 J_2 J_3 \dots J_{K-1} J_K$ is the decoded result;
2.2) Set $J_L = k^*$;
2.3) If L < K, set $\tau_{L+1} = \tau_L - k^*\,p_{n-k^*}^{L}$ and L = L + 1, then return to step 2.1); otherwise stop.
CN201611026314.4A 2016-11-15 2016-11-15 A kind of lossless data compression coding method based on arithmetic coding Active CN106559084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611026314.4A CN106559084B (en) 2016-11-15 2016-11-15 A kind of lossless data compression coding method based on arithmetic coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611026314.4A CN106559084B (en) 2016-11-15 2016-11-15 A kind of lossless data compression coding method based on arithmetic coding

Publications (2)

Publication Number    Publication Date
CN106559084A          2017-04-05
CN106559084B          2019-07-30

Family

ID=58444626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611026314.4A Active CN106559084B (en) 2016-11-15 2016-11-15 A kind of lossless data compression coding method based on arithmetic coding

Country Status (1)

Country Link
CN (1) CN106559084B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095752B (en) * 2023-08-21 2024-03-19 基诺创物(武汉市)科技有限公司 DNA protein coding region streaming data storage method capable of keeping codon preference

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102438145A (en) * 2011-11-22 2012-05-02 广州中大电讯科技有限公司 Image lossless compression method on basis of Huffman code
CN102684703A (en) * 2012-04-26 2012-09-19 北京师范大学 Efficient lossless compression method for digital elevation model data
CN105191145A (en) * 2013-03-01 2015-12-23 古如罗技微系统公司 Data encoder, data decoder and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LOSSLESS IMAGE COMPRESSION USING INTEGER COEFFICIENT FILTER BANKS AND CLASS-WISE ARITHMETIC CODING; I. Balasingham et al.; Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'98; 1998-05-15; vol. 6; pp. 1349-1352 *

Also Published As

Publication number Publication date
CN106559084A (en) 2017-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220127

Address after: 310000 5-1006, No. 501, No. 2 street, Baiyang street, Qiantang new area, Hangzhou, Zhejiang Province

Patentee after: Hangzhou Markov Technology Co.,Ltd.

Address before: No. 18 Chaowang Road, Zhaohui Sixth District, Hangzhou, Zhejiang Province, 310014

Patentee before: Zhejiang University of Technology