CN104156990B - Lossless compression encoding method and system supporting an ultra-large data window - Google Patents
- Publication number
- CN104156990B (application CN201410317732.3A)
- Authority
- CN
- China
- Prior art keywords
- coding
- length
- index
- encoded
- statistical form
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a lossless compression encoding method supporting an ultra-large data window, comprising the following steps: a string matching encoder performs string matching on the uncompressed data and generates multi-level length codes and multi-level index codes, which also carry single-character codes and control instruction codes; a segment cutter cuts the multi-level length codes and multi-level index codes into fixed-length code segments of 8 binary bits (or another fixed number of bits) and extracts code classification information from them; classified statistics tables collect dynamic statistics on the fixed-length code segments, and a probability prediction table is computed from the table values; guided by the probability prediction table, an entropy coder applies a probability-statistics-based compression coding algorithm to the fixed-length code segments and outputs binary compressed data. The invention can support ultra-large data windows from the GB level up to the EB level, and the compression ratio does not drop as the index codes grow longer.
Description
Technical field
The present invention relates to the field of information coding for lossless data compression, and in particular to a lossless compression encoding method and system supporting an ultra-large data window.
Background art
With the rapid development of cloud computing, data volumes are growing at an astonishing rate. As a development trend of the information industry, big data is becoming more and more important. This raises a problem: how can data at the TB level, the PB level or beyond be handled efficiently? Such data must be stored and transmitted in a networked environment, which challenges storage space, network bandwidth and computing resources alike.
Data compression is a sensible way to reduce data storage and transmission costs, but conventional compression coding techniques fall short when faced with big data. WinRAR, for example, has a data window of only 4MB, which limits its compression ratio, and its compression speed is not fast enough.
A larger data window can improve the compression ratio. Enlarging the window is difficult, however, because the index length grows with it, which lowers the compression ratio.
Lossless compression, also called distortion-free compression, is one class of data compression techniques; its defining property is that decompression restores the data exactly. Software such as WinZip, WinRAR and 7-zip uses lossless compression. The other class is lossy compression, which is generally applied to multimedia data such as sound, pictures and video; the decompressed data differs from the original, but the difference is barely perceptible to people. JPG pictures and DVD video, for example, use lossy compression. Every lossy compression method still finishes with a lossless coding stage at the end of its compression pipeline, so this patent applies equally to lossy compression.
The entropy coder is an important component of lossless compression. It sets the length of each character's code according to the probability with which the character occurs: high-probability characters get short codes and low-probability characters get long codes, so that the encoded output is as short as possible and the data is compressed. Common entropy coding algorithms include arithmetic coding, range coding and Huffman coding; WinZip, for example, uses Huffman coding and 7-zip uses range coding. This patent applies equally when other entropy coding algorithms are used.
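The relationship between probability and code length described above can be illustrated with the Shannon information content, -log2(p), which is the number of bits an ideal entropy coder spends on a symbol of probability p. The sketch below is an illustration only; the function name and the symbol probabilities are hypothetical and do not come from the patent.

```python
import math


def ideal_code_length_bits(probability: float) -> float:
    """Bits an ideal entropy coder spends on a symbol with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical symbol probabilities, chosen only for illustration.
probabilities = {"e": 0.5, "t": 0.25, "q": 0.125, "z": 0.125}

# Frequent symbols get short codes, rare symbols get long codes.
lengths = {s: ideal_code_length_bits(p) for s, p in probabilities.items()}

# Expected bits per symbol (the entropy of this distribution).
average_bits = sum(p * lengths[s] for s, p in probabilities.items())
```

With these assumed probabilities the most frequent symbol costs 1 bit and the rarest costs 3 bits, against 2 bits each for a fixed-width code over four symbols.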
From a theoretical standpoint, current lossless compression models and methods fall into the following three types:
1) Compression based on probability statistics, such as Huffman coding and arithmetic coding. Within this type, the PPM (Prediction by Partial Matching) algorithm, which is based on a Markov chain model, achieves a good compression ratio.
2) Compression based on dictionary indexes, such as the LZ77/LZSS and LZ78/LZW algorithms. The LZ family of compression models has the advantage in speed.
3) Compression based on the order and repetition of symbols, such as run-length encoding and BWT (Burrows-Wheeler transform) coding.
Today's popular compression software combines these basic compression theories. Each program usually integrates several compression models and methods to achieve better results. The characteristics of some popular programs are as follows:
1) Software name: WinZip
Compressed format: Deflate;
Basic algorithm: LZSS & Huffman coding;
Maximum data window size: 512KB;
Weaknesses: small data window; low compression ratio; weak support for big data.
2) Software name: WinRAR
Compressed format: RAR;
Basic algorithm: LZSS & Huffman coding;
Maximum data window size: 4MB;
Weaknesses: small data window; low compression ratio; weak support for big data.
3) Software name: Bzip2
Compressed format: BZ2;
Basic algorithm: BWT & Huffman coding;
Maximum data window (data block) size: 900KB;
Weaknesses: small BWT data block; low compression ratio; weak support for big data.
4) Software name: 7-zip
Compressed format: 7z;
Basic algorithm: LZSS & arithmetic coding (range coding is essentially the same as arithmetic coding);
Maximum data window size: 4GB;
Weaknesses: relatively small data window; limited support for big data.
There are other compression programs as well, such as PAQ and WinUDA. They may achieve higher compression ratios, but they are slower and not suited to big data compression.
In summary, existing lossless data compression techniques are either too slow to compress data at the GB level, the TB level or beyond, or have a data window that is too small, which results in a low compression ratio. Simply enlarging the data window does not raise the compression ratio effectively, because a large data window needs longer indexes and longer indexes lower the compression ratio. The gain does not outweigh the loss unless an effective compression coding method and compressed format can be found.
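The cost of a longer index can be made concrete with a small calculation. The sketch below assumes a naive scheme in which every match distance is stored in one fixed-width field sized for the whole window; the function name is hypothetical, and the scheme is a simplified baseline for comparison, not the format of any program named above.

```python
def fixed_index_bits(window_bytes: int) -> int:
    """Bits needed to store any distance 1..window_bytes in one fixed-width field."""
    if window_bytes < 1:
        raise ValueError("window must hold at least one byte")
    # Distances 1..window_bytes are stored as 0..window_bytes-1.
    return max(1, (window_bytes - 1).bit_length())


# Window sizes quoted in the text: 4MB (WinRAR), 4GB (7-zip) and 16EB.
bits_4mb = fixed_index_bits(4 * 2**20)
bits_4gb = fixed_index_bits(4 * 2**30)
bits_16eb = fixed_index_bits(2**64)
```

Under this assumption every match pays the full field width, even when the matched string is only a few bytes back, which is why a larger window alone can make short, nearby matches more expensive.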
Summary of the invention
An object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a lossless compression encoding method supporting an ultra-large data window.
Another object of the present invention is to provide a lossless compression encoding system supporting an ultra-large data window.
The object of the present invention is achieved by the following technical scheme:
A lossless compression encoding method supporting an ultra-large data window, comprising the following steps in order:
S1. A string matching encoder performs string matching on the uncompressed data and generates multi-level length codes and multi-level index codes, which also carry single-character codes and control instruction codes;
S2. A segment cutter cuts the multi-level length codes and multi-level index codes into fixed-length code segments of 8 binary bits (or another fixed number of bits) and supplies them to the classified statistics tables and the entropy coder; at the same time it extracts code classification information from the multi-level length codes and multi-level index codes and supplies it to the classified statistics tables;
S3. According to the code classification information, the classified statistics tables first place the fixed-length code segments into different statistics tables for dynamic statistics, and then supply a probability prediction table to the entropy coder based on the table values;
S4. Guided by the probability prediction table, the entropy coder applies a probability-statistics-based compression coding algorithm to the fixed-length code segments and outputs binary compressed data.
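The segment cutting of step S2 can be sketched as splitting a code of known bit length into 8-bit segments. The text does not specify the segment order or any padding rule, so the sketch assumes most-significant-segment-first order and zero padding on the left; the function name is hypothetical.

```python
def cut_into_segments(code: int, bit_length: int, segment_bits: int = 8) -> list[int]:
    """Split a bit_length-bit code into fixed-width segments, most significant first.

    Assumption: the code is left-padded with zero bits up to a multiple
    of segment_bits; the source text does not state a padding rule.
    """
    if code < 0 or bit_length < 1 or code >> bit_length:
        raise ValueError("code does not fit in bit_length bits")
    count = -(-bit_length // segment_bits)  # ceiling division
    mask = (1 << segment_bits) - 1
    return [(code >> (segment_bits * i)) & mask for i in reversed(range(count))]


# A 24-bit code becomes three 8-bit segments.
segments = cut_into_segments(0x012345, 24)
```

Because every segment has the same width, each one can be modelled with a small statistics table no matter how long the original code was.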
In step S1, the multi-level length coding is graded by the length of the string to be encoded and follows a specific length coding format.
Grading by the length of the string to be encoded specifically means: at least 3 levels of length code are used, and codes of the same level have the same code length; lower-level length codes are shorter than higher-level length codes; shorter strings use lower-level length codes and longer strings use higher-level length codes; the exact division into levels depends on the specific length coding format;
The multi-level length coding format specifically comprises: the L0 length code, representing a single character; the L1 length code, representing string lengths 2 to 11; the L2 length code, representing string lengths 12 to 75; the optional L3 length code, representing string lengths 76 to 325; and the optional L4 length code, which comprises an L4 linear length code and an L4 exponential length code, supports string lengths of up to 16EB, and follows a format that combines the linear length code with the exponential length code.
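The length levels listed above can be sketched as a lookup that maps a string length to its level and to its position inside that level. The table values come from the text; the function name, the returned offset, and the choice of 326 as the first L4 length (inferred from L3 ending at 325) are assumptions for illustration.

```python
# (level name, shortest length, longest length) as listed in the text.
LENGTH_LEVELS = [("L0", 1, 1), ("L1", 2, 11), ("L2", 12, 75), ("L3", 76, 325)]
L4_FIRST = 326       # assumed: L4 starts right after the L3 range
MAX_LENGTH = 2**64   # 16EB upper bound quoted for the L4 length code


def length_level(match_length: int) -> tuple[str, int]:
    """Return (level, offset within the level) for a string length."""
    if not 1 <= match_length <= MAX_LENGTH:
        raise ValueError("length outside the supported range")
    for name, low, high in LENGTH_LEVELS:
        if low <= match_length <= high:
            return name, match_length - low
    return "L4", match_length - L4_FIRST
```

For example, `length_level(12)` gives `("L2", 0)`: the L2 range covers 64 lengths (12 to 75), so only a small offset has to be stored for it.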
In step S1, the multi-level index coding is graded by the distance between the string to be encoded and the matched string, and follows a specific index coding format.
Grading by the distance between the string to be encoded and the matched string specifically means: at least 3 levels of index code are used, and codes of the same level have the same code length; lower-level index codes are shorter than higher-level index codes; the index value equals the distance between the string to be encoded and the matched string; smaller index values use lower-level index codes and larger index values use higher-level index codes; the exact division into levels depends on the specific index coding format;
The multi-level index coding format specifically comprises: the L1 index code, representing index values X1 to X1+Y1-1, where X1=1 and Y1=2^8; the L2 index code, representing index values X2 to X2+Y2-1, where X2=X1+Y1 and Y2=2^16; the L3 index code, representing index values X3 to X3+Y3-1, where X3=X2+Y2 and Y3=2^24; the optional L4 index code, representing index values X4 to X4+Y4-1, where X4=X3+Y3 and Y4=2^32; the optional L5 index code, representing index values X5 to X5+Y5-1, where X5=X4+Y4 and Y5=2^40; the optional L6 index code, representing index values X6 to X6+Y6-1, where X6=X5+Y5 and Y6=2^48; the optional L7 index code, representing index values X7 to X7+Y7-1, where X7=X6+Y6 and Y7=2^56; and the optional L8 index code, representing index values X8 to X8+Y8-1, where X8=X7+Y7 and Y8=2^64. The data window sizes supported by the L1, L2, L3, L4, L5, L6, L7 and L8 index codes are, in order, 256B, 64KB, 16MB, 4GB, 1TB, 256TB, 64PB and 16EB.
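The recurrence X1 = 1, Yk = 2^(8k), X(k+1) = Xk + Yk can be turned directly into code. The sketch below computes the index range of each level and maps a match distance to its level; the function names and the returned offset are assumptions for illustration. Read literally, the recurrence makes the levels contiguous and non-overlapping, so the cumulative ranges are slightly larger than the nominal window sizes quoted in the text.

```python
def index_level_ranges(levels: int = 8) -> list[tuple[int, int, int]]:
    """Return (level, first index, last index) using X1 = 1 and Yk = 2**(8*k)."""
    ranges = []
    first = 1
    for level in range(1, levels + 1):
        span = 1 << (8 * level)            # Yk
        ranges.append((level, first, first + span - 1))
        first += span                      # X(k+1) = Xk + Yk
    return ranges


def index_level(distance: int, levels: int = 8) -> tuple[int, int]:
    """Return (level, offset within the level) for a match distance."""
    for level, first, last in index_level_ranges(levels):
        if first <= distance <= last:
            return level, distance - first
    raise ValueError("distance outside the supported window")
```

Since level k spans exactly 2^(8k) values, the offset inside level k fits in k segments of 8 bits, so a nearby match pays for one byte of index and only distant matches pay for more.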
In step S2, the classified statistics tables classify the codes into N levels, where N >= 1.
In step S3, the statistics tables are designed according to an order-0, order-1 or order-2 PPM probability statistics model.
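A per-class statistics table of the kind used in step S3 can be sketched with the simplest case, an adaptive order-0 frequency table over 8-bit segments. This is a simplified stand-in rather than a full PPM model: it has no contexts and no escape mechanism, the counts start at 1 as an assumption so that no segment has zero probability, and the class name and dictionary keys are hypothetical.

```python
class Order0Table:
    """Adaptive order-0 frequency table over fixed-width code segments."""

    def __init__(self, symbols: int = 256):
        # Assumption: every count starts at 1 so that no segment is impossible.
        self.counts = [1] * symbols
        self.total = symbols

    def probability(self, segment: int) -> float:
        """Current predicted probability of a segment value."""
        return self.counts[segment] / self.total

    def update(self, segment: int) -> None:
        """Record one occurrence of a segment value (dynamic statistics)."""
        self.counts[segment] += 1
        self.total += 1


# One independent table per coding class, here one per index level 1..8.
tables = {("index", level): Order0Table() for level in range(1, 9)}
```

Keeping a separate table per class means that segments from short, frequent codes and segments from long, rare codes do not dilute each other's statistics.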
In step S4, the probability-statistics-based compression coding algorithm is one of arithmetic coding, range coding or Huffman coding.
The other object of the present invention is achieved by the following technical scheme:
A lossless compression encoding system supporting an ultra-large data window, comprising a string matching encoder, a segment cutter, classified statistics tables and an entropy coder connected in sequence, wherein:
The string matching encoder performs string matching on the uncompressed data, generates multi-level length codes and multi-level index codes, which also carry single-character codes and control instruction codes, and supplies them to the segment cutter;
The segment cutter cuts the multi-level length codes and multi-level index codes into fixed-length code segments of 8 binary bits (or another fixed number of bits) and supplies them to the classified statistics tables and the entropy coder; at the same time it extracts code classification information from the multi-level length codes and multi-level index codes and supplies it to the classified statistics tables, a classification covering codes of M levels, where M >= 1;
The classified statistics tables place the fixed-length code segments into different statistics tables for dynamic statistics according to the code classification information; each statistics table may be designed according to an order-0, order-1 or order-2 PPM probability statistics model; at the same time, a probability prediction table is supplied to the entropy coder based on the table values;
The entropy coder, guided by the probability prediction table, applies arithmetic coding, range coding, Huffman coding or another probability-statistics-based compression coding algorithm to the fixed-length code segments and outputs binary compressed data.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
With the method of the present invention, the index coding supports ultra-large data windows reaching the PB and EB levels, far beyond the MB- and GB-level data windows of the common compression programs WinRAR and 7-zip, and the compression ratio does not drop as the data window, and with it the index length, grows. Linear length coding and exponential length coding are both supported, so string lengths of up to 16EB can be encoded. These index and length coding advantages allow the present invention to achieve a higher compression ratio than current common compression software when compressing massive data at the GB level, the TB level or above.
Brief description of the drawings
Fig. 1 is a flow chart of the lossless compression encoding method supporting an ultra-large data window according to the present invention;
Fig. 2 is a structural diagram of the lossless compression encoding system supporting an ultra-large data window according to the present invention;
Fig. 3 is a schematic diagram of the multi-level length coding and multi-level index coding formats output by the string matching encoder of the system shown in Fig. 2.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiment and the accompanying drawings, but embodiments of the present invention are not limited thereto.
A lossless compression encoding method supporting an ultra-large data window, comprising the following steps in order:
S1. A string matching encoder performs string matching on the uncompressed data and generates multi-level length codes and multi-level index codes, which also carry single-character codes and control instruction codes;
The multi-level length coding is graded by the length of the string to be encoded and follows a specific length coding format.
Grading by the length of the string to be encoded specifically means: at least 3 levels of length code are used, and codes of the same level have the same code length; lower-level length codes are shorter than higher-level length codes; shorter strings use lower-level length codes and longer strings use higher-level length codes; the exact division into levels depends on the specific length coding format.
The multi-level length coding format specifically comprises: the L0 length code, representing a single character; the L1 length code, representing string lengths 2 to 11; the L2 length code, representing string lengths 12 to 75; the optional L3 length code, representing string lengths 76 to 325; and the optional L4 length code, which comprises an L4 linear length code and an L4 exponential length code, supports string lengths of up to 16EB, and follows a format that combines the linear length code with the exponential length code;
In step S1, the multi-level index coding is graded by the distance between the string to be encoded and the matched string, and follows a specific index coding format.
Grading by the distance between the string to be encoded and the matched string specifically means: at least 3 levels of index code are used, and codes of the same level have the same code length; lower-level index codes are shorter than higher-level index codes; the index value equals the distance between the string to be encoded and the matched string; smaller index values use lower-level index codes and larger index values use higher-level index codes; the exact division into levels depends on the specific index coding format.
The multi-level index coding format specifically comprises: the L1 index code, representing index values X1 to X1+Y1-1, where X1=1 and Y1=2^8; the L2 index code, representing index values X2 to X2+Y2-1, where X2=X1+Y1 and Y2=2^16; the L3 index code, representing index values X3 to X3+Y3-1, where X3=X2+Y2 and Y3=2^24; the optional L4 index code, representing index values X4 to X4+Y4-1, where X4=X3+Y3 and Y4=2^32; the optional L5 index code, representing index values X5 to X5+Y5-1, where X5=X4+Y4 and Y5=2^40; the optional L6 index code, representing index values X6 to X6+Y6-1, where X6=X5+Y5 and Y6=2^48; the optional L7 index code, representing index values X7 to X7+Y7-1, where X7=X6+Y6 and Y7=2^56; and the optional L8 index code, representing index values X8 to X8+Y8-1, where X8=X7+Y7 and Y8=2^64. The data window sizes supported by the L1, L2, L3, L4, L5, L6, L7 and L8 index codes are, in order, 256B, 64KB, 16MB, 4GB, 1TB, 256TB, 64PB and 16EB;
S2. A segment cutter cuts the multi-level length codes and multi-level index codes into fixed-length code segments of 8 binary bits (or another fixed number of bits) and supplies them to the classified statistics tables and the entropy coder; at the same time it extracts code classification information from the multi-level length codes and multi-level index codes and supplies it to the classified statistics tables;
The classified statistics tables classify the codes into N levels, where N >= 1;
S3. According to the code classification information, the classified statistics tables first place the fixed-length code segments into different statistics tables for dynamic statistics, and then supply a probability prediction table to the entropy coder based on the table values;
The statistics tables are designed according to an order-0, order-1 or order-2 PPM probability statistics model;
S4. Guided by the probability prediction table, the entropy coder applies a probability-statistics-based compression coding algorithm to the fixed-length code segments and outputs binary compressed data;
The probability-statistics-based compression coding algorithm is one of arithmetic coding, range coding or Huffman coding.
The method is described further below:
As shown in Fig. 1, a lossless compression encoding method supporting an ultra-large data window comprises the following steps in order:
S301: Initialize the index segment statistics tables tabI[i], where i runs from 1 to 8, giving 8 groups of index segment statistics tables;
S302: Initialize the length segment statistics tables tabL[i], where i runs from 0 to 4, giving 5 groups of length segment statistics tables;
S303: Check whether no uncompressed data remains; if so, the flow ends, otherwise continue with step S304;
S304: Check whether a special function is to be performed; if so, go to step S305, otherwise go to step S306;
S305: Set variable C to the control class, I to the control instruction code and L to the control operand, then go to step S311;
S306: Read new uncompressed data;
S307: Perform string matching within the data window;
S308: Check whether string matching succeeded; if so, go to step S309, otherwise go to step S310;
S309: Set variable C to the dictionary class, I to the matched index value and L to the matched string length, then go to step S311;
S310: Set variable C to the single-character class and I to the character's ASCII code, then go to step S311;
S311: Merge the three classes of code represented by C into a unified code, producing a hierarchical index code from I and a hierarchical length code from L;
S312: Set variable Idx to the hierarchical index code, S1 to the index level, Len to the hierarchical length code and S2 to the length level;
S313: Cut the index code Idx into fixed-length code segments Idx1, Idx2, Idx3, etc.; the number of segments depends on the number of bits in the index code Idx;
S314: Cut the length code Len into fixed-length code segments Len1, Len2, Len3, etc.; the number of segments depends on the number of bits in the length code Len;
S315: Entropy-code the code segments Len1, Len2, Len3, etc. using the order-0, order-1 or order-2 PPM probability statistics table tabL[S2];
S316: Entropy-code the code segments Idx1, Idx2, Idx3, etc. using the order-0, order-1 or order-2 PPM probability statistics table tabI[S1];
S317: Output the compressed data produced by entropy coding;
S318: Update statistics table tabI[S1] according to code segments Idx1, Idx2, Idx3, etc., update statistics table tabL[S2] according to code segments Len1, Len2, Len3, etc., and jump to step S303.
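Steps S306 to S310 of the flow above, which turn the input into dictionary-class and single-character-class records (C, I, L), can be sketched with a brute-force longest-match search. The text does not describe the match-finding algorithm, so the search below is a naive stand-in; control-class records (S305) are omitted, and the names `Token`, `tokenize`, `detokenize` and the `min_match` threshold are hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    kind: str    # C: "dictionary" or "single" (control class omitted here)
    index: int   # I: match distance, or the character code for a single character
    length: int  # L: matched string length, 1 for a single character


def tokenize(data: bytes, window: int = 1 << 16, min_match: int = 2) -> list[Token]:
    """Naive S306-S310: search the window for the longest earlier match."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        for cand in range(max(0, pos - window), pos):
            length = 0
            while pos + length < len(data) and data[cand + length] == data[pos + length]:
                length += 1
            if length >= best_len:  # on ties prefer the nearer match (smaller index)
                best_len, best_dist = length, pos - cand
        if best_len >= min_match:
            tokens.append(Token("dictionary", best_dist, best_len))  # S309
            pos += best_len
        else:
            tokens.append(Token("single", data[pos], 1))             # S310
            pos += 1
    return tokens


def detokenize(tokens: list[Token]) -> bytes:
    """Inverse of tokenize, used here only to check that no data is lost."""
    out = bytearray()
    for token in tokens:
        if token.kind == "single":
            out.append(token.index)
        else:
            for _ in range(token.length):
                out.append(out[-token.index])  # byte-wise copy allows overlap
    return bytes(out)
```

For the input `b"abcabcabc"` this produces three single-character records followed by one dictionary record with index 3 and length 6. A practical encoder would replace the quadratic search with a hash chain or similar structure, especially for the window sizes discussed in the text.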
As shown in Fig. 2, a lossless compression encoding system supporting an ultra-large data window comprises a string matching encoder, a segment cutter, classified statistics tables and an entropy coder connected in sequence, wherein:
The string matching encoder 101 performs string matching on the uncompressed data, generates multi-level length codes and multi-level index codes, which also carry single-character codes and control instruction codes, and supplies them to the segment cutter 102;
The segment cutter 102 cuts the multi-level length codes and multi-level index codes into fixed-length code segments of 8 binary bits (or another fixed number of bits) and supplies them to the classified statistics tables 103 and the entropy coder 104; at the same time it extracts code classification information from the multi-level length codes and multi-level index codes and supplies it to the classified statistics tables 103, a classification covering codes of M levels, where M >= 1;
The classified statistics tables 103 place the fixed-length code segments into different statistics tables for dynamic statistics according to the code classification information; each statistics table may be designed according to an order-0, order-1 or order-2 PPM probability statistics model; at the same time, a probability prediction table is supplied to the entropy coder 104 based on the table values;
The entropy coder 104, guided by the probability prediction table, applies arithmetic coding, range coding, Huffman coding or another probability-statistics-based compression coding algorithm to the fixed-length code segments and outputs binary compressed data.
As shown in Fig. 3, the multi-level length coding and multi-level index coding formats output by the string matching encoder are as follows:
String matching encoding of each section of uncompressed data yields one hierarchical length code followed by one hierarchical index code; the hierarchical length code format starts from length coding branch point 201, and the hierarchical index code format starts from index coding branch point 212;
Length coding branch point 201: depending on the case, a single character is encoded as L0 length code 202, lengths 2 to 11 as L1 length code 203, and lengths 12 to 75 as L2 length code 204; all other cases proceed to L3 length branch point 205;
L3 length branch point 205: depending on the case, a control word is encoded as control instruction code 210, and lengths 76 to 325 as L3 linear length code 206; lengths of 326 and above proceed to L4 length branch point 207;
L4 length branch point 207: depending on the case, lengths 326 to 65535 are encoded as L4 linear length code 208, and lengths from 2^16 to 2^64 as L4 exponential length code 209;
Control instruction code 210: followed by control operand code 211;
Index coding branch point 212: depending on the case, a match position within the 256-byte data window 1 is encoded as L1 index code 213, a match position within the 64KB data window 2 as L2 index code 214, a match position within the 16MB data window 3 as L3 index code 215, a match position within the 4GB data window 4 as L4 index code 216, a match position within the 1TB data window 5 as L5 index code 217, a match position within the 256TB data window 6 as L6 index code 218, a match position within the 64PB data window 7 as L7 index code 219, and a match position within the 16EB data window 8 as L8 index code 220.
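The length side of Fig. 3, including the split of L4 into a linear and an exponential code, can be sketched as a chain of range checks. The boundaries and reference numerals are taken from the description above; the function name is hypothetical, control words (code 210) are left out, and the bit layout of each code is not reproduced because the text does not give it.

```python
def length_branch(match_length: int) -> str:
    """Name the Fig. 3 code that would carry a given string length."""
    if not 1 <= match_length <= 2**64:
        raise ValueError("length outside the range described in the text")
    if match_length == 1:
        return "L0 length code 202"
    if match_length <= 11:
        return "L1 length code 203"
    if match_length <= 75:
        return "L2 length code 204"
    if match_length <= 325:
        return "L3 linear length code 206"
    if match_length <= 65535:
        return "L4 linear length code 208"
    return "L4 exponential length code 209"   # 2**16 up to 2**64
```

The switch from the linear to the exponential code happens at 2^16, so only very long matches pay for the wider representation.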
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited by it. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the protection scope of the present invention.
Claims (5)
- 1. A lossless compression encoding method supporting an ultra-large data window, characterized by comprising the following steps in order:
  S1. A string matching encoder performs string matching on the uncompressed data and generates multi-level length codes and multi-level index codes, which also carry single-character codes and control instruction codes;
  The multi-level length coding is graded by the length of the string to be encoded and follows a specific length coding format;
  Grading by the length of the string to be encoded specifically means: at least 3 levels of length code are used, and codes of the same level have the same code length; lower-level length codes are shorter than higher-level length codes; shorter strings use lower-level length codes and longer strings use higher-level length codes; the exact division into levels depends on the specific length coding format;
  The multi-level length coding format specifically comprises: the L0 length code, representing a single character; the L1 length code, representing string lengths 2 to 11; the L2 length code, representing string lengths 12 to 75; the L3 length code, representing string lengths 76 to 325; and the L4 length code, which comprises an L4 linear length code and an L4 exponential length code, supports string lengths of up to 16EB, and follows a format that combines the linear length code with the exponential length code;
  The multi-level index coding is graded by the distance between the string to be encoded and the matched string, and follows a specific index coding format;
  Grading by the distance between the string to be encoded and the matched string specifically means: at least 3 levels of index code are used, and codes of the same level have the same code length; lower-level index codes are shorter than higher-level index codes; the index value equals the distance between the string to be encoded and the matched string; smaller index values use lower-level index codes and larger index values use higher-level index codes; the exact division into levels depends on the specific index coding format;
  The multi-level index coding format specifically comprises: the L1 index code, representing index values X1 to X1+Y1-1, where X1=1 and Y1=2^8; the L2 index code, representing index values X2 to X2+Y2-1, where X2=X1+Y1 and Y2=2^16; the L3 index code, representing index values X3 to X3+Y3-1, where X3=X2+Y2 and Y3=2^24; the L4 index code, representing index values X4 to X4+Y4-1, where X4=X3+Y3 and Y4=2^32; the L5 index code, representing index values X5 to X5+Y5-1, where X5=X4+Y4 and Y5=2^40; the L6 index code, representing index values X6 to X6+Y6-1, where X6=X5+Y5 and Y6=2^48; the L7 index code, representing index values X7 to X7+Y7-1, where X7=X6+Y6 and Y7=2^56; and the L8 index code, representing index values X8 to X8+Y8-1, where X8=X7+Y7 and Y8=2^64; the data window sizes supported by the L1, L2, L3, L4, L5, L6, L7 and L8 index codes are, in order, 256B, 64KB, 16MB, 4GB, 1TB, 256TB, 64PB and 16EB;
  S2. A segment cutter cuts the multi-level length codes and multi-level index codes into fixed-length code segments of 8 binary bits (or another fixed number of bits) and supplies them to the classified statistics tables and the entropy coder; at the same time it extracts code classification information from the multi-level length codes and multi-level index codes and supplies it to the classified statistics tables;
  S3. According to the code classification information, the classified statistics tables first place the fixed-length code segments into different statistics tables for dynamic statistics, and then supply a probability prediction table to the entropy coder based on the table values;
  S4. Guided by the probability prediction table, the entropy coder applies a probability-statistics-based compression coding algorithm to the fixed-length code segments and outputs binary compressed data.
- 2. The lossless compression encoding method supporting a super-huge data window according to claim 1, characterized in that: in step S2, each category in the classified statistics table corresponds to codes of N levels, where N ≥ 1.
- 3. The lossless compression encoding method supporting a super-huge data window according to claim 1, characterized in that: in step S3, the statistics tables are designed according to an order-0, order-1 or order-2 PPM probability statistics model.
- 4. The lossless compression encoding method supporting a super-huge data window according to claim 1, characterized in that: in step S4, the compression coding algorithm based on probability statistics is one of arithmetic coding, range coding or Huffman coding.
- 5. A lossless compression encoding system supporting a super-huge data window, for implementing the lossless compression encoding method supporting a super-huge data window according to any one of claims 1 to 4, characterized by comprising a string matching encoder, a segment cutter, a classified statistics table and an entropy coder connected in sequence, wherein:
  the string matching encoder performs string matching on the uncompressed data, generates multi-level length codes, which include single-character codes and control-instruction codes, and multi-level index codes, and provides them to the segment cutter;
  the segment cutter splits the multi-level length codes and multi-level index codes into fixed-length code segments of 8 binary bits or another fixed number of binary bits and provides them to the classified statistics table and the entropy coder; at the same time, it extracts code classification information from the multi-level length codes and multi-level index codes and provides it to the classified statistics table, each category corresponding to codes of M levels, where M ≥ 1;
  the classified statistics table places the fixed-length code segments into different statistics tables according to the code classification information and performs dynamic statistics, each statistics table being designed according to an order-0, order-1 or order-2 PPM probability statistics model; at the same time, it provides a probability prediction table to the entropy coder according to the values in the statistics tables;
  the entropy coder, according to the probability prediction table, uses arithmetic coding, range coding, Huffman coding or another compression coding algorithm based on probability statistics to compress the fixed-length code segments and outputs binary compressed data.
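The level boundaries recited in claim 1 can be illustrated with a short sketch. It only maps a string length or a match distance to its level, using the ranges stated in the claim. The function names `length_level` and `index_level`, and the choice to return the offset `distance - X_k` as the value a level would store, are assumptions made for illustration; the actual bit layouts, including the L4 linear/exponential length format, are not described here and are not reproduced.

```python
# Illustrative sketch only: names and the "offset" convention are hypothetical.

# Length levels as listed in claim 1: (level, min_length, max_length).
LENGTH_LEVELS = [
    ("L0", 1, 1),     # single character
    ("L1", 2, 11),
    ("L2", 12, 75),
    ("L3", 76, 325),
]


def length_level(n: int) -> str:
    """Return the length-code level for a string of n characters."""
    if n < 1:
        raise ValueError("string length must be at least 1")
    for name, lo, hi in LENGTH_LEVELS:
        if lo <= n <= hi:
            return name
    return "L4"  # longer strings fall to L4 (format not modelled here)


def build_index_levels():
    """Build (level, X_k, Y_k) with X_1 = 1, Y_k = 2**(8k), X_(k+1) = X_k + Y_k."""
    levels = []
    x = 1
    for k in range(1, 9):
        y = 2 ** (8 * k)
        levels.append((f"L{k}", x, y))
        x += y
    return levels


INDEX_LEVELS = build_index_levels()


def index_level(distance: int):
    """Return (level, offset) for a match distance.

    offset = distance - X_k is an assumed payload that fits in 8k bits.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    for name, x, y in INDEX_LEVELS:
        if x <= distance <= x + y - 1:
            return name, distance - x
    raise ValueError("distance exceeds the L8 range")
```

For example, `index_level(300)` returns `("L2", 43)`, because the L2 range starts at X2 = 257. Keeping the ranges contiguous (each level starts where the previous one ends) means no distance value is representable in two levels, so short distances always get the shortest code.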
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410317732.3A CN104156990B (en) | 2014-07-03 | 2014-07-03 | A kind of lossless compression-encoding method and system for supporting super-huge data window |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410317732.3A CN104156990B (en) | 2014-07-03 | 2014-07-03 | A kind of lossless compression-encoding method and system for supporting super-huge data window |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104156990A CN104156990A (en) | 2014-11-19 |
CN104156990B CN104156990B (en) | 2018-02-27 |
Family ID: 51882478
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410317732.3A Active CN104156990B (en) | 2014-07-03 | 2014-07-03 | A kind of lossless compression-encoding method and system for supporting super-huge data window |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104156990B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6336524B2 (en) * | 2016-07-25 | 2018-06-06 | 株式会社高速屋 | Data compression encoding method, apparatus thereof, and program thereof |
CN110377288A (en) * | 2018-04-13 | 2019-10-25 | 赛灵思公司 | Neural network compresses compiler and its compiling compression method |
CN110868222B (en) * | 2019-11-29 | 2023-12-15 | 中国人民解放军战略支援部队信息工程大学 | LZSS compressed data error code detection method and device |
CN111177432B (en) * | 2019-12-23 | 2020-11-03 | 北京航空航天大学 | Large-scale image retrieval method based on hierarchical depth hash |
CN112380196B (en) * | 2020-10-28 | 2023-03-21 | 安擎(天津)计算机有限公司 | Server for data compression transmission |
CN117238504B (en) * | 2023-11-01 | 2024-04-09 | 江苏亿通高科技股份有限公司 | Smart city CIM data optimization processing method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0878914A2 (en) * | 1997-05-12 | 1998-11-18 | Lexmark International, Inc. | Data compression method and apparatus |
CN1251449A (en) * | 1998-10-18 | 2000-04-26 | 华强 | Combined use with reference of two category dictionary compress algorithm in data compaction |
CN1447603A (en) * | 2003-01-10 | 2003-10-08 | 李春林 | Data compress method based on higher order entropy of message source |
CN101090501A (en) * | 2006-06-13 | 2007-12-19 | 财团法人工业技术研究院 | Mould search type variable-length code-decode method and device |
CN103067022A (en) * | 2012-12-19 | 2013-04-24 | 中国石油天然气集团公司 | Nondestructive compressing method, uncompressing method, compressing device and uncompressing device for integer data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7507897B2 (en) * | 2005-12-30 | 2009-03-24 | Vtech Telecommunications Limited | Dictionary-based compression of melody data and compressor/decompressor for the same |
Non-Patent Citations (1)
Title |
---|
Design of new format for mass data compression; QIN Jiancheng et al.; The Journal of China Universities of Posts and Telecommunications; 2011-02-28; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN104156990A (en) | 2014-11-19 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104156990B (en) | A kind of lossless compression-encoding method and system for supporting super-huge data window | |
CN103236847B (en) | Lossless data compression method based on multilayer hash data structure and run-length coding | |
CN106407285B (en) | A kind of optimization bit file compression & decompression method based on RLE and LZW | |
CN102122960B (en) | Multi-character combination lossless data compression method for binary data | |
CN103258030B (en) | Mobile device memory compression method based on dictionary and run-length encoding | |
EP2455853A2 (en) | Data compression method | |
CN110518917B (en) | LZW data compression method and system based on Huffman coding | |
Bhattacharjee et al. | Comparison study of lossless data compression algorithms for text data | |
CN103067022A (en) | Nondestructive compressing method, uncompressing method, compressing device and uncompressing device for integer data | |
CN104348490A (en) | Combined data compression algorithm based on effect optimization | |
CN103995887A (en) | Bitmap index compressing method and bitmap index decompressing method | |
CN103995988A (en) | High-throughput DNA sequencing mass fraction lossless compression system and method | |
CN100349160C (en) | Data compression method by finite exhaustive optimization | |
CN107565970B (en) | Hybrid lossless compression method and device based on feature recognition | |
CN110021369A (en) | Gene sequencing data compression decompressing method, system and computer-readable medium | |
CN104125475A (en) | Multi-dimensional quantum data compressing and uncompressing method and apparatus | |
CN116016606B (en) | Sewage treatment operation and maintenance data efficient management system based on intelligent cloud | |
CN103428498A (en) | Lossless image compression system | |
RU2013144665A (en) | CODER, DATA CODING METHOD, DECODER, DATA DECODING METHOD, DATA TRANSFER SYSTEM, DATA TRANSFER METHOD AND SOFTWARE | |
CN104467868A (en) | Chinese text compression method | |
CN104410424A (en) | Quick lossless compression method of memory data of embedded device | |
CN102238376B (en) | Image processing system and method | |
CN110021368A (en) | Comparison type gene sequencing data compression method, system and computer-readable medium | |
CN116737716A (en) | Time sequence data compression method and device | |
CN116680269A (en) | Time sequence data coding and compressing method, system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
OL01 | Intention to license declared | ||
OL01 | Intention to license declared |