Detailed Description of Embodiments
To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below in conjunction with specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort shall fall within the protection scope of the present application.
Traditional compression schemes for data of types such as long, double or float can compress directly with variable-length encoding. When the values of such a data sequence are unevenly distributed, however, this approach can backfire: when the values are large, the number of bytes after compression exceeds 8 B. Moreover, conventional compression schemes compress randomly shuffled sequences poorly. In view of this, a data compression method is needed that is more effective for data of types such as long, double or float and is not affected by the value distribution and/or randomness of the data.
For example, in one specific application scenario, the graph engine Sharkgraph provides very-large-capacity graph storage and graph computation and supports temporal graphs. The vertex and edge attributes in Sharkgraph can be stored by column, and these attributes contain a large amount of data of types such as long/double. The storage space occupied by such a graph database is usually very large. An effective data compression scheme is therefore needed to cope with computation on very large graphs and to make better use of storage space.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
Fig. 1 is a schematic flowchart of a data compression method provided by an embodiment of this specification. From a program perspective, the process may be executed by a program installed on an application server or by an application client.
Referring to Fig. 1, the process may include the following steps:
S110: Generate a first array from an original sequence, where the elements of the first array are mutually different and include all the values in the original sequence.
The original sequence may be a sequence of any data type, preferably a sequence of 32-bit or 64-bit multi-bit data such as long, double or float. That is, the data compression method provided by the embodiments of the present disclosure is applicable to data sequences of any data type, and its compression effect is especially pronounced for data sequences of types such as long, double or float.
According to an embodiment, the first array may be generated from the original sequence by deduplicating the sequence, so that the first array contains all the mutually different elements of the original sequence. Any existing deduplication method may be used to generate the first array. Depending on the method, the elements of the first array may keep the order they have in the original sequence or may be in arbitrary order. For example, if the original sequence is [100, 400, 200, 400, 500, 100, 500], the elements of the first array may follow the order of the original sequence, that is, [100, 400, 200, 500]; or they may be in arbitrary order, for example [400, 200, 100, 500].
For example, the first array may be obtained with a DISTINCT statement. The resulting first array contains the mutually different elements of the original sequence, and the order of the elements in the first array may match the order in which the corresponding elements first appear in the original sequence. The data type of the elements in the first array may be the same as that of the elements in the original sequence.
S120: Generate a second array from the original sequence, where each element of the second array is the position index, in the first array, of an element of the original sequence.
The elements of the second array are index values, specifically the position indices in the first array of the elements of the original sequence. Because the number of elements in the original sequence is limited, the value range of the elements of the second array, which is a collection of position indices, is limited, and their values are usually relatively small, which makes them well suited to effective compression by variable-length encoding. The data type of the elements in the second array may differ from that of the elements in the original sequence; for example, when the data type in the original sequence is long, the elements of the second array may be int32 data.
According to an embodiment, depending on the array generation method, the step of generating the first array from the original sequence and the step of generating the second array from the original sequence may be carried out one after the other or at the same time; the present application does not limit this.
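Steps S110 and S120 can be illustrated with a minimal Python sketch. It assumes the variant in which both arrays are built in a single pass and the first array keeps first-appearance order; the function names split_sequence and restore_sequence are hypothetical and do not come from the specification.

```python
def split_sequence(original):
    """Return (first_array, second_array) for a sequence of hashable values."""
    position = {}       # value -> its index in first_array
    first_array = []    # distinct values, in order of first appearance
    second_array = []   # for each original element, its index in first_array
    for value in original:
        if value not in position:
            position[value] = len(first_array)
            first_array.append(value)
        second_array.append(position[value])
    return first_array, second_array


def restore_sequence(first_array, second_array):
    """Rebuild the original sequence from the two arrays."""
    return [first_array[index] for index in second_array]


original = [100, 400, 200, 400, 500, 100, 500]
first_array, second_array = split_sequence(original)
print(first_array)    # [100, 400, 200, 500]
print(second_array)   # [0, 1, 2, 1, 3, 0, 3]
```

The second array holds only small non-negative integers, which is the property the variable-length encoding in step S140 relies on.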
S130: Encode the first array using an encoding method based on numerical prediction to obtain a first coding result.
Specifically, the numerical prediction method may be, for example, a last value predictor (Last Value Predictor), a stride predictor (Stride Predictor), the finite context method (Finite Context Method, FCM) or the differential finite context method (Differential Finite Context Method, DFCM), but is not limited to those listed here.
Specifically, an encoding method based on numerical prediction predicts the element to be encoded, obtains the difference between the predicted value and the element to be encoded, and encodes and stores that difference, thereby encoding the element.
According to an embodiment, after the first coding result is obtained, it may be stored in the form of a byte array.
S140: Encode the second array using a variable-length encoding method to obtain a second coding result.
According to an embodiment, variable-length encoding, that is, variable-length byte encoding, represents data of the same type with byte strings of different lengths, which helps reduce the total number of bytes of the data and thus the storage space occupied. Variable-length encoding methods include methods based on a high-bit delimiter, such as UTF-8 and Base 128 Varints, and methods that use surrogates for the distinction.
According to an embodiment, after the second coding result is obtained, it may be stored in the form of a byte array. According to an embodiment, when the elements of the second array are int32 data, each element occupies 4 bytes before encoding and may occupy 1 to 4 bytes after encoding.
S150: Store the first coding result and the second coding result according to a preset mapping relationship.
Specifically, the first coding result and the second coding result together constitute the compressed data of the original sequence. The mapping relationship is that each element in the second coding result represents the position index, in the first coding result, of an element of the original sequence.
According to an embodiment, the coding result that comprises the first coding result and the second coding result may include a compression header, and the preset mapping relationship may be stored in the compression header. Optionally, information about the storage structure of the first coding result and the second coding result may also be stored in the compression header.
After the first array and the second array have been encoded, the corresponding first coding result and second coding result may be stored as bytes, that is, as byte arrays. The storage medium is not specifically limited here.
Fig. 2 is a schematic diagram of the principle of a data compression method provided by an embodiment of this specification.
As can be seen from Fig. 1 and Fig. 2, the above embodiment of the present disclosure provides a data compression method that creatively compresses a sequence by converting the original sequence into two arrays and applying to each array a compression method chosen for its characteristics. The compression effect of this method is independent of the value distribution: sequences over any value range can be compressed, and even a sequence whose values are unevenly distributed in magnitude compresses well, with a large compression ratio. When the values in the sequence are densely distributed, or when many values are identical, the compression effect is especially pronounced and the compression ratio is even larger. The data compression method provided by the embodiments of the present disclosure reduces the total number of bytes of the files to be transmitted or stored, so that data is transmitted faster, files occupy less disk space, and storage space is used more efficiently.
The specific implementation steps of the embodiments of the present disclosure are described in detail below with reference to the drawings.
According to an embodiment, generating the second array from the original sequence (step S120) may specifically include: reading an element of the original sequence; searching the first array for the element whose value equals that of the element read; and obtaining the subscript of that element in the first array and storing it in the second array. The method of generating the second array is not limited to the one described here.
According to an embodiment, encoding the first array using the encoding method based on numerical prediction to obtain the first coding result (step S130) may specifically include: predicting an element of the first array with a predictor to obtain a predicted value; performing an XOR operation on the predicted value and the true value of the predicted element to obtain an XOR result; and encoding the XOR result to obtain the first coding result.
According to an embodiment, if the predicted value of an element is close to the value of the element, the high-order bits of the two are identical, so their XOR result has several leading zeros, and these leading zeros can be compressed. Obviously, the closer the predicted value is to the true value of the element, the more leading zeros the XOR result has and the more the XOR result can be compressed.
Fig. 3 is a schematic diagram of the encoding method based on numerical prediction provided by an embodiment of this specification.
According to an optional embodiment, in step S130, encoding the XOR result to obtain the first coding result may specifically include: encoding the leading zeros of the XOR result to obtain a first coding section; determining the part of the XOR result other than the leading zeros as a second coding section; and obtaining the first coding result from the first coding section and the second coding section.
Optionally, the first coding result may contain the first coding section and the second coding section of each element in turn. Specifically, from start position to end position, the first coding result may contain in order: the first coding section of the first element, the second coding section of the first element, the first coding section of the second element, the second coding section of the second element, ..., the first coding section of the n-th element, the second coding section of the n-th element.
Optionally, the first coding result may contain the first and second coding sections in groups of two elements. Specifically, from start position to end position, the first coding result may contain in order: the first coding section of the first element, the first coding section of the second element, the second coding section of the first element, the second coding section of the second element, ..., the first coding section of the (n-1)-th element, the first coding section of the n-th element, the second coding section of the (n-1)-th element, the second coding section of the n-th element. Here the first coding section of the first element and the first coding section of the second element can be stored in a single byte. It should be noted that the way the first and second coding sections of each element are stored in the first coding result is not limited to the two methods above; they may be stored in any reasonable order.
Specifically, referring to Fig. 3, the method of encoding and compressing data D_n may include: the predictor generates a predicted value Pred_n; the value to be compressed D_n is XORed with the predicted value Pred_n to obtain the XOR result Diff_n, where Diff_n contains several leading zeros if Pred_n is close to D_n; the leading zeros of Diff_n are encoded as a leading-zero count; and the coding result LZC_n Bits_n is obtained. Here the first coding section LZC_n represents the leading-zero count, and the second coding section Bits_n represents the remaining bits. According to an embodiment, a preset number of bits (for example, 4 bits) may be used to represent the first coding section LZC_n.
Correspondingly, during decompression, the method of restoring the compressed data LZC_n Bits_n to D_n may include: reading the first coding section LZC_n of the preset number of bits (for example, 4 bits) and the second coding section Bits_n of the corresponding number of remaining bits (for example, 64 minus the number of bits occupied by the leading zeros read from LZC_n) to obtain Diff_n; and XORing Diff_n with the same predicted value Pred_n that was used during compression to obtain the decompressed data D_n.
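The encoding and decoding of a single element described above can be sketched in Python as follows. This is only an illustration under stated assumptions: values are treated as unsigned 64-bit integers, the coding result is shown as a string of '0'/'1' characters rather than packed bytes, and, because a 4-bit field can hold at most 15, leading-zero counts above 15 are capped at 15. The capping rule and the function names are assumptions; the specification does not state how larger counts are handled.

```python
MASK64 = (1 << 64) - 1
LZC_BITS = 4                     # preset width of the count field, as in the text
MAX_LZC = (1 << LZC_BITS) - 1    # assumption: counts above 15 are stored as 15


def encode_value(value, predicted):
    """Return the bit string LZC_n + Bits_n for one unsigned 64-bit value."""
    diff = (value ^ predicted) & MASK64
    leading_zeros = 64 - diff.bit_length()
    lzc = min(leading_zeros, MAX_LZC)
    remaining = 64 - lzc         # number of bits kept in Bits_n
    return format(lzc, "0{}b".format(LZC_BITS)) + format(diff, "0{}b".format(remaining))


def decode_value(bits, predicted):
    """Return (value, number of bits consumed) for one encoded element."""
    lzc = int(bits[:LZC_BITS], 2)
    remaining = 64 - lzc
    diff = int(bits[LZC_BITS:LZC_BITS + remaining], 2)
    return diff ^ predicted, LZC_BITS + remaining


value = 0x0001234567_89ABCD
predicted = 0x0001234500_000000
encoded = encode_value(value, predicted)
decoded, consumed = decode_value(encoded, predicted)
```

In this example the XOR result has 33 leading zeros, of which the capped count removes 15, so the element is stored in 4 + 49 = 53 bits instead of 64. A field that counts leading zero bytes rather than bits would remove more, at the cost of coarser granularity.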
When the XOR result contains not only leading zeros but also trailing zeros, the leading zeros and the trailing zeros can both be encoded and compressed.
According to another optional embodiment, in step S130, encoding the XOR result to obtain the first coding result may also specifically include: encoding the leading zeros of the XOR result to obtain a first coding section; encoding the trailing zeros of the XOR result to obtain a second coding section; determining the part of the XOR result other than the leading zeros and the trailing zeros as a third coding section; and obtaining the first coding result from the first coding section, the second coding section and the third coding section, where the first coding result contains, in order, the first coding section, the third coding section and the second coding section.
Optionally, the first coding result may contain the first, second and third coding sections of each element in turn. Specifically, from start position to end position, the first coding result may contain in order: the first coding section of the first element, the second coding section of the first element, the third coding section of the first element, the first coding section of the second element, the second coding section of the second element, the third coding section of the second element, ..., the first coding section of the n-th element, the second coding section of the n-th element, the third coding section of the n-th element.
Optionally, the first coding result may contain the first, second and third coding sections in groups of two elements. Specifically, from start position to end position, the first coding result may contain in order: the first coding section of the first element, the first coding section of the second element, the second coding section of the first element, the second coding section of the second element, the third coding section of the first element, the third coding section of the second element, ..., the first coding section of the (n-1)-th element, the first coding section of the n-th element, the second coding section of the (n-1)-th element, the second coding section of the n-th element, the third coding section of the (n-1)-th element, the third coding section of the n-th element. Here the first coding section of the first element and the first coding section of the second element can be stored in one byte, and the third coding section of the first element and the third coding section of the second element can be stored in one byte. It should be noted that the way the first, second and third coding sections of each element are stored in the first coding result is not limited to the two methods above; they may be stored in any reasonable order.
Specifically, the method of encoding and compressing data D_n may include: the predictor generates a predicted value Pred_n; the value to be compressed D_n is XORed with the predicted value Pred_n to obtain the XOR result Diff_n, where Diff_n contains n1 leading zeros and n2 trailing zeros; the n1 leading zeros are encoded as a leading-zero count and the n2 trailing zeros are encoded as a trailing-zero count; and the coding result LZC_n Bits_n TZC_n is obtained. In the coding result, the first coding section LZC_n represents the leading-zero count, the second coding section TZC_n represents the trailing-zero count, and the third coding section Bits_n represents the remaining bits. According to an embodiment, a first preset number of bits (for example, 4 bits) may be used to represent the first coding section LZC_n, and a second preset number of bits (for example, 4 bits) may be used to represent the second coding section TZC_n.
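A minimal Python sketch of the three-section variant follows. It only computes the three sections and rebuilds the XOR result from them; how the sections are ordered and packed into bytes is left open, as in the text. The sketch assumes unsigned 64-bit values and assumes that both counts are capped at 15 so that each fits in a 4-bit field. The names split_diff and join_diff are hypothetical.

```python
MAX_COUNT = 15   # assumption: each 4-bit count field stores at most 15


def split_diff(diff):
    """Return (lzc, tzc, middle, width) for a 64-bit XOR result.

    lzc and tzc are the capped leading-zero and trailing-zero counts,
    middle holds the remaining bits, and width is how many bits middle occupies.
    """
    leading = 64 - diff.bit_length()
    # (diff & -diff) isolates the lowest set bit; an all-zero diff is treated as 64 zeros
    trailing = (diff & -diff).bit_length() - 1 if diff else 64
    lzc = min(leading, MAX_COUNT)
    tzc = min(trailing, MAX_COUNT)
    return lzc, tzc, diff >> tzc, 64 - lzc - tzc


def join_diff(lzc, tzc, middle):
    """Rebuild the XOR result; lzc is implied by the fixed 64-bit width."""
    return middle << tzc


diff = 0x000000FF00000000
lzc, tzc, middle, width = split_diff(diff)
```

With both counts capped at 15, this example stores 4 + 4 + 34 = 42 bits for the element instead of 64.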
According to an embodiment, in practical applications, leading zeros and/or trailing zeros can be compressed as actually needed, so that the data is compressed as much as possible and the largest compression ratio is obtained.
Fig. 4 is a schematic diagram of one numerical prediction method in the encoding method based on numerical prediction provided by an embodiment of this specification.
According to an optional embodiment, in step S130, predicting an element of the first array to obtain a predicted value may specifically include: looking up the corresponding history value sequence based on the sequence formed by several elements preceding the element to be predicted; and obtaining the corresponding predicted value based on that history value sequence.
Specifically, the sequence formed by several (for example, the 3) elements preceding the element to be predicted can be used to look up a matching history value sequence in a history value table, and the history value sequence found is then used as an index to look up the corresponding predicted value in a predicted value table. If no history value sequence matching the current sequence is found, the element to be predicted is assigned an initial predicted value.
More specifically, referring to Fig. 4, the encoding method based on numerical prediction may be a context-based numerical prediction method, where the context may be the history values formed by several adjacent elements. The history value table contains processed data sequences; for example, a data sequence may consist of consecutive, already processed elements of the array to be compressed. The number of data items in a data sequence can be set as needed: the larger this number, the more accurate the prediction but the longer it takes. For example, the number may be set to 3. The predicted value table contains the values that occurred after the data sequences in the history value table. To make a prediction, the history value sequence corresponding to the elements adjacent to the current element is first looked up in the history value table; that history value sequence is then used as an index, through a hash function, to look up the predicted value in the predicted value table.
According to an embodiment, once the value of the predicted element is known, the values in the predictor can be updated. Specifically, the true value of the element can be written into the predicted value table, and the history value table can be updated from the true value of the element and the old history values.
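The context-based predictor of Fig. 4 can be sketched in Python as follows. This is a simplified illustration, not the specification's implementation: a Python dict keyed by the tuple of recent values stands in for both the history value table and the hash-indexed predicted value table, the context length is 3, and the initial predicted value is assumed to be 0. The class name FcmPredictor is hypothetical.

```python
class FcmPredictor:
    """Finite context method: predict the value that last followed the same context."""

    def __init__(self, order=3, initial=0):
        self.order = order      # number of preceding elements that form the context
        self.initial = initial  # prediction used when the context has not been seen
        self.history = ()       # the most recent `order` true values
        self.table = {}         # context -> value that followed it last time

    def predict(self):
        return self.table.get(self.history, self.initial)

    def update(self, true_value):
        # Record what actually followed the current context, then slide the window.
        self.table[self.history] = true_value
        self.history = (self.history + (true_value,))[-self.order:]


predictor = FcmPredictor()
predictions = []
for value in [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]:
    predictions.append(predictor.predict())
    predictor.update(value)
print(predictions)   # [0, 0, 0, 0, 0, 0, 0, 4, 1, 2, 3, 4]
```

Once a context such as (1, 2, 3) has been seen, the predictor returns the value that followed it, so repeated patterns yield XOR results of zero. A decoder that runs the same predict and update calls on the values it has already decoded reproduces the same predictions.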
Fig. 5 is a schematic diagram of another numerical prediction method in the encoding method based on numerical prediction provided by an embodiment of this specification.
According to another optional embodiment, predicting an element of the first array to obtain a predicted value may specifically include: looking up the corresponding history difference sequence based on the difference sequence of several elements preceding the element to be predicted; obtaining the corresponding predicted difference based on that history difference sequence; and obtaining the predicted value of the element to be predicted from its preceding element and the predicted difference.
Specifically, from several elements preceding the element to be predicted, the difference sequence formed by the differences between each two adjacent elements among them can be obtained; a history difference sequence matching this difference sequence is looked up in a history difference table; and the history difference sequence found is used as an index to look up the corresponding predicted difference in a predicted difference table. If no history difference sequence matching the current difference sequence is found, the difference to be predicted is assigned an initial predicted difference.
More specifically, referring to Fig. 5, the history value table can store the value D_{n-1} preceding the value to be predicted D_n, together with a history difference sequence, which is the sequence formed by the differences between adjacent elements of processed data. The history difference sequence may be, for example, the sequence formed by the three differences between four consecutive data items. The predicted difference table may contain the differences that occurred after the history difference sequences in the history difference table. To make a prediction, for example, the difference sequence formed by the differences between the four elements preceding the current value D_n is first used to look up a matching history difference sequence in the history value table, for example (delta1, delta2, delta3); that history difference sequence is then used as an index, through a hash function, to find the predicted difference in the predicted difference table, for example dpre; finally, the preceding value D_{n-1} is added to the predicted difference dpre to obtain the predicted value Pred_n = D_{n-1} + dpre.
According to an embodiment, once the true value of the predicted element is known, the values in the predictor can be updated. Specifically, the true value of the predicted element can be written into the history value table; the difference between the true value of the predicted element and the value of its preceding element can be computed as a new difference; the new difference is then stored in the predicted difference table, and the history difference table is updated from the new difference and the old history differences.
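The difference-based predictor of Fig. 5 can be sketched in the same way. Again this is a simplified illustration under assumptions: a dict keyed by the tuple of recent differences stands in for the history difference table and the hash-indexed predicted difference table, three differences form the context, the initial predicted difference is 0, and the value before the first element is taken to be 0. Python integers are unbounded, so the wraparound a fixed-width 64-bit implementation would need is not shown. The class name DfcmPredictor is hypothetical.

```python
class DfcmPredictor:
    """Differential finite context method: Pred_n = D_{n-1} + dpre."""

    def __init__(self, order=3, initial_delta=0):
        self.order = order                  # number of differences in the context
        self.initial_delta = initial_delta  # dpre used for an unseen context
        self.last_value = 0                 # D_{n-1}; assumed 0 before the first element
        self.deltas = ()                    # the most recent `order` differences
        self.table = {}                     # difference context -> difference that followed

    def predict(self):
        dpre = self.table.get(self.deltas, self.initial_delta)
        return self.last_value + dpre

    def update(self, true_value):
        new_delta = true_value - self.last_value
        self.table[self.deltas] = new_delta
        self.deltas = (self.deltas + (new_delta,))[-self.order:]
        self.last_value = true_value


predictor = DfcmPredictor()
predictions = []
for value in [10, 20, 30, 40, 50, 60, 70]:
    predictions.append(predictor.predict())
    predictor.update(value)
print(predictions)   # [0, 10, 20, 30, 50, 60, 70]
```

For a sequence with a constant stride, the predictions become exact as soon as the difference context (10, 10, 10) has been seen once, whereas a value-based context would never repeat on such data.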
Fig. 4 and Fig. 5 each describe the process of numerical prediction with a different predictor. In the present application, either of the predictors described in Fig. 4 or Fig. 5 may be used to obtain the predicted value, as actually needed. According to an embodiment of the present disclosure, more than one predictor may be used at the same time in order to improve the accuracy and speed of prediction, the compression ratio, and so on.
Fig. 6 is a schematic diagram of the encoding method based on numerical prediction using two predictors provided by an embodiment of this specification.
Specifically, referring to Fig. 6, encoding the first array using the encoding method based on numerical prediction to obtain the first coding result in step S130 may specifically include: predicting an element of the first array with a first predictor to obtain a first predicted value; predicting the element of the first array with a second predictor to obtain a second predicted value; performing an XOR operation on the first predicted value and the true value of the predicted element to obtain a first XOR result; performing an XOR operation on the second predicted value and the true value of the predicted element to obtain a second XOR result; comparing the number of leading zeros of the first XOR result with that of the second XOR result and taking the one with more leading zeros as the final XOR result; and encoding the final XOR result to obtain the first coding result.
According to an embodiment, comparing the first XOR result with the second XOR result includes comparing the number of leading zeros and/or the number of trailing zeros in the XOR results. For example, the numbers of leading zeros may be compared and the XOR result with more leading zeros taken as the preferred final XOR result. As another example, the numbers of trailing zeros may be compared and the XOR result with more trailing zeros taken as the preferred final XOR result. As yet another example, the total numbers of leading and trailing zeros may be compared and the XOR result with the larger total taken as the preferred final XOR result. The way of selecting the preferred XOR result is not limited to these.
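The selection between two XOR results by leading-zero count can be sketched in Python as follows. The sketch assumes unsigned 64-bit values. It also returns the index of the chosen predictor; recording such a selector so that the decoder can apply the same predictor is an assumption of this sketch and is not stated in the specification. The function names are hypothetical.

```python
def leading_zeros64(x):
    """Number of leading zero bits of x viewed as an unsigned 64-bit integer."""
    return 64 - x.bit_length()


def choose_xor_result(true_value, predicted_values):
    """Return (selector, xor_result) for the prediction with the most leading zeros.

    On a tie, the earlier predictor is kept.
    """
    best_selector, best_result = 0, true_value ^ predicted_values[0]
    for selector in range(1, len(predicted_values)):
        result = true_value ^ predicted_values[selector]
        if leading_zeros64(result) > leading_zeros64(best_result):
            best_selector, best_result = selector, result
    return best_selector, best_result


# The second prediction is closer, so its XOR result has more leading zeros.
selector, final_xor = choose_xor_result(0x1234, [0x1200, 0x1230])
print(selector, hex(final_xor))   # 1 0x4
```

Comparing trailing zeros, or the sum of leading and trailing zeros, only requires replacing leading_zeros64 with the corresponding count.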
Optionally, the first predictor and the second predictor may be the same or different. For example, each may be chosen independently from the predictors shown in Fig. 4 or Fig. 5, or a predictor other than those shown in Fig. 4 and Fig. 5 may be chosen.
For step S140, the detailed process of encoding the second array is explained below, taking the Base 128 Varints variable-length encoding method as an example.
The Base 128 Varints encoding method represents an integer with one or more bytes. In a Base 128 Varints coded string, the most significant bit (msb) of each byte serves as a flag bit: if the bit is 1, the next byte and the current byte together represent the current value; if the bit is 0, the current byte is the last byte of the current value. The remaining seven bits of the byte store the data itself. Because only 7 bits per byte are used for data, one byte can represent only values less than 2^7, that is, 0 to 127. Values less than 2^7 can be represented in 1 byte; values less than 2^14 in 2 bytes; values less than 2^21 in 3 bytes; and values less than 2^28 in 4 bytes.
An int32 array [1, 300, 1024] is used as an example below.
The decimal number 1 is 00000000 00000000 00000000 00000001 in binary (4 bytes). After variable-length encoding it occupies only one byte: 00000001. That is, the number 1 can be represented in 1 byte.
The decimal number 300 is 00000000 00000000 00000001 00101100 in binary (4 bytes). The encoding process is as follows: (1) starting from the low-order end of the binary value, split it into groups of 7 bits, giving 0000010 0101100; (2) reverse the order of the groups, giving 0101100 0000010; (3) add the flag bits, giving the coded string 10101100 00000010. That is, the number 300 can be represented in 2 bytes. Correspondingly, the coded string 10101100 00000010 is decoded as follows: (1) the highest bit of the 1st byte is 1, so reading continues; the highest bit of the 2nd byte is 0, which shows that the current value is represented by 2 bytes; (2) take the low 7 bits of the two bytes, 0101100 0000010; (3) reverse the order of the groups to get 0000010 0101100, that is, decimal 300.
The decimal number 1024 is 00000000 00000000 00000100 00000000 in binary (4 bytes). The encoding process is as follows: (1) starting from the low-order end of the binary value, split it into groups of 7 bits, giving 0001000 0000000; (2) reverse the order of the groups, giving 0000000 0001000; (3) add the flag bits, giving the coded string 10000000 00001000. That is, the number 1024 can be represented in 2 bytes.
The original int32 array [1, 300, 1024] occupies 12 bytes before encoding and 5 bytes after variable-length encoding. Variable-length encoding significantly reduces the memory space occupied by the array.
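The worked example can be reproduced with a short Python sketch of Base 128 Varints for non-negative integers. The function names encode_varint and decode_varints are hypothetical, and negative values are rejected because position indices are never negative.

```python
def encode_varint(value):
    """Encode one non-negative integer as Base 128 Varint bytes."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # lowest 7-bit group goes first
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # flag bit 1: more bytes follow
        else:
            out.append(low7)         # flag bit 0: last byte of this value
            return bytes(out)


def decode_varints(data):
    """Decode a byte string holding consecutive Base 128 Varints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


second_array = [1, 300, 1024]
encoded = b"".join(encode_varint(v) for v in second_array)
print(encoded.hex())             # 01ac028008
print(len(encoded))              # 5
print(decode_varints(encoded))   # [1, 300, 1024]
```

The five bytes 01, AC 02 and 80 08 are the coded strings 00000001, 10101100 00000010 and 10000000 00001000 derived above.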
As can be seen, variable-length encoding compresses smaller values more effectively. In the embodiments of the present disclosure, the second array stores position index values. When the number of elements in the array to be compressed is limited, the value range of the position indices is limited and the values are generally small, so the advantages of variable-length encoding can be fully exploited to reduce the number of bytes effectively and obtain a significant compression effect.
As described above, the present disclosure provides a data compression method that first converts the original sequence into two arrays and then, according to the data characteristics of each array, selects a different, targeted compression method for each, so as to compress the original sequence as much as possible and obtain the largest compression ratio. This process uses a compression method based on numerical prediction, which is applicable to data arranged in arbitrary order; for arrays whose data is arranged more chaotically, however, the time complexity and computational complexity of the numerical prediction process are higher. In view of this, the present disclosure provides the following optional data compression method, in which the first array to be encoded is first sorted and then encoded with the method based on numerical prediction. Using the compression method based on numerical prediction together with sorting not only reduces the time complexity and computational complexity of the numerical prediction process but can also further improve the compression ratio.
Fig. 7 is a schematic diagram of the method for generating the first array from the original sequence
provided by an embodiment of this specification.
In another embodiment of the present disclosure, referring to Fig. 7, the step of generating the first
array from the original sequence may specifically include: S111, extracting the elements with distinct
values from the original sequence to generate a first preliminary array; S112, sorting the elements of
the first preliminary array according to a predetermined ordering rule to generate the first array.
Specifically, according to the embodiment, the predetermined ordering rule may be any ordering rule
set as needed, such as descending order of value, ascending order of value, descending order of the
trailing value, or ascending order of the trailing value. For example, a sort function may be used to
sort the first array in descending or ascending order of value. Such a sorted result makes the
high-order bits of adjacent values more similar, so that more leading zeros are produced when the XOR
operation is performed. The manner of sorting is not limited to this; any applicable method in the
prior art may be used to sort the first array.
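Steps S111 and S112, together with the position-index array described in the disclosure, can be sketched as follows. This is a minimal illustration that assumes ascending order as the predetermined ordering rule; the name `build_arrays` is hypothetical.

```python
def build_arrays(original):
    """Return (first_array, second_array) for an original sequence.

    first_array: the distinct values of the sequence, sorted ascending
                 (S111 extracts the distinct values, S112 sorts them).
    second_array: for each element of the original sequence, its position
                  index in first_array.
    """
    first_array = sorted(set(original))
    position = {value: index for index, value in enumerate(first_array)}
    second_array = [position[value] for value in original]
    return first_array, second_array


first, second = build_arrays([24, 22, 11, 20, 11])
print(first)   # [11, 20, 22, 24]
print(second)  # [3, 2, 0, 1, 0]
```

The dictionary lookup keeps the construction of the second array linear in the length of the original sequence after the sort.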
According to the embodiment, sorting the first array increases the correlation between adjacent values
in the first array and exposes more of the similarity between neighbouring values. When the XOR
operation is performed, the XOR results contain more leading zeros and/or trailing zeros, so the first
array is compressed to a greater degree and the first compression result occupies fewer bytes, which
reduces the storage space occupied by the compressed data and/or the time needed to transmit it.
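The effect of sorting on leading zeros can be illustrated with a small sketch. As a simplifying assumption, each value is XORed with its immediate predecessor rather than with the output of the predictors used in the disclosure, and values are treated as 64-bit; the helper names are illustrative.

```python
def leading_zero_bits(x: int, width: int = 64) -> int:
    """Number of leading zero bits of x when viewed as a width-bit unsigned value."""
    return width - x.bit_length()


def xor_leading_zeros(values):
    """Leading-zero counts of the XOR of each value with its predecessor."""
    return [leading_zero_bits(a ^ b) for a, b in zip(values, values[1:])]


sample = [24, 22, 11, 20, 6, 25, 33]
unsorted_total = sum(xor_leading_zeros(sample))
sorted_total = sum(xor_leading_zeros(sorted(sample)))
print(unsorted_total, sorted_total)
```

For this sample, the sorted order gives the larger total number of leading zero bits, since neighbouring values such as 24 and 25 differ only in their lowest bit.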
According to the embodiments of the present disclosure, the data compression method provided above not
only compresses data sequences of any value distribution effectively, but also achieves a good
compression effect on sequences in random order; that is, a good compression ratio can also be
obtained for unordered or out-of-order data sequences.
To describe the invention more clearly, a specific data compression example according to the
embodiment is shown below:
Original sequence:
[24,22,11,20,11,6,25,33,7,41,26,46,34,47,49,26,26,10,2,39,35,8,43,3,
28,29,22,1,5,38,36,42,44,29,24,30,0,15,34,6,14,31,38,34,44,15,17,5,19,41,23,
30,37,25,44,27,33,23,11,27,32,3,43,29,18,45,13,4,5,20,11,0,18,3,32,32,31,17,
0,30,7,26,47,30,20,46,35,10,19,18,43,11,29,5,6,39,33,31,14,23]
First array (after sorting):
[0,1,2,3,4,5,6,7,8,10,11,13,14,15,17,18,19,20,22,23,24,25,26,27,28,
29,30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,49]
First coding result:
[118B,1B,-1B,-1B,-1B,-10B,2B,110B,1B,1B,-17B,1B,-18B,1B,1B,-1B,102B,
2B,1B,-1B,-1B,-1B,-1B,-1B,-1B,-1B,-1B,-18B,1B,1B,-1B,-1B,-2B,1B,0B]
Second array:
[20,18,10,17,10,6,21,29,7,36,22,41,30,42,43,22,22,9,2,35,31,8,38,3,
24,25,18,1,5,34,32,37,39,25,20,26,0,13,30,6,12,27,34,30,39,13,14,5,16,36,19,
26,33,21,39,23,29,19,10,23,28,3,38,25,15,40,11,4,5,17,10,0,15,3,28,28,27,14,
0,26,7,22,42,26,17,41,31,9,16,15,38,10,25,5,6,35,29,27,12,19]
Second coding result:
[8B,20B,8B,18B,8B,10B,8B,17B,8B,10B,8B,6B,8B,21B,8B,29B,8B,7B,8B,36B,
8B,22B,8B,41B,8B,30B,8B,42B,8B,43B,8B,22B,8B,22B,8B,9B,8B,2B,8B,35B,8B,31B,
8B,8B,8B,38B,8B,3B,8B,24B,8B,25B,8B,18B,8B,1B,8B,5B,8B,34B,8B,32B,8B,37B,8B,
39B,8B,25B,8B,20B,8B,26B,8B,0B,8B,13B,8B,30B,8B,6B,8B,12B,8B,27B,8B,34B,8B,
30B,8B,39B,8B,13B,8B,14B,8B,5B,8B,16B,8B,36B,8B,19B,8B,26B,8B,33B,8B,21B,8B,
39B,8B,23B,8B,29B,8B,19B,8B,10B,8B,23B,8B,28B,8B,3B,8B,38B,8B,25B,8B,15B,8B,
40B,8B,11B,8B,4B,8B,5B,8B,17B,8B,10B,8B,0B,8B,15B,8B,3B,8B,28B,8B,28B,8B,27B,
8B,14B,8B,0B,8B,26B,8B,7B,8B,22B,8B,42B,8B,26B,8B,17B,8B,41B,8B,31B,8B,9B,8B,
16B,8B,15B,8B,38B,8B,10B,8B,25B,8B,5B,8B,6B,8B,35B,8B,29B,8B,27B,8B,12B,8B,
19B]
The following takes the sorted first array (for example, a first array of Long type) as an example to
illustrate how the first array is encoded to obtain the first coding result.
It should first be noted that, for the first array currently being encoded, it can be preset that only
the XOR result of the high 14 bits needs to be considered, that is, there are at most 14 leading
zeros, so 4 bits can be used to store the leading-zero count. Since 4 bits hold the encoded
leading-zero count of one number, one byte can hold the leading-zero codes of two numbers; the numbers
are therefore encoded in groups of two. The encoding of the data (0,1) in the first array is described
below.
(1) The Long-type array is converted into byte arrays [[0x00,0x00,0x00,0x00], [0x00,0x00,0x00,0x01],
……].
(2) Numerical prediction is performed using two predictors. The first predictor may use the prediction
method shown in Fig. 4, with its initial predicted value (predn) set to 0; the second predictor may
use the prediction method shown in Fig. 5, with its initial predicted value (predn) set to 0.
The first number [0x00,0x00,0x00,0x00] is XORed with the initial predicted value of the first
predictor p1 (for example, 0) to obtain the XOR result diff 1d [0x00,0x00,0x00,0x00]; meanwhile, the
first number [0x00,0x00,0x00,0x00] is XORed with the initial predicted value of the second predictor
p2 (for example, 0) to obtain the XOR result diff 2d [0x00,0x00,0x00,0x00]. When the two XOR results
are identical, the XOR result of the first predictor p1 may be selected as the final XOR result.
The parameters of the first predictor p1 and the second predictor p2 are then updated according to the
final XOR result and the true value of the value currently being predicted (here, 0). Specifically,
the history value table and the predicted value table in each predictor are updated. For example, the
index of the predicted value table of the first predictor p1 may be updated as:
((lastIndex << 6) ^ (this updated value >> 48)) & (table.length - 1), where "lastIndex" is the
previous index of the predicted value table, "this updated value" is the current input value, and
"table.length" is the length of the predicted value table. For example, the index of the predicted
value table of the second predictor p2 may be updated as:
((dfcmHash << 2) ^ ((trueValue - lastValue) >> 40)) & (table.length - 1), where "dfcmHash" is the
current index, "trueValue" is the true value of the value currently being predicted, "lastValue" is
the true value of the previous value, and "table.length" is the length of the predicted value table.
(3) The second number [0x00,0x00,0x00,0x01] is XORed with the predicted value of the updated first
predictor p1 to obtain the XOR result diff 1e [0x00,0x00,0x00,0x01]; meanwhile, the second number
[0x00,0x00,0x00,0x01] is XORed with the initial predicted value of the second predictor p2 to obtain
the XOR result diff 2e [0x00,0x00,0x00,0x01]. When the two XOR results are identical, the XOR result
of the first predictor p1 may be selected as the final XOR result.
The parameters of the first predictor p1 and the second predictor p2 are then updated according to the
final XOR result and the true value of the value currently being predicted (here, 1).
(4) For the XOR result diff 1d [0x00,0x00,0x00,0x00] of the first number 0, the number of leading
zeros is 8. In data processing, when the leading-zero count is greater than 4, it is reduced by 1, so
zeroByteSize(0) = 7. For the XOR result diff 1e [0x00,0x00,0x00,0x01] of the second number 1, the
number of leading zeros is 7, so zeroByteSize(1) = 6. The leading-zero counts of (0,1) are encoded and
stored in one byte: the count for diff 1d is shifted left by 4 bits into the high 4 bits, and the
count for diff 1e is placed in the low 4 bits, that is, code |= zeroByteSize(0) << 4;
code |= zeroByteSize(1), which gives 118B.
(5) Then, the non-leading-zero parts of the first number and the second number are stored.
Specifically, for the number 0, there is no non-leading-zero part to store; for the number 1,
0000 0001 is stored, that is, 1B.
Therefore, the result of encoding the data (0,1) is stored as the byte array [118B, 1B]. Repeating the
above steps yields the first coding result.
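Steps (1) to (5) for the pair (0,1) can be traced with the following sketch. It rests on several assumptions that are not confirmed by the text: the predictors are modelled as simple table predictors whose tables are updated as shown (the methods of Fig. 4 and Fig. 5 are not reproduced here), values are treated as unsigned 64-bit integers, the smaller XOR result is chosen when the two differ, and no record is kept of which predictor was chosen, which a real decoder would need. The class and function names are illustrative; only the two index-update expressions and the "reduce by 1 when greater than 4" rule are taken from the description.

```python
MASK64 = (1 << 64) - 1  # values are treated as unsigned 64-bit integers


class FirstPredictor:
    """Table predictor p1 (assumed behaviour): predicts the value stored at the current index."""

    def __init__(self, table_size: int = 1024):
        self.table = [0] * table_size  # predicted value table, initial prediction 0
        self.index = 0

    def predict(self) -> int:
        return self.table[self.index]

    def update(self, true_value: int) -> None:
        self.table[self.index] = true_value
        # Index update as given in the text.
        self.index = ((self.index << 6) ^ (true_value >> 48)) & (len(self.table) - 1)


class SecondPredictor:
    """Table predictor p2 (assumed behaviour): predicts last value plus a stored difference."""

    def __init__(self, table_size: int = 1024):
        self.table = [0] * table_size
        self.index = 0
        self.last_value = 0

    def predict(self) -> int:
        return (self.table[self.index] + self.last_value) & MASK64

    def update(self, true_value: int) -> None:
        delta = (true_value - self.last_value) & MASK64
        self.table[self.index] = delta
        # Index update as given in the text.
        self.index = ((self.index << 2) ^ (delta >> 40)) & (len(self.table) - 1)
        self.last_value = true_value


def zero_byte_size(diff: int) -> int:
    """Leading zero bytes of a 64-bit XOR result, reduced by 1 when greater than 4."""
    count = 8 - (diff.bit_length() + 7) // 8
    return count - 1 if count > 4 else count


def residual_bytes(diff: int) -> bytes:
    """The non-leading-zero part of the XOR result, most significant byte first."""
    return diff.to_bytes((diff.bit_length() + 7) // 8, "big")


def encode_pair(pair, p1, p2) -> bytes:
    diffs = []
    for value in pair:
        diff1 = value ^ p1.predict()
        diff2 = value ^ p2.predict()
        # Ties go to p1, as in the text; otherwise take the smaller result (assumption).
        diffs.append(diff1 if diff1 <= diff2 else diff2)
        p1.update(value)
        p2.update(value)
    code = 0
    code |= zero_byte_size(diffs[0]) << 4  # high 4 bits: first number
    code |= zero_byte_size(diffs[1])       # low 4 bits: second number
    return bytes([code]) + residual_bytes(diffs[0]) + residual_bytes(diffs[1])


print(list(encode_pair((0, 1), FirstPredictor(), SecondPredictor())))  # [118, 1]
```

The sketch covers only the case walked through above; how the residual bytes are sized when the adjusted count differs from the actual count is not addressed.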
The data compression method according to the embodiments of the present disclosure has been described
above. Based on the same idea, the embodiments of this specification further provide a data
decompression method corresponding to the above data compression method.
Fig. 8 is a schematic flowchart of a data decompression method provided by an embodiment of this
specification.
According to the embodiment, referring to Fig. 8, a data decompression method includes the following
steps:
S210: obtaining compressed data, the compressed data including first compressed data and second
compressed data having a preset mapping relationship;
S220: based on the data characteristics of the first compressed data, decompressing the first
compressed data using the coding method based on numerical prediction to obtain a first array;
S230: based on the data characteristics of the first compressed data, decompressing the second
compressed data according to a preset variable-length encoding rule to obtain a second array;
S240: based on the first array and the second array, obtaining decompressed data according to the
preset mapping relationship.
According to the embodiment, obtaining the decompressed data based on the first array and the second
array according to the preset mapping relationship specifically includes: using the value of each
element in the second array as a position index to read the value of the corresponding element in the
first array; and storing the element values read from the first array in turn to obtain the
decompressed array.
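The final lookup of step S240 can be sketched as follows, together with a decoder for the second array. The decoder assumes the common variable-length scheme with 7 payload bits per byte and a continuation bit, which may differ from the preset rule of the disclosure; decoding of the first compressed data is not shown, and the function names are illustrative.

```python
def decode_varints(data: bytes):
    """Decode a byte string of concatenated variable-length integers (assumed scheme)."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:          # continuation bit set: more groups follow
            shift += 7
        else:                    # last group of this integer
            values.append(current)
            current, shift = 0, 0
    if shift:
        raise ValueError("input ends in the middle of an integer")
    return values


def restore(first_array, second_array):
    """Use each element of second_array as a position index into first_array."""
    return [first_array[index] for index in second_array]


first = [11, 20, 22, 24]                              # decompressed first array
second = decode_varints(bytes([3, 2, 0, 1, 0]))       # decompressed second array
print(restore(first, second))  # [24, 22, 11, 20, 11]
```

Because the second array preserves the order of the original sequence, the lookup restores both the values and their original positions.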
The data compression method and the data decompression method provided by the present disclosure are
described above. Based on the same idea, the embodiments of this specification further provide an
apparatus corresponding to the above data compression method and an apparatus corresponding to the
above data decompression method.
Fig. 9 is a schematic structural diagram of a data compression apparatus provided by an embodiment of
this specification. As shown in Fig. 9, the data compression apparatus may include:
Generation module 310, configured to generate a first array and a second array from an original
sequence, where the elements of the first array are distinct from one another and include all the
values in the original sequence, and each element of the second array is the position index, in the
first array, of an element of the original sequence;
First coding module 320, configured to encode the first array using the coding method based on
numerical prediction to obtain a first coding result;
Second coding module 330, configured to encode the second array using a variable-length encoding
method to obtain a second coding result;
Memory module 340, configured to store the first coding result and the second coding result according
to a preset mapping relationship.
Fig. 10 is a schematic structural diagram of a data decompression apparatus provided by an embodiment
of this specification. As shown in Fig. 10, the data decompression apparatus may include:
Obtaining module 410, configured to obtain compressed data, the compressed data including first
compressed data and second compressed data having a preset mapping relationship;
First data decompression module 420, configured to decompress the first compressed data, based on the
data characteristics of the first compressed data, using the coding method based on numerical
prediction to obtain a first array;
Second data decompression module 430, configured to decompress the second compressed data, based on
the data characteristics of the first compressed data, according to a preset variable-length encoding
rule to obtain a second array;
Data generation module 440, configured to obtain decompressed data based on the first array and the
second array according to the preset mapping relationship.
Based on the same idea, the embodiments of this specification further provide a device corresponding
to the above data compression and decompression methods.
Fig. 11 is a schematic structural diagram of a device corresponding to the data compression method and
the data decompression method provided by an embodiment of this specification. As shown in Fig. 11, a
data processing device 500 may include:
At least one processor 510; and
A memory 530 communicatively connected to the at least one processor; wherein
The memory 530 stores instructions 520 executable by the at least one processor 510, and the
instructions are executed by the at least one processor 510 to enable the at least one processor 510
to:
Generate a first array from an original sequence, where the elements of the first array are distinct
from one another and include all the values in the original sequence;
Generate a second array from the original sequence, where each element of the second array is the
position index, in the first array, of an element of the original sequence;
Encode the first array using the coding method based on numerical prediction to obtain a first coding
result;
Encode the second array using a variable-length encoding method to obtain a second coding result;
Store the first coding result and the second coding result according to a preset mapping relationship;
Alternatively,
To enable the at least one processor 510 to:
Obtain compressed data, the compressed data including first compressed data and second compressed data
having a preset mapping relationship;
Decompress the first compressed data, based on the data characteristics of the first compressed data,
using the coding method based on numerical prediction to obtain a first array;
Decompress the second compressed data, based on the data characteristics of the first compressed data,
according to a preset variable-length encoding rule to obtain a second array;
Obtain decompressed data based on the first array and the second array according to the preset mapping
relationship.
Specific embodiments of this specification have been described above. In some cases, the actions or
steps recited in the claims may be performed in an order different from that in the embodiments and
still achieve the desired results. In addition, the processes depicted in the drawings do not
necessarily require the particular order shown, or a sequential order, to achieve the desired results.
In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described in a progressive manner. For the same or similar
parts of the embodiments, reference may be made to one another, and each embodiment focuses on its
differences from the other embodiments. In particular, the apparatus and device embodiments are
described relatively briefly because they are substantially similar to the method embodiments; for
the relevant parts, refer to the description of the method embodiments.
The apparatus, the device, and the method provided by the embodiments of this specification correspond
to one another. The apparatus and the device therefore have advantageous technical effects similar to
those of the corresponding method. Since the advantageous technical effects of the method have been
described in detail above, those of the corresponding apparatus and device are not repeated here.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in
hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch)
or an improvement in software (an improvement to a method flow). However, with the development of
technology, improvements to many of today's method flows can be regarded as direct improvements to
hardware circuit structures. Designers almost always obtain the corresponding hardware circuit
structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be
said that an improvement to a method flow cannot be implemented with a hardware entity module. For
example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable
gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is
determined by the user programming the device. Designers program a digital system onto a single PLD
by themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated
circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is
nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler
used in program development, and the source code to be compiled must also be written in a specific
programming language, called a hardware description language (Hardware Description Language, HDL).
There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera
Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal,
JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware
Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit
Hardware Description Language) and Verilog. Those skilled in the art should also understand that a
hardware circuit implementing a logical method flow can easily be obtained merely by slightly
logically programming the method flow in one of the above hardware description languages and
programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of
a microprocessor or processor together with a computer-readable medium storing computer-readable
program code (such as software or firmware) executable by the (micro)processor, logic gates, switches,
an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a
programmable logic controller, or an embedded microcontroller. Examples of controllers include, but
are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20,
and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic
of a memory. Those skilled in the art also know that, in addition to implementing a controller purely
as computer-readable program code, it is entirely possible to logically program the method steps so
that the controller implements the same functions in the form of logic gates, switches,
application-specific integrated circuits, programmable logic controllers, embedded microcontrollers,
and the like. Such a controller may therefore be regarded as a hardware component, and the means
included in it for implementing various functions may also be regarded as structures within the
hardware component. Or even, the means for implementing various functions may be regarded as both
software modules implementing the method and structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a
computer chip or an entity, or by a product having a certain function. A typical implementation
device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop
computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media
player, a navigation device, an email device, a game console, a tablet computer, a wearable device,
or a combination of any of these devices.
For convenience of description, the above apparatus is described by dividing its functions into
various units. Of course, when implementing the present application, the functions of the units may
be implemented in the same one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be
provided as a method, a system, or a computer program product. Therefore, the present invention may
take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment
combining software and hardware aspects. Moreover, the present invention may take the form of a
computer program product implemented on one or more computer-usable storage media (including but not
limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable
program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method,
the device (system), and the computer program product according to the embodiments of the present
invention. It should be understood that each flow and/or block in the flowcharts and/or block
diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be
implemented by computer program instructions. These computer program instructions may be provided to
a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or
another programmable data processing device to produce a machine, so that the instructions executed
by the processor of the computer or other programmable data processing device produce means for
implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks
of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of
directing a computer or other programmable data processing device to operate in a particular manner,
so that the instructions stored in the computer-readable memory produce an article of manufacture
including instruction means, the instruction means implementing the functions specified in one or
more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data
processing device, so that a series of operational steps are performed on the computer or other
programmable device to produce computer-implemented processing, whereby the instructions executed on
the computer or other programmable device provide steps for implementing the functions specified in
one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output
interface, a network interface, and memory.
The memory may include volatile memory in computer-readable media, random access memory (RAM), and/or
non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example
of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which
can store information by any method or technology. The information may be computer-readable
instructions, data structures, program modules, or other data. Examples of computer storage media
include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM),
dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory
(ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory
technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical
storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any
other non-transmission medium that can be used to store information accessible by a computing device.
As defined herein, computer-readable media do not include transitory computer-readable media
(transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are
intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that
includes a series of elements includes not only those elements but also other elements not explicitly
listed, or elements inherent to such a process, method, commodity, or device. Without further
limitation, an element defined by the phrase "including a ..." does not exclude the existence of
other identical elements in the process, method, commodity, or device that includes the element.
The present application may be described in the general context of computer-executable instructions
executed by a computer, such as program modules. Generally, program modules include routines,
programs, objects, components, data structures, and the like that perform particular tasks or
implement particular abstract data types. The present application may also be practiced in
distributed computing environments, in which tasks are performed by remote processing devices
connected through a communication network. In a distributed computing environment, program modules
may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For the same or similar
parts of the embodiments, reference may be made to one another, and each embodiment focuses on its
differences from the other embodiments. In particular, the system embodiments are described
relatively briefly because they are substantially similar to the method embodiments; for the relevant
parts, refer to the description of the method embodiments.
The above description is merely an embodiment of the present application and is not intended to limit
the present application. For those skilled in the art, various modifications and changes may be made
to the present application. Any modification, equivalent replacement, improvement, or the like made
within the spirit and principles of the present application shall fall within the scope of the claims
of the present application.