CN110266316A - A kind of data compression, decompressing method, device and equipment - Google Patents

A kind of data compression, decompressing method, device and equipment

Info

Publication number
CN110266316A
Authority
CN
China
Prior art keywords
array
coding
result
data
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910380862.4A
Other languages
Chinese (zh)
Other versions
CN110266316B (en)
Inventor
唐德荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910380862.4A
Publication of CN110266316A
Application granted
Publication of CN110266316B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3071 Prediction
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4093 Variable length to variable length coding

Abstract

The embodiments of this specification disclose a data compression method, a data decompression method, an apparatus, and a device. The data compression method includes: generating a first array from an original sequence, where the elements of the first array are mutually distinct and include all values in the original sequence; generating a second array from the original sequence, where each element of the second array is the position index, in the first array, of the corresponding element of the original sequence; encoding the first array with a coding method based on value prediction to obtain a first coding result; encoding the second array with a variable-length coding method to obtain a second coding result; and storing the first coding result and the second coding result according to a preset mapping relationship.

Description

A kind of data compression, decompressing method, device and equipment
Technical field
This application relates to the field of computer technology, and in particular to a data compression method, a data decompression method, an apparatus, and a device.
Background
In scientific computing environments, large amounts of data often need to be stored on computers or transmitted between them. The basic idea of data compression is to replace repeated data in the data to be compressed with symbols or codes that occupy less space, so that the compressed data takes up less disk storage space or less transmission time.
For data of types such as long, double, or float, the traditional approach is to compress directly with variable-length coding. This works well when the values in the sequence are generally small, but it compresses poorly when the values in the sequence are unevenly distributed. Given these shortcomings of the prior art, a more effective compression method is needed for data types that occupy many bits, such as long, double, or float, in order to improve the compression ratio for numeric sequences whose values are unevenly distributed.
Summary of the invention
In view of this, the embodiments of this application provide a data compression method, a data decompression method, an apparatus, and a device, which raise the level of compression for data types that occupy many bits, such as long, double, or float, and improve the compression ratio for numeric sequences with arbitrary or uneven value distributions.
To solve the above technical problem, the embodiments of this specification are implemented as follows:
A data compression method provided by the embodiments of this specification includes: generating a first array from an original sequence, where the elements of the first array are mutually distinct and include all values in the original sequence; generating a second array from the original sequence, where each element of the second array is the position index, in the first array, of the corresponding element of the original sequence; encoding the first array with a coding method based on value prediction to obtain a first coding result; encoding the second array with a variable-length coding method to obtain a second coding result; and storing the first coding result and the second coding result according to a preset mapping relationship.
A data decompression method provided by the embodiments of this specification includes: obtaining compressed data, the compressed data including first compressed data and second compressed data that have a preset mapping relationship; decompressing the first compressed data, based on the data characteristics of the first compressed data, with the coding method based on value prediction to obtain a first array; decompressing the second compressed data, based on the data characteristics of the first compressed data, according to a preset variable-length coding rule to obtain a second array; and obtaining decompressed data from the first array and the second array according to the preset mapping relationship.
A data compression apparatus provided by the embodiments of this specification includes: a generation module, configured to generate a first array and a second array from an original sequence, where the elements of the first array are mutually distinct and include all values in the original sequence, and each element of the second array is the position index, in the first array, of the corresponding element of the original sequence; a first coding module, configured to encode the first array with a coding method based on value prediction to obtain a first coding result; a second coding module, configured to encode the second array with a variable-length coding method to obtain a second coding result; and a storage module, configured to store the first coding result and the second coding result according to a preset mapping relationship.
A data decompression apparatus provided by the embodiments of this specification includes: an obtaining module, configured to obtain compressed data, the compressed data including first compressed data and second compressed data that have a preset mapping relationship; a first data decompression module, configured to decompress the first compressed data, based on the data characteristics of the first compressed data, with the coding method based on value prediction to obtain a first array; a second data decompression module, configured to decompress the second compressed data, based on the data characteristics of the first compressed data, according to a preset variable-length coding rule to obtain a second array; and a data generation module, configured to obtain decompressed data from the first array and the second array according to the preset mapping relationship.
A data compression device provided by the embodiments of this specification includes:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can:
generate a first array from an original sequence, where the elements of the first array are mutually distinct and include all values in the original sequence;
generate a second array from the original sequence, where each element of the second array is the position index, in the first array, of the corresponding element of the original sequence;
encode the first array with a coding method based on value prediction to obtain a first coding result;
encode the second array with a variable-length coding method to obtain a second coding result;
store the first coding result and the second coding result according to a preset mapping relationship;
or,
so that the at least one processor can:
obtain compressed data, the compressed data including first compressed data and second compressed data that have a preset mapping relationship;
decompress the first compressed data, based on the data characteristics of the first compressed data, with the method based on value prediction to obtain a first array;
decompress the second compressed data, based on the data characteristics of the first compressed data, according to a preset variable-length coding rule to obtain a second array;
obtain decompressed data from the first array and the second array according to the preset mapping relationship.
At least one of the above technical solutions adopted by the embodiments of this specification can achieve the following beneficial effects: the original sequence is converted into two arrays, and each array is encoded with a different method chosen according to its characteristics, which yields the final coding result. This data compression method raises the level of compression for data types that occupy many bits, such as long, double, or float, and improves the compression ratio for numeric sequences whose values are unevenly distributed.
Brief description of the drawings
The drawings described here are provided for a further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flow diagram of a data compression method provided by the embodiments of this specification;
Fig. 2 is a schematic diagram of the principle of a data compression method provided by the embodiments of this specification;
Fig. 3 is a schematic diagram of the coding method based on value prediction provided by the embodiments of this specification;
Fig. 4 is a schematic diagram of one value prediction method used in the coding method based on value prediction provided by the embodiments of this specification;
Fig. 5 is a schematic diagram of another value prediction method used in the coding method based on value prediction provided by the embodiments of this specification;
Fig. 6 is a schematic diagram of the coding method based on value prediction that uses two predictors, provided by the embodiments of this specification;
Fig. 7 is a schematic diagram of the method for generating the first array from the original sequence provided by the embodiments of this specification;
Fig. 8 is a flow diagram of a data decompression method provided by the embodiments of this specification;
Fig. 9 is a structural diagram of a data compression apparatus provided by the embodiments of this specification;
Fig. 10 is a structural diagram of a data decompression apparatus provided by the embodiments of this specification;
Fig. 11 is a structural diagram of a device corresponding to the data compression method and the data decompression method provided by the embodiments of this specification.
Detailed description
To make the purposes, technical solutions, and advantages of this application clearer, the technical solutions of this application are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, fall within the scope of protection of this application.
Traditional compression schemes for data of types such as long, double, or float can compress directly with variable-length coding. When the values of such a data sequence are unevenly distributed, this approach can actually make things worse (when the values are large, the compressed size can exceed 8 bytes per value). Conventional compression schemes also compress poorly when the data in the sequence is randomly shuffled. A data compression method is therefore needed that is more effective for data of types such as long, double, or float and is not affected by the value distribution and/or randomness of the data.
For example, in one specific application scenario, the graph engine Sharkgraph provides very large-capacity graph storage and graph computation and supports temporal graphs. The vertex and edge attributes in Sharkgraph can be stored by column, and these attributes contain a large amount of data of types such as long and double. The storage space occupied by such a graph database is usually very large, so an effective data compression scheme is needed to cope with computation on very large graphs and to make better use of storage space.
The technical solutions provided by the embodiments of this application are described in detail below with reference to the drawings.
Fig. 1 is a flow diagram of a data compression method provided by the embodiments of this specification. From a program perspective, the process may be executed by a program installed on an application server or by an application client.
Referring to Fig. 1, the method may include the following steps:
S110: generate a first array from the original sequence, where the elements of the first array are mutually distinct and include all values in the original sequence.
The original sequence may be a sequence of any data type. Preferably, it is a sequence of data of a multi-bit type such as long, double, or float, that is, 32 or 64 bits wide. In other words, the data compression method provided by the embodiments of the disclosure applies to data sequences of any data type, and its compression effect is especially pronounced for sequences of types such as long, double, or float.
According to an embodiment, the first array may be generated from the original sequence by deduplicating the sequence, so that the first array contains all mutually distinct elements of the original sequence. Any existing deduplication method may be used. Depending on the method, the elements of the first array may keep the order they have in the original sequence or may appear in arbitrary order. For example, if the original sequence is [100, 400, 200, 400, 500, 100, 500], the first array may keep the original order, that is, [100, 400, 200, 500], or its elements may be in arbitrary order, for example [400, 200, 100, 500].
For example, the first array may be obtained with a distinct statement. The resulting first array contains the mutually distinct elements of the original sequence, and the order of the elements in the first array may match the order in which each element first appears in the original sequence. The data type of the elements in the first array may be the same as the data type of the elements in the original sequence.
S120: generate a second array from the original sequence, where each element of the second array is the position index, in the first array, of the corresponding element of the original sequence.
The elements of the second array are index values, specifically the position indexes in the first array of the elements of the original sequence. Because the number of elements in the original sequence is finite, the value range of the elements in the second array, which is a collection of position indexes, is bounded, and the values are usually relatively small. This makes the second array well suited to effective compression with a variable-length coding method. The data type of the elements in the second array may differ from that of the elements in the original sequence. For example, when the data type in the original sequence is long, the elements of the second array may be 32-bit int data.
According to an embodiment, depending on the array generation method, the step of generating the first array from the original sequence and the step of generating the second array from the original sequence may be performed one after the other or at the same time. This application does not limit this.
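Steps S110 and S120 can be illustrated with a minimal Python sketch that builds both arrays in a single pass, which corresponds to the case where the two steps are performed at the same time. The function names `split_sequence` and `restore_sequence` are hypothetical, and the sketch assumes the values are hashable; it is not the patent's own implementation.

```python
def split_sequence(original):
    """Return (first_array, second_array) for a sequence of hashable values."""
    first_array = []    # distinct values, in order of first appearance
    position = {}       # value -> its index in first_array (illustrative helper)
    second_array = []   # for each original element, its index in first_array
    for value in original:
        if value not in position:
            position[value] = len(first_array)
            first_array.append(value)
        second_array.append(position[value])
    return first_array, second_array


def restore_sequence(first_array, second_array):
    """Invert split_sequence by looking each index up in first_array."""
    return [first_array[i] for i in second_array]


original = [100, 400, 200, 400, 500, 100, 500]
first, second = split_sequence(original)
print(first)    # [100, 400, 200, 500]
print(second)   # [0, 1, 2, 1, 3, 0, 3]
```

The second array holds only small integers even though the original values are wide, which is the property the later variable-length coding step relies on.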
S130: encode the first array with a coding method based on value prediction to obtain a first coding result.
Specifically, the value prediction method may be, for example, a last value predictor (Last Value Predictor), a stride predictor (Stride Predictor), the finite context method (Finite Context Method, FCM), or the differential finite context method (Differential Finite Context Method, DFCM), but is not limited to those listed here.
Specifically, a coding method based on value prediction predicts the element to be encoded, obtains the difference between the predicted value and the element to be encoded, and encodes and stores that difference, which realizes the encoding of the element.
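The two simplest predictors named above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the initial prediction of 0 are hypothetical, and the text does not specify how the first prediction is seeded.

```python
def last_value_predictions(values, initial=0):
    """Predict each element as the element that preceded it."""
    predictions, last = [], initial
    for value in values:
        predictions.append(last)
        last = value
    return predictions


def stride_predictions(values, initial=0):
    """Predict each element as the previous element plus the previous step."""
    predictions, last, stride = [], initial, 0
    for value in values:
        predictions.append(last + stride)
        stride = value - last   # step observed between the last two values
        last = value
    return predictions


values = [10, 20, 30, 40]
print(last_value_predictions(values))   # [0, 10, 20, 30]
print(stride_predictions(values))       # [0, 20, 30, 40]
```

On an evenly spaced run the stride predictor becomes exact after two elements, so the differences to be encoded are zero from then on.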
According to an embodiment, after the first coding result is obtained, it may be stored in the form of a byte array.
S140: encode the second array with a variable-length coding method to obtain a second coding result.
According to an embodiment, a variable-length coding method, that is, a variable-length byte coding method, represents data of the same type with byte strings of different lengths. This reduces the total number of bytes of the data and therefore the storage space it occupies. Variable-length coding methods include methods based on a high-bit separator, such as UTF-8 and Base 128 Varints, and methods that use surrogates to make the distinction.
According to an embodiment, after the second coding result is obtained, it may be stored in the form of a byte array. According to an embodiment, when the elements of the second array are 32-bit int data, each element occupies 4 bytes before encoding and may occupy 1 to 4 bytes after encoding.
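As an illustration of variable-length coding with a high-bit separator, the following sketch encodes a list of non-negative indexes as Base 128 varints and decodes them again. The function names are hypothetical, and choosing Base 128 varints for the second array is an assumption made for the example; the text only names it as one possible method.

```python
def encode_varint(value):
    """Encode one non-negative integer as a Base 128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte of this value
            return bytes(out)


def encode_indices(indices):
    """Concatenate the varints of all indexes into one byte string."""
    return b"".join(encode_varint(i) for i in indices)


def decode_indices(data):
    """Split a byte string produced by encode_indices back into integers."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


second_array = [0, 1, 2, 1, 3, 0, 3]
encoded = encode_indices(second_array)
print(len(encoded))             # 7
print(encode_varint(300))       # b'\xac\x02'
```

In this particular scheme each byte carries 7 payload bits, so any index below 2^28 fits in at most four bytes, and the seven small indexes above take one byte each instead of four.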
S150: store the first coding result and the second coding result according to a preset mapping relationship.
Specifically, the first coding result and the second coding result together constitute the compressed data of the original sequence. The mapping relationship includes the following: an element of the second coding result represents the position index, in the first coding result, of an element of the original sequence.
According to an embodiment, the coding result that contains the first coding result and the second coding result may include a compression header, and the preset mapping relationship may be stored in the compression header. Optionally, information about the storage structure of the first coding result and the second coding result may also be stored in the compression header.
After the first array and the second array have been encoded, the corresponding first coding result and second coding result may be stored as bytes, that is, as byte arrays. The storage medium is not specifically limited here.
Fig. 2 is a schematic diagram of the principle of a data compression method provided by the embodiments of this specification.
Taken together, Fig. 1 and Fig. 2 show that the above embodiments of the disclosure provide a data compression method that compresses a sequence by converting the original sequence into two arrays and then compressing each array in a way suited to its characteristics. The compression effect of this method does not depend on the value distribution: numeric sequences of any value range can be compressed, and a high compression ratio is obtained even when the magnitudes of the values are unevenly distributed. When the values in the sequence are densely distributed, or when many values are identical, the compression ratio is higher still. The method reduces the total number of bytes of a file to be transmitted or stored, which speeds up data transmission, reduces the disk space the file occupies, and improves storage utilization.
The specific implementation steps of the embodiments of the disclosure are described in detail below with reference to the drawings.
According to an embodiment, generating the second array from the original sequence (step S120) may specifically include: reading an element of the original sequence; searching the first array for the element whose value is identical to that element; and obtaining the subscript of that element in the first array and storing it in the second array. The method for generating the second array is not limited to the one described here.
According to an embodiment, encoding the first array with the coding method based on value prediction to obtain the first coding result (step S130) may specifically include: predicting an element of the first array with a predictor to obtain a predicted value; performing an XOR operation on the predicted value and the true value of the predicted element to obtain an XOR result; and encoding the XOR result to obtain the first coding result.
According to an embodiment, if the predicted value of an element is close to the element's value, several high-order bits of the two are identical, so their XOR result has several leading zeros, and these leading zeros can be compressed. The closer the predicted value is to the true value of the element, the more leading zeros the XOR result has, and the more the XOR result can be compressed.
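The effect of prediction accuracy on the XOR result can be seen in a short sketch. It assumes 64-bit non-negative integer values, and the helper name `leading_zero_bits` is hypothetical.

```python
def leading_zero_bits(diff, width=64):
    """Number of leading zero bits of a non-negative value in a fixed width."""
    return width - diff.bit_length()


true_value = 1_000_003

close_guess = 1_000_000   # shares all but the lowest bits with the true value
far_guess = 0             # shares nothing useful with the true value

print(leading_zero_bits(true_value ^ close_guess))   # 62
print(leading_zero_bits(true_value ^ far_guess))     # 44
```

With the close prediction only 2 of the 64 bits of the XOR result carry information, while with the poor prediction 20 bits do.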
Fig. 3 is a schematic diagram of the coding method based on value prediction provided by the embodiments of this specification.
According to an optional embodiment, in step S130, encoding the XOR result to obtain the first coding result may specifically include: encoding the leading zeros of the XOR result to obtain a first coded segment; taking the part of the XOR result other than the leading zeros as a second coded segment; and obtaining the first coding result from the first coded segment and the second coded segment.
Optionally, the first coding result may contain the first coded segment and the second coded segment of each element in turn. Specifically, from its start position to its end position, the first coding result may contain, in order, the first coded segment of the first element, the second coded segment of the first element, the first coded segment of the second element, the second coded segment of the second element, ..., the first coded segment of the n-th element, and the second coded segment of the n-th element.
Optionally, the first coding result may contain the first and second coded segments in groups of two elements. Specifically, from its start position to its end position, the first coding result may contain, in order, the first coded segment of the first element, the first coded segment of the second element, the second coded segment of the first element, the second coded segment of the second element, ..., the first coded segment of the (n-1)-th element, the first coded segment of the n-th element, the second coded segment of the (n-1)-th element, and the second coded segment of the n-th element. Here the first coded segments of the first and second elements may be stored in a single byte. Note that the way the first and second coded segments of each element are stored in the first coding result is not limited to the two layouts above; any reasonable order may be used.
Specifically, referring to Fig. 3, the method for encoding and compressing data D_n may include: the predictor generates a predicted value Pred_n; the value to be compressed D_n is XORed with the predicted value Pred_n to obtain the XOR result Diff_n, where Diff_n contains several leading zeros if Pred_n is close to D_n; the leading zeros of Diff_n are encoded as a leading-zero count; and the coding result LZC_n Bits_n is obtained. Here the first coded segment LZC_n represents the leading-zero count, and the second coded segment Bits_n represents the remaining bits. According to an embodiment, a preset number of bits (for example, 4 bits) may be used to represent the first coded segment LZC_n.
Correspondingly, during decompression, the method for restoring the compressed data LZC_n Bits_n to D_n may include: reading the first coded segment LZC_n, which has the preset number of bits (for example, 4 bits), and the second coded segment Bits_n, which holds the corresponding remaining bits (for example, 64 minus the number of bits occupied by the leading zeros indicated by LZC_n), to obtain Diff_n; and XORing Diff_n with the same predicted value Pred_n that was used during compression to obtain the decompressed data D_n.
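The encode and decode steps for a single value can be sketched as below. The sketch makes one assumption that the text does not state: the leading-zero count is taken in whole bytes (0 to 8), so that it fits in a 4-bit field. Values are treated as unsigned 64-bit patterns, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def encode_residual(value, predicted):
    """Return (lzc, rest): leading zero byte count and the remaining bytes."""
    diff = (value ^ predicted) & MASK64
    lzc = (64 - diff.bit_length()) // 8     # whole leading zero bytes, 0..8
    rest = diff.to_bytes(8, "big")[lzc:]    # the 8 - lzc bytes that remain
    return lzc, rest


def decode_residual(lzc, rest, predicted):
    """Rebuild the original value from (lzc, rest) and the same prediction."""
    if len(rest) != 8 - lzc:
        raise ValueError("rest must hold exactly 8 - lzc bytes")
    diff = int.from_bytes(rest, "big")      # leading zero bytes are implicit
    return diff ^ predicted


lzc, rest = encode_residual(1_000_003, 1_000_000)
print(lzc, rest)                              # 7 b'\x03'
print(decode_residual(lzc, rest, 1_000_000))  # 1000003
```

A perfect prediction yields `(8, b"")`, so only the 4-bit count has to be stored for that element.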
When the XOR result contains not only leading zeros but also trailing zeros, the leading zeros and the trailing zeros may both be encoded and compressed.
According to another optional embodiment, in step S130, encoding the XOR result to obtain the first coding result may specifically include: encoding the leading zeros of the XOR result to obtain a first coded segment; encoding the trailing zeros of the XOR result to obtain a second coded segment; taking the part of the XOR result other than the leading zeros and the trailing zeros as a third coded segment; and obtaining the first coding result from the first coded segment, the second coded segment, and the third coded segment, where the first coding result contains, in order, the first coded segment, the third coded segment, and the second coded segment.
Optionally, the first coding result may contain the first, second, and third coded segments of each element in turn. Specifically, from its start position to its end position, the first coding result may contain, in order, the first coded segment of the first element, the second coded segment of the first element, the third coded segment of the first element, the first coded segment of the second element, the second coded segment of the second element, the third coded segment of the second element, ..., the first coded segment of the n-th element, the second coded segment of the n-th element, and the third coded segment of the n-th element.
Optionally, the first coding result may contain the first, second, and third coded segments in groups of two elements. Specifically, from its start position to its end position, the first coding result may contain, in order, the first coded segment of the first element, the first coded segment of the second element, the second coded segment of the first element, the second coded segment of the second element, the third coded segment of the first element, the third coded segment of the second element, ..., the first coded segment of the (n-1)-th element, the first coded segment of the n-th element, the second coded segment of the (n-1)-th element, the second coded segment of the n-th element, the third coded segment of the (n-1)-th element, and the third coded segment of the n-th element. Here the first coded segments of the first and second elements may be stored in a single byte, and the third coded segments of the first and second elements may be stored in a single byte. Note that the way the first, second, and third coded segments of each element are stored in the first coding result is not limited to the two layouts above; any reasonable order may be used.
Specifically, the method for encoding and compressing data D_n may include: the predictor generates a predicted value Pred_n; the value to be compressed D_n is XORed with the predicted value Pred_n to obtain the XOR result Diff_n, where Diff_n contains n1 leading zeros and n2 trailing zeros; the n1 leading zeros are encoded as a leading-zero count and the n2 trailing zeros are encoded as a trailing-zero count; and the coding result LZC_n Bits_n TZC_n is obtained. In the coding result, the first coded segment LZC_n represents the leading-zero count, the second coded segment TZC_n represents the trailing-zero count, and the third coded segment Bits_n represents the remaining bits. According to an embodiment, a first preset number of bits (for example, 4 bits) may be used to represent the first coded segment LZC_n, and a second preset number of bits (for example, 4 bits) may be used to represent the second coded segment TZC_n.
According to an embodiment, in practical applications the leading zeros and/or the trailing zeros may be compressed as required, so that the data is compressed as far as possible and the highest compression ratio is obtained.
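The variant that also removes trailing zeros can be sketched in the same style. As an assumption made only for this example, both counts are taken in whole bytes so that each fits in a 4-bit field; the function names are hypothetical.

```python
def split_zero_bytes(diff):
    """Return (lzc, tzc, middle) for a 64-bit XOR result, counted in bytes."""
    if diff == 0:
        return 8, 0, b""                    # all zero: count everything as leading
    raw = diff.to_bytes(8, "big")
    lzc = len(raw) - len(raw.lstrip(b"\x00"))   # leading zero bytes
    tzc = len(raw) - len(raw.rstrip(b"\x00"))   # trailing zero bytes
    return lzc, tzc, raw[lzc:len(raw) - tzc]


def join_zero_bytes(lzc, tzc, middle):
    """Rebuild the XOR result by restoring the stripped zero bytes."""
    return int.from_bytes(b"\x00" * lzc + middle + b"\x00" * tzc, "big")


diff = 0x000000ABCD000000
lzc, tzc, middle = split_zero_bytes(diff)
print(lzc, tzc, middle)   # 3 3 b'\xab\xcd'
```

Here only two of the eight bytes have to be stored next to the two counts, whereas removing leading zeros alone would leave five bytes.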
Fig. 4 is a schematic diagram of one numerical prediction method used in the coding method based on numerical prediction provided by an embodiment of this specification.
According to an optional embodiment, in step S130, predicting an element in the first array to obtain a predicted value may specifically include: searching for a corresponding history value sequence based on the sequence formed by several elements preceding the element to be predicted; and obtaining the corresponding predicted value based on the history value sequence.
Specifically, based on the sequence formed by several (for example, the previous 3) elements preceding the element to be predicted, a history value sequence consistent with that sequence may be searched for in the history value table, and the history value sequence found is used as an index to look up the corresponding predicted value in the predicted value table. If no history value sequence consistent with the current sequence is found, an initial predicted value is assigned to the current element to be predicted.
More specifically, referring to Fig. 4, the coding method based on numerical prediction may be a context-based numerical prediction method, where the context may be the history values formed by several adjacent elements. The history value table contains sequences of processed data; for example, a data sequence may consist of consecutive elements of the array to be compressed that have already been processed. The number of data items in a data sequence can be set as needed: the larger this number, the more accurate the prediction result, but the more time the prediction takes. For example, the number may be set to 3. The predicted value table contains the values that occurred after the data sequences in the history value table. When predicting, the history value sequence corresponding to the elements adjacent to the current element is first looked up in the history value table; then, with that history value sequence as an index, the predicted value is looked up in the predicted value table through a hash function.
According to an embodiment, once the value of the predicted element is known, the values in the predictor can be updated. Specifically, the true value of the element can be written into the predicted value table, and the history value table can be updated according to the true value of the element and the old history values.
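The context-based predictor described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the class name `ContextPredictor` is hypothetical, and a Python dictionary keyed by the last three values stands in for the hash-indexed history value table and predicted value table of Fig. 4.

```python
from collections import deque


class ContextPredictor:
    """Predicts the next value from the last `order` values seen."""

    def __init__(self, order: int = 3, initial: int = 0):
        self.history = deque(maxlen=order)  # most recent true values
        self.table = {}                     # context -> value that followed it
        self.initial = initial              # used when the context is unknown

    def predict(self) -> int:
        return self.table.get(tuple(self.history), self.initial)

    def update(self, true_value: int) -> None:
        # Remember what followed the current context, then slide the context.
        self.table[tuple(self.history)] = true_value
        self.history.append(true_value)
```

After the sequence 1, 2, 3, 4, 1, 2, 3 has been fed in through `update`, the context (1, 2, 3) has already been followed by 4 once, so `predict` returns 4.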
Fig. 5 is a schematic diagram of another numerical prediction method used in the coding method based on numerical prediction provided by an embodiment of this specification.
According to another optional embodiment, predicting an element in the first array to obtain a predicted value may specifically include: searching for a corresponding history difference sequence based on the difference sequence of several elements preceding the element to be predicted; obtaining the corresponding predicted difference based on the history difference sequence; and obtaining the predicted value of the element to be predicted based on the element preceding it and the predicted difference.
Specifically, from several elements preceding the element to be predicted, a difference sequence formed by the differences between every two adjacent elements among those preceding elements may be obtained; a history difference sequence consistent with this difference sequence is searched for in the history difference table; and the history difference sequence found is used as an index to look up the corresponding predicted difference in the predicted difference table. If no history difference sequence consistent with the current difference sequence is found, an initial predicted difference is assigned to the current difference to be predicted.
More specifically, referring to Fig. 5, the history value table may store the value D_{n-1} preceding the value D_n to be predicted, together with history difference sequences, where a history difference sequence is a sequence formed by the differences between adjacent elements of the processed data. The history difference sequence may be, for example, the sequence formed by the three differences between four consecutive data items. The predicted difference table may contain the differences that occurred after the history difference sequences in the history difference table. When predicting, for example, a difference sequence may first be formed from the differences between the four elements preceding the current value D_n, and a history difference sequence consistent with it, for example (delta1, delta2, delta3), is searched for in the history value table; then, with that history difference sequence as an index, the predicted difference, for example dpre, is found in the predicted difference table through a hash function; finally, the previous value D_{n-1} is added to the predicted difference dpre to obtain the predicted value Pred_n = D_{n-1} + dpre.
According to an embodiment, after the true value of the predicted element is known, the values in the predictor can be updated. Specifically, the true value of the predicted element can be written into the history value table; the difference between the true value of the predicted element and the value of the previous element can be calculated as a new difference; the new difference is then stored in the predicted difference table, and the history difference table is updated according to the new difference and the old history differences.
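A difference-based predictor of the kind shown in Fig. 5 might be sketched as follows. Again this is only an illustration: the class name `DifferencePredictor` is hypothetical, the starting value 0 for the previous element is an assumption, and a dictionary keyed by the last three differences replaces the hash-indexed tables.

```python
from collections import deque


class DifferencePredictor:
    """Predicts D_n as D_{n-1} plus the difference that followed this context."""

    def __init__(self, order: int = 3, initial_diff: int = 0):
        self.last = 0                      # D_{n-1}; assumed to start at 0
        self.diffs = deque(maxlen=order)   # most recent differences
        self.table = {}                    # difference context -> next difference
        self.initial_diff = initial_diff   # used when the context is unknown

    def predict(self) -> int:
        dpre = self.table.get(tuple(self.diffs), self.initial_diff)
        return self.last + dpre            # Pred_n = D_{n-1} + dpre

    def update(self, true_value: int) -> None:
        new_diff = true_value - self.last
        self.table[tuple(self.diffs)] = new_diff
        self.diffs.append(new_diff)
        self.last = true_value
```

For an evenly spaced sequence such as 10, 20, 30, 40 the stored differences are all 10, so the next prediction is 50 even though the value 50 has never been seen. This is why a difference-based predictor suits sorted or steadily increasing data.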
Fig. 4 and Fig. 5 each describe a process of numerical prediction using a different predictor. In this application, either of the predictors described in Fig. 4 or Fig. 5 may be used, according to actual needs, to obtain the predicted value. According to embodiments of the present disclosure, more than one predictor may be used at the same time in order to improve prediction accuracy, prediction speed, compression ratio and the like.
Fig. 6 is a schematic diagram of the coding method based on numerical prediction using two predictors, provided by an embodiment of this specification.
Specifically, referring to Fig. 6, encoding the first array using the coding method based on numerical prediction in step S130 to obtain the first coding result may specifically include: predicting an element in the first array using a first predictor to obtain a first predicted value; predicting the element in the first array using a second predictor to obtain a second predicted value; performing an XOR operation on the first predicted value and the true value of the predicted element to obtain a first XOR result; performing an XOR operation on the second predicted value and the true value of the predicted element to obtain a second XOR result; comparing the number of leading zeros of the first XOR result with the number of leading zeros of the second XOR result, and taking the result with more leading zeros as the final XOR result; and encoding the final XOR result to obtain the first coding result.
According to an embodiment, the comparison between the first XOR result and the second XOR result includes comparing the number of leading zeros and/or the number of trailing zeros in the XOR results. For example, the numbers of leading zeros may be compared, and the XOR result with more leading zeros taken as the preferred final XOR result. As another example, the numbers of trailing zeros may be compared, and the XOR result with more trailing zeros taken as the preferred final XOR result. As yet another example, the total numbers of leading and trailing zeros may be compared, and the XOR result with the larger sum taken as the preferred final XOR result. The way of selecting the preferred XOR result is not limited to these.
Optionally, the first predictor and the second predictor may be the same or different. For example, each may be independently selected from the predictors shown in Fig. 4 or Fig. 5, and predictors other than those shown in Fig. 4 and Fig. 5 may also be selected.
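The selection between two XOR results by leading-zero count can be sketched as below. The function names and the returned selector index are assumptions added for illustration; the specification does not say how the chosen predictor is recorded, and a tie is resolved in favour of the first predictor, as in the worked example later in this description.

```python
def leading_zeros(x: int, width: int = 64) -> int:
    """Number of leading zero bits of x viewed as a `width`-bit word."""
    return width - x.bit_length()


def pick_xor(true_value: int, pred1: int, pred2: int, width: int = 64):
    """Return (selector, xor_result) for the predictor giving more leading zeros."""
    mask = (1 << width) - 1
    d1 = (true_value ^ pred1) & mask
    d2 = (true_value ^ pred2) & mask
    if leading_zeros(d2, width) > leading_zeros(d1, width):
        return 1, d2
    return 0, d1  # first predictor wins ties
```

Comparing trailing zeros, or the sum of leading and trailing zeros, would only change the comparison key.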
For step S140, the detailed process of encoding the second array is explained below, taking the Base 128 Varints variable-length coding method as an example.
The Base 128 Varints coding method can represent an integer using one or more bytes. In a Base 128 Varints coded string, the most significant bit (msb) of each byte serves as a flag bit: if this bit is 1, the next byte and the current byte jointly represent the current value; if this bit is 0, the current byte is the last byte of the current value. The remaining seven bits of the byte store the data itself. Because only 7 bits per byte carry data, a single byte can represent only values less than 2^7, that is, 0 to 127. Values less than 2^7 can be represented with 1 byte; values less than 2^14 can be represented with 2 bytes; values less than 2^21 can be represented with 3 bytes; and values less than 2^28 can be represented with 4 bytes.
The int32 array [1, 300, 1024] is used as an example below.
For the decimal number 1, the binary (4-byte) representation is 00000000 00000000 00000000 00000001. After variable-length encoding it occupies only one byte: 00000001. That is, the number 1 can be represented with 1 byte.
For the decimal number 300, the binary (4-byte) representation is 00000000 00000000 00000001 00101100. The encoding process is as follows: (1) starting from the low-order end of the binary representation, split it into groups of 7 bits, obtaining 0000010 0101100; (2) reverse the order of the groups, obtaining 0101100 0000010; (3) add the flag bits, obtaining the coded string 10101100 00000010. That is, the number 300 can be represented with 2 bytes. Correspondingly, when the coded string 10101100 00000010 is decoded: (1) the most significant bit of the 1st byte is 1, so reading continues; the most significant bit of the 2nd byte is 0, showing that the current value is represented by 2 bytes; (2) take the low 7 bits of the two bytes, 0101100 0000010; (3) reverse the order, obtaining 0000010 0101100, that is, decimal 300.
For the decimal number 1024, the binary (4-byte) representation is 00000000 00000000 00000100 00000000. The encoding process is as follows: (1) starting from the low-order end of the binary representation, split it into groups of 7 bits, obtaining 0001000 0000000; (2) reverse the order of the groups, obtaining 0000000 0001000; (3) add the flag bits, obtaining the coded string 10000000 00001000. That is, the number 1024 can be represented with 2 bytes.
The original int32 array [1, 300, 1024] occupies 12 bytes before encoding and 5 bytes after variable-length encoding. Variable-length encoding thus significantly reduces the memory occupied by the array.
It can be seen that variable-length encoding compresses smaller values more effectively. In embodiments of the present disclosure, the second array stores position index values. When the number of elements in the array to be compressed is limited, the range of the position index values is limited and the values are generally small, so the advantages of variable-length encoding can be fully exploited, the number of bytes can be effectively reduced, and a significant compression effect can be obtained.
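The Base 128 Varints steps walked through above can be sketched in Python as follows. The function names `encode_varint` and `decode_varints` are chosen for illustration only, and the sketch handles non-negative integers.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, low 7-bit group first."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # flag bit 1: more bytes follow
        else:
            out.append(low7)         # flag bit 0: last byte of this value
            return bytes(out)


def decode_varints(data: bytes) -> list:
    """Decode a concatenation of varints back into a list of integers."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values
```

Encoding [1, 300, 1024] with this sketch yields 1 + 2 + 2 = 5 bytes, matching the byte counts given in the text.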
As described above, the present disclosure provides a data compression method that first converts the original sequence into two arrays and then, according to the data characteristics of the two arrays, selects a different compression method for each, so as to compress the original sequence as far as possible and obtain a higher compression ratio. This process uses a compression method based on numerical prediction, which is applicable to data arranged in any order; however, for arrays whose data is arranged more chaotically, the time complexity and computational complexity of the numerical prediction process are higher. In view of this, the present disclosure provides the following optional data compression method, in which the first array to be encoded is first sorted and then encoded using the method based on numerical prediction. Sorting before applying the compression method based on numerical prediction not only reduces the time complexity and computational complexity of the numerical prediction process, but can also further improve the compression ratio.
Fig. 7 is a schematic diagram of the method of generating the first array from the original sequence, provided by an embodiment of this specification.
In another embodiment of the present disclosure, referring to Fig. 7, the step of generating the first array from the original sequence may specifically include: S111, extracting the elements with distinct values from the original sequence to generate a first preliminary array; and S112, sorting the elements in the first preliminary array according to a predetermined sorting rule to generate the first array.
Specifically, according to an embodiment, the predetermined sorting rule may be any rule set as needed, such as descending order of value, ascending order of value, descending order of tail value, or ascending order of tail value. For example, a sort function may be used to sort the first array in descending or ascending order of value. Such a sorting result makes the high-order bits of adjacent values more similar, so that more leading zeros can be produced when the XOR operation is performed. The way of sorting is not limited to this; any applicable method in the prior art may be used to sort the first array.
According to an embodiment, sorting the first array increases the correlation between adjacent values in the first array and exposes more of the similarity between neighbouring values. When the XOR operation is performed, the XOR results then contain more leading zeros and/or trailing zeros, the first array is compressed to a greater degree, and the first compression result occupies fewer bytes, thereby reducing the storage space occupied by the compressed data and/or the time needed to transmit it.
According to embodiments of the present disclosure, the data compression method described above is provided. The method not only achieves a clear compression effect on data sequences with arbitrary value distributions, but also compresses well sequences arranged in arbitrary order; that is, a good compression ratio can also be obtained for unordered or out-of-order data sequences.
To describe the present invention more clearly, a concrete data compression example according to an embodiment is shown below:
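A minimal sketch of generating the two arrays, assuming ascending order as the predetermined sorting rule, might look like this. The function name `build_arrays` is hypothetical.

```python
def build_arrays(original: list):
    """Return (first_array, second_array) for the given original sequence."""
    # S111 and S112: distinct values, sorted in ascending order.
    first = sorted(set(original))
    # Position of each distinct value inside the first array.
    position = {value: index for index, value in enumerate(first)}
    # Second array: for every original element, its index in the first array.
    second = [position[value] for value in original]
    return first, second
```

For example, the short sequence [24, 22, 11, 20, 11, 6] gives the first array [6, 11, 20, 22, 24] and the second array [4, 3, 1, 2, 1, 0].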
Original series:
[24,22,11,20,11,6,25,33,7,41,26,46,34,47,49,26,26,10,2,39,35,8,43,3, 28,29,22,1,5,38,36,42,44,29,24,30,0,15,34,6,14,31,38,34,44,15,17,5,19,41,23, 30,37,25,44,27,33,23,11,27,32,3,43,29,18,45,13,4,5,20,11,0,18,3,32,32,31,17, 0,30,7,26,47,30,20,46,35,10,19,18,43,11,29,5,6,39,33,31,14,23]
First array (after sorting):
[0,1,2,3,4,5,6,7,8,10,11,13,14,15,17,18,19,20,22,23,24,25,26,27,28, 29,30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,49]
First coding result:
[118B,1B,-1B,-1B,-1B,-10B,2B,110B,1B,1B,-17B,1B,-18B,1B,1B,-1B,102B, 2B,1B,-1B,-1B,-1B,-1B,-1B,-1B,-1B,-1B,-18B,1B,1B,-1B,-1B,-2B,1B,0B]
Second array:
[20,18,10,17,10,6,21,29,7,36,22,41,30,42,43,22,22,9,2,35,31,8,38,3, 24,25,18,1,5,34,32,37,39,25,20,26,0,13,30,6,12,27,34,30,39,13,14,5,16,36,19, 26,33,21,39,23,29,19,10,23,28,3,38,25,15,40,11,4,5,17,10,0,15,3,28,28,27,14, 0,26,7,22,42,26,17,41,31,9,16,15,38,10,25,5,6,35,29,27,12,19]
Second coding result:
[8B,20B,8B,18B,8B,10B,8B,17B,8B,10B,8B,6B,8B,21B,8B,29B,8B,7B,8B,36B, 8B,22B,8B,41B,8B,30B,8B,42B,8B,43B,8B,22B,8B,22B,8B,9B,8B,2B,8B,35B,8B,31B, 8B,8B,8B,38B,8B,3B,8B,24B,8B,25B,8B,18B,8B,1B,8B,5B,8B,34B,8B,32B,8B,37B,8B, 39B,8B,25B,8B,20B,8B,26B,8B,0B,8B,13B,8B,30B,8B,6B,8B,12B,8B,27B,8B,34B,8B, 30B,8B,39B,8B,13B,8B,14B,8B,5B,8B,16B,8B,36B,8B,19B,8B,26B,8B,33B,8B,21B,8B, 39B,8B,23B,8B,29B,8B,19B,8B,10B,8B,23B,8B,28B,8B,3B,8B,38B,8B,25B,8B,15B,8B, 40B,8B,11B,8B,4B,8B,5B,8B,17B,8B,10B,8B,0B,8B,15B,8B,3B,8B,28B,8B,28B,8B,27B, 8B,14B,8B,0B,8B,26B,8B,7B,8B,22B,8B,42B,8B,26B,8B,17B,8B,41B,8B,31B,8B,9B,8B, 16B,8B,15B,8B,38B,8B,10B,8B,25B,8B,5B,8B,6B,8B,35B,8B,29B,8B,27B,8B,12B,8B, 19B]
Taking the sorted first array (for example, a first array of type Long) as an example, the process of encoding the first array to obtain the first coding result is described below:
First, it should be noted that, for the first array currently to be encoded, it can be preset that only the XOR result of the high 14 bits needs attention, that is, there are at most 14 leading zeros, so 4 bits can be used to store the leading-zero count. Since 4 bits hold the leading-zero code of one number, one byte can hold the leading-zero codes of two numbers, and the numbers are therefore encoded in groups of two. The encoding of the data (0, 1) in the first array is described below.
(1) The Long array is converted into the byte arrays [[0x00,0x00,0x00,0x00], [0x00,0x00,0x00,0x01], ……].
(2) Numerical prediction is performed using two predictors. The first predictor may use the prediction method shown in Fig. 4, with its initial predicted value (pred_n) set to 0; the second predictor may use the prediction method shown in Fig. 5, with its initial predicted value (pred_n) set to 0.
The first number [0x00,0x00,0x00,0x00] is XORed with the initial predicted value of the first predictor p1 (for example, 0) to obtain the XOR result diff1d [0x00,0x00,0x00,0x00]; meanwhile, the first number [0x00,0x00,0x00,0x00] is XORed with the initial predicted value of the second predictor p2 (for example, 0) to obtain the XOR result diff2d [0x00,0x00,0x00,0x00]. When the two XOR results are identical, the XOR result of the first predictor p1 may be selected as the final XOR result.
The parameters in the first predictor p1 and the second predictor p2 are updated according to the final XOR result and the true value of the currently predicted value (here, 0). Specifically, the history value table and the predicted value table in each predictor are updated. For example, the index of the predicted value table of the first predictor p1 may be updated as: ((lastIndex << 6) ^ (this update value >> 48)) & (table.length - 1), where "lastIndex" is the previous index into the predicted value table, "this update value" is the value input this time, and "table.length" is the length of the predicted value table. For example, the index of the predicted value table of the second predictor p2 may be updated as: ((dfcmHash << 2) ^ ((trueValue - lastValue) >> 40)) & (table.length - 1), where "dfcmHash" is the current index, "trueValue" is the true value of the currently predicted value, "lastValue" is the true value of the previous value, and "table.length" is the length of the predicted value table.
(3) The second number [0x00,0x00,0x00,0x01] is XORed with the predicted value of the updated first predictor p1 to obtain the XOR result diff1e [0x00,0x00,0x00,0x01]; meanwhile, the second number [0x00,0x00,0x00,0x01] is XORed with the initial predicted value of the second predictor p2 to obtain the XOR result diff2e [0x00,0x00,0x00,0x01]. When the two XOR results are identical, the XOR result of the first predictor p1 may be selected as the final XOR result.
The parameters in the first predictor p1 and the second predictor p2 are updated according to the final XOR result and the true value of the currently predicted value (here, 1).
(4) For the XOR result diff1d [0x00,0x00,0x00,0x00] of the first number 0, the number of leading zeros is 8. In data processing, when the number of leading zeros is greater than 4, it is reduced by 1, so zeroByteSize(0) = 7. For the XOR result diff1e [0x00,0x00,0x00,0x01] of the second number 1, the number of leading zeros is 7, so zeroByteSize(1) = 6. The leading zeros of (0, 1) are encoded and stored in one byte: the leading-zero code of diff1d is shifted left by 4 bits into the high 4 bits, and the leading-zero code of diff1e is placed in the low 4 bits: code |= zeroByteSize(0) << 4; code |= zeroByteSize(1), that is, 118B.
(5) Then, the non-leading-zero parts of the first number and the second number are stored. Specifically, for the number 0, there is no non-leading-zero part to store; for the number 1, 0000 0001 is stored, that is, 1B.
Therefore, the result of encoding the data (0, 1) is stored as the byte array [118B, 1B]. Repeating the above steps yields the first coding result.
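Steps (4) and (5) for one pair of XOR results can be sketched as follows. This is an interpretation under stated assumptions rather than the exact procedure of the embodiment: leading zeros are counted in 4-bit units (hex digits) of a 32-bit word, which is consistent with the counts 8 and 7 given above; the helper `skipped_nibbles` is an assumed decoder-side inverse of the "reduce by 1" rule; and all function names are hypothetical.

```python
def leading_zero_nibbles(x: int, nibbles: int = 8) -> int:
    """Count leading zero hex digits of x, viewed as `nibbles` hex digits."""
    count = 0
    for pos in range(nibbles - 1, -1, -1):
        if (x >> (4 * pos)) & 0xF:
            break
        count += 1
    return count


def zero_code(lz: int) -> int:
    """Rule from the text: counts greater than 4 are reduced by 1."""
    return lz - 1 if lz > 4 else lz


def skipped_nibbles(code: int) -> int:
    """Assumed inverse of zero_code (code 4 is read back as 4 hex digits)."""
    return code + 1 if code > 4 else code


def encode_pair(diff_a: int, diff_b: int, nibbles: int = 8) -> bytes:
    """Pack two leading-zero codes into one byte, then the non-zero parts."""
    codes = [zero_code(leading_zero_nibbles(d, nibbles)) for d in (diff_a, diff_b)]
    out = bytearray([(codes[0] << 4) | codes[1]])  # high 4 bits, low 4 bits
    for d, code in zip((diff_a, diff_b), codes):
        rest = nibbles - skipped_nibbles(code)      # hex digits still to store
        out += d.to_bytes((rest + 1) // 2, "big")   # zero bytes when rest == 0
    return bytes(out)
```

With the XOR results 0 and 1 from the example, `list(encode_pair(0, 1))` gives `[118, 1]`, the same two bytes as derived in the text.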
The data compression method according to embodiments of the present disclosure is provided above. Based on the same idea, embodiments of this specification further provide a data decompression method corresponding to the above data compression method.
Fig. 8 is a flow diagram of a data decompression method provided by an embodiment of this specification.
According to an embodiment, referring to Fig. 8, a data decompression method includes the following steps:
S210: obtaining compressed data, the compressed data including first compressed data and second compressed data that have a preset mapping relationship;
S220: based on the data characteristics of the first compressed data, decompressing the first compressed data using the coding method based on numerical prediction to obtain the first array;
S230: based on the data characteristics of the first compressed data, decompressing the second compressed data according to a preset variable-length coding rule to obtain the second array;
S240: obtaining decompressed data based on the first array and the second array according to the preset mapping relationship.
According to an embodiment, obtaining the decompressed data based on the first array and the second array according to the preset mapping relationship specifically includes: using the value of each element in the second array as a position index to read the value of the corresponding element in the first array; and storing the value read from the first array each time, to obtain the decompressed array.
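The final reconstruction step amounts to an index lookup, which can be sketched as follows. The function name `restore` is hypothetical, and the two arrays are assumed to have been decoded already.

```python
def restore(first_array: list, second_array: list) -> list:
    """Rebuild the original sequence: each index selects a value from first_array."""
    return [first_array[index] for index in second_array]
```

For example, the first array [6, 11, 20, 22, 24] and the second array [4, 3, 1, 2, 1, 0] restore the sequence [24, 22, 11, 20, 11, 6].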
The data compression method and the data decompression method provided by the present disclosure are described above. Based on the same idea, embodiments of this specification further provide an apparatus corresponding to the above data compression method and an apparatus corresponding to the above data decompression method.
Fig. 9 is a structural schematic diagram of the data compression apparatus provided by an embodiment of this specification. As shown in Fig. 9, the data compression apparatus may include:
a generation module 310, configured to generate a first array and a second array from an original sequence, where the elements in the first array are distinct from one another and include all the values in the original sequence, and the elements in the second array are the position indexes, in the first array, of the elements of the original sequence;
a first coding module 320, configured to encode the first array using the coding method based on numerical prediction to obtain a first coding result;
a second coding module 330, configured to encode the second array using a variable-length coding method to obtain a second coding result;
a storage module 340, configured to store the first coding result and the second coding result according to a preset mapping relationship.
Fig. 10 is a structural schematic diagram of a data decompression apparatus provided by an embodiment of this specification. As shown in Fig. 10, the data decompression apparatus may include:
an obtaining module 410, configured to obtain compressed data, the compressed data including first compressed data and second compressed data that have a preset mapping relationship;
a first data decompression module 420, configured to decompress the first compressed data, based on the data characteristics of the first compressed data, using the coding method based on numerical prediction to obtain the first array;
a second data decompression module 430, configured to decompress the second compressed data, based on the data characteristics of the first compressed data, according to a preset variable-length coding rule to obtain the second array;
a data generation module 440, configured to obtain decompressed data based on the first array and the second array according to the preset mapping relationship.
Based on the same idea, embodiments of this specification further provide a device corresponding to the above data compression and decompression methods.
Fig. 11 is a structural schematic diagram of a device corresponding to the data compression method and the data decompression method, provided by an embodiment of this specification. As shown in Fig. 11, a data processing device 500 may include:
at least one processor 510; and
a memory 530 communicatively connected to the at least one processor; wherein
the memory 530 stores instructions 520 executable by the at least one processor 510, the instructions being executed by the at least one processor 510 to enable the at least one processor 510 to:
generate a first array from an original sequence, where the elements in the first array are distinct from one another and include all the values in the original sequence;
generate a second array from the original sequence, where the elements in the second array are the position indexes, in the first array, of the elements of the original sequence;
encode the first array using the coding method based on numerical prediction to obtain a first coding result;
encode the second array using a variable-length coding method to obtain a second coding result;
store the first coding result and the second coding result according to a preset mapping relationship;
or,
to enable the at least one processor 510 to:
obtain compressed data, the compressed data including first compressed data and second compressed data that have a preset mapping relationship;
decompress the first compressed data, based on the data characteristics of the first compressed data, using the coding method based on numerical prediction to obtain the first array;
decompress the second compressed data, based on the data characteristics of the first compressed data, according to a preset variable-length coding rule to obtain the second array;
obtain decompressed data based on the first array and the second array according to the preset mapping relationship.
Specific embodiments of this specification have been described above. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described in a progressive manner. For the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the apparatus and device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the description of the method embodiments.
The apparatus, device and method provided by the embodiments of this specification correspond to one another. The apparatus and device therefore have beneficial technical effects similar to those of the corresponding method. Since the beneficial technical effects of the method have been described in detail above, those of the corresponding apparatus and device are not repeated here.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same function in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Indeed, the means for implementing various functions can be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described by dividing it into units according to function. Of course, when implementing this application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
The above is merely an embodiment of this application and is not intended to limit it. Various modifications and changes to this application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the scope of its claims.

Claims (14)

1. A data compression method, comprising:
generating a first array from an original sequence, wherein the elements of the first array are mutually distinct and include all values in the original sequence;
generating a second array from the original sequence, wherein each element of the second array is the position index, in the first array, of the corresponding element of the original sequence;
encoding the first array using an encoding method based on numerical prediction to obtain a first encoding result;
encoding the second array using a variable-length encoding method to obtain a second encoding result; and
storing the first encoding result and the second encoding result according to a preset mapping relationship.
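The first two steps of claim 1 amount to a dictionary split of the input. The following is a minimal sketch, not the patented implementation; the function name `build_arrays` is hypothetical, and storing distinct values in order of first appearance is an assumption, since claim 1 does not fix an order.

```python
def build_arrays(original):
    """Split a sequence into (first_array, second_array).

    first_array holds each distinct value once (assumed order: first
    appearance); second_array holds, for every element of the original
    sequence, its position index in first_array.
    """
    first_array = []
    position = {}            # value -> index in first_array
    second_array = []
    for value in original:
        if value not in position:
            position[value] = len(first_array)
            first_array.append(value)
        second_array.append(position[value])
    return first_array, second_array


original = [3.5, 1.25, 3.5, 3.5, 7.0, 1.25]
first, second = build_arrays(original)
print(first)   # [3.5, 1.25, 7.0]
print(second)  # [0, 1, 0, 0, 2, 1]
```

The two arrays would then be encoded separately: the first with a prediction-based coder and the second with a variable-length integer code.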
2. The method of claim 1, wherein generating the first array from the original sequence specifically comprises:
extracting the elements with distinct values from the original sequence to generate a first preliminary array; and
sorting the elements of the first preliminary array according to a predetermined ordering rule to generate the first array.
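The sorted variant of claim 2 can be sketched as follows. Ascending numeric order is assumed here as the "predetermined ordering rule"; the claim itself does not name a rule, and `build_sorted_arrays` is a hypothetical helper.

```python
def build_sorted_arrays(original):
    """Dictionary split in which the distinct values are sorted."""
    preliminary = set(original)           # first preliminary array: distinct values
    first_array = sorted(preliminary)     # assumed ordering rule: ascending
    position = {value: i for i, value in enumerate(first_array)}
    second_array = [position[value] for value in original]
    return first_array, second_array


print(build_sorted_arrays([3.5, 1.25, 3.5, 7.0]))
# ([1.25, 3.5, 7.0], [1, 0, 1, 2])
```

Sorting tends to make neighboring elements of the first array close in value, which is what a prediction-based coder can exploit.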
3. The method of claim 1, wherein encoding the first array using the encoding method based on numerical prediction to obtain the first encoding result specifically comprises:
predicting an element of the first array using a predictor to obtain a predicted value;
performing an XOR operation on the predicted value and the true value of the predicted element to obtain an XOR result; and
encoding the XOR result to obtain the first encoding result.
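The predict-then-XOR step of claim 3 can be illustrated with a deliberately simple predictor. This sketch assumes the elements are 64-bit IEEE 754 floats and predicts each element as the previous one; both choices are assumptions made for illustration, and all function names are hypothetical.

```python
import struct


def float_to_bits(x):
    """Reinterpret a Python float as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b):
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_residuals(values):
    """Predict each element as the previous one, then XOR with the truth."""
    residuals = []
    predicted = 0                      # prediction used for the first element
    for value in values:
        actual = float_to_bits(value)
        residuals.append(predicted ^ actual)
        predicted = actual             # "last value" predictor (assumed)
    return residuals


def restore(residuals):
    """Replay the same predictor and undo the XOR."""
    values = []
    predicted = 0
    for residual in residuals:
        actual = predicted ^ residual
        values.append(bits_to_float(actual))
        predicted = actual
    return values
```

When a prediction is close to the true value, the XOR result has many leading zero bits, so it can be stored compactly; an exact prediction yields zero.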
4. The method of claim 3, wherein predicting the element of the first array to obtain the predicted value specifically comprises:
looking up a corresponding history value sequence based on the sequence formed by several elements preceding the element to be predicted; and
obtaining the corresponding predicted value based on the history value sequence.
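One way to read claim 4 is as a context-based lookup: the last few values form a key, and the table remembers what followed that key before. The sketch below makes that reading concrete; the class name, the use of a Python dict instead of a hashed table, and the default prediction of 0 are all assumptions.

```python
class ContextPredictor:
    """Predict the next value from the values that followed the same context."""

    def __init__(self, order=2):
        self.order = order
        self.history = {}     # context tuple -> value that followed it last time
        self.context = ()     # the most recent `order` values

    def predict(self):
        # Unknown contexts fall back to 0 (an assumed default).
        return self.history.get(self.context, 0)

    def update(self, actual):
        self.history[self.context] = actual
        self.context = (self.context + (actual,))[-self.order:]


def predictions(values, order=2):
    predictor = ContextPredictor(order)
    out = []
    for value in values:
        out.append(predictor.predict())
        predictor.update(value)
    return out


print(predictions([1, 2, 3, 1, 2, 3, 1]))  # [0, 0, 0, 0, 0, 3, 1]
```

The predictor only becomes useful once a context repeats, which is why the first five predictions in the example are the default.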
5. The method of claim 3, wherein predicting the element of the first array to obtain the predicted value specifically comprises:
looking up a corresponding history difference sequence based on the sequence of differences of several elements preceding the element to be predicted;
obtaining a corresponding predicted difference based on the history difference sequence; and
obtaining the predicted value of the element to be predicted based on the element preceding it and the predicted difference.
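Claim 5 differs from claim 4 in that the lookup key is built from differences rather than values. A minimal sketch under the same assumptions (dict-based table, default difference 0, hypothetical names), using integers so that the differences are exact:

```python
class DifferencePredictor:
    """Predict the next value as previous value + remembered difference."""

    def __init__(self, order=2):
        self.order = order
        self.history = {}     # tuple of recent differences -> difference that followed
        self.context = ()
        self.previous = 0

    def predict(self):
        return self.previous + self.history.get(self.context, 0)

    def update(self, actual):
        diff = actual - self.previous
        self.history[self.context] = diff
        self.context = (self.context + (diff,))[-self.order:]
        self.previous = actual


def difference_predictions(values, order=2):
    predictor = DifferencePredictor(order)
    out = []
    for value in values:
        out.append(predictor.predict())
        predictor.update(value)
    return out


print(difference_predictions([10, 20, 30, 40, 50]))  # [0, 10, 20, 40, 50]
```

For a steadily increasing input, the difference context quickly stabilizes and the predictions become exact, whereas a value-based context would never repeat.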
6. The method of claim 3, wherein encoding the XOR result to obtain the first encoding result specifically comprises:
encoding the leading zeros of the XOR result to obtain a first encoding segment;
determining the part of the XOR result other than the leading zeros as a second encoding segment; and
obtaining the first encoding result from the first encoding segment and the second encoding segment.
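The two-segment layout of claim 6 can be sketched with bit strings. The 64-bit width and the 7-bit field holding the leading-zero count are assumptions chosen for readability; the claim does not specify how the count is encoded.

```python
WIDTH = 64          # assumed width of an XOR result, in bits
COUNT_BITS = 7      # assumed field size: counts 0..64 fit in 7 bits


def encode_leading(xor_result):
    """Return count-of-leading-zeros segment followed by the remaining bits."""
    significant = xor_result.bit_length()        # bits left after the leading zeros
    leading = WIDTH - significant
    first_segment = format(leading, "07b")
    second_segment = format(xor_result, "b") if significant else ""
    return first_segment + second_segment


def decode_leading(bits):
    """Inverse of encode_leading."""
    leading = int(bits[:COUNT_BITS], 2)
    significant = WIDTH - leading
    payload = bits[COUNT_BITS:COUNT_BITS + significant]
    return int(payload, 2) if payload else 0


print(encode_leading(5))   # 0111101101  (61 leading zeros, then 101)
```

In this layout a small XOR result such as 5 takes 10 bits instead of 64, and an exact prediction (XOR result 0) takes only the 7-bit count.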
7. The method of claim 3, wherein encoding the XOR result to obtain the first encoding result specifically comprises:
encoding the leading zeros of the XOR result to obtain a first encoding segment;
encoding the trailing zeros of the XOR result to obtain a second encoding segment;
determining the part of the XOR result other than the leading zeros and the trailing zeros as a third encoding segment; and
obtaining the first encoding result from the first encoding segment, the second encoding segment, and the third encoding segment.
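The three-segment layout of claim 7 additionally strips trailing zeros. As a sketch only: the 64-bit width, the two 7-bit count fields, and the convention that a zero XOR result is stored as 64 leading zeros and 0 trailing zeros are all assumptions.

```python
WIDTH = 64          # assumed width of an XOR result, in bits
COUNT_BITS = 7      # assumed size of each count field


def encode_lead_trail(xor_result):
    """Return leading-zero count, trailing-zero count, then the middle bits."""
    if xor_result == 0:
        return format(WIDTH, "07b") + format(0, "07b")
    leading = WIDTH - xor_result.bit_length()
    # The lowest set bit isolates the position of the first non-zero bit.
    trailing = (xor_result & -xor_result).bit_length() - 1
    middle = format(xor_result >> trailing, "b")
    return format(leading, "07b") + format(trailing, "07b") + middle


def decode_lead_trail(bits):
    """Inverse of encode_lead_trail."""
    leading = int(bits[:COUNT_BITS], 2)
    trailing = int(bits[COUNT_BITS:2 * COUNT_BITS], 2)
    middle_len = WIDTH - leading - trailing
    if middle_len == 0:
        return 0
    start = 2 * COUNT_BITS
    return int(bits[start:start + middle_len], 2) << trailing


print(encode_lead_trail(0b101000))  # 01110100000011101
```

Whether the extra count field pays off depends on the data: it helps when XOR results have long runs of trailing zeros, which is common for floats with short mantissas.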
8. The method of claim 1, wherein encoding the first array using the encoding method based on numerical prediction to obtain the first encoding result specifically comprises:
predicting an element of the first array using a first predictor to obtain a first predicted value;
predicting the element of the first array using a second predictor to obtain a second predicted value;
performing an XOR operation on the first predicted value and the true value of the predicted element to obtain a first XOR result;
performing an XOR operation on the second predicted value and the true value of the predicted element to obtain a second XOR result;
comparing the number of leading zeros of the first XOR result with the number of leading zeros of the second XOR result, and taking the result with more leading zeros as the final XOR result; and
encoding the final XOR result to obtain the first encoding result.
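The selection step of claim 8 can be sketched as below. The returned selector flag (0 or 1) is an assumption added so that a decoder could know which prediction to XOR back; the claim itself only describes choosing the result with more leading zeros. Ties are resolved in favor of the first predictor, also by assumption.

```python
WIDTH = 64   # assumed width of the values, in bits


def leading_zeros(x):
    """Number of leading zero bits in a WIDTH-bit unsigned integer."""
    return WIDTH - x.bit_length()


def choose_residual(actual, prediction_a, prediction_b):
    """XOR the true value with both predictions and keep the better result.

    Returns (selector, xor_result), where selector is 0 for the first
    predictor and 1 for the second.
    """
    xor_a = actual ^ prediction_a
    xor_b = actual ^ prediction_b
    if leading_zeros(xor_a) >= leading_zeros(xor_b):
        return 0, xor_a
    return 1, xor_b


print(choose_residual(0b1111, 0b1110, 0b0000))  # (0, 1)
```

Running two different predictors in parallel lets the encoder benefit from whichever one matches the local pattern of the data.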
9. The method of claim 1, wherein encoding the second array using the variable-length encoding method specifically comprises:
encoding each element of the second array using a variable-length encoding method based on a high-bit delimiter.
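A common variable-length integer code of this kind uses the highest bit of each byte as the delimiter. The sketch below assumes the convention that a set high bit means "more bytes follow" and that the low seven bits carry the payload, least significant group first; the patent may define the delimiter differently.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 payload bits per byte."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte of this value
            return bytes(out)


def decode_varints(data):
    """Decode a concatenation of encoded integers back into a list."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


indices = [0, 1, 127, 128, 300]
packed = b"".join(encode_varint(i) for i in indices)
print(len(packed))  # 7
```

Because position indexes into a small first array are usually small numbers, most of them fit in a single byte under this scheme.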
10. A data decompression method, comprising:
obtaining compressed data, the compressed data comprising first compressed data and second compressed data that have a preset mapping relationship;
decompressing the first compressed data, based on data characteristics of the first compressed data, using an encoding method based on numerical prediction to obtain a first array;
decompressing the second compressed data, based on the data characteristics of the first compressed data, according to a preset variable-length encoding rule to obtain a second array; and
obtaining decompressed data from the first array and the second array according to the preset mapping relationship.
11. The method of claim 10, wherein obtaining the decompressed data from the first array and the second array according to the preset mapping relationship specifically comprises:
reading the values of elements in the first array, using the values of the elements in the second array as position indexes; and
storing the value read from the first array each time to obtain a decompressed array.
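The lookup described in claim 11 is a plain index-by-index read. A minimal sketch, with the hypothetical helper name `rebuild`:

```python
def rebuild(first_array, second_array):
    """Use each element of second_array as an index into first_array."""
    return [first_array[index] for index in second_array]


print(rebuild([3.5, 1.25, 7.0], [0, 1, 0, 0, 2, 1]))
# [3.5, 1.25, 3.5, 3.5, 7.0, 1.25]
```

The output has the same length and order as the second array, which is how the original sequence is recovered from the two decoded arrays.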
12. A data compression apparatus, comprising:
a generation module, configured to generate a first array and a second array from an original sequence, wherein the elements of the first array are mutually distinct and include all values in the original sequence, and each element of the second array is the position index, in the first array, of the corresponding element of the original sequence;
a first encoding module, configured to encode the first array using an encoding method based on numerical prediction to obtain a first encoding result;
a second encoding module, configured to encode the second array using a variable-length encoding method to obtain a second encoding result; and
a storage module, configured to store the first encoding result and the second encoding result according to a preset mapping relationship.
13. A data decompression apparatus, comprising:
an obtaining module, configured to obtain compressed data, the compressed data comprising first compressed data and second compressed data that have a preset mapping relationship;
a first data decompression module, configured to decompress the first compressed data, based on data characteristics of the first compressed data, using an encoding method based on numerical prediction to obtain a first array;
a second data decompression module, configured to decompress the second compressed data, based on the data characteristics of the first compressed data, according to a preset variable-length encoding rule to obtain a second array; and
a data generation module, configured to obtain decompressed data from the first array and the second array according to the preset mapping relationship.
14. A data processing device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
generate a first array from an original sequence, wherein the elements of the first array are mutually distinct and include all values in the original sequence;
generate a second array from the original sequence, wherein each element of the second array is the position index, in the first array, of the corresponding element of the original sequence;
encode the first array using an encoding method based on numerical prediction to obtain a first encoding result;
encode the second array using a variable-length encoding method to obtain a second encoding result; and
store the first encoding result and the second encoding result according to a preset mapping relationship;
or,
to enable the at least one processor to:
obtain compressed data, the compressed data comprising first compressed data and second compressed data that have a preset mapping relationship;
decompress the first compressed data, based on data characteristics of the first compressed data, using a method based on numerical prediction to obtain a first array;
decompress the second compressed data, based on the data characteristics of the first compressed data, according to a preset variable-length encoding rule to obtain a second array; and
obtain decompressed data from the first array and the second array according to the preset mapping relationship.
CN201910380862.4A 2019-05-08 2019-05-08 Data compression and decompression method, device and equipment Active CN110266316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910380862.4A CN110266316B (en) 2019-05-08 2019-05-08 Data compression and decompression method, device and equipment

Publications (2)

Publication Number Publication Date
CN110266316A (en) 2019-09-20
CN110266316B (en) 2023-02-21

Family

ID=67914392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910380862.4A Active CN110266316B (en) 2019-05-08 2019-05-08 Data compression and decompression method, device and equipment

Country Status (1)

Country Link
CN (1) CN110266316B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110995273A (en) * 2019-10-21 2020-04-10 武汉神库小匠科技有限公司 Data compression method, device, equipment and medium for power database
CN113539264A (en) * 2021-07-13 2021-10-22 广东金鸿星智能科技有限公司 Voice instruction data transmission method and system for voice-controlled electrically operated gate
CN113630124A (en) * 2021-08-10 2021-11-09 优刻得科技股份有限公司 Method, system, device and medium for processing time sequence integer data
CN113987556A (en) * 2021-12-24 2022-01-28 杭州趣链科技有限公司 Data processing method and device, electronic equipment and storage medium
CN114610952A (en) * 2022-02-28 2022-06-10 广州鼎甲计算机科技有限公司 Effective data indexing method, system, device and storage medium
CN115118788A (en) * 2022-06-29 2022-09-27 北京中科心研科技有限公司 Time sequence data compression method and device, wearable intelligent device and storage medium
CN116861271A (en) * 2023-09-05 2023-10-10 智联信通科技股份有限公司 Data analysis processing method based on big data

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1182518A (en) * 1996-03-19 1998-05-20 三菱电机株式会社 Encoder, decoder, their method and image processor
US6757437B1 (en) * 1994-09-21 2004-06-29 Ricoh Co., Ltd. Compression/decompression using reversible embedded wavelets
CN1545813A (en) * 2002-04-26 2004-11-10 株式会社Ntt都科摩 Image encoding device, image decoding device, image encoding method, image decoding method, image encoding program, and image decoding program
CN101115208A (en) * 2006-07-27 2008-01-30 松下电器产业株式会社 Picture coding apparatus
CN103460209A (en) * 2011-04-11 2013-12-18 阿尔卡特朗讯公司 Method of encoding a data identifier
CN103581684A (en) * 2012-07-30 2014-02-12 英特尔公司 Compression encoding and decoding method and apparatus
US20140169476A1 (en) * 2011-06-06 2014-06-19 Canon Kabushiki Kaisha Method and Device for Encoding a Sequence of Images and Method and Device for Decoding a Sequence of Image
CN105451026A (en) * 2014-09-19 2016-03-30 想象技术有限公司 Data compression
CN106202172A (en) * 2016-06-24 2016-12-07 中国农业银行股份有限公司 Text compression methods and device
CN107547907A (en) * 2016-06-27 2018-01-05 华为技术有限公司 The method and apparatus of encoding and decoding
CN108366257A (en) * 2012-04-26 2018-08-03 索尼公司 data decoding apparatus and method, data encoding apparatus and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
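Liu Dapeng et al.: "Data Compression in Streaming Media Technology", Journal of Henan University (Natural Science Edition) *
Chen Qiuhua: "A Data Compression Algorithm for Memory Reads and Writes", China Integrated Circuit *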

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110995273A (en) * 2019-10-21 2020-04-10 武汉神库小匠科技有限公司 Data compression method, device, equipment and medium for power database
CN110995273B (en) * 2019-10-21 2023-04-07 武汉神库小匠科技有限公司 Data compression method, device, equipment and medium for power database
CN113539264A (en) * 2021-07-13 2021-10-22 广东金鸿星智能科技有限公司 Voice instruction data transmission method and system for voice-controlled electrically operated gate
CN113539264B (en) * 2021-07-13 2023-10-13 广东金鸿星智能科技有限公司 Voice command data transmission method and system for voice-controlled electric door
CN113630124A (en) * 2021-08-10 2021-11-09 优刻得科技股份有限公司 Method, system, device and medium for processing time sequence integer data
CN113630124B (en) * 2021-08-10 2023-08-08 优刻得科技股份有限公司 Method, system, equipment and medium for processing time sequence integer data
CN113987556A (en) * 2021-12-24 2022-01-28 杭州趣链科技有限公司 Data processing method and device, electronic equipment and storage medium
CN114610952A (en) * 2022-02-28 2022-06-10 广州鼎甲计算机科技有限公司 Effective data indexing method, system, device and storage medium
CN115118788A (en) * 2022-06-29 2022-09-27 北京中科心研科技有限公司 Time sequence data compression method and device, wearable intelligent device and storage medium
CN115118788B (en) * 2022-06-29 2023-01-31 北京中科心研科技有限公司 Time sequence data compression method and device, wearable intelligent equipment and storage medium
CN116861271A (en) * 2023-09-05 2023-10-10 智联信通科技股份有限公司 Data analysis processing method based on big data
CN116861271B (en) * 2023-09-05 2023-12-08 智联信通科技股份有限公司 Data analysis processing method based on big data

Also Published As

Publication number Publication date
CN110266316B (en) 2023-02-21

Similar Documents

Publication Publication Date Title
CN110266316A (en) A kind of data compression, decompressing method, device and equipment
US9792310B2 (en) Run index compression
US10747737B2 (en) Altering data type of a column in a database
US8200915B2 (en) Management of very large streaming data sets for efficient writes and reads to and from persistent storage
CN113572479B (en) Method and system for generating finite state entropy coding table
CN110771161A (en) Digital perspective method
Kim et al. SBH: Super byte-aligned hybrid bitmap compression
JP5584203B2 (en) How to process numeric data
US10756758B1 (en) Length-limited huffman encoding
KR102589299B1 (en) Method and apparatus for vertex attribute compression and decompression in hardware
CN111078723B (en) Data processing method and device for block chain browser
Wang et al. A simplified variant of tabled asymmetric numeral systems with a smaller look-up table
CN103428502B (en) Decoding method and decoding system
CN116680269A (en) Time sequence data coding and compressing method, system, equipment and medium
CN103840835A (en) Data decompression method and device
US11669572B2 (en) Accelerated operations on compressed data stores
CN109478199A (en) The system and method for piecewise linear approximation
EP4256710A1 (en) Systems, methods and devices for exploiting value similarity in computer memories
Sun et al. SPLZ: An efficient algorithm for single source shortest path problem using compression method
Tran et al. Increasing the efficiency of GPU bitmap index query processing
CN110032565A (en) A kind of method, system and electronic equipment generating statistical information
JP2023503034A (en) Pattern-based cache block compression
Bae et al. Cache compression with Golomb-Rice code and quantization for convolutional neural networks
Velez et al. Improving bitmap execution performance using column-based metadata
CN115952859A (en) Data processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant