CN1728561A

CN1728561A - Positioning compress/decompress method of multiple sequences of number-local digital features

Info

Publication number: CN1728561A
Application number: CN 200410071153
Authority: CN
Inventors: 郭鹏
Original assignee: Individual
Current assignee: Individual
Priority date: 2004-07-30
Filing date: 2004-07-30
Publication date: 2006-02-01

Abstract

By analyzing and recording features of the data, the method reduces the space needed to store the data and thereby compresses it. Because the invention represents data by a specific method and performs no probability calculation on the features of the data itself, it does not have the drawbacks of existing compression methods and can compress data of any type. The invention performs no weighting analysis on the object to be compressed, so the compression is lossless.

Description

A multiple-sequence, local-digital-feature positioning compression/decompression method

Technical field

The present invention relates to a compression/decompression algorithm, specifically a multiple-sequence, local-digital-feature positioning compression/decompression method.

Background art

Data compression in the proper sense originates from people's understanding of probability. When text is encoded, if letters with a higher probability of occurrence are given shorter codes and letters with a lower probability of occurrence are given longer codes, the total code length can be shortened considerably. To compress a piece of information, therefore, one must first analyze the probability with which each symbol occurs in it. Different compression programs determine symbol probabilities by different methods; the more accurately the probabilities are calculated, the easier it is to obtain a good compression result. In a compression program, the module that processes the input, computes the symbol probabilities, and decides which code or codes to output is called the model.
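The effect of giving shorter codes to more frequent symbols can be illustrated with a small sketch. The sample text, the two code tables, and the function name `encoded_length` are illustrative assumptions and do not come from the original description.

```python
# Illustrative assumption: a short text in which "a" is the most frequent symbol.
TEXT = "aaaabbc"

# Fixed-length code: every symbol costs 2 bits.
FIXED = {"a": "00", "b": "01", "c": "10"}
# Variable-length prefix code: the most frequent symbol gets the shortest code.
VARIABLE = {"a": "0", "b": "10", "c": "11"}


def encoded_length(text, code):
    """Return the total number of bits needed to encode text with the code table."""
    return sum(len(code[symbol]) for symbol in text)


fixed_bits = encoded_length(TEXT, FIXED)
variable_bits = encoded_length(TEXT, VARIABLE)
print(fixed_bits, variable_bits)
```

With these assumed tables, the variable-length code needs fewer bits than the fixed-length code because the frequent symbol "a" costs only one bit.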

Most practical compression programs use what is called an "adaptive model." An adaptive model can be described as an automaton with a learning function. Before any information is input, it knows nothing about the content and assumes that every character is equally probable. As characters are input and encoded, it counts and records the probabilities of the characters that have occurred and applies those probabilities when encoding subsequent characters. In other words, an adaptive model compresses poorly at the start of compression, but as compression proceeds it comes closer and closer to the exact character probabilities and reaches the desired compression effect. An adaptive model can also adapt to sudden changes in the character distribution of the input and to the differing character distributions of different files, without needing to store a probability table.
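A minimal sketch of such an adaptive model follows, assuming a simple count-based estimate in which every symbol of a known alphabet starts with a count of 1. The function name `adaptive_probabilities` and the sample input are hypothetical; real compressors use more elaborate models.

```python
from fractions import Fraction


def adaptive_probabilities(text, alphabet):
    """Return the probability the model assigns to each character just before coding it."""
    # Before any input, every symbol is assumed equally likely (count 1 each).
    counts = {symbol: 1 for symbol in alphabet}
    used = []
    for ch in text:
        total = sum(counts.values())
        used.append(Fraction(counts[ch], total))
        # Learn from the character that was just coded.
        counts[ch] += 1
    return used


estimates = adaptive_probabilities("aaab", "ab")
print(estimates)
```

The estimate for "a" rises as more "a" characters are seen, which is the learning behaviour described above; no probability table has to be stored in advance because a decoder can rebuild the same counts.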

The models mentioned above can be called "statistical models," because they obtain character probabilities by counting the occurrences of each character. Another large class of models is called "dictionary models." In everyday life, when the abbreviated name "Industrial and Commercial Bank" comes up, we know it refers to "Industrial and Commercial Bank of China." There are many similar cases, but the common premise is that we all carry in our minds a dictionary of abbreviations established by usage. A dictionary model works the same way: it does not directly calculate the probability of each character but uses a dictionary. As the input is read in, the model finds the longest string in the dictionary that matches the input and then outputs the index of that string in the dictionary. The longer the match, the better the compression. In fact, a dictionary model is still essentially based on character probabilities; it simply uses the matching of whole strings in place of counting the repetitions of individual characters.
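The longest-match lookup of a dictionary model can be sketched as follows. This assumes a fixed dictionary known to both sides; the function name `encode_with_dictionary` and the sample dictionary are hypothetical, and practical dictionary coders typically build the dictionary while reading the input.

```python
def encode_with_dictionary(text, dictionary):
    """Greedily replace the longest matching dictionary string with its index."""
    index = {word: i for i, word in enumerate(dictionary)}
    longest = max(len(word) for word in dictionary)
    output = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(longest, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in index:
                output.append(index[piece])
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {pos}")
    return output


indices = encode_with_dictionary("abcab", ["a", "b", "ab", "abc"])
print(indices)
```

Five input characters are represented by two indices here; the longer the matched strings, the fewer indices are needed.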

So, in essence, the purpose of data compression is to eliminate the redundancy in information. General-purpose compression programs today are all built on this guiding principle. However, because the features of different kinds of data differ, a compression program cannot be optimized for each of them one by one; for a given kind of data, a special-purpose compression tool is generally needed to obtain a good compression result.

Summary of the invention

The purpose of the present invention is to provide a method that can compress and decompress all types of data.

The basic idea of the present invention is as follows. Data of any type is stored as a series of digits. When this series of digits is regarded as a single integer, it is unique on the number axis, like a person's fingerprint; once the features of this series of digits have been found, the series can be represented by its features. In the present invention, the series of digits is regarded as a positive integer, which appears as a point on the number axis. The method described by the present invention finds and describes the features of this positive integer so that it can be expressed with a minimum amount of information, and in this way the data is compressed.
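Regarding stored data as one positive integer can be sketched with Python's built-in integer conversions. The big-endian byte order and the helper names `bytes_to_integer` and `integer_to_bytes` are assumptions made for illustration; the description does not specify how the digits are read.

```python
def bytes_to_integer(data):
    """Interpret a byte string as one non-negative integer (big-endian, assumed)."""
    return int.from_bytes(data, byteorder="big")


def integer_to_bytes(T, length):
    """Convert the integer back to bytes.

    The length is passed explicitly because leading zero bytes
    do not change the integer value.
    """
    return T.to_bytes(length, byteorder="big")


data = b"Hi"
T = bytes_to_integer(data)
restored = integer_to_bytes(T, len(data))
print(T, restored)
```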

The operational parameters involved in the present invention are explained as follows:

Compression target: when stored, the compression target takes the form of a series of digits; the present invention regards this series of digits as a positive integer, referred to as T.

Positioning sequences: the N sequences (N is a positive integer) that, on a single number axis, are added to or subtracted from the compression target. The first term of each of these sequences is T or 0. A positioning sequence may be any type of sequence currently known.

Local digital feature: the digits or characters at specific positions of T, treated as a feature of T. The digits or characters at these positions are called the local digital information S, and their positions are called the local digital position information R; together the two are called the local digital feature.

Coincidence point: the results of adding the positioning sequences to, or subtracting them from, T appear as points on the number axis; a point that belongs to all N positioning sequences at once is a coincidence point.

Valid point: a coincidence point whose digits match the local digital feature is called a valid point.

Valid coincidence count Y: records the number of times a valid point occurs; each time a valid point occurs, Y is incremented by 1. Under the operation flow of the present invention, T itself satisfies the condition for a valid point, so the initial value of Y is 1.
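The comparison of a point against the local digital information S at the positions R can be sketched as below. Representing R as a tuple of decimal digit indices (negative indices counting from the end) and the names `local_info` and `matches_local_feature` are assumptions for illustration only.

```python
def local_info(value, positions):
    """Return the decimal digits of value found at the given positions (R)."""
    digits = str(value)
    picked = []
    for p in positions:
        # A position outside the number yields None, so it can never match a digit.
        picked.append(digits[p] if -len(digits) <= p < len(digits) else None)
    return tuple(picked)


def matches_local_feature(point, positions, S):
    """A point matches when its digits at positions R equal the recorded information S."""
    return local_info(point, positions) == S


T = 120034
R = (0, 1, -2, -1)      # assumed: first two and last two decimal digits
S = local_info(T, R)    # ('1', '2', '3', '4')
print(matches_local_feature(12334, R, S), matches_local_feature(13034, R, S))
```

By construction T always matches its own feature, which agrees with the statement that T itself satisfies the condition of a valid point.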

The compression process of the present invention is as follows:

Regard the compression target as a positive integer T and add each of the N positioning sequences to, or subtract it from, T. The results form points on the number axis; a point that is simultaneously a result of the addition or subtraction of each of the N sequences and T is a coincidence point. The local digital feature is used to perform valid point selection on the digits or characters at the specified positions of the number at each coincidence point; if the coincidence point is a valid point, the valid coincidence count Y is incremented by 1;

When the result of adding or subtracting one of the N positioning sequences and T becomes less than 0, the operation on that positioning sequence ends, and the result immediately preceding that result is recorded as the final result of the operation on that sequence; when the results for all N positioning sequences are less than 0, the operation stops;

After all coincidence points have gone through valid point selection, compression is complete, and the following are recorded: the value of Y, the positioning sequences, the local digital feature, and the final results of adding or subtracting the N positioning sequences and T;

During decompression, the complete inverse of the compression process is carried out to complete decompression.

Use of the present invention yields the following beneficial effects:

1. Because the present invention merely uses a specific method to represent the data and does not perform probability calculation on the features of the data itself, it does not have the drawbacks of existing compression methods. The present invention can therefore compress data of any type.

2. The present invention does not perform any weighting analysis on the compression target, so the compression it performs is lossless.

The present invention is explained in further detail below with reference to the drawings and specific embodiments.

Description of drawings

Fig. 1 is a conceptual schematic diagram of the present invention; in this figure, T denotes all the numbers on the number axis from 0 to T, rather than the positive integer that T itself represents.

Fig. 2 is a number axis schematic diagram of the present invention;

Fig. 3 is a flow chart of the compression process of the present invention.

Specific embodiment

In this embodiment, the positioning sequences are two arithmetic progressions A and B. The common difference of A is d₁ and the common difference of B is d₂, with A₁ = B₁ = 0, d₁ > 0, d₂ > 0, d₂ = d₁ + 1, Aₙ = A₁ - (n-1) * d₁, and Bₘ = B₁ - (m-1) * d₂. The local digital position information R in the local digital feature is the first and second digits from the head of T and the first and second digits from its tail; the local digital information S is then the two leading digits and the two trailing digits of T.
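The points produced by the two progressions, and the points they share, can be sketched as follows. The sketch assumes that the subtraction results are T - (n-1) * d₁ and T - (m-1) * d₂, that is, steps of d₁ and d₂ downward from T; this reading is an assumption chosen to agree with the final-result check described later, and the function names are hypothetical.

```python
def sequence_points(T, d):
    """Points T, T - d, T - 2d, ... that are still not below 0 (assumed reading)."""
    return list(range(T, -1, -d))


def coincidence_points(T, d1, d2):
    """Points that belong to both subtraction sequences, largest first."""
    shared = set(sequence_points(T, d1)) & set(sequence_points(T, d2))
    return sorted(shared, reverse=True)


points = coincidence_points(50, 3, 4)
print(points)
```

Because d₂ = d₁ + 1, the two differences share no common factor, so under this reading the shared points are spaced d₁ * d₂ apart, starting from T itself.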

The terms used in the embodiment are explained as follows:

Final result of the operation: when the result of subtracting positioning sequence A or B from T becomes less than 0, the subtraction of that positioning sequence from T ends, and the result immediately preceding that result is recorded as the final result of subtracting that positioning sequence from T.

Fig. 1 is a conceptual schematic diagram of the present invention. Because data of any type is stored as a series of digits, when the present invention regards these digits as a positive integer T, T appears as a point on the number axis. The present invention uses two positioning sequences A and B to subtract repeatedly from T, then uses the local digital feature to select among the coincidence points in the subtraction results; the number Y of valid points selected serves as a compression parameter.

Fig. 2 is a number axis schematic diagram of the repeated-subtraction results of the present invention. The points on number axis A and number axis B are the numbers produced by repeatedly subtracting positioning sequences A and B from T, shown on the number axis; the valid points are those obtained after the local digital feature is used to select among the coincidence points.

Fig. 3 is a flow chart of compression according to the present invention:

1. Regard the compression target as a positive integer T;

2. Compute T - Aₙ and T - Bₘ: subtract positioning sequences A and B from T separately, with m and n advancing independently.

3. Evaluate (T - Aₙ) - (T - Bₘ): if the result is less than 0, go to step 4.1; if it is greater than 0, go to step 4.2; if it equals 0, go to step 4.3.

4.1 Leave n unchanged and increase m by 1; go to step 5.1;

4.2 Increase n by 1 and leave m unchanged; go to step 5.2;

4.3 Perform valid point selection: use the local digital feature to determine whether the coincidence point is a valid point; if so, increase Y by 1, and if not, leave Y unchanged. Valid point selection means using the local digital position information R to locate digits of the number at the coincidence point and comparing the located digits with the local digital information S; if they are identical, the coincidence point is a valid point. After this determination, go to step 5.3.

5.1 Determine whether T - Aₙ is greater than 0; if so, return to step 2; if not, go to step 6.1;

5.2 Determine whether T - Bₘ is greater than 0; if so, return to step 2; if not, go to step 6.2;

5.3 Increase n by 1 and m by 1; return to step 2.

6.1 The subtraction of sequence A from T ends; record the final result of the subtraction;

6.2 The subtraction of sequence B from T ends; record the final result of the subtraction;

When the subtraction of both positioning sequences A and B from T has finished, compression ends and the compression parameters are recorded: Y, d₁, d₂, the final result of T - Aₙ, the final result of T - Bₘ, and the local digital feature. According to the invention, these compression parameters fully express T, so the storage space occupied can be reduced to a very small magnitude, which achieves the purpose of compressing the data.
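The flow of Fig. 3 can be sketched roughly as below. This is one reading under stated assumptions, not a definitive implementation: both subtraction results are assumed to start at T and to step down by d₁ and d₂; the local digital feature is taken as the two leading and two trailing decimal digits; the final results are taken as the remainders T mod d₁ and T mod d₂, following the check given in the description; and Y starts at 0 but counts T itself as the first valid point, which gives the same total as starting at 1. The function name `compress_parameters` is hypothetical.

```python
def compress_parameters(T, d1, d2):
    """Walk both subtraction sequences down from T and count valid points."""
    digits = str(T)
    feature = (digits[:2], digits[-2:])   # two leading and two trailing digits

    def is_valid(point):
        s = str(point)
        return (s[:2], s[-2:]) == feature

    a = b = T   # current values of T - A_n and T - B_m (assumed to start at T)
    Y = 0       # T is visited first and is valid, so Y becomes 1 at once
    while a >= 0 and b >= 0:
        if a == b:            # step 4.3: coincidence point, run valid point selection
            if is_valid(a):
                Y += 1
            a -= d1           # step 5.3: advance both n and m
            b -= d2
        elif a < b:           # step 4.1: advance m only
            b -= d2
        else:                 # step 4.2: advance n only
            a -= d1
    return {
        "Y": Y,
        "d1": d1,
        "d2": d2,
        "final_A": T % d1,    # assumed equal to the last non-negative T - A_n
        "final_B": T % d2,    # assumed equal to the last non-negative T - B_m
        "feature": feature,
    }


params = compress_parameters(120034, 3, 4)
print(params)
```

The loop visits on the order of T/d₁ + T/d₂ points, so this sketch is only practical for small T.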

To decompress data compressed by the present invention, the recorded compression parameters are used to carry out the complete inverse of the compression process on the decompression target, which decompresses the data.

As a further supplement, to make the compression result of the present invention more accurate, the present invention applies a check to the final results. The check is as follows:

Divide T by d₁; if the remainder equals the final result of subtracting positioning sequence A from T, the T - Aₙ operation is correct;

Divide T by d₂; if the remainder equals the final result of subtracting positioning sequence B from T, the T - Bₘ operation is correct.
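The check can be sketched as a comparison between the result of repeated subtraction and the remainder of a division. The helper names and the sample values are hypothetical, and the sketch assumes the subtraction steps down from T by the common difference until the next step would fall below 0.

```python
def final_result_by_subtraction(T, d):
    """Subtract d from T repeatedly and return the last result that is not below 0."""
    result = T
    while result - d >= 0:
        result -= d
    return result


def check_final_result(T, d, recorded):
    """The recorded final result passes the check when it equals the remainder of T / d."""
    return T % d == recorded


T, d1, d2 = 1000, 3, 4
final_A = final_result_by_subtraction(T, d1)
final_B = final_result_by_subtraction(T, d2)
print(final_A, final_B, check_final_result(T, d1, final_A), check_final_result(T, d2, final_B))
```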

Claims

1. A multiple-sequence, local-digital-feature positioning compression and decompression method, characterized in that: the compression target is regarded as a positive integer T, and each of N positioning sequences is added to or subtracted from T; the results form points on the number axis, and a point that is simultaneously a result of the addition or subtraction of each of the N sequences and T is a coincidence point; the local digital feature is used to perform valid point selection on the digits or characters at the specified positions of the number at each coincidence point, and if the coincidence point is a valid point, the valid coincidence count Y is incremented by 1;

After all coincidence points have gone through valid point selection, compression is complete, and the following are recorded: the value of Y, the positioning sequences, the local digital feature, and the final results of adding or subtracting all N positioning sequences and T;

The decompression process is the complete inverse of the compression process.

2. The multiple-sequence, local-digital-feature positioning compression and decompression method according to claim 1, characterized in that: the first term of each positioning sequence may be 0 or may be T.

3. The multiple-sequence, local-digital-feature positioning compression and decompression method according to claim 1 or 2, characterized in that: N is a positive integer.