WO2004051862A1

WO2004051862A1 - Automated method for compressing binary strings without information loss, and related automated method for decompressing

Info

Publication number: WO2004051862A1
Application number: PCT/IT2002/000761
Authority: WO
Inventors: Elena Leanza
Original assignee: Atop Innovation S.P.A.
Priority date: 2002-12-04
Filing date: 2002-12-04
Publication date: 2004-06-17
Also published as: AU2002368407A1

Abstract

The invention concerns an automated method for compressing without information loss, or lossless compression method, an input binary string S into an output binary string E, characterised in that it constructs a window F of length N for scanning the input binary string S, verifies whether within the window F at least one bit of value equal to b0 is present, and, in the positive, it updates the output binary string E by juxtaposing one bit of value equal to e0 and a binary address of the first bit of value equal to b0 within the window F, while in the negative it updates the output binary E by juxtaposing one bit of value equal to e1. The invention further concerns the related automated lossless decompression method. Moreover, the invention concerns the instruments necessary to perform the automated methods and to the apparatus performing the same.

Description

AUTOMATED METHOD FOR COMPRESSING BINARY STRINGS WITHOUT INFORMATION LOSS, AND RELATED AUTOMATED

METHOD FOR DECOMPRESSING.

* * * * * * * The present invention concerns an automated method for compressing binary strings without information loss, or lossless compression method, and related automated method for decompressing, which makes possible, in a simple, fast and inexpensively producible way, to minimise in a lossless way the dimension of binary strings, being particularly advantageous in the case when the binary strings have a predominant presence of values "1" or of values "0".

The present invention further concerns the instruments necessary to perform the automated methods and to the apparatus performing the same. It is known that one of the main problems of computer technology lies in the even increasing amount of data needed for processing. This causes a heavy occupation of computer memories and a consequent long occupation of telecommunications networks during information transmission, which involves higher transmission error probability. In order to solve such problem, many solutions have been developed, providing for a compression of data, represented in binary format, according to suitable encoding methods so as to reduce the memory occupation. Generally, such encoding methods are particularly advantageous when applied to specific types of data to be encoded, having specific characteristics. In other words, each individual encoding method is more or less advantageous depending on if it is applied, for instance, to data representing stationary images or to data representing video images or to data representing sounds. The compression methods can be classified according to the congruence between decompressed data and original data, dividing the concerned compression methods into methods having no information loss or "lossless" methods, in the case when the data reconstructed or decompressed from the compressed data are identical to the original ones, and methods having information losses or "lossy" methods, in the case when the reconstructed data lose a portion of the original data information. Obviously lossless methods are preferable and, in some specific applications when all the information must be kept, are the only ones that may be used. Examples of lossless methods are the Huffman encoding method and the Lempel-Ziv-Welch, or LZW, encoding method. Such methods have some drawbacks.

For instance, the Huffman method operates only on static strings, creating a corresponding dictionary for each string, while the LZW method operates on dynamic strings, creating a dynamically updated dictionary. Consequently, both said methods are complex and their computer execution is slow.

In the field of data compression, the search of improved encoding methods aimed at even more compressing the data to be stored and/or transmitted is being even more developed.

In particular, in many presently widely used applications, information is represented by number strings, in binary format, having the characteristic to be "rarefied", that is in which there is a predominant presence of bits of value "1" (or otherwise of value "0") with respect to the bits of value "0" (or otherwise of value "1"). In such kind of applications, it is further important to generate compressed data strings which maintain all the information content of original data (lossless compression) and which are suitably organised so as to be possibly further compressed by conventional compression methods.

In such case, specific compression methods have been developed, among which the method of Reed-Solomon offers the best performance.

However, as known to those skilled in the art, the Reed-Solomon method is particularly complex and, therefore, its computerised execution is slow.

It is an object of this invention, therefore, to provide an automated method for compressing binary strings without information loss, or lossless compression method, which is simple and fast executed by a computer, and which makes possible to minimise the number of bits needed to represent the number under consideration, particularly in the case when such binary strings have a predominant presence of values "1" or otherwise of values "0".

It is still object of this invention to provide such a compression method which generates compressed data strings which are suitably organised so as to be possibly further compressed by conventional compression methods.

It is specific subject matter of this invention an automated method for compressing, without information loss, or lossless compression method, an input string S having binary representation, comprising M bits, each bit assuming one of two binary values b₀ and bγ , into an output string E having binary representation, the bits of which assume one of two binary values e₀ and ei , the method comprising the following steps: A. initialising the output string E to the null, or void, string ""; B. initialising a pointer /^' to the address of the first bit of the input string S;

C. constructing a window F of length N, comprising one or more consecutive bits of the input string S, the first bit of the window F being the bit corresponding to the address of the pointer /;

D. verifying whether at least one bit of value equal to ό₀ is present within the window F

E. if the result of step D is positive:

E.1 updating the output string E by juxtaposing one bit of value equal to e₀ , and a binary address of the first bit of value equal to b_Q within the window F, and E.2 updating the pointer i to the address of the first bit of the string S following the first bit of value equal to b₀ of the window F;

F. if the result of step D is negative:

F.1 updating the output string E by juxtaposing one bit of value equal to βγ , and F.2 updating the pointer i to the address of the first bit of the string S following the last bit of the window F G. verifying whether the pointer / has a value higher than the one of the address of the -th bit of the string S; H. if the result of step G is negative, repeating steps C, D, E, F, e G.

Always according to the invention, step C may comprise the following sub-steps:

C.1 verifying that, starting from the bit corresponding to the address of the pointer i, the number Z of consecutive bits of the input string S is not lower than the length N of the window F;

C.2 if the result of sub-step C.1 is positive, constructing the window F with the N consecutive bits of the input string S starting from the bit corresponding to the address of the pointer /^'; C.3 if the result of sub-step C.1 is negative, constructing the window J by juxtaposing to the Z consecutive bits of the input string S, starting from the bit corresponding to the address of the pointer , (N-Z) bits of value equal to .

Still according to the invention, the method may comprise the following step: I. inserting in the output string E an information related to which of the two sub-steps C.2 or C.3 has been carried out. Furthermore according to the invention, the information inserted during step I may be the length M of the input string S. Preferably according to the invention, the length N of the window F is equal to an involution of 2:

N = 2" . Always according to the invention, the binary address of the first bit of value equal to b₀ within the window F, which is juxtaposed to the output string E, is equal to the distance of such bit from the first bit of the window F, that is from the bit corresponding to the address of the pointer i.

Still according to the invention, the method may be iterated at least once, so as to compress at each iteration the output string E obtained from the preceding iteration.

Preferably according to the invention, at each iteration the length N of the window F is not longer than the length used in the preceding iteration.

In a preferred embodiment of the invention, the binary value b₀ is equal to "1" and the binary value bγ is equal to "0". Furthermore according to the invention, step D may comprise the comparison of the numeric value of the window F with the value 0.

In an alternative embodiment of the invention, the binary value b₀ is equal to "0" and the binary value bγ is equal to "1".

Always according to the invention, the method may further comprise the following steps:

J. carrying out an inversion logic operation (logic NOT) on the string S to be compressed; K. inserting in the output compressed string E an information related to the inversion logic operation carried out during step J. In a preferred embodiment of the invention, the binary value e₀ is equal to "1" and the binary value ei is equal to "0".

In an alternative embodiment of the invention, the binary value e₀ is equal to "0" and the binary value e_λ is equal to "1".

It is further subject matter of this invention a computer, comprising processing means, characterised in that it is apt to execute the automated lossless compression method previously illustrated.

It is also subject matter of this invention a computer program characterised in that it comprises code means adapted to execute, when running on a computer, the automated lossless compression method previously illustrated.

It is still subject matter of this invention a Memory medium readable by a computer, storing a program, characterised in that the program is the computer program just described.

It is also subject matter of this invention an automated lossless method for decompressing an input string E having binary representation, the bits of which assume one of two binary values e₀ and ei , into an output string S having binary representation, each bit assuming one of two binary values b₀ and bγ , the method being characterised in that the input string E is obtained from the automated lossless compression method according to anyone of the claims from 1 to 15 and in that it comprises the following steps: L. initialising the output string S to the null string ""; M. initialising a pointer h to the address of the first bit of the input string E; N. verifying whether the bit of the input string E corresponding to the address of the pointer h is equal to the value e₀ ; O. if the result of step N is positive: P.1 reading from the input string £ a binary address represented by a first sub-string of n bits following the bit of the input string E corresponding to the address of the pointer h, P.2 updating the output string S by juxtaposing a second sub-string comprising a number of bits, ranging from 1 to N, depending on the value of said binary address, said second sub-string comprising only one bit of value equal to b₀ , placed in the last position of said second sub-string, and P.3 updating the pointer h to the address of the first bit of the input string E following said first sub-string; P. if the result of step Ν is negative:

Q.1 updating the output string S by juxtaposing a number N of bits of value equal to , and Q.2 updating the pointer h to the address of the first bit of the input string E following the bit corresponding to the address of the pointer h

R. verifying whether the pointer h has a value higher than the one of the address of the last bit of the input string E; S. if the result of step R is negative, repeating steps Ν, P, Q, e R.

It is further subject matter of this invention a computer, comprising processing means, characterised in that it is apt to execute the automated lossless decompression method just illustrated.

It is also subject matter of this invention a computer program characterised in that it comprises code means adapted to execute, when running on a computer, the automated lossless decompression method just illustrated.

It is still subject matter of this invention a Memory medium readable by a computer, storing a program, characterised in that the program is the computer program just described. The present invention will be now described, by way of illustration and not by way of limitation, according to its preferred embodiments, by particularly referring to the Figures of the enclosed drawings, in which:

Figures 1a-1m show an input string S to which a preferred embodiment of the automated method according to the invention is applied;

Figures 2a-2m show the output string E which is progressively constructed by the method applied to the string S of Figures 1a-1 m;

Figure 3 shows a flow chart of the preferred embodiment of the method applied to the string S of Figures 1a-1 m; Figure 4 shows a flow chart related to a first operation of the method of Figure 3;

Figure 5 shows a flow chart related to a second operation of the method of Figure 3;

Figures 6a-6p show an input string S to which a further preferred embodiment of the automated method according to the invention is applied;

Figures 7a-7p show the output string E which is progressively constructed by the method applied to the string S of Figures 6a-6p. With reference to Figures 1 a-1 m and to Figures 2a-2m, it is possible to observe that a preferred embodiment of the automated method according to the invention begins with a reading step to read the input string S, of length =19, to be compressed, an example of which is shown in Figure 1a. As shown in Figure 2a, the output compressed string E is initialised to the null, or void, string, "" and a pointer / is initialised to the address of the first bit of the string S.

As shown in Figure 1 , the method then constructs a window F of length N= 2, the first bit of which is the bit corresponding to the address of the pointer /^'. Since it is detected the presence of at least one bit of value equal to "1" within the window F (for instance by comparing the numeric value of F with the value 0), the method updates the compressed string E by juxtaposing to the same a flag of value "1" (which signals the presence of at least one bit of value equal to "1" within the window F) and the address of the first bit of value equal to "1" within the window F. In particular, since the length N of the window F is equal to 2, such address is represented by only one bit. Moreover, in the embodiment of the Figures, the address is equal to the distance of the first bit of value equal to "1" of the window F from the first bit of the window F (i.e. from the bit corresponding to the address of the pointer /). Therefore, as shown in Figure 2b, the method updates the compressed string E by juxtaposing one bit of value "1" (the flag of presence of at least one bit "1" within the window F) and one bit of value "0" (the address of the first bit of value "1" of the window F).

Afterwards, the method updates the pointer to the address of the first bit of the string S following the first bit of value "1" of the window F (which coincides with the first bit of the window F).

As shown in Figure 1c, the method then constructs a new window F starting from the bit corresponding to the address of the pointer /^' and it verifies whether at least one bit of value equal to "1" is present within the window F. In the case of Figure 1c such verification is negative and the method consequently updates the compressed string E by juxtaposing to the same a flag of value "0" (absence of bits of value equal to "1" within the window F), as shown in Figure 2c.

In such case, the method then updates the pointer /^' to the address of the first bit of the string S following the last bit of the window F and constructs a new window F (that is the window F is moved), as shown in Figure 1d. In the example of Figure 1d, the method updates la compressed string E by juxtaposing to the same a flag of value "0" (absence of bit of value equal to "1" within the window F), as shown in Figure 2d, and it updates the pointer /^' as shown in Figure 1e.

In the example of Figure 1e, the method detects the presence of at least one bit of value equal to "1" within the window F and, as shown in Figure 2e, it updates the compressed string E by juxtaposing to the same a flag of value "1" (presence of at least one bit of value equal to "1" within the window F) and the address of the first bit of value equal to "1" within the window F, such address being equal to the value "1" since it corresponds to the second bit of the window F. The method then updates the pointer / to the address of the first bit of the string S following the first bit of value "1" of the window F (which coincides with the second bit of the window F), as shown in Figure 1f.

In the example of Figure 1f, the method detects the presence of at least one bit of value equal to "1" within the window F and, as shown in Figure 2f, it updates the compressed string E by juxtaposing to the same a flag of value "1" (presence of at least one bit of value equal to "1" within the window F) and the address of the first bit of value equal to "1" within the window F, such address being equal to the value "0" since such first bit corresponds to the first bit of the window F (which has in this case both bits of value equal to "1"). The method then updates the pointer /^' to the address of the first bit of the string S following the first bit of value "1" of the window F (which coincides with the first bit of the window F), as shown in Figure 1g. On the basis of what described so far, the subsequent operations of the method shown in Figures 1g-1 m and in Figures 2g-2m are immediately understandable.

Purely by way of example, Figures 3-5 show the flow charts of the steps of the preferred embodiment of the method according to the invention, already described with reference to the previous Figures 1 a-1m

and 2a-2m. In particular, Figure 4 gives details of the flow chart related to

the operation for constructing the window F (represented in Figure 3 by

block 10), while Figure 5 gives details of the flow chart related to the

operation for updating the output compressed string E and the pointer /

(represented in Figure 3 by block 20). In the Figures 3-5, the symbol &

indicates the juxtaposition operator between strings, while the symbol z2

& a(t) t=z\ indicates the ordered juxtaposition operator of (z2- zl) bits a(t),

for t variable from zl to z2.

As it can be noticed in Figure 4, the operation for constructing the

window F preliminarily verifies that the remaining bits of the string S are in

a sufficient number to construct a new window F of length F, providing in

the negative for the padding of the last bits of the window F with a tail of

bits of value "0". In such case, the method may advantageously provide

for the insertion in the output compressed string E of an information related

to such padding of values "0", for instance by storing the actual length M

of the input string S.

Obviously the length N of the window F may be different from 2, and

may advantageously be equal to an involution of 2:

N = 2ⁿ

in such a way that the representation of the address of the first

bit of value equal to "1" within the window F can be represented with fully

significant n bits. By way of example, with reference to Figures 6a-6p and to Figures 7a-7p, it may be observed the preferred embodiment of the automated method according to the invention wherein the length N of the window F is equal to 2² = 4 , so that the address of the bits within the window F may be represented by n=2 bits. In Figures 6a-6p there are shown the input string S, the subsequent windows F constructed by the method, and the pointer / subsequently updated by the method. In Figures 7a-7p is shown the output compressed string E progressively constructed by the method. In the light of what previously described, such Figures are immediately understandable.

The method according to the invention may be iterative, preferably using at each iteration a length N of the window F not longer than the one used in the preceding iteration, even more preferably using a length N of the window F shorter than the one used in the preceding iteration. In the example 7p, the output compressed string E coincides with the string S shown in Figure 1 : therefore, in case the compressed string E is subjected to a further iteration of the method using a length N of the window F equal to 2, the output string shown in Figure 2m would be obtained.

The advantages offered by the application of the automated method according to the invention appear evident, particularly in the case when the string S to be compressed has a predominance of bits equal to "0" with respect to bits equal to "l". Moreover, in the case when the string S has a predominance of bits equal to "1", the method is similarly applicable by inverting the binary symbols between them with respect to what described so far, or, alternatively, by carrying out an inversion logic operation (logic NOT) on the string S to be compressed and inserting in the output compressed string E an information related to this operation, such as for example a flag that indicates whether said logic NOT has been carried out or not.

The preferred embodiments have been above described and some modifications of this invention have been suggested, but it should be understood that those skilled in the art can make other variations and changes, without so departing from the related scope of protection, as defined by the following claims.

Claims

CLAIMS 1. Automated method for compressing without information loss, or lossless compression method, an input string S having binary representation, comprising M bits, each bit assuming one of two binary values b₀ and bγ , into an output string E having binary representation, the bits of which assume one of two binary values e₀ and e_l , the method comprising the following steps:

A. initialising the output string E to the null, or void, string "";

B. initialising a pointer i to the address of the first bit of the input string S C. constructing a window F of length N comprising one or more consecutive bits of the input string S, the first bit of the window F being the bit corresponding to the address of the pointer /^'; D. verifying whether at least one bit of value equal to b₀ is present within the window F; E. if the result of step D is positive:

E.1 updating the output string E by juxtaposing one bit of value equal to e₀ , and a binary address of the first bit of value equal to b₀ within the window F, and E.2 updating the pointer /^' to the address of the first bit of the string S following the first bit of value equal to b₀ of the window F;

F. if the result of step D is negative:

F.1 updating the output string E by juxtaposing one bit of value equal to ei , and F.2 updating the pointer i to the address of the first bit of the string S following the last bit of the window F; G. verifying whether the pointer i has a value higher than the one of the address of the -th bit of the string S; H. if the result of step G is negative, repeating steps C, D, E, F, e G.

2. Method according to claim 1 , characterised in that step C comprises the following sub-steps:

C.1 verifying that, starting from the bit corresponding to the address of the pointer , the number Z of consecutive bits of the input string S is not lower than the length N of the window F; C.2 if the result of sub-step C.1 is positive, constructing the window

F with the N consecutive bits of the input string S starting from the bit corresponding to the address of the pointer /^'; C.3 if the result of sub-step C.1 is negative, constructing the window F by juxtaposing to the Z consecutive bits of the input string S, starting from the bit corresponding to the address of the pointer i, (N-Z) bits of value equal to bγ .

3. Method according to claim 2, characterised in that it further comprises the following step:

I. inserting in the output string E an information related to which of the two sub-steps C.2 or C.3 has been carried out.

4. Method according to claim 3, characterised in that the information inserted during step I is the length M of the input string S.

5. Method according to anyone of the preceding claims, characterised in that the length N of the window F is equal to an involution of 2:

N = 2^M 6. Method according to anyone of the preceding claims, characterised in that the binary address of the first bit of value equal to b₀ within the window F, which is juxtaposed to the output string E, is equal to the distance of such bit from the first bit of the window F, that is from the bit corresponding to the address of the pointer i.

7. Method according to anyone of the preceding claims, characterised in that it is iterated at least once, so as to compress at each iteration the output string E obtained from the preceding iteration.

8. Method according to claim 7, characterised in that at each iteration the length N of the window F is not longer than the length used in the preceding iteration.

9. Method according to claim 8, characterised in that at each iteration the length N of the window F is shorter than the length used in the preceding iteration.

10. Method according to anyone of the preceding claims, characterised in that the binary value b₀ is equal to "1" and the binary value bγ is equal to "0".

11. Method according to claim 10, characterised in that step D comprises the comparison of the numeric value of the window F with the value 0.

12. Method according to anyone of the claims from 1 to 9, characterised in that the binary value b₀ is equal to "0" and the binary value b is equal to "1".

13. Method according to anyone of the preceding claims, characterised in that it further comprises the following steps:

J. carrying out an inversion logic operation (logic NOT) on the string S to be compressed; K. inserting in the output compressed string E an information related to the inversion logic operation carried out during step J.

14. Method according to anyone of the preceding claims, characterised in that the binary value e₀ is equal to "1" and the binary value ei is equal to "0".

15. Method according to anyone of the claims from 1 to 13, characterised in that the binary value e₀ is equal to "0" and the binary value e_x is equal to "1".

16. Computer, comprising processing means, characterised in that it is apt to execute the automated lossless compression method according to anyone of the preceding claims from 1 to 15.

17. Computer program characterised in that it comprises code means adapted to execute, when running on a computer, the automated lossless compression method according to anyone of the preceding claims from 1 to 15.

18. Memory medium readable by a computer, storing a program, characterised in that the program is the computer program according to claim 17.

19. Automated lossless method for decompressing an input string E having binary representation, the bits of which assume one of two binary values e_Q and ei , into an output string S having binary representation, each bit assuming one of two binary values b₀ and bγ , the method being characterised in that the input string E is obtained from the automated lossless compression method according to anyone of the claims from 1 to 15 and in that it comprises the following steps: L. initialising the output string S to the null string "";

M. initialising a pointer h to the address of the first bit of the input string E; N. verifying whether the bit of the input string E corresponding to the address of the pointer h is equal to the value e₀ ; O. if the result of step N is positive: P.1 reading from the input string E a binary address represented by a first sub-string of n bits following the bit of the input string E corresponding to the address of the pointer h, P.2 updating the output string S by juxtaposing a second sub-string comprising a number of bits, ranging from 1 to N depending on the value of said binary address, said second sub-string comprising only one bit of value equal to b₀ , placed in the last position of said second sub-string, and P.3 updating the pointer h to the address of the first bit of the input string E following said first sub-string; P. if the result of step N is negative:

Q.1 updating the output string S by juxtaposing a number N of bits of value equal to bγ , and Q.2 updating the pointer h to the address of the first bit of the input string E following the bit corresponding to the address of the pointer h R. verifying whether the pointer h has a value higher than the one of the address of the last bit of the input string E S. if the result of step R is negative, repeating steps Ν, P, Q, e R.

20. Computer, comprising processing means, characterised in that it is apt to execute the automated lossless decompression method according to claim 19.

21. Computer program characterised in that it comprises code means adapted to execute, when running on a computer, the automated lossless decompression method according to claim 19.

22. Memory medium readable by a computer, storing a program, characterised in that the program is the computer program according to claim 21.