WO2001029972A1 - Data file processing method - Google Patents

Data file processing method

Info

Publication number
WO2001029972A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
given
file
count
setting
Prior art date
Application number
PCT/EP2000/010178
Other languages
French (fr)
Inventor
Enzo Criscione
Original Assignee
Pulsar Dna, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pulsar Dna, Inc. filed Critical Pulsar Dna, Inc.
Priority to AU77883/00A priority Critical patent/AU7788300A/en
Publication of WO2001029972A1 publication Critical patent/WO2001029972A1/en

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • the present invention relates generally to data conversion and, in particular, to lossless data compression and decompression.
  • Lossless compression denotes a compression scheme in which the decompressed or reconstructed data exactly matches the original file.
  • Lossless compression techniques include such techniques as arithmetic coding, entropy coding, Huffman coding, predictive coding, Universal coding, and Ziv-Lempel coding.
  • Arithmetic coding is a technique that maps source sequences to intervals between a pair of numbers.
  • Entropy coding involves fixed-to-variable length coding of statistically independent source symbols.
  • Huffman coding schemes map source symbols to binary codewords of integral length such that no codeword is a prefix of a longer codeword.
  • Predictive coding is a form of coding where a prediction is made for the current event based on previous events and the error in prediction is transmitted.
  • the JPEG standard, which finds widespread use on the Internet, is based on a linear predictive coding technique.
  • Universal coding is a form of coding that is designed without knowledge of source statistics but that converges to an optimal code as the source sequence length increases.
  • Ziv-Lempel coding is a family of dictionary coding-based schemes that encode strings of symbols by sending information about their location in a dictionary. Thus, for example, the LZ77 family uses a portion of the already encoded string as a dictionary, and the LZ78 family builds a dictionary of strings that are encountered in the data stream.
  • the LZ77-based schemes are used commercially to create zip files, while the LZ78-based schemes find widespread use, for example, in GIF compression of image data.
  • a compression ratio of a given compression scheme is defined as the ratio of the size of the original data to the size of the compressed data.
  • Current data compression algorithms are capable of up to 10:1 lossless compression in true-fidelity scenarios, in which none of the original data may be sacrificed during the reduction in size; 25:1 lossy compression in high-fidelity scenarios (largely audio, video, and still photography), in which some of the original data may be omitted while still preserving the operative sensory elements of the data; and up to 200:1 lossless compression of sparse files, i.e., files with extremely large amounts of repetitive, unnecessary, or blank data, such as facsimile transmissions.
  • It is another primary object of this invention to provide a novel compression/decompression technique that has widespread software and hardware applications.
  • Another primary object of the present invention is to provide a lossless compression/decompression technique that works as an add-on to lossy compression methods.
  • a further primary object of this invention is to provide a lossless data compression coder-decoder (a "codec") that may be implemented in software or hardware and that may be used to facilitate very large scale data compression for numerous applications including, among others, telecommunications, aerospace, commercial, industrial, and consumer applications.
  • the compressed version of the digital file is a representation of the amount of time required to process the digital file against a given mathematical function, wherein the term "time" may also indicate the number of times that a certain loop has to be executed.
  • the size of the file determines how long the file will take to be processed by the given function.
  • a value of the timer, together with any optional ancillary information, is then converted into a representation of the file.
  • the digital file is later reconstructed from the representation by applying an inverse of the given function.
  • the given function includes the following iterative steps.
  • a bit string comprising the file is decremented against a given value (e.g., 1).
  • a test is made to determine whether the result equals an end value (e.g., 0). If not, the decrementing step is repeated against the result of the first iteration. The result is then tested to determine whether the end value has been reached. This process is then repeated for however long is required to reach the end value.
  • the time, which may represent the actual time taken by the whole process or equivalent information, such as the number of times that the process has been repeated, is then encoded as a representation of the file, together with optional ancillary data, as will become clear in the description.
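The iterative steps above can be reduced to a short Python sketch. This is illustrative only: the function names are ours, and the loop count stands in for the elapsed "time" of the description.

```python
def compress_by_count(value: int, step: int = 1, end: int = 0) -> int:
    """Decrement the bit string (treated as an integer) by `step` until
    it reaches `end`, counting the iterations; the final count is the
    "time-based" representation of the file."""
    count = 0
    while value != end:
        value -= step
        count += 1
    return count


def decompress_by_count(count: int, step: int = 1, end: int = 0) -> int:
    """Apply the inverse function: start from the end value and add
    `step` back once per counted iteration."""
    value = end
    for _ in range(count):
        value += step
    return value


# A small "file": the bit string 1011010 read as the integer 90.
original = 0b1011010
count = compress_by_count(original)
assert decompress_by_count(count) == original
```

Note that with step = 1 and end = 0 the saved count equals the integer value of the bit string itself; the later examples introduce the index n and ancillary data, which change this relationship.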
  • Figure 1 is a flowchart of a preferred routine for processing a file according to the present invention;
  • Figure 2 is a flowchart of a preferred routine for reprocessing the file that has been compressed according to the routine of Figure 1;
  • Figure 3 is a flowchart of a more detailed embodiment for compressing a file according to the present invention;
  • Figure 4 is a flowchart of a more detailed embodiment for reprocessing the file that has been compressed according to the routine of Figure 3;
  • Figure 5 illustrates a preferred chip-based implementation of the inventive routines;
  • Figure 6 is a flowchart of another preferred routine for processing a file according to the present invention.
  • Figure 7 is a flowchart of a preferred routine for reprocessing the file that has been processed according to the routine of Figure 6;
  • Figure 8 is a flowchart of a preferred routine for compressing a file according to the present invention, wherein the file is first processed by using a lossy compression method;
  • Figure 9 is a flowchart of a preferred routine for reprocessing the file that has been compressed according to the routine of Figure 8.

Ways of carrying out the Invention
  • FIG. 1 illustrates the basic operation of the inventive compression method.
  • Figure 2 illustrates a corresponding decompression routine.
  • the compression routine begins at step 100 to input the file A to be compressed. Regardless of its size, the file comprises a bit string: the sequence of bits in the string univocally identifies a finite, positive, integer value.
  • the file may be of any given format, e.g., sound, video, image, data, or the like.
  • the file may also represent a compressed form of a file, e.g., a zip file.
  • the file to be compressed is read.
  • a count is initiated. A convenient way to increment the count is to initiate a timer from zero and count given time increments at a given rate.
  • a test is then performed at step 108 to determine whether the file, as represented by the bit string, is equal to a given end value. For illustrative purposes, the given value is zero, although this is not a limitation of the present invention.
  • the routine continues at step 110 to apply a given function to the file.
  • the given function decrements the file, namely, the bit string, by the index n.
  • the routine branches to step 112 to save the then-current value of the count.
  • the then-current value of the count represents the compressed version of the file.
  • Figure 2 is a flowchart illustrating a preferred routine for decompressing the file that is compressed according to the routine of Figure 1.
  • the routine begins at step 200 by reading the then-current value of the count that was saved at step 112.
  • a variable "count" is set equal to "-count" and then incremented at the same given rate that was used during the compression routine.
  • the count is a timer that begins at a certain negative value and increments, at the given rate, to a final value.
  • the inventive compression routine (and its complementary decompression routine) process the digital file in a temporal or "time-based" manner.
  • the compressed version of the file is a representation (preferably an encoded numeric value) of the amount of time required to process the digital file against a given mathematical function.
  • a timer is incremented at a given rate.
  • a value of the timer is then saved as a representation or encoding of the original file.
  • during decompression, the file is rebuilt by applying the inverse function, A = A + 1, at each count increment.
  • in this example, the index n has a value of 1.
  • the decrementing function is carried out at each iteration by reducing the value of the bit string by n = 1.
  • This function may be readily implemented (at each iteration) by simply complementing the least significant bit (LSB) of the bit string and complementing any higher order bit that has all of its lower significant bits equal to 0.
  • This operation is a conventional down counter.
  • the process is repeated for as many iterations as are necessary to decrement the bit string to the given end value, e.g., zero.
  • the time taken for this process is the value that represents the "compressed" version of the original digital file.
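The down-counter operation just described can be sketched directly on a list of bits (LSB first). This is an illustrative sketch with names of our choosing, not the patent's circuit.

```python
def decrement_bits(bits: list[int]) -> list[int]:
    """One down-counter step as described: complement the LSB, and
    complement every higher-order bit whose lower-order bits are all 0
    (i.e., propagate the borrow).  bits[0] is the LSB."""
    out = bits[:]
    for i in range(len(out)):
        out[i] ^= 1
        if bits[i] == 1:   # the borrow stops at the first original 1
            break
    return out


def count_down(bits: list[int]) -> int:
    """Repeat the decrement until the bit string reaches zero,
    counting the iterations (the "time" value)."""
    count = 0
    while any(bits):
        bits = decrement_bits(bits)
        count += 1
    return count


assert decrement_bits([0, 0, 1]) == [1, 1, 0]   # 100b -> 011b
assert count_down([1, 0, 1]) == 5               # 101b (5) takes 5 steps
```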
  • Example 2
  • Figure 3 is a flowchart of a preferred compression routine using conventional computational resources.
  • the routine begins at step 300 by reading an input file A (in effect, a bit string of arbitrary length) that is desired to be compressed.
  • the count is incremented at a given rate.
  • the given rate may represent the clock speed of the processor being used to compress the file.
  • the routine then enters a given first processing loop as follows.
  • at step 306, a test is performed to determine whether the value equals a predetermined value, e.g., zero.
  • the file A may be partitioned such that all but the least significant bits (the "end of file" or EOF) are processed through this loop. If the outcome of the test at step 306 is positive, the routine continues at step 308.
  • control branches out of the first loop to step 312.
  • a test is performed to determine whether the value of A (or A') has reached the given end value.
  • the values of the index may be selected in any convenient manner; one technique for selecting these values is now described.
  • the index n is selected by identifying a number of bytes then in the file, dividing the number of bytes by two, and then counting the number of given bit values (e.g., 1's) that are in a given half of the file.
  • the index may be generated by dividing by two the number of bytes in the original file and then counting the number of 1's in either half.
  • the same calculation is then performed on the value B, which is generated in step 318.
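The index-selection step can be sketched as follows. The choice of the first half of the file is our assumption, since the text allows either half, and the function name is ours.

```python
def select_index(data: bytes) -> int:
    """Select the index n as described: divide the byte count by two
    and count the 1-bits in that half of the file (here, the first half)."""
    half = data[: len(data) // 2]
    return sum(bin(byte).count("1") for byte in half)


assert select_index(b"\xff\x00") == 8  # first half is 0xFF: eight 1-bits
```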
  • the compression routine was used to generate a compressed version of two audio files, file A1 being a 45-Kbyte .wav file and file A2 being a 75-Kbyte .wav file.
  • Figure 4 is a flowchart illustrating how the compressed file representation of equation (1) is decompressed to obtain the original file A.
  • the routine begins at step 400 by retrieving the compressed file representation.
  • the routine gets a next index ni.
  • t is equal to the first time value t1 that was saved during the compression routine;
  • n is equal to the first index value n1 used during the compression routine.
  • a test is run to determine whether the time value -ti is then equal to a given end value, e.g., "0".
  • the result of the test at step 406 indicates that the time value has reached the given end value (namely, "0").
  • the index-time value pair (n2, t2) is processed during the second iteration; during the third iteration, the index-time value pair (n3, t3) is processed, and so forth.
  • the outcome of the test at step 410 is negative.
  • control then branches to step 414 to save the value A as the original file (or its majority portion, if the file was partitioned).
  • the compression and decompression routines may be asymmetric, with the compression routine taking a longer period of time to execute (in real time) as compared with the decompression function. Primarily, this time differential is due to the fact that some additional temporal overhead is accumulated each time B is set equal to A (step 308) during the compression routine. This operation, as illustrated above, is not required during the decompression routine.
  • Example 3
  • FIG. 6 illustrates a further example of the inventive compression method.
  • Figure 7 illustrates the corresponding decompression routine.
  • the compression routine begins at step 601, where the digital file A to be compressed is read. Regardless of its size, the file comprises a bit string, and the sequence of bits in the string univocally identifies a finite, positive, integer value, identified with the same label A.
  • at step 603, the main loop is entered, and the aforementioned arithmetic computation is carried out, setting B = A - N.
  • a test is then performed at step 604 to determine whether the current value of B is lower than the current value of N, in which case the routine continues at step 605. On the contrary, if B is greater than or equal to N, the routine branches to step 606, where the values of B, N and i, taken back by one cycle, are saved as the compressed version of original file A.
  • Each of the three elements, B, N and i identifies a portion of the compressed file, which elements, for illustrative purposes, have been named in block 610 as L (for Left), R (for Right), and T (for Times).
  • the saved data may be augmented with additional ancillary information, e.g., a file name, a timestamp, an identifier of the processor used to execute the routine, the count rate, or the like.
  • any equivalent computation can be selected and carried out by one skilled in the art to set the condition that makes the routine branch out of the main loop. If the output of the test at step 604 indicates that B is lower than N, the routine continues at step 605, where the counter i is incremented, the new value of A is set equal to the current value of N, and the new value of N is set equal to the current value of B. The whole cycle is then repeated until B is no longer lower than N.
  • Figure 7 is a flowchart illustrating a preferred routine for decompressing the file that is compressed according to the routine of Figure 6.
  • the routine begins at step 701 by initiating a count i and reading the values L, R and T that were saved at step 610.
  • the main loop is started.
  • a temporary value of A is set equal to the sum of the current values identified by L and R. It is then checked, at step 703, whether the current value of the count i is equal to the value T, indicating the number of times that the loop must be repeated. If so, the current value of Ai matches the value of the original file A: file A is then fully decompressed and the process ends at step 705. If the count i is not equal to T, then it is incremented, at step 704, and the cycle is repeated.
  • any different data file comprises a bit string which identifies a unique, finite, positive and integer value
  • each value can be obtained by starting from any lower value and iteratively applying a certain mathematical function. Particularly, each value can be obtained by starting from two different lower values and processing them for a certain time, or a certain number of times.
  • the present invention further adds the "time" dimension to the data processing.
  • the original file A i.e. the sequence of 0 and 1 bits comprised in its bit string, identifies the number 100. Starting from the two values 10 and 14, and applying a sum function for 5 times, the number 100 is obtained.
  • the compression phase substantially comprises the same steps described above, applying, however, the opposite of the chosen function. In this case, the compression routine iteratively performs subtractions, on the original file first and on temporary data next.
  • the coefficient value can be taken to any desired decimal place, according to the performance that one wants to achieve and to the size of data file A. In doing so, the first value of N (or R in the final representation of the compressed file), which must be an integer, is subtracted from A to obtain the first value of B (L in the final representation of the compressed file); then A is set equal to N and the process is iterated until B is no longer lower than N.
  • several other tests can be performed to stop the process, for example checking whether one of the two values B or N is lower than a preset value, or stopping the iteration after a certain number of times or a certain amount of time.
  • it would also be possible to save the data at step 605 (or 606), adding one more iteration to the decompression process.
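The Example 3 routines can be sketched as follows. The golden-ratio splitting coefficient is an assumption consistent with the text's mention of a coefficient taken to a chosen decimal place, the function names are ours, and the sketch assumes the value is large enough that at least one cycle runs.

```python
def compress(value: int, ratio: float = (1 + 5 ** 0.5) / 2) -> tuple:
    """Figure 6 sketch: split the value via the coefficient, then
    iterate B = A - N, shifting A <- N, N <- B while B < N.  When the
    loop stops, the previous cycle's (B, N) pair and the loop count
    are saved as (L, R, T) -- the values "taken back by one cycle"."""
    a = value
    n = round(a / ratio)        # first split; must be an integer
    times = 0
    prev = None                 # assumes at least one cycle runs
    while True:
        b = a - n
        if b >= n:              # branch-out condition of step 604
            break
        prev = (b, n)
        a, n = n, b
        times += 1
    left, right = prev
    return left, right, times


def decompress(left: int, right: int, times: int) -> int:
    """Figure 7 sketch: rebuild the value by iterated addition,
    A = L + R, shifting the pair forward T times."""
    for _ in range(times):
        left, right = right, left + right
    return right


assert compress(100) == (4, 10, 5)
assert decompress(4, 10, 5) == 100
```

Under this sketch the value 100 is saved as (L, R, T) = (4, 10, 5); the pair (10, 14) cited in the text appears as an intermediate cycle, four further additions short of 100, so the exact counting convention of the worked example is an open reading.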
  • Example 4: It has been found that the inventive method described in the above preferred embodiments can also be applied effectively as an add-on to lossy compression algorithms. As already described, lossy compression algorithms, which are mainly used for picture and sound data, reduce the original data while still maintaining good video or sound quality.
  • the present invention can further overcome the problems deriving from the loss of data involved in the use of such lossy techniques by applying the same inventive concepts described above in parallel with lossy algorithms.
  • An example is shown in the flowcharts of Figures 8 and 9.
  • the digital file A to be compressed is read.
  • a lossy compression algorithm is applied to the file, for instance any known compression method, like JPEG, JPEG2000, MPEG, MP3, MP4, DIVX, or any other proprietary lossy compression algorithm.
  • the file so obtained is saved as A', as shown in block 811.
  • A' is then decompressed using the dual of the chosen lossy algorithm, and the resulting data, Alossy, is then subtracted from A, so as to obtain file A".
  • the same process described in Example 3 with regard to digital file A is now performed on file A", steps 804 to 808 corresponding to steps 602 to 606.
  • the corresponding values of B, N and i are then saved in the already described form now shown in block 812.
  • the processed version 810 of original file A is then represented by a first part A' 811 , obtained by compressing original file A in a lossy manner, and a second part A" 812.
  • the resulting file is significantly smaller than the original file; moreover, the whole process now results in no loss of data, even if a lossy algorithm is used in part of the process.
  • Figure 9 is a flowchart illustrating a preferred routine for decompressing the file that is compressed according to the routine of Figure 8.
  • the routine begins at step 901 by reading the lossy-compressed version of the file and decompressing it, thus obtaining an uncompressed version of the file, Alossy, affected, however, by loss of data.
  • the decompression process described with regard to example 3 and Figure 7 is performed, steps 903 to 906 corresponding to steps 701 to 704.
  • when the routine branches out of the main loop at step 907, the current version of A" is added to the previously obtained uncompressed version of A, labelled Alossy.
  • the sum Alossy + A" corresponds bit by bit to original file A.
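The add-on scheme of Figures 8 and 9 can be sketched with a toy quantizer standing in for the lossy codec (JPEG, MP3, and the like). The quantization step and function names are our assumptions; only the residual structure follows the text.

```python
def lossy_compress(samples: list[int], step: int = 16) -> list[int]:
    """Toy lossy stage standing in for JPEG, MP3, etc.: quantize each
    sample to the nearest multiple of `step`."""
    return [round(s / step) for s in samples]


def lossy_decompress(quantized: list[int], step: int = 16) -> list[int]:
    """Dual of the toy lossy stage."""
    return [q * step for q in quantized]


def compress_with_residual(samples: list[int], step: int = 16):
    """Figure 8 sketch: A' is the lossy-compressed file; A'' is the
    residual A - Alossy, which the patent would itself compress losslessly."""
    a_prime = lossy_compress(samples, step)
    a_lossy = lossy_decompress(a_prime, step)
    residual = [s - l for s, l in zip(samples, a_lossy)]
    return a_prime, residual


def decompress_with_residual(a_prime, residual, step: int = 16):
    """Figure 9 sketch: decompress A' and add the residual A'' back;
    the sum matches the original file exactly."""
    a_lossy = lossy_decompress(a_prime, step)
    return [l + r for l, r in zip(a_lossy, residual)]


samples = [3, 17, 250, 128, 91]
a_prime, residual = compress_with_residual(samples)
assert decompress_with_residual(a_prime, residual) == samples
```

This residual (difference-coding) pattern makes the overall pipeline lossless regardless of which lossy codec produces A'.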
  • Example 5: The inventive concepts described in the present application can be used not only in compression/decompression methods, but also in encryption/decryption systems.
  • the resulting, processed files are completely unreadable at the end of the process, and can only be reconstructed by reprocessing the file against the function that was chosen for the compression on a certain time basis.
  • the above-described functionality may be implemented in many different ways, and the present invention is not limited to any particular implementation.
  • One convenient implementation of the invention is in software executable in a processor, namely, as a set of instructions (program code) in a code module resident, for example, in the random access memory of the computer.
  • a preferred implementation is a dedicated device, such as an integrated circuit chip, that includes its own on-board processor, memory and system clock. In such case, the compression and decompression routines are stored in firmware or the like.
  • Figure 5 illustrates a simplified configuration of one such device.
  • the chip 500 includes a processor 502, system memory 504, firmware 506 (comprising the compression /decompression routines), and a system clock 508.
  • the device is connectable to a computer motherboard, for example, via the link 510.
  • Link 510, for example, may be an RS232 or USB link, or the like.
  • the inventive routines may also be implemented in other known hardware, such as a digital signal processor (DSP), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC) chip, a microcontroller, hardwired logic, or the like.
  • the codec of the present invention offers virtually unlimited lossless, content-insensitive compression.
  • the applications for the codec are quite diverse. The following are several representative applications, although this description should not be taken to limit the scope of the present invention.
  • the codec may be used as a substitute for existing archival algorithms.
  • Once the codec is ported from software source code to a dedicated integrated circuit, it can be used to retrofit existing storage media with unprecedented usable capacity and allow enormous amounts of data to be transmitted across existing telecommunications and data communications infrastructures.
  • the hardware codec may also be paired with custom-tailored operating systems that manage memory structures, with custom hooks into codec-driven storage devices and codec-enabled networking hardware.
  • Electronic Software Distribution (ESD): By using the codec for file compression (as a product is only packaged for ESD once, the time needed for file compression would not be a problem) and wrapping a freely-distributable decoder into the ESD package, any software company could sell its products on-line, regardless of file size.
  • Web Browser Acceleration: With a codec-enabled web browser (through a Netscape plug-in or ActiveX control) and a web site posting compressed HTML pages, JPEGs, and GIFs, bringing up a page would be hundreds of times faster.
  • Data packets are transmitted in a raw form over a local Gigabit Ethernet or ATM LAN, then compressed right before entering a slow-speed transmission medium (i.e. between the last router and its CSU/DSU).
  • an identical box decodes the data stream.
  • full-motion streaming video and audio would not only be possible but commonplace.
  • the device could be used for both Intranet (that is, connecting multiple sites within a company) and Internet connectivity - the latter requiring that one's ISP use the device.
  • the codec may also be used to implement an Acceleration Interface: an IC-based traffic accelerator for the DASD (Direct Access Storage Device, or hard drive) communication channel.
  • ROM Compression: The hardware codec would also make practical an entire new generation of portable devices built from the ground up with huge storage capacities: handheld libraries, wallet-size medical IDs with a copy of your extended family's medical history and your DNA fingerprint on-board, portable audio players with a true-fidelity copy of a large number of CDs in current release, and feather-light laptops booting an operating system such as Windows NT from ROM.
  • NVRAM Storage Devices: By combining the size, speed and portability of existing flash media, such as the Sony Memory Stick, with codec-enabled motherboards, a vastly improved device is created. Zip disks and other "high capacity" removable storage devices would not be required.
  • Medical Applications: The following are exemplary: (a) enabling the real-time transmission of MRI and X-ray images, the most complex loss-sensitive digital images in the world, currently compressed using the CCITT algorithms, over any connection; (b) medical monitoring equipment would be able to keep a complete signal history at a very high sample rate, rather than having to dump data histories to printers or overwrite them; and (c) replicated databases of patient information could be kept at every hospital in the world, rather than being haphazardly transmitted after emergencies or transfers.
  • New Operating Systems: Operating systems developers may write code that uses the codec's on-board logic to multiply and manipulate system memory structures with an efficiency never before possible.
  • one skilled in the art understands that it is sufficient to reapply the same method, taking A' as a starting point, so as to obtain A", i.e., a compressed version of A', which is in turn a compressed version of A.
  • the process may be repeated until the size of Ak' is satisfactory.
  • the source digital file A to be processed can conveniently be partitioned into several smaller parts, A1 ... An, and the processing method can then be separately applied to each of these parts.
  • the resulting, compressed file would be a concatenation of the outcome of the processing method applied to each partition of the file.
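The partitioning step can be sketched as follows; the part size and names are illustrative, and `process` stands in for whichever processing routine is applied per part.

```python
def partition(data: bytes, part_size: int) -> list[bytes]:
    """Split the source file A into smaller parts A1 ... An."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def process_partitioned(data: bytes, part_size: int, process) -> list:
    """Apply the processing method separately to each part; the
    compressed file is the concatenation of the per-part results."""
    return [process(part) for part in partition(data, part_size)]


assert partition(b"abcdefg", 3) == [b"abc", b"def", b"g"]
```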

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Very large compression ratios are obtained by processing a digital file in a temporal or 'time-based' manner, wherein 'time' may also indicate the number of times that a certain loop has to be executed. In a preferred embodiment, a compressed version of the file is a representation of the amount of time required to process the digital file against a given function. As the digital file is processed against the given function, a count is incremented at a given rate. The size of the file determines how long the file will take to be processed by the given function. When the processing is complete, a value of the count is then converted into a representation of the file. The digital file is later reconstructed by applying an inverse of the given function.

Description

DATA FILE PROCESSING METHOD
Technical Field
The present invention relates generally to data conversion and, in particular, to lossless data compression and decompression.
Background Art
Data compression is a well-defined art. Thus, for example, lossless compression, as the name implies, denotes a compression scheme in which the decompressed or reconstructed data exactly matches the original file. Lossless compression techniques include such techniques as arithmetic coding, entropy coding, Huffman coding, predictive coding, Universal coding, and Ziv-Lempel coding. Arithmetic coding is a technique that maps source sequences to intervals between a pair of numbers. Entropy coding involves fixed-to-variable length coding of statistically independent source symbols. Huffman coding schemes map source symbols to binary codewords of integral length such that no codeword is a prefix of a longer codeword. Predictive coding is a form of coding where a prediction is made for the current event based on previous events and the error in prediction is transmitted. The JPEG standard, which finds widespread use on the Internet, is based on a linear predictive coding technique. Universal coding is a form of coding that is designed without knowledge of source statistics but that converges to an optimal code as the source sequence length increases. Ziv-Lempel coding is a family of dictionary coding-based schemes that encode strings of symbols by sending information about their location in a dictionary. Thus, for example, the LZ77 family uses a portion of the already encoded string as a dictionary, and the LZ78 family builds a dictionary of strings that are encountered in the data stream. The LZ77-based schemes are used commercially to create zip files, while the LZ78-based schemes find widespread use, for example, in GIF compression of image data.
While such techniques are well-known in the art, their performance, when measured in terms of compression ratio, is surprisingly modest. A compression ratio of a given compression scheme is defined as the ratio of the size of the original data to the size of the compressed data. Current data compression algorithms are capable of up to 10:1 lossless compression in true-fidelity scenarios, in which none of the original data may be sacrificed during the reduction in size; 25:1 lossy compression in high-fidelity scenarios (largely audio, video, and still photography), in which some of the original data may be omitted while still preserving the operative sensory elements of the data; and up to 200:1 lossless compression of sparse files, i.e., files with extremely large amounts of repetitive, unnecessary, or blank data, such as facsimile transmissions.
There remains a long-felt need in the art to provide new compression and decompression schemes that provide significantly higher compression ratios as compared to the known prior art.
The present invention solves this problem.
Disclosure of the invention
It is a primary object of the present invention to provide a lossless data compression scheme with millicompression (1000:1) or higher compression ratios, irrespective of the size of the source file.
It is another primary object of this invention to provide a novel compression/decompression technique that has widespread software and hardware applications.
Another primary object of the present invention is to provide a lossless compression/decompression technique that works as an add-on to lossy compression methods. A further primary object of this invention is to provide a lossless data compression coder-decoder (a "codec") that may be implemented in software or hardware and that may be used to facilitate very large scale data compression for numerous applications including, among others, telecommunications, aerospace, commercial, industrial, and consumer applications. It is yet another object of the present invention to provide a data file processing technique that may be used as an effective encryption/decryption method.
These and other objects of the present invention are achieved by processing a digital file to be compressed in a time-based or "temporal" manner. In one preferred embodiment, the compressed version of the digital file is a representation of the amount of time required to process the digital file against a given mathematical function, wherein the term "time" may also indicate the number of times that a certain loop has to be executed. As the digital file is processed against the given function, a count (e.g., a timer) is incremented at a given rate. The size of the file determines how long the file will take to be processed by the given function.
When the processing is complete, a value of the timer, together with any optional ancillary information, is then converted into a representation of the file. The digital file is later reconstructed from the representation by applying an inverse of the given function.
In a preferred embodiment, the given function includes the following iterative steps. A bit string comprising the file is decremented against a given value (e.g., 1). After decrementing the bit string against the given value, a test is made to determine whether the result equals an end value (e.g., 0). If not, the decrementing step is repeated against the result of the first iteration. The result is then tested to determine whether the end value has been reached. This process is then repeated for however long is required to reach the end value. The time, which may represent the actual time taken by the whole process or equivalent information, such as the number of times that the process has been repeated, is then encoded as a representation of the file, together with optional ancillary data, as will become clear in the description.
The foregoing has outlined some of the more pertinent objects and features of the present invention. These objects should be construed to be merely illustrative of some of the more prominent features and applications of the invention. Many other beneficial results can be attained by applying the disclosed invention in a different manner or modifying the invention as will be described. Accordingly, other objects and a fuller understanding of the invention may be had by referring to the following Detailed Description of the Preferred Embodiments.
Brief Description of the Drawings
For a more complete understanding of the present invention and the advantages thereof, reference should be made to the following Detailed Description taken in connection with the accompanying drawings in which: Figure 1 is a flowchart of a preferred routine for processing a file according to the present invention;
Figure 2 is a flowchart of a preferred routine for reprocessing the file that has been compressed according to the routine of Figure 1; Figure 3 is a flowchart of a more detailed embodiment for compressing a file according to the present invention;
Figure 4 is a flowchart of a more detailed embodiment for reprocessing the file that has been compressed according to the routine of Figure 3; and Figure 5 illustrates a preferred chip-based implementation of the inventive routines;
Figure 6 is a flowchart of another preferred routine for processing a file according to the present invention;
Figure 7 is a flowchart of a preferred routine for reprocessing the file that has been processed according to the routine of Figure 6; Figure 8 is a flowchart of a preferred routine for compressing a file according to the present invention, wherein the file is first processed by using a lossy compression method;
Figure 9 is a flowchart of a preferred routine for reprocessing the file that has been compressed according to the routine of Figure 8.
Ways of carrying out the Invention
For the sake of clarity, in the following preferred embodiments the processing method will be referred to mainly as a compression method, and the reprocessing method as a decompression method. This is only for illustrative purposes: it should be clear to those skilled in the art that "compression" may be substituted by the word "encryption" and "decompression" by the word "decryption", as clearly illustrated in Example 5. Example 1. Figure 1 illustrates the basic operation of the inventive compression method. Figure 2 illustrates a corresponding decompression routine. The compression routine begins at step 100 to input the file A to be compressed. Regardless of its size, the file comprises a bit string: the sequence of bits in the string univocally identifies a finite, positive, integer value. The file may be of any given format, e.g., sound, video, image, data, or the like. The file may also represent a compressed form of a file, e.g., a zip file. At step 102, the file to be compressed is read. At step 104, an index n is set. For illustrative purposes, the index n = 1. At step 106, a count is initiated. A convenient way to increment the count is to initiate a timer from zero and count given time increments at a given rate. A test is then performed at step 108 to determine whether the file, as represented by the bit string, is equal to a given end value. For illustrative purposes, the given end value is zero, although this is not a limitation of the present invention. If the output of the test at step 108 indicates that the file is not equal to the given end value, the routine continues at step 110 to apply a given function to the file. In the illustrative example, the given function decrements the file, namely, the bit string, by the index n. In this example, the given function sets A = A - 1. Control then returns back to step 108.
When the outcome of the test at step 108 indicates that the file has reached the given end value, the routine branches to step 112 to save the then-current value of the count. The then-current value of the count represents the compressed version of the file. If desired, the then-current value of the count may be augmented with additional information, e.g., a file name, a timestamp, an identifier of the processor used to execute the routine, the count rate, or the like. Figure 2 is a flowchart illustrating a preferred routine for decompressing the file that is compressed according to the routine of Figure 1. To decompress the file (and thus reconstruct the file A), the routine begins at step 200 by reading the then-current value of the count that was saved at step 112. At step 202, the index "n" is set to the same value that was set at step 104 during the compression routine. In the illustrative example, step 202 sets n = 1. At step 204, a variable "count" is set equal to "-count" and then incremented at the same given rate that was used during the compression routine. Thus, in the example, the count is a timer that begins at a certain negative value and increments, at the given rate, to a final value. At step 206, a test is performed to determine whether the count has reached the final value. If not, the routine continues at step 208 to set A equal to a given function, in this case, A = A + 1. Control then returns to step 206 to test whether the result is equal to the final value (namely, zero). When the outcome of the test at step 206 is positive, the routine branches to step 210 to save the value A as the original file that was compressed. This completes the basic processing of the routine.
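The Figure 1 and Figure 2 loops can be sketched as follows. This is a minimal illustration rather than the patented implementation: it counts loop iterations instead of reading a hardware timer, so with n = 1 and an end value of zero the saved count simply equals the integer value of the bit string.

```python
def compress(a: int, n: int = 1) -> int:
    """Figure 1: apply A = A - n until the end value 0 is reached,
    incrementing a count at each iteration; the final count is saved
    as the compressed representation of the file."""
    count = 0
    while a != 0:        # step 108: test against the given end value
        a -= n           # step 110: apply the given function
        count += 1       # step 106: the count increments at a given rate
    return count         # step 112: save the then-current count

def decompress(count: int, n: int = 1) -> int:
    """Figure 2: run the inverse function A = A + n for the same count."""
    a = 0
    while count != 0:    # step 206: count back up to the final value
        a += n           # step 208: inverse of the given function
        count -= 1
    return a             # step 210: save A as the original file
```

With the illustrative n = 1, decompress(compress(A)) reproduces A exactly, as required for lossless operation.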
As can be seen, in a preferred embodiment, the inventive compression routine (and its complementary decompression routine) processes the digital file in a temporal or "time-based" manner. In particular, the compressed version of the file is a representation (preferably an encoded numeric value) of the amount of time required to process the digital file against a given mathematical function. As the digital file is processed against the given function, a timer is incremented at a given rate. Ideally, the size of the file determines how long the file will take to be processed by the given function (e.g., A = A - 1). When the processing is complete, a value of the timer is then saved as a representation or encoding of the original file. Ideally, the original file is then later reconstructed (i.e., decompressed) from the representation itself by applying an inverse of the given function (in this example, A = A + 1) over the same time period. One of ordinary skill in the art will appreciate that if the same index "n" and the same count rate are used for the respective compression and decompression routines, the file A generated by the decompression routine will exactly match the original file input to the compression routine. Thus, the inventive routine provides lossless compression. In the particular embodiment of Figures 1-2, the index n has a value of 1. In this case, the decrementing function is carried out at each iteration by reducing the value of the bit string by 1. This function may be readily implemented (at each iteration) by simply complementing the least significant bit (LSB) of the bit string and complementing any higher-order bit that has all of its lower significant bits equal to 0. This operation is a conventional down counter. As noted above, during the compression routine, if the result of applying the function is non-zero, the process is repeated for as many iterations as are necessary to decrement the bit string to the given end value, e.g., zero. As described above, the time taken for this process is the value that represents the "compressed" version of the original digital file. Example 2.
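The LSB-complement decrement just described can be sketched as a ripple-borrow down counter (an illustrative sketch; width is the length of the bit string):

```python
def down_count(bits: int, width: int) -> int:
    """Decrement a width-bit string by 1: complement the LSB, and also
    complement every higher-order bit all of whose lower significant
    bits are 0 (i.e., the borrow ripples up until it meets a 1)."""
    mask = 1
    result = bits
    for _ in range(width):
        result ^= mask   # complement this bit position
        if bits & mask:  # the original bit here was 1: the borrow stops
            break
        mask <<= 1
    return result
```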
The routines illustrated in Figures 1-2 are illustrative techniques in which the value of the index "n" does not vary during the compression (or decompression) and is a given value, such as 1. Given the constraints of existing hardware and software, one of ordinary skill in the art will appreciate that the time that may be required to apply the given function (e.g., A = A - 1) to an arbitrarily long bit string comprising the file A may be quite lengthy. Thus, according to the present invention, it may be desired to vary the index "n" throughout the given compression routine in the manner now described in Figure 3. Figure 3 is a flowchart of a preferred compression routine using conventional computational resources. In this example, the routine begins at step 300 by reading an input file A (in effect, a bit string of arbitrary length) that is desired to be compressed. At step 301, a variable i = 1 is set. This variable is used for indexing purposes as will be seen. At step 302, an index n = n_i is set. The routine then continues at step 303, wherein a count t = t_i is set. This value represents a given start value (e.g., a time zero). At step 304, the count is incremented at a given rate. Thus, for example, the given rate may represent the clock speed of the processor being used to compress the file. The routine then enters a given first processing loop as follows.
A test is performed at step 306 to determine whether the file A (or some substantial portion thereof) is greater than a predetermined value, e.g., zero. As indicated in the flowchart, the file A may be partitioned such that all but the least significant bits (the "end of file" or EOF) are processed through this loop. If the outcome of the test at step 306 is positive, the routine continues at step 308 to set a variable B = A. During step 308, the count t_i is paused. At step 310, the routine then applies the given function, namely, A = A - n_i, to generate a result. Control then returns to step 306 and the process repeats. During this processing loop, of course, the count continues to increment. When the result of the test at step 306 is negative, e.g., because the value of A (or A | EOF, as the case may be) is now a negative number (as a result of applying the function in step 310), control then branches out of the first loop to step 312.
At step 312, a test is performed to determine whether the value of A (or A | EOF, as the case may be) is equal to a given end value (in this example, 0). If not, the routine continues at step 314 to save a time value t_1 (in the first iteration) corresponding to the index n_1. Thus, at step 314, the index-time value pair (n_1, t_1) is saved in any convenient location (e.g., in a register, in system memory, on disk, or the like). At step 316, the variable i is then indexed to i + 1. Following step 316, the value A is set equal to the value B. This is step 318. Control then returns back to step 302 and the process repeats again for the new value of the index n. In each next iteration, the count preferably is started anew.
Thus, during a second and any subsequent iteration, other index-time value pair(s) (n_i, t_i) are generated and saved. When, however, the result of the test at step 312 indicates that the value of A is now equal to the given end value, the routine branches to step 320. At this step, the routine saves all of the index-time value pair(s) as the representation of the original data file.
Thus, for example, if the given file A went through (i = 3) iterations of the processing loop, the resulting "compressed" file would be represented as follows (where | is a concatenation operator):
{name (optional) | other identifying data (optional) | (n_1, t_1) | (n_2, t_2) | (n_3, t_3) | EOF (optional)} (1)
or
{name (optional) | other identifying data (optional) | (n_1, t_1) | EOF (optional)} (2)
This completes the compression routine.
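The Figure 3 routine can be sketched as follows. This is a hedged illustration: iteration counts t_i stand in for the timer readings, and the sequence of indices n_i is supplied by the caller (it should end with 1 so the loop can terminate exactly at the end value).

```python
def compress(a, indices):
    """Figure 3, with iteration counts standing in for timer values.
    Returns the list of index-time value pairs (n_i, t_i)."""
    pairs = []
    for n in indices:                # step 302: set the next index n_i
        t = 0                        # step 303: start the count anew
        b = a
        while a > 0:                 # step 306: test A against zero
            b = a                    # step 308: remember the last value
            a -= n                   # step 310: apply the given function
            t += 1
        if a == 0:                   # step 312: end value reached
            pairs.append((n, t))
            return pairs             # step 320: save all the pairs
        pairs.append((n, t - 1))     # step 314: overshot, drop the last step
        a = b                        # step 318: restore A, try the next index
    raise ValueError("index sequence exhausted before the end value")
```

For example, compress(100, [30, 1]) yields [(30, 3), (1, 10)], i.e., 100 = 3*30 + 10*1.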
Although the values of the index may be selected in any convenient manner, one technique for selecting these values is now described. In a given iteration, the index n is selected by identifying the number of bytes then in the file, dividing the number of bytes by two, and then counting the number of given bit values (e.g., 1's) that are in a given half of the file. Thus, for the first iteration, the index may be generated by dividing by two the number of bytes in the original file and then counting the number of 1's in either half. For the second iteration, the same calculation is then performed on the value B, which is generated in step 318. In an optimal scheme, if the value of the file is odd, then at least one value (e.g., in the last iteration) of the index "n" is 1. If the value of the file is even, then at least one value (e.g., in the last iteration) of the index "n" is 2. In a representative example, the compression routine was used to generate a compressed version of two audio files, file A_1 being a 45-Kbyte .wav file and file A_2 being a 75-Kbyte .wav file. Using two iterations, with n_1 = 500 and n_2 = 1, the resulting compressed files were approximately the same size, about 110-120 bytes. The time values in each index-time value pair, of course, were different due to the difference in the original size of the files.
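The index-selection heuristic described above might be sketched as follows. This is an assumption-laden illustration: the text does not specify which half of the file is used, so the first half is taken here.

```python
def select_index(file_bytes: bytes) -> int:
    """Divide the byte count by two and count the 1-bits in that half."""
    half = file_bytes[: len(file_bytes) // 2]   # a given half of the file
    return sum(bin(b).count("1") for b in half)
```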
One of ordinary skill in the art will appreciate that, regardless of the size of the original file, the resulting compressed version of the file will generally reduce to a relatively small data file that merely encodes the information set forth in equation (1) above. Thus, for example, the inventive compression routine of Figure 3 produced the following results (unless otherwise noted, all files are in bytes):
Sample    Beginning Size    Ending Size
1         72,405            102
2         80,302            115
3         2,276,345         123
4         334 Mbytes        117
Figure 4 is a flowchart illustrating how the compressed file representation of equation (1) is decompressed to obtain the original file A.
The routine begins at step 400 by retrieving the compressed file representation. At step 401, the routine sets the variable i = 1. At step 402, the routine gets a next index n_i. The routine then continues at step 404 to set a count t_i = -t_i. Thus, in the first iteration of the decompression routine, t_i is equal to the first time value t_1 that was saved during the compression routine, and n_i is equal to the first index value n_1 used during the compression routine. At step 406, a test is run to determine whether the time increment -t_i is then equal to a given end value, e.g., "0" (or 0 | EOF, as the case may be).
If not, the routine continues at step 408 to set a value A = A + n_i (which, in the first iteration, is equal to A = A + n_1). Control then returns to step 406. When the result of the test at step 406 indicates that the time value has reached the given end value (namely, "0" or 0 | EOF), control then branches to step 410. At this step, a test is performed to determine whether there are any more time values "t_i" remaining to be processed. If so, the routine sets i = i + 1 at step 412. Control then returns to step 402 and the process repeats with the next index. Thus, in the second iteration, the index-time value pair (n_2, t_2) is processed, during the third iteration, the index-time value pair (n_3, t_3) is processed, and so forth. When there are no more index-time value pairs to process, the outcome of the test at step 410 is negative. Control then branches to step 414 to save the value A as the original file (or the majority portion, if the file is actually A | EOF). This completes the decompression processing.
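In the same iteration-count form (a sketch using iteration counts in place of negative timer values), the Figure 4 routine reduces to running A = A + n_i for t_i steps per saved pair:

```python
def decompress(pairs):
    """Figure 4: for each saved index-time value pair (n_i, t_i),
    apply the inverse function A = A + n_i for t_i iterations."""
    a = 0
    for n, t in pairs:      # steps 402-412: process each pair in turn
        for _ in range(t):  # steps 404-406: count -t_i back up to zero
            a += n          # step 408: inverse of the given function
    return a                # step 414: save A as the original file
```

Applied to the pairs [(30, 3), (1, 10)], the routine reconstructs 3*30 + 10*1 = 100.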
In an illustrative embodiment, the compression and decompression routines may be asymmetric, with the compression routine taking a longer period of time to execute (in real time) as compared with the decompression function. Primarily, this time differential is due to the fact that some additional temporal overhead is accumulated each time B is set equal to A (step 308) during the compression routine. This operation, as illustrated above, is not required during the decompression routine. Example 3.
Figure 6 illustrates a further example of the inventive compression method. Figure 7 illustrates the corresponding decompression routine. The compression routine begins at step 601, where the digital file A to be compressed is read. Regardless of its size, the file comprises a bit string, and the sequence of bits in the string univocally identifies a finite, positive, integer value, identified with the same label A. At step 602, an index i is set to a start value, for instance i = 0. This index is used to identify each temporary version of three variables, named, respectively, B, N and A again, that store temporary values used by the inventive compression process. These variables are linked together by the mutual relation B_i = A_i - N_i. At start-up 602, A_0 is set equal to A, while N_0 is set equal to A multiplied by a positive coefficient f, which coefficient is preferably less than 1. Summarising, at step 602: A_0 = A and N_0 = A * f.
At step 603, the main loop is entered, and the aforementioned arithmetic computation is carried out, setting

B_i = A_i - N_i
A test is then performed at step 604 to determine whether the current value of B is lower than the current value of N, in which case the routine continues at step 605. On the contrary, if B is greater than or equal to N, the routine branches to step 606, where the values of B, N and i are saved, taken back by one cycle, as the compressed version of the original file A. Each of the three elements, B, N and i, identifies a portion of the compressed file, which elements, for illustrative purposes, have been named in block 610 as L (for Left), R (for Right), and T (for Times). If desired, the saved data may be augmented with additional ancillary information, e.g., a file name, a timestamp, an identifier of the processor used to execute the routine, the count rate, or the like. Of course, any equivalent computation can be selected and carried out by those skilled in the art to set the condition that makes the routine branch out of the main loop. If the output of the test at step 604 indicates that B is lower than N, the routine continues at step 605, where the counter i is incremented, the new value of A is set equal to the current value of N, and the new value of N is set equal to the current value of B. The whole cycle is then repeated until B is no longer lower than N. Figure 7 is a flowchart illustrating a preferred routine for decompressing the file that is compressed according to the routine of Figure 6. To decompress the file (and thus reconstruct the file A), the routine begins at step 701 by initiating a count i and reading the values L, R and T that were saved at step 610. At step 702, the main loop is started. At each cycle, a temporary value of A is set equal to the sum of the current values identified by L and R. It is then checked, at step 703, whether the current value of the count i is equal to the value T indicating the number of times that the loop must be repeated.
If so, the current value of A_i matches the value of the original file A: file A is then fully decompressed and the process ends at step 705. If the count i is not equal to T, then it is incremented, at step 704, and the cycle is repeated.
It shall be clear to those skilled in the art that one of the inventive concepts at the basis of the present invention lies in the following considerations: a) any different data file comprises a bit string which identifies a unique, finite, positive and integer value; b) each value can be obtained by starting from any lower value and iteratively applying a certain mathematical function. Particularly, each value can be obtained by starting from two different lower values and processing them for a certain time, or a certain number of times.
Contrary to any existing compression algorithm, therefore, the present invention further adds the "time" dimension to the data processing.
Given this revolutionary starting point, a main problem that the person skilled in the art has to face is how to determine an optimum function to be iteratively applied and how to determine the parameters to be used by said function.
From a technical point of view, the fastest mathematical operations that can be performed by a computer are addition and subtraction, while any other operation, including multiplication and division, is much heavier in terms of required processor time. The preferred embodiment here described, therefore, makes use of subtraction in the compression phase and addition in the decompression phase.
Particularly, the preferred embodiment here described is based on the principle that any integer value can be obtained by starting from two numbers and iteratively summing the second term to the sum of the two terms, as explained in the following scheme:

count   L  +  R  =  A
i=0      4 + 10  = 14
i=1     10 + 14  = 24
i=2     14 + 24  = 38
i=3     24 + 38  = 62
i=4     38 + 62  = 100
The original file A, i.e. the sequence of 0 and 1 bits comprised in its bit string, identifies the number 100. Starting from the two values 10 and 14, and applying a sum function 5 times, the number 100 is obtained. The compressed file, therefore, is identified by L=10, R=14, T=5 (or 4, if the count starts at 0, as often happens), and the decompression algorithm simply performs "add" operations to decompress, i.e. re-obtain, the original file. It is clear, therefore, that the compression phase substantially comprises the same steps described above, applying, however, an opposite of the chosen function. In this case, the compression routine iteratively performs subtractions on the original file first and on temporary data next.
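The addition scheme above can be reproduced in a few lines (a minimal sketch; the five additions correspond to the rows i=0 through i=4):

```python
# start from the two values L=4 and R=10 and sum five times
l, r = 4, 10
for _ in range(5):
    l, r = r, l + r   # the new term is the sum of the previous two
# r now holds the value identified by the original file A
```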
A major problem that the person skilled in the art has to face, however, is how to start the compression algorithm, i.e. how to determine the first value to be subtracted from the value identified by the original data file A.
It has been found that an optimum solution is given by multiplying A by a coefficient f, wherein f is the so-called "ratio of the golden section", well known from mathematics, which identifies the irrational number 0.6180339887... .
Therefore, the compression process as described in Example 3 would start from the value A=100, the original data file, and compute the first value of N as A * 0.618. The coefficient value can be taken up to any desired decimal place, according to the performance that one wants to achieve, and also according to the size of data file A. In so doing, the first value of N (or R in the final representation of the compressed file), which must be an integer, is then subtracted from A to obtain the first value of B (L in the final representation of the compressed file); then A is set equal to N and the process is iterated until B is no longer lower than N. However, it will be clear to those skilled in the art that several other tests can be performed to stop the process, for example checking whether one of the two values B or N is lower than a preset value, or stopping the iteration after a certain number of times or a certain amount of time.
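The Figure 6 and Figure 7 routines can be sketched as follows. This is a hedged illustration: rounding N = A * f to the nearest integer is assumed (matching the 100 * 0.618 -> 62 step in the worked example), and the record saved is the one taken back by one cycle, as described above.

```python
F = 0.618  # truncated ratio of the golden section

def compress(a):
    """Figure 6: iterate (A, N, B) -> (N, B, A - N) while B < N, then
    save the record one cycle back as (L, R, T) = (B, N, i)."""
    n = round(a * F)       # step 602: N_0 = A * f (rounding assumed)
    b = a - n              # step 603: B = A - N
    prev = (b, n, 0)
    i = 0
    while b < n:           # step 604: continue while B is lower than N
        prev = (b, n, i)   # remember this cycle; step 606 saves it
        i += 1             # step 605: increment the counter ...
        a, n = n, b        # ... and set A = N, N = B
        b = a - n
    return prev            # (L, R, T), taken back by one cycle

def decompress(l, r, t):
    """Figure 7: repeatedly set A = L + R, shifting the pair, T times."""
    i, a = 0, l + r        # steps 701-702: A = L + R
    while i != t:          # step 703: repeat until the count equals T
        i += 1             # step 704
        l, r = r, a
        a = l + r
    return a               # step 705: file A fully reconstructed
```

For the worked example of the description, compress(100) returns (4, 10, 4) and decompress(4, 10, 4) returns 100; because the reverse recurrence exactly undoes the forward one, the round-trip is lossless for any non-negative integer regardless of the rounding of f.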
The following table shows an example of the processing algorithm:
i=0   A=100    N=A*0.618=62   B=A-N=38
i=1   A=N=62   N=B=38         B=A-N=24
i=2   A=N=38   N=B=24         B=A-N=14
i=3   A=N=24   N=B=14         B=A-N=10
i=4*  A=N=14   N=B=10         B=A-N=4
i=5   A=N=10   N=B=4          B=A-N=6
At step i=5, the end condition (B no longer lower than N) is met; therefore, data taken from the record marked with an asterisk, namely B=4, N=10 and i=4 (or 5, if one wants to save the number of times that the decompressing loop is cycled instead of the value of the count), are concatenated and saved as the compressed version of the file. Of course, it would also be possible to save the data at step i=5 (or 6), adding one more iteration in the decompression process. Example 4. It has been found that the inventive method described in the above preferred embodiments is also effectively applied as an add-on to lossy compression algorithms. It has already been described that lossy compression algorithms, which are mainly used for picture and sound data, operate a reduction on the original data, still maintaining, however, a good video or sound quality.
The present invention can further overcome all the problems deriving from the loss of data involved in the use of such lossy techniques by applying the same inventive concepts described above in parallel to lossy algorithms. An example is shown in the flowcharts of Figures 8 and 9.
Starting at step 801, the digital file A to be compressed is read. At step 802, a lossy compression algorithm is applied to the file, for instance any known compression method, like JPEG, JPEG2000, MPEG, MP3, MP4, DIVX, or any other proprietary lossy compression algorithm. At step 803, the file so obtained is saved as A', as shown in block 811. A' is then decompressed using the dual of the chosen lossy algorithm, and the resulting data, A_lossy, is then subtracted from A, so as to obtain file A". At step 804, the same process described in Example 3 with regard to digital file A is now performed on file A", steps 804 to 808 corresponding to steps 602 to 606. The corresponding values of B, N and i are then saved in the already described form, now shown in block 812. The processed version 810 of original file A is then represented by a first part A' 811, obtained by compressing original file A in a lossy manner, and a second part A" 812. The resulting file is significantly smaller than the original file; moreover, the whole process now results in no loss of data, even if a lossy algorithm is used in part of the process.
Figure 9 is a flowchart illustrating a preferred routine for decompressing the file that is compressed according to the routine of Figure 8. To decompress the file (and thus reconstruct the file A), the routine begins at step 901 by reading the version of the file compressed in a lossy manner, thus obtaining an uncompressed version of the file, A_lossy, affected, however, by loss of data. Then, at step 903, the decompression process described with regard to Example 3 and Figure 7 is performed, steps 903 to 906 corresponding to steps 701 to 704. When the routine branches out of the main loop, at step 907, the current version of A" is added to the previously obtained uncompressed version of A, labelled as A_lossy. The sum A_lossy + A" corresponds bit by bit to original file A. Example 5. Finally, it shall be noted that the inventive concepts described in the present application can be used not only for compression/decompression methods, but also as encryption/decryption systems. In fact, the resulting, processed files are completely unreadable at the end of the process, and can only be reconstructed by reprocessing the file against the function that was chosen for the compression on a certain time basis.
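The Figure 8/Figure 9 scheme of Example 4 can be sketched end to end as follows. This is an assumption-heavy illustration: a toy "lossy" step (discarding the low 8 bits of the value) stands in for a real codec such as JPEG or MP3, and the Example 3 routine is applied to the residual A''.

```python
F = 0.618  # truncated ratio of the golden section

def compress(a):                    # Example 3 forward routine (sketch)
    n = round(a * F)
    b = a - n
    prev, i = (b, n, 0), 0
    while b < n:
        prev = (b, n, i)
        i += 1
        a, n = n, b
        b = a - n
    return prev                     # (L, R, T), taken back by one cycle

def decompress(l, r, t):            # Example 3 inverse routine (sketch)
    i, a = 0, l + r
    while i != t:
        i += 1
        l, r = r, a
        a = l + r
    return a

def lossy_compress(a):              # toy stand-in for JPEG, MP3, etc.
    return a >> 8                   # discard the low 8 bits of the value

def lossy_decompress(c):
    return c << 8

def wrap(a):
    """Figure 8: save A' (the lossy part) and the residual A'' = A - A_lossy."""
    a1 = lossy_compress(a)                   # steps 802-803: A'
    a2 = a - lossy_decompress(a1)            # residual A'' (non-negative here)
    return a1, compress(a2)                  # steps 804-808: process A''

def unwrap(a1, lrt):
    """Figure 9: A_lossy + A'' reconstructs A bit for bit (step 907)."""
    return lossy_decompress(a1) + decompress(*lrt)
```

Even though the lossy step throws information away, the stored residual restores it, so the round-trip is exact.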
Generalizing, the above-described functionality may be implemented in many different ways, and the present invention is not limited to any particular implementation. One convenient implementation of the invention is in software executable in a processor, namely, as a set of instructions (program code) in a code module resident, for example, in the random access memory of the computer. In addition, although the various methods described are conveniently implemented in a general purpose computer selectively activated or reconfigured by software, one of ordinary skill in the art would also recognize that such methods may be carried out in hardware, in firmware, or in more specialized apparatus constructed to perform the required method steps. A preferred implementation is a dedicated device, such as an integrated circuit chip, that includes its own on-board processor, memory and system clock. In such a case, the compression and decompression routines are stored in firmware or the like. Figure 5 illustrates a simplified configuration of one such device. In this example, the chip 500 includes a processor 502, system memory 504, firmware 506 (comprising the compression/decompression routines), and a system clock 508. The device is connectable to a computer motherboard, for example, via the link 510. Link 510, for example, may be an RS-232 link, a USB link, or the like. When a given file A is desired to be compressed, the file is provided to the chip 500 in the usual manner. The resulting compressed file representation is then returned.
The inventive routines may also be implemented in other known hardware such as a digital signal processor (DSP), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC) chip, a microcontroller, in hardwired logic, or the like.
The codec of the present invention offers virtually unlimited lossless, content-insensitive compression. As a consequence, the applications for the codec are quite diverse. The following are several representative applications, although this description should not be taken to limit the scope of the present invention.
Even in its software-only form, the codec may be used as a substitute for existing archival algorithms. Once the codec is ported from software source code to a dedicated integrated circuit, it can be used to retrofit existing storage media with unprecedented usable capacity and allow enormous amounts of data to be transmitted across existing telecommunications and data communications infrastructures. The hardware codec may also be paired with custom-tailored operating systems that manage memory structures, with custom hooks into codec-driven storage devices and codec-enabled networking hardware. Software Applications
The following are areas in which the codec may be employed in software under existing operating systems.
• Electronic Software Distribution (ESD): HTTP- and FTP-based distribution of software has exploded with the rise of e-commerce. Using the codec for file compression (as a product is only packaged for ESD once, the time needed for file compression would not be a problem) and wrapping up a freely-distributable decoder in the ESD package, any software company could sell its products on-line, regardless of file size.
• Web Browser Acceleration: With a codec-enabled web browser (through a Netscape plug-in or ActiveX control) and a web site posting compressed HTML pages, JPEGs, and GIFs, bringing up a page would be hundreds of times faster. This application would work for static pages, applets, static frames, images, and prerecorded streaming audio and video. Moreover, because the codec would not alter fundamental Web content, it could be offered as a third-party add-in to existing web servers and supplied as a free add-in for existing web browsers.
Hardware Applications
Once ported to an integrated circuit or other such hardware device with an integrated processor capable of real-time compression, the codec provides additional improvements in performance. The chip has applications in both the data communications vector and the data storage vector.
Data Communication Vector
• WAN Acceleration Appliance: One application of the codec is in a hardware "black box" (chip and custom firmware) on both ends of a wide-area data transmission. Data packets are transmitted in a raw form over a local Gigabit Ethernet or ATM LAN, then compressed right before entering a slow-speed transmission medium (i.e., between the last router and its CSU/DSU). At the other end, an identical box decodes the data stream. Using the present invention, full-motion streaming video and audio would not only be possible but commonplace. The device could be used for both Intranet (that is, connecting multiple sites within a company) and Internet connectivity, the latter requiring that one's ISP use the device.
The codec may also be used to implement an Acceleration Appliance, an end-node version with a lower-end processor to handle a certain volume of traffic in real time, and an Analog Acceleration Appliance, targeted at asynchronous Internet connections to local POPs.
• LAN/WAN Acceleration Chip: The integration of the hardware codec directly into routers, switches, and network interface cards would be quite useful. Soldering the chip to hardware and codec-enabling the firmware would (a) reduce the overall cost of implementing the codec; (b) enable "codec to desktop" through integration onto NICs and within switch/router backplanes; and (c) simplify internetworking architectures by reducing the number of devices involved. The IC would have such a small physical footprint that it could easily be implemented in a wide range of Ethernet-compatible interfaces, from laptop PC Cards to 802.11 wireless ports to 1000Base-T NICs.
• RS-232/RS-422 Acceleration Module: While data transmission using serial protocols will always have physical speed limits, placing a host-powered codec "dongle" between the serial port and the cable medium at both ends would have a dramatic impact on data collection and instrumentation monitoring in manufacturing environments, eliminating the bottlenecks due to old-school transmission media.
• Satellite Communications Accelerator: As with the similar products above, this would codec-enable both orbiting reconnaissance, cable, and telecommunications satellites and out-of-orbit research satellites. The time needed to transmit celestial images, sports simulcasts, and other photos would be negligible; when used on earthbound wireless media, mobile phone fidelity would be perfect due to a higher sampling frequency, and wireless CDPD data connections, currently restricted to 19.2 Kbps, could be brought up to current leased-line speeds.

Data Storage Vector
• DASD Interface: The DASD Interface would be an IC-based traffic accelerator for the DASD (Direct Access Storage Device, or hard drive) communication channel.
• DASD Compression: Codec-enabled motherboards would allow OS storage drivers to pass all data through the codec before it hits the DASD, and decode the data automatically on the way back.
• ROM/NVRAM Retrofit: With the addition of a hardware codec, the existing chips in cell phones, Palm OS handhelds, digital watches - any device currently equipped with non-volatile memory of any kind - would have an effectively limitless off-line storage capacity and a real-time data access capacity limited only by their working RAM.
• ROM Compression: The hardware codec would also make practical an entire new generation of portable devices built from the ground up with huge storage capacities: handheld libraries, wallet-size medical IDs with a copy of your extended family's medical history and your DNA fingerprint on-board, portable audio players with a true-fidelity copy of a large number of CDs in current release, and feather-light laptops booting an operating system such as Windows NT from ROM.
• NVRAM Storage Devices: By combining the size, speed and portability of existing flash media such as the Sony Memory Stick with codec-enabled motherboards, a vastly improved device is created. Zip disks and other "high capacity" removable storage devices would not be required.
• Digital Tape Compression: Codec compression for long-term archiving is another application. By embedding the codec IC into DLT or DDS tape drive hardware, a business can keep decade-long digital histories on a single tape.
• Medical Applications: The following are exemplary: (a) enabling the real-time transmission of MRI and X-ray images, the most complex loss-sensitive digital images in the world, currently compressed using the CCITT algorithms, over any connection; (b) medical monitoring equipment would be able to keep a complete signal history at a very high sample rate, rather than having to dump data histories to printers or overwrite them; and (c) replicated databases of patient information could be kept at every hospital in the world, rather than being haphazardly transmitted after emergencies or transfers.
• New Operating Systems: Operating systems developers may write code that uses the codec's on-board logic to multiply and manipulate system memory structures with an efficiency never before possible.
It has been shown that the present method fulfils the proposed aim and objects. Clearly, several modifications will be apparent to and can readily be made by those skilled in the art without departing from the scope of the present invention. Therefore, the scope of the claims shall not be limited by the illustrations or the preferred embodiments given in the description in the form of examples, but rather the claims shall encompass all of the features of patentable novelty that reside in the present invention, including all the features that would be treated as equivalents by those skilled in the art. For example, it is clear that the data processing algorithms can be applied iteratively even during the same processing of a digital file A. For instance, the compression process applied to a certain digital file A may come to an end resulting in a compressed file A' that is deemed not sufficiently small.
In this case, those skilled in the art understand that it is sufficient to reapply the same method taking A' as a starting point so as to obtain A", i.e. a compressed version of A', which is in turn a compressed version of A. The process may be repeated until the size of the k-th compressed version A(k) is satisfactory. Similarly, it is clear to those skilled in the art that the source digital file A to be processed can conveniently be partitioned into several smaller parts A_1, ..., A_n, and that the processing method can then be applied separately to each of these parts. The resulting compressed file would be a concatenation of the outcome of the processing method applied to each partition of the file.
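The iterative reapplication (A → A' → A" → ...) and the partition-and-concatenate scheme described above can be sketched in Python. `compress_once` stands for whatever single-pass codec is in use; zlib is used in the example purely as a placeholder, and the round counter and part size are illustrative assumptions, not part of the claimed method.

```python
import zlib


def compress_iterated(data: bytes, compress_once, target_size: int, max_rounds: int = 8):
    """Reapply the same single-pass codec to its own output until the
    result is deemed small enough.  Returns the final bytes plus the
    number of rounds applied, so a decoder can invert them in reverse order."""
    rounds = 0
    while len(data) > target_size and rounds < max_rounds:
        data = compress_once(data)
        rounds += 1
    return data, rounds


def compress_partitioned(data: bytes, compress_once, part_size: int):
    """Split A into parts A_1 ... A_n and compress each part separately;
    the compressed file is the concatenation of the per-part results."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    return [compress_once(p) for p in parts]
```

With zlib as the placeholder codec, a highly repetitive input shrinks on the first pass; further passes rarely help on already-compressed data, which is why the round cap exists.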
Having thus described my invention, what I claim as new and desire to secure by Letters Patent is set forth in the following claims.
The disclosures in US Patent Application No. 09/420,667 from which this application claims priority are incorporated herein by reference.

Claims

1. A method for processing a digital file comprising a bit string, comprising the steps of: initiating a count that increments at a given rate; as the count increments, iteratively decrementing the bit string, by a given binary value, until a given end value is reached; and upon reaching the given end value, saving a then-current count increment as a representation of the digital file.
2. The method as described in Claim 1 wherein the count is a timer.
3. The method as described in Claim 1 wherein the given binary value is 1.
4. The method as described in Claim 1 wherein the end value is 0.
5. The method as described in Claim 1 further including the step of reconstructing the file by: setting the representation to a given count value and incrementing the given count value at the given rate; as the given count value increments, iteratively incrementing a given number, by the binary value, until a given end value is reached; and upon reaching the given end value, saving a then-current value of the given number as the file.
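Read literally, the counting scheme of claims 1 and 5 can be sketched in Python as follows. The bit-width parameter is an assumption not stated in the claims (the decoder must know how many bits to emit), and note that the saved count is numerically equal to the integer value of the bit string, so the sketch illustrates the claimed steps rather than any size reduction.

```python
def count_compress(bits: str) -> int:
    """Claims 1-4: as a count ticks, decrement the bit string's value by
    the given binary value (1) until the given end value (0) is reached;
    the then-current count is saved as the representation."""
    value = int(bits, 2)
    count = 0
    while value != 0:
        value -= 1
        count += 1
    return count


def count_decompress(count: int, width: int) -> str:
    """Claim 5: increment a number from 0 in step with the count; the value
    reached when the count is exhausted reconstructs the file."""
    value = 0
    for _ in range(count):
        value += 1
    return format(value, "0{}b".format(width))
```

For example, `count_compress("101100")` yields 44, and `count_decompress(44, 6)` returns "101100".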
6. The method as described in Claim 1, used for compressing a data file.
7. The method as described in Claim 1, used for encrypting a data file.
8. The method as described in Claim 5, used for decompressing a data file.
9. The method as described in Claim 5, used for decrypting a data file.
10. A method of compressing a file A, comprising the steps of:
(a) getting a next index n_i, where i = 1, 2, ...;
(b) initiating a count t_i, wherein i = 1, 2, ...;
(c) as the count t_i increments:
(1) determining if A is greater than 0;
(2) if so, setting B = A;
(3) calculating A = A - n_i; and
(4) repeating step (c)(1) until A is no longer greater than 0;
(d) when A is no longer greater than 0, determining if A is equal to a given value;
(e) if not, setting A = B;
(f) saving an index, time value pair (n_i, t_i);
(g) setting i = i + 1;
(h) repeating steps (a)-(f); and
(i) if A is equal to the given value, concatenating the index, time value pairs as a compressed version of the file A.
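One consistent reading of claim 10 is successive subtraction: for each index n_i, the count t_i records how many whole subtractions of n_i fit into A, and the remainder (claim 10's B) is carried into the next pass. Under that reading the inner loop is a quotient/remainder computation, sketched here with `divmod`; the index sequence is an assumption (the claim leaves its choice open) and must end in 1 for the remainder to reach the given value 0 of claims 11-12, and the claim's exact bookkeeping around the overshoot is read loosely.

```python
def subtractive_compress(a: int, indices) -> list:
    """Claim 10 read as successive subtraction: for each index n, t counts
    the subtractions of n that fit into A, and the remainder is carried
    forward.  Stops once the remainder equals 0."""
    pairs = []
    for n in indices:
        t, a = divmod(a, n)  # t subtractions of n leave remainder a
        pairs.append((n, t))
        if a == 0:
            break
    return pairs
```

For example, `subtractive_compress(100, [7, 3, 1])` yields [(7, 14), (3, 0), (1, 2)], and 7·14 + 3·0 + 1·2 = 100, so the pairs determine A.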
11. The method as described in Claim 10 wherein the given value is 0.
12. The method as described in Claim 10 wherein the given value is (0 | EOF).
13. The method as described in Claim 10 wherein a given value of the index is calculated by: dividing a number of bytes of the file A by a given number; determining a number of bits in a portion of the file having a given value; setting the index equal to the number of bits.
14. The method as described in Claim 10 wherein a given value of the index n_i is equal to 1.
15. The method as described in Claim 10 wherein a given value of the index n_i is equal to 2.
16. The method as described in Claim 10, used for encrypting a data file.
17. A method for decompressing a file represented by a concatenation of index, time value pairs, comprising the steps of:
(a) getting a next index n_i, where i = 1, 2, ...;
(b) initiating a count t_i, wherein i = 1, 2, ...;
(c) as the count t_i increments from a value -t_i:
(1) determining whether t_i is equal to a given value;
(2) if not, calculating A = A - n_i; and
(3) repeating step (c)(1) until t_i is equal to the given value;
(d) determining whether there are any count values remaining to be processed;
(e) if so, setting i = i + 1;
(f) repeating steps (a)-(e);
(g) if there are no count values remaining to be processed, saving A as the file.
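Claim 17's decompression can be sketched as accumulation over the saved pairs: each count runs from -t_i up to the given value 0 (claim 18), and on every tick the index n_i is applied to A. The printed claim shows A = A - n_i; since the inverse of claim 10's repeated subtraction is repeated addition, this sketch adds, which is an assumption about the intended sign.

```python
def subtractive_decompress(pairs) -> int:
    """Claim 17 read as the inverse of claim 10: for each saved (n_i, t_i)
    pair, step a count from -t_i up to 0, applying n_i to A on every tick.
    A starts at 0 and accumulates n_i * t_i per pair."""
    a = 0
    for n, t in pairs:
        count = -t
        while count != 0:  # the "given value" of claim 18 is 0
            a += n         # printed as A = A - n_i; read here as addition
            count += 1
    return a
```

For example, `subtractive_decompress([(7, 14), (3, 0), (1, 2)])` returns 100, the value decomposed in the claim-10 example above.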
18. The method as described in Claim 17 wherein the given value is 0.
19. The method as described in Claim 17 wherein the given value is (0 | EOF).
20. The method as described in Claim 17, used for decrypting a data file.
21. A computer program product in a computer readable medium for compressing a file represented as a bit string, comprising: a set of instructions executable in a processor for performing the following compression method steps: initiating a count that increments at a given rate; as the count increments, iteratively decrementing the bit string, by a given value, until a given end value is reached; and upon reaching the end value, saving a then-current increment as a representation of the file.
22. The computer program product as described in Claim 21 further including: a set of instructions executable in a processor for performing the following decompression method steps: setting the representation to a given count value and incrementing the given count value at the given rate; as the given count value increments, iteratively incrementing a given number, by the binary value, until a given end value is reached; and upon reaching the given end value, saving a then-current value of the given number as the file.
23. The computer program product as described in Claim 22 wherein the processor used to perform the decompression method operates at a clock speed equal to the clock speed of the processor used to perform the compression method.
24. An integrated circuit codec, comprising: a processor; a clock operating at a given clock rate; and code executable by the processor at the given clock rate to compress a file represented as a bit string, by: initiating a count that increments at the given rate; as the count increments, iteratively decrementing the bit string, by a given value, until a given end value is reached; and upon reaching the given end value, saving a then-current count increment as a representation of the file.
25. The integrated circuit as described in Claim 24 further including: code executable by the processor at the given rate to decompress the representation to regenerate the file, by: setting the representation to a given count value and incrementing the given count value at the given rate; as the given count value increments, iteratively incrementing a given number, by the binary value, until a given end value is reached; and upon reaching the given end value, saving a then-current value of the given number as the file.
26. A method for processing a digital file A comprising a bit string, which bit string univocally identifies a finite, positive integer value, comprising the steps of:
a) initiating a count i=0 and setting A_(i=0) = A;
b) calculating a value N_(i=0), where N_(i=0) is a positive, finite integer value;
c) setting B_i = A_i - N_i;
d) determining if B_i is lower than N_i;
e) if so, incrementing i;
f) setting A_i = N_(i-1);
g) setting N_i = B_(i-1);
h) repeating steps (c) to (g) until B_i is not lower than N_i;
i) concatenating and saving B_(i-1), N_(i-1) and i-1 as a version of said digital file A.
27. The method of Claim 26, wherein N_(i=0) is calculated according to the formula N_(i=0) = A_(i=0) * f.
28. The method of Claim 27, wherein f is the ratio of the golden section, taken up to a finite decimal place.
29. The method as described in Claim 26, used for compressing a data file.
30. The method as described in Claim 26, used for encrypting a data file.
31. A method for processing a digital file represented by the concatenation of finite, positive, integer values including a first value L, a second value R and a third value T, comprising the steps of:
a) initiating a count i;
b) setting L_(i=0) = L and R_(i=0) = R;
c) calculating a value A_i = L_i + R_i;
d) setting L_i = R_i;
e) setting R_i = A_i;
f) repeating steps (c) to (e) for T times;
g) saving A_i as the final decompressed version of the digital file.
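Claims 26 to 31 describe a reverse-Fibonacci decomposition: N_0 is seeded from A via the golden-section ratio f (claims 27-28), each step splits A_i into N_i and B_i = A_i - N_i, and the pair is shifted down until B_i is no longer lower than N_i; decompression (claim 31) rebuilds A with Fibonacci-style additions. The sketch below is a self-consistent reading in which T is the number of additions the decoder performs; the claims' exact i-versus-(i-1) bookkeeping is read loosely, and the equation in step (c) is reconstructed from the dual in claim 31.

```python
F = 0.6180339887  # golden-section ratio to a finite decimal place (claim 28)


def fib_compress(a: int):
    """Claims 26-28: peel A_i into N_i and B_i = A_i - N_i, shifting
    (A, N) <- (N, B) while B_i is lower than N_i.
    Returns (L, R, T) for use by claim 31."""
    n = max(1, round(a * F))  # N_0 = A_0 * f (claim 27)
    t = 0
    while a - n < n:          # continue while B_i < N_i
        a, n = n, a - n       # A_{i+1} = N_i, N_{i+1} = B_i
        t += 1
    return n, a, t            # L, R, and the addition count T


def fib_decompress(l: int, r: int, t: int) -> int:
    """Claim 31: A_i = L_i + R_i, then L <- R and R <- A, repeated T times."""
    for _ in range(t):
        l, r = r, l + r
    return r
```

For A = 100 this yields (L, R, T) = (4, 10, 5), and the decoder's chain 14, 24, 38, 62, 100 recovers A. Note that for this scheme to reduce size, L, R and T would have to occupy fewer bits than A; the sketch only demonstrates that the two directions invert each other.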
32. A method of performing a lossless data compression on a digital file A, comprising the steps of:
a) performing a first compression of file A to A' by using a lossy compression algorithm;
b) setting A" = A - A'_lossy, wherein A'_lossy is the lossy decompressed version of A', decompressed according to the dual of the lossy compression algorithm used in step (a);
c) initiating a count i=0 and setting A"_(i=0) = A";
d) calculating a value N_(i=0), where N_(i=0) is a positive, finite integer value;
e) setting B_i = A"_i - N_i;
f) determining if B_i is lower than N_i;
g) if so, incrementing i;
h) setting A"_i = N_(i-1);
i) setting N_i = B_(i-1);
j) repeating steps (e) to (i) until B_i is not lower than N_i;
k) concatenating and saving A', B_(i-1), N_(i-1) and i-1 as a version of said file A.
33. The method of Claim 32, wherein N_(i=0) is calculated according to the formula N_(i=0) = A_(i=0) * f.
34. The method of Claim 33, wherein f is the ratio of the golden section, taken up to a finite decimal place.
35. The method of Claim 34, wherein said lossy compression algorithm is selected from the plurality of lossy compression algorithms comprising: a) JPEG data compression; b) JPEG2000 data compression; c) MPEG data compression; d) MP3 data compression; e) MP4 data compression; f) DIVX data compression; g) proprietary data compression.
36. A method for decompressing a digital file represented by the concatenation of data A' compressed in a lossy manner and finite, positive, integer values including a first value L, a second value R and a third value T, comprising the steps of:
a) decompressing said data A', compressed in a lossy manner, to A'_lossy, using the dual of the algorithm used for compression;
b) initiating a count i;
c) setting L_(i=0) = L and R_(i=0) = R;
d) calculating a value A"_i = L_i + R_i;
e) setting L_i = R_i;
f) setting R_i = A"_i;
g) repeating steps (d) to (f) for T times;
h) concatenating A'_lossy and A"_i as the final decompressed version of said digital file.