WO2005067152A2

WO2005067152A2 - Realtime & streaming perfect compression 2

Info

Publication number: WO2005067152A2
Application number: PCT/IB2003/006302
Authority: WO
Original assignee: Van Gucht, Jurgen
Priority date: 2003-12-29
Filing date: 2003-12-29
Publication date: 2005-07-21

Description

Realtime & Streaming Perfect Compression 2

Description:

The following describes the compression of a 15-bit item. Any item size may be used by the algorithm; 15 bits is just an example to show how the algorithm works in general.

We are going to try to write 15 bits (2^15 possible values) into a shorter bit-string, without necessarily having to use a dictionary, DPCM processing and/or variants, but simply by using the bits of a sorting algorithm. As a result, we depend less on the randomness of our bit-string, and we can compress smaller packages (for instance, winning 1 bit out of every 90 bits) and thus get a higher compression rate, suitable for Realtime & Streaming Perfect Compression. This is thus a more advanced version of my original Perfect Compression patent.

Compression steps:

1) First, we convert our 15 bits to a base-10 (decimal) number, because we need a unique identification of the number these bits represent. With 15 bits we have 2^15 = 32768 possible values (from 0 to 32767). In other words, our 15-bit number may represent any number such as 478, 19054 or 8448, within the range of 2^15. We may also skip this step and convert our bit-string directly to the desired string (see next step).

2) We use this base-10 (decimal) number, representing the 15 bits, and convert it to, for instance, a base-8 number (digits 0-7). In addition, we make sure that this base-8 number has unique digits; for instance, the number 10423675 is a possibility. For 8 digits of 3 bits each, we then have 8! = 8*7*6*5*4*3*2*1 = 40320 unique possible values/combinations that we can represent. It is important that the digits of the number (in our example, 10423675) are unique, because we want to use a sorting algorithm on them, and so no digits may be missing or doubled (as in 10423474, where the 4 comes up three times and the digits 5 and 6 are missing). In other words, without missing and/or doubled digits, we don't need any dictionary to know what they represent, and by simply using the bits of our sorting algorithm we can correctly re-form our number (in our example, 10423675). Although not required, we could make use of a dictionary that lets us know the doubles and/or missing digits in numbers with non-unique digits (like 10423474).

3) Now, we use our sorting algorithm to un-sort 01234567 to 10423675, or 76543210 to 10423675. The resulting bits (of the sorting algorithm) may be fewer than our 15 bits (for instance 14 or 12, in which case we win 1 bit or 3 bits respectively). In other words, the purpose is to find an optimal sorting algorithm that produces the fewest bits needed to un-sort, and thus reproduce, our number. We may also use other compression algorithms to further compress this bit-string.
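Steps 1 and 2 above can be sketched as follows. The description does not say which mapping turns the decimal number into a digit string with unique digits; this sketch assumes the standard Lehmer code (factorial number system), and the function names `int_to_permutation` and `permutation_to_int` are hypothetical.

```python
from math import factorial


def int_to_permutation(value, n=8):
    """Map an integer in [0, n!) to a string of n unique digits (Lehmer code).

    Assumption: the source does not specify this mapping; the Lehmer code
    is used here only as one possible choice.
    """
    if not 0 <= value < factorial(n):
        raise ValueError("value must be in the range 0 .. n!-1")
    digits = list(range(n))          # remaining unused digits, ascending
    out = []
    for i in range(n - 1, -1, -1):
        index, value = divmod(value, factorial(i))
        out.append(digits.pop(index))
    return "".join(str(d) for d in out)


def permutation_to_int(perm):
    """Inverse of int_to_permutation: digit string with unique digits -> integer."""
    digits = sorted(perm)            # remaining unused digits, ascending
    value = 0
    for i, ch in enumerate(perm):
        index = digits.index(ch)
        value += index * factorial(len(perm) - 1 - i)
        digits.pop(index)
    return value


# Step 1: 15 bits -> decimal number; step 2: decimal number -> unique digits.
bits = "001010010100011"
number = int(bits, 2)
perm = int_to_permutation(number)
print(number, perm)
# Reverse direction: unique digits -> decimal number -> 15 bits.
print(f"{permutation_to_int(perm):015b}")
```

Under this particular mapping, every 15-bit value (0 to 32767) gets its own 8-digit string, and the remaining 7552 digit strings are never produced.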

The de-compression part does the opposite of these 3 steps, thus recovering our original bit-string.
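The idea of "sorting bits" in step 3, and of replaying them during de-compression, can be illustrated with a minimal sketch. The description does not name a specific sorting algorithm or bit format, so this sketch assumes a plain insertion sort that records one bit per comparison; all function names are hypothetical.

```python
def insertion_sort_with_oracle(items, less_than):
    """Insertion sort that asks less_than(a, b) for every comparison it makes."""
    result = []
    for item in items:
        pos = len(result)
        # Walk left while the new item is smaller than its left neighbour.
        while pos > 0 and less_than(item, result[pos - 1]):
            pos -= 1
        result.insert(pos, item)
    return result


def sort_to_bits(perm):
    """Sort the digit string and return the comparison outcomes as a bit list."""
    bits = []

    def less_than(a, b):
        outcome = a < b
        bits.append(int(outcome))
        return outcome

    insertion_sort_with_oracle(list(perm), less_than)
    return bits


def bits_to_perm(bits, sorted_digits="01234567"):
    """Replay the recorded bits to un-sort sorted_digits back into the original."""
    stream = iter(bits)

    def less_than(a, b):
        # The recorded bit replaces the real comparison.
        return bool(next(stream))

    # Sort the *positions* 0..n-1; order[rank] is the original position
    # of the rank-th smallest digit.
    order = insertion_sort_with_oracle(range(len(sorted_digits)), less_than)
    perm = [None] * len(sorted_digits)
    for rank, position in enumerate(order):
        perm[position] = sorted_digits[rank]
    return "".join(perm)


bits = sort_to_bits("10423675")
print(len(bits), bits_to_perm(bits))
```

With this particular sort, the example 10423675 takes 11 comparison bits, the already sorted 01234567 takes 7, and the reversed 76543210 takes 28, so the number of bits depends on both the digit string and the chosen algorithm.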

Note also that, in this example, our 8 unique digits of 3 bits can represent 8! = 8*7*6*5*4*3*2*1 = 40320 possible combinations, while our 15 bits represent only 2^15 = 32768 possible combinations. In effect, our 40320 combinations correspond to about 15.3 bits (= log2(40320)). We thus lose 7552 (= 40320 - 32768) combinations, or 0.3 bits, in the process. How can we solve this? We could, for instance, combine several of our 8-digit numbers in order to come closer to a 2^X number. For example, the 3 combinations 10423675, 61432570 and 21037654 each represent 15.3 bits, in total 15.3*3 = 45.9 bits, so that combined they come close to 46 bits (2^46 possible combinations). Also, it is not required to convert our original bit-string to a base-8 number with unique digits; we may as well convert it to a base-16 (hexadecimal) number with unique digits.
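The figures in the paragraph above can be checked with a few lines of arithmetic. This is only a worked calculation using Python's standard `math` module; the variable names are illustrative.

```python
from math import factorial, log2

perms = factorial(8)                 # digit strings with 8 unique digits
bits_per_item = log2(perms)          # information content of one such string
unused = perms - 2**15               # digit strings not needed for 15 bits

combined = perms ** 3                # three digit strings taken together
combined_bits = log2(combined)
# Largest k such that 2**k <= combined, i.e. whole bits that fit.
whole_bits = combined.bit_length() - 1

print(perms, round(bits_per_item, 1), unused)
print(round(combined_bits, 1), whole_bits)
```

Three combined digit strings give 40320^3 combinations, which lies between 2^45 and 2^46; they therefore hold 45 whole bits, with the remaining 0.9 bit only partly usable.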

De-compression steps:

1) We take our sorting bits, which represent our compressed bit-string, and apply the un-sort algorithm to the number 01234567 in order to get 10423675. Of course, we may also apply the un-sort algorithm to its reverse, 76543210, or make use of a dictionary that lets us know the doubles and/or missing digits in numbers with non-unique digits.

2) We reverse the conversion: we convert this base-8 number with unique digits back to our base-10 (decimal) number, or convert it directly to our original bit-string (step 3).

3) We convert this base-10 (decimal) number to binary format, our original 15 bits.

We can further optimize our compression algorithm by using different increment sequences in the sorting algorithm (as in Shellsort), or simply by trying different sorting algorithms that result in fewer comparisons than the one described here. Of course, the compression speed of the chosen algorithm may be as important as the compression rate, since we can continue compressing the resulting data in a second pass and so on.
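To compare increment sequences as suggested above, one can count how many comparisons (and hence how many recorded bits, assuming one bit per comparison) a Shellsort needs for a given digit string. The sketch below is a plain textbook Shellsort with a comparison counter; the function name and the gap sequences tried are illustrative assumptions, not taken from the description.

```python
def shellsort_comparisons(digits, gaps):
    """Shellsort a copy of digits using the given gap sequence.

    Returns (sorted_list, number_of_comparisons). The gap sequence must
    end with 1 so that the final pass is an ordinary insertion sort.
    """
    data = list(digits)
    comparisons = 0
    for gap in gaps:
        for i in range(gap, len(data)):
            item = data[i]
            j = i
            while j >= gap:
                comparisons += 1
                if data[j - gap] <= item:
                    break                  # item is already in place
                data[j] = data[j - gap]    # shift the larger element right
                j -= gap
            data[j] = item
    return data, comparisons


for gaps in [(1,), (4, 1), (4, 2, 1)]:
    ordered, count = shellsort_comparisons("10423675", gaps)
    print(gaps, "".join(ordered), count)
```

For a nearly sorted string such as 10423675 the extra passes of a longer gap sequence add comparisons rather than saving them, so the best sequence depends on the data being compressed.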

Now that we can, for instance, compress 90 bits (6 * 15 bits) by 1 bit or more, we can also easily compress big data files (e.g. 1 GB) in blocks of 90 bits. We can, for instance, use a 'pointer', a number that tells us how many times (passes) a file has been compressed by perfect compression. Each time the file is decompressed, we decrement the pointer by 1, e.g. from 100 to 99, so we know the file still has to be decompressed 99 more times. We may also include a second pointer that tells us in how many blocks the file is compressed; the smaller the compressed file gets, the fewer blocks are used. It is now possible, for example, to compress 1 GB of data by a very high compression factor, or to compress small 4K-64K clusters for direct use in hard disks or for compressed, streaming data transmission. Of course, there is a theoretical limit to how many times a file can be compressed by this perfect compression algorithm, and we also need to store our pointers, which tell us how many times a file has been compressed and/or in how many blocks.

We may as well decide to compress selectively (as with Shellsort: per 15 bits, choose an n-th part and select the first 7 bits from the bit-string, then the second 8 bits, 2 bits apart from the first, continuing until we reach the end of the file; then we select bits 3 apart, and so on). Or we may decide to use statistics: for example, if we win 10 bits with one particular sample and use only 9 bits (2^9 positions) to show its position in the file, then we have won 1 bit.
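The block layout mentioned above (blocks of 90 bits, each holding six 15-bit items) can be sketched as follows. The description does not say how a file whose length is not a multiple of 90 bits is handled; this sketch assumes zero-padding of the last block, and the function name is hypothetical.

```python
def split_into_blocks(data, item_bits=15, items_per_block=6):
    """Split bytes into blocks of items_per_block items of item_bits bits each.

    Returns (blocks, padding), where blocks is a list of lists of integers
    and padding is the number of zero bits appended to fill the last block.
    Assumption: zero-padding is our own choice, not specified in the source.
    """
    bit_string = "".join(f"{byte:08b}" for byte in data)
    block_bits = item_bits * items_per_block          # 90 bits by default
    padding = (-len(bit_string)) % block_bits
    bit_string += "0" * padding

    blocks = []
    for start in range(0, len(bit_string), block_bits):
        block = bit_string[start:start + block_bits]
        items = [int(block[i:i + item_bits], 2)
                 for i in range(0, block_bits, item_bits)]
        blocks.append(items)
    return blocks, padding


blocks, padding = split_into_blocks(bytes(range(12)))   # 96 bits of input
print(len(blocks), padding)
print(blocks[0])
```

A container built on this would also have to store the pass counter and the block counter described above, plus the padding length, alongside the compressed blocks.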

Because the algorithm has a rather small footprint (in memory and in length of the algorithm), and we can compress using very small blocks (in this example, blocks of only 90 bits each), the algorithm can easily be programmed on fast parallel FPGA & ASIC circuits (more gates in parallel), for direct implementation in hard disks, modems, satellites, cell phones, etc. As such, we can now achieve Realtime & Streaming Perfect Compression.

Claims

Claims:

1. A compression/decompression algorithm to compress/decompress random/not random data and comprising the following processes:
a) a compression process that performs compression mainly based on the bits of a sorting algorithm and takes advantage of the fact that a sorting algorithm's bits may be fewer than the combinations it may represent, so that it doesn't necessarily require the use of DPCM and/or doesn't necessarily require a dictionary with DPCM compression and/or variants, and/or
b) a decompression process that reverses the process in 1 a) and thus uses the 'compressed' bits of the algorithm to sort/un-sort in order to form the uncompressed, original bit-string

2. Any variation of claim 1 that doesn't necessarily require any other type of dictionary-compression algorithm with any sort of encoding techniques including, but not limited to: Lempel-Ziv, Huffman coding, BWT (Burrows-Wheeler Transform), PCM (Pulse Code Modulation), DPCM (Differential Pulse Code Modulation), or any code modulation formats, Discrete Cosine Transform (DCT), Wavelet Transform (WT), Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), hashing techniques or any form of Integer Compression or any other form of lossless data compression/encoding

3. Any variation of claim 1 that improves its performance resulting in only faster (or lower) compression/decompression speeds or resulting in only higher (or lower) compression rates/ratios.

4. Any variation of claim 1 that performs compression mainly based on the bits of a sorting algorithm and that takes advantage of the fact that a sorting algorithm's bits may be fewer than the n! combinations that it may represent (for instance, 8! = 8*7*6*5*4*3*2*1 = 40320 combinations, or 15.3 bits, and the sorting algorithm may only require 13 bits, so we win 2 bits for this sample)

5. Any variation of claim 1 using any other kind of sorting/un-sorting algorithm including all sorting/un-sorting algorithms based on encoding, hashing and of any compressed type

6. Any variation of claim 1 using other compression algorithms in order to further compress the data, such as the BWT, hashing techniques or any encoding techniques or such as those mentioned in claim 2.

7. Any variation of claim 1 resulting in compression of other size(s) of blocks (other than described in the description) or using any statistical method to win bits (other than described in the description), or converting the bits to any cardinal number and back (other than described in the description)

8. Any variation of claim 1 by compressing any item size; for example, a 90-bit block may contain 6 items of 15 bits, so here we use a 15-bit item size, but any item size is possible, even a 3-bit size or a 1024-bit size

9. Any variation of claim 1 whether the data to be compressed/decompressed as stated in claim 1 is truly random data or not.

10. Any variation of claim 1 to compress/decompress data more than once, over multiple passes in order to increase the compression rate/ratio; for example, 100 passes may compress a file of 1 Gbyte to 1 Mbyte

11. Any variation of claim 1 to compress/decompress data more than once, with an n-th selection in order to continually further compress the data in another way

12. Any variation of claim 1 in combining the combinations of several conversions; for instance, we could combine many of our 8 numbers of 3 bits in order to come closer to our 2^X number; for instance, these 3 combinations: 10423675, 61432570 and 21037654, each represent 15.3 bits, in total = 15.3*3 = 45.9 bits, and when combined, they may represent 46 bits, or 2^46 possible combinations

13. Any variation of claim 1 to compress/decompress data in a realtime / streaming way including any hardware and/or software required to perform it

14. Any variation of claim 1 by using a separate bit-table or a pre-processing table that may be required by the algorithm in order to improve the statistical chance of winning bits; an example of a bit-table is to include a bit per compressed 15-bit item saying 0 = needs Shellsort increments of type A, or 1 = needs Shellsort increments of type B; another example is: 0 = don't compress 15-bit item, 1 = compress 15-bit item

15. Any variation of claim 1 by making use of any 'statistics' (for instance, winning 10 bits and using a pointer of 9 bits to tell its position in the bit-string and thus win one bit in the process)

16. Any variation of claim 1 by making use of the modular (mod) function for direct calculations and/or any table-like structure to convert/compress/decompress our original bit-string to our sorting bit-string and back

17. Any variation of claim 1 to build the compression/decompression steps with unique or non-unique digits, other than described in the description; for instance, 10423675 has 8 unique digits, while 10423474 has the 4 coming up 3 times and the digits 5 and 6 missing

18. Any variation of claim 1 to un-sort any digits/numbers, other than described in the description, in order to convert/compress/decompress our original bit-string to our sorting bit-string and back; for instance, for 8 numbers of 3 bits (0-7), 01234567 becomes 10423675, or, for 4 numbers of 2 bits (0-3), 0123 becomes 3120

19. Any variation or application of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 by using any part of the algorithm and making it part of another algorithm so as to achieve the same goal but in another way

20. Any application of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 using this technology in any industry including but not limited to software industry, data storage industry, digital entertainment industry, data transmission industry, telecommunications industry

21. Any application of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 to use this technology including but not limited to areas such as file compression in general, compressed solid state disks, data transmission, video compression (like MPEG, AVI etc), network/modem compression (Radio/Telephone/DSL/Cable/Satellite etc), music compression (like MP3 etc), graphic compression (like JPG etc), database compression (like SQL Server etc), speech compression (like cell phones etc), communications (like Cell Phone, PDA, Pocket PC, broadcast TV, Interactive TV, cell phone video conferencing, Fax Devices, Handhelds etc), digital entertainment (like DVD, CD, etc), any form of data encryption, to compress real-time/streaming (random) data, for use to compress video / images, or for use in voice recorders, memory sticks, data banks, external data storage devices, all types of memory cards and external storage media, like Compact Flash, MicroDrive, SmartMedia, SD, MMC, MemoryStick (-Pro), Datapak, Dataflash, Smart Media, USB Memory, PC Card Hard Drives etc.

22. Any application of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 using this technology as software based or hardware based, including but not limited to chips such as FPGA, ASIC, CPLD, EPLD, PLD

23. Any application of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 using this technology in combination with any other data compression algorithms such as those used by GIF, JPEG, ZIP, MPEG, MP3 and others