WO2017037502A1 - Compression code and method by location - Google Patents

Compression code and method by location

Info

Publication number
WO2017037502A1
WO2017037502A1
Authority
WO
WIPO (PCT)
Prior art keywords
values
bit
bits
code
encoded
Prior art date
Application number
PCT/IB2015/056562
Other languages
French (fr)
Inventor
Kam Fu CHAN
Original Assignee
Chan Kam Fu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chan Kam Fu filed Critical Chan Kam Fu
Priority to PCT/IB2015/056562 priority Critical patent/WO2017037502A1/en
Publication of WO2017037502A1 publication Critical patent/WO2017037502A1/en

Classifications

    • H - ELECTRICITY
    • H03 - ELECTRONIC CIRCUITRY
    • H03M - CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 - Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3066 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction by means of a mask or a bit-map

Definitions

  • This invention relates to compression for the use and protection of intellectual property, expressed in the form of digital information, including digital data as well as executable code for use in device(s), including computer system(s) or computer-controlled device(s) or operating-system-controlled device(s) or system(s) that is/are capable of running executable code or using digital data.
  • device(s) is/are mentioned hereafter as Device(s).
  • this invention relates to the method and schema as well as its application in processing, distribution and use in Device(s) of digital information, including digital data as well as executable code, such as boot code, programs, applications, device drivers, or a collection of such executables constituting an operating system in the form of executable code embedded or stored into hardware, such as embedded or stored in all types of storage medium, including read-only or rewritable or volatile or nonvolatile storage medium (referred hereafter as the Storage Medium) such as physical memory or internal DRAM (Dynamic Random Access Memory) or hard disk or solid state flash disk or ROM (Read Only Memory), or read-only or rewritable
  • this invention reveals the method and schema as well as its application that could be used to make compression of digital information.
  • it makes possible the processing, distribution and use of digital information in Device(s) connected over local clouds or internet clouds for the purpose of using and protecting intellectual property.
  • the compressed code could also be considered an encrypted code as well.
  • invention is not limited to delivery or exchange of digital information over clouds, i.e. local area network or internet, but could be used in other modes of delivery or exchange of information.
  • Lossless data compression algorithms usually exploit statistical redundancy to represent data more concisely without losing information, so that the process is reversible. Lossless compression is possible because most real-world data has statistical redundancy. For example, an image may have areas of colour that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a basic example of run-length encoding; there are many schemes to reduce file size by eliminating redundancy.
  • LZ Lempel-Ziv
  • DEFLATE is a variation on LZ optimized for decompression speed and compression ratio, but compression can be slow.
  • DEFLATE is used in PKZIP, Gzip and PNG.
  • LZW Lempel-Ziv-Welch
  • LZR Lempel-Ziv-Renau
  • LZ methods use a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX).
  • a current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
  • the class of grammar-based codes is gaining popularity because they can compress highly repetitive text extremely effectively, for instance, biological data collections of the same or related species, huge versioned document collections, internet archives, etc.
  • the basic task of grammar-based codes is constructing a context-free grammar deriving a single string. Sequitur and Re-Pair are practical grammar compression algorithms for which public codes are available.
  • Arithmetic coding, invented by Jorma Rissanen and turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to the better-known Huffman algorithm and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. Arithmetic coding is used in the bi-level image compression standard JBIG, and the document compression standard DjVu.
  • the text entry system Dasher is an inverse arithmetic coder. [8]
  • the key question for table-based methods is how longer entries of digital data code could be represented by shorter entries of code and yet remain recoverable. While shorter entries could be substituted for longer data entries, it seems inevitable that some other information, in digital form, has to be added in order to make it possible, or to tell how, to recover the original longer entries from the shortened entries. If too much such digital information has to be added, it makes the compression effort futile and sometimes the result is expansion rather than compression.
  • where the additional information for one or more entries of the digital information is stored interspersed with the compressed data entries, how to differentiate the additional information from the original entries of the digital information is a problem, and the separation of the compressed entries of the digital information during recovery presents another challenge, especially where the original entries of the digital information are to be compressed into different lengths and the additional information may also vary in length accordingly.
  • C, if built with such flexibility and capability, could also divide the digital information into pieces or units for compression. For instance, portions of long series of 0s or 1s or other similar patterns could lie interspersed within the digital information file at different locations; in this case unit processing could be selected as the preferred way of coding. So this invention does allow for compressors and decompressors to take part in the compressing and decompressing unit by unit.
  • Compression Unit (CU): 1 byte has 8 bits and a 1-byte ASCII code table has 256 variations or 256 different ASCII characters.
  • the CU could be 1 byte or 2 bytes or in other sizes, such as 1 bit with 2 variations or 2 bits with 4 variations or 3 bits with 8 variations, etc. etc.
  • LT1 is the Table Name; its Table Binary Number (TBN), i.e. using binary notation, is 0. This LT has only 1 location. Usually, the Location Binary Address (LBA) for it could be given the binary number 0. But since LT1 has only 1 location, its LBA could be omitted: designating the TBN designates the only location of this LT. So to designate this only location in LT1, one could just use 0: as the Table Location Binary Address (TLBA) instead of 0:0. To write into the compressed code, one could omit the : symbol and write 0 only.
  • TLBA Table Location Binary Address
  • LT2 has two locations. To designate its first location, one could use 1:0 and write 10, the second location 1:1 and write 11 as the compressed code. If this LT2 has instead only one location, then one could just use 1: to designate it and write 1 as the compressed code, just like the case in Paragraph 20.
  • LT2 The above LT is called LT2, because it assumes there are two LTs being used altogether. So in the case of two LTs discussed above, the situation is like:
  • the first one, LT1, has TBN 0; because it has only one location, one could just write 0 as the compressed code of its TLBA.
  • the TBN of LT2 is 1.
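  • A minimal sketch of the two-table example above (an illustration under assumed symbols, not the patent's exact procedure): LT1 has TBN 0 with its single LBA omitted, and LT2 has TBN 1 with two locations 1:0 and 1:1; three arbitrary symbols A, B and C stand in for whatever CU values are assigned to the three TLBAs.

```python
# Two-LT example: codes are the TLBAs written as bits, with LT1's LBA omitted.
CODES = {"A": "0", "B": "10", "C": "11"}          # symbol -> written TLBA
DECODE = {v: k for k, v in CODES.items()}

def encode(symbols):
    return "".join(CODES[s] for s in symbols)

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:                         # prefix-free, so greedy matching works
            out.append(DECODE[buf])
            buf = ""
    assert buf == "", "dangling bits"
    return out

symbols = ["A", "A", "B", "A", "C"]
encoded = encode(symbols)                         # "0" + "0" + "10" + "0" + "11"
assert decode(encoded) == symbols
print(encoded)                                    # 0010011
```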
  • the LBA portion could not be 8 bits if one bit is given to the TBN of the LT, unless the TBN of the LT is given no bit at all.
  • these 256 LBAs are then used as its TLBAs to hold the 256 values. As such, no values could be compressed into fewer bits. So the only option is to use more than one LT. But using more than one LT, one has to use the TBN portion in designating the TLBAs; the more LTs are used, the bigger the TBN portion.
  • TBN:LBA should be less than 8 bits, i.e. TBNS + TS < 8 bits.
  • Table Binary Number Size plus Table Size should be less than 8 bits.
  • TBN:LBA In order to achieve compression for this CU of 1 byte having all the 256 values, there are many possible combinations of TBN:LBA to try. Which combination of these TBN:LBA is the best one also depends on the nature and characteristics of the digital information, which is to be collected in the DP stage by A or C.
  • the size of CU may not be just one byte, one could use other CU sizes as well, such as 2 bytes or half a byte or any other bit lengths. This further complicates the problem as one may first wonder which size of CU is appropriate to the digital information of the input file to be compressed.
  • the modeling i.e.
  • the present invention therefore is the first attempt to provide an initial answer to the above questions. With the advent of super computers with enormous computing power, such problems could be approached in a better and better way.
  • tie-breaking could be done in other ways. For instance, one could break the tie by finding out which values within the tie appear more frequently after the preceding value in the ranking. For example, if the preceding value is 0011 and the values of the tie are 1100 and 0110, then if 0110 appears more frequently after 0011, 0110 should come next, before 1100.
  • the present invention does not assume any particular characteristics of the digital input file, whether it is a text file, audio file or image file.
  • the input files are all treated in the same way, and they appear only as a series of 0s and 1s in different orders and patterns.
  • TLBA consists of two parts, i.e. TBN:LBA, which uses at least one bit for specifying a TLBA. In short, no gain is to be achieved using the present invention for this. Using run-length encoding seems more appropriate.
  • a 2-bit CU has four values, 00, 01, 10 and 11. Putting them into two LTs, there are two combinations for this. Either the first LT has 1 value and the second LT has 3 values, or the first LT has 2 values and the second 2 also. In the latter case of using 2 LTs each having 2 values, the TLBAs are 0:0, 0:1 and 1:0, 1:1; again 2 bits have to be used for each value and there is no gain at all.
  • TLBAs (TBN:LBA) are 0: and 1:00, 1:01 and 1:10.
  • the present invention below mainly uses a 3-bit CU to illustrate the techniques used under the schema presented in the present invention. After reading the present invention, one, of course, could apply the techniques revealed to the 2-bit CU scenario or scenarios of CUs of other bit sizes if so desired and further investigate how to improve on it.
  • the present invention is not aimed at exhausting such variations but at revealing the
  • one of the TBNs should have at least 2 bits, such as in the combinations of 0:, 10:, 11: or 00:, 01:, 1: etc. This leaves only 1 bit for the LBAs. So using 2 bits for all the TBNs (the option chosen for the TBNs of the 3 LTs here for illustration being 00, 01 and 10) and 1 bit for the LBAs for all the 3 LTs, the TLBAs of these 3 LTs are then 00:0, 00:1, 01:0, 01:1, 10:0, and 10:1; only 6 TLBAs (2 in each of the 3 LTs, i.e.
  • 2.2.2, the notation being used for this having separating dots for ease of expression
  • 6 TLBAs cannot accommodate 8 CU values.
  • 2.2.2 is an even distribution structure in l shape (lower-case L), which does not serve the fundamental principle and technique of compression science, i.e. assigning fewer bits to more frequent values and more bits to less frequent ones.
  • TBN 00: and 01: have only 1 address each, i.e. 00: and 01: respectively, TBN 10: having 4 addresses, namely 10:00, 10:01, 10:10, and 10:11.
  • This is a skewed LT structure in L (Upper Case L) shape, having extremes on either side. This is good for digital data of skewed distribution, i.e. more frequent values being assigned to the first and second LT and less frequent values to the third LT for achieving data compression. Assigning the other way means data expansion instead.
  • the ⁇ shape LT structure is in the middle between the skewed L shape and the even l shape.
  • the above l, L, and ⁇ shape LT structures serve as options to be chosen for modeling and matching the different kinds of data distribution of the digital information to be compressed.
  • the two uneven LT structures, the L and ⁇ shapes, have uneven bits for different TBNs and LBAs for the LTs, providing different numbers of TLBAs.
  • the l shape LT structure has the same number of TLBAs for its 3 LTs inside.
  • the l and L shape LT structures provide 6 TLBAs and the ⁇ shape provides 7 TLBAs.
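  • A small sketch (helper name illustrative) that counts the TLBAs offered by each structure, given the LBA bit width of each LT; the L shape is taken here as the 1.1.4 arrangement described above and the ⁇ shape as 1.2.4.

```python
# Number of TLBAs in a Location Table structure, given each LT's LBA bit width.
def tlba_count(lba_bits_per_lt):
    return sum(2 ** bits for bits in lba_bits_per_lt)

even_l_shape   = [1, 1, 1]   # "2.2.2": three LTs with 1-bit LBAs -> 6 TLBAs
skewed_L_shape = [0, 0, 2]   # "1.1.4": two single-location LTs and one 2-bit LT -> 6 TLBAs
middle_shape   = [0, 1, 2]   # "1.2.4": 1 + 2 + 4 -> 7 TLBAs

for name, shape in [("l", even_l_shape), ("L", skewed_L_shape), ("middle", middle_shape)]:
    print(name, tlba_count(shape))
```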
  • Another L shape LT structure, in the form of 2.2.4, could be considered, whereby the first LT has 2 TLBAs, the second also 2 and the third 4, the TLBAs being 00:0, 00:1, 01:0, 01:1, 10:00, 10:01, 10:10 and 10:11. This could accommodate the 8 possible values of a 3-bit CU. Adopting this LT structure means using one more bit for the third LT, and sometimes this may not be desirable for the digital information being modeled and matched for making compression.
  • LTS 2.2.4 could accommodate all 8 CU values; for instance, one design like 00:0, 00:1, 01:0, 01:1, 1:00, 1:01, 1:10 and 1:11; that is, the TBN of the third LT becomes 1: instead of 10:.
  • Address Branching is a technique of saving bit size for the TLBAs of a particular table. Because 3-bit LBAs have 8 addresses and there are only 5 values assigned to them, this means wasting the space of 3 empty TLBAs. And the first 4 values take up LBAs of 3 bits instead of the 2 bits they could have taken. Address Branching means, in this example, branching the 5th address to the 4th address and merging them together so that 2-bit LBAs could provide 5 to 7 addresses, step by step. If the number of addresses to be created is up to 8, then 3-bit LBAs have to be used instead of 2 bits with Address Branching, as no bit saving could be achieved in this case.
  • the PU is then to be scanned for the TLBAs 10:10 and 10:11; for the first 1010 and 1011 in the PU, the first Address Branching Bit (ABB) determines if they are the 5th and 6th CU values or the 7th and the 8th values. If this ABB is 0, then it is given the value of the 5th or 6th value, which is then written down into the output file, the 7th or 8th if the ABB is 1. If there are more ABBs in the PU, this means that in the decoded output of the PU being processed there are more 1010 and 1011 to be decoded in the same way, one by one.
  • ABB Address Branching Bit
  • the ABB could also be put right after the Address Branched TLBA and decoded in the same way as when it is put at the end of a PU. That is, when decoding comes to an Address Branched TLBA, D reads in one more bit and then determines if it is the 5th or 6th value or the 7th or 8th value.
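  • A minimal sketch of Address Branching under assumed placeholder values v1..v5 (one table with a 2-bit LBA holding five values, with the ABB written immediately after the branched address, the variant just described):

```python
# Five values on a 2-bit LBA: address 11 is branched and shared by v4 and v5.
ADDRESSES = {"v1": "00", "v2": "01", "v3": "10", "v4": "11", "v5": "11"}
ABB       = {"v4": "0", "v5": "1"}        # disambiguates the shared address 11

def encode(values):
    bits = ""
    for v in values:
        bits += ADDRESSES[v]
        if v in ABB:                      # branched address needs one extra bit
            bits += ABB[v]
    return bits

def decode(bits):
    out, i = [], 0
    while i < len(bits):
        addr, i = bits[i:i + 2], i + 2
        if addr == "11":                  # branched: read the ABB that follows
            out.append("v4" if bits[i] == "0" else "v5")
            i += 1
        else:
            out.append({"00": "v1", "01": "v2", "10": "v3"}[addr])
    return out

seq = ["v1", "v5", "v3", "v4"]
assert decode(encode(seq)) == seq
```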
  • PU Processing Unit
  • PU Processing Unit
  • the input digital file has 192 bits of information, which could be divided into 8 PUs where each PU consists of 24 bits, assuming the CU is 3 bits and the PU 24 bits. So this input digital file has 8 PUs, each having 8 CUs. Selecting 24 bits for the PU size is for the purpose of making it possible to accommodate all the 8 possible values of a 3-bit CU, being 000, 001, 010, 011, 100, 101, 110, and 111 on the binary scale.
  • each PU could have only 1 value inside or 2 or 3, and up to 8 all possible CU values.
  • the Compressor C scans the above input code and finds out that inside this PU of digital information there are only 2 values found, i.e. 000 and 001 only. So it writes down the header signature of the PU, the CU Value (CUV) of this PU, as 001, indicating that within this PU there are only 2 different CU values, and then encodes the 24 bits of input code for this PU according to the logics particular to a PU with 2 different CU values only.
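  • A minimal sketch of this header step, assuming the CUV is the count of distinct CU values minus one written in 3 bits (consistent with CUV 001 for 2 values here and 101 for 6 values later):

```python
# Split a 24-bit PU into 3-bit CUs, count distinct CU values, emit a 3-bit CUV.
def cu_values(pu_bits, cu_size=3):
    return [pu_bits[i:i + cu_size] for i in range(0, len(pu_bits), cu_size)]

def cuv_header(pu_bits, cu_size=3):
    distinct = len(set(cu_values(pu_bits, cu_size)))
    return format(distinct - 1, "03b")    # 1 value -> 000, 2 values -> 001, ..., 8 -> 111

pu = "000001000000001000000000"          # 8 CUs, only the values 000 and 001 occur
assert cuv_header(pu) == "001"
```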
  • CUV CU Value
  • the output code of Diagram 4 denotes the compressed code encoded by C.
  • LTS Location Table Structure (LT Structure)
  • ABB Address Branched Bits
  • LTS 1.2.5: there is only 1 TLBA in LT1, being 00:, for assigning to 1 value, and 2 TLBAs, being 01:0 and 01:1, in LT2 for 2 values, and 4 TLBAs in LT3, being 10:00, 10:01, 10:10 and 10:11, for 5 values with the use of Address Branching, where the 8th CU value with the least frequency count is assigned to 10:11 with ABB 1.
  • the 7th one is given the TLBA 10:11 with ABB 0, the 6th 10:10, the 5th 10:01, the 4th 10:00, the 3rd 01:1, the 2nd 01:0 and the 1st, being the one with the highest frequency count, 00:, given the fewest bits.
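  • A minimal sketch of this rank-to-TLBA assignment for LTS 1.2.5 (Absolute Addressing codes, with the assumption that the ABB is written immediately after the branched TLBA 10:11; the text also describes placing ABBs at the end of the PU):

```python
# Map the 8 CU values, ranked most frequent first, onto the TLBAs of LTS 1.2.5.
def assign_lts_125(ranked_values):
    """ranked_values: the 8 CU values, most frequent first."""
    codes = ["00", "010", "011", "1000", "1001", "1010", "10110", "10111"]
    #        LT1   LT2 --------   LT3 --------------------   10:11 + ABB 0 / ABB 1
    return dict(zip(ranked_values, codes))

ranking = ["000", "001", "010", "011", "101", "100", "110", "111"]
table = assign_lts_125(ranking)
print(table["000"], table["111"])        # '00' (2 bits) and '10111' (5 bits)
```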
  • Clustering Index: meaning the tendency of different CU values clustering together or separating from each other, for the use of grouping CU values under the same or different Location Tables.
  • the statistics of the digital information to be compressed indicate that the 8 CU values are ranked in the following order according to frequency count in descending order, i.e. the most frequently occurring values ranked first in the front: 000, 001, 010, 011, 101, 100, 110 and 111; then the assignment of the 8 CU values to the TLBAs of the 3 LTs of the LT structure of 1.2.5 chosen above could be:
  • the value 000 should have a very high frequency in order to compensate for the losses due to the assignment into LT3. Whether there is expansion or compression of the digital input depends on how uneven the data distribution of the digital input is and how skewed it is to the most frequently occurring value 000 in this case.
  • the present invention introduces a technique of Dynamic Tabling, dynamically adjusting the LTS to achieve more compression. This will be further elaborated later.
  • Dynamic Tabling: if less than 8 CU values are found in the whole digital input file, for instance, if only 3 CU values are there as mentioned earlier, the 3 CU values could be taken out from LT3 and assigned to TLBAs of LT1 and LT2, and there will be much compression.
  • the prioritization of all the CU values resulting from this could serve as the Master Ranking List (MRL) of CU values found.
  • MRL Master Ranking List
  • This MRL could be used for constructing a Master LT Structure if no dynamic adjustment of the LT structure is attempted or implemented.
  • the LT structure, LTS, refers to the number of LTs being used (affecting the number of bits used for the TBN) and the Table Size (the number of bits used for the LBAs holding the CU values assigned) of each of the LTs used. There could be a standard LT structure or more than 1 LTS to choose from.
  • the information signifying these standard LT structures and the rules determining how to select and use these LT structures could be embedded in the application or recorded as Header Information of the digital information file.
  • C could record a signature, Data Distribution Signature (DDS), such as 00 for a highly skewed distribution, into the Header Information in the Header Section of the output encoded file for the purpose of decoding afterward, and start to use the LTS 00 most appropriate for it.
  • DDS 11 may be used and the corresponding LTS 11 is adopted, so on and so forth.
  • Using 2 bits for the DDS means it could represent 4 types of data distribution and 4 LT structures. If more is found to be desirable, more bits could be used for DDS; such as when using CU of 8 bits and PU of 2048 bits, more LTs could be used and there are more combinations of LT structures, catering for more data distribution models.
  • LTS 1.2.4 may be the best fit to the digital information being processed, but it could not offer all 8 TLBAs for all the 8 CU values. So Address Branching discussed previously could serve as a remedy to a certain extent. If the whole digital information input has only 7 CU values, so much the better. Reality however is very often not on one's side. Obtaining data distribution information or statistics is vital in designing or selecting the best LTS to match the characteristics of the data distribution of the digital information to be processed. Without the aid of such data statistics, one has to use a pre-designed LTS and use Dynamic Adjustment of the LTS and Dynamic Assignment of CU values to the LTS according to data statistics collected on the fly during processing.
  • LT1 provides some bit saving.
  • the second table, LT2 could only break even.
  • LT3 further requires more bits than those required by the original input code, resulting in data expansion.
  • RTS Relative Table Switching
  • RA Relative Addressing
  • Relative Table Switching is the way to make the use of multiple LTs worth attempting, especially in the present case of using 3 LTs.
  • Diagram 5 is a technique of Absolute Addressing (AA). It translates CU values using Absolute Addresses.
  • LTs are switched using absolute TLBAs.
  • Diagrams 6 and 7 illustrate the difference between using Absolute Addressing and using Relative Addressing by way of Relative Table Switching, using an LT structure 1.2.4+1:
  • a PU with all 8 CU values is an extreme case, and in Diagram 6 nine more bits are used.
  • the output code using RTS is 25 bits including 2 ABB bits. Adding 3 bits (111) of PU header signature, it becomes 28 bits. Using RTS rather than AA, only 4 more bits are used instead of 9 more bits, saving 5 bits.
  • the input code is an input code having an even distribution of all 8 CU values present. Using LTs may not be the best way of doing compression for this type of data distribution. Later the present invention will present another technique, Rank Code Pair Numbering (RCPN), for compressing this kind of PU having 8 different CU values or less.
  • RCPN Rank Code Pair Numbering
  • the first value 000 is translated into 00, i.e. the only TLBA (00:) in LT1. Because there is only 1 address for 1 CU value, its LBA is omitted [0] and it is enough to use just its TBN. 0 in [] denotes the omission of 0 in the encoded code.
  • the next two bits 11 refer to switching to the next LT.
  • the first 1 is the special bit calling for table switching.
  • the second 1 means the table to which it is to switch, 1 meaning the next table in the order.
  • the first value 000 is assigned to the LT (LT1) having TBN 00:. So the next LT in the order is 01: (LT2).
  • in LT 01: there are two TLBAs, 01:0 and 01:1. Since the second value 001 is assigned to 01:0, the output code for the LBA portion becomes 0, indicating the first TLBA of LT2.
  • the third value 010 is assigned to the same LT2, taking the second TLBA, 01:1. So table switching is not required. To signal it, a 0 is used. 0 means no switching, or staying on the same LT to find the value required. Since the third value is assigned to the TLBA 01:1, the output code for the LBA, i.e. 1, is used to signify it. Up to here, all the TLBAs of LT1 and LT2 are used up, and all other values are assigned to LT3 having TBN 10:.
  • Bit 1 is used to indicate the action of switching LT for obtaining TLBA
  • a Staying Bit 0 is used to indicate no table switching or remaining in the same LT for finding TLBA.
  • RTS it is also important to keep a variable to register the current TBN (CTBN) of the LT being used for processing in the memory space of C and D. And this current TBN changes with the switching of LT.
  • CTBN current TBN
  • Diagram 8 clarifies how the current TBN variable changes in value with the use of RA by way of RTS:
  • the Staying Bit 0 is used instead of 2 bits for the TBN as under Absolute Addressing, and this helps save 1 bit. So for an LT whose TBN is assigned more bits, using RTS and the associated Staying Bit 0 could save more bits.
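  • A sketch of RTS/RA encoding over the LTS 1.2.4+1 above. Two assumptions not spelled out verbatim in the text: the next/previous relation wraps around the three LTs, and the ABB is written immediately after a branched TLBA. For the Diagram 6/7 input of all 8 CU values it produces 25 bits (28 with the 3-bit PU header), in line with the bit count quoted earlier.

```python
LTS = [                                # (values assigned to the LT, LBA bit width)
    (["000"], 0),                      # LT1, TBN 00
    (["001", "010"], 1),               # LT2, TBN 01
    (["011", "100", "101", "110"], 2), # LT3, TBN 10; "111" branches off 10:11
]
BRANCHED = {"111": ("110", "1"), "110": ("110", "0")}   # value -> (host value, ABB)

def locate(value):
    host, abb = BRANCHED.get(value, (value, ""))
    for table, (values, width) in enumerate(LTS):
        if host in values:
            lba = format(values.index(host), f"0{width}b") if width else ""
            return table, lba, abb
    raise ValueError(value)

def encode_rts(cu_values):
    bits, current = "", None
    for v in cu_values:
        table, lba, abb = locate(v)
        if current is None:
            bits += format(table, "02b")               # first value: absolute TBN
        elif table == current:
            bits += "0"                                # Staying Bit
        else:
            step = (table - current) % 3
            bits += "1" + ("1" if step == 1 else "0")  # Switching Bit + next/previous
        bits += lba + abb
        current = table                                # the CTBN register
    return bits

pu = ["000", "001", "010", "011", "100", "101", "110", "111"]
encoded = encode_rts(pu)
print(encoded, len(encoded))                           # 25 bits; 28 with the 3-bit CUV header
```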
  • RCPN is the technique used to convert a data pattern of binary bits into binary numbers.
  • Diagram 7 is for PU having 8 different CU values, the PU8 group.
  • Diagram 9 shows a scenario for a PU having 7 different CU values, using the assumptions used previously for Diagram 7:
  • CU value 000 is denoted in two different ways in the above scenario.
  • the first CU value to be encoded in a PU should at least be given 2 bits of TBN code, so that the CTBN could be captured when processing changes from one PU to another PU.
  • the full TLBA should be written down into the encoded output file. So if it has 2 TLBAs, either 000 or 001 should be written down instead of just the 00 used now when the LT being used has only 1 TLBA, thereby using the TBN to represent the only TLBA to which Value 000 is assigned.
  • [0] in Diagram 9 denotes that 0 could have to be written down but it is omitted and not required to be written.
  • the same [0] for CU value 000 is used at the end of the PU, denoting that [0] should be omitted from the encoded code.
  • the switching code 11 is itself enough to point to CU value 000 and to represent this value in this example.
  • the number of CU values found in any PU in this case is fixed at 8 counts and does not vary.
  • the processing of the current PU has to stop and the next PU has to be processed.
  • Thus, the value instance count is a measure of control, signifying when the processing of a PU is to start, where it is to stop, whether it is to continue or not, and also where to find the additional ABB bits and make correction for the Address Branched Codes.
  • the ABB however, as mentioned before, could be placed just right after the relevant Address Branched TLBA; on decoding, when D comes to the Address Branched TLBA, it could just look for the following bit to determine how the Address Branched TLBA is to be decoded.
  • Diagram 9 what is changed secondly is the LTS being used, i.e. one without using Address Branching. One should note that this is used for illustration of what takes place if RTS without Address Branching is the case.
  • the LTS changes from 1.2.4+1 with Address Branching as in Diagram 7 to 1.2.4 without Address Branching as in Diagram 9.
  • the number of bits required for the output file for Diagram 9 scenario is 22 bits. A saving of 2 bits is achieved after RTS compression. It should however be noted that it is a rather optimal situation as other combinations of CU values could occur that are not so optimal. For instance, if it has CU value 111 instead of 100, Address Branching is still required. This also illustrates that for PU with number of CU values less than 8, it does not mean that LT structure 1.2.4+1 with Address Branching could be avoided.
  • LT structure 1.2.4+1 with Address Branching still has to be used. Only when the whole digital information input file has only 7 different CU values instead of 8 different CU values, then LT structure 1.2.4 without Address Branching could be adopted.
  • the total number of different CU values found within a digital information file therefore affects very much the data compression ratio that could be achieved. It could be seen that for a PU having 7 different CU values for the scenario of Diagram 9, 24 bits are compressed to 22 bits.
  • Diagram 10 below presents another scenario of PU having 6 different CU values, one of which is 111. So LT structure 1.2.4+1 with Address Branching is to be used:
  • the output code now has 23 bits, including 2 additional ABB bits, versus the input code of 24 bits. Because the 7 th and the 8 th CU values represent the two least frequently occurring values, in most other scenarios with PUs having 6 different CU values, the 2 ABB bits could be further saved.
  • the possible structure of a PU outlined here for implementation is headed by 3 bits representing the number of different CU values (CUV) within the input code of the PU read, the PU Header.
  • Diagram 11 it is 101, indicating only 6 different CU values of the 8 possible CU values are found inside the output code of the PU.
  • a PU having only 1 CU value present could be called PU1, meaning it belongs to the PU group having only 1 CU value; a PU having 2 different CU values is PU2, a PU having 3 CU values PU3, and so on and so forth up to PU8 in the example presented here.
  • the input code is encoded by C one CU value at a time, using their TLBAs, until finishing reading and encoding.
  • Diagram 7 to Diagram 10 illustrate how Relative Addressing by way of RTS is used. If at the end of the whole digital input file it does not end with a whole PU and, for instance, only 5 bits are found instead of the 24 bits assigned to the PU in the present example, such 5 bits could be left unprocessed and appended to the encoded code. Since there are no more incoming bits from the input digital information file, there should be no mistake about where the end of processing is.
  • the input code is encoded by C one CU value read at a time by writing into the output file its corresponding TLBA until the whole PU is read and encoded. If Address Branching is required to be used, the corresponding ABB(s) being kept in the computer memory or in a computer storage file is/are written to the end of the PUs or the end of the output file, or placed just right after the encoded Address Branched TLBAs, as determined by the rules of design.
  • After reading the CUV and knowing how to proceed onwards for decoding the PU, D reads the remaining encoded code, translating the TLBAs (or other special bit patterns particular to some PU groups, such as using the Data Distribution Alteration technique for PU2) one by one until all the CU values of the PU are decoded.
  • D scans through the decoded CU values (8 in the present case) previously processed for the PU, finds the CU values with Address Branching Values associated with them, and uses the additional ABBs to make the corresponding correction or adjustment. If the ABB(s) is/are placed just right after the Address Branched TLBA(s), then direct decoding of the TLBA(s) could be implemented when D comes across any Address Branched TLBA on the fly.
  • each CU value of the PU is encoded or decoded by using its corresponding Absolute TLBA and the corresponding ABBs if Address Branching has to be used.
  • the first CU value to be encoded by C is represented by writing its corresponding Absolute TBN, its LBA portion could be omitted if there is only 1 value present in the LT.
  • the remaining CU values of the PU have to be translated using Relative Addressing by way of RTS. If sub-division into PUs is not used, then the whole digital data input is taken to be one PU and no CUV is necessary; instead a Header Section has to be added to the Compressed Code File, as will be elaborated later.
  • Bit 1 has to be written to signify the act of RTS, i.e. switching the Location Table. And Bit 1 is then followed by a bit 0 or bit 1 depending on the TBN of the LT next in the order or previous in the order. If the LT to switch to has only 1 LBA, then there is no need to write down its LBA. If more than 1 LBA, the corresponding LBA should then be written. If the CU value to be encoded stays inside the current LT, there is no need to switch to another LT for the TLBA for it. A Bit 0 is then written to signify staying in the same LT. If the LT has only 1 LBA, then there is no need to write down its LBA. As mentioned in Paragraph [78], a register for the CTBN has to be kept and updated where appropriate for this purpose.
  • the corresponding bit or bits for the LBA (how many bits of LBA are to be read here being determined by the bit size assigned to that particular LT, i.e. the Table Size of the LT) corresponding to the first CU value has to be read in order to recover the first CU value.
  • the other TLBAs following in the sequence are encoded in terms of Relative Addressing and so should be decoded using the Relative Address Translation technique. So to translate the second CU value, D reads the next bit in the bit chain. If it is 0, it means staying in the same LT.
  • if the LT has only one TLBA, there is no need to look for its LBA if it is designed to be omitted. D then examines the CTBN register to retrieve the current TBN, finds out which CU value has been assigned to it, and writes out the corresponding CU value to the decoded output. If the LT has more than one TLBA, D has to read in the number of bit(s) corresponding to the Table Size of the LT, get the whole TLBA in the form of TBN:LBA, and translate it into the corresponding CU value. After processing the second value, D continues to read another bit; if this bit is Bit 1, that means it has to switch to another LT for the TLBA of the next upcoming value for decoding.
  • D then reads the next bit to know whether it is the previous LT or the next LT that is to be used. Bit 0 means the previous LT and Bit 1 the next LT in order. D then consults the CTBN register and finds the relevant LT with the correct TBN. D then reads the LBA portion from the encoded code to find out the original code for it and writes out the original code into the decoded output code. D then updates the CTBN as it has moved to a new LT after the decoding of the second CU value. D then continues to process the third CU value in the same manner.
  • the decoding process by D could then continue to the next PU and so on until the end of the whole digital information input code. It knows when to end the processing of the current PU by counting the number of values that have been decoded for the current PU. In the present case, after 8 CU values are decoded, D then moves forward to decode another PU, beginning by reading the CUV again to determine which PU group is to be processed and what rules and logics of processing are to be observed.
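  • A decoder sketch matching the RTS/RA encoder sketched earlier (same assumptions: three LTs with wrap-around next/previous switching and the ABB placed right after a branched TLBA); the hard-coded bit string is the 25-bit output of that encoder.

```python
LTS = [
    (["000"], 0),                       # LT1, TBN 00
    (["001", "010"], 1),                # LT2, TBN 01
    (["011", "100", "101", "110"], 2),  # LT3, TBN 10; 10:11 is Address Branched
]

def decode_rts(bits, n_values):
    out = []
    current = int(bits[0:2], 2)                      # first value: absolute TBN (the CTBN register)
    i = 2
    while len(out) < n_values:
        if out:                                      # after the first value: stay or switch
            if bits[i] == "1":                       # Switching Bit
                step = 1 if bits[i + 1] == "1" else -1
                current = (current + step) % 3       # update the CTBN register
                i += 2
            else:                                    # Staying Bit
                i += 1
        values, width = LTS[current]
        lba = bits[i:i + width]
        i += width
        value = values[int(lba, 2)] if width else values[0]
        if current == 2 and lba == "11":             # branched address: read the ABB that follows
            value = "110" if bits[i] == "0" else "111"
            i += 1
        out.append(value)
    return out

encoded = "0011001110000101001100111"                # 25-bit output of the encoder sketch above
assert decode_rts(encoded, 8) == ["000", "001", "010", "011", "100", "101", "110", "111"]
```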
  • an LT structure has to be decided upon and set up, with CU values assigned to the TLBAs of the LTs of the LT structure. The LT structure has to be decided upon and set up according to the modeling and matching prepared in the DP phase, either by A and passed to C, or done by C directly. This is the static way of determining and setting up the LT structure, by going through the DP phase for collecting the data distribution statistics.
  • the dynamic way of making an adaptive LT structure is to set up the LT structure dynamically and adaptively while processing and encoding the digital information file, such as by reading the first PU or the first 256 PUs or a certain other number of PUs, where appropriate, of the digital information file, analyzing its data distribution, and then prioritizing and assigning the CU values found to an LT structure most appropriate to it. C then uses this as the initial LT structure. At the same time, C also keeps a register of the frequency counts of each of the CU values so far processed.
  • C updates the LT structure with the corresponding assigned values and begins processing the next PU or at any juncture using the newly updated LT structure. So such rules of when to make the change and updating have to be embedded in C and D and should be followed consistently. Or changes to the LTS and the CU values re-assignment could be made after every value encoding.
  • C knows what the LTS and the assignment of CU values to the TLBAs are, and it keeps frequency counts of all the CU values having been processed, so it could re-assign the CU values to the TLBAs of the LTs whenever the ranking of the CU values processed has changed.
  • C could even use a new LTS, such as changing from LTS 1.2.4+1 (i.e. 1.2.4 + 1 ABB) to LTS 2.2.4 when the frequency counts indicate the CU values are more evenly distributed than before.
  • Rules about when to make the change have to be determined beforehand and to be implemented and applied by C and D consistently. For instance, when the 1st ranking CU value comes within the 10% margin of the 2nd ranking value, then LTS 1.2.5 could be changed to LTS 2.2.4 or vice versa.
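  • A minimal sketch of keeping such a frequency-count register and re-ranking after every CU (the re-rank-every-CU rule and the tie-break on the binary value are assumptions for illustration; C and D would have to apply whatever rule is chosen identically):

```python
# Keep frequency counts of the CU values processed so far and re-rank them
# after every CU; the ranking drives which value sits at which TLBA.
from collections import Counter

def adaptive_ranking(cu_stream):
    counts = Counter()
    rankings = []
    for cu in cu_stream:
        counts[cu] += 1
        # most frequent first; ties broken by the binary value itself (assumed rule)
        ranked = sorted(counts, key=lambda v: (-counts[v], v))
        rankings.append(ranked)
    return rankings

stream = ["001", "001", "000", "000", "000", "111"]
for ranked in adaptive_ranking(stream):
    print(ranked)
# After the 5th CU, 000 overtakes 001 and would move to the shortest TLBA.
```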
  • the Input Code shows PU1 has only 1 CU value and the CU Value is 111, the PU having 8 counts of 111. So the encoded Output Code for it begins with a CUV of 000, indicating this PU has only 1 CU value found throughout the whole PU. Since there is only 1 CU value of 111, just writing down one 111 into the Output Code file is enough.
  • the other 111 input codes could be omitted; i.e. by inference, the other 111 values are assumed as being omitted.
  • This encoding by C does not use the LT structure 1.2.4+1, because, as explained previously, the first encoded CU value has to be in the form of an Absolute TLBA.
  • Diagram 12 elects not to use the LT structure for PU1 for simplicity and to avoid the chance of using 2 more bits as explained above. This also illustrates that each of the PU groups, i.e. PU1, PU2, and so on and so forth up to PU8, may have its own set of rules for C and D to process according to the design and the need for adjustment to the data distribution peculiar to the PU group being processed.
  • for the current PU to be processed, D reads the first encoded CU value, in this case the CU value written down directly, i.e. 111. It then writes down 8 counts of 111 into the decoded output file by inference and according to the logics and rules programmed into D for the PU1 group. So 24 bits are compressed into 6 bits, a saving of 18 bits.
  • Diagram 13 illustrates that Absolute Addressing requires 40 bits for the 24 bits of the Input Code and Relative Addressing 33 bits for the same 24 bits.
  • the data distribution is skewed to the 2 least frequently occurring CU values for the whole digital data input. It is the worst situation for using the LT structure 1.2.4+1 designed for a data distribution of the opposite skewness. It is no wonder that data expansion instead of data compression is achieved. However it shows that using Relative Addressing does have an advantage over using Absolute Addressing here.
  • the first 3 bits of the Output Code are, as always, the CUV; this also tells C and D which rule set and logics of processing to follow.
  • the logics of processing presented here are just an example; if one innovates in another way, another set of processing logics could be designed and used for processing PUs of the PU2 group.
  • the output value is 111.
  • This 111 represents that the PU2 is made into a PU1 having all values in the form of 111.
  • the next output code 110 means it is to alter the 110 values of the Input Code of the current PU into 111, and there are 3 values of 110 to alter; this count is represented by the next output code 10. Since at most 4 possible values of a PU2 have to be altered to make it a PU1, only 2 bits are required for this output code.
  • the next three values in the Output Code are their respective positions in the Input Code, i.e. 000 for the first position, 011 for the 4th position and 101 for the 6th position.
  • D uses the above explained rules and set of logics to decode the encoded Output Code and recover the original Input Code. So D reads the CUV and knows that it is PU2 and knows that it is to alter PU2 into PUl first according to the set of rules and logics built in for PU2 and reads the next value 111, and writes all 8 counts of 111, and then reads the next value 110, knowing that it is 110 to restore and reads the next value of
  • the number of bits used altogether for producing the Output Code is 20. If there are 4 values, the Output Code requires 23 bits for the 24 bits of the Input Code. So given the worst scenario it still makes 1 bit saving for compression.
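  • A minimal sketch of this PU2 Data Distribution Alteration encoding, under the assumption (consistent with the 20-bit and 23-bit totals above) that the 2-bit count field holds the number of positions to restore minus one:

```python
# PU2 encoding: CUV, majority value, minority value, 2-bit count, 3-bit positions.
from collections import Counter

def encode_pu2(cus):                       # cus: 8 three-bit CU values, exactly 2 distinct
    counts = Counter(cus)
    (major, _), (minor, n) = counts.most_common(2)
    positions = [i for i, v in enumerate(cus) if v == minor]
    bits = "001"                           # CUV: 2 different CU values in this PU
    bits += major + minor                  # the value kept and the value to restore
    bits += format(n - 1, "02b")           # at most 4 positions need restoring
    bits += "".join(format(p, "03b") for p in positions)
    return bits

pu = ["111", "110", "111", "111", "110", "111", "110", "111"]
out = encode_pu2(pu)
print(out, len(out))                       # 20 bits for the 24-bit input, as in the text
```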
  • C encodes the input code by using the standard LTS.
  • the first value is therefore the CUV
  • the second value is 1011
  • the following bit 0 means no switching of LT for finding the next value
  • the current LT is LT3, having 4 TLBAs, so staying in the same table means reading only 2 bits for the next value, as the TBN of LT3 is not required to be written down.
  • the next 2 bits are 11 and they represent 1011 again.
  • bit 0 in the following means staying, so representing using the same value, i.e.
  • the 2nd CU value, 111, as the next value; and bit 1 using the previous value, i.e. the 1st CU value, 110, as the next value, and update the current CU value registers, i.e. the current CU value register and the previous CU value register, accordingly.
  • the decoded Output code becomes 8 counts of 1011 with two ABBs of 0 and 1. So to make the adjustment for the 8 counts, D goes back to the encoded code again and finds that the first 1011 should be 110 because the first bit of the ABB is 0, meaning the first or originally assigned value of TLBA 10:11. And the next bit 0 means staying within the same LT to find the next value, and the corresponding TLBA is again 11, meaning 10:11 but with an Address Branched Bit 1, so the second TLBA is to be interpreted as 111 instead of 110. After the 2 different CU values are found and adjusted, D could easily decode the remaining values of 1011 using the meaning given to Bit 0 and Bit 1, i.e.
  • the 2nd Output Code in Diagram 16 shows where the ABBs are placed after the encoded Address Branched CU values according to another design.
  • the first Bit 0 after reading the first encoded CU value, 1011, is the ABB; i.e. after the encoded Address Branched CU value, 1 more bit should be read to distinguish which original CU value it represents.
  • the first Bit 0 means staying with the same LT. This use and meaning of Bit 0 is not to be changed and used the same way until another CU value is found.
  • the meaning of 0 and 1 could be changed to mean using the same value as the current value or changing into the previous value and use it.
  • This change of meaning of 0 and 1 represents the act of using Adaptive LT Structure.
  • the decoded values in the preceding decoded output code could serve as the data set required for building up the next LT structure.
  • the LT structure of 1.2.4+1 could be dispensed with, and D, according to the preferred set of rules and logics of using RTS and RA for PUs of PU2, could then rely on the 2 decoded CU values written out previously as a reference, making a new LT structure with 2 LTs only, each having only 1 TLBA.
  • 1 TLBA, as explained previously, does not require bit assignment to the LBA portion. This means the act of staying or switching, i.e. using the special switching bit, Bit 0 or Bit 1, could refer to the only TLBA of each of the 2 LTs, and there is no need to refer to the TBNs or the TLBAs, in order to save more bits.
  • Diagram 17 Another set of rules and logics of using RTS and RA could be illustrated in the following Diagram 17:
  • Another set of rules and logics could be designed along these lines, since there should be only 2 different CU values in the PU.
  • a 3-bit CU could be used to represent 8 different bit values (or bit patterns), whether repeating or not. They could all be of the same value or pattern or could all be different from one another; the former case is PU1 and the latter case PU8, and there are six other intermediate cases from PU2 to PU7.
  • RTS and RA in this case do not help much in making compression of the PU8 group of data; instead expansion of data is the result. This is because the data distribution of a PU8 group is extremely even, involving frequent switching of LTs, thus unable to take advantage of RTS and RA. So another technique is revealed here to complement the techniques and method introduced above. This technique is called Rank Code Pair Numbering (RCPN). To use Rank Code Pair Numbering for making data compression involves changing the code ranks into numbers, i.e. using numbers to represent the ranking of the code values. To illustrate how it does the work and the mechanism involved, one could continue with the example of Diagram 6 and Diagram 7 for encoding the bit pattern of a PU of PU8.
  • RCPN Rank Code Pair Numbering
  • the PU of this PU8 has all 8 different CU values or bit patterns, from 000, 001, 010, 011, 100, 101, 110 and 111 in descending order of frequency counts.
  • Using RTS and RA expands the data.
  • RCPN makes very significant reduction of the number of bits used for encoding the data patterns.
  • Rank Codes here could either refer to the ranking according to frequency count or to natural ranking; natural ranking refers to the binary number as represented by the CU.
  • a 3-bit CU could have 8 binary numbers, on a decimal scale from 1 to 8. So in Diagram 19, Decimal 1 is given the highest natural rank and Decimal 6 the lowest natural rank.
  • the RCPN technique revealed in the present invention applies to PUs having consecutive different CU values one after another; it does not apply to PUs having repeating CU values. So strictly speaking, it applies only to PU8. However, one variant of the other PU groups could also qualify for using this technique, i.e. when all the different CU values of the PU come one after another consecutively without repeating values in-between.
  • the first two CU values are written directly as they are into the Output Code. Because for a PU8, for the first two positions, there are 56 permutations for the two CU values out of the 8 possible CU values, it requires 6 bits to represent; so there is no need to convert the first Rank Code Pair (RCP) into a binary number, because the binary number so converted still requires 6 bits, thus no saving at all. But the case is different for the 2nd RCP, the 3rd and the 4th in a PU8.
  • RCP Rank Code Pair
  • the possible permutations are 30 only. So this requires only a 5-bit binary number to represent the 2nd RCP; for the 3rd RCP, two out of four, a 4-bit binary number; and for the 4th RCP, two out of two, a 1-bit binary number.
  • To convert the 2nd RCP one uses the RCPNT6 in Diagram 20.
  • To convert the 3rd RCP one refers to the RCPNT4 in Diagram 21; the remaining CU values are the 5th, 6th, 7th and 8th in the ranking.
  • RCPNT4 is numbered using the 1st, 2nd, 3rd and 4th in the ranking. So to obtain the RCPN for the 5th and 6th CU values, one has to substitute the highest rank code, the 5th one, in the Input Code as the 1st one in the RCPNT4. Re-ranking the remaining CU values makes the 5th become the 1st, the 6th the 2nd, the 7th the 3rd and the 8th the 4th. The 4th RCP is converted likewise in the same manner, by re-ranking the remaining CU values after taking out the 3rd RCP CU values and by using RCPNT2 in Diagram 22.
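  • A minimal sketch of RCPN for a PU8, with an assumed lexicographic pair numbering standing in for the RCPNT tables of Diagrams 20 to 22 (the exact table ordering does not change the bit widths: 6 + 5 + 4 + 1 = 16 bits, or 19 with the CUV header, versus 24):

```python
# The first pair of CU values is written as-is; each later pair is replaced by
# its index among the ordered pairs of the CU values still remaining.
from itertools import permutations
from math import ceil, log2

def encode_rcpn_pu8(cus):                  # cus: the 8 distinct 3-bit CU values in PU order
    bits = cus[0] + cus[1]                 # 1st RCP: 56 permutations, 6 bits, no saving
    remaining = sorted(set(cus) - set(cus[:2]))
    for k in range(2, 8, 2):               # 2nd, 3rd, 4th RCPs
        pairs = list(permutations(remaining, 2))
        width = ceil(log2(len(pairs)))     # 30 -> 5 bits, 12 -> 4 bits, 2 -> 1 bit
        idx = pairs.index((cus[k], cus[k + 1]))
        bits += format(idx, f"0{width}b")
        remaining = sorted(set(remaining) - {cus[k], cus[k + 1]})
    return bits

pu = ["011", "110", "000", "101", "111", "001", "100", "010"]
out = encode_rcpn_pu8(pu)
print(len(out))                            # 16 bits; with the 3-bit CUV header, 19 versus 24
```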
  • Diagram 18 shows the number of permutations (with repetitions except PU8) for all the 8 PU groups discussed in the present invention. For instance, it shows PU7 has 1,128,960 permutations. But not all these permutations of PU7 could use the technique of RCPN to compress because only one scenario of these PU7 permutations is suitable for using RCPN for this purpose, i.e. the scenario where all of the 7 different unique CU values come one after another consecutively without intervening repeating CU value. This is so for other PU groups, such as PU6, PU5 and PU4. PU1 has no RCP and PU2 and PU3 have only 1 RCP that even when converted gives no bit saving.
  • PU8 or C8 has been described in Paragraph [110] to [112].
  • for C8 the bit saving is 5 bits, for C6 3 bits, for C5 2 bits and for C4 1 bit.
  • the case of C5 has some special points to note. It requires 2 bits for its last code. Because after 4 CU values appear, only 4 remaining CU values are yet to come, so 1 out of the 4 remaining values could be represented by using 2 bits. Furthermore in the encoding process, one could also try to take in one more value, adding it to the chain of C4 and C5 for saving more bits for compression.
  • Diagram 25 below tables the number of bits saving using this RCPN technique: Diagram 25
  • TLBA Table Location Binary Address
  • LBA, being used for the assignment or accommodation of CU values and therefore as a Location Reference for identifying the CU values assigned; for instance, 0 bit gives 1 LBA (using the respective TBN: as the TLBA), 1 bit 2 LBAs, 2 bits 4 LBAs, 3 bits 8 LBAs, and so on and so forth; the use of the technique of Address Branching giving rise to the assignment of Address Branched Bits (ABBs) to the LBA of the CU value to which another value is assigned;
  • ABBs Address Branched Bits
  • CU, being the basic unit for the process of encoding and decoding; the size of the CU determining the number of CU values that a CU could represent, 1 bit 2 values, 2 bits 4 values, 3 bits 8 values, and so on and so forth; CU values being the values appearing in a CU;
  • PU consisting of a certain number of CUs, being the basic unit for processing CUs, a higher level unit of processing;
  • LTS: As shown below in Diagram 27, an LTS 4.2.2 with 3 LTs is used, where LT1 uses a 2-bit LBA, and LT2 and LT3 a 1-bit LBA each.
  • Group 1 (4 Styles 100, 101, 110, 111)
  • Group 2 (2 Styles 010,011)
  • Diagrams 28, 30 and 31 all indicate that the bit usage is the same as that of the original input code, with neither gain nor loss in bits.
  • Diagram 32 shows a new LTS 2.2.2.2 with its Bit Usage in Diagram 33 :
  • Diagram 34 is another design of LTS 2.2.2.2, having sub-TBN structure:
  • Diagram 34 is just another form of LTS expression, with a sub-LT structure. Its effect is just like Diagram 32, resulting in data expansion because there is a higher chance of switching to another LT than of staying in the same LT.
  • In Diagram 35, just as in Diagram 33, the TBN portion including the Sub-TBN portion amounts to 2 bits and the LBA portion remains 1 bit. So the use of a sub-LT under a main LT is another variant of the LTS being revealed in the present invention.
  • In Diagram 34 there are two main LTs and each of them is further divided into two sub-LTs. The number of LTs is then 4, the same as the LTS in Diagram 32.
  • Diagram 34 therefore shows that there could be a hierarchy of Location Tables, i.e. main LTs with sub-LTs; and further levels or layers of sub-LTs could be added where appropriate for the purpose of classification of data.
  • the TBN portion of the TLBA should include the sub-LT TBN, and as such the TBN of Group 0a in Diagram 35 becomes 1:1:, Group 0b 1:0:, Group 1a 0:1: and Group 1b 0:0:. If additional levels or layers of sub-LTs are found, the TBN portion should include them likewise.
  • When writing the TBN portion into the encoded code, C as before omits the : notation and writes directly 11 for Group 0a, 10 for Group 0b, 01 for Group 1a and 00 for Group 1b. This is to be followed by the LBA portion as discussed before.
  • D reads and interprets the TLBA from the encoded code for decoding in the same manner as C writes and encodes.
  • In Diagram 29 the use of RTS and RA works just like AA, as indicated in Diagrams 30 and 31 and also in Diagram 28. If the binary numbers are instead grouped according to Diagram 32 or 34, the use of RTS and RA combined with AA only results in data expansion. So the use of RTS and RA as well as AA could at best produce an encoded code with no loss or no gain in bits if the binary bits are truly random and even.
  • the advantageous effect of using an LTS with RTS and RA (combined with AA) over using AA only is that it provides a higher chance of making compression when the data distribution is not truly random and even.
  • the Switching Bit of RA could result in the same bit usage as that given by using AA. And the use of the Staying Bit of RA could provide bit saving that could not be provided by using only AA.
  • Multiple Location Table Structure also provides a very flexible structure that could be used to model and match the data distribution of the digital information so that the best compression rate could be approached and approximated. It allows also for the selection of different Compression Units (CU) and Processing Units (PU) for encoding and decoding purpose, providing chances of compressing digital information on different scales, such as on 3 bits ternary scale or 8 bits octary scale or 16 bits hexadecimal scale, etc.
  • CU Compression Units
  • PU Processing Units
  • Diagram 36 uses 3 bits as the CU and 24 bits as the PU, and Diagram 37 8 bits as the CU and 2048 bits as the PU. That means the 3-bit CU could have 8 variations and could be any one of the 8 CU values, and a 24-bit PU could host 8 counts of 3-bit CU values; an 8-bit CU could have 256 variations and could be any one of the 256 CU values, and a 2048-bit PU could host 256 counts of 8-bit CU values. In processing and analyzing the first four paragraphs of the present invention, it is found that it could be turned into 1397 units of 24-bit PU and 17 units of 2048-bit PU.
  • the use of Staying Bit 0 uses up one bit, the number of bits thus saved is the bit size of the TBN minus 1. So if the TBN bit size is 3 bits, then 2 bits are saved.
  • the use of RTS and RA makes switching table using the same number of bits used as the bit usage of using Absolute Addressing with suitable LTS.
  • the TBN for each LT should have 3 bits because 2 bits could only represent 4 LTs. So using RTS and RA, switching to another LT may have to write the Absolute TBN after the Switching Bit 1, thus resulting in a loss of one bit. However one could use another extended variant of RTS and RA. Instead of just expressing the switching to the previous LT and the next LT, one could express the switching by writing the Switching Bit 1 first and then use the concept of previous pair and next pair of LTs instead of the concept of previous one and next one LT.
  • Paragraph [143] has shown how RTS and RA are used for an LTS with 5 LTs. If 6 LTs are used, using In-Table Switching and Out-Table Switching may help to retain the use of RTS and RA as if only 5 LTs are involved, so that the steps for using 1 current LT, 2 previous LTs and 2 next LTs could still be used without incurring additional bits of TBN to refer to the 6th additional LT. This is especially useful for a group or groups of CU values which seldom occur and which more or less occur in a cluster; i.e. if one CU value occurs, the other CU values in the group are also likely to occur one after another.
  • C writes Switching Bit 1 and then the relative TBN to LT5, for example; if LT5 is the next second LT, C writes Switching Bit 1 and 11 for the next second LT, and then writes 11, the reserved LBA for LT6, and then LT6 is called in for looking up the next CU value. If the CU value is assigned to LBA 10 in LT6, then C writes 10 to represent the CU value. So the series of bits to be written for this next CU value in LT6 by C is 11110. This represents Out-Table Switching; i.e.
  • RTS and RA One essential characteristic of RTS and RA is the use of Switching Bit and Staying Bit to refer to the targeted LT for the next CU value.
  • the core LTs could be limited to 4 and peripheral LTs 2, thus RTS and RA could be used for 4 core LTs with 1 peripheral LT in a series of 1 current LT, 2 LTs in the next order and 2 LTs in the previous order; the peripheral LT being paired with another peripheral LT to be In-Table switched or Out- Table switched, each of the peripheral LTs using one special TLBA for In-Table and Out- Table Switching.
  • the two peripheral LTs could each, using 4 bit LB A, accommodate 15 LB As for the 31 least frequently occurring CU values being assigned to this pair of peripheral LTs (the two least frequently occurring CU values being Address Branched CU values); the other four core LTs share the remaining 23 CU values.
  • the first ranking CU value 0 has 2256 counts and the second CU value 32 has 310 counts and the rest 21 CU values ranges from 206 counts of the third CU value 101, to 17 counts of the 23 th CU value
  • LT1 with 0-bit LBA, being able to accommodate only the first-ranking CU value
  • LT2 takes the next 2 CU values using a 1-bit LBA
  • LT3 takes the next 4 CU values using a 2-bit LBA
  • LT4 takes the next 16 CU values using a 4-bit LBA.
  • a master list of the 54 CU values ranked according to frequency counts in descending or ascending order could be recorded in the Header Section as well as the incremental changes of CU values for each of the units after the first unit.
  • C could manage the swapping in and swapping out of CU values after each unit is read before doing the encoding for each unit.
  • C could update the LTS and the associated change of CU values assignment, which reflects the incremental changes of the CU values being swapped-in and swapped-out.
  • the above suggestion is still more an art than a science. However, it sheds some light on how the scientific study and operation could be conducted further, such as what statistics are to be collected, how LTs are designed, and how CU values are assigned.
  • the present invention therefore reveals the schema, the method and the techniques for implementing the compression and decompression process, as well as the foundation for applying today's computing power to automating the process of modeling and matching between the schema and method used and the data distribution of the digital information to be processed.
  • each of the possible combinations of the LTS could be assessed using the statistics so collected about the data distribution of the digital information to be processed, the best LTS or the LTS which could give the most compression saving could be determined.
  • the art of compression then becomes a subject of scientific endeavor with modeling and matching being done by computers these days. This modeling and matching for the determination of the best-fit LTS could be done by A and passed to C or by C alone altogether.
  • D relies on the encoded code made by C to decode and recover the original code.
  • the encoded code made by C has to use the necessary information about the size of CU, the size of PU if PU is used.
  • Such necessary information also includes the LTS chosen to be used, including the number of LTs used and the associated sub-LTs used if any, the number of bits assigned to TBN and LBA for each LT and sub-LT used and the respective TBNs of the LTs and sub-LTs, as well as the assignment pattern of the CU values to the LTS, including how the CU values are grouped together and assigned to which particular LTs under the LTS chosen, together with the techniques, such as RTS and RA or AA or any special processing indicators, such as in the form of specially assigned TLBAs for such purposes, to be used with the LTS chosen.
  • the assignment of CU values to the LTS reflects the frequency ranking order and the clustering characteristics of each of the CU values assigned, in accordance with the statistics collected in the DP stage, or with a preselected statistics model where no such data statistics of the digital information to be processed are available, in which case dynamic adjustment to this pre-selected LTS has to be made later according to the result of data distribution analysis conducted on the fly where appropriate.
  • Such necessary information stated above could be written as the Header Information in the Header Section of the whole Compressed Code File (CCF) to be used by D for decoding the encoded code in the Encoded Code Section following the Header Section.
  • CCF Compressed Code File
  • the number of TLBA addresses should be no less than 54, assuming a static LTS is used, and the number of LTs should be no more than 54, i.e. from 1 to 54.
  • the number of bits used by the TBN of each LT, including the Sub-TBN, could range from 0 bits (representing only 1 LT used) up to, say, 6 bits, which could represent 64 LTs, enough for the 54 CU values found, each with 0-bit LBA.
  • the number of LBA bits associated with each LT in all the combinations of LTs used could also be listed, from 0-bit LBA to 6-bit LBA. Of course, the listing could be enormous, but today's computing power should be more than enough for this. The Bit Usage for each combination of the LTS and the assignment pattern associated with it could also be computed; for instance, such bit usage results could be computed for LTSs with 1 LT to 54 LTs and the best-fit LTS obtained for C to do the encoding. Or C could simply do the actual encoding using all the combinations of the LTSs one by one and find out which one gives the best compression saving.
  • the CCF contains a Header Section and the Encoded Code Section and the Remaining Bits Section (RBS, containing the bits not processed because these bits do not make up 1 CU or PU as explained in Paragraph [156] below).
  • the Encoded Code Section contains the encoded code, i.e. the TLBAs and other special indicators produced by using the aforesaid compression techniques with the chosen LTS assigned with CU values.
  • Such special indicators are special TLBAs assigned for special processing such as Terminating Signature, Special Processing for long sequences of consecutive CU value patterns, such as different CU values in a long chain or the same CU value in a long row, or for In-Table and Out-Table Switching, or some CU values used in DDA or some bits representing ABBs as well as the CUV when PUs are classified and used as discussed previously.
  • the Header Section contains the Header Information as mentioned in Paragraph [152]. It could also contain a Remaining Bits Unit (RBU) used for indicating how many bits are left without encoding at the end of the whole CCF in the Remaining Bits Section, because if the CU is 3-bit and the whole Input Code File does not contain whole units of 3 bits, then there may be 1 or 2 bits that could not be processed. So the RBU is used to indicate how many bits are left unprocessed. This RBU could be just the size of a CU if a PU is not used. If a PU is used, the RBU should be the size of a PU. If the CCF goes on to be compressed for a second round or a third round, another piece of Header Information and RBU is added.
  • RBU Remaining Bits Unit
  • this additional piece of Header Information and RBU is based on the bit usage and the data distribution of the CCF of the previous round of compression.
  • a checksum of the Header Information and RBU could be calculated and added ahead in the Header Section to distinguish a CCF from an ordinary digital input code file. The checksum is not necessary if there is another way that surely indicates the input file is a CCF or not a CCF, such as using File Extension Name for such distinction.
  • the Header Section also could be put into a separate file, pairing with the CCF without Header Section.
  • the pair of Header Section File and the Encoded CCF could be taken as a whole as a CCF.
  • the Header Section when merged with the CCF should have a signature indicating how long the Header Section is or where the Encoded Section begins.
  • another Compression Bit could also be used and included in the Header Section or as part of Header Information, indicating whether the CCF has undergone just one round of compression or more than one round of compression and this lets D know when to stop the recursive decoding.
  • the original input code file could be processed as a whole without sub-division into classified PUs or it could be processed using PUs classified as described above or in other ways as the designer considers appropriate.
  • PU2, PU7 and PU8
  • the other PUs, PU3, PU4, PU5 and PU6, could be processed as one PU group using a master LTS with 8 CU values assigned according to their ranking of frequency counts and the clustering indices of each of the 8 CU values, and using RTS and RA or AA or their combination as discussed previously. Which way is the best therefore depends on the actual data distribution of the digital information to be processed or being processed.
  • the CUV then can be re-numbered as Bit 00 (for PU8), 01 (for PU1) and 11 (for PU6), and used as the TBN for the 3 PU groups, each having 0 bits of LBA.
  • the Header Information therefore could be just used for and based on the data distribution of PU6.
  • the master LTS could be just used for PU6; i.e. when C detects that the coming unit is one of PU6, i.e.
  • TBN signature of Bit 11 uses the master LTS as a sub-LTS to the main LTS 1.1.1 used on a higher level.
  • C knows when to stop encoding for the current PU and goes on to find out the next PU belonging to which PU group and write the TBN signature of the next PU to the encoded code.
  • D does the decoding likewise.
  • Fixing the Header Section including adjusting the provision of space for
  • a Location Table Structure used for making compression of digital data characterized by:
  • TBN A Table Location Binary Address
  • TLBA of (b) above being assigned with values of a Compression Unit (CU) used in compression encoding and decompression decoding where the CU values are represented in the form of binary bit(s) of Bit 0 or Bit 1.
  • CU Compression Unit
  • a method comprising the step of using the technique of Relative Table Switching and Relative Addressing for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using the technique of Absolute Table Switching and Absolute Addressing for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using the technique of Relative Table Switching and Relative Addressing combined with Absolute Addressing for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using the technique of Address Branching for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using the technique of In-Table Switching and Out- Table Switching for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using the technique of Dynamic Adaptive Location Table Restructuring for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using the technique of Data Distribution Alteration for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using the technique of Rank Code Pair Numbering for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Compression Unit for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Processing Unit(s) for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Compression Unit Value Signature for grouping Processing Unit(s) for compressing or decompressing digital data using Location Table Structure.
  • TLBA as a special signature for special processing required for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Header Section containing Header Information as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Header Section containing Remaining Bits Unit as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Header Section containing Compression Bit as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Analyzer for analyzing and collecting statistics about the data distribution of the digital information to be processed or being processed for use in compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Compressor for analyzing and collecting statistics about the data distribution of the digital information to be processed or being processed for use in compressing or decompressing digital data using Location Table Structure.
  • a method comprising the step of using Compressor for compressing digital data using Location Table Structure.
  • a method comprising the step of using Decompressor for decompressing digital data using Location Table Structure.
  • the prior art for the implementation of this invention includes computer languages and compilers for making executable code and operating systems as well as the related knowledge for making applications or programs; the hardware of any device(s), whether networked or standalone, including computer system(s) or computer-controlled device(s) or operating-system-controlled device(s) or system(s), capable of running executable code; and computer-executable or operating-system-executable instructions or programs that help perform the steps for the method of this invention.
  • this invention makes possible the implementation of a Location Table Structure for the compression of digital information, including digital data and digital executable codes; and in this relation, is characterized by the following claims:
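By way of illustration only (not forming part of the claim elements above), the following Python sketch shows one possible realisation of the Switching Bit / Staying Bit mechanism of RTS and RA over a ring of 5 LTs referred to in the list; the function name and the 2-bit relative TBN codes (with 11 standing for the next second LT, as in the Out-Table Switching example above) are assumptions made for this sketch.

REL_CODE = {-2: "00", -1: "01", 1: "10", 2: "11"}   # assumed 2-bit relative TBN codes

def rts_prefix(current_lt, target_lt, num_lts=5):
    # Return the bit prefix selecting target_lt relative to current_lt.
    if target_lt == current_lt:
        return "0"                       # Staying Bit: stay in the current LT
    offset = (target_lt - current_lt) % num_lts
    if offset > num_lts // 2:
        offset -= num_lts                # wrap around the ring to the range -2..+2
    return "1" + REL_CODE[offset]        # Switching Bit followed by the relative TBN

# Example: switching from LT3 to LT5, the "next second" LT, writes 1 followed by 11.
print(rts_prefix(3, 5))                  # prints 111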

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method, a schema and techniques using Location Table Structure for compressing and decompressing digital information, with the technique of Relative Table Switching and Relative Addressing especially proven to have intrinsic value in making data compression and decompression, in addition to capitalizing on the uneven nature of the data distribution of the digital information under processing.

Description

Description
COMPRESSION CODE AND METHOD BY LOCATION TABLE STRUCTURE
Technical Field
[1] This invention relates to compression for the use and protection of intellectual property, expressed in the form of digital information, including digital data as well executable code for use in device(s), including computer system(s) or computer-controlled device(s) or operating-system-controlled device(s) or system(s) that is/are capable of running executable code or using digital data. Such device(s) is/are mentioned hereafter as Device(s).
[2] In particular, this invention relates to the method and schema as well as its application in processing, distribution and use in Device(s) of digital information, including digital data as well as executable code, such as boot code, programs, applications, device drivers, or a collection of such executables constituting an operating system in the form of executable code embedded or stored into hardware, such as embedded or stored in all types of storage medium, including read-only or rewritable or volatile or nonvolatile storage medium (referred hereafter as the Storage Medium) such as physical memory or internal DRAM (Dynamic Random Access Memory) or hard disk or solid state flash disk or ROM (Read Only Memory), or read-only or rewritable
CD/DVD/HD-DVD/Blu-Ray DVD or hardware chip or chipset etc.
[3] In essence, this invention reveals the method and schema as well as its application that could be used to make compression of digital information. In this relation, it makes possible the processing, distribution and use of digital information in Device(s) connected over local clouds or internet clouds for the purpose of using and protecting intellectual property. As with the use of other compression methods, without proper decompression using the corresponding methods, the compressed code could also be considered an encrypted code as well.
[4] However, the method and the schema as well as its application revealed in this
invention is not limited to delivery or exchange of digital information over clouds, i.e. local area network or internet, but could be used in other modes of delivery or exchange of information.
Background Art
[5] There are many methods and algorithms published for compressing digital information and an introduction to commonly used data compression methods and algorithms could be found at http://en.wikipedia.org/wiki/Data_compression. The present invention describes a novel method of making lossless data compression. The relevant part of the aforesaid wiki on lossless compression is reproduced here for easy reference:
"Lossless data compression algorithms usually exploit statistical redundancy to represent data more concisely without losing information, so that the process is reversible. Lossless compression is possible because most real-world data has statistical redundancy. For example, an image may have areas of colour that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a basic example of run-length encoding; there are many schemes to reduce file size by eliminating redundancy.
The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. [6] DEFLATE is a variation on LZ optimized for decompression speed and compression ratio, but compression can be slow. DEFLATE is used in PKZIP, Gzip and PNG. LZW (Lempel-Ziv-Welch) is used in GIF images. Also noteworthy is the LZR (Lempel-Ziv-Renau) algorithm, which serves as the basis for the Zip method. LZ methods use a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
The best modern lossless compressors use probabilistic models, such as prediction by partial matching. The Burrows-Wheeler transform can also be viewed as an indirect form of statistical modelling. [7]
The class of grammar-based codes are gaining popularity because they can compress highly repetitive text, extremely effectively, for instance, biological data collection of same or related species, huge versioned document collection, internet archives, etc. The basic task of grammar-based codes is constructing a context-free grammar deriving a single string. Sequitur and Re-Pair are practical grammar compression algorithms for which public codes are available.
In a further refinement of these techniques, statistical predictions can be coupled to an algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen, and turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to the better-known Huffman algorithm and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. Arithmetic coding is used in the bi-level image compression standard JBIG, and the document compression standard DjVu. The text entry system Dasher is an inverse arithmetic coder. [8]"
[6] In the aforesaid wiki, it says that "LZ methods use a table-based compression model where table entries are substituted for repeated strings of data". The use of tables for translation, encryption, compression and expansion is common, but the ways tables are used for such purposes are various and could be novel in one way or another.
[7] The present invention presents another attempt of using table(s) for such purposes and yet in another novel implementation not previously revealed. This could be seen in how the technical problems described in the following section are being approached and solved.
Disclosure of Invention
Technical Problem
[8] The technical problem presented in the challenge of lossless data compression using
table-based methods is how longer entries of digital data code could be represented in shorter entries of code and yet could be recoverable. While shorter entries could be used for substituting longer data entries, it seems inevitable that some other information, in digital form, has to be added in order to make it possible or tell how it is to recover the original longer entries from the shortened entries. If too much such digital information has to be added, it makes the compression efforts futile and sometimes, the result is expansion rather than compression.
[9] The way of storing such additional information presents another challenge to the
compression process. If the additional information for one or more entries of the digital information is stored interspersed with the compressed data entries, how to differentiate the additional information from the original entries of the digital information is a problem and the separation of the compressed entries of the digital information during recovery presents another challenge, especially where the original entries of the digital information are to be compressed into different lengths and the additional information may also vary in length accordingly.
[10] This is especially problematic if the additional information and the compressed digital entries are to be recoverable after re-compression again and again. More often than not, compressed data could not be re-compressed and even if re-compression is attempted, not much gain could be obtained and very often the result is an expansion rather than compression.
[11] Inherent to the table-based compression methods and algorithms, how the reference table of substituting entries of codes and the substituted entries of codes are organized also affects the result of compression. The way such organization is implemented represents a schema which affects how coding and decoding are to proceed and also the compression rate. The schema presented in the present invention provides variants, making it possible for digital information to be compressed in various ways, including whether reference table(s) are to be embedded or not, whether compression is to be implemented in an adaptive or static way, and whether the digital information is to be compressed unit by unit or without unit separation. This will be evident in the description in the following section.
[12] The digital information to be compressed also varies in nature; some are text files, others are graphic, music, audio or video files, etc. Text files usually have to be compressed losslessly, otherwise its content becomes lost or scrambled and unrecognizable.
[13] And some text files are ASCII based while others are UNICODE based. Text files of
different languages also have different characteristics as expressed in the frequency and combination of the digital codes used for representation. This means a schema and method which has little adaptive power could not work best for all such scenarios.
Providing a more adaptive and flexible schema and method for data compression is therefore a challenge.
Technical Solution
[14] To provide a more adaptive and flexible schema and method for lossless compression that suits digital information of different types and of different language characteristics, the content of the digital information to be compressed has to be known so that its essential features can be captured for adjusting the way the schema and method are used in the compression processes; and the schema and method of compression have to allow for such adjustment.
[15] Assuming two application programs are designed to do the work of compression and decompression, the first one Compressor (C), the second Decompressor (D). In order to learn the essential features and characteristics of the content of the digital information to be compressed, a step of Data Parsing (DP) has to be done so that the digital information is analyzed and its essential features and characteristics captured. This step either has to be done by the third application program, Analyzer (A), or done by C before the start of compression coding. If the digital information passed to C for processing is very long and in a size unmanageable to C or A, the digital information has to be cut into pieces of manageable size before the DP stage is carried out. This invention only deals with the processing starting at the step of DP.
[16] Of course, after DP and if it warrants doing so after learning the essential features and characteristics of the content of the digital information, C, if built in with such flexibility and capability, could also divide the digital information into pieces or units for compression. For instance, portions of long series of 0s or 1s or other similar patterns could lie interspersed within the digital information file at different locations; in this case unit processing could be selected as the preferred way of coding. So this invention does allow for compressors and decompressors to take part in the compressing and decompressing processes in different ways at different locations of the digital information file.
[17] Returning to the digital information which is decided to be processed using the Method (M) as revealed in the present invention, after DP is carried out and such essential features and characteristics are obtained, such essential features and characteristics are used to make the Location Table(s) (LT) to be used subsequently for compression. Using multiple tables, i.e. Multiple LTs, is a novel fundamental characteristic of the present invention.
[18] To do compression using the present invention, one has first to decide or select the size of the unit of digital information for compression, Compression Unit (CU). 1 byte has 8 bits and a 1 byte ASCII code table has 256 variations or 256 different ASCII characters. UNICODE assumes 2 bytes and UNICODE code table has 256 x 256 = 65,536 variations or 65,536 UNICODE characters. The CU could be 1 byte or 2 bytes or in other sizes, such as 1 bit with 2 variations or 2 bits with 4 variations or 3 bits with 8 variations, etc. etc.
[19] Taking the simplest case for illustration here as one is used to reading ASCII code table, to compress the digital information Byte by Byte, the CU is then selected to be 1 byte. So the compression process starts with reading the digital information file to be compressed byte by byte and compressing it also byte by byte into the compressed codes. The processing starts with reading therefore 1 byte, i.e. 8 bits and then representing this code of 8 bits with a code of less than 8 bits. Using the present invention, for example, the substitution of a code of one byte long, i.e. of 8 bits, with a code of less than 8 bits is done by making reference to LTs.
For instance, here is how Location Tables could be structured in the present invention:
Diagram 1
LT1 is the Table Name; its Table Binary Number, i.e. using binary notation, is 0. This LT has only 1 location. Usually, the Location Binary Address (LBA) for it could be given the binary number 0. But since LT1 has only 1 location, its LBA could be omitted: designating the TBN is designating the only location of this LT. So to designate this only location in LT1, one could just use 0: as the Table Location Binary Address (TLBA) instead of 0:0. When writing into the compressed code, one could omit the : symbol and write 0 only.
Diagram 2
LT2 has two locations. To designate its first location, one could use 1:0 and write 10, and for the second location 1:1 and write 11 as the compressed code. If this LT2 instead has only one location, then one could just use 1: to designate it and write 1 as the compressed code, just like the case in Paragraph 20.
The above LT is called LT2, because it assumes there are two LTs being used altogether. So in the case of two LTs discussed above, the situation is like:
Diagram 3
Table  TBN  Location(s)
LT1    0    (single location, LBA omitted)
LT2    1    0, 1
In this case, there are two LTs, the first one being LT1 with TBN 0; because it has only one location, one could just write 0 as the compressed code of its TLBA. To distinguish it from LT1, the TBN of LT2 is 1. To designate the first location of LT2, one writes 10 as its TLBA, and for the second location 11.
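As a minimal Python sketch (an illustration added here, assuming nothing beyond what Diagram 3 and the preceding paragraph state), the two-LT arrangement can be written out as a small codebook of bit strings:

# The TLBA codebook implied by Diagram 3 (TBN written first, then LBA).
# LT1 has TBN 0 and a single location, so its LBA is omitted; LT2 has TBN 1 and two locations.
tlba_codebook = {
    ("LT1", 0): "0",     # TBN 0, LBA omitted
    ("LT2", 0): "10",    # TBN 1 followed by LBA 0
    ("LT2", 1): "11",    # TBN 1 followed by LBA 1
}
# Whatever CU values are assigned to these three locations are written out as the
# bit strings on the right-hand side when encoding.
print(tlba_codebook[("LT2", 1)])         # prints 11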
[23] If LT2 has 4 locations, then the corresponding TLBAs are 1:00, 1:01, 1:10 and 1:11 and their compressed codes to be written into the compressed code file are 100, 101, 110 and 111.
[24] On the binary scale, the LTs with 1 bit for LBAs could represent and accommodate 2 locations at most, 2 bits 4 locations, 3 bits 8 locations, and so on and so forth. So if the CU is 1 byte, i.e. 8 bits, one CU could have 256 values. One could use one LT to accommodate all these 256 values, but its Table Size (TS), the LBA bit size, should then be 8 bits. Or one could use 2 LTs, each having a size of 7 bits and accommodating 128 values.
[25] If one uses only one LT to represent these 256 values, one could do away with the TBN and just use the LBAs as the TLBAs and write 00000000 as its first location into the compressed output file and 00000001 as its second location, and so on and so forth.
[26] To represent all the 256 values of 1 byte of the digital information, using 1 LT to house these 256 values does not save any bits for any of the 256 values. Compression only works either when there are fewer actual values in the original digital information than all the possible values, say in this case only 128 out of 256 values are found, so that one could use just one LT with a TS of 7 bits; or when some of these 256 values are more frequent and other values are less frequent. In this latter case, the more frequent values are put into table locations of LT(s) of smaller TS(s) and less frequent values into LT(s) of bigger TS(s), which means more than one LT is required to house the 256 values, for instance 3 LTs, one of 7 bits having 128 locations and the other two each of 6 bits having 64 locations. In doing so, it now requires 3 LTs and, to designate them, one has to use 2 bits for the TBN. So the more LTs are used, the bigger the Size of the TBN (TBNS) required: a TBNS of 1 bit could allow for the designation of 2 LTs, 2 bits 4, 3 bits 8, and so on and so forth. As the TLBA consists of two parts, one the TBN and the other the LBA, i.e. in the form of TBN:LBA, using more LTs will increase the number of bits required for the TBN. So to accommodate 256 values, one had better not make an LT having 256 LBAs if using more than 1 LT, as one has to give at least one bit to represent the TBN of the LTs.
[27] So it is a delicate balance to strike. To use only 1 LT for representing all the 256 values of a byte, the LBA portion could not be 8 bits if one bit is given to the TBN of the LT, or else the TBN of the LT has to be given no bit at all. In the case of an LT without a TBN and with 256 8-bit LBAs, these 256 LBAs are then used as its TLBAs to hold the 256 values. As such, no values could be compressed into fewer bits. So the only option is to use more than one LT. But using more than one LT, one has to use the TBN portion in designating the TLBAs; the more LTs are used, the bigger the TBN portion. Using 5 to 8 LTs, the TBN portion goes up to 3 bits, 9 to 16 LTs 4 bits, and so on and so forth. For compression to be achievable for one-byte values where all such values are found within the digital information file to be processed, on average TBN:LBA should be less than 8 bits, i.e. TBNS+TS, Table Binary Number Size plus Table Size, should be less than 8 bits. For the CU of 1 byte, there are two extremes of these combinations: one extreme is using 1 LT with a TS of 8 bits and the other using 256 LTs each having one location with a TS of 0 bits. To designate the TLBA, in the first extreme the TBN could be omitted while in the other extreme the LBA could be omitted. Using either of these two extremes represents no loss and no gain in the output file size.
[28] In order to achieve compression for this CU of 1 byte having all the 256 values, there are many possible combinations of TBN:LBA to try. Which combination of these TBN:LBA is the best one also depends on the nature and characteristics of the digital information, which is to be collected in the DP stage by A or C.
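To make the trade-off concrete, the following Python sketch (an assumed illustration of how A or C might assess a candidate combination, not a prescribed implementation) totals the encoded bits for a candidate list of LTs, assigning the most frequent one-byte CU values to the tables listed first:

def encoded_bits(lts, freqs_desc):
    # lts: list of (tbn_bits, lba_bits) pairs; freqs_desc: frequency counts of the
    # CU values in descending order. The most frequent values fill the tables listed first.
    total, i = 0, 0
    for tbn_bits, lba_bits in lts:
        for _ in range(2 ** lba_bits):           # locations available in this LT
            if i >= len(freqs_desc):
                return total
            total += (tbn_bits + lba_bits) * freqs_desc[i]
            i += 1
    return total

freqs = [1000 - 3 * k for k in range(256)]       # a made-up descending distribution
print(encoded_bits([(0, 8)], freqs))             # single 8-bit LT: the break-even baseline
print(encoded_bits([(2, 7), (2, 6), (2, 6)], freqs))   # the 3-LT split of Paragraph [26]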
[29] Furthermore, the size of CU may not be just one byte; one could use other CU sizes as well, such as 2 bytes or half a byte or any other bit lengths. This further complicates the problem as one may first wonder which size of CU is appropriate to the digital information of the input file to be compressed. The modeling (i.e. finding out the nature and the characteristics and getting a meaningful picture) of the digital information of the input file for compression and the matching of it to a particular TBN:LBA combination, as well as the assignment of values into the TLBAs, is still largely an unexplored area in the field or art of compression science because no disclosure has been made of using Location Tables having a structure with TBNs and LBAs (combined together as the TLBAs) for the assignment of CU values read from the digital information to be processed.
[30] So to compress a digital file using LTs, i.e. the schema proposed in the present invention, one has to decide what size of CU is to be used, how to model and match the digital file to the best TBN:LBA (yet to be determined) or to the best Location Table Structure (LTS) to be used for processing and also how the output file is to be structured so that it could be recoverable and better still re-compressible and re-recoverable.
[31] The present invention therefore is the first attempt to provide an initial answer to the above questions. With the advent of super computers with enormous computing power, such problems could be approached in a better and better way.
[32] To begin with, in using LTs for making compression, the fundamental rule to follow is that the most frequent values are to be assigned to the Table Locations with the shortest TLBAs. This is also the basic premise on which compression science is founded. Based on this, one therefore has to find out and prioritize all the values of the CU found within the digital file to be compressed. Values are prioritized or ranked according to their frequencies of occurrence in the digital file, i.e. the first priority is given to the value with the highest count found. However, one has to decide how to rank values that tie in frequency count. For simplicity, one could break the tie by simply ranking the lower values in front of the higher values. This simplicity points out the difficulty of doing modeling and matching.
[33] Using the table-switching technique to be revealed later in the present invention, tie-breaking could be done in other ways. For instance, one could break the tie by finding out which values within the tie appear more frequently after the preceding value in the ranking. For example, if the preceding value is 0011 and the values in the tie are 1100 and 0110, and 0110 appears more frequently after 0011, then 0110 should come next, before 1100.
[34] If there are more than 2 values in the tie, such as 1100, 0110 and 1110, then to break the tie one could find out which one, 1100 or 1110, appears more frequently after the immediately preceding value in the ranking, i.e. this time after 0110 rather than after the value before 0110. This rule applies to further values in the tie if found. This however requires more statistics to be collected from the digital file, i.e. the frequency count of every CU value appearing after any particular other value in the digital file. This makes the modeling and matching better than using the lowest-value-first rule if the table-switching technique is to be adopted.
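A minimal Python sketch of this ranking-with-tie-breaking rule follows; the function name and the sample data are illustrative assumptions only:

from collections import Counter
from itertools import groupby

def rank_cu_values(cu_values):
    # Rank CU values by descending frequency; break ties by how often a value
    # follows the value ranked just before the tie, then by the lower value.
    freq = Counter(cu_values)
    follows = Counter(zip(cu_values, cu_values[1:]))     # (previous value, next value) counts
    ranking = []
    ordered = sorted(freq, key=lambda v: (-freq[v], v))
    for _, group in groupby(ordered, key=lambda v: freq[v]):
        tied = list(group)
        while tied:
            prev = ranking[-1] if ranking else None
            best = max(tied, key=lambda v: (follows[(prev, v)], -int(v, 2)))
            ranking.append(best)
            tied.remove(best)
    return ranking

data = ["0011", "0110", "0011", "1100", "0011", "0110", "1100"]
print(rank_cu_values(data))      # prints ['0011', '0110', '1100']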
[35] Next, one has to explore the question of determining the size of CU to be used. It has been suggested that for the languages of the Western World, One Byte, i.e. 8 bits, is the preferred choice as those languages are represented in ASCII codes, and that for other languages, such as Chinese, Two Bytes, i.e. 16 bits, is better adopted as the Chinese Language is of Double-Bytes, as with other languages of the Eastern World.
[36] Of course, one could use other CU sizes for other reasons. What is apparent is that the bigger the CU size, the more complex the modeling and matching to be done. As said before, which CU size is best is still an unexplored area, and no single size may be universally applicable to digital information of every nature.
[37] The present invention does not assume any particular characteristics of the digital input file, whether it is a text file, audio file or image file. The input files are all treated in the same way, and they appear only as a series of 0s and 1s in different orders and patterns.
[38] For the purpose of exposition of the characteristics of the present invention, therefore the smaller the size of the CU is the easier and better understood. The smallest size of the CU is 1 bit. 1-bit CU has only two values, either 0 or 1. It does not appear to yield much if the schema proposed under the present invention is used. For using more than 1 LT or using multiple LTs, assigning just two values into two or more tables does not make sense as a TLBA consists of two parts, i.e. TBN:LBA, which uses at least one bit for specifying a TLBA. In short, no gain is to be achieved using the present invention for this. Using run-length encoding seems more appropriate.
[39] What comes next is 2 bits. A 2-bit CU has four values: 00, 01, 10 and 11. Putting them into two LTs, there are two combinations for this: either the first LT has 1 value and the second LT has 3 values, or the first LT has 2 values and the second 2 also. In the latter case of using 2 LTs each having 2 values, the TLBAs are 0:0, 0:1 and 1:0, 1:1; again 2 bits have to be used for each value and there is no gain at all.
[40] In the former case, the corresponding TLBAs (TBN:LBA) are 0: and 1:00, 1:01 and 1:10.
It is apparent that one has to use 1 bit to represent the first-ranking value and 3 bits to represent the other 3 values. Except for those digital files whose frequency distribution is highly skewed toward the first-ranking value, there is not much point in using a 2-bit CU, as using 3 bits to represent the other three values is wasteful and achieves expansion instead of compression. As just mentioned, using the 2-bit CU is not appropriate for digital files with digital information of even distribution; it therefore could not match easily with most scenarios of data distribution. Of course, this does not prevent using 2 LTs to represent the input code values where the data distribution of the input code warrants it.
[41] The present invention below mainly uses a 3-bit CU to illustrate the techniques used under the schema presented in the present invention. After reading the present invention, one, of course, could apply the techniques revealed to the 2-bit CU scenario or to scenarios of CUs of other bit sizes if so desired and further investigate how to improve on it.
[42] As apparent in the discussion in Paragraph [40], using more than 1 LT could provide a very basic way of representing values of higher frequency by shorter TLBAs and values of lower frequency by longer TLBAs. That is, 2-bit CU values are represented either by a 1-bit TLBA or by 3-bit TLBAs. Using more LTs could provide more variations for modeling and matching to digital information having various kinds of data distribution. And a CU of bigger size has more values and could allow for the use of more LTs for the assignment of CU values.
[43] For instance, using CU of 3 bits, one has 8 values: 000, 001, 010, 011, 100, 101, 110 and 111. The 8 CU values could be assigned to 1 LT, having TLBAs as 0:000, 0:001, 0:010, 0:011, 0:100, 0:101, 0:110 and 0:111. Encoding them into compressed code, the LBAs (where the TBN could be omitted here) are just the same as the original code, without gain or loss.
[44] The other extreme is to put these 8 CU values into 8 LTs each having one Table Location, the 8 TLBAs then being 000:0, 001:0, 010:0, 011:0, 100:0, 101:0, 110:0 and 111:0. Omitting the LBAs and using the TBNs only, the compressed codes are exactly the same again as the original values.
[45] Between the extremes of using 1 LT and 8 LTs, there are choices of using 2 LTs, 3 LTs, 4 LTs, 5 LTs, 6 LTs and 7 LTs. At first glance and given the insight revealed in the discussion above, there does not seem to be much point in using 2 LTs for the 8 CU values. However this does not prevent one from using different numbers of LTs, and the present invention also presents later a technique of using a dynamic LTS.
[46] The present invention is not aimed at exhausting such variations but at revealing the
techniques and schema that could be used for making data compression. Using 3 LTs for CU of 3 bits having 8 CU values appears to provide more and yet manageable variations for discussion. This also illustrates well the merits of using table switching as will be revealed.
[47] For CU of a size of 3 bits having 8 values, the longest TLBAs are expected to be no more than 3 bits. But it seems very hard to put 8 3-bit CU values into 3 LTs where the longest TLBAs are no more than 3 bits.
[48] Using 3 LTs, one of the TBNs should have at least 2 bits, such as in the combinations of 0:, 10:, 11: or 00:, 01:, 1:, etc. This leaves only 1 bit for the LBAs. So using 2 bits for all the TBNs (the option chosen for the TBNs of the 3 LTs here for illustration being 00, 01 and 10) and 1 bit for the LBAs of all the 3 LTs, the TLBAs of these 3 LTs are then 00:0, 00:1, 01:0, 01:1, 10:0 and 10:1; only 6 TLBAs (2 in each of the 3 LTs, i.e. 2.2.2, the notation used here having separating dots for ease of expression) are provided. 6 TLBAs cannot accommodate 8 CU values. Another feature is that 2.2.2 is an even distribution structure in l shape (lower case l), which does not serve the fundamental principle and technique of compression science, i.e. assigning fewer bits to more frequent values and more to less frequent ones.
[49] Another variant (1.1.4, 1 in the first LT, 1 in the second and 4 in the third) of the 3-LT structure has TLBAs 00:, 01:, 10:00, 10:01, 10:10 and 10:11; in this case TBNs 00: and 01: have only 1 address each, i.e. 00: and 01: respectively, and TBN 10: has 4 addresses, namely 10:00, 10:01, 10:10 and 10:11. This is a skewed LT structure in L shape (upper case L), having extremes on either side. This is good for digital data of skewed distribution, i.e. more frequent values being assigned to the first and second LTs and less frequent values to the third LT for achieving data compression. Assigning the other way round means data expansion instead.
[50] One could also make the LT structure into 1.2.4 in \ shape (a slash), i.e. 3 LTs having 1, 2 and 4 TLBA(s) in the first, second and third LT respectively, the TLBAs being 00:, 01:0, 01:1, 10:00, 10:01, 10:10 and 10:11. This \ shape LT structure is in the middle between the skewed L shape and the even l shape.
[51] The above l, L and \ shape LT structures serve as options to be chosen for modeling and matching the different kinds of data distribution of the digital information to be compressed. The two uneven LT structures, the L and \ shapes, have uneven numbers of bits for the TBNs and LBAs of their LTs, providing different numbers of TLBAs. The l shape LT structure has the same number of TLBAs for each of its 3 LTs.
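For illustration only, the TLBAs of the l, L and \ shape structures above can be enumerated mechanically from each LT's TBN code and LBA bit width; the small Python helper below is an assumption added for clarity:

def tlbas(structure):
    # structure: list of (tbn_code, lba_bits) pairs, one per LT.
    out = []
    for tbn, lba_bits in structure:
        if lba_bits == 0:
            out.append(tbn)                          # single location: LBA omitted
        else:
            out += [tbn + format(i, "0{}b".format(lba_bits)) for i in range(2 ** lba_bits)]
    return out

print(tlbas([("00", 1), ("01", 1), ("10", 1)]))      # l shape 2.2.2: 6 TLBAs
print(tlbas([("00", 0), ("01", 0), ("10", 2)]))      # L shape 1.1.4: 6 TLBAs
print(tlbas([("00", 0), ("01", 1), ("10", 2)]))      # \ shape 1.2.4: 7 TLBAs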
[52] The l and L shape LT structures provide 6 TLBAs and the \ shape provides 7 TLBAs.
They still could not take all the 8 possible values of the CU of 3 bits. Another L shape LT structure in the form of 2.2.4 could be considered, whereby the first LT has 2 TLBAs, the second also 2 and the third 4, the TLBAs being 00:0, 00:1, 01:0, 01:1, 10:00, 10:01, 10:10 and 10:11. This could accommodate the 8 possible values of a 3-bit CU. Adopting this LT structure means using one more bit for the third LT, and sometimes this may not be desirable for the digital information being modeled and matched for making compression. However, another LTS 2.2.4 could accommodate all 8 CU values; for instance, one design would be 00:0, 00:1, 01:0, 01:1, 1:00, 1:01, 1:10 and 1:11; that is, the TBN of the third LT becomes 1: instead of 10:.
[53] To spend a smaller number of bits on digital information of an uneven data distribution, sometimes one could also consider using the technique of Address Branching. Taking the L shape structure 1.1.4 as an example, in order to accommodate the 7th TLBA for assigning the CU value ranked 7th according to frequency of occurrence, and to save more space for a highly skewed distribution, one could make the structure into 1.1.5. To accommodate 5 TLBAs in the third LT, normally the LBA portion requires 3 bits. Address Branching is a technique of saving bit size for the TLBAs of a particular table. Because 3-bit LBAs have 8 addresses and only 5 values are assigned, this means wasting the space of 3 empty TLBAs. And the first 4 values take up LBAs of 3 bits instead of the 2 bits they could have taken. Address Branching means, in this example, branching the 5th address to the 4th address and merging them together so that 2-bit LBAs could provide 5 to 7 addresses, step by step. If the number of addresses to be created is up to 8, then 3-bit LBAs have to be used instead of 2 bits with Address Branching, as no bit saving could be achieved in this case.
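A small Python sketch of the address-merging idea follows (the helper and its naming are assumptions; only the branching of overflow values onto the tail addresses, distinguished by an Address Branching Bit, reflects the technique described above):

def branch_addresses(n_values, lba_bits=2):
    # Assign n_values ranked values to 2**lba_bits addresses; overflow values are
    # branched onto the tail addresses and told apart by an Address Branching Bit (ABB).
    base = 2 ** lba_bits
    addresses = [format(i, "0{}b".format(lba_bits)) for i in range(base)]
    table = {}
    for rank in range(n_values):
        if rank < base:
            table[rank] = (addresses[rank], None)            # no ABB needed yet
        else:
            host = base - (n_values - rank)                  # tail address to share
            table[rank] = (addresses[host], 1)               # branched value gets ABB 1
            table[host] = (addresses[host], 0)               # its host now needs ABB 0
    return table

for rank, (lba, abb) in branch_addresses(5).items():
    print(rank, lba, abb)    # the 5th value shares LBA 11 with the 4th, the ABB telling them apart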
[54] Continuing with the example of LTS 1.1.4, if all 8 values of the 3-bit CU appear in the digital information being processed, with Address Branching the least frequently occurring 7th and 8th values are separately assigned to TLBAs 10:10 and 10:11, i.e. the 7th CU value being branched onto the address of the 5th CU value and the 8th onto that of the 6th. So when writing to the compressed output file, one writes 1010 or 1011 into it when the value being processed is the 7th or the 8th value. To distinguish the 5th and 6th CU values from the 7th and 8th CU values respectively, when the processing comes to the end of the Processing Unit (PU) or any other designated unit of processing, one more bit for each of the Address Branched CU values (being temporarily kept in a computer memory array or somewhere in a computer file) has to be written, say 0 for the original 5th and 6th values and 1 for the 7th and 8th values being Address Branched to them respectively. When the compressed output file is being decoded, each PU is decoded in sequence and when it comes to the end of the PU, there should be some more bits of 0s or 1s. The PU is then scanned for the TLBAs 10:10 and 10:11; for the first 1010 or 1011 in the PU, the first Address Branching Bit (ABB) determines whether it is the 5th or 6th CU value or the 7th or the 8th value. If this ABB is 0, then it is given the value of the 5th or 6th value, which is then written down into the output file, or the 7th or 8th if the ABB is 1. If there are more ABBs in the PU, this means that in the decoded output of the PU being processed there are more 1010s and 1011s to be decoded in the same way, one by one. The ABB could also be put right after the Address Branched TLBA and decoded in the same way as when it is put at the end of a PU; that is, when decoding comes to an Address Branched TLBA, D reads in one more bit and then determines if it is the 5th or 6th value or the 7th or 8th value. A PU, Processing Unit, here is a certain amount of digital information decided to be processed as one unit of division, one by one. For the present example, one could use 8 3-bit CUs as one PU, i.e. a CU here is 3 bits and a PU is 8 times 3 bits, equivalent to 24 bits. If no such division is attempted, the whole input file is regarded as 1 PU. So a big digital file could have a lot of PUs. In case there is more than 1 PU group, the logic of processing, i.e. encoding and decoding, may vary from one PU group to another PU group.
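The following Python sketch (an assumed illustration of this paragraph, using the variant in which the ABBs are written at the end of the PU) encodes and decodes one PU under the 1.1.4 structure with Address Branching; CU values are given as frequency ranks 0 to 7:

CODE = {0: "00", 1: "01", 2: "1000", 3: "1001",
        4: "1010", 5: "1011", 6: "1010", 7: "1011"}   # ranks 6 and 7 are Address Branched

def encode_pu(ranks):
    body, abbs = "", ""
    for r in ranks:
        body += CODE[r]
        if CODE[r] in ("1010", "1011"):
            abbs += "1" if r >= 6 else "0"            # ABB 1 marks the branched 7th/8th values
    return body + abbs                                # ABBs written at the end of the PU

def decode_pu(bits, n_cu=8):
    inv = {"00": 0, "01": 1, "1000": 2, "1001": 3, "1010": 4, "1011": 5}
    ranks, i = [], 0
    while len(ranks) < n_cu:
        for code, rank in inv.items():
            if bits.startswith(code, i):
                ranks.append(rank)
                i += len(code)
                break
    abbs = iter(bits[i:])                             # remaining bits are the ABBs, in order
    return [r + 2 if r in (4, 5) and next(abbs) == "1" else r for r in ranks]

pu = [0, 4, 6, 0, 7, 1, 5, 0]                         # CU values given as frequency ranks 0..7
assert decode_pu(encode_pu(pu)) == pu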
[55] To illustrate the concept of the PU, for the example here the input digital file has 192 bits of information, which could be divided into 8 PUs, each PU consisting of 24 bits, assuming the CU is 3 bits and the PU 24 bits. So this input digital file has 8 PUs, each having 8 CUs. Selecting 24 bits for the PU size is for the purpose of making it possible to accommodate all the 8 possible values of a 3-bit CU, being 000, 001, 010, 011, 100, 101, 110 and 111 on the binary scale.
[56] Therefore there could be 8 scenarios for each PU; each PU could have only 1 CU value inside, or 2, or 3, and up to all 8 possible CU values. On one extreme where there is only 1 CU value inside a 24-bit PU, there are 8 counts of that CU value, such as all 8 being 000, or all 8 being 111, or anything in between. On the other extreme, there are 8 different values and each CU value has only 1 count, namely 000, 001, 010, 011, 100, 101, 110 and 111. In the PU with 7 different values, 1 CU value will have 2 counts and the other 6 different values 1 count each. In the PU with 6 different values, it then has either 5 different CU values with 1 count each and 1 CU value with 3 counts, or 4 different CU values with 1 count each and the other 2 different CU values with 2 counts each. One could deduce from the above description and write a program calculating and listing all the possible combinations of PUs with different numbers of CU values each. In the present example used, there could be 8 PU groups, from PU1 to PU8, PU1 having only 1 CU value with all 8 counts inside the PU and PU8 having all 8 CU values with 1 count each only.
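A short Python sketch of the PU grouping idea (names assumed): split the input into 24-bit PUs, count the distinct 3-bit CU values inside each, and derive the PU group, whose 3-bit signature matches the CUV header shown in Diagram 4 below:

def pu_group(pu_bits):
    distinct = {pu_bits[i:i + 3] for i in range(0, 24, 3)}   # the eight 3-bit CUs in the PU
    return len(distinct)                                      # 1..8, i.e. PU1..PU8

pu = "000001000001000001000001"        # the input PU used in Diagram 4 below
g = pu_group(pu)
print(g, format(g - 1, "03b"))         # 2 distinct CU values -> 3-bit CUV header 001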
[57] In the present invention, there is a schema in which the digital information file is divided into PUs for encoding and decoding as described in Paragraph [56] above. The structure of the encoded output of the PU begins with a signature of 3 bits indicating how many number of different CU values the PU has as follows:
Diagram 4
CUV Header of PU: 001
Input Code of a PU: 000001000001000001000001
Output Code of a PU: 001 ......
The Compressor C scans the above input code and finds out that inside this PU of digital information there are only 2 values found, i.e. 000 and 001 only. So it writes down the header signature of the PU, the CU Value (CUV) of this PU, as 001, indicating that within this PU there are only 2 different CU values, and then encodes the 24 bits of input code for this PU according to the logic particular to a PU with 2 different CU values only. In the input code of Diagram 4, there are four occurrences of Value 000 and four of Value 001. The ...... in the output code of Diagram 4 denotes the compressed code encoded by C.
[58] If the digital input file is to be processed as one unit and not sub-divided into smaller PUs, then the header signature, CUV of the PU as described in the Paragraph above could be dispensed with.
[59] So after choosing 3 bits as the size of the CU and 24 bits as the size of the PU, in the present example one has to choose the LTS to be used. In the aforesaid paragraphs, one has come across LTSs of 2.2.2, 1.1.4, 1.2.4 and 1.2.5 with or without Address Branching Bits (ABB). One could design other LT structures for use as well by modeling and matching the LT structures with the actual digital information to be processed. For a highly skewed data distribution, an LTS of more uneven TBN:LBAs is more appropriate. So after the DP stage, if it is found to be so by A or C, one may find using LT structure 1.2.5 more appropriate. In LTS 1.2.5, there is only 1 TLBA in LT1, being 00:, for assigning to 1 value, 2 TLBAs, being 01:0 and 01:1, in LT2 for 2 values, and 4 TLBAs in LT3, being 10:00, 10:01, 10:10 and 10:11, for 5 values with the use of Address Branching, where the 8th CU value of the least frequency count is assigned to 10:11 with ABB 1. The 7th one is given the TLBA 10:11 with ABB 0, the 6th 10:10, the 5th 10:01, the 4th 10:00, the 3rd 01:1, the 2nd 01:0 and the 1st, being the one with the highest frequency count, 00:, given the fewest bits.
[60] To assign values to these 8 TLBAs of the LT structure 1.2.5, one has to do so according to the statistics obtained during the DP stage about the data distribution of the digital information, for modeling and matching with the LTS. The following statistics are expected to be collected in the DP stage by A or C: the size of the digital information input file; how many of the 8 possible CU values are found and the frequency counts of each of these CU values found, if not all 8 CU values are present; the frequency count of each CU value appearing after another CU value present, for example for use in breaking ties when ranking and prioritizing the CU values in accordance with frequency count for the whole digital information file; and the repetition statistics about each of the CU values found preceding itself in the digital information (for instance, for use in determining how the Staying and Switching Bits are used in Table Switching). If PUs are to be divided and used, the above statistics for the whole input digital file should also be collected with reference to the need for PU processing, in particular the frequency count of the CU values for PUs having different numbers of CU values, for use in ranking and prioritizing the CU values for PU processing if so desired. This will be elaborated later in due course in the body of the Description. One may also consider including other statistics, such as a Clustering Index (meaning the tendency of different CU values clustering together or separating from each other), for use in grouping CU values under the same or different Location Tables where appropriate.
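A compact Python sketch of the Data Parsing statistics listed above follows; which statistics are collected and how they are used remain design choices, and the function shown is only an assumed illustration:

from collections import Counter

def parse_statistics(bits, cu_bits=3):
    cus = [bits[i:i + cu_bits] for i in range(0, len(bits) - cu_bits + 1, cu_bits)]
    freq = Counter(cus)                                        # frequency count per CU value
    successor = Counter(zip(cus, cus[1:]))                     # how often value B follows value A
    repetition = Counter(a for a, b in zip(cus, cus[1:]) if a == b)   # value repeating itself
    return freq, successor, repetition

freq, successor, repetition = parse_statistics("000000001001010000")
print(freq.most_common(), repetition)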
[61] If the statistics of the digital information to be compressed indicate that the 8 CU values are ranked in the following order according to frequency count in descending order, i.e. the most frequently occurring value ranked first: 000, 001, 010, 011, 100, 101, 110 and 111, then the assignment of the 8 CU values to the TLBAs of the 3 LTs of the LT structure 1.2.5 chosen above could be:
Diagram 5
Assignment of CU values and the Encoded Code
LT   TLBA    Output Encoded Code   Input Code of CU Value
1    00:     00                    000
2    01:0    010                   001
2    01:1    011                   010
3    10:00   1000                  011
3    10:01   1001                  100
3    10:10   1010                  101
3    10:11   1011 0                110 with ABB 0
3    10:11   1011 1                111 with ABB 1
[62] According to Diagram 5 above, it could be seen that all the input CU values are 3 bits, representing CU Value 000 to CU Value 111. The TLBAs vary from 2 bits to 4 bits. CU 000 is the most frequently occurring value and so is assigned to the shortest TLBA, 00:, having 2 bits. The two less frequently occurring values, CU 001 and 010, are then assigned to LT2 and given encoded codes of 3 bits, the same width as the original input code. And then the last 5 values with lower and lower frequency are given 4 bits instead of their own 3 bits, resulting in expansion instead of compression. So if the data distribution is even, the encoded code results in expansion. To achieve compression, the value 000 should have a very high frequency in order to compensate for the losses due to the assignment into LT3. Whether there is expansion or compression of the digital input depends on how uneven the data distribution of the digital input is and how skewed it is towards the most frequently occurring value, 000 in this case.
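To make the expansion/compression trade-off of Diagram 5 concrete, the following Python sketch (with assumed, made-up figures, not data from the patent) totals the encoded bit count for an even and for a skewed distribution of the 8 CU values:

BITS = {"000": 2, "001": 3, "010": 3, "011": 4,
        "100": 4, "101": 4, "110": 5, "111": 5}     # Diagram 5 code lengths, ABB included

def encoded_size(counts):
    return sum(BITS[v] * n for v, n in counts.items())

even = {v: 10 for v in BITS}
skewed = {"000": 66, "001": 6, "010": 4, "011": 1, "100": 1, "101": 1, "110": 1, "111": 0}
for name, counts in (("even", even), ("skewed", skewed)):
    print(name, encoded_size(counts), "bits versus", 3 * sum(counts.values()), "input bits")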
[63] Also, if the PU is not the whole digital input file, but only 24 bits in this case, it is highly likely that one PU of 24 bits will very seldom contain all the 8 CU values. But the LT structure still has to make provision for the occurrence of the least frequent CU values, as all CU values could appear in just one but not all of the PUs being processed. Usually different PUs may have different CU values appearing in different numbers. If there is a PU3, meaning a PU of the PU3 group, having only 3 CU values found within the PU, and one uses LTS 1.2.5 and the 3 CU values are concentrated in the values assigned to LT3, then expansion will be the case. In that case, the present invention introduces a technique of Dynamic Tabling, dynamically adjusting the LTS to achieve more compression. This will be further elaborated later. Using Dynamic Tabling, if fewer than 8 CU values are found in the whole digital input file, for instance if only 3 CU values are there as mentioned earlier, the 3 CU values could be taken out from LT3 and assigned to TLBAs of LT1 and LT2, and there will be much compression.
[64] If the frequency counts of the CU values are conducted according to the whole digital input file, the prioritization of all the CU values resulting from this could serve as the Master Ranking List (MRL) of the CU values found. This MRL could be used for constructing a Master LT Structure if no dynamic adjustment of the LT structure is attempted or implemented. The LT structure, LTS, refers to the number of LTs being used (affecting the number of bits used for the TBN) and the Table Size (the number of bits used for the LBAs holding the CU values assigned) of each of the LTs used. There could be a standard LT structure or more than 1 LTS to choose from. The information signifying these standard LT structures and the rules determining how to select and use them could be embedded in the application or recorded as Header Information of the digital information file. For instance, after modeling the data distribution of the digital information to be processed, C could record a signature, the Data Distribution Signature (DDS), such as 00 for a highly skewed distribution, into the Header Information in the Header Section of the output encoded file for the purpose of decoding afterwards, and start to use the LTS 00 most appropriate for it. If the data distribution is highly even, then DDS 11 may be used and the corresponding LTS 11 adopted, and so on and so forth. Using 2 bits for the DDS means it could represent 4 types of data distribution and 4 LT structures. If more are found to be desirable, more bits could be used for the DDS; for example, when using a CU of 8 bits and a PU of 2048 bits, more LTs could be used and there are more combinations of LT structures, catering for more data distribution models.
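One possible, purely illustrative rule for deriving a 2-bit DDS from the frequency counts is sketched below in Python; the thresholds and the mapping to LTS 00 to 11 are assumptions, not values given in the present invention:

def data_distribution_signature(freq_counts):
    share = max(freq_counts) / sum(freq_counts)     # share taken by the most frequent CU value
    if share > 0.60:
        return "00"        # highly skewed distribution -> most uneven LTS
    if share > 0.40:
        return "01"
    if share > 0.25:
        return "10"
    return "11"            # close to even distribution -> most even LTS

print(data_distribution_signature([66, 6, 4, 1, 1, 1, 1, 0]))   # prints 00
print(data_distribution_signature([10] * 8))                    # prints 11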
[65] So if the LTS selected does not model and match well the data distribution of the digital information to be processed, it is highly unlikely that data compression is achievable. Even if a good LTS is chosen, sometimes it is only the best amongst the possible choices available. For instance, LTS 124 may be the best fit to the digital information being processed, but it could not offer all 8 TLBAs for all the 8 CU values. So Address Branching, discussed previously, could serve as a remedy to a certain extent. If the whole digital information input has only 7 CU values, so much the better. Reality however is very often not on one's side. Obtaining data distribution information or statistics is vital in designing or selecting the best LTS to match the characteristics of the data distribution of the digital information to be processed. Without the aid of such data statistics, one has to use a pre-designed LTS and use Dynamic Adjustment of LTS and Dynamic Assignment of CU values to LTS according to data statistics collected on the fly during processing.
[66] Referring to Diagram 5 again, it could be seen that only the first table, LT1, provides some bit saving. The second table, LT2, could only break even. LT3 requires even more bits than the original input code, resulting in data expansion.
[67] Even worse, in order to make the encoded code recoverable to its original form, some rules must be specified in the application programs, which consumes computer time and power, or some indicators or signatures must be written into the encoded code, which results in additional bits. It is also essential that the application programs, such as A, C or D, be hard-coded with some normally used rules and parameters. And it is inevitable that some additional information has to be written into the output result file to provide the necessary information for decoding purposes.
[68] So techniques which could result in less storage space consumption are always welcome and very much desired. Address Branching and Dynamic Tabling mentioned above are two such techniques.
[69] There is another novel and marvelous technique introduced in the present invention for space saving. This technique is called Relative Table Switching (RTS) and Relative Addressing (RA).
[70] Relative Table Switching is the way to make the use of multiple LTs worth attempting, especially in the present case of using 3 LTs. The encoding process described in Paragraphs [59], [61] and [62] and Diagram 5 is a technique of Absolute Addressing (AA). It translates CU values using Absolute Addresses. In Diagram 5, LTs are switched using absolute TLBAs.
[71] Diagrams 6 and 7 illustrate the difference between using Absolute Addressing and using Relative Addressing by way of Relative Table Switching using a LT structure 1.2.4+1:
Diagram 6
Absolute Addressing
Input Code of PU with 24 bits having 8 CU Values
000 001 010 011 100 101 110 111
Output Code of PU using AA 2 ABB bits
00 010 011 1000 1001 1010 1011 1011 01
[72] It could be seen from the above that the 24 bits of the input code of the PU having 8 CU values are represented by 30 bits, including 2 Address Branching Bits for distinguishing the 7th and 8th values as described previously in Diagram 5. For the sake of easy reference, the header signature (111) of the PU is omitted here. Including the 3 bits of 111, 33 bits are actually required.
[73] A PU with all 8 CU values is an extreme case and in Diagram 6 nine more bits are
required for the encoded code. It does not serve the purpose of compressing data.
[74] The following Diagram 7 shows how the same input code, with 000 being the most frequent value and 111 the least frequent value, is translated using Relative Table Switching (RTS):
Diagram 7
Relative Table Switching
Input Code of PU with 24 bits having 8 different CU Values
000 001 010 011 100 101 110 111
Output Code of PU using RTS 2 ABB bits
00[0] 11 0 0 1 11 00 0 01 0 10 0 11 0 11 01
[75] The output code using RTS is 25 bits including 2 ABB bits. Adding the 3 bits (111) of the PU header signature, it becomes 28 bits. Using RTS rather than AA, only 4 more bits are used instead of 9 more bits, saving 5 bits. The input code here is one having an even distribution, with all 8 CU values present. Using LTs may not be the best way of doing compression for this type of data distribution. Later the present invention will present another technique, Rank Code Pair Numbering (RCPN), for compressing this kind of PU having 8 different CU values or fewer. The technique of RCPN fits well into and adds to the PU schema, the RTS and RA technique, Dynamic Tabling, and the Multiple LT structure being introduced in the present invention for data compression, and serves as a reminder that there is no one technique or method that serves all purposes or fits all kinds of data distribution. This of course is also applicable to RCPN itself.
[76] Returning to Diagram 7 using RTS, one could see some Switching Codes marked with underlines, and the original input code is translated in another way. To translate the input code using RTS, the first value 000 is translated into 00, i.e. the only TLBA (00:) in LT1. Because there is only 1 address for 1 CU value, its LBA is omitted [0] and it is enough to use just its TBN. 0 in [] denotes the omission of 0 in the encoded code. The next two bits 11 refer to switching to the next LT. The first 1 is the special bit calling for table switching. The second 1 indicates the table to which it is to switch, 1 meaning the next table in the order. The first value 000 is assigned to the LT (LT1) having TBN 00:. So the next LT in the order is 01: (LT2). In LT 01:, there are two TLBAs, 01:0 and 01:1. Since the second value 001 is assigned to 01:0, the output code for the LBA portion becomes 0, indicating the first TLBA of LT2. The third value 010 is assigned to the same LT2, taking the second TLBA, 01:1. So table switching is not required. To signal it, a 0 is used: 0 means no switching, or staying on the same LT to find the value required. Since the third value is assigned to the TLBA 01:1, the output code for the LBA, i.e. 1, is used to signify it. Up to here, all the TLBAs of LT1 and LT2 are used up; all other values are assigned to LT3 having TBN 10:.
[77] To switch from LT2 to LT3, i.e. LT 10:, the special Switching Bit 1 is used again. And to specify switching to the LT next in the order, another 1 is used. The 5 values, namely 011, 100, 101, 110 and 111 in sequence, are assigned to 4 TLBAs, with the last 2 values taking the last TLBA 10:11. The fourth value 011 is assigned to the first TLBA of LT3, 10:00. To specify it, 00 is used. This is because there are 4 TLBAs inside LT3 and 2 bits are used to represent them, so the first TLBA of LT3 is 00. Since all the remaining values are assigned to the same LT3, no table switching is required for them, so only a Staying Bit 0 is used to denote staying on with the same LT. To specify the fifth value 100, 01 is used, as it is assigned to the second TLBA of LT3, 10:01. To translate the sixth value 101, no table switching is required, denoted by the special Staying Bit 0 again; as Value 101 is assigned to the third TLBA 10:10, one writes down 10 for it. The seventh and eighth values are both assigned to 10:11 using Address Branching for accommodating 8 values using 3 LTs with a structure of 124+1. To specify the seventh value, Bit 0 is used again to denote staying in the same table LT3, and 11 to denote its TLBA. Because of Address Branching, a special ABB 0 is used to denote the original value. To translate the eighth value 111, another Bit 0 is used for staying in the same LT and 11 is again used to represent the TLBA 10:11 to which it has been branched. And the ABB for the 8th value is this time 1, after the ABB 0 has been assigned to the 7th value. That means both the 7th and the 8th values have ABBs, the former with Bit 0 and the latter with Bit 1.
[78] In using the technique of RTS, the following characteristics emerge: a special Switching Bit 1 is used to indicate the action of switching LT for obtaining the TLBA, and a Staying Bit 0 is used to indicate no table switching, i.e. remaining in the same LT for finding the TLBA. Using RTS, it is also important to keep a variable registering the current TBN (CTBN) of the LT being used for processing in the memory space of C and D. And this current TBN changes with the switching of LT. The following Diagram 8 clarifies how the current TBN variable changes in value with the use of RA by way of RTS:
Diagram 8
Relative Addressing
LT1=00:, LT2=01:, LT3=10:

Case 1 where Current TBN = 00
  CTBN:              00
  Using RA:          10        11
  Referring to TBN:  10        01
  In Order:          Previous  Next

Case 2 where Current TBN = 01
  CTBN:              01
  Using RA:          10        11
  Referring to TBN:  00        10
  In Order:          Previous  Next

Case 3 where Current TBN = 10
  CTBN:              10
  Using RA:          10        11
  Referring to TBN:  01        00
  In Order:          Previous  Next
[79] For Case 1, where the CTBN is 00, the LT next in order is LT2 01 and the LT previous in order is LT3 10; for Case 2, CTBN 01, the LT next in order is LT3 10 and the LT previous in order is LT1 00; for Case 3, CTBN 10, the LT next in order is LT1 00 and the LT previous in order is LT2 01. So to switch to another LT by using the special Switching Bit 1, one could use just 1 more bit, either 0 or 1, to specify which LT to switch to by using relative addressing: Bit 10 means switching to the previous LT in the order, and Bit 11 means switching to the next LT in the order. So using RTS and the Switching Bit uses the same number of bits as used by Absolute Addressing. However, when the value next to come remains in the same LT, the Staying Bit 0 is used instead of the 2 bits for the TBN used by Absolute Addressing, and this helps save 1 bit. So for LTs using TBNs with more bits assigned, using RTS and the associated Staying Bit 0 could save even more bits.
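The relative switching just described can be written down very compactly. The sketch below is an illustrative implementation only (not the reference code of the description): the table layout and value assignment follow Diagram 5, the ABB convention follows Diagram 7, and all function and variable names are assumptions. It encodes one PU under the LT structure 1.2.4+1 using RTS and reproduces the 25-bit output of Diagram 7.

```python
# A minimal sketch of RTS encoding for one PU under LTS 1.2.4+1 (illustrative only).

# Each Location Table: (number of LBA bits, CU values assigned in TLBA order).
# The last TLBA of LT3 (10:11) is Address Branched: it holds "110" (ABB 0) and "111" (ABB 1).
LTS_124_1 = [
    (0, ["000"]),                         # LT1, TBN 00:
    (1, ["001", "010"]),                  # LT2, TBN 01:
    (2, ["011", "100", "101", "110"]),    # LT3, TBN 10:
]
BRANCHED = {"110": "0", "111": "1"}       # ABB written for the shared TLBA 10:11

def locate(cu):
    """Return (LT index, LBA bit string) of a CU value under LTS 1.2.4+1."""
    target = "110" if cu == "111" else cu          # "111" branches to the "110" slot
    for t, (lba_bits, values) in enumerate(LTS_124_1):
        if target in values:
            lba = format(values.index(target), f"0{lba_bits}b") if lba_bits else ""
            return t, lba
    raise ValueError(cu)

def encode_pu_rts(cus):
    """Encode one PU (list of 3-bit CU strings) with RTS; ABBs are appended at the end."""
    out, abbs, current = [], [], None              # current acts as the CTBN register
    for cu in cus:
        table, lba = locate(cu)
        if current is None:                        # first CU value: absolute 2-bit TBN (+ LBA)
            out.append(format(table, "02b") + lba)
        elif table == current:                     # Staying Bit 0
            out.append("0" + lba)
        else:                                      # Switching Bit 1 + relative direction bit
            direction = "1" if table == (current + 1) % 3 else "0"   # 1 = next LT, 0 = previous LT
            out.append("1" + direction + lba)
        if cu in BRANCHED:
            abbs.append(BRANCHED[cu])
        current = table
    return "".join(out) + "".join(abbs)

pu = ["000", "001", "010", "011", "100", "101", "110", "111"]
code = encode_pu_rts(pu)
print(code, len(code))    # 25 bits, matching Diagram 7 (the 3-bit PU header is not included)
```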
[80] RCPN is the technique used to convert a data pattern of binary bits into binary numbers.
Combining the use of RTS and Multiple Table Structure with the use of RCPN is another useful technique to provide a much better solution of compressing data of different distribution in general as could be seen later in this Description. More details of RCPN will be discussed when the processing of PU8 group is mentioned later.
[81] More examples are provided below to illustrate how to use RTS for processing PUs containing different numbers of CU values. Diagram 7 above is for a PU having 8 different CU values, the PU8 group. The following Diagram 9 shows a scenario for a PU having 7 different CU values, using the assumptions used previously for Diagram 7:
Diagram 9
Relative Table Switching
Input Code of PU with 24 bits having 7 different CU Values
000 001 010 011 100 101 110 000
Output Code of PU using RTS for LT structure 124 without Address Branching
00[0] 11 0 0 1 11 00 0 01 0 10 0 11 11 [0]
[82] Firstly it should be noted that CU value 000 is denoted in two different ways in the above scenario. In the schema presented here for exposition, the first CU value to be encoded in a PU should at least be given 2 bits of TBN code, so that the CTBN could be captured when processing changes from one PU to another PU. If the LT assigned with the first CU value has more than one TLBA, then the full TLBA should be written down into the encoded output file. So if it has 2 TLBAs, either 000 or 001 should be written down instead of just the 00 used now, where the LT being used has only 1 TLBA and the TBN alone represents the only TLBA to which Value 000 is assigned. [0] in Diagram 9 denotes that 0 could have been written down but is omitted and not required to be written. In the same manner in Diagram 9, the same [0] for CU value 000 is used at the end of the PU, denoting that [0] should be omitted from the encoded code. This is because, after using the switching code 11, the CTBN is updated to that pointing to LT1, i.e. 00:, and as LT1 has only 1 TLBA, there is no need to write down another 0 in order to achieve more compression. The switching code 11 is itself enough to point to CU value 000 and to represent this value in this example. The number of CU value instances found in any PU in this case is fixed at 8 counts and does not vary. After 8 instances of values are counted, the processing of the current PU has to stop and the next PU has to be processed. This, the value instance count, is a measure of control, signifying when the processing of a PU is to start and where it is to stop and whether it is to continue or not, and also where to find the additional ABB bits and make correction for the Address Branched Codes. The ABB however, as mentioned before, could be placed just right after the relevant Address Branched TLBA; on decoding, when D comes to the Address Branched TLBA, it could just look for the following bit to determine how the Address Branched TLBA is to be decoded. The second thing changed in Diagram 9 is the LTS being used, i.e. one without using Address Branching. One should note that this is used to illustrate what takes place if RTS without Address Branching is the case.
Returning to the present example assuming no Address Branching is required, the LTS changes from 1.2.4+1 with Address Branching as in Diagram 7 to 1.2.4 without Address Branching as in Diagram 9. The number of bits required for the output file for Diagram 9 scenario is 22 bits. A saving of 2 bits is achieved after RTS compression. It should however be noted that it is a rather optimal situation as other combinations of CU values could occur that are not so optimal. For instance, if it has CU value 111 instead of 100, Address Branching is still required. This also illustrates that for PU with number of CU values less than 8, it does not mean that LT structure 1.2.4+1 with Address Branching could be avoided. As long as the whole digital information file has CU value 111 found at different places in PUs of different number of CU values, LT structure 1.2.4+1 with Address Branching still has to be used. Only when the whole digital information input file has only 7 different CU values instead of 8 different CU values, then LT structure 1.2.4 without Address Branching could be adopted. The total number of different CU values found within a digital information file therefore affects very much the data compression ratio that could be achieved. It could be seen that for a PU having 7 different CU values for the scenario of Diagram 9, 24 bits are compressed to 22 bits.
[83] Diagram 10 below presents another scenario of PU having 6 different CU values, one of which is 111. So LT structure 1.2.4+1 with Address Branching is to be used:
Diagram 10
Relative Table Switching
Input Code of PU with 24 bits having 6 different CU Values
000 001 111 011 110 101 000 000
Output Code of PU using RTS & LT structure 124+1 with Address Branching 2 ABBs bits
00[0] 11 0 11 11 0 00 0 11 0 10 11 [0] 0 [0] 10
[84] In Diagram 10, Address Branching is still required as the CU values 111 and 110 are present. The notation [0], as has been explained, means the 0 bit concerned could be omitted. Whether it is omitted depends on the rules used by the designer and implemented in C and D, and this is acceptable as long as the encoding and decoding follow the rules in a regular manner. Of course, omitting [0] wherever possible is preferred, as it is in the interest of the compression endeavor. So the output code now has 23 bits, including 2 additional ABB bits, versus the input code of 24 bits. Because the 7th and the 8th CU values represent the two least frequently occurring values, in most other scenarios with PUs having 6 different CU values the 2 ABB bits could be further saved.
[85] After explaining the use of Absolute Addressing and Relative Addressing, it is time to give an idea about the possible structure of a PU that could be used in the schema presented in the present invention:
Diagram 11
Structure of a PU
Input Code of PU with 24 bits having 6 different CU Values
000 001 111 011 110 101 000 000
Output Code illustrating the possible structure of a PU
101 | encoded TLBAs of the CU values |
The possible structure of a PU outlined here for implementation is headed by 3 bits representing the number of different CU values (CUV) within the input code of the PU read, the PU Header. In Diagram 11 it is 101, indicating that only 6 of the 8 possible CU values are found inside the PU. So, to classify or group PUs by the number of different CU values found, a PU having only 1 CU value present could be called PU1, meaning it belongs to the PU group having only 1 CU value, a PU having 2 different CU values PU2, a PU having 3 CU values PU3, and so on and so forth up to PU8 in the example presented here. After the CUV signature (101 here meaning 6 different CU values are found inside the PU), the input code is encoded by C one CU value at a time, using their TLBAs, until reading and encoding are finished. One could use Absolute Addressing as illustrated in Diagram 6; Diagrams 7 to 10 illustrate how Relative Addressing by way of RTS is used. If the whole digital input file does not end with a whole PU and, for instance, only 5 bits are found instead of the 24 bits assigned to the PU in the present example, such 5 bits could be left unprocessed and appended to the encoded code. Since there are no more incoming bits from the input digital information file, there should be no mistake about where the end of processing is.
[86] In short, after writing the CUV, the input code is encoded by C one CU value read at a time by writing into the output file its corresponding TLBA until the whole PU is read and encoded. If Address Branching is required to be used, the corresponding ABB(s) being kept in the computer memory or in a computer storage file is/are written at the end of the PU or at the end of the output file, or placed just right after the encoded Address Branched TLBAs, as determined by the rules of design.
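As an illustration of this framing, the following sketch (an assumption-laden outline, not the description's reference code) splits a bit stream into 24-bit PUs of 3-bit CUs, prepends the 3-bit CUV header, and leaves any trailing fragment unprocessed; encode_body stands in for whichever addressing technique (AA, RTS/RA, DDA, RCPN) is chosen for the PU group.

```python
# A minimal sketch of PU framing, assuming the 3-bit CU / 24-bit PU sizes of the examples.

def split_into_pus(bits, pu_bits=24, cu_bits=3):
    """Yield full PUs as lists of CU strings; a trailing fragment is yielded as a raw string."""
    full = len(bits) // pu_bits * pu_bits
    for i in range(0, full, pu_bits):
        pu = bits[i:i + pu_bits]
        yield [pu[j:j + cu_bits] for j in range(0, pu_bits, cu_bits)]
    if full < len(bits):
        yield bits[full:]                 # leftover bits appended unprocessed, as described above

def cuv_header(cus):
    """3-bit CUV: number of different CU values in the PU, 000 meaning 1 up to 111 meaning 8."""
    return format(len(set(cus)) - 1, "03b")

def frame(bits, encode_body):
    out = []
    for pu in split_into_pus(bits):
        if isinstance(pu, str):           # trailing fragment: copied to the output as-is
            out.append(pu)
        else:
            out.append(cuv_header(pu) + encode_body(pu))
    return "".join(out)

# Example with a do-nothing body that simply re-emits the CU values unchanged.
print(frame("000000000000000000000000" + "101", lambda cus: "".join(cus)))
```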
[87] Upon decoding, D reads the CUV first, and the CUV read lets D know what kind of processing or decoding to adopt and implement and what rules to follow, corresponding to the PU having the number of different CU values indicated by the CUV. After reading the CUV and knowing how to proceed onwards for decoding the PU, D reads the remaining encoded code, translating the TLBAs (or other special bit patterns particular to some PU groups, such as those of the Data Distribution Alteration technique for PU2) one by one until all the CU values of the PU are decoded. In the present case, since the input code of a PU has 24 bits only (representing only 8 counts of CU values), the end of the PU could be determined by keeping a count register of the number of TLBAs having been translated; when it reaches 8 counts, D knows it has processed all the CU values of the PU. If Address Branching is used and the ABB(s) is/are placed at the end of the PU, D then scans through the decoded CU values (8 in the present case) previously processed for the PU, finds out the CU values with Address Branching Values associated with them, and uses the additional ABBs to make the corresponding correction or adjustment. If the ABB(s) is/are placed just right after the Address Branched TLBA(s), then direct decoding of the TLBA(s) could be implemented when D comes across any Address Branched TLBA on the fly.
[88] To translate CU values into the corresponding TLBAs, there are two different ways of translation: one is through Absolute Addressing and the other through Relative Addressing by way of using RTS, depending on which type of Addressing C and D use according to design. Or a 1-bit Addressing Signature could be added to the Header of the digital information file signaling the usage. In Absolute Addressing, each CU value of the PU is encoded or decoded by using its corresponding Absolute TLBA and the corresponding ABBs if Address Branching has to be used.
[89] Using Relative Addressing by way of RTS, the first CU value to be encoded by C is represented by writing its corresponding Absolute TBN; its LBA portion could be omitted if there is only 1 value present in the LT. After encoding the first CU value, all the remaining CU values of the PU have to be translated using Relative Addressing by way of RTS. If sub-division of PUs is not used, then the whole digital data input is taken to be a PU and the CUV is not necessary; instead a Header Section has to be added to the Compressed Code File, as will be elaborated later. Returning to the discussion in Diagrams 9 and 10 and Paragraphs [81] to [84], in order to switch to another LT for the CU value to be encoded, Bit 1 has to be written to signify the act of RTS, i.e. switching the Location Table. And Bit 1 is then followed by a bit 0 or bit 1 depending on whether the TBN of the LT to switch to is the previous one or the next one in the order. If the LT to switch to has only 1 LBA, then there is no need to write down its LBA. If it has more than 1 LBA, the corresponding LBA should then be written. If the CU value to be encoded stays inside the current LT, there is no need to switch to another LT for its TLBA; a Bit 0 is then written to signify staying in the same LT. If the LT has only 1 LBA, then there is no need to write down its LBA. As mentioned in Paragraph [78], a register for the CTBN has to be kept and updated where appropriate for this purpose.
[90] To decode the encoded output using Relative Addressing, D, after reading the CUV (if there is no sub-division of PUs, it reads the Header Section of the whole digital data input instead of reading the CUV for PUs, and the whole digital data input is taken as one PU with the Header Section as the CUV), reads 2 bits representing the Absolute TBN of the first CU value to be translated. If there is only 1 TLBA in the LT, because the LBA portion could be omitted, there is no need to look for the LBA portion; the two bits of the Absolute TBN read could be used to represent the whole TLBA for the first CU value. If there is more than 1 LBA, the corresponding bit or bits for the LBA (how many bits of LBA to be read here being determined by the bit size assigned to that particular LT, i.e. the Table Size of the LT) corresponding to the first CU value have to be read in order to recover the first CU value. After translating the first TLBA for the first CU value, the other TLBAs following in the sequence are encoded in terms of Relative Addressing and so should be decoded using the Relative Address Translation technique. So to translate the second CU value, D reads the next bit in the bit chain. If it is 0, it means staying in the same LT. If the LT has only one TLBA, there is no need to look for its LBA if it is designed to be omitted. D then examines the CTBN register to retrieve the current TBN, finds out which CU value has been assigned to it, and writes out the corresponding CU value to the decoded output. If the LT has more than one TLBA, D has to read in the number of bit(s) corresponding to the Table Size of the LT, get the whole TLBA in the form of TBN:LBA, and translate it into the corresponding CU value. After processing the second value, D continues to read another bit; if this bit is Bit 1, that means it has to switch to another LT for the TLBA of the next upcoming value to be decoded. D then reads the next bit to know whether it is the previous LT or the next LT that is to be used: Bit 0 means the previous LT and Bit 1 the next LT in order. D then consults the CTBN register and finds the relevant LT with the correct TBN. D then reads the LBA portion from the encoded code to find out the original code for it and writes out the original code into the decoded output code. D then updates the CTBN, as it has moved to a new LT after the decoding of the second CU value. D then continues to process the third CU value in the same manner. After processing all the CU values of the PU, it has to make adjustment or correction for those CU values which are Address Branched, by scanning all the CU values so far processed for the PU and making the corresponding change by reading and using the remaining ABB(s), if the ABBs are placed at the end of the PU. If it is designed that the ABB(s) is/are placed right after the Address Branched TLBA(s), D then reads in the TLBA(s), and if one is found to be an Address Branched TLBA, D reads in another bit, i.e. the ABB, decodes the Address Branched TLBA(s) with the relevant ABB(s) on the fly, and writes the decoded code to the decoded output. The decoding process by D could then continue to the next PU, and so on until the end of the whole digital information input code. It knows when to end the processing of the current PU by counting the number of values that have been decoded for the current PU.
In the present case, after 8 CU values are decoded, D then moves forward to decode another PU, beginning by reading the CUV again to determine which PU group is to be processed and what rules and logics of processing are to be observed.
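A compact decoder sketch for the PU of Diagram 7 follows. It is illustrative only: the table layout repeats the one assumed in the encoder sketch above, and the names are not taken from the description; it simply shows how the CTBN register, the Staying/Switching Bits and the end-of-PU ABBs interact.

```python
# An illustrative decoder for the RTS-encoded PU of Diagram 7 (LTS 1.2.4+1,
# ABBs collected at the end of the PU).

LTS_124_1 = [
    (0, ["000"]),                         # LT1, TBN 00:
    (1, ["001", "010"]),                  # LT2, TBN 01:
    (2, ["011", "100", "101", "110"]),    # LT3, TBN 10: (10:11 is Address Branched)
]

def decode_pu_rts(code, counts_per_pu=8):
    pos, current, out, branched = 0, None, [], []
    for _ in range(counts_per_pu):
        if current is None:                        # first value: absolute 2-bit TBN
            current = int(code[pos:pos + 2], 2); pos += 2
        elif code[pos] == "0":                     # Staying Bit
            pos += 1
        else:                                      # Switching Bit + relative direction bit
            step = 1 if code[pos + 1] == "1" else -1
            current = (current + step) % 3; pos += 2
        lba_bits, values = LTS_124_1[current]
        idx = int(code[pos:pos + lba_bits], 2) if lba_bits else 0
        pos += lba_bits
        out.append(values[idx])
        if current == 2 and idx == 3:              # note the Address Branched TLBA 10:11
            branched.append(len(out) - 1)
    for i in branched:                             # apply the ABBs read at the end of the PU
        out[i] = "110" if code[pos] == "0" else "111"
        pos += 1
    return out

# The 25-bit output code of Diagram 7 (spaces inserted only for readability):
diagram7 = "00 110 01 1100 001 010 011 011 01".replace(" ", "")
print(decode_pu_rts(diagram7))
# ['000', '001', '010', '011', '100', '101', '110', '111']
```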
[91] It is obvious from the preceding description that, before encoding and decoding take place, a LT structure has to be decided upon and set up, with CU values assigned to the TLBAs of the LTs of the LT structure. The LT structure can be decided upon and set up according to the modeling and matching prepared in the DP phase, either by A and passed to C, or done by C directly. This is the static way of determining and setting up the LT structure, by going through the DP phase for collecting the data distribution statistics. The dynamic way of making an adaptive LT structure is to set up the LT structure dynamically and adaptively while processing and encoding the digital information file, such as by reading the first PU or the first 256 PUs or another number of PUs, where appropriate, of the digital information file, analyzing its data distribution, and then prioritizing and assigning the CU values found to the LT structure most appropriate to it. C then uses this as the initial LT structure. At the same time, C also keeps a register of the frequency counts of each of the CU values so far processed. When the frequency counts of the data distribution warrant a change of the assignment of CU values to the LT TLBAs, or together with a change of the LT structure, C updates the LT structure with the corresponding assigned values and begins processing the next PU, or does so at any agreed juncture, using the newly updated LT structure. Such rules of when to make the change and update have to be embedded in C and D and should be followed consistently. Or changes to the LTS and the re-assignment of CU values could be made after every value encoding. As C knows what the LTS and the assignment of CU values to the TLBAs are, and it keeps frequency counts of all the CU values having been processed, it could re-assign the CU values to the TLBAs of the LTs whenever the ranking of the CU values processed has changed. C could even use a new LTS, such as changing from LTS 1.2.4+1 (i.e. 1.2.4 + 1 ABB) to LTS 2.2.4, when the frequency counts indicate that the CU values are more evenly distributed than before. Rules about when to make the change have to be determined beforehand and implemented and applied by C and D consistently. For instance, when the 1st ranking CU value comes within a 10% margin of the 2nd ranking value, then LTS 1.2.5 could be changed to LTS 2.2.4 or vice versa.
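The sketch below outlines one way such dynamic re-assignment could be kept; the names, slot counts and re-ranking trigger are illustrative assumptions rather than values fixed by the description. C accumulates frequency counts and rebuilds the CU-value-to-LT assignment in descending order of frequency, and D must apply exactly the same rule at exactly the same junctures.

```python
# An illustrative outline of dynamic/adaptive re-assignment of CU values to LTs.

from collections import Counter

class AdaptiveAssignment:
    # Slots per LT for the 1.2.4+1 structure: LT1 holds 1 value, LT2 holds 2,
    # LT3 has 4 TLBAs accommodating 5 values via Address Branching.
    SLOTS = (1, 2, 5)

    def __init__(self, initial_order):
        self.counts = Counter({v: 0 for v in initial_order})

    def observe(self, cu):
        self.counts[cu] += 1

    def reassign(self):
        """Rebuild the CU-value -> LT index assignment from the current frequency counts."""
        ranking = [v for v, _ in self.counts.most_common()]   # Master Ranking List so far
        assignment, i = {}, 0
        for lt, slots in enumerate(self.SLOTS):
            for _ in range(slots):
                if i < len(ranking):
                    assignment[ranking[i]] = lt
                    i += 1
        return assignment

cu_values = [format(v, "03b") for v in range(8)]
tracker = AdaptiveAssignment(cu_values)
for cu in ["101", "101", "101", "000", "000", "111"]:
    tracker.observe(cu)
print(tracker.reassign())   # "101" now lands in LT1 (index 0), "000" in LT2, and so on
```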
[92] It is now time to turn to discussing how individual PUs having different CUVs are encoded and decoded according to the techniques and method of the present invention, to illustrate how novel the present invention is.
[93] Assume a LT structure of 1.2.4+1 is used, having the 8th CU value, the least frequently occurring value, Address Branched to the 7th CU value. The frequency count distribution of all the 8 CU values is 000, 001, 010, 011, 100, 101, 110 and 111 in descending order of frequency count, i.e. 000 being the most frequent one and 111 being the least frequent one. Using the LT structure 1.2.4+1 to process a PU having CUV 000, i.e. PU1 with only 1 CU value found inside the PU, the following Diagram 12 helps to illustrate the encoding and decoding processes:
Diagram 12
PU1 processing
Input Code
111 111 111 111 111 111 111 111
Output Code: encoded PU1 with only 1 CU value
CUV The first and the only CU value
000 111
In Diagram 12, the Input Code shows that PU1 has only 1 CU value and that the CU value is 111, the PU having 8 counts of 111. So the encoded Output Code for it begins with a CUV of 000, indicating this PU has only 1 CU value found throughout the whole PU. Since there is only 1 CU value of 111, just writing one 111 into the Output Code file is enough. The other 111 input codes could be omitted; i.e. by inference, the other 111 values are assumed to have been omitted. This encoding by C does not use the LT structure 1.2.4+1, because, as explained previously, the first encoded CU value has to be in the form of an Absolute TLBA. If an absolute TLBA were used, it would have been 10:11 with another ABB of 1; in such a way it occupies 5 bits instead of 3. If the only CU value found is 000, then the Absolute TLBA for this is 00, representing TBN 00:, and since there is only 1 value assigned to this LT1, there is no need to give 1 bit to it as its Table Size. So there are two rules to select from: one is to use the LT structure to encode the first CU value, the other is to write down the CU value of the Input Code directly without using the LT structure.
Diagram 12 selects not using the LT structure for PU1 for simplicity and to avoid the chance of using 2 more bits as explained above. This also illustrates that each of the PU groups, i.e. PU1, PU2, and so on and so forth up to PU8, may have its own set of rules for C and D to process according to the design and the need to adjust to the data distribution peculiar to the PU group being processed.
[94] Upon decoding, after reading the CUV, D knows that this PU has only 1 CU value throughout the current PU to be processed. It reads the first encoded CU value, in this case the CU value written down directly, i.e. 111. It then writes 8 counts of 111 into the decoded output file by inference, according to the logic and rules programmed into D for the PU1 group. So 24 bits are compressed into 6 bits, a saving of 18 bits.
[95] To process a PU of PU2, having only two CU values, one could use Absolute Addressing or Relative Addressing with the LT structure 1.2.4+1 set up. Diagram 13 gives an illustrative example as follows:
Diagram 13
Using Absolute Addressing and Relative Addressing in Processing a PU2
Input Code
110 111 111 110 111 110 111 111
Output Code using Absolute Addressing ABBs
1011 1011 1011 1011 1011 1011 1011 1011 0 1 1 0 1 0 1 1
Output Code using Relative Addressing
1011 0 11 0 11 0 11 0 11 0 11 0 11 0 11 0 1 1 0 1 0 1 1
Diagram 13 illustrates that Absolute Addressing requires 40 bits for the 24 bits of the Input Code, and Relative Addressing 33 bits for the 24 bits. In this scenario the data distribution is skewed towards the 2 least frequently occurring CU values of the whole digital data input. It is the worst situation for using the LT structure 1.2.4+1, which was designed for a data distribution of the opposite skewness. It is no wonder that data expansion instead of data compression results. However, it shows that using Relative Addressing does have an advantage over using Absolute Addressing here.
[96] There is however another marvelous technique for doing the compression for PUs of the group PU2. This requires a new approach, a new paradigm and a novel mindset: to alter the data distribution according to the best processing method. This is the technique of Data Distribution Alteration. For decoding, the altered data set has to be recoverable, and therefore additional information has to be added into the encoded output. But as long as the added information requires fewer bits than other methods, this is the best method and technique for processing PUs of that particular PU group.
[97] By now, one perhaps may appreciate what to do. The answer is to make PUs of PU2 into PUs of PU1, i.e. to change the data and the data distribution of PU2s into PU1s. How this is to be done is illustrated in Diagram 14:
Diagram 14
The technique of Data Distribution Alteration
Input Code
110 111 111 110 111 110 111 111
Altered Input Code
111 111 111 111 111 111 111 111
This is simple and yet novel. In the Input Code, there are 3 values of 110 and 5 values of 111. So the easiest way of making the PU2 into a PU1 is to alter the 3 values of 110 into 3 values of 111, instead of the other way round.
[98] Diagram 15 below shows the encoded Output Code using technique of Data Distribution Alteration:
Diagram 15
Output Code encoded with the technique of Data Distribution Alteration
Input Code
110 111 111 110 111 110 111 111
Output Code
CUV
001 111 110 10 000 011 101
The first 3 bits of the Output Code are, as always, the CUV, and they also tell C and D which rule set and logics of processing to follow. The logics of processing presented here are just an example; if one is inventive in another way, another set of processing logics could be designed and used for processing PUs of the PU2 group.
Returning to Diagram 15, after the CUV, the output value is 111. This 111 indicates that the PU2 is made into a PU1 having all values in the form of 111. The next output code 110 means it is 110 of the Input Code of the current PU that is to be altered into 111, and there are 3 values of 110 to alter; this count is represented by the next output code 10. Since at most 4 possible values of a PU2 have to be altered to make it a PU1, only 2 bits are required for this output code. The next three values in the Output Code are their respective positions in the Input Code, i.e. 000 for the first position, 011 for the 4th position and 101 for the 6th position. Upon decoding, D uses the above explained rules and set of logics to decode the encoded Output Code and recover the original Input Code. So D reads the CUV and knows that it is a PU2 and that the PU2 was first altered into a PU1 according to the set of rules and logics built in for PU2; it reads the next value 111 and writes all 8 counts of 111; it then reads the next value 110, knowing that it is 110 that is to be restored, and reads the next value of 2 bits, in this case 10, knowing that there are 3 values of 110 to restore; it then reads the next 3 values of 000, 011 and 101, knowing that the 111 at the 1st, 4th and 6th positions of the series of 8 counts of the 111 value are to be restored to 110, and does the restoration and decoding.
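A minimal sketch of this alteration-and-restoration rule is given below; the field layout follows Diagram 15, while the helper names and the convention of storing count minus 1 in the 2-bit field are assumptions consistent with the example.

```python
# An illustrative sketch of the Data Distribution Alteration rule for PU2
# shown in Diagrams 14 and 15: rewrite the rarer of the two CU values as the
# more frequent one, recording its count and positions so that D can restore it.

from collections import Counter

def encode_pu2_dda(cus):
    (keep, _), (alter, n) = Counter(cus).most_common(2)
    positions = [i for i, v in enumerate(cus) if v == alter]
    return ("001" + keep + alter + format(n - 1, "02b")
            + "".join(format(p, "03b") for p in positions))

def decode_pu2_dda(code, counts_per_pu=8):
    keep, alter = code[3:6], code[6:9]
    n = int(code[9:11], 2) + 1
    cus = [keep] * counts_per_pu
    for k in range(n):
        pos = int(code[11 + 3 * k: 14 + 3 * k], 2)
        cus[pos] = alter
    return cus

pu = ["110", "111", "111", "110", "111", "110", "111", "111"]
code = encode_pu2_dda(pu)
print(code, len(code))                 # 20 bits: CUV 001, 111, 110, 10, 000, 011, 101
assert decode_pu2_dda(code) == pu
```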
[99] It could be seen from Diagram 15 that if there are 3 values to alter and restore, the number of bits used altogether for producing the Output Code is 20. If there are 4 values, the Output Code requires 23 bits for the 24 bits of the Input Code. So even in the worst scenario it still makes a 1-bit saving for compression.
[100] There is yet another way of using RTS and RA for encoding and decoding PU2s. This is through using adaptive LT structure. Diagram 16 below illustrates how to do this:
Diagram 16
Using RTS and RA with Adaptive LT Structures
Input Code
110 111 111 110 111 110 111 111
1st Output Code with ABBs at the end of PU
CUV ABBs
001 1011 0 11 0 1 1 1 1 0 0 1
2nd Output Code with ABBs placed after the encoded CU values
CUV ABB ABB
001 1011 0 0 11 1 0 1 1 1 1 0
[101] In Diagram 16, this time instead of using the DDA technique, C encodes the input code by using the standard LTS. The first value is therefore the CUV; the second value is 1011, the absolute TLBA of the 7th or the 8th CU value in the frequency order because of Address Branching. The following bit 0 means no switching of LT for finding the next value; the current LT is LT3 having 4 TLBAs, so staying in the same table means reading only 2 bits for the next value, as the TBN of LT3 is not required to be written down. The next 2 bits are 11 and they represent 1011 again. As there are only 2 different CU values found in a PU2, bit 0 in the following means staying, i.e. using the same value, the 2nd CU value 111, as the next value; and bit 1 means using the previous value, i.e. the 1st CU value 110, as the next value and updating the current CU value registers, i.e. the current CU value register and the previous CU value register, accordingly.
[102] Upon decoding by D, the decoded output code becomes 8 counts of 1011 with two ABBs of 0 and 1. So to make adjustment for the 8 counts, D goes back to the encoded code again and then finds that the first 1011 should be 110, because the first bit of the ABBs is 0, meaning the first or originally assigned value of TLBA 10:11. And the next bit 0 means staying within the same LT to find the next value, and the corresponding TLBA is again 11, meaning 10:11 but with an Address Branched Bit 1, so the second TLBA is to be interpreted as 111 instead of 110. After the 2 different CU values are found and adjusted, D could easily decode the remaining values of 1011 using the meaning given to Bit 0 and Bit 1, i.e. either staying with the same value or switching to the previous value, making the correction by writing down the value of either 111 or 110. The 2nd Output Code in Diagram 16 shows the ABBs placed after the encoded Address Branched CU values according to another design. The first Bit 0 after the first encoded CU value, 1011, is the ABB; i.e. after an encoded Address Branched CU value, 1 more bit should be read to distinguish which original CU value it represents. And the next Bit 0 means staying with the same LT. This use and meaning of Bit 0 is not to be changed and is used the same way until another CU value is found. After finding both CU values of this PU, the meaning of 0 and 1 could be changed to mean using the same value as the current value or changing to the previous value and using it. This change of meaning of 0 and 1 represents the act of using an Adaptive LT Structure. The decoded values in the preceding decoded output code could serve as the data set required for building up the next LT structure. Since it is a PU2 and only 2 different CU values should be present in the PU, after the 2 different CU values are found the LT structure of 1.2.4+1 could be dispensed with, and D, according to the preferred set of rules and logics of using RTS and RA for PUs of PU2, could then rely on the 2 decoded CU values written out previously as a reference, making a new LT structure with 2 LTs only, each having only 1 TLBA. 1 TLBA, as explained previously, does not require bit assignment to the LBA portion. This means the act of staying or switching, i.e. using the special switching bit, Bit 0 or Bit 1, could refer to the only TLBA of either of the 2 LTs, and there is no need to refer to the TBNs or the TLBAs, in order to save more bits.
[103] One could however give different meanings to Bit 0 and Bit 1 than given above. The logics and the set of rules used for RTS and RA could be designed in another way.
Another set of rules and logics of using RTS and RA could be illustrated in the following Diagram 17:
Diagram 17
Using another set of rules and logics for RTS and RA with Adaptive LT Structures
Input Code
110 111 111 110 111 110 111 111
1st Output Code with ABBs at the end of PU
CUV ABBs
001 101111011 0 1 1 1 1 0 0 1
2nd Output Code with ABBs placed after the encoded CU values
CUV ABB ABB
001 1011 0 11011 1 0 1 1 1 1 0
Another set of rules and logics could be like this: since there should be only 2 different CU values in the PU, one could use the special Bit 0 to mean staying with the same value instead of staying with the same LT. If Bit 0 is used as such, Bit 1 should be re-defined to mean switching to the remaining CU value. This re-defining of the meaning of the special Bit 0 and Bit 1 may be acceptable, as the ensuing values after the first CU value may be the same as the first CU value, and using the special Bit 0 as staying with the same value may also be good for such cases. So after using the special Bit 0 as such, and the special Bit 1 as switching to another CU value, the Absolute TLBA for the next new CU value should be used instead of the relative TLBA. After the second CU value is found and decoded, one could resume using the adaptive RA for decoding upcoming values. In this way, however, the use of 0s and 1s should be applied consistently, according to the set of rules and logics for processing PU2, by C and D in the same way. Using RTS and RA as explained in the above examples requires only 18 bits in the first case and 20 bits in the second case. But that is not to say the second set of rules and logics must be inferior to the first, as there may be cases where, after the first CU value is found, it repeats itself for a number of entries. In the latter case, the second set of rules and logics may become better. But in general, it demonstrates that using RTS and RA with either of the above two sets of rules and logics could improve the compression rate.
[104] One could therefore decide whether one wishes to use the technique of Data Distribution Alteration or the technique of RTS and RA as explained above. Other techniques introduced in this invention could also be added to help achieve a higher and higher compression rate.
[105] For the schema of using PUs introduced here, a 3-bit CU could be used to represent 8 different bit values (or bit patterns), whether repeating or not. They could all be of the same value or pattern or could all be different from one another; the former case is PU1 and the latter case PU8, and there are six other intermediate cases from PU2 to PU7.
[106] Altogether, for these 8 PU groups, the total number of permutations with repetition of the bit values or bit patterns is of the order of 8 to the power of 8, as listed below:
Diagram 18
No of Permutations (with Repetition except PU8) for each PU group from PU1 to PU8
PU1 = 8
PU2 = 7112
PU3 = 324576
PU4 = 2857680
PU5 = 7056000
PU6 = 5362560
PU7 = 1128960
PU8 = 40320
It is interesting to note that only PU8 by definition has no repetition within its permutation of bit values or bit patterns; all the other PU groups have from one (PU7) to eight (PU1) repeating value(s) or pattern(s).
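These counts can be checked with a short computation, included here only as an illustrative verification and not as part of the description: the number of 8-CU sequences over 8 possible CU values containing exactly k distinct values is C(8, k) multiplied by the number of surjections from 8 positions onto k values.

```python
# A quick inclusion-exclusion check of the Diagram 18 figures.

from math import comb

def sequences_with_exactly_k_values(positions=8, alphabet=8, k=1):
    surjections = sum((-1) ** j * comb(k, j) * (k - j) ** positions for j in range(k + 1))
    return comb(alphabet, k) * surjections

total = 0
for k in range(1, 9):
    count = sequences_with_exactly_k_values(k=k)
    total += count
    print(f"PU{k}: {count}")
print(total)    # 16777216 = 8 to the power of 8
```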
[107] Since there is no bit value or bit pattern repeating in a PU of PU8, it is the group with the most evenly distributed bit pattern of data. As compression science relies very much on a data distribution with frequency variation, i.e. a skewed data distribution, and on the principle of assigning fewer bits to values or patterns with higher frequency of occurrence and more bits to those with lower and lower frequencies, as does the use of Multiple Location Tables outlined above, PU8 is not a very good candidate for making compression based on the aforesaid principle.
[108] RTS and RA in this case do not help much in making compression of the PU8 group of data; instead, expansion of data is the result. This is because the data distribution of the PU8 group is extremely even, involving frequent switches of LTs, and is thus unable to take advantage of RTS and RA. So another technique is revealed here to complement the techniques and method introduced above. This technique is called Rank Code Pair Numbering (RCPN). Using Rank Code Pair Numbering for making data compression involves changing the code ranks into numbers, i.e. using numbers to represent the ranking of the code values. To illustrate how it works and the mechanism involved, one could continue with the example of Diagrams 6 and 7 for encoding the bit pattern of a PU of PU8. The PU of this PU8 has all 8 different CU values or bit patterns, from 000, 001, 010, 011, 100, 101, 110 to 111 in descending order of frequency counts. Using RTS and RA expands the data. However, using RCPN makes a very significant reduction in the number of bits used for encoding the data patterns. Rank Codes here could refer either to the ranking according to frequency count or to natural ranking; natural ranking refers to the binary number as represented by the CU. For the present case, a CU could have 8 binary values, on a decimal scale from 1 to 8. So in Diagram 19, Decimal 1 is given the highest natural rank and Decimal 6 the lowest natural rank.
[109] The RCPN technique revealed in the present invention applies to PUs having consecutive different CU values one after another; it does not apply to PUs having repeating CU values. So strictly speaking, it applies only to PU8. However, one variant of the other PU groups could also qualify for using this technique, i.e. when all the different CU values of the PU come one after another consecutively without repeating values in-between.
[110] Using PU8 as an example first, the following Diagrams 19 to 23 demonstrate how RCPN works:
Diagram 19
Adding of A Pair of Rank Codes & Rank Code Pair Numbering
Diagram 20
Rank Code Pair Number Table for 6 Rank Codes (RCPNT6)
Rank Code Pair Number Rank Code Pair
Decimal Binary
1 00000 1+2
2 00001 2+1
3 00010 1+3
4 00011 3+1
5 00100 1+4
6 00101 4+1
7 00110 2+3
8 00111 3+2
9 01000 1+5
10 01001 5+1
11 01010 2+4
12 01011 4+2
13 01100 1+6
14 01101 6+1
15 01110 2+5
16 01111 5+2
17 10000 3+4
18 10001 4+3
19 10010 2+6
20 10011 6+2
21 10100 3+5
22 10101 5+3
23 10110 3+6
24 10111 6+3
25 11000 4+5
26 11001 5+4
27 11010 4+6
28 11011 6+4
29 11100 5+6
30 11101 6+5
Diagram 21
Rank Code Pair Number Table for 4 Rank Codes (RCPNT4)
Rank Code Pair Number Rank Code Pair
Decimal Binary
1 0000 1+2
2 0001 2+1
3 0010 1+3
4 0011 3+1
5 0100 1+4
6 0101 4+1
7 0110 2+3
8 0111 3+2
9 1000 2+4
10 1001 4+2
11 1010 3+4
12 1011 4+3
Diagram 22
Rank Code Pair Number Table for 2 Rank Codes (RCPNT2)
Rank Code Pair Number Rank Code Pair
Decimal Binary
1 0 1+2
2 1 2+1
Diagram 23
Rank Code Pair Numbering for PU8
Input Code
000 001 010 011 100 101 110 111
Corresponding Ranking of the above Code
1st 2nd 3rd 4th 5th 6th 7th 8th
Output Code
1st RCP 2nd RCP 3rd RCP 4th RCP
(written directly)
000 001 10000 0000 0
[111] Excluding the PU8 header, i.e. the CUV, the first two CU values are written directly as they are into the Output Code. Because for a PU8 there are, for the first two positions, 56 permutations of two CU values out of the 8 possible CU values, it requires 6 bits to represent; so there is no need to convert the first Rank Code Pair (RCP) into a binary number, because the binary number so converted still requires 6 bits, thus giving no saving at all. But the case is different for the 2nd RCP, the 3rd and the 4th in a PU8. For the 2nd RCP, as there are two out of the six remaining possible CU values to choose from, two CU values having been fixed and no longer able to appear in the chain of the remaining 6 CU values to come, the possible permutations are 30 only. So this requires only a 5-bit binary number to represent the 2nd RCP; for the 3rd RCP, two out of four and a 4-bit binary number; and for the 4th RCP, two out of two and a 1-bit binary number. To convert the 2nd RCP, one uses the RCPNT6 in Diagram 20. To convert the 3rd RCP, one refers to the RCPNT4 in Diagram 21; the remaining CU values are the 5th, 6th, 7th and 8th in the ranking. RCPNT4 is numbered using the 1st, 2nd, 3rd and 4th in the ranking. So to obtain the RCPN for the 5th and 6th CU values, one has to substitute the highest rank code, the 5th one, in the Input Code as the 1st one in the RCPNT4. Re-ranking the remaining CU values makes the 5th become the 1st, the 6th the 2nd, the 7th the 3rd and the 8th the 4th. The 4th RCP is converted likewise in the same manner, by re-ranking the remaining CU values after taking out the 3rd RCP CU values and by using the RCPNT2 in Diagram 22.
[112] So it could be seen that a 24-bit PU8 could now be represented by using 4 RCPs, using 6+5+4+1 = 16 bits only, a saving of 8 bits. Adding the 3-bit requirement of the PU Header, the CUV, 19 bits are required, still a saving of 5 bits. So whether this RCPN is to be used depends on how many bits are required for each RCPN of a PU and also on the number of bits required for the header information of the PU being processed.
Diagram 18 shows the number of permutations (with repetition except PU8) for all the 8 PU groups discussed in the present invention. For instance, it shows PU7 has 1,128,960 permutations. But not all these permutations of PU7 could be compressed using the technique of RCPN, because only one scenario of these PU7 permutations is suitable for using RCPN for this purpose, i.e. the scenario where all of the 7 different unique CU values come one after another consecutively without intervening repeating CU values. This is so for the other PU groups, such as PU6, PU5 and PU4. PU1 has no RCP, and PU2 and PU3 have only 1 RCP, which even when converted gives no bit saving.
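The Rank Code Pair Number Tables of Diagrams 20 to 22 can be generated programmatically. The sketch below is illustrative only: the enumeration order is inferred from the tables shown above (pairs listed by increasing sum of the two rank codes, then by the smaller code, with the (smaller, larger) ordering before its reverse), and the binary RCPN is taken as the zero-based table position written with the minimum bit width for the table.

```python
# An illustrative generator for RCPNT6, RCPNT4 and RCPNT2 and the pair-to-bits conversion.

from itertools import permutations
from math import ceil, log2

def rcpn_table(n):
    """All ordered pairs of distinct rank codes 1..n, in RCPNT order (n*(n-1) entries)."""
    return sorted(permutations(range(1, n + 1), 2),
                  key=lambda p: (p[0] + p[1], min(p), p[0] > p[1]))

def pair_to_bits(pair, n):
    """Binary RCPN of an ordered Rank Code Pair out of n remaining rank codes."""
    table = rcpn_table(n)
    width = ceil(log2(len(table)))
    return format(table.index(pair), f"0{width}b")   # RCPN 1 encodes as all zeros

print(pair_to_bits((3, 4), 6))   # '10000' -> RCPN 17, the 2nd RCP of Diagram 23
print(pair_to_bits((1, 2), 4))   # '0000'  -> RCPN 1, the 3rd RCP after re-ranking
print(pair_to_bits((1, 2), 2))   # '0'     -> the 1-bit last RCP
```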
Applying the technique of RCPN to the above case with 8 different unique CU values one after another, 24 bits are reduced to 16 bits. The following Diagram 24 lists out the bit saving figures for cases having 7, 6, 5 and 4 different CU values in a chain
consecutively using this RCPN technique:
Diagram 24
Bit Usage for Using RCPN for Different Consecutive CU Values in a Chain
Number of CU      Bits of        1st RCP   2nd RCP   3rd RCP/    Last RCP/   Total
Values in Chain   Input Code                         Last Code   Last Code   Bits
C8                24 bits        6 bits    5 bits    4 bits      1 bit       16 bits
C7                21 bits        6 bits    5 bits    4 bits      1 bit       16 bits
C6                18 bits        6 bits    5 bits    4 bits                  15 bits
C5                15 bits        6 bits    5 bits    2 bits                  13 bits
C4                12 bits        6 bits    5 bits                            11 bits
PU8 or C8 has been described in Paragraph [110] to [112]. For the case of 7 different unique CU values in a Chain (C7), the bit saving is 5 bits, C6 3 bits, C5 2 bits and C4 1 bit. The case of C5 has some special points to note. It requires 2 bits for its last code. Because after 4 CU values appear, only 4 remaining CU values are yet to come, so 1 out of the 4 remaining values could be represented by using 2 bits. Furthermore in the encoding process, one could also try to take in one more value, adding it to the chain of C4 and C5 for saving more bits for compression. This is so because for the chain of C4 and C5, the CU value which comes in to break the chain is either one of the preceding 4 or 5 CU values, that means for the case of 4, 2 bits are enough to represent this repeating 3-bit CU value, resulting in a saving of 1 bit; this is a sure win option. Or one could adopt a gambling option, using Bit 0 to represent the upcoming CU value if it is the 4th CU value and Bit 1 to represent if it is not the 4th CU value. If using Bit 1, the breaking value of the chain should therefore be 1 out of the first 3 preceding CU values. And one still could use another two bits to represent the 1 outcome out of the 3 possible preceding CU values, resulting in using 3 bits for this 5th CU value and no gain nor loss in bit. In the case of 5, to take in another CU value to make it into 6 values for compression, one could only gamble on using Bit 0 or Bit 1 strategy and could not have a sure win option of saving 1 bit because the number of preceding CU values is 5 and 3 bits are required to represent 1 out of 5 possible outcomes. Therefore if Bit 1 turns out to be case, this gambling strategy will entail losing 1 bit. If Bit 0 comes up, 2 bits are saved. The gambling strategy could also be used for the cases of C6 and C7 to take in one more value for compression. But the chance of losing out increases from 4/5 to 5/6 and 6/7 respectively. So the gambling strategy is not recommended for C5, C6 and C7 being discussed here.
Diagram 25 below tables the number of bits saved using this RCPN technique:
Diagram 25
Bit Saving for Using RCPN for Different Consecutive CU Values in a Chain
Number of CU Values in Chain   Number of Bits of Input Code   Number of Bits of Output Code   Number of Bits Saved
8 24 bits 16 bits 8 bits
7 21 bits 16 bits 5 bits
6 18 bits 15 bits 3 bits
5 15 bits 13 bits 2 bits
4 12 bits 11 bits 1 bit
To change from the previous processing, using a different set of rules to process these chains of consecutive different CU values, one has to use some signature for making the change and for indicating which chain, i.e. 8 different values or 7 or 6 or 5 or 4, one is going to process. If the number of bits representing these signatures is more than the number of bits saved by using RCPN as indicated in the above Diagram, one had better forget using RCPN. So the possible candidates that may be appropriate for implementing the special processing using RCPN are Chain 8 and Chain 7, and barely Chain 6. So one has to collect again the statistics of the data distribution of the digital information to be processed or being processed and see whether these special chains occur very frequently or very rarely in the actual data, so that the use of RCPN could make a bit saving. If any of these chains come up very frequently, then one could design very short signature(s) (or specially reserved TLBAs) for initiating the special processing of RCPN for them. If they occur very rarely, then the use of RCPN may not be appropriate at all.
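The trade-off just described can be stated in one line; the small check below uses the Diagram 25 savings and treats the signature length as a free design parameter, which is an assumption for illustration rather than a value fixed by the description.

```python
# An illustrative check: RCPN pays off for a chain only if its bit saving
# exceeds the extra signature bits needed to announce the special processing.

SAVINGS = {8: 8, 7: 5, 6: 3, 5: 2, 4: 1}     # bits saved per chain length (Diagram 25)

def rcpn_worthwhile(chain_length, signature_bits):
    return SAVINGS[chain_length] > signature_bits

for length in (8, 7, 6, 5, 4):
    print(length, rcpn_worthwhile(length, signature_bits=3))
# With a 3-bit signature, only Chain 8 and Chain 7 pay off.
```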
In the aforesaid discussion, the processing of PU1, PU2 and PU8 as well as of C8, C7, C6, C5 and C4 has been revealed with the use of the following techniques:
(1) the use of Multiple Location Tables having Table Location Binary Addresses (TLBAs) in the form of TBN:LBA; (2) the use of a Location Table Structure, including the number of Location Tables to be used; the number of bits assigned to the TBN, determining the maximum number of Location Tables that could be used, for instance 1 bit two LTs, 2 bits four LTs, 3 bits 8 LTs, and so on and so forth; the number of bits assigned to the LBA of each of the Location Tables, determining the Table Size of the Location Table, i.e. the number of LBAs of each of the Location Tables, the LBA being used for the assignment or accommodation of CU values and therefore as the Location Reference for identifying the CU values assigned, for instance 0 bits 1 LBA (using the respective TBN: as the TLBA), 1 bit 2 LBAs, 2 bits 4 LBAs, 3 bits 8 LBAs, and so on and so forth; and the use of the technique of Address Branching, giving rise to the assignment of Address Branched Bits (ABBs) to the LBA of the CU value to which another value is assigned;
(3) the use of Dynamic Adaptive Location Table Restructuring and the use of more than one and different Location Table Structures in the encoding and decoding process where appropriate;
(4) the use of TLBAs as special signatures for special processing required;
(5) the use of Absolute Addressing and Absolute Addresses;
(6) the use of Relative Addressing and Relative Addresses;
(7) the use of Table Switching, including Absolute Table Switching and Relative Table Switching;
(8) the use of CU and CU values; the CU being the basic unit for the process of encoding and decoding; the size of the CU determining the number of CU values that a CU could represent, 1 bit 2 values, 2 bits 4 values, 3 bits 8 values, and so on and so forth; CU values being the values appearing in a CU;
(9) the use of PU; PU consisting of a certain number of CUs, being the basic unit for processing CUs, a higher level unit of processing;
(10) the use of CUV Signature for classifying and grouping PUs for the purpose of encoding and decoding;
(11) the use of the technique of Data Distribution Alteration; and
(12) the use of Rank Code Pair Numbering for assigning a RCP Number to a RCP; the use of the conversion and re-conversion (i.e. reversion) of a Rank Code Pair to and from the corresponding RCP Number using a RCPNT.
Advantageous Effects
[119] It has long been held in the data compression field that pure random binary numbers could not be shown to be definitely subject to compression. Here below is an investigation of the technical problem of whether random binary numbers could be compressed or not, using the schema, method and techniques revealed in the present invention. The answer provided here is applicable to all binary numbers, not limited to any particular bit size; however, the bit size of the binary number should be manageable by the computer at the time of processing.
[120] The answer lies with using the Multiple Location Table Structure together with the technique of Relative Table Switching and Relative Addressing combined with Absolute Addressing as revealed in the present invention. Diagram 26 continues with the example of using the 8 CU values of a 3-bit CU:
Diagram 26
8 CU Values of a 3-bit CU
000 001 010 011 100 101 110 111
[121] One could design a LTS as shown below in Diagram 27, a LTS 4.2.2 with 3 LTs, where LT1 uses a 2-bit LBA, and LT2 and LT3 each use a 1-bit LBA.
Diagram 27
LTS 4.2.2 for 8 CU Values of a 3-bit CU
TBN    TBN bits    LBA bits    Assignment of CU Values
1:     1           2           Group 1 (4 Styles: 100, 101, 110, 111)
01:    2           1           Group 2 (2 Styles: 010, 011)
00:    2           1           Group 3 (2 Styles: 000, 001)
Diagram 28
Bit Usage with the Use of RTS and RA combined with AA for a LTS 4.2.2
Group TBN    Start   Stay   To 1:   01:   00:
Group 1  1:
Bit Pattern  1xx  0xx  11x  10x
Bit Number   1+2  1+2  1+1+1  1+1+1
Group 2a  01:
Bit Pattern  01x  01x  1xx  00x
Bit Number   2+1  2+1  1+2  1+1+1
Group 2b  00:
Bit Pattern  00x  00x  1xx  01x
Bit Number   2+1  2+1  1+2  1+1+1
Diagram 29
LTS 4.4 for 8 CU Values of a 3-bit CU
LT TBN TBN bits LBA bits Assignment of CU Values
LT1 0: 1 2 Group 0 (000,001,010,011)
LT2 1: 1 2 Group 1 (100,101,110,111)
Diagram 30
Bit Usage with the Use of AA for a LTS 4.4
Group TBN Start Stay To 0: 1:
Group 0 0:
Bit Pattern 0xx 0xx 1xx
Bit Usage 1+2 1+2 1+2
Group 1 1:
Bit Pattern 11x 1xx 0xx
Bit Usage 2+1 1+2 1+2
Diagram 31
Bit Usage with the Use of RTS and RA combined with AA for a LTS 4.4
Group TBN Start Stay To 0: 1:
Group 0 0:
Bit Pattern 0xx 0xx 1xx
Bit Usage 1+2 1+2 1+2
Group 1 1:
Bit Pattern 1xx 0xx 1xx
Bit Usage 2+1 1+2 1+2
[122] In Diagram 28, there is a new feature of RTS and RA combined with AA. Staying within the same LT, or within an LT of the same group, still uses Bit 0, and switching LT still uses Bit 1. That means, when switching (or staying) to another LT within the same group, i.e. switching from Group 2a to Group 2b or vice versa, or switching (or staying) within the group beginning with the same first TBN bit, using the Absolute TBN makes no mistake in distinguishing which LT to refer to, thus saving the use of 1 Switching Bit. The result in Diagram 28 is no different from using AA (Absolute Addressing) as shown in Diagrams 29 and 30 above or from using RTS and RA as in Diagram 31.
[123] Diagrams 28, 30 and 31 all indicate that the bit usage is the same as that of the original input code, with neither gain nor loss in bits.
[124] Diagram 32 shows a new LTS 2.2.2.2, with its Bit Usage in Diagram 33:
Diagram 32
A LTS Design, LTS 2.2.2.2, for 8 CU Values of a 3-bit CU
LT TBN TBN bits LBA bits Value Assignment
LT1 11: 2 1 Group 0a (110,111)
LT2 10: 2 1 Group 0b (100,101)
LT3 01: 2 1 Group 1a (010,011)
LT4 00: 2 1 Group 1b (000,001)
Diagram 33
Bit Usage with the Use of RTS and RA combined with AA for a LTS 2.2.2.2
Group TBN Start Stay To 11: 10: 01: 00:
Group 0a 11:
Bit Pattern 11x 0x 11x 101x 100x
Bit Usage 2+1 1+1 2+1 1+2+1 1+2+1
Group 0b 10:
Bit Pattern 10x 0x 11x 101x 100x
Bit Usage 2+1 1+1 2+1 1+2+1 1+2+1
Group 1a 01:
Bit Pattern 01x 0x 111x 110x 10x
Bit Usage 2+1 1+1 1+2+1 1+2+1 1+1+1
Group 1b 00:
Bit Pattern 00x 0x 111x 110x 10x
Bit Usage 2+1 1+1 1+2+1 1+2+1 1+1+1
[125] Diagram 34 is another design of LTS 2.2.2.2, having a sub-TBN structure:
Diagram 34
Another LTS 2.2.2.2 for 8 CU Values of a 3-bit CU
LT TBN TBN bit Sub-TBN Sub-TBN bit LBA bit Value Assignment
LT1 1: 1
LT1a 0: 1 1 Group 0a (110,111)
LT1b 1: 1 1 Group 0b (100,101)
LT2 0: 1
LT2a 0: 1 1 Group 1a (010,011)
LT2b 1: 1 1 Group 1b (000,001)
Diagram 35
Bit Usage with the Use of RTS and RA for LTS 2.2.2.2 in Diagram 34
Group TBN & Sub-TBN Start Stay To 1:1: 1:0: 0:1: 0:0:
Group 0a 1:1:
Bit Pattern 11x 0x 11x 101x 100x
Bit Usage 2+1 1+1 2+1 1+2+1 1+2+1
Group 0b 1:0:
Bit Pattern 10x 0x 11x 101x 100x
Bit Usage 2+1 1+1 2+1 1+2+1 1+2+1
Group 1a 0:1:
Bit Pattern 01x 0x 111x 110x 10x
Bit Usage 2+1 1+1 1+2+1 1+2+1 1+1+1
Group 1b 0:0:
Bit Pattern 00x 0x 111x 110x 10x
Bit Usage 2+1 1+1 1+2+1 1+2+1 1+1+1
[126] It could be seen that Diagram 34 is just another form of LTS expression, with a sub-LT structure. Its effect is just like that of Diagram 32, resulting in data expansion because there is a higher chance of switching to another LT than of staying in the same LT.
[127] In Diagram 35, just like in Diagram 33, the TBN portion including the Sub-TBN portion counts 2 bits and the LBA portion remains 1 bit. So the use of a sub-LT under a main LT is another variant of the LTS being revealed in the present invention. In Diagram 34, there are two main LTs and each of them is further divided into two sub-LTs. The number of LTs is then 4, the same as the LTS in Diagram 32. Diagram 34 therefore shows that there could be a hierarchy of Location Tables, i.e. main LTs with sub-LTs; and further levels or layers of sub-LTs could be added where appropriate for the purpose of classification of data. Taking into account the use of sub-LTs, the TBN portion of the TLBA should include the sub-LT TBN, and as such the TBN of Group 0a in Diagram 35 becomes 1:1:, Group 0b 1:0:, Group 1a 0:1: and Group 1b 0:0:. If additional levels or layers of sub-LTs are used, the TBN portion should include them likewise. When writing the TBN portion into the encoded code, C as before omits the : notation and writes directly 11 for Group 0a, 10 for Group 0b, 01 for Group 1a and 00 for Group 1b. This is to be followed by the LBA portion as discussed before. D reads and interprets the TLBA from the encoded code for decoding in the same manner as C writes and encodes.
[128] So if the same number of binary numbers are grouped according to the LTS found in Diagram 29, the use of RTS and RA works just like AA, as indicated in Diagrams 30 and 31 and also in Diagram 28. Whereas if the binary numbers are grouped according to Diagram 32 or 34, the use of RTS and RA combined with AA only results in data expansion. So the use of RTS and RA as well as AA could at best produce an encoded code with no loss or gain in bits if the binary bits are truly random and even. The advantage of using LTS with RTS and RA (combined with AA) over using AA only is that it provides a higher chance of making compression when the data distribution is not truly random and even. This is because using LTS with RTS and RA (combined with AA) provides an intrinsic property or characteristic, through the use of the Switching and Staying Bits, that adds compression saving on top of the uneven nature of the data distribution of the digital information to be processed. Through the use of the Staying Bit, it reduces the bit usage wherever repetitive CU values come in a chain. And through grouping clustered CU values together in the same LT or the same LT group, it further reduces the use of the Switching Bit, thus avoiding extra bit usage. The use of RTS and RA also provides the framework where the Switching Bit and Staying Bit could be used with a Relative TBN in addition to the use of the Absolute TBN alone. The use of the Switching Bit of RA could result in the same bit usage as that given by using AA. And the use of the Staying Bit of RA could provide bit saving that could not be provided by using AA only.
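As a purely illustrative sketch (in Python; the helper names tbn_of, lba_of and encode_rts_ra are invented, and this is only one possible reading of the technique), the Staying Bit / Switching Bit mechanics of RTS and RA for the two-table LTS 4.4 of Diagrams 29 to 31 could be expressed as follows; the first CU value is written with its Absolute TLBA, and every subsequent value is written either with Staying Bit 0 plus its LBA, or with Switching Bit 1 plus its LBA (with only two LTs the Switching Bit alone identifies the target LT):

    # Illustrative RTS/RA encoder for the two-table LTS 4.4 of Diagram 29.
    # LT with TBN "0" holds 000,001,010,011; LT with TBN "1" holds 100,101,110,111.
    TABLES = {"0": ["000", "001", "010", "011"],
              "1": ["100", "101", "110", "111"]}

    def tbn_of(cu_value):
        # The LT a 3-bit CU value is assigned to (its first bit, in this particular LTS).
        return "0" if cu_value in TABLES["0"] else "1"

    def lba_of(cu_value):
        # 2-bit LBA of the CU value inside its LT.
        t = tbn_of(cu_value)
        return format(TABLES[t].index(cu_value), "02b")

    def encode_rts_ra(cu_values):
        out, current = [], None
        for v in cu_values:
            t = tbn_of(v)
            if current is None:
                out.append(t + lba_of(v))      # first value: Absolute TLBA (TBN + LBA)
            elif t == current:
                out.append("0" + lba_of(v))    # Staying Bit 0 + LBA
            else:
                out.append("1" + lba_of(v))    # Switching Bit 1 + LBA (two LTs only)
            current = t
        return "".join(out)

    print(encode_rts_ra(["000", "001", "010", "011"]))   # 12 output bits for 12 input bits

Encoding four 3-bit values that all stay in the same LT gives 12 bits for 12 input bits, in line with the no-gain, no-loss observation of Paragraph [123].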
[129] So there are two kinds of compression savings: one kind is due to uneven data distribution (which could be exploited by simply using AA) and another is due to some intrinsic property or characteristic of the compression technique used, such as using RTS and RA, or RTS and RA combined with AA, together with the design of the LTS as revealed in the present invention. With uneven data distribution, whether using RTS and RA or AA or their combination, together with using 2 or more LTs, there could be compression saving if the frequency counts of the data are known and the LTS is designed and matched well with the data distribution. And using RTS and RA combined with AA provides an additional edge over AA alone in making compression in the long run when a suitably designed LTS is used and the data distribution is not truly even and random but only seems to be.
[130] The use of RTS and RA, with or without combination with AA, under the Multiple Location Table Structure has its intrinsic value in compressing digital information; it does not rely totally on the presence of an uneven data distribution but also adds strength to it for compressing data. That is, digital information with uneven data distribution could be compressed by using AA alone; however, by adding the use of RTS and RA, more compression bits could be squeezed out.
[131] The Multiple Location Table Structure also provides a very flexible structure that could be used to model and match the data distribution of the digital information so that the best compression rate could be approached and approximated. It also allows for the selection of different Compression Units (CU) and Processing Units (PU) for encoding and decoding purposes, providing chances of compressing digital information on different scales, such as a 3-bit scale, an 8-bit (one byte) scale or a 16-bit scale, etc.
Best Mode
[132] It is quite difficult to assess what is the best mode for the present invention as many techniques have been introduced and used in combination. The proof that the use of RTS and RA combined with AA (starting the use of RTS and RA involves the use of an Absolute TLBA, and therefore AA, for encoding the first CU value in the chain anyway) has its intrinsic value in compressing data, in addition to taking advantage of the opportunity for making compression arising from the uneven nature of the data distribution of the digital information, seems to suggest that the best mode for the present invention is the use of RTS and RA (combined with AA) under the Multiple Location Table Structure. One could use different Compression Units and Processing Units as appropriate.
Mode for Invention
[133] The following is a comparison of how the use of a 3-bit CU and an 8-bit CU affects the construction of the data distribution model, which may provide more information about what size of CU one may decide to choose. It seems that a bigger CU size provides more flexibility and more elbow room in the design of the LTS and makes the data distribution appear more uneven and therefore more amenable to compression.
[134] The first four paragraphs, from Paragraph [1] to [4], have been extracted and turned into a simple Notepad-type text file. A simple statistical analysis of it gives the figures in Diagram 36 and Diagram 37 below:
Diagram 36
Some Statistics for The First 4 Paragraphs using CU3 and PU24
Total Number of PU processed = 1397
Number of CU values found = 8
PU with 2 different CU values = 145
PU with 3 different CU values = 423
PU with 4 different CU values = 450
PU with 5 different CU values = 286
PU with 6 different CU values = 80
PU with 7 different CU values = 13
Number of Counts of the 8 CU values (in descending order)
6129
1587
929
671
580
526
388
366
Diagram 37
Some Statistics for The First 4 Paragraphs using CU8 and PU2048
Total Number of PU processed = 17
Number of CU values found = 54
PU with 20 different CU values = 1
PU with 21 different CU values = 1
PU with 23 different CU values = 1
PU with 24 different CU values = 3
PU with 26 different CU values = 3
PU with 27 different CU values = 2
PU with 28 different CU values = 3
PU with 30 different CU values = 1
PU with 34 different CU values = 1
PU with 42 different CU values = 1
CU values: Number of Counts
0: 2256
32: 310
101: 206
111: 152
105: 138
116: 131
115: 116
114: 114
97: 113
110: 110
100: 88
108: 83
99: 79
104: 48
109: 46
117: 46
112: 40
102: 29
103: 29
44: 22
118: 22
121: 19
98: 17
119: 14
68: 13
40: 12
41: 12
46: 9
45: 8
120: 8
47: 5
77: 5
82: 5
107: 5
9: 4
91: 4
93: 4
10: 3
13: 3
65: 3
73: 3
86: 3
72: 2
79: 2
83: 2
49: 1
50: 1
51: 1
52: 1
66: 1
67: 1
84: 1
254: 1
255: 1
[135] Diagram 36 uses 3 bits as the CU and 24 bits as the PU, and Diagram 37 uses 8 bits as the CU and 2048 bits as the PU. That means the 3-bit CU could have 8 variations and could be any one of the 8 CU values, and a 24-bit PU could host 8 counts of 3-bit CU values; and an 8-bit CU could have 256 variations and could be any one of the 256 CU values, and a 2048-bit PU could host 256 counts of 8-bit CU values. In processing and analyzing the first four paragraphs of the present invention, it is found that they could be turned into 1397 units of 24-bit PU and 17 units of 2048-bit PU. The figures suggest some discrepancy, and this is due to the fact that the last unit of the first four paragraphs in each case is not a whole unit but is padded with Bit Zeros to form a whole unit for ease of representation. Leaving aside this discrepancy, one could still find some significant points to note.
[136] For the case of the 24-bit PU, after processing the first 5 PUs out of the total of 1397 PUs all 8 CU values have appeared, whereas for the 2048-bit PU case, after all 17 PUs are processed only 54 CU values out of the 256 CU values appear. At first glance, if only 1 LT is used in both cases, for the 24-bit PU case a LT with a 3-bit LBA, using 3 bits to represent 3 bits, could not make any compression after processing the first 5 PUs. Whereas for the 2048-bit PU case, throughout the processing of all the digital information, as only 54 CU values out of the 256 possible CU values appear, using one LT with a 6-bit LBA is enough. Just by this fact, 2 bits could be saved for each CU value encoded.
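Statistics of the kind shown in Diagrams 36 and 37 could be gathered with a short routine; the following Python sketch (illustrative only; the function name analyze is invented) counts the frequency of each CU value and the number of different CU values per PU for a chosen CU size and PU size in bits:

    # Illustrative data-distribution analysis for a chosen CU size and PU size (both in bits).
    from collections import Counter

    def analyze(bits, cu_bits, pu_bits):
        # bits: a string of "0"/"1"; trailing bits that do not fill a whole PU are ignored here,
        # which corresponds to the padding/Remaining Bits discussion in the text.
        freq = Counter()                     # frequency count of each CU value
        distinct_per_pu = []                 # number of different CU values in each PU
        for p in range(0, len(bits) - pu_bits + 1, pu_bits):
            pu = bits[p:p + pu_bits]
            values = [pu[i:i + cu_bits] for i in range(0, pu_bits, cu_bits)]
            freq.update(values)
            distinct_per_pu.append(len(set(values)))
        return freq, distinct_per_pu

    # Statistics like Diagram 36 would use analyze(data, 3, 24);
    # statistics like Diagram 37 would use analyze(data, 8, 2048).
    freq, distinct = analyze("010111000110" * 100, 3, 24)
    print(freq.most_common(3), distinct[:5])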
[137] For the case of the 2048-bit PU, none of the 17 units has all the 54 CU values present. The PU having the highest number of different CU values found counts to 42 only and the lowest number is 20. That means that if more than 1 LT is used, then some of the LTs could use LBAs with a bit size of less than 6. For instance, if one uses 2 LTs, then the 54 CU values could be divided into 2 halves, each having 27 CU values, which requires only 5 bits to hold. However, because 2 LTs are used, 1 bit must be used for the TBNs of each of the 2 LTs. Therefore 1-bit TBN plus 5-bit LBA does not make any saving at all as compared to using 1 LT with a 6-bit LBA. Using 3 or 4 LTs, one has to use 2 bits for the TBN, and so the LBA portion of the TLBA should be made shorter in bit size. For instance, if one uses 3 LTs, then there could be one LT with a 5-bit LBA hosting 32 CU values, and the other two LTs share the remaining 12 CU values. In this case, the 12 CU values are better divided into 1 LT with 8 values and another LT with 4 values than into 2 LTs each with 6 values. As a 3-bit LBA could host from 5 to 8 CU values, if dividing into 2 LTs with 6 values each, then both LTs would need 3-bit LBAs, whereas the case of one LT with 8 values and another with 4 values requires the LT with 8 values to have a 3-bit LBA and the other LT with 4 values to have a 2-bit LBA. Diagram 38 below shows the bit usage of these two LTS designs:
Diagram 38
Bit Usage of LTS Designs Assuming CU Values with Equal Frequency Count
TBN Bits LBA Bits Bit Usage Total Bit Usage
LTS 6.6.32
LT1 6 values 2 3 6x5 = 30
LT2 6 values 2 3 6x5 = 30
LT3 32 values 2 5 32x7 = 224
Total 284
LTS 4.8.32
LT1 4 values 2 2 4x4 = 16
LT2 8 values 2 3 8x5 = 40
LT3 32 values 2 5 32x7 = 224
Total 280
LTS 54
LT1 54 values 0 6 54x6 = 324
Total 324
[138] It could be seen from Diagram 38 that, if using 3 LTs, LTS 4.8.32 seems a better choice than LTS 6.6.32. For both LTSs, the TLBAs of their LT3 all use 7 bits, still one bit less than the original code of the 8-bit CU value. As compared with LTS 54, LT3 of the other two LTSs uses 7 bits, one bit more than the 6 bits of LT1 of LTS 54; however, LT1 and LT2 of these other two LTSs have a shorter bit size than 6 bits. The Total Bit Usage in Diagram 38 shows that using 3 LTs is better than using 1 and that LTS 4.8.32 is better than LTS 6.6.32. This still assumes that the frequency count of each of the 54 CU values is the same. If not the same, and assigning values according to the principle of more frequently occurring values being given fewer bits, then LTS 4.8.32 should achieve much more bit saving than the other LTSs. This also shows the merits of using an LTS with multiple LTs in making data compression.
[139] So collecting statistics about the data distribution of the digital information is important. If the frequency counts of each of the occurring CU values are found, and if one is still not sure about which LTS to use, then one could actually calculate the Grand Total Bit Usage by weighting the bit usage of each CU value with its frequency count, with CU values ranked higher in frequency count being assigned to the LTs using fewer bits in successive order. So for making the best compression, one could then compare the Grand Total Bit Usage amongst different LTSs to find out which LTS is the best to use. However, if using RTS and RA, this is only an approximation because it does not take into account the bit saving due to the use of the Staying Bit 0. Using the Staying Bit 0 means C does not have to look into another LT but stays in the same LT for encoding the next value. The use of the Staying Bit 0 uses up one bit; the number of bits thus saved is the bit size of the TBN minus 1. So if the TBN bit size is 3 bits, then 2 bits are saved. With RTS and RA, switching table uses the same number of bits as the bit usage of Absolute Addressing with a suitable LTS.
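The Total Bit Usage figures of Diagram 38 and the Grand Total Bit Usage just described can be computed mechanically. The following Python sketch is illustrative only (function names invented); it assumes, as in Diagram 38, that every LT carries a 2-bit TBN whenever more than one LT is used and that the LBA width is the smallest number of bits able to hold the values assigned to the LT:

    # Illustrative evaluation of LTS designs. A design is a list of LT capacities
    # (number of CU values hosted per LT).
    from math import ceil, log2

    def bits_per_value(design):
        out = []
        for capacity in design:
            tbn_bits = 0 if len(design) == 1 else 2
            lba_bits = 0 if capacity == 1 else ceil(log2(capacity))
            out.extend([tbn_bits + lba_bits] * capacity)
        return out                        # bit usage of each hosted value

    def total_bit_usage(design):
        # Diagram 38 figure: every CU value assumed to occur with equal frequency.
        return sum(bits_per_value(design))

    def grand_total_bit_usage(design, counts_desc):
        # Frequency-weighted figure: higher-count values get the cheaper slots
        # (counts_desc must be sorted in descending order).
        per_value = sorted(bits_per_value(design))
        return sum(b * c for b, c in zip(per_value, counts_desc))

    print(total_bit_usage([6, 6, 32]))    # 284, as in Diagram 38
    print(total_bit_usage([4, 8, 32]))    # 280
    print(total_bit_usage([54]))          # 324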
[140] To further optimize the bit usage, one could, as said before, conduct an analysis of the pattern of clustering for the CU values found. Putting clustered CU values into the same LT minimizes the chance of having to use the Switching Bit 1 and maximizes the chance of using the Staying Bit 0. There could be many techniques for measuring the clustering pattern of the CU values found. For instance, for all values one could get statistics of the average distance between one CU value and another in the digital information stream; or, more precisely, the chance of one CU value preceding another CU value and the chance of one CU value coming after another CU value. Such statistics could be done for all values. These figures, together with the ranking priority list, could act as indicators for grouping CU values into different LTs.
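One simple way of obtaining such clustering indicators is to count, for every ordered pair of CU values, how often one immediately follows the other; a Python sketch (illustrative only, names invented) is:

    # Illustrative clustering statistics: for each ordered pair (a, b) of CU values,
    # count how often b immediately follows a in the stream of CU values.
    from collections import Counter

    def transition_counts(cu_values):
        return Counter(zip(cu_values, cu_values[1:]))

    def follow_chance(pairs, a, b):
        # Chance that the value following an occurrence of a is b.
        total_after_a = sum(c for (x, _), c in pairs.items() if x == a)
        return pairs[(a, b)] / total_after_a if total_after_a else 0.0

    stream = ["000", "000", "001", "000", "111", "110", "111", "000"]
    pairs = transition_counts(stream)
    print(follow_chance(pairs, "000", "001"))   # 1/3 for this toy stream

CU values with high mutual following chances are candidates for being grouped into the same LT, so that the Staying Bit can be used more often.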
[141] As Paragraph [137] observes, for the case of the 2048-bit PU, none of the 17 units has all the 54 CU values present. The PU having the highest number of CU values found counts to 42 only and the lowest number is 20. This means that some CU values may not appear in all the 17 units, and there could be some core CU values that appear in most of the 17 units if not all, and some peripheral CU values that only appear in one or two units. The frequency count of occurrence of each of the CU values found is also shown in Diagram 37. The CU values having only 1 count, i.e. showing up only once in one unit, are values 49, 50, 51, 52, 66, 67, 84, 254 and 255. This suggests that one could also divide CU values and group them into LT(s) with core CU values (core LTs) and LT(s) with peripheral CU values (peripheral LTs).
[142] Diagram 28 and Paragraph [122] have already described an extended form of RTS and RA (combined with using AA not just at the beginning of the encoding using RTS and RA); switching to another LT within a group helps save one bit. But it appears quite difficult for RTS and RA to maneuver further when more and more LTs are required to accommodate the CU values found.
[143] When 5 LTs are used, the TBN for each LT should have 3 bits because 2 bits could only represent 4 LTs. So using RTS and RA, switching to another LT may have to write the Absolute TBN after the Switching Bit 1, thus resulting in a loss of one bit. However, one could use another extended variant of RTS and RA. Instead of just expressing the switching to the previous LT and the next LT, one could express the switching by writing the Switching Bit 1 first and then using the concept of the previous pair and the next pair of LTs instead of the concept of the previous one and the next one LT. So for every switch of LT, C has to write the Switching Bit 1 and then another bit, Bit 0 representing the previous pair of LTs and Bit 1 the next pair of LTs, and then another bit to represent which one LT of the LT pair one is going to specify; i.e. 110 or 111 or 100 or 101 for the other 4 LTs. This still makes RTS use the same number of bits as Absolute Addressing in switching LT and retains the advantage of saving TBN bits when staying in the same LT for looking for the next value for encoding and decoding.
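A minimal sketch (Python; illustrative only, and the mapping of each LT of a pair to its selecting bit is an assumption, since the paragraph lists the four patterns without fixing that order) of the previous-pair/next-pair variant of RTS for 5 LTs:

    # Illustrative TBN bits for the "previous pair / next pair" variant of RTS with 5 LTs.
    # Staying in the current LT writes the Staying Bit 0; switching writes the Switching Bit 1,
    # then Bit 0 (previous pair) or Bit 1 (next pair), then one bit selecting the LT of that pair.
    def relative_tbn_bits(offset):
        # offset is the relative position of the target LT in the circular order of 5 LTs:
        # 0 = stay, -1/-2 = previous pair, +1/+2 = next pair.
        if offset == 0:
            return "0"
        pair_bit = "0" if offset < 0 else "1"
        which_bit = "0" if abs(offset) == 1 else "1"
        return "1" + pair_bit + which_bit

    for off in (0, -1, -2, 1, 2):
        print(off, relative_tbn_bits(off))   # 0 -> "0"; the other four LTs -> 100, 101, 110, 111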
[144] If one must use 6 or more LTs to represent all CU values, then the problem of adding one bit of TBN arises again when switching has to take place. When this happens, one could consider the use of the technique of In-Table Switching and Out-Table Switching. This is another form of extending RTS and RA.
[145] Paragraph [143] has shown how RTS and RA is used for a LTS with 5 LTs. If 6 LTs are used, using In-Table Switching and Out-Table Switching may help to retain the use of RTS and RA as if only 5 LTs are involved, whereby the steps for using 1 current LT, 2 previous LTs and 2 next LTs could still be used without incurring additional bits for the TBN to refer to the 6th additional LT. This is especially useful for a group or groups of CU values which seldom occur and which more or less occur in a cluster; i.e. if one CU value occurs, other CU values in the group are also likely to occur one after another.
[146] So if there are two peripheral groups with the characteristics of appearing like those described in Paragraph [145], one could use In-Table Switching and Out-Table Switching to switch between them. For instance, if LT5 and LT6 match the characteristics just described, then one TLBA of LT5 could be used to represent LT6 and vice versa. When LT5 becomes the current LT, and when the next CU value to come is in LT6, then C could just use the one LBA reserved for LT6 and write this LBA, such as 11, to the encoded code (by using Staying Bit 0 and LBA Bits 11), and then update the current LT register to LT6 and then look up the LT6 LBAs for the CU values assigned inside. When the next CU value comes from LT5, then if LT6 is the current LT, writing the Staying Bit 0 and its reserved LBA for LT5, such as LBA 11 again, signifies the return to LT5 and then the current LT register is updated again. Switching LT in this way, when the current LT is to be switched out and replaced by another LT directly, is called In-Table Switching. If the current LT is not one of the pair of peripheral LT5 and LT6, such as LT1, then to switch to LT6 when LT5 is one of the LTs in the previous or next LT order, C writes the Switching Bit 1 and then the relative TBN to LT5; for example, if LT5 is the next second LT, C writes Switching Bit 1 and 11 for the next second LT, and then writes 11, the reserved LBA for LT6, and then LT6 is called in for looking up the next CU value. If the CU value is assigned to LBA 10 in LT6, then C writes 10 to represent the CU value. So the series of bits to be written for this next CU value in LT6 by C is 1111110. This represents Out-Table Switching; i.e. when the current LT is not to be switched out, and when the LT to be used is not in the existing previous or next LT order, then using the LBA of the LT reserved for another LT not in the existing previous or next LT order helps to switch into the LT not in the existing order, replacing the LT in the existing order associated with it, for looking up the LBA for the next CU value during the process of encoding or decoding. Of course there could be other designs for using RTS and RA. And as said earlier, it is not the aim here to exhaust all such scenarios, but to provide some examples to demonstrate how RTS and RA could be used in the implementation of the LTS schema for making data compression. One essential characteristic of RTS and RA is the use of the Switching Bit and the Staying Bit to refer to the targeted LT for the next CU value.
[147] Furthermore, it has to be noted here that how to decide on the LTS and the number of LTs to be used according to the data distribution of the digital information to be or being processed is an art that requires modeling and matching between the LTS used and the data distribution of the digital information concerned. The present invention, by way of illustration, shows how statistics collected from the digital information could provide indicators for making the decision. The more vital statistics are collected, the more precise the modeling and matching, and the more compression achievable. For processing the digital information on the fly, such statistics about the data distribution of the digital information have to be collected dynamically as the sequence of digital bits comes up one after another. One could therefore at best pre-fix a LTS, fill up the LTS with the first values that come up, and then re-adjust the LTS according to the data statistics that have been collected on the fly at critical points that warrant re-adjustment of the LTS being used.
[148] The LTS schema (and its dynamic adjustment if data statistics are collected on the fly or the situation under concern requires it), together with the use of RTS and RA and AA, therefore provides a way of modeling and matching the data distribution of the digital information to be processed or being processed. This therefore turns, more or less, the art of modeling and matching into a subject matter that could be subject to scientific research, as demonstrated in Paragraphs [134] to [141] and as further elaborated below.
[149] Following on the discussion in Paragraphs [134] to [141], and after the discussion of the concept of peripheral LTs and the techniques of In-Table and Out-Table Switching, one could see that such techniques and concepts are designed to maximize the chance of Staying in the same LT and minimize the chance of having to Switch to another LT for looking up the next CU value for encoding. This is essential to the use of RTS and RA. If CU values are grouped together into LTs in such a way that causes unnecessary LT Switching, this means more bits have to be used during encoding and this defeats the purpose of making data compression. So the clustering statistics mentioned in Paragraph [140] could be used for grouping CU values together into LTs where appropriate. So the art of modeling and matching becomes amenable to scientific study and operation. Also, as it has been found that not all 256 8-bit CU values, but only 54, are found inside the first four paragraphs (a text file containing Paragraphs [1] to [4] of the present invention), that not all of the 54 CU values found are found throughout the 17 units of PU2048 (at the least 20 CU values in one unit, at the most 42 CU values in one unit), and that the CU values found occur with different frequency counts, the design of an LTS for such a data distribution pattern could be conducted in a scientific manner. For instance, one could group values appearing only once or twice into peripheral LTs, and the In-Table and Out-Table techniques could be used. And the core LTs could be limited to 4 and the peripheral LTs to 2; thus RTS and RA could be used for the 4 core LTs with 1 peripheral LT in a series of 1 current LT, 2 LTs in the next order and 2 LTs in the previous order; the peripheral LT being paired with another peripheral LT to be In-Table switched or Out-Table switched, each of the peripheral LTs using one special TLBA for In-Table and Out-Table Switching. The two peripheral LTs could each, using a 4-bit LBA, accommodate 15 LBAs for the 31 least frequently occurring CU values being assigned to this pair of peripheral LTs (the two least frequently occurring CU values being Address Branched CU values); the other four core LTs share the remaining 23 CU values. Given the fact that the first ranking CU value 0 has 2256 counts, the second CU value 32 has 310 counts, and the remaining 21 CU values range from the 206 counts of the third CU value 101 to the 17 counts of the 23rd CU value, one could use LT1 with a 0-bit LBA, being able to accommodate only the first ranking CU value, LT2 taking the next 2 CU values using a 1-bit LBA, LT3 taking the next 4 CU values using a 2-bit LBA, and LT4 taking the next 16 CU values using a 4-bit LBA.
[150] Another way of designing the LTS involves dynamic change in the assignment of CU values and more dynamic use of LTs. As indicated in Diagram 37 and mentioned in the preceding Paragraph [149], not all of the 54 CU values found are found throughout the 17 units of PU2048: at the least 20 CU values in one unit, at the most 42 CU values in one unit. So a LTS good enough to accommodate 42 CU values could be attempted. As there are 54 CU values altogether, some CU values have to be swapped out and others swapped in. This entails a record (to be put by C into the Header Section as Header Information to be used by D for decoding) of the dynamic incremental changes of these swapped-in and swapped-out CU values for every unit of processing. For instance, a master list of the 54 CU values ranked according to frequency counts in descending or ascending order could be recorded in the Header Section, as well as the incremental changes of CU values for each of the units after the first unit. Using such information together with the LTS designed for and assigned with the initial 42 CU values, C could manage the swapping in and swapping out of CU values after each unit is read, before doing the encoding for each unit. Or C could update the LTS and the associated change of CU value assignment, which reflects the incremental changes of the CU values being swapped in and swapped out. If many CU values have to be swapped in and out upon each unit change, then there may be no point in using this dynamic way of handling incremental changes of CU values, and the static design of a LTS hosting all the 54 CU values is preferred. On the other hand, if the incremental changes are not frequent and very few upon each unit change, then this dynamic way is preferred to the static design.
[151] The above suggestion is still more an art than a science. However, it sheds some light on how the scientific study and operation could be conducted further, such as what statistics are to be collected, how LTs are designed and how CU values are assigned. The present invention therefore reveals the schema, the method and the techniques for implementing the compression and decompression process, as well as the foundation for applying the computing power available nowadays to automating the process of modeling and matching between the schema and method used and the data distribution of the digital information to be processed.
[152] Given the enormous computing power one could have these days, each of the possible combinations of the LTS could be assessed using the statistics so collected about the data distribution of the digital information to be processed, and the best LTS, or the LTS which could give the most compression saving, could be determined. The art of compression then becomes a subject of scientific endeavor, with the modeling and matching being done by computers. This modeling and matching for the determination of the best-fit LTS could be done by A and passed to C, or by C alone altogether. D relies on the encoded code made by C to decode and recover the original code. The encoded code made by C has to be accompanied by the necessary information about the size of the CU and the size of the PU if a PU is used. Such necessary information also includes the LTS chosen to be used, including the number of LTs used and the associated sub-LTs used if any, the number of bits assigned to the TBN and LBA for each LT and sub-LT used and the respective TBNs of the LTs and sub-LTs, as well as the assignment pattern of the CU values to the LTS, including how the CU values are grouped together and assigned to which particular LTs under the LTS chosen, together with the techniques, such as RTS and RA or AA, or any special processing indicators, such as in the form of specially assigned TLBAs for such purposes, to be used with the LTS chosen. The assignment of CU values to the LTS reflects the frequency ranking order and the clustering characteristics of each of the CU values assigned, in accordance with the statistics collected in the DP stage, or with a pre-selected statistics model where no such data statistics of the digital information to be processed are available and dynamic adjustment to this pre-selected LTS has to be made later according to the result of data distribution analysis conducted on the fly where appropriate. Such necessary information stated above could be written as the Header Information in the Header Section of the whole Compressed Code File (CCF), to be used by D for decoding the encoded code in the Encoded Code Section following the Header Section.
[153] Listing out the possible combinations of the LTS to be used could be guided by the statistics collected about the data distribution of the digital information to be processed. For instance, if only 54 CU values are found, then the number of TLBA addresses should be no less than 54, assuming a static LTS is used, and the number of LTs should be no more than 54, i.e. from 1 to 54. And for each number and combination of the LTs used, one could list out the number of bits used by the TBN of each LT, including the Sub-TBN, from 0 bit, representing only 1 LT used, onwards to, say, 6 bits, which could represent 64 LTs, enough for the 54 CU values found each with a 0-bit LBA. The bit number of the LBA associated with each of the LTs of all the combinations of the LTs used could also be listed out, from 0-bit LBA to 6-bit LBA. Of course, the listing could be enormous, but the computing power today should be more than enough for this. Also, the Bit Usage for each of the combinations of the LTS and the assignment pattern associated with them could be computed; for instance, such bit usage results could be computed for LTSs with 1 LT to 54 LTs, and the best-fit LTS could be obtained for C to do the encoding. Or C could just do the actual encoding using all the combinations of the LTSs one by one and find out which one gives the best compression saving.
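The listing and assessment described here could be organized as a search; the following Python sketch (illustrative only; it simplifies by deriving the TBN width from the number of LTs and by assuming equal frequency counts) enumerates LBA bit widths from 0 to 6 bits for 1 to 4 LTs and reports the combination with the lowest bit usage for 54 CU values:

    # Illustrative enumeration of LTS candidates for 54 CU values.
    from math import ceil, log2
    from itertools import combinations_with_replacement

    def usage(widths, n_values=54):
        # widths: LBA bit widths of the LTs; TBN width is taken as ceil(log2(number of LTs)).
        tbn = 0 if len(widths) == 1 else ceil(log2(len(widths)))
        slots = sorted(tbn + w for w in widths for _ in range(2 ** w))
        return sum(slots[:n_values]) if len(slots) >= n_values else None

    best = None
    for n_lts in range(1, 5):
        for widths in combinations_with_replacement(range(7), n_lts):
            u = usage(widths)
            if u is not None and (best is None or u < best[0]):
                best = (u, widths)
    print(best)   # (total bits for 54 equally frequent values, LBA widths of the chosen LTs)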
[154] To take advantage of the benefits arising from the use of RTS and RA, one does not have to list out all the combinations of the LTS and the associated LTs to be used and calculate the bit usage for them. For instance, for an 8-bit CU, one could just evaluate the bit usage of LTSs with 3 LTs to 5 LTs, or make the actual encoding with each of these LTSs to find out the one that gives the most compression saving. If more LTs are required to be used, either In-Table and Out-Table Switching could be employed, or sub-LTs could be created to cater for the need of data classification where appropriate or necessary. Or a combination of the use of RTS and RA with AA could be attempted when more LTs have to be used. That is, when doing Table Switching, the Switching Bit 1 is still used, but it is followed by the Absolute TBN to indicate to which LT it is switching, and when doing Table Staying, the Staying Bit 0 continues to be used for saving TBN bits. In this way, one has to balance the 1-bit loss on Table Switching against the bit gain due to Table Staying. So grouping clustered CU values into the same LTs could help in reducing the extra bit used for Table Switching. What is said above also applies to CUs of other bit sizes.
[155] So the CCF contains a Header Section, the Encoded Code Section and the Remaining Bits Section (RBS, containing the bits not processed because these bits do not make up 1 CU or PU, as explained in Paragraph [156] below). The Encoded Code Section contains the encoded code, i.e. the TLBAs and other special indicators produced by using the aforesaid compression techniques with the chosen LTS assigned with CU values. Such special indicators are special TLBAs assigned for special processing, such as the Terminating Signature, Special Processing for long sequences of consecutive CU value patterns (such as different CU values in a long chain or the same CU value in a long row), or for In-Table and Out-Table Switching, or some CU values used in DDA, or some bits representing ABBs, as well as the CUV when PUs are classified and used as discussed previously.
[156] The Header Section contains the Header Information as mentioned in Paragraph [152]. It could also contain a Remaining Bits Unit (RBU) used for indicating how many bits are left unencoded at the end of the whole CCF in the Remaining Bits Section, because if the CU is 3 bits and the whole Input Code File does not contain whole units of 3 bits, then there may be 1 or 2 bits that could not be processed. So the RBU is used to indicate how many bits are left unprocessed. This RBU could be just the size of a CU if no PU is used. If a PU is used, the RBU should be the size of a PU. If the CCF goes on to be compressed for a second or a third round, another piece of Header Information and RBU is added. This additional piece of Header Information and RBU is based on the bit usage and the data distribution of the CCF of the previous round of compression. A checksum of the Header Information and RBU could be calculated and added ahead in the Header Section to distinguish a CCF from an ordinary digital input code file. The checksum is not necessary if there is another way that surely indicates whether the input file is a CCF or not, such as using the File Extension Name for such distinction. The Header Section could also be put into a separate file, pairing with the CCF without the Header Section. The pair of the Header Section File and the Encoded CCF could be taken as a whole as a CCF. The Header Section, when merged with the CCF, should have a signature indicating how long the Header Section is or where the Encoded Section begins. For recursive compression or decompression, a Compression Bit could also be used and included in the Header Section or as part of the Header Information, indicating whether the CCF has undergone just one round of compression or more than one round of compression, and this lets D know when to stop the recursive decoding. Such implementation details are left to the designer to decide which ways are best suited to their purposes.
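Purely as an illustration of one possible layout (the field names, the use of JSON and the MD5 checksum are all invented for this sketch; the text deliberately leaves such implementation details to the designer), a CCF with a Header Section, Encoded Code Section and Remaining Bits Section might be assembled as follows in Python:

    # Illustrative assembly of a Compressed Code File: Header Section (with checksum,
    # Compression Bit counting the rounds of compression, Remaining Bits Unit and the
    # LTS/CU-value-assignment description), followed by the Encoded Code Section and
    # the Remaining Bits Section. Field layout and names are invented for this sketch.
    import hashlib, json

    def build_ccf(lts_description, rounds, remaining_bits, encoded_bits):
        header = {
            "cu_bits": lts_description["cu_bits"],
            "pu_bits": lts_description.get("pu_bits"),     # None if no PU is used
            "lts": lts_description["tables"],               # TBN / LBA widths and value assignment
            "rounds": rounds,                               # lets D know when to stop recursive decoding
            "remaining_bits_unit": len(remaining_bits),
        }
        header_bytes = json.dumps(header, sort_keys=True).encode()
        checksum = hashlib.md5(header_bytes).hexdigest()    # helps distinguish a CCF from an ordinary file
        return {"checksum": checksum, "header": header,
                "encoded_code_section": encoded_bits,
                "remaining_bits_section": remaining_bits}

    ccf = build_ccf({"cu_bits": 3,
                     "tables": [{"tbn": "0", "lba_bits": 2}, {"tbn": "1", "lba_bits": 2}]},
                    rounds=1, remaining_bits="01", encoded_bits="000001010011")
    print(ccf["checksum"][:8], ccf["header"]["remaining_bits_unit"])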
[157] The original input code file could be processed as a whole without sub-division into classified PUs, or it could be processed using PUs classified as described above or in other ways the designer considers appropriate. Continuing with the previous discussion of PU1, PU2, PU7 and PU8, the other PU3, PU4, PU5 and PU6 could be processed as one PU group using a master LTS with the 8 CU values assigned according to their ranking of frequency counts and the clustering indices of each of the 8 CU values, and using RTS and RA or AA or their combination as discussed previously. Which way is the best therefore depends on the actual data distribution of the digital information to be processed or being processed. For instance, if it is found that all the CU values are arranged in the pattern of PU8, PU1 and PU6, then one could use 3 LTs; the CUV can then be re-numbered as Bit 00 (for PU8), 01 (for PU1) and 11 (for PU6) and used as the TBN for the 3 PU groups, each having 0 bit of LBA. And because PU8 and PU1 could be processed in their respective special ways, the Header Information could therefore be used for and based on the data distribution of PU6 only. The master LTS could be used just for PU6; i.e. when C detects that the coming unit is one of PU6, i.e. using the TBN signature of Bit 11, then it uses the master LTS as a sub-LTS to the main LTS 1.1.1 used on a higher level. As the PUs are of fixed size, C knows when to stop encoding for the current PU and goes on to find out to which PU group the next PU belongs and writes the TBN signature of the next PU to the encoded code. D does the decoding likewise.
[158] The following are the steps of encoding a digital input code file without sub-division into PUs by C, assuming there is a DP stage done by A or by C itself (an illustrative skeleton of these steps is given after the list):
(1) Collecting statistics about the data distribution of the digital information to be processed from Analyzer or from the digital information itself for use;
(2) Writing in the Header Section to the Compressed Code File, including the Remaining Bits Unit, and the Header Information as described in Paragraph [152] and [156], if such Header Information is not made available to Decompressor in other ways;
(3) Reading the Remaining Bits Unit to set aside the Remaining Bits not to be encoded and to be appended to the Encoded Section after encoding is finished;
(4) Encoding CU values one by one and writing the encoded code one after another following the Header Section according to the Header Information and any or any combination of the techniques chosen, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; and
(5) Upon finishing encoding all the Compression Units, appending the Remaining Bits, if any as indicated by step (3) above to the Encoded Code Section to form the Remaining Bits Section.
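The five steps above could be organized as in the following Python skeleton (illustrative only; collect_statistics, design_lts and encode_value are invented stand-ins for whatever concrete analysis, LTS design and encoding technique the designer chooses):

    # Skeleton of the encoding steps of Paragraph [158] for input not sub-divided into PUs.
    def compress(input_bits, cu_bits, collect_statistics, design_lts, encode_value):
        stats = collect_statistics(input_bits, cu_bits)                 # step (1)
        lts = design_lts(stats)
        remaining = len(input_bits) % cu_bits                           # step (3)
        remaining_bits = input_bits[len(input_bits) - remaining:] if remaining else ""
        header = {"cu_bits": cu_bits, "lts": lts,
                  "remaining_bits_unit": remaining}                     # step (2), simplified
        encoded = []
        state = {"current_lt": None}                                    # e.g. for RTS and RA
        for i in range(0, len(input_bits) - remaining, cu_bits):        # step (4)
            cu_value = input_bits[i:i + cu_bits]
            encoded.append(encode_value(cu_value, lts, state))
        return header, "".join(encoded) + remaining_bits                # step (5)

    # Toy usage with trivial stand-ins (identity "encoding"), just to show the control flow:
    header, body = compress("0101110", 3,
                            collect_statistics=lambda bits, n: None,
                            design_lts=lambda stats: None,
                            encode_value=lambda v, lts, state: v)
    print(header, body)   # the last bit "0" is carried over as the Remaining Bits Section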
[159] The following are the steps of decoding by D a Compressed Code File without sub-division into PUs, assuming there is a DP stage done by A or by C itself:
(1) Verifying the Header Section and if it is found to be a valid Header Section, reading the Header Section of the Compressed Code File, including the Remaining Bits Unit, and the Header Information as described in Paragraph [152] and [156], if such Header Information is not made available to Decompressor in other ways; and if no valid Header Section is found, stopping decoding and any further processing;
(2) Setting aside the Remaining Bits not to be decoded and to be appended to the decoded code after decoding is finished if there is any Remaining Bit;
(3) Decoding the encoded code read from the Encoded Section of the Compressed Code File one by one and writing the decoded code one after another into the Decompressed Code File according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering;
(4) Upon finishing decoding all the encoded code, appending the Remaining Bits, if any as indicated by step (2) above to the Decompressed Code File; and
(5) Repeating step (1).
[160] The following are the steps of encoding a digital input code file with sub-division into PUs by C, assuming there is a DP stage done by A or by C itself (an illustrative skeleton of the PU loop in step (4) is given after the list):
(1) Collecting statistics about the data distribution of the digital information to be processed from Analyzer or from the digital information itself for use;
(2) Writing in the Header Section to the Compressed Code File, including the Remaining Bits Unit, and the Header Information as described in Paragraph [152] and [156], if such Header Information is not made available to Decompressor in other ways;
(3) Reading the Remaining Bits Unit to set aside the Remaining Bits not to be encoded and to be appended to the Encoded Section after encoding is finished;
(4) Encoding and writing the PUs one by one after the Header Section in the following steps:
(a) Reading in the PU;
(b) Writing a CUV for it;
(c) Encoding all the CU values of the PU in one slot according to the technique(s) designed for processing the PU of the PU group as indicated by the CUV and according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table
Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering;
(d) Writing the encoded PU as a whole into the Compressed Code File; and
(e) Repeating steps (a) to (d) until there is no more whole unit of PU; and
(5) Upon finishing encoding all the PUs, appending the Remaining Bits, if any as indicated by step (3) above to the Encoded Code Section to form the Remaining Bits Section.
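The PU loop of step (4) above could look like the following Python sketch (illustrative only; classify_pu and encode_pu are invented stand-ins for the CUV classification and the per-group technique chosen by the designer):

    # Skeleton of step (4) of Paragraph [160]: read whole PUs, write a CUV signature for each,
    # then the encoded body of the PU.
    def encode_pus(input_bits, pu_bits, classify_pu, encode_pu):
        encoded = []
        whole = len(input_bits) - len(input_bits) % pu_bits
        for p in range(0, whole, pu_bits):
            pu = input_bits[p:p + pu_bits]                  # (a) read in the PU
            cuv = classify_pu(pu)                           # (b) write a CUV for it
            encoded.append(cuv + encode_pu(pu, cuv))        # (c)+(d) encode and write as a whole
        remaining_bits = input_bits[whole:]                 # left for the Remaining Bits Section
        return "".join(encoded), remaining_bits

    # Toy usage: a 2-bit CUV of "00" for PUs that are all zero and "01" otherwise,
    # with the PU body passed through unchanged.
    body, rest = encode_pus("000000" + "010111" + "01", 6,
                            classify_pu=lambda pu: "00" if set(pu) == {"0"} else "01",
                            encode_pu=lambda pu, cuv: pu)
    print(body, rest)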
[161] The following are the steps of decoding by D a Compressed Code File with sub-division into PUs, assuming there is a DP stage done by A or by C itself:
(1) Verifying the Header Section and if it is found to be a valid Header Section, reading the Header Section of the Compressed Code File, including the Remaining Bits Unit, and the Header Information as described in Paragraph [152] and [156], if such Header Information is not made available to Decompressor in other ways; and if no valid Header Section is found, stopping decoding and any further processing;
(2) Setting aside the Remaining Bits not to be decoded and to be appended to the decoded code after decoding is finished if there is any Remaining Bit;
(3) Decoding the encoded code of PUs read from the Encoded Section of the Compressed Code File one PU by one PU according to the technique(s) designed for processing the PU of the PU group as indicated by the CUV and according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; and writing the decoded code of one PU after another into the Decompressed Code File;
(4) Upon finishing decoding all the encoded codes of all the PUs, appending the
Remaining Bits, if any as indicated by step (2) above to the Decompressed Code File; and
(5) Repeating step (1).
[162] The following are the steps of encoding a digital input code file without sub-division into PUs by C, assuming there is no DP stage done by A or by C itself before the digital input code file is processed:
(1) Pre-selecting a LTS and determining the Header Section details to be used;
(2) Determining the rules for dynamic updating of the LTS and the assignment of CU values, such as the updating period; for instance, updating the LTS whenever there is a change in the ranking of the CU values or a change in the number of CU values which affects the number of LTs used to accommodate them, or updating the LTS and the assignment of CU values after reading a number of CU values, as in the case of PU2048 described in Paragraph [150] (an illustrative sketch of such a dynamic updating rule is given after these steps);
(3) Writing in the Header Section to a temporary space in random access memory or a non-volatile storage, including the Remaining Bits Unit, and the Header Information as described in Paragraph [152] and [156], as well as the LTS changes, changes of CU values assignment reflecting the incremental changes of CU values swapped-in or swapped out as part of the Header Section;
(4) Reading the Remaining Bits Unit to set aside the Remaining Bits not to be encoded and to be appended to the Encoded Section after encoding is finished;
(5) Encoding CU values one by one and writing the encoded code one after another to a temporary space in random access memory or a non-volatile storage according to the Header Information and any or any combination of the techniques chosen, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering;
(6) Upon finishing encoding all the Compression Units, appending the Remaining Bits, if any as indicated by step (4) above to the Encoded Code Section to form the Remaining Bits Section;
(7) Fixing the Header Section, including adjusting the provision of space for
accommodating the incremental updates of CU values or the incremental updates of LTS and the associated assignment of CU values; and
(8) Writing the fixed Header Section, the Encoded Code Section and the Remaining Bits Section to form a Compressed Code File.
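The dynamic updating rule referenced in step (2) above could, for example, re-rank the CU values by their running frequency counts after each unit and re-assign them to the LTs accordingly; a minimal Python sketch of such Dynamic Adaptive Location Table Restructuring (illustrative only; the 4-LT shape with capacities 1, 2, 4 and 16 is an assumption taken from the core-LT example of Paragraph [149]) is:

    # Illustrative Dynamic Adaptive Location Table Restructuring: after each processed unit the
    # running frequency counts are updated and the CU values are re-assigned to the LTs in
    # descending order of count, the LT capacities being kept fixed.
    from collections import Counter

    CAPACITIES = [1, 2, 4, 16]   # assumed 4-LT shape; further values would go to peripheral LTs

    def restructure(running_counts):
        ranked = [v for v, _ in running_counts.most_common()]
        tables, start = [], 0
        for cap in CAPACITIES:
            tables.append(ranked[start:start + cap])
            start += cap
        return tables                # tables[i] lists the CU values assigned to LT(i+1)

    counts = Counter()
    for unit in (["a", "a", "b", "c"], ["c", "c", "c", "d"]):   # toy units of CU-value symbols
        counts.update(unit)
        print(restructure(counts))   # assignment after each unit; note "c" moving up in rank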
[163] The following are the steps of decoding by D a Compressed Code File without sub-division into PUs, assuming there is no DP stage done by A or by C itself before the digital input code file is processed:
(1) Verifying the Header Section and if it is found to be a valid Header Section, reading the Header Section of the Compressed Code File, including the Remaining Bits Unit, and the Header Information as described in Paragraph [152] and [156], if such Header Information is not made available to Decompressor in other ways; and if no valid Header Section is found, stopping decoding and any further processing;
(2) Setting aside the Remaining Bits not to be decoded and to be appended to the decoded code after decoding is finished if there is any Remaining Bit;
(3) Decoding the encoded code read from the Encoded Section of the Compressed Code File one by one and writing the decoded code one after another into the Decompressed Code File according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering;
(4) Upon finishing decoding all the encoded code, appending the Remaining Bits, if any as indicated by step (2) above to the Decompressed Code File; and
(5) Repeating step (1).
[164] The following are the steps of encoding a digital input code file with sub-division into PUs by C, assuming there is no DP stage done by A or by C itself:
(1) Pre-selecting a LTS and determining the Header Section details to be used;
(2) Determining the rules for dynamic updating of the LTS and the assignment of CU values, such as the updating period; for instance, updating the LTS whenever there is a change in the ranking of the CU values or a change in the number of CU values which affects the number of LTs used to accommodate them, or updating the LTS and the assignment of CU values after reading a number of CU values, as in the case of PU2048 described in Paragraph [150];
(3) Writing in the Header Section to a temporary space in random access memory or a non-volatile storage, including the Remaining Bits Unit, and the Header Information as described in Paragraph [152] and [156], as well as the LTS changes, changes of CU values assignment reflecting the incremental changes of CU values swapped-in or swapped out as part of the Header Section;
(4) Reading the Remaining Bits Unit to set aside the Remaining Bits not to be encoded and to be appended to the Encoded Section after encoding is finished;
(5) Encoding and writing the PUs one by one to a temporary space in random access memory or a non-volatile storage in the following steps:
(a) Reading in the PU;
(b) Writing a CUV for it;
(c) Encoding all the CU values of the PU in one slot according to the technique(s) designed for processing the PU of the PU group as indicated by the CUV and according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table
Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering;
(d) Writing the encoded PU as a whole into the temporary space mentioned above; and
(e) Repeating steps (a) to (d) until there is no more whole unit of PU;
(6) Upon finishing encoding all the PUs, appending the Remaining Bits, if any as indicated by step (4) above to the Encoded Code Section to form the Remaining Bits Section;
(7) Fixing the Header Section, including adjusting the provision of space for
accommodating the incremental updates of CU values or the incremental updates of the LTS and the associated assignment of CU values; and
(8) Writing the fixed Header Section, the Encoded Code Section and the Remaining Bits Section to form a Compressed Code File.
[165] The following are the steps of decoding by D a Compressed Code File with sub-division into PUs, assuming there is no DP stage done by A or by C itself:
(1) Verifying the Header Section and, if it is found to be a valid Header Section, reading the Header Section of the Compressed Code File, including the Remaining Bits Unit, and the Header Information as described in Paragraphs [152] and [156], if such Header Information is not made available to the Decompressor in other ways; and if no valid Header Section is found, stopping decoding and any further processing;
(2) Setting aside the Remaining Bits not to be decoded and to be appended to the decoded code after decoding is finished if there is any Remaining Bit;
(3) Decoding the encoded code of PUs read from the Encoded Section of the Compressed Code File one PU by one PU according to the technique(s) designed for processing the PU of the PU group as indicated by the CUV and according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; and writing the decoded code of one PU after another into the Decompressed Code File;
(4) Upon finishing decoding all the encoded codes of all the PUs, appending the
Remaining Bits, if any, as indicated by step (2) above, to the Decompressed Code File; and
(5) Repeating step (1).
[166] In summary, the method and schema revealed in the present invention are characterized by:
(1) A Location Table Structure used for making compression of digital data, characterized by:
(a) Having Location Table(s), including Sub-Location Table(s) if any;
(b) Location Table(s), including Sub-Location Table(s), having addresses which are made up of two portions, one portion being the Table Binary Number (TBN), including the Sub-Table Binary Number (STBN), and another portion being the Location Binary Address (LBA); these two portions, either or both of which could be omitted upon compression encoding and interpreted as such upon decompression decoding (this being the case when using RTS and RA, where the TBN is usually replaced by the Switching Bit and the relative TBN, or by the Staying Bit, so the TBN is omitted in this scenario; whereas when a LT has only 1 CU value assigned to it, then only the TBN is enough for such assignment, and when starting the encoding and the CU value being assigned happens to be the first value to be encoded, its Absolute Address, i.e. its Absolute TBN, has to be used, and because there is no LBA for this value its LBA is omitted; furthermore, when this CU value comes up again, one next to another, then the Staying Bit is enough for representing the TBN and the LBA of it, so both its corresponding TBN and LBA could be omitted), forming the Table Location Binary Address (TLBA), which is expressed in the form of TBN:LBA, where the TBN could be further expressed in the form of TBN:STBN and one or more :STBN are to be added after TBN:STBN when one or more levels of Sub-Location Table(s) are used;
(c) TBN, STBN, LBA and TLBA of (b) above being represented by binary bit(s) of Bit 0 and Bit 1 separated by colon(s), which are to be omitted upon compression encoding and interpreted as such upon decompression decoding; and
(d) TLBA of (b) above being assigned with values of a Compression Unit (CU) used in compression encoding and decompression decoding where the CU values are represented in the form of binary bit(s) of Bit 0 or Bit 1.
(2) A method comprising the step of using the technique of Relative Table Switching and Relative Addressing for compressing or decompressing digital data using Location Table Structure.
(3) A method comprising the step of using the technique of Absolute Table Switching and Absolute Addressing for compressing or decompressing digital data using Location Table Structure.
(4) A method comprising the step of using the technique of Relative Table Switching and Relative Addressing combined with Absolute Addressing for compressing or
decompressing digital data using Location Table Structure.
(5) A method comprising the step of using the technique of Address Branching for compressing or decompressing digital data using Location Table Structure.
(6) A method comprising the step of using the technique of In-Table Switching and Out-Table Switching for compressing or decompressing digital data using Location Table Structure.
(7) A method comprising the step of using the technique of Dynamic Adaptive Location Table Restructuring for compressing or decompressing digital data using Location Table Structure.
(8) A method comprising the step of using the technique of Data Distribution Alteration for compressing or decompressing digital data using Location Table Structure.
(9) A method comprising the step of using the technique of Rank Code Pair Numbering for compressing or decompressing digital data using Location Table Structure.
(10) A method comprising the step of using Compression Unit for compressing or decompressing digital data using Location Table Structure.
(11) A method comprising the step of using Processing Unit(s) for compressing or decompressing digital data using Location Table Structure.
(12) A method comprising the step of using Compression Unit Value Signature for grouping Processing Unit(s) for compressing or decompressing digital data using
Location Table Structure.
(13) A method, with the use of Location Table Structure, comprising the step of compressing or decompressing the whole digital information as a whole single code input without sub-division into Processing Units.
(14) A method comprising the step of using TLBA as special signature for special processing required for compressing or decompressing digital data using Location Table Structure.
(15) A method comprising the step of using Header Section containing Header
Information as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
(16) A method comprising the step of using Header Section containing Remaining Bits Unit as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
(17) A method comprising the step of using Header Section containing Compression Bit as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
(18) A method comprising the step of using Analyzer for analyzing and collecting statistics about the data distribution of the digital information to be processed or being processed for use in compressing or decompressing digital data using Location Table Structure.
(19) A method comprising the step of using Compressor for analyzing and collecting statistics about the data distribution of the digital information to be processed or being processed for use in compressing or decompressing digital data using Location Table Structure.
(20) A method comprising the step of using Compressor for compressing digital data using Location Table Structure.
(21) A method comprising the step of using Decompressor for decompressing digital data using Location Table Structure.
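As an illustrative aid, the Location Table Structure characterized in item (1) above could be sketched as a toy model in Python, where each Table Location Binary Address (TLBA) is the pair TBN:LBA and each TLBA is assigned one CU value. The table sizes, bit widths and CU values below are arbitrary assumptions, and the class and method names are hypothetical, not part of the claimed structure.

# A minimal illustrative sketch (not the patented implementation) of a Location Table
# Structure: each address is a TLBA written as TBN[:STBN...]:LBA, and each TLBA is
# assigned one Compression Unit (CU) value represented as a binary bit string.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class LocationTable:
    tbn: str                                                  # Table Binary Number, e.g. "0" or "1"
    entries: Dict[str, str] = field(default_factory=dict)     # LBA -> CU value

    def assign(self, lba: str, cu_value: str) -> None:
        self.entries[lba] = cu_value

    def tlba(self, lba: str) -> str:
        # Human-readable form TBN:LBA; the colon is only notational and is
        # omitted when the address bits are actually written out on encoding.
        return f"{self.tbn}:{lba}"

# Two Location Tables holding four 2-bit CU values at 1-bit LBAs.
lt0, lt1 = LocationTable("0"), LocationTable("1")
lt0.assign("0", "00"); lt0.assign("1", "01")
lt1.assign("0", "10"); lt1.assign("1", "11")
for lt in (lt0, lt1):
    for lba, cu in lt.entries.items():
        print(lt.tlba(lba), "->", cu)          # e.g. "0:1 -> 01"

The colon printed by tlba() is notational only and, as stated in item (1)(c) above, would be omitted when the address bits are written out upon encoding.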
Industrial Applicability
[167] It could be seen from the above that the schema of using LTS in making compression is very flexible and could be tailor-made for different data distributions of digital information. The use of RTS and RA with LTS has intrinsic advantages in addition to capitalizing on uneven data distribution in making compression, and its ability to compress random, evenly distributed binary data on the fly makes it all the more useful when such a need or situation arises. The other techniques of special processing described above further enhance the schema in processing data of special patterns. This illustrates the flexible and accommodating nature of the schema and techniques revealed in the present invention, which suits the making of lossless compression of all types of digital data. Since compression could be made recursive again and again until a further round of compression either expands the data or breaks even in bit usage with the CCF of the previous round, this helps squeeze out every savable bit. As the whole world now enters a new age of information explosion, it could easily be seen that the effective, flexible and easily implemented compression schema, techniques and methods revealed in the present invention could serve industrial applications in various aspects.
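As a rough illustration of the recursive re-compression described in the preceding paragraph, the loop below keeps feeding the previous round's CCF back into a compressor until a further round breaks even or expands the output. The compress_once stand-in (zlib here) is purely a placeholder assumption and is not the compressor of the present invention.

# Hedged sketch of recursive re-compression: stop when a further round no longer saves bits.
import zlib
from typing import Callable, Tuple

def compress_recursively(data: bytes,
                         compress_once: Callable[[bytes], bytes] = zlib.compress) -> Tuple[bytes, int]:
    rounds = 0
    current = data
    while True:
        candidate = compress_once(current)
        if len(candidate) >= len(current):     # expands or breaks even in bit usage
            return current, rounds             # keep the previous round's CCF
        current, rounds = candidate, rounds + 1

if __name__ == "__main__":
    payload = b"ABAB" * 4096
    ccf, n = compress_recursively(payload)
    print(f"{len(payload)} bytes -> {len(ccf)} bytes after {n} round(s)")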
[168] The prior art for the implementation of this invention includes computer languages and compilers for making executable code and operating systems as well as the related knowledge for making applications or programs; the hardware of any device(s), whether networked or standalone, including computer system(s) or computer-controlled device(s) or operating-system-controlled device(s) or system(s), capable of running executable code; and computer-executable or operating-system-executable instructions or programs that help perform the steps for the method of this invention. In combination with the use of the technical features contained in the prior art stated above, this invention makes possible the implementation of a Location Table Structure for the compression of digital information, including digital data and digital executable codes; and in this relation, is characterized by the following claims:
Sequence List Text

Claims

[1] A Location Table Structure used for making compression of digital data, characterized by: (1) Having Location Table(s), including Sub-Location Table(s) if any; (2) Location Table(s) including Sub-Location Table(s) having addresses, which are made up of two portions, one portion being Table Binary Number (TBN), including Sub-Table Binary Number (STBN) and another portion being Location Binary Address (LBA); these two portions, either or both of which could be omitted upon compression encoding and interpreted as such upon decompression decoding, forming Table Location Binary Address (TLBA), which is expressed in the form of TBN:LBA where the TBN could be further expressed in the form of TBN:STBN and one or more :STBN are to be added after TBN:STBN when one or more levels of Sub-Location Table(s) are used; (3) TBN, STBN, LBA and TLBA of (2) above being represented by binary bit(s) of Bit 0 and Bit 1 separated by colon(s), which is/are to be omitted upon compression encoding and interpreted as such upon decompression decoding; and (4) TLBA of (2) above being assigned with values of a Compression Unit (CU) used in compression encoding and decompression decoding where the CU values are represented in the form of binary bit(s) of Bit 0 or Bit 1.
[2] A method comprising steps of encoding a digital input code file without sub-division into Processing Units by Compressor assuming there is a Data Parsing stage done by Analyzer or Compressor itself: (1) Collecting statistics about the data distribution of the digital information to be processed from Analyzer or from the digital information itself for use; (2) Writing in the Header Section to the Compressed Code File, including the Remaining Bits Unit, and the Header Information, if such Header Information is not made available to Decompressor in other ways; (3) Reading the Remaining Bits Unit to set aside the Remaining Bits not to be encoded and to be appended to the Encoded Section after encoding is finished; (4) Encoding Compression Unit values one by one and writing the encoded code one after another following the Header Section according to the Header Information and any or any combination of the techniques chosen, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; and (5) Upon finishing encoding all the Compression Units, appending the Remaining Bits, if any as indicated by step (3) above to the Encoded Code Section to form the Remaining Bits Section.
[3] A method comprising steps of decoding by Decompressor a Compressed Code File without sub-division into Processing Units assuming there is a Data Parsing stage done by Analyzer or Compressor itself: (1) Verifying the Header Section and if it is found to be a valid Header Section, reading the Header Section of the Compressed Code File, including the Remaining Bits Unit, and the Header Information, if such Header Information is not made available to Decompressor in other ways; and if no valid Header Section is found, stopping decoding and any further processing; (2) Setting aside the Remaining Bits not to be decoded and to be appended to the decoded code after decoding is finished if there is any Remaining Bit; (3) Decoding the encoded code read from the Encoded Section of the Compressed Code File one by one and writing the decoded code one after another into the Decompressed Code File according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; (4) Upon finishing decoding all the encoded code, appending the Remaining Bits, if any as indicated by step (2) above to the Decompressed Code File; and (5) Repeating step (1).
[4] A method comprising steps of encoding a digital input code file with sub-division into Processing Units by Compressor assuming there is a Data Parsing stage done by Analyzer or Compressor itself: (1) Collecting statistics about the data distribution of the digital information to be processed from Analyzer or from the digital information itself for use; (2) Writing in the Header Section to the Compressed Code File, including the Remaining Bits Unit, and the Header Information, if such Header Information is not made available to Decompressor in other ways; (3) Reading the Remaining Bits Unit to set aside the Remaining Bits not to be encoded and to be appended to the Encoded Section after encoding is finished; (4) Encoding and writing the Processing Units one by one after the Header Section in the following steps: (a) Reading in the Processing Unit; (b) Writing a Compression Unit Value for it; (c) Encoding all the Compression Unit values of the Processing Unit in one slot according to the technique(s) designed for processing the Processing Unit of the Processing Unit group as indicated by the Compression Unit Value and according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; (d) Writing the encoded Processing Unit as a whole into the Compressed Code File; and (e) Repeating steps (a) to (d) until there is no more whole unit of Processing Unit; and (5) Upon finishing encoding all the Processing Units, appending the Remaining Bits, if any as indicated by step (3) above to the Encoded Code Section to form the Remaining Bits Section.
[5] A method comprising steps of decoding by Decompressor a Compressed Code File with sub-division into Processing Units assuming there is a Data Parsing stage done by Analyzer or Compressor itself: (1) Verifying the Header Section and if it is found to be a valid Header Section, reading the Header Section of the Compressed Code File, including the Remaining Bits Unit, and the Header Information, if such Header Information is not made available to Decompressor in other ways; and if no valid Header Section is found, stopping decoding and any further processing; (2) Setting aside the Remaining Bits not to be decoded and to be appended to the decoded code after decoding is finished if there is any Remaining Bit; (3) Decoding the encoded code of Processing Units read from the Encoded Section of the Compressed Code File one Processing Unit by one Processing Unit according to the technique(s) designed for processing the Processing Unit of the Processing Unit group as indicated by the Compression Unit Value and according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; and writing the decoded code of one Processing Unit after another into the Decompressed Code File; (4) Upon finishing decoding all the encoded codes of all the Processing Units, appending the Remaining Bits, if any as indicated by step (2) above to the Decompressed Code File; and (5) Repeating step (1).
[6] A method comprising steps of encoding a digital input code file without sub-division into Processing Units by Compressor assuming there is no Data Parsing stage done by Analyzer or Compressor itself before the digital input code file is processed: (1) Pre-selecting a Location Table Structure and determining the Header Section details to be used; (2) Determining the rules of dynamic updating of the Location Table Structure and the assignment of Compression Unit values, such as the updating period, i.e. updating the Location Table Structure whenever there is a change of the ranking of the Compression Unit values or a change of the number of Compression Unit values which affects the number of Location Tables used to accommodate them or updating the Location Table Structure and the assignment of Compression Unit values after reading a number of Compression Unit values; (3) Writing in the Header Section to a temporary space in random access memory or a non-volatile storage, including the Remaining Bits Unit, and the Header Information, as well as the Location Table Structure changes, changes of Compression Unit values assignment reflecting the incremental changes of Compression Unit values swapped-in or swapped out as part of the Header Section; (4) Reading the Remaining Bits Unit to set aside the Remaining Bits not to be encoded and to be appended to the Encoded Section after encoding is finished; (5) Encoding Compression Unit values one by one and writing the encoded code one after another to a temporary space in random access memory or a non-volatile storage according to the Header Information and any or any combination of the techniques chosen, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; (6) Upon finishing encoding all the Compression Units, appending the Remaining Bits, if any as indicated by step (4) above to the Encoded Code Section to form the Remaining Bits Section; (7) Fixing the Header Section, including adjusting the provision of space for accommodating the incremental updates of Compression Unit values or the incremental updates of Location Table Structure and the associated assignment of Compression Unit values; and (8) Writing the fixed Header Section, the Encoded Code Section and the Remaining Bits Section to form a Compressed Code File.
[7] A method comprising steps of decoding by Decompressor a Compressed Code File without sub-division into Processing Units assuming there is no Data Parsing stage done by Analyzer or Compressor itself before the digital input code file is processed: (1) Verifying the Header Section and if it is found to be a valid Header Section, reading the Header Section of the Compressed Code File, including the Remaining Bits Unit, and the Header Information, if such Header Information is not made available to Decompressor in other ways; and if no valid Header Section is found, stopping decoding and any further processing; (2) Setting aside the Remaining Bits not to be decoded and to be appended to the decoded code after decoding is finished if there is any Remaining Bit; (3) Decoding the encoded code read from the Encoded Section of the Compressed Code File one by one and writing the decoded code one after another into the Decompressed Code File according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering; (4) Upon finishing decoding all the encoded code, appending the Remaining Bits, if any as indicated by step (2) above to the Decompressed Code File; and (5) Repeating step (1).
[8] A method comprising steps of encoding a digital input code file with sub-division into Processing Units by Compressor assuming there is no Data Parsing stage done by Analyzer or Compressor itself:
(1) Pre-selecting a Location Table Structure and determining the Header Section details to be used;
(2) Determining the rules of dynamic updating of the Location Table Structure and the assignment of Compression Unit values, such as the updating period, i.e. updating the Location Table Structure whenever there is a change of the ranking of the Compression Unit values or a change of the number of Compression Unit values which affects the number of Location Tables used to accommodate them or updating the Location Table Structure and the assignment of Compression Unit values after reading a number of Compression Unit values;
(3) Writing in the Header Section to a temporary space in random access memory or a non-volatile storage, including the Remaining Bits Unit, and the Header Information, as well as the Location Table Structure changes, changes of Compression Unit values assignment reflecting the incremental changes of Compression Unit values swapped-in or swapped out as part of the Header Section;
(4) Reading the Remaining Bits Unit to set aside the Remaining Bits not to be encoded and to be appended to the Encoded Section after encoding is finished;
(5) Encoding and writing the Processing Units one by one to a temporary space in random access memory or a non-volatile storage in the following steps:
(a) Reading in the Processing Unit; (b) Writing a Compression Unit Value for it;
(c) Encoding all the Compression Unit values of the Processing Unit in one slot according to the technique(s) designed for processing the Processing Unit of the Processing Unit group as indicated by the Compression Unit Value and according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution Alteration, Rank Code Pair Numbering;
(d) Writing the encoded Processing Unit as a whole into the temporary space mentioned above; and
(e) Repeating steps (a) to (d) until there is no more whole unit of Processing Unit;
(6) Upon finishing encoding all the Processing Units, appending the Remaining Bits, if any as indicated by step (4) above to the Encoded Code Section to form the Remaining Bits Section;
(7) Fixing the Header Section, including adjusting the provision of space for accommodating the incremental updates of Compression Unit values or the incremental updates of Location Table Structure and the associated assignment of Compression Unit values; and
(8) Writing the fixed Header Section, the Encoded Code Section and the Remaining Bits Section to form a Compressed Code File.
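Purely as a sketch, the Processing-Unit encoding flow of claim [8] could be outlined in Python as below. The header layout, the CUV assignment rule and the per-PU encoder are assumptions standing in for the Header Information and for the techniques named in step (5)(c); they are not the claimed implementations.

# A hypothetical skeleton of the encoding flow in steps of claim [8]. Every name, field
# and bit width here is an illustrative assumption, not the patented format.
from typing import Callable, List

def encode_with_pus(bits: str,
                    pu_size: int,
                    classify_pu: Callable[[str], str],
                    encode_pu: Callable[[str, str], str]) -> str:
    encoded_pus: List[str] = []                       # temporary space for step (5)
    n_whole = len(bits) // pu_size
    remaining = bits[n_whole * pu_size:]              # step (4): Remaining Bits set aside
    for i in range(n_whole):                          # step (5): PU by PU
        pu = bits[i * pu_size:(i + 1) * pu_size]      # (a) read in the Processing Unit
        cuv = classify_pu(pu)                         # (b) write a CUV for its PU group
        encoded_pus.append(cuv + encode_pu(pu, cuv))  # (c)+(d) encode and store the PU
    header = format(len(remaining), "03b")            # steps (3)/(7): toy fixed Header Section
    return header + "".join(encoded_pus) + remaining  # step (8): Header + Encoded + Remaining Bits

if __name__ == "__main__":
    # Toy rules: CUV "1" marks an all-zero PU (encoded as nothing), "0" copies the PU as-is.
    ccf = encode_with_pus("0000" "1010" "11", 4,
                          classify_pu=lambda pu: "1" if set(pu) == {"0"} else "0",
                          encode_pu=lambda pu, cuv: "" if cuv == "1" else pu)
    print(ccf)   # header "010" + "1" + "0" + "1010" + remaining "11"

A real implementation would also carry out step (2), dynamically updating the Location Table Structure as the ranking or number of Compression Unit values changes, which this sketch omits.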
[9] A method comprising steps of decoding by Decompressor a Compressed Code File with sub-division into Processing Units assuming there is no Data Parsing stage done by Analyzer or Compressor itself:
(1) Verifying the Header Section and if it is found to be a valid Header Section, reading the Header Section of the Compressed Code File, including the Remaining Bits Unit, and the Header Information, if such Header Information is not made available to Decompressor in other ways; and if no valid Header Section is found, stopping decoding and any further processing;
(2) Setting aside the Remaining Bits not to be decoded and to be appended to the decoded code after decoding is finished if there is any Remaining Bit;
(3) Decoding the encoded code of Processing Units read from the Encoded Section of the Compressed Code File one Processing Unit by one Processing Unit according to the technique(s) designed for processing the Processing Unit of the Processing Unit group as indicated by the Compression Unit Value and according to the Header Information of the Compressed Code File and any or any combination of the techniques used as so indicated, which include Relative Table Switching and Relative Addressing, Absolute Table Switching and Absolute Addressing, Relative Table Switching and Relative Addressing combined with Absolute Addressing, Address Branching, In-Table Switching and Out-Table Switching, Dynamic Adaptive Location Table Restructuring, Data Distribution
Alteration, Rank Code Pair Numbering; and writing the decoded code of one Processing Unit after another into the Decompressed Code File;
(4) Upon finishing decoding all the encoded codes of all the Processing Units, appending the Remaining Bits, if any as indicated by step (2) above to the
Decompressed Code File; and
(5) Repeating step (1).
[10] A method comprising the step of using the technique of Relative Table Switching and Relative Addressing for compressing or decompressing digital data using Location Table Structure.
[11] A method comprising the step of using the technique of Absolute Table Switching and Absolute Addressing for compressing or decompressing digital data using Location Table Structure.
[12] A method comprising the step of using the technique of Relative Table Switching and Relative Addressing combined with Absolute Addressing for compressing or decompressing digital data using Location Table Structure.
[13] A method comprising the step of using the technique of Address Branching for compressing or decompressing digital data using Location Table Structure.
[14] A method comprising the step of using the technique of In-Table Switching and
Out-Table Switching for compressing or decompressing digital data using Location Table Structure.
[15] A method comprising the step of using the technique of Dynamic Adaptive
Location Table Restructuring for compressing or decompressing digital data using Location Table Structure.
[16] A method comprising the step of using the technique of Data Distribution
Alteration for compressing or decompressing digital data using Location Table Structure.
[17] A method comprising the step of using the technique of Rank Code Pair
Numbering for compressing or decompressing digital data using Location Table Structure.
[18] A method comprising the step of using Compression Unit for compressing or decompressing digital data using Location Table Structure.
[19] A method comprising the step of using Processing Unit(s) for compressing or decompressing digital data using Location Table Structure.
[20] A method comprising the step of using Compression Unit Value Signature for grouping Processing Unit(s) for compressing or decompressing digital data using Location Table Structure.
[21] A method, with the use of Location Table Structure, comprising the step of
compressing or decompressing the whole digital information as a whole single code input without sub-division into Processing Units.
[22] A method comprising the step of using TLBA as special signature for special processing required for compressing or decompressing digital data using Location Table Structure.
[23] A method comprising the step of using Header Section containing Header
Information as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
[24] A method comprising the step of using Header Section containing Remaining Bits
Unit as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
[25] A method comprising the step of using Header Section containing Compression
Bit as part of the Compressed Code File for compressing or decompressing digital data using Location Table Structure.
[26] A method comprising the step of using Analyzer for analyzing and collecting statistics about the data distribution of the digital information to be processed or being processed for use in compressing or decompressing digital data using Location Table Structure.
[27] A method comprising the step of using Compressor for analyzing and collecting statistics about the data distribution of the digital information to be processed or being processed for use in compressing or decompressing digital data using Location Table Structure.
[28] A method comprising the step of using Compressor for compressing digital data using Location Table Structure.
[29] A method comprising the step of using Decompressor for decompressing digital data using Location Table Structure.
PCT/IB2015/056562 2015-08-29 2015-08-29 Compression code and method by location WO2017037502A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2015/056562 WO2017037502A1 (en) 2015-08-29 2015-08-29 Compression code and method by location

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2015/056562 WO2017037502A1 (en) 2015-08-29 2015-08-29 Compression code and method by location

Publications (1)

Publication Number Publication Date
WO2017037502A1 (en) 2017-03-09

Family

ID=58188462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2015/056562 WO2017037502A1 (en) 2015-08-29 2015-08-29 Compression code and method by location

Country Status (1)

Country Link
WO (1) WO2017037502A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040175049A1 (en) * 2003-03-04 2004-09-09 Matsushita Electric Industrial Co., Ltd Moving image coding method and apparatus
US20100118947A1 (en) * 2007-04-04 2010-05-13 Nxp B.V. Decoder for selectively decoding predetermined data units from a coded bit stream
CN103685589A (en) * 2012-09-07 2014-03-26 中国科学院计算机网络信息中心 Binary coding-based domain name system (DNS) data compression and decompression methods and systems
CN104021121A (en) * 2013-02-28 2014-09-03 北京四维图新科技股份有限公司 Method, device and server for compressing text data
CN104009984A (en) * 2014-05-15 2014-08-27 清华大学 Network flow index retrieving and compressing method based on inverted list
CN104378119A (en) * 2014-12-09 2015-02-25 西安电子科技大学 Quick lossless compression method for file system data of embedded equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15902867

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15902867

Country of ref document: EP

Kind code of ref document: A1