METHOD OF ENCODING BINARY DATA
FIELD OF THE INVENTION
The present invention relates to the encoding of binary data for transfer between computers connected via a network.
BACKGROUND OF THE INVENTION
While computer networks, such as local area networks, have existed for many years, the problem of encoding binary data for transmission between client computers has become a widespread global consideration since the advent of the Internet and e-mail.
By way of introduction, it is known that a single byte contains 8 bits of information and can therefore take any value between 0 and 255, as in the IBM EBCDIC (Extended Binary Coded Decimal Interchange Code) system. It is further known that, in the ASCII system, each of the 128 characters has a numerical code from 0 to 127, such that, for example, the letter "B" has a value of 66, the letter "D" has a value of 68, and so on.
Additional characters may be assigned, in an 8 bit system, predetermined numerical values from 128 to 255. However, as some data receiving protocols are based on a 7 bit data system, 8 bit data must be divided into two bytes.
A problem stemming from transmission of data between computers is caused by the fact that all computer files which contain data other than the standard 95 ASCII characters, such as text, image or sound data, must not only be encoded so as to render them transmittable, but must also be decodable by software resident on recipient computers, which may have different operating systems employing different compilers. Examples of the different systems used are DOS and Windows 95/98, which are used by PCs, as well as Unix and those employed by Macintosh computers.
The solutions which exist for preparing binary data for transmission over the Internet all entail the use of encoding routines which, due to the transformation of the data into a decodable binary format, expand the data. The expanded data also requires a certain transmission time corresponding, inter alia, to the volume of transmitted data. Clearly, a certain time is also taken to encode the data at the computer of origin, and to decode the data at the recipient computer, prior to rendering the data accessible thereat.
Currently, two encoding systems are used for facilitating binary data transmission: 3 to 4 byte systems (herein referred to as "3to4"), such as the so-called "UUencode," "XX," "MIME64," and "BinHex," wherein 3 bytes are encoded into a 4 byte form, so as to have an expansion ratio of approximately 33% in the volume of data; and the "BtoA" system (herein referred to as "4to5"), wherein 4 bytes are encoded into a 5 byte form, so as to have an expansion ratio of only 25%.
In conventional communications systems, both calculation time and transmission time are factors governing the overall encode-to-decode time (where data is encoded at a first station and decoding is performed at a second station, remote from the first station). Due to the relative slowness of many public networks, however, via which many consumers access the Internet, and send electronic mail, the expansion ratio is of great significance. Clearly, the smaller the expansion ratio, the less time it will take to transmit encoded data. In known systems, the smallest expansion ratio is 25%, for the BtoA system, described above.
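The expansion ratios quoted above can be checked with a short sketch. It uses the Base64 and Ascii85 encoders of the Python standard library as stand-ins for the 3to4 and 4to5 families; this is an assumption made for illustration only, and these are not the specific UUencode, XX, BinHex or BtoA implementations named above. The helper name `expansion` is hypothetical.

```python
import base64


def expansion(encoded_len: int, raw_len: int) -> float:
    """Percentage growth of the encoded data relative to the raw data."""
    return 100.0 * (encoded_len - raw_len) / raw_len


raw3 = bytes([0x12, 0x34, 0x56])        # one 3 byte group
raw4 = bytes([0x12, 0x34, 0x56, 0x78])  # one 4 byte group

enc3 = base64.b64encode(raw3)  # 3to4 style: 3 bytes -> 4 characters
enc4 = base64.a85encode(raw4)  # 4to5 style: 4 bytes -> 5 characters

ratio_3to4 = expansion(len(enc3), len(raw3))  # approximately 33.3
ratio_4to5 = expansion(len(enc4), len(raw4))  # 25.0
```

On a slow link the difference between these two percentages translates directly into transmission time, which is why the expansion ratio is treated as the governing factor.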
SUMMARY OF THE INVENTION
The present invention seeks to provide an improved method of encoding binary data for transmission between two or more computers, in which the encoded data has an expansion ratio of less than 25%, preferably 22.22%, thereby reducing the time required for transmission of the encoded data.
There is thus provided, in accordance with a preferred embodiment of the invention, a method of encoding binary data for transfer from a computer of origin to a recipient computer, which includes the following steps:
(i) dividing a portion of unencoded data, having a predetermined number of bytes Bn into a plurality n of chunks;
(ii) mathematically processing the value of each of n-1 chunks as a function of at least a predetermined number, so as to produce a modulo MOD and an integer INT;
(iii) entering a predetermined one of the MOD and INT of each of the n-1 chunks into N-x predetermined bytes of an encoded sequence having a total of N bytes, wherein x is greater than or equal to 1; and
(iv) mathematically processing the other of the MOD and INT of each of the n-1 chunks so as to produce therefor results for entry into the remaining at least one byte of the encoded sequence, wherein the ratio N to Bn is less than 1.25.
Additionally in accordance with a preferred embodiment of the present invention, the step (ii) of mathematically processing includes the step of dividing the value of at least one of the n-1 chunks by a predetermined number so as to produce a MOD whose value is equal to or greater than the predetermined number, and the method further includes the additional steps of (v) further dividing the other of the MOD and INT by the predetermined number, thereby to produce further MOD and INT, and (vi) entering the predetermined one of the further MOD and further INT into a predetermined byte of the encoded sequence; and, if the further MOD has a value which is equal to or greater than the predetermined number, repeating the step (v) of further dividing, and the step (vi) of entering, until the value of the further MOD is less than the predetermined number.
Further in accordance with a preferred embodiment of the present invention, the one or more n-1 chunks is all of the n-1 chunks.
Additionally in accordance with a preferred embodiment of the present invention, the step (iv) of mathematically processing includes the step of mathematically processing the other of the MOD and INT as a function of at least a predetermined multiplier and the predetermined number, so as to produce a modulo MOD and an integer INT.
Further in accordance with a preferred embodiment of the present invention, step (iv) includes mathematically processing the other of the MOD and INT as a function also of a predetermined subsequent chunk in the unencoded sequence.
Additionally in accordance with a preferred embodiment of the present invention, the predetermined number equals the encoding base of the present method, which, preferably, is 94, such that the ratio N to Bn = 11/9.
Further in accordance with a preferred embodiment of the present invention, the predetermined number is a different, predetermined number for each of the n-1 chunks.
Additionally in accordance with a preferred embodiment of the present invention, n is greater than or equal to 3.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be more fully understood and appreciated from the following detailed description, taken in conjunction with the drawings, in which:
Figs. 1A and 1B are diagrams illustrating the 9to11 encoding method of the invention, as exemplified by the Simplified Continuous routines of the invention;
Figs. 2A and 2B are diagrams illustrating the 9to11 encoding method of the invention, as exemplified by the Complex Continuous routines of the invention; and
Figs. 3A and 3B are diagrams illustrating the 9to11 encoding method of the invention, as exemplified by the Accumulator routines of the invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides an improved method of encoding binary data for transmission between two or more computers, such that the ratio of encoded data to unencoded data is less than 1.25. More particularly, the present invention provides a method of encoding binary data employing a 9 to 11 system, in which unencoded data has a sequence of Bn bytes, and the encoded data corresponding thereto has a sequence of N bytes, thereby to provide a ratio N to Bn of 11/9, or approximately 1.22, corresponding to an expansion of 22.22%.
It will be appreciated that the description of 9to11 encoding is also intended to include multiples of 9to11, including, by way of example, 18 to 22, and 36 to 44, in which, while the amount of data being encoded at any one time is proportionately greater than in the present 9to11 system, the expansion ratio remains 22.22%.
It will be particularly noted that the 9to11 expansion of the present invention is particularly advantageous with respect to either the 3to4 or 4to5 methods, which, for a 9 byte unencoded word, would result in 12 and 11.25 byte encoded words, respectively.
As will be appreciated from the following description, the encoding may be implemented by use of various novel routines, in accordance with various embodiments of the invention. However, each such routine implements the 9 to 11 method of the invention, resulting in a smaller volume of encoded data for a given unencoded data block, thus also facilitating a shorter transmission time for a given volume of unencoded data at a given rate of transmission.
The method of the present invention employs a table having 94 characters, this thus being the encoding calculation "base," as evident from the following description.
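A quick arithmetic check, offered as an illustration rather than as part of the described method, shows why eleven characters drawn from a 94 character table are enough to represent a 9 byte (72 bit) block: 94^11 is at least 2^72, whereas 94^10 is not. The function name `min_symbols` is hypothetical.

```python
BASE = 94            # size of the character table
BLOCK_BITS = 9 * 8   # 72 bits in one unencoded block


def min_symbols(base: int, bits: int) -> int:
    """Smallest number of base-`base` symbols that can represent `bits` bits."""
    count = 1
    while base ** count < 2 ** bits:
        count += 1
    return count


symbols_needed = min_symbols(BASE, BLOCK_BITS)  # eleven symbols for 72 bits
```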
The method of the invention may employ, as described above, a plurality of different routines, which may be divided into three groups, which include:
1. Simplified Continuous Routine
2. Complex Continuous Routine
3. Accumulator Routine
Simplified Continuous Routines
Referring now to Figs. 1A and 1B, there is depicted a method of 9to11 binary data encoding, in accordance with a preferred embodiment of the present invention.
The present embodiment of the invention entails dividing each 9 byte (72 bit) portion of unencoded data, seen at 10 in Figs. 1A and 1B, into 11 chunks C1-C11 of different sizes, seen at block 12. Preferably the chunks C1-C11 have 13, 6, 7, 6, 7, 6, 7, 6, 7, 6, and 1 bit, respectively.
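The division into chunks can be sketched as follows. The text does not state the bit order, so this sketch assumes that the 9 bytes are read as one big-endian 72 bit integer and that C1 is taken from the most significant bits; `split_chunks` is a hypothetical helper name.

```python
from typing import List, Sequence

CHUNK_BITS = (13, 6, 7, 6, 7, 6, 7, 6, 7, 6, 1)  # sizes of C1..C11, 72 bits in total


def split_chunks(block: bytes, sizes: Sequence[int] = CHUNK_BITS) -> List[int]:
    """Split a 9 byte block into chunks of the given bit sizes (assumed MSB first)."""
    if len(block) != 9:
        raise ValueError("expected a 9 byte block")
    if sum(sizes) != 72:
        raise ValueError("chunk sizes must add up to 72 bits")
    value = int.from_bytes(block, "big")
    remaining = 72
    chunks = []
    for bits in sizes:
        remaining -= bits
        # Shift the wanted bits down and mask off everything above them.
        chunks.append((value >> remaining) & ((1 << bits) - 1))
    return chunks


chunks = split_chunks(bytes(range(1, 10)))
```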
In each of the routines, the following additional steps are performed:
1. Chunk C1 is divided by a predetermined number, so as to obtain two results, integer "INT", and a modulo "MOD." As seen in Fig. 1A, in routines F and G, the predetermined number is 94, which is the base of the encoding system of the present invention. As seen in Fig. 1B, however, in routine H, the predetermined number is a multiplier M, which has a value that varies as listed in TABLE I below, depending on the particular chunk.
2. A predetermined one of MOD and INT is then placed in a first byte - which may be any predetermined byte, but is exemplified here by byte 1 - of an encoded eleven byte sequence, shown as block 14 in Figs. 1A and 1B. In routines F and G, the result MOD is placed in byte 1 of the encoded sequence, while, in routine H, the result INT is placed thereat.
3. A mathematical function of the other of the results is combined with the next chunk so as to obtain a further result, which is then divided by a predetermined number so as to obtain INT and MOD, as above.
a) In routine F, the mathematical expression used for evaluating the further result is (INT x 2^b) + Cn, wherein b is the number of bits in the next chunk, and n is the number of the next chunk, initially C2, then C3, and so on for succeeding steps.
b) In routine G, the mathematical expression used for evaluating the further result is (INT + (Cn x M)).
c) In routine H, the mathematical expression used for evaluating the further result is (MOD x 2^b) + Cn.
4. The further result is then divided by the predetermined number, which is either base 94 or multiplier M, as described above in step 1, so as to obtain a result having a MOD and an INT.
5. A predetermined one of MOD and INT is then placed in byte 2 of encoded eleven byte sequence 14.
Steps 3, 4 and 5 are then repeated for each of the remaining chunks C3-C11, successively, thereby to obtain an 11 byte encoded sequence.
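The steps above can be sketched for routine F as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes that the MOD of each division is the value written to the encoded sequence and that the INT is the value carried into the expression (INT x 2^b) + Cn, which is the reading under which every result stays below 94; it also assumes that C1 is taken from the most significant bits. The decoder `decode_f` is not described in the text and is included only to check that the sketch is reversible.

```python
from typing import List

BASE = 94
CHUNK_BITS = (13, 6, 7, 6, 7, 6, 7, 6, 7, 6, 1)  # sizes of C1..C11


def encode_f(block: bytes) -> List[int]:
    """Encode 9 bytes into 11 values in the range 0..93 (routine F, as assumed)."""
    if len(block) != 9:
        raise ValueError("expected a 9 byte block")
    value = int.from_bytes(block, "big")
    remaining, carry, out = 72, 0, []
    for bits in CHUNK_BITS:
        remaining -= bits
        chunk = (value >> remaining) & ((1 << bits) - 1)
        further = (carry << bits) + chunk    # (INT x 2^b) + Cn; the carry is 0 for C1
        carry, mod = divmod(further, BASE)   # INT is carried on, MOD is emitted
        out.append(mod)
    assert carry == 0  # with these chunk sizes nothing is left after C11
    return out


def decode_f(symbols: List[int]) -> bytes:
    """Hypothetical inverse of encode_f, used here only for a round-trip check."""
    carry, chunks = 0, []
    for bits, mod in zip(reversed(CHUNK_BITS), reversed(symbols)):
        further = carry * BASE + mod
        chunks.append(further & ((1 << bits) - 1))  # recover Cn
        carry = further >> bits                     # recover the previous INT
    value = 0
    for bits, chunk in zip(CHUNK_BITS, reversed(chunks)):
        value = (value << bits) | chunk
    return value.to_bytes(9, "big")


sample = bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x01, 0x7F, 0x80, 0xFF])
encoded = encode_f(sample)
```

Each of the eleven values can then be mapped to a character through the 94 character table; the table itself is not reproduced in this sketch.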
TABLE I
Complex Continuous Routines
Referring now to Figs. 2A and 2B, there is depicted a method of 9to11 binary data encoding, in accordance with a further embodiment of the present invention. The present method is exemplified by routines J, K, L and M, whose basic algorithms are listed below.
It will be appreciated by persons skilled in the art that routine K enables sorting of the encoded data, in a manner similar to that by which unencoded data may be sorted.
The present embodiment of the invention entails - as with the routines shown and described above in conjunction with Figs. 1A and 1B - dividing each 9 byte (72 bit) portion of unencoded data, seen at 20 in Figs. 2A and 2B, into 3 chunks C1-C3, seen at block 22.
In routines J and K, chunks C1 and C2 are seen to have equal sizes, e.g. of 32 bits each, third chunk C3 having 8 bits. In routines L and M, chunks C1 and C2 are of different sizes, having 30 and 27 bits, respectively, third chunk C3 having 15 bits.
In each of the routines J, K, L and M, the following additional steps are performed:
1. Chunk C1 is divided by 94, which is the base of the encoding system of the present invention, four times, each time obtaining an integer "INT", and a modulo "MOD." As seen in Figs. 2A and 2B, each of the four MODs obtained is entered directly into a corresponding byte of the encoded sequence 12, while the INT obtained is subjected to further division by base 94, thereby to obtain a further MOD and INT, which, respectively, are entered into a preselected byte of encoded sequence 12, and sent for further processing.
2. The last (fourth) INT obtained from C1 is mathematically combined, as described in greater detail below, with the second chunk C2, so as to provide a new "accumulated value" of C2, referred to below also as ValAccC2.
3. ValAccC2 is then divided by base 94 repeatedly, in a similar manner to that described above for C1, so as to obtain several more INTs and MODs, which are employed as above for the INTs and MODs derived from C1.
4. The last INT obtained from ValAccC2 (which is, for example, the fifth in routines J and K, and the fourth in routines L and M) is mathematically combined with the third chunk C3, so as to provide a new "accumulated value" of C3, referred to below also as ValAccC3.
a. In routines J and L, the mathematical expression used to obtain ValAccC2 is C2 + (C1 x 2^b2); and ValAccC3 is obtained by evaluating the term C3 + (ValAccC2 x 2^b3), in which b2 is the number of bits in chunk C2, and b3 is the number of bits in chunk C3.
b. In routines K and M, however, the mathematical expression used to obtain ValAccC2 is (C2 x M2) + C1; and ValAccC3 is obtained by evaluating the term (C3 x M3) + ValAccC2, in which M2 and M3 are predetermined multipliers for which, in routine K, M2 = 56 and M3 = 33, and, in routine M, M2 = 14 and M3 = 25.
5. ValAccC3 is then divided by base 94 one or more times, in a similar manner to that described above for ValAccC2, so as to obtain one or more INTs and MODs, the final INT being entered into the final byte of encoded sequence 12.
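The multipliers given for routines K and M coincide with the number of distinct values that the carried INT can take at each stage. This is an observation drawn from the arithmetic rather than a statement made in the text, and the division counts used below (four for C1; five for ValAccC2 in routine K and four in routine M) are taken from the algorithm listings. The helper name `int_count` is hypothetical.

```python
BASE = 94


def int_count(max_value: int, divisions: int) -> int:
    """Number of distinct INT values left after `divisions` divisions by BASE."""
    return max_value // BASE ** divisions + 1


# Routine K: C1 and C2 are 32 bit chunks.
m2_k = int_count(2 ** 32 - 1, 4)           # INT left over from C1
m3_k = int_count(m2_k * 2 ** 32 - 1, 5)    # INT left over from ValAccC2

# Routine M: C1 has 30 bits and C2 has 27 bits.
m2_m = int_count(2 ** 30 - 1, 4)
m3_m = int_count(m2_m * 2 ** 27 - 1, 4)
```

Under this reading, multiplying the next chunk by the count of possible INT values leaves exactly enough room to add the INT without ambiguity.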
The algorithms for routines J, K, L and M, are as follows:
ROUTINE J
STEP 1: Result1 = MOD[C1/94] (note: place in first byte of 11 byte encoded sequence) and C1 = INT[C1/94]
STEP 2: Result2 = MOD[C1/94] (note: place in second byte of 11 byte encoded sequence) and C1 = INT[C1/94]
STEP 3: Result3 = MOD[C1/94] and C1 = INT[C1/94]
STEP 4: Result4 = MOD[C1/94] and C1 = INT[C1/94] and C2 = C2 + C1 x 2^32 = ValAccC2
STEP 5: Result5 = MOD[ValAccC2/94] (note: place in fifth byte of 11 byte encoded sequence) and ValAccC2 = INT[ValAccC2/94]
STEP 6: Result6 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 7: Result7 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 8: Result8 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 9: Result9 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94] and C3 = C3 + ValAccC2 x 2^8 = ValAccC3
STEP 10: Result10 = MOD[ValAccC3/94] (note: place in tenth byte of 11 byte encoded sequence) and ValAccC3 = INT[ValAccC3/94]
STEP 11: Result11 = ValAccC3
End
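Routine J can be sketched in a few lines. The mapping of bytes to chunks is an assumption of this sketch (C1 is taken as the first four bytes, C2 as the next four and C3 as the last byte, each read big-endian); the decoder `decode_j` is hypothetical and is included only to confirm that the eleven results determine the original block.

```python
from typing import List

BASE = 94


def encode_j(block: bytes) -> List[int]:
    """Encode 9 bytes into 11 results in the range 0..93, following routine J."""
    if len(block) != 9:
        raise ValueError("expected a 9 byte block")
    c1 = int.from_bytes(block[0:4], "big")
    c2 = int.from_bytes(block[4:8], "big")
    c3 = block[8]
    results = []
    for _ in range(4):                          # steps 1-4
        c1, mod = divmod(c1, BASE)
        results.append(mod)
    val_acc_c2 = c2 + c1 * 2 ** 32              # end of step 4
    for _ in range(5):                          # steps 5-9
        val_acc_c2, mod = divmod(val_acc_c2, BASE)
        results.append(mod)
    val_acc_c3 = c3 + val_acc_c2 * 2 ** 8       # end of step 9
    val_acc_c3, mod = divmod(val_acc_c3, BASE)  # step 10
    results.append(mod)
    results.append(val_acc_c3)                  # step 11
    return results


def decode_j(results: List[int]) -> bytes:
    """Hypothetical inverse of encode_j, used here only for a round-trip check."""
    val_acc_c3 = results[10] * BASE + results[9]
    c3, acc = val_acc_c3 & 0xFF, val_acc_c3 >> 8
    for mod in reversed(results[4:9]):
        acc = acc * BASE + mod
    c2, acc = acc & 0xFFFFFFFF, acc >> 32
    for mod in reversed(results[0:4]):
        acc = acc * BASE + mod
    return acc.to_bytes(4, "big") + c2.to_bytes(4, "big") + bytes([c3])


sample = bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF, 0x42])
encoded = encode_j(sample)
```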
ROUTINE K
STEP 1: Result1 = MOD[C1/94] and C1 = INT[C1/94]
STEP 2: Result2 = MOD[C1/94] and C1 = INT[C1/94]
STEP 3: Result3 = MOD[C1/94] and C1 = INT[C1/94]
STEP 4: Result4 = MOD[C1/94] and C1 = INT[C1/94] and C2 = C2 x 56 + C1 = ValAccC2
STEP 5: Result5 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 6: Result6 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 7: Result7 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 8: Result8 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 9: Result9 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94] and C3 = C3 x 33 + ValAccC2 = ValAccC3
STEP 10: Result10 = MOD[ValAccC3/94] and ValAccC3 = INT[ValAccC3/94]
STEP 11: Result11 = ValAccC3
End
ROUTINE L
STEP 1: Result1 = MOD[C1/94] and C1 = INT[C1/94]
STEP 2: Result2 = MOD[C1/94] and C1 = INT[C1/94]
STEP 3: Result3 = MOD[C1/94] and C1 = INT[C1/94]
STEP 4: Result4 = MOD[C1/94] and C1 = INT[C1/94] and C2 = C2 + C1 x 2^27 = ValAccC2
STEP 5: Result5 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 6: Result6 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 7: Result7 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 8: Result8 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94] and C3 = C3 + ValAccC2 x 2^15 = ValAccC3
STEP 9: Result9 = MOD[ValAccC3/94] and ValAccC3 = INT[ValAccC3/94]
STEP 10: Result10 = MOD[ValAccC3/94] and ValAccC3 = INT[ValAccC3/94]
STEP 11: Result11 = ValAccC3
End
ROUTINE M
STEP 1: Result1 = MOD[C1/94] and C1 = INT[C1/94]
STEP 2: Result2 = MOD[C1/94] and C1 = INT[C1/94]
STEP 3: Result3 = MOD[C1/94] and C1 = INT[C1/94]
STEP 4: Result4 = MOD[C1/94] and C1 = INT[C1/94] and C2 = C1 + (C2 x 14) = ValAccC2
STEP 5: Result5 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 6: Result6 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 7: Result7 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94]
STEP 8: Result8 = MOD[ValAccC2/94] and ValAccC2 = INT[ValAccC2/94] and C3 = ValAccC2 + (C3 x 25) = ValAccC3
STEP 9: Result9 = MOD[ValAccC3/94] and ValAccC3 = INT[ValAccC3/94]
STEP 10: Result10 = MOD[ValAccC3/94] and ValAccC3 = INT[ValAccC3/94]
STEP 11: Result11 = ValAccC3
End
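The four listings differ only in chunk sizes, in the number of divisions per stage, and in whether the carried INT is combined by a power-of-two shift or by a multiplier. A table-driven sketch makes this explicit. The table layout, the function name `encode_complex` and the big-endian chunk order are assumptions of this sketch; the division counts and constants are read off the listings above.

```python
from typing import List

BASE = 94

# name: (chunk bit sizes, divisions per stage, multipliers or None for 2^b shifts)
ROUTINES = {
    "J": ((32, 32, 8), (4, 5, 1), None),
    "K": ((32, 32, 8), (4, 5, 1), (56, 33)),
    "L": ((30, 27, 15), (4, 4, 2), None),
    "M": ((30, 27, 15), (4, 4, 2), (14, 25)),
}


def encode_complex(block: bytes, routine: str) -> List[int]:
    """Encode 9 bytes into 11 results following routine J, K, L or M."""
    if len(block) != 9:
        raise ValueError("expected a 9 byte block")
    sizes, divisions, multipliers = ROUTINES[routine]
    value = int.from_bytes(block, "big")
    remaining, chunks = 72, []
    for bits in sizes:                      # C1 from the top bits (assumed)
        remaining -= bits
        chunks.append((value >> remaining) & ((1 << bits) - 1))
    results = []
    acc = chunks[0]
    for stage in range(3):
        for _ in range(divisions[stage]):
            acc, mod = divmod(acc, BASE)    # MOD is emitted, INT is kept
            results.append(mod)
        if stage < 2:
            nxt = chunks[stage + 1]
            if multipliers is None:         # routines J and L
                acc = nxt + acc * 2 ** sizes[stage + 1]
            else:                           # routines K and M
                acc = nxt * multipliers[stage] + acc
    results.append(acc)                     # the final INT fills the last byte
    return results


worst_case = {name: encode_complex(b"\xff" * 9, name) for name in ROUTINES}
```

Even for an all-ones block, every result stays below 94, so each one can be represented by a single character of the table.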
Accumulator Routine
Referring now to Figs. 3A and 3B, there is depicted a method of 9to11 binary data encoding, in accordance with a further embodiment of the present invention. The present method is exemplified by routines S, T and U, whose basic algorithms are listed below.
The present embodiment of the invention entails - as with the routines shown and described above in conjunction with Figs. 1A-2B - dividing each 9 byte (72 bit) portion of unencoded data, seen at 30 in Figs. 3A and 3B, into a plurality of chunks. In routine S (Fig. 3A), the unencoded data is divided into 8 chunks C1-C8, of which the seven chunks C1-C7 are typically equal, having ten bits each; chunk C8 having two bits.
In routines T and U, there are provided three chunks only. In routine T, chunks C1 and C2 are seen to have equal sizes, e.g. of 32 bits each, third chunk C3 having 8 bits. In routine U, chunks C1 and C2 are also equal, having 30 bits each, and chunk C3 has 12.
In each of the routines S, T and U, the following additional steps are performed:
1. The initial value "ValAcc" of an "accumulator" chunk, referenced 34, is fixed as the value of the last chunk of the unencoded block. Thus, in routine S, ValAcc initially has the value of C8, while in routines T and U, ValAcc is initially equal to the value of C3. Preferably, accumulator 34 is a 32 bit integer.
2. Each of the remaining chunks is then divided one or more times by the base of the present encoding system, 94. The number of times by which each chunk is divided depends on the bit size of the chunk. Each division produces a MOD, which is entered into a predetermined byte of the encoded sequence; and an INT.
A. As seen in Fig. 3A, in which chunks C1-C7 are ten bit chunks, the INT is smaller than the base 94 so that no further division is required, and each INT is mathematically processed in accumulator 34, as defined in the algorithms S, T and U, listed below.
B. In Fig. 3B however, there are only two chunks remaining, C1 and C2, each of which has 30 bits, as mentioned above, so as to be divisible four times by base 94, until obtaining an INT which is less than 94.
3. The last INT obtained from each of the chunks except the final chunk of the unencoded sequence, namely, C1-C7 in routine S, and chunks C1-C2 in routines T and U, is mathematically combined with the ultimate chunk C8 or C3 (according to the routine) so as to provide a new ValAcc.
In routine S, the mathematical expression used to obtain ValAcc in the accumulator routines is

\[ \mathrm{ValAcc} = C_L \, M^{L-1} + \sum_{i=1}^{L-1} \mathrm{INT}[B_i/b] \, M^{L-1-i} \]

wherein L = number of chunks in the unencoded block, B_i = the value of chunk i, C_L = the value of the final chunk, b = base 94, and M is a predetermined multiplier for which, in routine S, M = 11, in routine T, M = 56, and in routine U, M = 14.
4. As seen in the drawings, after the initial divisions of all but the final chunk, eight bytes of the encoded sequence are occupied by encoded data in routines T and U, and seven bytes in routine S. The remaining bytes are determined by evaluating ValAcc and then dividing it by the base 94, twice in routines T and U and three times in routine S, each of the MODs thereby obtained being entered into one of the remaining bytes; the final INT being entered into the final remaining byte of the encoded sequence.
Algorithms for routines S, T and U, are as follows:
ROUTINE S
STEP 1: ValAcc = C8
STEP 2: Result1 = MOD[C1/94] (note: place in first byte of 11 byte encoded sequence) and ValAcc = (ValAcc * 11) + INT[C1/94]
STEP 3: Result2 = MOD[C2/94] and ValAcc = (ValAcc * 11) + INT[C2/94]
STEP 4: Result3 = MOD[C3/94] and ValAcc = (ValAcc * 11) + INT[C3/94]
STEP 5: Result4 = MOD[C4/94] and ValAcc = (ValAcc * 11) + INT[C4/94]
STEP 6: Result5 = MOD[C5/94] and ValAcc = (ValAcc * 11) + INT[C5/94]
STEP 7: Result6 = MOD[C6/94] and ValAcc = (ValAcc * 11) + INT[C6/94]
STEP 8: Result7 = MOD[C7/94] and ValAcc = (ValAcc * 11) + INT[C7/94]
STEP 9: Result8 = MOD[ValAcc/94] and ValAcc = INT[ValAcc/94]
STEP 10: Result9 = MOD[ValAcc/94] and ValAcc = INT[ValAcc/94]
STEP 11: Result10 = MOD[ValAcc/94] and ValAcc = INT[ValAcc/94]
STEP 12: Result11 = ValAcc
End
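Routine S can be sketched as below. The chunk order is an assumption of this sketch (C1 is taken from the most significant ten bits and C8 from the two least significant bits), and `encode_s` is a hypothetical name. With ten bit chunks each INT is at most 10, so a multiplier of 11 keeps successive INTs from overlapping in the accumulator.

```python
from typing import List

BASE = 94
MULTIPLIER = 11  # M for routine S


def encode_s(block: bytes) -> List[int]:
    """Encode 9 bytes into 11 results in the range 0..93, following routine S."""
    if len(block) != 9:
        raise ValueError("expected a 9 byte block")
    value = int.from_bytes(block, "big")
    # C1..C7: ten bits each, taken from the top; C8: the last two bits.
    chunks = [(value >> (72 - 10 * (i + 1))) & 0x3FF for i in range(7)]
    val_acc = value & 0x3                       # step 1: ValAcc = C8
    results = []
    for chunk in chunks:                        # steps 2-8
        quotient, mod = divmod(chunk, BASE)
        results.append(mod)
        val_acc = val_acc * MULTIPLIER + quotient
    for _ in range(3):                          # steps 9-11
        val_acc, mod = divmod(val_acc, BASE)
        results.append(mod)
    results.append(val_acc)                     # step 12
    return results


all_ones = encode_s(b"\xff" * 9)
```

In the worst case the accumulator reaches 4 x 11^7 - 1 = 77,948,683, which fits in a 32 bit integer and leaves a final INT of 93 after three divisions by 94.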
ROUTINE T
STEP 1: ValAcc = C3
STEP 2: Result1 = MOD[C1/94] and C1 = INT[C1/94]
STEP 3: Result2 = MOD[C1/94] and C1 = INT[C1/94]
STEP 4: Result3 = MOD[C1/94] and C1 = INT[C1/94]
STEP 5: Result4 = MOD[C1/94] and C1 = INT[C1/94] and ValAcc = (ValAcc * 56) + C1
STEP 6: Result5 = MOD[C2/94] and C2 = INT[C2/94]
STEP 7: Result6 = MOD[C2/94] and C2 = INT[C2/94]
STEP 8: Result7 = MOD[C2/94] and C2 = INT[C2/94]
STEP 9: Result8 = MOD[C2/94] and C2 = INT[C2/94] and ValAcc = (ValAcc * 56) + C2
STEP 10: Result9 = MOD[ValAcc/94] and ValAcc = INT[ValAcc/94]
STEP 11: Result10 = MOD[ValAcc/94] and ValAcc = INT[ValAcc/94]
STEP 12: Result11 = ValAcc
End
ROUTINE U
STEP 1: ValAcc = C3
STEP 2: Result1 = MOD[C1/94] and C1 = INT[C1/94]
STEP 3: Result2 = MOD[C1/94] and C1 = INT[C1/94]
STEP 4: Result3 = MOD[C1/94] and C1 = INT[C1/94]
STEP 5: Result4 = MOD[C1/94] and C1 = INT[C1/94] and ValAcc = (ValAcc * 14) + C1
STEP 6: Result5 = MOD[C2/94] and C2 = INT[C2/94]
STEP 7: Result6 = MOD[C2/94] and C2 = INT[C2/94]
STEP 8: Result7 = MOD[C2/94] and C2 = INT[C2/94]
STEP 9: Result8 = MOD[C2/94] and C2 = INT[C2/94] and ValAcc = (ValAcc * 14) + C2
STEP 10: Result9 = MOD[ValAcc/94] and ValAcc = INT[ValAcc/94]
STEP 11: Result10 = MOD[ValAcc/94] and ValAcc = INT[ValAcc/94]
STEP 12: Result11 = ValAcc
End
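Routines T and U share one structure and differ only in chunk sizes and multiplier, which the following sketch captures in a small table. The chunk order (C1 from the most significant bits) and the names `ACCUMULATOR_ROUTINES` and `encode_accumulator` are assumptions of this sketch. It also assumes that, in routine U, the value added to the accumulator is the INT left by the fourth division, as in routine T, with no further division taking place.

```python
from typing import List

BASE = 94

# name: (bit sizes of C1, C2, C3; multiplier M)
ACCUMULATOR_ROUTINES = {
    "T": ((32, 32, 8), 56),
    "U": ((30, 30, 12), 14),
}


def encode_accumulator(block: bytes, routine: str) -> List[int]:
    """Encode 9 bytes into 11 results following routine T or U."""
    if len(block) != 9:
        raise ValueError("expected a 9 byte block")
    sizes, multiplier = ACCUMULATOR_ROUTINES[routine]
    value = int.from_bytes(block, "big")
    remaining, chunks = 72, []
    for bits in sizes:
        remaining -= bits
        chunks.append((value >> remaining) & ((1 << bits) - 1))
    c1, c2, c3 = chunks
    val_acc = c3                                # step 1
    results = []
    for chunk in (c1, c2):                      # steps 2-9
        for _ in range(4):
            chunk, mod = divmod(chunk, BASE)
            results.append(mod)
        val_acc = val_acc * multiplier + chunk  # fold the last INT into ValAcc
    for _ in range(2):                          # steps 10-11
        val_acc, mod = divmod(val_acc, BASE)
        results.append(mod)
    results.append(val_acc)                     # step 12
    return results


worst_t = encode_accumulator(b"\xff" * 9, "T")
worst_u = encode_accumulator(b"\xff" * 9, "U")
```

For both routines the accumulator peaks at 802,815 for an all-ones block, so it stays well inside a 32 bit integer and the final INT is below 94.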
It will be appreciated by persons skilled in the art, that the scope of the present invention is not limited by what has been shown and described hereinabove, merely by way of example. Rather, the scope of the present invention is limited solely by the claims, which follow: